Unexplained resets of NXP LPC2368 using RTX when modifying variables

Hello everyone,

today I'm asking for hints on a tricky problem. We have a firmware that uses the RTX kernel running on an NXP LPC2368. The device that the firmware is written for is to get a new LC display.
My task is to change the firmware so that it works with the new display.

I have spent several weeks on this over the year, and from time to time I have had the problem that the controller resets shortly after start, again and again...

Every time this behaviour occurred, I deleted one or more obsolete variables (mostly global) or functions. In most cases I solved the problem by searching for other obsolete variables and deleting them from the source code, by trial and error. That is really time-consuming.

While testing the firmware on Wednesday, I tried to make the adapted and modified routine for writing data to display RAM a little faster. I moved a global unsigned int into the function and changed it to static unsigned char, because the value it has to carry is 0x0D at most.

After flashing the firmware into the controller, the controller hung after a short, random time.

Yesterday I tried to solve the problem of the firmware hanging at random times and found that it happens when no task is running: the OS calls os_idle_demon() and is not able to return from it. I found a workaround on the web: creating an empty low-priority task that does not use any os_wait functions, which prevents the OS from calling the idle task. (It has something to do with incorrect interrupt states on returning from the idle task.)

Today I further tried to make the display writing function faster and changed two unsigned char inside the function from static to non-static. After flashing this firmware the controller resets again and again. I will now try to find out why the controller behaves this way.

What I have found out is that no watchdog is enabled by the user code (is it part of the OS?). Neither os_stk_overflow nor os_idle_demon is called by the OS. I debug the firmware using ULINK2.

Any ideas where to look for the problem?

Best regards

0 Robert Suess over 14 years ago in reply to Tamiryan Michael

Thank you very much again, Per and Marc, for all your answers that move me in the right direction.

I am currently reading 'The Insider's Guide To The NXP LPC 2300/2400 Based Microcontrollers'. I have the feeling that I should have done this earlier, but I did not know that such a document existed.

I will report here if I can make the reset happen again and I will try to find the handler that leads to the reset.

Best regards

0 Robert Suess over 14 years ago in reply to Robert Suess

Good morning everyone.

Things appear a little bit clearer to me now. I am still reading the guide. If I understand all your latest posts right and review the startup code of our firmware, I realize that in our firmware all protection exceptions (Undef, PAbt, DAbt) lead to a reset of the device.

That means if any of these exceptions occurs, the firmware forces a reset, again and again if the source of the exception is an error in the source code of our firmware. Because of this, I am not able to find any error, since the firmware resets on every exception...

Am I right so far?
0 Robert Suess over 14 years ago in reply to Robert Suess
And additionally: The included RealTime-Agent is not able to work like it should, because of this line:

DAbt_Addr DCD Reset_Handler ;DAbt_Handler

Right?
0 Robert Suess over 14 years ago in reply to Robert Suess
I changed back the handlers like Marc suggested, since I know now what this part of the startup file is doing.

I left the DAbt_Handler unchanged for using the RealTime-Agent.

A question regarding the DAbt_Handler: In the following code sequence, what DAbt_Handler would be jumped to in case of a data abort exception?

                    IMPORT  SWI_Handler
                    EXTERN  DAbt_Handler

    Undef_Handler   B       Undef_Handler
    ;SWI_Handler    B       SWI_Handler
    PAbt_Handler    B       PAbt_Handler
    DAbt_Handler    B       DAbt_Handler
    IRQ_Handler     B       IRQ_Handler
    FIQ_Handler     B       FIQ_Handler

    ; Reset Handler

                    EXPORT  Reset_Handler
    Reset_Handler

Would a DAbt force a jump to the external handler or to the endless loop? I would guess the jump goes to EXTERN DAbt_Handler simply because the statement is located earlier in the code.

A second question: What exactly does EXPORT Reset_Handler mean?
0 Robert Suess over 14 years ago in reply to Robert Suess

I found out on my own that the default DAbt_Handler has to be commented out. This should always be done if an external label is imported.
0 John Linq over 14 years ago in reply to Robert Suess

www.keil.com/.../armasmref_Babcjehh.htm

IMPORT
imports the symbol unconditionally.
EXTERN
imports the symbol only if it is referred to in the current assembly.

[EXTERN in assembly] is different from [extern in C].
0 John Linq over 14 years ago in reply to John Linq
"That means if any of these exceptions occurs, the firmware forces a reset. Again and again, if the exception's source is an error in the source code of our firmware. Because of this, I am not able to find any error if the firmware resets on every exception..."

My understanding is:

Assume that, for some reason, your firmware pushes or pops some data on one of the stacks and this causes a Data Abort.
The processor then executes

LDR PC, DAbt_Addr

and since

DAbt_Addr DCD Reset_Handler

the processor runs the Reset_Handler once again and then carries on with something else.
If that "something else" does not cause another Data Abort, you will not notice that a Data Abort has happened.
However, the system is already messed up.
0 John Linq over 14 years ago in reply to John Linq

A Reset runs the Reset_Handler.
But re-running the Reset_Handler is not a reset, for the reason Per has explained.
0 Robert Suess over 14 years ago in reply to John Linq

Hello John,

thank you for the link to the Assembler Reference. I found the explanation for EXPORT there, but it is not fully clear to me why this directive is used for the Reset_Handler symbol in the startup file.

Reading my thread here again, I understand the following:

1. Any program exception (DAbt, PAbt, Undef, Reset) leads to a call of the reset handler in the firmware.

2. Calling the reset handler simply equals a jump to the start address of the firmware without setting any reset conditions in the device.

3. Because of this I have to implement a mechanism to force a real reset.

Someone tell me please if I'm right or wrong.
0 ImPer Westermark over 14 years ago in reply to Robert Suess
Your code does contain jumps to the reset address.

Most startup files do not. It's normal to either supply a real exception handler, or have just a busy-loop like:

PAbt_Handler B PAbt_Handler

For programs that have the watchdog enabled, the above busy-loop will hang the processor in the loop until the watchdog generates a real reset, which does not just jump to the reset address but first performs a full reset of the processor. Full reset here means that all registers get their default values (except the boot reason bits, which report that it was a watchdog reset) and all internal state machines get reset.

So a program should never make an intentional jump to the reset address. If the detected problem can't be solved by explicit code, then the program should let the watchdog force a reset.
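The boot reason bits mentioned above can be checked early in startup to tell a real reset from a plain jump to Reset_Handler. The following is only a sketch, assuming the bit layout of the LPC23xx RSID register as documented in the user manual (POR, EXTR, WDTR, BODR in bits 0 to 3); the enum, the function name and the priority order are hypothetical and should be checked against the manual before use.

```c
#include <stdint.h>

/* Assumed bit positions of the LPC23xx reset source register (RSID). */
#define RSID_POR   (1u << 0)  /* power-on reset */
#define RSID_EXTR  (1u << 1)  /* external reset pin */
#define RSID_WDTR  (1u << 2)  /* watchdog timeout reset */
#define RSID_BODR  (1u << 3)  /* brown-out detect reset */

/* Hypothetical result type for this sketch. */
enum reset_cause {
    RESET_NONE,       /* no bit set: probably a software jump to the reset address */
    RESET_POWER_ON,
    RESET_EXTERNAL,
    RESET_WATCHDOG,
    RESET_BROWN_OUT
};

/* Decode a value read from the reset source register.
   The order of the checks is an illustrative choice, not taken from the manual. */
enum reset_cause decode_rsid(uint32_t rsid)
{
    if (rsid & RSID_WDTR) return RESET_WATCHDOG;
    if (rsid & RSID_BODR) return RESET_BROWN_OUT;
    if (rsid & RSID_EXTR) return RESET_EXTERNAL;
    if (rsid & RSID_POR)  return RESET_POWER_ON;
    return RESET_NONE;
}
```

On the target, the value would be read once at startup, stored for later inspection, and the register bits cleared so that the next boot shows only its own cause. A result of RESET_NONE after an unexpected restart would then point to a jump to the reset address rather than a hardware reset.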
0 Robert Suess over 14 years ago in reply to ImPer Westermark
Thank you for confirming my assumptions Per. It finally sunk in!

I feel a little bit stupid for taking so long to realize what you meant.

Now that I know what the problem is and am working my way through the guide, I have found some interesting lines of code in the firmware.

While searching for enabled interrupt sources (to get a better overview of the firmware) I found an attempt to implement the watchdog in reset mode:

    __irq void watchdog(void)
    {
    }

    void Init_Watchdog(void)
    {
        // RSIR|= 0x04;
        VICVectAddr1= (unsigned long)watchdog;    // set interrupt vector in 0
        VICIntEnable= VICIntEnable | 0x00000001;
        WDTC= 0x00000FFF;
        WDCLKSEL= 0x00000001, WDCLKSEL= 0x00000001;
        WDMOD|= 0x3;
        os_dly_wait(100);
        WDFEED= 0xAA;
        os_dly_wait(100);
        WDFEED= 0xEE;
    }

Because of some errors and unnecessary lines of code (marked in red), I suppose the watchdog was never running.
Furthermore, there is no need to set up the watchdog as a vectored interrupt when it is configured in reset mode.
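For comparison with the Init_Watchdog routine quoted above, here is a minimal sketch of a reset-mode setup. It assumes the usual LPC23xx behaviour: WDEN and WDRESET are bits 0 and 1 of WDMOD, and the feed sequence is 0xAA followed directly by 0x55, with no delay between the two writes. The register block is simulated with a struct so that the code compiles on a host; the struct and function names are hypothetical, and on the target the real registers from the device header would be used.

```c
#include <stdint.h>

/* Simulated register block (hypothetical); stands in for the real
   WDMOD, WDTC, WDFEED and WDCLKSEL registers of the device. */
struct wdt_regs {
    volatile uint32_t mod;     /* WDMOD: bit 0 = WDEN, bit 1 = WDRESET */
    volatile uint32_t tc;      /* WDTC: timeout reload value */
    volatile uint32_t feed;    /* WDFEED: feed sequence register */
    volatile uint32_t clksel;  /* WDCLKSEL: clock source select */
};

#define WDMOD_WDEN     (1u << 0)
#define WDMOD_WDRESET  (1u << 1)

/* Feed sequence: 0xAA followed directly by 0x55. On real hardware the two
   writes should not be interrupted, so interrupts are usually disabled
   around them (not shown here). */
void wdt_feed(struct wdt_regs *wdt)
{
    wdt->feed = 0xAA;
    wdt->feed = 0x55;
}

/* Configure the watchdog to reset the chip on timeout. No interrupt
   vector is installed, because reset mode does not need one. */
void wdt_init_reset_mode(struct wdt_regs *wdt, uint32_t timeout_ticks)
{
    wdt->clksel = 0x1;                         /* same clock selection as in the quoted code */
    wdt->tc     = timeout_ticks;               /* timeout reload value */
    wdt->mod    = WDMOD_WDEN | WDMOD_WDRESET;  /* enable, reset on timeout */
    wdt_feed(wdt);                             /* the first valid feed starts the counter */
}
```

The main differences from the quoted routine are the second feed value and the absence of os_dly_wait() calls between the two feed writes.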

Now I will try to implement the watchdog, including the original endless loops called on program exceptions to get a real reset!

Any further suggestions?
0 @Marc Crandall over 14 years ago in reply to Robert Suess

Hi Robert,

Don't you still need to determine the cause of your exception/reset?

Before enabling any watchdog I would recommend implementing proper exception handlers (even if they are simple while(1)) and observing your RSID value on reset.

Also, you should note that implementing a watchdog in an OS task-based firmware is not as straightforward as you might expect.

You will need a way for each task (or the relevant tasks, anyway) to set an intermediate watchdog flag before you actually feed the hardware watchdog. (Otherwise it is meaningless, or you are only 'watching' a single task.)
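One possible shape for such intermediate flags is a bit mask with one bit per monitored task. This is only a sketch under assumptions: the task names and bit assignments are hypothetical, and on the target the read-modify-write of the shared variable would need protection against preemption (for example a short critical section), which is not shown.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical task bits; one bit per monitored task. */
#define TASK_DISPLAY  (1u << 0)
#define TASK_COMM     (1u << 1)
#define TASK_KEYPAD   (1u << 2)
#define ALL_TASKS     (TASK_DISPLAY | TASK_COMM | TASK_KEYPAD)

static volatile uint32_t alive_flags;

/* Called by each monitored task once per pass through its main loop. */
void task_report_alive(uint32_t task_bit)
{
    alive_flags |= task_bit;
}

/* Called periodically by the supervising task. Returns true only if every
   monitored task has reported since the previous check, then clears the
   flags for the next period. The caller feeds the hardware watchdog only
   when this returns true. */
bool all_tasks_alive(void)
{
    bool ok = (alive_flags & ALL_TASKS) == ALL_TASKS;
    alive_flags = 0;
    return ok;
}
```

If a single task stops reporting, all_tasks_alive() returns false, the hardware watchdog is no longer fed, and the device resets after the watchdog timeout.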

I highly recommend figuring out if you have any issues and implementing proper handlers before enabling a watchdog.

M
0 Robert Suess over 14 years ago in reply to @Marc Crandall

Hello Marc.

You are right, I still need to find the cause of the exceptions / reset. But at the moment I am not able to make the firmware behave that badly! ;-)
Whatever forced the exceptions / reset has apparently gone away for the time being.

And you are right once more, if you tell me to implement proper exception handlers before enabling watchdog.
That's why I wrote
'Now I will try to implement the watchdog, including the original endless loops called on program exceptions to get a real reset!'
in my latest post.

Could you explain in a little more detail why I need an intermediate watchdog flag when I use an RTOS?
I plan to reload the WD in every active task, including my idle task. I thought this would be a good plan, because in case of an exception and a jump into an endless loop, no task is able to reload the WD.
Are there mistakes in this plan, or is there something more that I should consider?

Best regards
Robert
0 @Marc Crandall over 14 years ago in reply to Robert Suess

Hi Robert,

If you 'feed' (kick, reload...) the watchdog in all tasks, then you will not know if only one of your tasks gets stuck.

If you use the watchdog as you describe, then you are only using it to reset your device when an exception occurs. Generally, I think watchdogs serve a bigger purpose than simply a reset on an exception.

However, this would work as you describe.

If you have access to the external reset I would suggest this as a better mechanism for resetting your device on an exception and I would use the watchdog to ensure all tasks are properly executing.

Regards,

Marc
0 ImPer Westermark over 14 years ago in reply to @Marc Crandall

A normal way to implement the watchdog function is to have only the lowest-prioritized task kick the watchdog.

This proves that you have enough CPU capacity that you don't starve this low-prio task.

But to verify that all the other tasks work, you normally have them kick internal counters. The low-prio task checks that these counters all get updated. If a counter has stood still for too long, then all kicking of the physical watchdog is stopped.

In some situations, you may have to create a table where the low-prio task knows the max time allowed for the different high-prio tasks to update their individual counters.

Also, tasks may no longer perform infinite waits, but should specify a timeout. Just so that they can update their counters even if a serial listener never gets any data to process from an external serial port.

The next step up is to also perform dynamic checking of the contents of important configuration and data structures, and stop kicking the watchdog if any errors are detected.
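The counter scheme with a per-task time limit could look roughly like the sketch below. All names, the number of tasks and the limits are hypothetical. The supervising function is meant to be called at a fixed rate from the lowest-priority task, which kicks the hardware watchdog only while the function returns true.

```c
#include <stdint.h>
#include <stdbool.h>

#define NUM_TASKS 3  /* hypothetical number of monitored tasks */

struct task_watch {
    volatile uint32_t counter;   /* incremented by the monitored task */
    uint32_t last_seen;          /* counter value at the last observed change */
    uint32_t stalled_checks;     /* consecutive checks without progress */
    uint32_t max_stalled_checks; /* allowed checks without progress */
};

/* Table of monitored tasks with their individual limits (example values). */
static struct task_watch watch[NUM_TASKS] = {
    { 0, 0, 0, 2 },   /* fast task: must make progress within 2 checks */
    { 0, 0, 0, 10 },  /* slower task */
    { 0, 0, 0, 50 },  /* task that mostly waits with a timeout */
};

/* Called by a monitored task each time it completes a pass of its loop. */
void task_kick(unsigned id)
{
    if (id < NUM_TASKS) {
        watch[id].counter++;
    }
}

/* Called periodically by the lowest-priority task. Returns false as soon
   as any task has exceeded its limit; the caller then stops kicking the
   hardware watchdog. */
bool supervisor_check(void)
{
    bool healthy = true;
    for (unsigned i = 0; i < NUM_TASKS; i++) {
        uint32_t now = watch[i].counter;
        if (now != watch[i].last_seen) {
            watch[i].last_seen = now;        /* progress seen: restart the count */
            watch[i].stalled_checks = 0;
        } else if (++watch[i].stalled_checks > watch[i].max_stalled_checks) {
            healthy = false;                 /* this task stood still too long */
        }
    }
    return healthy;
}
```

Because tasks must call task_kick() regularly, blocking calls inside them need a timeout, as described above for the serial listener.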
0 Robert Suess over 14 years ago in reply to ImPer Westermark

Good morning Marc, good morning Per.

At first I want to thank you again for all the great input.

I understand what you both have explained to me and I decided to implement your recommendations stepwise.

=> The first step is to implement the watchdog to force a real reset on error exceptions. This is better behaviour than simply jumping to the start address - that's what I have learned. ;-)

This step is nearly finished and it should be enough for the moment, since this is much more than I was instructed to do with the firmware. With this step I should be able to better debug the firmware in case of an error exception.

=> The second step is to implement the following mechanism:

Active tasks do not wait infinitely, and every task has its own timer.

The watchdog is kicked only by the lowest-priority task (currently my idle task) to ensure that enough CPU capacity is available.
Furthermore, this task checks whether all other tasks are running by checking their timers.
If a task gets stuck, the checking task can restart it.
If restarting the deadlocked task is not enough (depending on the task's function), the checking task could reset the whole device via external reset (I'm sure I can get access to it) after the error information has been saved to memory (task name, register states, ...).

If a program error exception is detected, the device would be restarted via external reset once the error conditions have been saved to memory by the respective handlers.
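Saving the error conditions before the reset could be done with a small record that is validated after the restart. The sketch below is an assumption about how this might look, not part of the existing firmware: the struct layout, the exception codes and the magic value are hypothetical, and placing the record in a RAM area that the startup code does not clear (or in non-volatile memory) depends on the linker setup, which is not shown.

```c
#include <stdint.h>
#include <stdbool.h>

#define CRASH_MAGIC 0xDEADC0DEu  /* hypothetical marker for a filled record */

struct crash_record {
    uint32_t magic;     /* CRASH_MAGIC when the record has been written */
    uint32_t kind;      /* hypothetical code: 1 = Undef, 2 = PAbt, 3 = DAbt */
    uint32_t fault_pc;  /* address saved by the exception handler */
    uint32_t checksum;  /* simple integrity check over the fields above */
};

static uint32_t crash_checksum(const struct crash_record *r)
{
    return r->magic ^ r->kind ^ r->fault_pc ^ 0xFFFFFFFFu;
}

/* Called from an exception handler before the reset is triggered. */
void crash_record_store(struct crash_record *r, uint32_t kind, uint32_t fault_pc)
{
    r->magic    = CRASH_MAGIC;
    r->kind     = kind;
    r->fault_pc = fault_pc;
    r->checksum = crash_checksum(r);
}

/* Called after startup: true if a consistent record from before the
   reset is present. */
bool crash_record_valid(const struct crash_record *r)
{
    return r->magic == CRASH_MAGIC && r->checksum == crash_checksum(r);
}

/* Called once the record has been reported, so it is not reported twice. */
void crash_record_clear(struct crash_record *r)
{
    r->magic = 0;
    r->checksum = 0;
}
```

The magic value and checksum guard against reading random RAM contents as a crash record after a power-on reset.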

Is the second step consistent with what you both have suggested?

I have another question regarding the user defined stack size for every task.

The RTOS is initialized by os_sys_init(task1).
I want task1 to have its own user-defined stack size using os_sys_init_user(task1, ...), but I do not want to reserve memory for its stack permanently.

How can I provide memory dynamically for task1, which finishes with os_tsk_delete_self()?

Looking forward to your answers.

Best regards
Robert