Hello everyone,
today I'm asking for hints on a tricky problem. We have firmware that uses the RTX kernel running on an NXP LPC2368. The device the firmware is written for is getting a new LC display, and my task is to change the firmware to use the new display.
I've spent some weeks on this during the year, and from time to time I've had the problem that the controller resets shortly after start, again and again...
Every time this behaviour occurred I deleted one or more obsolete variables (mostly global) or functions. In most cases I solved the problem by searching for other obsolete variables and deleting them from the source code: trial and error. That is really time-consuming.
While testing the firmware on Wednesday, I tried to make the adapted and modified routine for writing data to display RAM a little faster. I moved a global unsigned int into the function and changed it to static unsigned char, because the value it has to carry is 0x0D at maximum.
After flashing the firmware, the controller hung after a short, random time.
Yesterday I tried to solve the random hangs and found that the problem occurs when no task is running: the OS calls os_idle_demon() and is not able to return from it. I found a workaround on the web: creating an empty low-priority task that never calls any os_wait function, which prevents the OS from calling the idle task. (It has something to do with incorrect interrupt states on returning from the idle task.)
Today I tried further to make the display writing function faster and changed two unsigned char variables inside the function from static to non-static. After flashing this firmware the controller resets again and again. I will now try to find out why the controller behaves this way.
What I have found out so far is that no watchdog is enabled by the user code (is it part of the OS?). os_stk_overflow and os_idle_demon are not called by the OS. I debug the firmware using a ULINK2.
Any ideas where to look for the problem?
Best regards
It would be up to you to enable any watchdog.
The RTOS can't do it, because the RTOS would not know when to kick the watchdog. A program that makes use of a watchdog should make a lot of attempts to verify that the program is really, really behaving well before deciding to kick. The RTOS can only figure out if it is working ok - not if running threads are doing what they are expected to, or if sleeping threads are really expected to be sleeping.
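To illustrate the point about only kicking the watchdog once the program has proven it is behaving, here is a minimal sketch in C. The task IDs, the function names and the feed counter are all hypothetical and not taken from the firmware discussed here. The hardware feed is replaced by a stand-in so the logic can be tried on a PC; on an LPC23xx the real feed would, as far as I know, be the WDFEED = 0xAA; WDFEED = 0x55; sequence with interrupts disabled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical task IDs: one bit per task that must prove it is healthy. */
enum { TASK_DISPLAY = 0, TASK_COMM = 1, TASK_COUNT = 2 };

#define ALL_TASKS_ALIVE ((1u << TASK_COUNT) - 1u)

static volatile uint32_t alive_mask; /* bits set by tasks since the last feed */
static unsigned feed_count;          /* counts feeds, for demonstration only  */

/* Stand-in for the hardware feed (assumption: replace with the real
   watchdog feed sequence of the target). */
static void watchdog_feed(void)
{
    feed_count++;
}

/* Called by each task after it has checked that its own work is sane. */
void task_report_alive(unsigned task_id)
{
    alive_mask |= (1u << task_id);
}

/* Called periodically by a supervisor task. Feeds the watchdog only if
   every task has reported since the last feed; returns true if it fed. */
bool watchdog_supervise(void)
{
    if (alive_mask != ALL_TASKS_ALIVE) {
        return false; /* at least one task is silent: let the watchdog bite */
    }
    alive_mask = 0;   /* every task has to report again before the next feed */
    watchdog_feed();
    return true;
}
```

In a real multitasking system the update of alive_mask would need to be protected (for example by briefly locking the scheduler), because a read-modify-write on a shared variable is not atomic.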
It sounds like you have uninitialized variables, a stack overflow or a buffer overflow (memory overwrites) somewhere in the program. Adding or removing global variables, or changing the contents of the stack, changes the behaviour you see because your code changes also move the location of lots of variables and change the total amount of stack space needed.
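One way to narrow down a suspected memory overwrite is to wrap a suspicious buffer in guard words and check them regularly. This is only a sketch: the structure, the buffer size and the guard value are made up for illustration and are not from the firmware in question.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define GUARD_WORD 0xA5A5A5A5u

/* Hypothetical buffer wrapped in guard words. */
typedef struct {
    uint32_t head_guard;  /* damaged by writes before the buffer start */
    uint8_t  data[16];
    uint32_t tail_guard;  /* damaged by writes past the buffer end     */
} guarded_buf_t;

/* Set both guards to the known value and clear the payload. */
void guarded_init(guarded_buf_t *b)
{
    b->head_guard = GUARD_WORD;
    memset(b->data, 0, sizeof b->data);
    b->tail_guard = GUARD_WORD;
}

/* Returns true while neither guard word has been overwritten. */
bool guarded_intact(const guarded_buf_t *b)
{
    return b->head_guard == GUARD_WORD && b->tail_guard == GUARD_WORD;
}
```

Calling guarded_intact() at a few well-chosen places (or watching the guard words with a debugger access breakpoint) shows between which points in the program the overwrite happens.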
Have you started by making sure you compile your code at maximum warning level?
Have you tried to fill the stacks with a pattern and check how much of each stack is actually used?
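The fill-pattern check can be sketched as follows. The function names and the pattern value are hypothetical. The sketch assumes that you have access to the stack memory as an array of 32-bit words (for example a user-provided task stack, or the IRQ mode stack from the startup file) and that the stack grows downward, as it normally does on ARM.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xDEADBEEFu /* arbitrary pattern, unlikely as real data */

/* Fill the whole stack area with the pattern before the task is started. */
void stack_paint(uint32_t *stack, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        stack[i] = STACK_FILL;
    }
}

/* The stack grows downward, so the words at the lowest addresses are
   reached last. Count how many words from the low end still hold the
   pattern: that is the stack space that has never been used. */
size_t stack_unused_words(const uint32_t *stack, size_t words)
{
    size_t n = 0;
    while (n < words && stack[n] == STACK_FILL) {
        n++;
    }
    return n;
}
```

If stack_unused_words() returns 0 or a very small number after the firmware has run for a while, that stack is too small (or has already overflowed). A very large number means the stack is bigger than needed.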
If Per is right, you probably want to scan your software with a static code analyzer.
The os_stk_overflow and os_idle_demon are not called from OS
Maybe the overflow occurs during the execution of an interrupt (IRQ mode has a separate stack)? Note that RTX cannot warn you about that.
Good morning Per, good morning Tamir,
first I want to thank you for your fast responses. I wrote my opening post just before I started my weekend. That's why I am answering so late.
I will now check whether the compiler is set to the maximum warning level, thanks for this hint.
After that I will try the water-level method to watch the stack usage, as Per further suggested. If the stack of one or more tasks is at its upper limit, I must raise the stack size. With the water-level method I can additionally see whether some tasks use only a small part of their reserved stack; in that case user-defined stack sizes might be wise.
Finally, a question about static code analyzers: is there a tool that you can suggest?