
Untraceable reset of NXP LPC2368 using RTX when modifying variables

Hello everyone,

Today I'm asking for hints on a tricky problem. We have firmware that uses the RTX kernel running on an NXP LPC2368. The device the firmware is written for is now getting a new LC display.
My task is to adapt the firmware so that it uses the new display.

I have spent several weeks on this so far this year, and from time to time I have had the problem that the controller resets shortly after start-up, again and again...

Every time this behaviour occurred, I had deleted one or more obsolete variables (mostly global) or functions. In most cases I solved the problem by searching for other obsolete variables and deleting them from the source code, by trial and error. That is really time-consuming.

While testing the firmware on Wednesday, I tried to make the adapted and modified routine for writing data to the display RAM a little faster. I moved a global unsigned int into the function and changed it to a static unsigned char, because the largest value it has to hold is 0x0D.

After flashing the firmware into the controller, the controller hung after a short, random amount of time.

Yesterday I tried to solve the problem of the firmware hanging at random times and found that it happens when no task is running: the OS calls os_idle_demon() and was not able to return from it. I found a workaround on the web: create an empty low-priority task that does not use any os_wait functions, which prevents the OS from calling the idle task. (It has something to do with incorrect interrupt states when returning from the idle task.)

Today I tried again to make the display writing function faster and changed two unsigned char variables inside the function from static to non-static. After flashing this firmware, the controller resets again and again. I will now try to find out why the controller behaves this way.
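The two storage classes involved can be contrasted in a small, host-runnable C sketch. The function names are hypothetical stand-ins for the author's display routine. A static local lives in the RW/ZI data region and keeps its value between calls, whereas a non-static (automatic) local is created on every call in the stack frame of whichever task calls the function, or kept in a register. So the change presumably moves those bytes onto that task's stack.

```c
/* Hypothetical helpers (not the author's display routine) that only contrast
   the two storage classes discussed above. */

/* 'count' has static storage duration: it is placed in the RW/ZI data
   region, keeps its value between calls and uses no task stack. */
unsigned char next_static(void) {
    static unsigned char count = 0;
    return ++count;
}

/* 'count' has automatic storage duration: it lives in the calling task's
   stack frame (or in a register) and starts from 0 on every call. */
unsigned char next_auto(void) {
    unsigned char count = 0;
    return ++count;
}
```

Calling next_static() twice returns 1 and then 2. next_auto() returns 1 every time, because its variable is recreated on each call.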

What I have found out so far: no watchdog is enabled by the user code (is one part of the OS?). Neither os_stk_overflow() nor os_idle_demon() is called by the OS. I debug the firmware using a ULINK2.

Any ideas where to look for the problem?

Best regards

Robert Suess over 14 years ago in reply to Robert Suess

Oops, I made a small mistake: the stack size (OS_STKSIZE) in our firmware is defined as 274. That means the stack of each task is at most 1096 bytes (274 * 4 bytes). So I'm back at the beginning, because a stack overflow seems impossible. To make sure that there is no overflow, I should implement the watermark method tomorrow, am I right?
How do I find out which memory area is used for the 12 * 1096 bytes of task stacks?
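The arithmetic in this post can be written out as a small C sketch. Like the post, it assumes that OS_STKSIZE counts 32-bit words. TASK_COUNT and the function names are hypothetical. The total covers only the RTX task stacks, not the stacks reserved in the startup file.

```c
#include <stddef.h>

/* Value quoted in the post; assumed to be a count of 32-bit words,
   which is why it is multiplied by 4. */
#define OS_STKSIZE      274u
#define STACK_WORD_SIZE 4u
/* Hypothetical: number of tasks using the default stack size. */
#define TASK_COUNT      12u

/* Bytes available to one task with the default stack size. */
size_t per_task_stack_bytes(void) {
    return (size_t)OS_STKSIZE * STACK_WORD_SIZE;   /* 274 * 4 = 1096 */
}

/* Total RAM reserved for all default-sized task stacks. */
size_t total_task_stack_bytes(void) {
    return per_task_stack_bytes() * TASK_COUNT;    /* 1096 * 12 = 13152 */
}
```

With these numbers, each task gets 1096 bytes and the task stacks take 13152 bytes of RAM in total.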

Best regards

John Linq over 14 years ago in reply to Robert Suess

Maybe take a look at the link below first:

http://www.keil.com/forum/16324/

There are different types of stacks. See what Franc Urbanc said.

Robert Suess over 14 years ago in reply to John Linq
OK, thank you very much, John.

I read the thread and the linked threads and took Franc's explanations into account:

Franc Urbanc wrote: 4. the kernel main stack (defined in startup file) is not checked in stack checking.

If I understand it correctly, this refers to an overflow of the kernel main stack while main() executes at start-up, and the RTOS would not detect it.
That does not surprise me, because the RTOS is initialised at the end of main() and os_stk_overflow() is a function of the RTOS.

But I can't imagine a stack overflow occurring during main() in my case. Here is a more precise description of the reset at start-up (my problem case):

The last call in main() is os_sys_init(task1). It starts task1, which creates all the other tasks we need (including a low-priority idle task that prevents os_idle_demon() from being called).
One of the tasks started by task1 (let's call it 'displaytask') initializes a mutex for a display RAM writing function and shows a welcome screen.
After writing the welcome screen to the display RAM using the mutex-protected function, displaytask waits 3 seconds via os_dly_wait().
This is when the controller resets: the next statement of displaytask is never reached.
When I debug the firmware, I find my own idle task incrementing a static int up to 999999 and then setting it back to 0 in an endless loop while displaytask is waiting.
While the idle task is incrementing the static int, the controller resets.
This is the idle task (with a stack usage of 0 bytes, I guess):

__task void task3(void) {
   static int iVar;
   for(iVar= 0; iVar < 1000000; ++iVar){
      if(iVar == 999999)
         iVar= 0;
   }
}
//------------------------------------------------------------------------------

The described behaviour occurs, for example, when I change two static unsigned char variables (RW data) to non-static unsigned char variables (on a task's stack) inside a function I wrote. I know it smells like a stack overflow, but I can't imagine why on earth any stack should overflow in the described situation.

Any more ideas what to check to find the cause of the reset?

Besides reading the forum and checking the firmware, I realized that debug information was being cached on my local PC and that the RT Agent was not implemented. I fixed both so that I can really find errors. Now I'm armed to eliminate the bug!

Two other ideas came up while I was investigating the bug:

1. Could a misaligned stack pointer be the reason?

2. Could the MAM timing (set to 4 fetch cycles) be the reason?
I changed it to 5 cycles, but the controller still reset. Could it still have something to do with my problem?

Best regards and many thanks for any hints

EDIT: OK, NOW I AM GOING NUTS: THE FIRMWARE IS NO LONGER RESETTING WITH THE CHANGED VARIABLES... I'LL KEEP TESTING IT. COULD IMPLEMENTING THE RTA AND DISABLING THE DEBUG CACHE HAVE SOLVED THE PROBLEM???

Robert Suess over 14 years ago in reply to Robert Suess

Regarding my problem: where can I find the information that is produced by adding '--info=summarystack' to the linker control string?

THX

John Linq over 14 years ago in reply to Robert Suess

Did you enable this feature of RTX?

http://www.keil.com/support/man/docs/rlarm/rlarm_ar_cfgstchk.htm
http://www.keil.com/support/man/docs/rlarm/rlarm_ar_cfgerrfunc.htm

Just in case you did not use this feature,
1. Enable Stack Checking of RTX.
2. Set a breakpoint at the beginning of [void os_error (U32 err_code)].
3. Run your program and wait to see what happens.

John Linq over 14 years ago in reply to John Linq
An endless loop normally looks like this:

while(1) { }

for(;;) { }

Robert Suess over 14 years ago in reply to John Linq

Hello John.

Thank you again for your answer.

Yes, OS_STKCHECK has been enabled all along. We use an older version of RTX, so we have os_stk_overflow() instead of os_error() to detect stack overflows. But this error function is never called when the resets occur.

Regarding the endless loop: yes, I know what an endless loop looks like, but I wanted my idle task to have some work to do ;)

Since the controller currently does not reset with the changed variables, I will now try to provoke the reset again.

I would like to remind all readers of this thread that many of my questions here are still unanswered. To summarize them:

1. Which static code analyzer would you suggest for debugging/analyzing an RTX project? (I can use the analysis option (--callgraph) provided by the toolchain in µVision4.)

2. Do you think I should use the watermark method for stack checking if I can provoke the reset again? (Why should I, since os_stk_overflow() was never called in the past?)

3. If the answer to question 2 is 'yes', how can I locate the 1096-byte task stacks and fill them with 0xDEADFADE? (I know how to write values to a memory area, but I don't know where exactly in RAM the RTOS places the stacks.)

4. Could a misaligned stack pointer cause the resets?

5. Could the MAM timing setting (4 fetch cycles) be another cause of errors?

6. Does anyone have an idea why implementing the RT Agent has led to a working version of the firmware? (I am thinking of Per's hint that changing one little thing can rearrange the layout of the whole firmware.)

Best regards, and thank you for any answers to my questions

ImPer Westermark over 14 years ago in reply to Robert Suess

Note that the watermark method indicates how much stack you use. The OS code just tells you if you get an overflow.

But if you rely only on the OS check, then the question is: how do you allocate optimal stack sizes for all your tasks without either being very close to the limit (so that a single extra auto variable, potentially from a change in code optimization, takes you over the limit) or wasting stack space that you could have used for larger communication buffers?

You always want to quantify your stack need, so you can produce a document saying how much safety margin you have added and why you think that should be enough.
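The watermark method described above can be sketched in plain C. This assumes a stack made of 32-bit words that grows downward, as ARM stacks do, so the never-touched region sits at the low end of the array. The function names are hypothetical. 0xDEADFADE is the pattern used in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill pattern; any value unlikely to appear in real stack data will do. */
#define STACK_FILL 0xDEADFADEu

/* Fill the whole stack area with the pattern before the task starts. */
void stack_fill(uint32_t *stack, size_t words) {
    for (size_t i = 0; i < words; ++i)
        stack[i] = STACK_FILL;
}

/* The stack grows downward, so count pattern words from index 0 up to the
   first overwritten word: that is the headroom that was never used. */
size_t stack_unused_bytes(const uint32_t *stack, size_t words) {
    size_t i = 0;
    while (i < words && stack[i] == STACK_FILL)
        ++i;
    return i * sizeof(uint32_t);
}
```

After running the task under heavy load, the used stack is the total size minus stack_unused_bytes(). If a stored value happens to equal the pattern, the result can be slightly optimistic, so a safety margin is still needed.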

Robert Suess over 14 years ago in reply to ImPer Westermark

OK, I see. But how can I fill the stack of every task with a pattern? That's the point where I'm stuck.

My idea is to declare a char at the very beginning of a task and then fill 1096 bytes with my pattern, starting at the char's address.
The char is an auto variable and should be placed on the stack. In debug mode I can check the char's address with a breakpoint when the task starts.
Then I let the firmware run and try to put a heavy load on the task.
At the end I check how much of the pattern still remains in the 1096 bytes following the char's address.

Is that it?

ImPer Westermark over 14 years ago in reply to Robert Suess

I use individually sized stacks for every task.

So I have a number of global arrays that I pass as the stack parameters when I create the tasks. It's quite easy to fill these arrays before the tasks are created, as I already know their addresses and sizes. And if I verify that the linker doesn't split them into two memory regions (for processors that have multiple RAM regions), I can use a single loop to fill all the stack memory.

If you configure the OS to supply the stacks, then you should still have access to a symbol for the memory area the OS will make use of, so you don't need to find the individual start address of each task stack.
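One way to get the single fill loop described above is to put all the user-defined stacks into one struct. A single object is always contiguous, so the linker cannot split it across RAM regions. The struct, its member names and the sizes are hypothetical, with sizes borrowed from the example later in this thread. The fill goes through an unsigned char pointer and memcpy, which keeps the type-punning well defined.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t U64;

/* Hypothetical task stacks, sized in bytes and declared as U64 arrays. */
struct task_stacks {
    U64 render[1280 / 8];
    U64 display[1024 / 8];
    U64 idle[72 / 8];
};

/* One object = one contiguous block of RAM. */
struct task_stacks stacks;

/* Fill every task stack with one loop before any task is created. */
void fill_all_stacks(uint32_t pattern) {
    unsigned char *p = (unsigned char *)&stacks;
    size_t off;
    for (off = 0; off + sizeof pattern <= sizeof stacks; off += sizeof pattern)
        memcpy(p + off, &pattern, sizeof pattern);
}
```

With RTX, each member would then be passed as the stack argument when creating the task, in the same way as the os_tsk_create_user() call in this thread. The linker map file can also be used to confirm where the object ended up.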

Robert Suess over 14 years ago in reply to ImPer Westermark

I see. I'll try to write an example with a user-defined stack for my idle task 'task3', which needs no stack space for variables. So all I have to do is reserve a stack area of at least 68 bytes and fill it with a pattern:


#include <RTL.h>
#include <string.h>

__task void task1(void);
__task void task3(void);

static U64 Idle_Stk[88/8];
OS_TID id3;

int main(void){
   unsigned char pattern[8]= {0xDE, 0xAD, 0xFA, 0xDE, 0xDE, 0xAD, 0xFA, 0xDE};
   int i;
   for(i= 0; i < sizeof(Idle_Stk); ++i)
      memcpy(&Idle_Stk[i], pattern, 8);
   // ...
   os_sys_init(task1);
}
//-------------------------------------------

__task void task1(void){
   //...
   id3= os_tsk_create_user(task3, 1, &Idle_Stk, sizeof(Idle_Stk));
   //...
}
//-------------------------------------------

Is that code right?

How can I verify that the stack is not split by the linker?

Best regards and thank you very much so far!

Robert Suess over 14 years ago in reply to Robert Suess
I made a small mistake: sizeof(Idle_Stk) returns 88, but the for-loop needs 11 iterations. So the for-loop should look like this:

for(i= 0; i < sizeof(Idle_Stk) / sizeof(Idle_Stk[0]); ++i)

ImPer Westermark over 14 years ago in reply to Robert Suess
Not sure where you got the value 68 from. But it is quite an "odd" value; note the alignment requirements for the stack. You would normally size each stack as a multiple of your alignment requirement.

So it's quite common to have something like:

U64 render_stack[1280/8];
U64 display_stack[1024/8];
...
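A small helper macro can turn a byte estimate into a U64 element count. Because every element is 8 bytes, the resulting stack size is always a multiple of 8. Under ARM's procedure call standard a 64-bit integer is also 8-byte aligned, which is presumably why U64 is used for stacks here. The macro name is hypothetical.

```c
#include <stdint.h>

typedef uint64_t U64;

/* Number of U64 elements needed for at least 'bytes' bytes of stack,
   i.e. 'bytes' rounded up to the next multiple of 8, divided by 8. */
#define STACK_U64_COUNT(bytes) (((bytes) + 7u) / 8u)

U64 idle_stack[STACK_U64_COUNT(68)];      /* 68 -> 9 elements = 72 bytes */
U64 display_stack[STACK_U64_COUNT(1024)]; /* already a multiple of 8 */
```

The 68-byte estimate from this thread therefore becomes a 72-byte stack.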

Robert Suess over 14 years ago in reply to ImPer Westermark

I successfully tested my first task creation with a user-defined stack (including initializing the stack with a pattern). I could see the pattern in the debugger and how much of it gets overwritten. I'm very proud!

The 68 bytes come from here: http://www.keil.com/support/man/docs/rlarm/rlarm_ar_cfgstack.htm , where it says:
On the full context task switch, the RTX kernel stores all ARM registers on the stack. Full task context storing requires 64 bytes of stack.

Additionally, I remember reading recently that in some cases 4 more bytes are needed for a successful task switch, but I can't find it right now.

That's why I "guessed" that I need at least 68 bytes for the stack of my idle task.

I verified the stack usage of my idle task with the debugger and found 4 bytes used at the very beginning of the stack and 64 bytes used at the end, so I believe 68 bytes is fine.

Thank you very much again; I now have a wide set of tools for finding errors in the future!

If I run into a situation again where the controller resets during start-up, I will investigate the reasons more deeply and report back in this thread.

So let's move on to estimating how much stack space a task needs. Say I have another simple task. In the file generated by the --callgraph linker option I find a Max Depth of 128 bytes, and the task itself needs 0 bytes of extra stack. So I would simply estimate that the task needs a 196-byte stack (68 bytes of basic stack for the task switch plus 128 bytes for the longest call chain).
Could that be right?
Another question regarding this task: the task has a local unsigned short. Why does this variable need no stack space?
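The estimate above, plus rounding to the 8-byte alignment discussed earlier in the thread, can be written as a small function. The 68-byte overhead is the thread's own estimate (64 bytes of saved context plus 4 bytes), not a documented RTX constant. The margin parameter reflects Per's point about stating an explicit safety margin. The names are hypothetical.

```c
#include <stddef.h>

/* Per-task overhead used in this thread: 64 bytes for the full context
   saved on a task switch plus 4 extra bytes, 68 bytes in total. */
#define CONTEXT_BYTES 68u

/* Round n up to the next multiple of 8 bytes. */
static size_t round_up8(size_t n) {
    return (n + 7u) & ~(size_t)7u;
}

/* Estimate = "Max Depth" reported by --callgraph + context overhead
   + an explicit safety margin, rounded up to a multiple of 8. */
size_t estimate_stack_bytes(size_t max_depth, size_t margin) {
    return round_up8(max_depth + CONTEXT_BYTES + margin);
}
```

For a Max Depth of 128 bytes and no margin, this gives 196 bytes rounded up to 200. The result is only as good as the Max Depth figure the linker reports.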

Best regards

ImPer Westermark over 14 years ago in reply to Robert Suess

Note that the compiler can decide to use a register instead of allocating a variable on the stack - then the stack space for that variable will be included in the stack space used for a state save during a task switch.

The four bytes you saw at one end of the stack was probably the OS overwrite marker, that it uses to detect a stack overflow.
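The overwrite-marker principle mentioned above can be illustrated with a sketch. The value and exact position of the marker RTX uses are not given in this thread, so the constant below is hypothetical. The sketch assumes a downward-growing stack, where the lowest word is the last one a task reaches. This kind of check only reports *that* the end of the stack was hit, not how much stack is normally used, which is the difference from the watermark method. An overflow that skips past the marker without writing it also goes unnoticed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical marker value; not the value RTX actually uses. */
#define STACK_MARKER 0xCAFEF00Du

/* Put the marker into the lowest word of the stack area. */
void stack_marker_set(uint32_t *stack) {
    stack[0] = STACK_MARKER;
}

/* false means something has written into the lowest word, i.e. the
   stack has been used up completely or overflowed. */
bool stack_marker_intact(const uint32_t *stack) {
    return stack[0] == STACK_MARKER;
}
```

An RTOS would presumably run such a check when it switches tasks and call its overflow handler when the marker is gone.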

Robert Suess over 14 years ago in reply to ImPer Westermark
OK, I see. Your explanations make sense to me, thank you, Per.

To continue with user-defined stack space for most of the tasks in our firmware, I checked the 'Max Depth' value output by --callgraph. Then I added 68 bytes to estimate the maximum stack space needed and rounded the value up to a multiple of eight. That works fine so far.

But now there is a task in the callgraph output file that looks like this:

task4 (ARM, 848 bytes, Stack size 0 bytes, ma96.o(.text), UNUSED)

If I create the task with a user-defined stack of 68 (rounded up to 72) bytes, os_stk_overflow() is called right after the task starts.
I am puzzled by the word 'UNUSED' in the callgraph output. The task runs often, and there are several functions that it calls at runtime.

Why can callgraph not calculate any call chain?

Why is the task marked as 'UNUSED' in callgraph output file?

Should I manually estimate the worst-case call chain for the task?

Best regards