I have an application that is running on an ARM9 and compiled with V3.60. It uses RL. I am having a strange problem. We are really close to the maximum amount of memory on our processor, within 2K. During development, I declared an unsigned char array of size 1056 with global scope in one of the libraries that handles data packets that are received.
#include <91x_conf.h>
#include <91x_lib.H>
#include <RTL.h>
#include <string.h>
U8 my_buffer[1056];
I changed my implementation later and no longer needed that array.
Now, here is the strange part.
Task1 cannot handle large packets without that array declaration being present. If I comment the array declaration out or cut the size in half, the first large packet (~1076 bytes) that comes in crashes the entire application, though it works as long as the packets received over Task1's interface are small. Another weird thing is that Task2 is unable to properly receive data if I leave the array declaration in my code, yet it works if I leave the declaration out.
Task1 and Task2 both claim between 2K and 3K of allocated blocks of memory on the heap when they start, and those blocks are not reallocated or deallocated until the board is rebooted.
Any ideas what is going on here?
Well, I think I've figured it out. I'll explain so that maybe this helps someone.
The next line after the Mem_Write call was: os_dly_wait(30);
I commented that out too. If I uncommented only the os_dly_wait line, the crashes occurred again, whereas uncommenting Mem_Write and leaving os_dly_wait commented out did not bring them back. What I think was happening is that the os_dly_wait call, combined with a debug message that I was sending on the same interface and the response back to the sender, was stacking up too much, causing a stack overflow. The debug message and the response to the sender both use static arrays, but they have to acquire an OS_MUT with an infinite timeout to be sent. Removing the delay allowed the debug message and response to go out immediately, thus preventing the overflow.
So it was a combination of stack overflow and task switching that caused a crash, followed by a watchdog reset.
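For anyone who wants to confirm a suspected stack overflow rather than infer it from timing changes, a common technique is to "paint" each task stack with a known pattern and later check how much of the pattern survives. The following is only a minimal sketch in portable C: the names stack_paint, stack_unused_words and STACK_FILL are hypothetical and not part of any RTOS API, and it assumes a stack that grows toward lower addresses (the usual ARM convention), so untouched words are counted from the low end of the array.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fill pattern; any value unlikely to occur on the stack works. */
#define STACK_FILL 0xDEADBEEFu

/* Fill the whole stack area with the pattern before the task is started. */
static void stack_paint(uint32_t *stack, size_t words)
{
    size_t i;

    for (i = 0; i < words; i++) {
        stack[i] = STACK_FILL;
    }
}

/*
 * Count the words that still hold the pattern, starting at the lowest
 * address. Assumes a descending stack: the task uses the high addresses
 * first, so the low end stays painted until the stack is nearly full.
 * A result of 0 means the stack was completely used (or overflowed).
 */
static size_t stack_unused_words(const uint32_t *stack, size_t words)
{
    size_t n = 0;

    while (n < words && stack[n] == STACK_FILL) {
        n++;
    }
    return n;
}
```

Calling stack_unused_words periodically (for example from a low-priority task) and logging the smallest value seen gives a rough high-water mark for each task, which shows how much headroom is really left.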
Thanks all.
It's hard to tell without seeing the code, but it sounds as if the root cause is still there! Are you comfortable with a buffer overrun solved by a different timing...?
Not exactly, no. I plan to discuss ways to correct the root cause with our Manager of Software Development to make sure it is fully resolved.
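One way to check whether a buffer overrun is still lurking is to place guard words on either side of a receive buffer and to refuse any copy that does not fit. This is a rough sketch under stated assumptions, not the code from this project: guarded_buf_t, guarded_init, guarded_intact, guarded_store, RX_BUF_SIZE and GUARD_WORD are all hypothetical names, and the 1056-byte size is simply borrowed from the array mentioned in the original post.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RX_BUF_SIZE 1056u        /* size borrowed from the post; an assumption */
#define GUARD_WORD  0xA5A5A5A5u  /* hypothetical guard pattern */

/* Receive buffer bracketed by guard words that should never change. */
typedef struct {
    uint32_t guard_lo;
    uint8_t  data[RX_BUF_SIZE];
    uint32_t guard_hi;
} guarded_buf_t;

/* Set both guards and clear the data area. */
static void guarded_init(guarded_buf_t *b)
{
    b->guard_lo = GUARD_WORD;
    memset(b->data, 0, sizeof b->data);
    b->guard_hi = GUARD_WORD;
}

/* Returns 1 if both guards are untouched, 0 if something wrote past the data. */
static int guarded_intact(const guarded_buf_t *b)
{
    return (b->guard_lo == GUARD_WORD) && (b->guard_hi == GUARD_WORD);
}

/* Bounded copy: returns 0 on success, -1 if the packet would not fit. */
static int guarded_store(guarded_buf_t *b, const uint8_t *src, size_t len)
{
    if (len > sizeof b->data) {
        return -1;  /* reject oversized packets instead of overrunning */
    }
    memcpy(b->data, src, len);
    return 0;
}
```

With this in place, a packet of about 1076 bytes offered to a 1056-byte buffer is rejected by guarded_store instead of silently overwriting whatever the linker placed next to it, and a failed guarded_intact check points at a writer that bypasses the bounded copy.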