
App crashes without unused array declaration

I have an application running on an ARM9, compiled with V3.60. It uses RL. I am having a strange problem. We are within about 2 KB of the maximum amount of memory on our processor. During development, I declared a global unsigned char array of size 1056 in one of the libraries that handles received data packets.

#include <91x_conf.h>
#include <91x_lib.H>
#include <RTL.h>
#include <string.h>

U8 my_buffer[1056];

I changed my implementation later and no longer needed that array.

Now, here is the strange part.

Task1 cannot handle large packets unless that array declaration is present. If I comment the declaration out, or cut the size in half, the first large packet (~1076 bytes) that comes in crashes the entire application; as long as the packets received over Task1's interface are small, everything works. Stranger still, Task2 is unable to receive data properly if I leave the array declaration in, yet it works if I take the declaration out.

Task1 and Task2 each allocate 2-3 KB of blocks on the heap when they start, and those blocks are neither reallocated nor freed until the board is rebooted.

Any ideas what is going on here?

  • Time to check your map file. One possibility is a buffer overflow somewhere that currently lands in part of this memory. Try filling the memory with a known value and later checking whether it has been touched.

    Another alternative is a buffer overflow, or an absolute address in use somewhere, that overwrites something completely different; the existence of this unused buffer then affects which variable ends up located where the overrun happens.

    A third alternative is a stack overflow, where your buffer happens to be located so that its existence or removal changes what the overflowing stack overwrites.

    Buffer overruns can be devilishly hard to catch, which is a reason to code defensively from day one. If you play with macros, you may even have to run the source through the preprocessor just to spot a line that changes a pointer or index, or in some other way produces a memory access where none is allowed.
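To illustrate the "code defensively" advice: a receive routine should never trust a length before copying. A minimal sketch (the buffer size and function names here are made up for illustration, not from the poster's code):

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define RX_BUF_SIZE 1056

static uint8_t rx_buf[RX_BUF_SIZE];

/* Copy an incoming packet into rx_buf, rejecting anything that would
 * overrun the buffer instead of silently trusting the length. */
int store_packet(const uint8_t *data, size_t len)
{
    if (data == NULL || len > RX_BUF_SIZE) {
        return -1;               /* oversize or invalid: drop the packet */
    }
    memcpy(rx_buf, data, len);
    return 0;
}
```

The check costs a couple of instructions per packet and turns a silent memory corruption into an explicit, traceable error code.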

  • Task1 cannot handle large packets without that array declaration being present.

    That almost certainly means that your "Task1" contains a massive buffer overflow that currently just happens to spill into that supposedly "unused" buffer. Take the buffer away, and you overflow into some other variables --- ka-BOOM.

    To test whether this is your root cause, fill the buffer with a recognizable pattern that cannot occur in your data (customary choices include repetitions of 0xDEADBEEF), execute a test sequence known to trigger the crash when the buffer is absent, then inspect the buffer. Is your tell-tale pattern still intact? Or does it, by any chance, hold content from that big data packet (or something derived from it)?
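The canary test described above can be sketched like this (names and the word count are illustrative; 264 words of 0xDEADBEEF cover the 1056-byte buffer):

```c
#include <stdint.h>
#include <stddef.h>

#define CANARY      0xDEADBEEFu
#define GUARD_WORDS 264          /* 1056 bytes / 4 bytes per word */

static uint32_t guard[GUARD_WORDS];

/* Fill the suspect buffer with a recognizable pattern at startup. */
void guard_init(void)
{
    for (size_t i = 0; i < GUARD_WORDS; ++i)
        guard[i] = CANARY;
}

/* Return how many words no longer hold the pattern; a nonzero count
 * after the test sequence proves something wrote into this memory. */
size_t guard_check(void)
{
    size_t corrupted = 0;
    for (size_t i = 0; i < GUARD_WORDS; ++i)
        if (guard[i] != CANARY)
            ++corrupted;
    return corrupted;
}
```

Call guard_init() at boot, run the traffic that would crash without the array, then call guard_check() (or just inspect the memory in the debugger) to see whether, and how much of, the buffer was overwritten.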

  • ... as the others have already said.

    Another possible way to catch such a problem is to set a data-write breakpoint in the supposedly "unused" array...

  • I believe you all are right. Thank you for your quick responses. I am leaning toward the stack overflow idea.

    One branch of my code, triggered by a certain Task1 operation, handles those large packets and also saves them to a storage medium. When I commented out just the part that saves them, the data transfer itself caused no crash. The call that saves the data resembles this:

    Mem_Write( &rx_pkt[ 9 ], storage_address, data_size );

    Here rx_pkt holds the packet data, data_size is the payload length extracted from the packet, and the algorithm saves the data sequentially along the storage medium. data_size is always 11 less than the packet size, since I control the format on both ends. It SHOULD be consistent, as a packet that fails its checksum on receipt is simply ignored.

    So, something is overflowing there in the function call, I think.

    It goes BOOM on the very first packet received if I comment out the array, but only after it sends a response back to the data sender. The overflow may actually be there because the response is placed on the stack too.

    That gives me a few places to start tracing. Thanks guys.
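Even with a checksum, it is cheap to cross-check the length field against what was actually received before passing it to the write routine. A hedged sketch, assuming the 9-byte header / 11-byte total overhead described above (Mem_Write here is a stand-in stub, since the real routine's signature is not shown in full):

```c
#include <stdint.h>
#include <stddef.h>

#define PKT_OVERHEAD 11          /* header + trailer bytes, per the post */
#define MAX_PKT_SIZE 1100        /* assumed receive-buffer capacity */

/* Stand-in for the poster's storage-media write routine. */
static int mem_write_called;
static void Mem_Write(const uint8_t *src, uint32_t addr, uint32_t len)
{
    (void)src; (void)addr; (void)len;
    mem_write_called = 1;
}

/* Save the payload only after the length field has been cross-checked
 * against the number of bytes actually received. */
int save_packet(const uint8_t *rx_pkt, size_t bytes_received,
                uint32_t data_size, uint32_t storage_address)
{
    if (bytes_received < PKT_OVERHEAD ||
        data_size != bytes_received - PKT_OVERHEAD ||
        bytes_received > MAX_PKT_SIZE) {
        return -1;               /* length field inconsistent: drop it */
    }
    Mem_Write(&rx_pkt[9], storage_address, data_size);
    return 0;
}
```

If the format really is consistent, the check never fires; if something upstream corrupts data_size, it fails loudly instead of writing past the end of the packet buffer.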

  • Well, I think I've figured it out. I'll explain so that maybe this helps someone.

    The next line after the Mem_Write call was:
    os_dly_wait(30);

    I commented that out too. If I uncommented only the os_dly_wait line, the crashes occurred again, whereas uncommenting Mem_Write and leaving os_dly_wait commented did not. What I think was happening is that the os_dly_wait call, combined with a debug message I was sending on the same interface and the response back to the sender, was piling too much onto the stack, causing a stack overflow. The debug message and the response both use static arrays, but each must acquire an OS_MUT with an infinite timeout before being sent. Removing the delay let the debug message and response go out immediately, which prevented the overflow.

    So it was a combination of stack overflow and task switching that caused a crash, followed by a watchdog reset.

    Thanks all.
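For anyone chasing a similar problem: RL-RTX also offers a built-in stack-check option in its configuration file (consult your version's RTX config documentation for the exact macro). A portable alternative is to watermark each task stack yourself. A minimal sketch, using a plain array to stand in for a task's stack area and assuming a descending stack (so untouched words remain at the low end):

```c
#include <stdint.h>
#include <stddef.h>

#define STACK_WORDS  256
#define FILL_PATTERN 0xCCCCCCCCu

/* Stand-in for a task's stack area; on the target this would be the
 * real stack, filled with the pattern before the task starts. */
static uint32_t task_stack[STACK_WORDS];

void stack_fill(void)
{
    for (size_t i = 0; i < STACK_WORDS; ++i)
        task_stack[i] = FILL_PATTERN;
}

/* Count untouched words from the low end of the stack. A result near
 * zero means the task came close to (or went past) an overflow. */
size_t stack_headroom(void)
{
    size_t free_words = 0;
    for (size_t i = 0; i < STACK_WORDS; ++i) {
        if (task_stack[i] != FILL_PATTERN)
            break;
        ++free_words;
    }
    return free_words;
}
```

Checking the headroom periodically (or from a debugger memory view) shows the worst-case stack depth each task has actually reached, which would have revealed this overflow long before the watchdog fired.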

  • It's hard to tell without seeing the code, but it sounds as if the root cause is still there! Are you comfortable with a buffer overrun solved by a different timing...?

  • Not exactly, no. I plan to discuss ways to correct the root cause with our Manager of Software Development to make sure it is fully resolved.