Hi,
We have an application using RL-ARM on an STM32 (Cortex-M3), with 6 tasks running in total.
We pass messages between our tasks via mailboxes, and the message buffers come strictly from a memory pool managed with _alloc_box and _free_box.
However, in long-term testing of our product, the system crashes after about 3 days because an invalid pointer (i.e. one that is not from the pool) comes out of the mailbox. It looks like an address that the RTOS uses internally. Yet every pointer we put into the mailbox has been allocated with _alloc_box.
We have tried updating to the latest binary-only version of MDK (3.50), and the same thing happens.
Does anyone have any ideas or advice as to why this could be happening, and how we could debug it? We have tried tracing and increasing the stack size, but neither helps. Our next idea was to try to obtain the RTOS source code, but I am not sure it will tell us much more.
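One way to check whether stack size is really ruled out is to measure how much of each stack is ever used. Below is a minimal sketch of "stack painting", assuming you can fill a stack with a known pattern before it is used (for example a stack array you supply yourself through os_tsk_create_user()). The names stack_paint, stack_unused_words and STACK_FILL are hypothetical. On the Cortex-M3, RTX tasks run on the process stack while exception handlers use the main stack, so the main stack size set in the startup file deserves the same check.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fill pattern; any value unlikely to appear in live data works. */
#define STACK_FILL 0xDEADBEEFu

/* Fill a stack that is not yet in use with the known pattern. */
static void stack_paint(uint32_t *stack, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        stack[i] = STACK_FILL;
    }
}

/* Count untouched words from the low end of the array. Cortex-M stacks
   grow downwards, so the lowest addresses are the last to be used. */
static size_t stack_unused_words(const uint32_t *stack, size_t words)
{
    size_t n = 0;
    while (n < words && stack[n] == STACK_FILL) {
        n++;
    }
    return n;
}
```

Reading stack_unused_words() from the debugger (or from a low-priority task) after a long run gives the high-water mark; a result near zero means that stack came close to overflowing, even if the crash itself shows up somewhere else.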
Best Regards,
Martin.
It sounds like memory corruption. Mailboxes use dynamically allocated memory; can't you use events/mutexes/semaphores together with statically allocated data? You may even be suffering from memory fragmentation, which is likely after a large number of allocations and deallocations.
Hi Tamir,
Thanks for your reply. I agree with you it does sound like memory corruption.
However, the _alloc_box and _free_box routines use a statically allocated memory pool: the memory is reserved from the outset, and every block is the same size in bytes. Dynamic memory allocation is not used.
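For reference, a fixed-block pool of this kind can be sketched as below. This is an illustrative stand-in, not the RL-ARM source; pool_t, block_t, pool_alloc and pool_free are hypothetical names. Because all blocks are the same size, allocation never splits or searches, so fragmentation cannot occur. Such pools usually keep the free list inside the free blocks themselves, though (we assume RTX's box pool does something similar), so a double free or a write into an already-freed block corrupts the list, and a later allocation can then return an address that is not a pool block at all.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pool geometry. */
#define BLK_WORDS 4
#define BLK_COUNT 8

/* A block is either user payload or, while free, a link to the next free block. */
typedef union block {
    union block *next;             /* valid only while the block is free */
    uint32_t payload[BLK_WORDS];
} block_t;

typedef struct {
    block_t *free;                 /* head of the free list */
    block_t blocks[BLK_COUNT];     /* statically reserved storage */
} pool_t;

static void pool_init(pool_t *p)
{
    for (size_t i = 0; i + 1 < BLK_COUNT; i++) {
        p->blocks[i].next = &p->blocks[i + 1];
    }
    p->blocks[BLK_COUNT - 1].next = NULL;
    p->free = &p->blocks[0];
}

/* O(1): pop the head of the free list, or NULL if the pool is empty. */
static void *pool_alloc(pool_t *p)
{
    block_t *b = p->free;
    if (b != NULL) {
        p->free = b->next;
    }
    return b;
}

/* O(1): push the block back. This overwrites the block's first word,
   and nothing here detects a block that is already free. */
static void pool_free(pool_t *p, void *ptr)
{
    block_t *b = ptr;
    b->next = p->free;
    p->free = b;
}
```

Freeing the same block twice makes its link point to itself, so the next two allocations return the same block; as soon as one owner writes payload into it, the "next" pointer becomes arbitrary data.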
If you don't have access to the source code, try wrapping all calls to the allocation/deallocation routines in your own routines that keep a record of which blocks have been handed out and returned.
1) Verify that every pointer returned by _alloc_box actually lies within the pool.
2) Do not free the same block more than once. This may (and eventually will) corrupt the pool.
3) If you overwrite data in the pool that is used to keep track of its state, the results are unpredictable (i.e. your code will fail at some point).
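A minimal sketch of such an administration layer follows. It assumes you can tell it where the first block lives, the block stride and the block count; on RL-ARM these may have to be found in the debugger (for instance by inspecting the pointers _alloc_box returns right after _init_box), since the pool array may also hold some control data. The tracker allocates nothing itself: you pass it every pointer that comes out of _alloc_box and every pointer you are about to hand to _free_box, and it reports checks 1 and 2 above. All names here are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum {
    TRACK_OK = 0,
    TRACK_OUT_OF_RANGE,   /* pointer not inside the pool (check 1) */
    TRACK_MISALIGNED,     /* inside the pool but not at a block start (check 1) */
    TRACK_DOUBLE_ALLOC,   /* allocator returned a block we think is in use */
    TRACK_DOUBLE_FREE     /* block freed while not marked in use (check 2) */
} track_result_t;

#define TRACK_MAX_BLOCKS 64   /* hypothetical upper bound on pool size */

typedef struct {
    uintptr_t first;                  /* address of the first block */
    size_t blk_size;                  /* distance between blocks, in bytes */
    size_t count;                     /* number of blocks */
    uint8_t in_use[TRACK_MAX_BLOCKS]; /* 1 = handed out, 0 = free */
} tracker_t;

/* Returns 0 on success, -1 if the pool does not fit this tracker. */
static int track_init(tracker_t *t, const void *first_block, size_t blk_size, size_t count)
{
    if (blk_size == 0 || count > TRACK_MAX_BLOCKS) {
        return -1;
    }
    t->first = (uintptr_t)first_block;
    t->blk_size = blk_size;
    t->count = count;
    for (size_t i = 0; i < count; i++) {
        t->in_use[i] = 0;
    }
    return 0;
}

/* Map a pointer to a block index, or report why it is not a valid block. */
static track_result_t track_index(const tracker_t *t, const void *p, size_t *idx)
{
    uintptr_t a = (uintptr_t)p;
    if (a < t->first || a >= t->first + t->blk_size * t->count) {
        return TRACK_OUT_OF_RANGE;
    }
    if ((a - t->first) % t->blk_size != 0) {
        return TRACK_MISALIGNED;
    }
    *idx = (a - t->first) / t->blk_size;
    return TRACK_OK;
}

/* Call right after each allocation. */
static track_result_t track_on_alloc(tracker_t *t, const void *p)
{
    size_t i;
    track_result_t r = track_index(t, p, &i);
    if (r != TRACK_OK) {
        return r;
    }
    if (t->in_use[i]) {
        return TRACK_DOUBLE_ALLOC;
    }
    t->in_use[i] = 1;
    return TRACK_OK;
}

/* Call right before each free. */
static track_result_t track_on_free(tracker_t *t, const void *p)
{
    size_t i;
    track_result_t r = track_index(t, p, &i);
    if (r != TRACK_OK) {
        return r;
    }
    if (!t->in_use[i]) {
        return TRACK_DOUBLE_FREE;
    }
    t->in_use[i] = 0;
    return TRACK_OK;
}
```

On the target you would call track_on_alloc() after each _alloc_box() (handling a NULL "pool empty" result first, which the tracker would otherwise report as out of range) and track_on_free() before each _free_box(), with a breakpoint on any result other than TRACK_OK. With six tasks sharing the pool, doing this with task switching locked (RTX's tsk_lock()/tsk_unlock()) keeps the bookkeeping consistent.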
Items 2 and 3 are the most likely causes if check 1 fails.
If check 1 passes, then I would suspect you are overwriting the mailbox structure at some point. This will also cause unpredictable results.
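To catch the mailbox structure being overwritten, one common trick is to surround it with guard words and check them regularly. A minimal sketch, assuming you can place the mailbox storage inside a struct of your own (on the target, the array you would otherwise create with os_mbx_declare and pass to os_mbx_init). MBX_WORDS, GUARD_VALUE, guarded_mbx_t and the function names are hypothetical. Guards only catch overruns from neighbouring memory, not a wild pointer that lands directly inside the mailbox.

```c
#include <stddef.h>
#include <stdint.h>

#define GUARD_WORDS 8
#define GUARD_VALUE 0xA5A5A5A5u   /* hypothetical pattern */
#define MBX_WORDS   20            /* placeholder: size it to your real mailbox array */

/* Struct members keep their declared order, whereas separate globals may be
   placed anywhere by the compiler/linker, so the guards really sit on both sides. */
typedef struct {
    uint32_t pre[GUARD_WORDS];
    uint32_t mbx[MBX_WORDS];      /* the mailbox storage itself */
    uint32_t post[GUARD_WORDS];
} guarded_mbx_t;

/* Call once at start-up, before the mailbox is initialised. */
static void guard_init(guarded_mbx_t *g)
{
    for (size_t i = 0; i < GUARD_WORDS; i++) {
        g->pre[i] = GUARD_VALUE;
        g->post[i] = GUARD_VALUE;
    }
}

/* Return the number of guard words that no longer hold the pattern. */
static size_t guard_damaged(const guarded_mbx_t *g)
{
    size_t bad = 0;
    for (size_t i = 0; i < GUARD_WORDS; i++) {
        if (g->pre[i] != GUARD_VALUE) {
            bad++;
        }
        if (g->post[i] != GUARD_VALUE) {
            bad++;
        }
    }
    return bad;
}
```

Calling guard_damaged() around every send/receive (or periodically from a low-priority task) narrows down when the damage happens; a data-access breakpoint on the damaged word in the debugger can then show which code writes it.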
Thanks for your replies.
After extensive testing, we were still unable to determine the cause of this failure.
We have peppered the application with breakpoints that trigger if any RTOS function returns an incorrect value, and we have also confirmed that _alloc_box returns in-bounds pointers as expected.
What was strange is that some of the OS calls returned corrupted values. But if we moved the program counter back to the start of the function and ran it again, the RTOS call returned the correct value and the program continued normally.
Some kind of corruption is occurring but we cannot see where.
As a test, we have replaced the Keil RTOS with FreeRTOS. Our application has now been running for nearly a week, whereas with RL-ARM it would crash after 1-3 days.
I still cannot rule out some rogue code in our app, but we will see what happens in this test.
Thanks,
Have you seen this thread: http://www.keil.com/forum/docs/thread14677.asp
Might there be a problem where interrupts are re-enabled somewhere by an OS call while you assume that you are still in a critical section?
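The hazard described here is easiest to see with nested critical sections. The sketch below simulates the Cortex-M3 PRIMASK register on a host so it can run anywhere; on the target, CMSIS provides __get_PRIMASK(), __set_PRIMASK() and __disable_irq() for the same purpose. It only illustrates the pattern and says nothing about what any RL-ARM function does internally; sim_primask and the helper names are hypothetical.

```c
#include <stdint.h>

/* Host stand-in for PRIMASK: 1 = interrupts masked, 0 = enabled. */
static uint32_t sim_primask = 0;

static uint32_t get_primask(void) { return sim_primask; }
static void set_primask(uint32_t v) { sim_primask = v; }
static void disable_irq(void) { sim_primask = 1; }

/* Unsafe pattern: leaving always re-enables interrupts,
   even if the caller was itself inside a critical section. */
static void enter_naive(void) { disable_irq(); }
static void leave_naive(void) { set_primask(0); }

/* Safe pattern: remember the previous state and restore exactly that. */
static uint32_t enter_critical(void)
{
    uint32_t prev = get_primask();
    disable_irq();
    return prev;
}

static void leave_critical(uint32_t prev)
{
    set_primask(prev);
}
```

Saving and restoring the previous state lets critical sections nest safely. Any code that unconditionally re-enables interrupts on exit, as leave_naive() does, silently ends its caller's critical section early, which produces exactly the kind of rare, timing-dependent corruption described in this thread.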