Problem
Once in a blue moon (roughly every 500 hours of run time, non-deterministically) I get a Usage Fault (illegal unaligned load or store). Please help me trace the fault to the actual offending instruction and extract additional information.
What I have done so far:
Per ARM AN209 (www.keil.com/.../apnt209.pdf) I have installed a hard fault handler.
extern "C" void HardFault_Handler(void)
{
    __asm volatile
    (
        " tst lr, #4 \n"
        " ite eq \n"
        " mrseq r0, msp \n"
        " mrsne r0, psp \n"
        " ldr r1, [r0, #24] \n"
        " ldr r2, handler2_address_const \n"
        " bx r2 \n"
        " handler2_address_const: .word prvGetRegistersFromStack...
Values of the registers extracted from the exception stack frame are:
LR is 0x0803B131
PC is 0x0803A758
PSR is 0x01000000
// Extracted after printf executed
HFSR is 0x40000000, indicating Forced (I do not have a separate handler for UsageFaults)
CFSR is 0x01000000, indicating UsageFault, UNALIGNED access
xPSR is 0100 0003
MMFAR is 0
BFAR is 0
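For anyone who wants to double-check the register interpretation, here is a minimal sketch in C that decodes the two status values quoted above. The bit positions follow the ARMv7-M definition of HFSR and of the UsageFault part of CFSR (bits 31:16); the function names are my own and purely illustrative, and the code runs on a host PC, not on the target.

```c
#include <stdint.h>

/* HFSR bits (ARMv7-M). */
#define HFSR_VECTTBL    (1UL << 1)   /* fault on vector table read           */
#define HFSR_FORCED     (1UL << 30)  /* configurable fault escalated         */

/* UsageFault status bits, located in CFSR[31:16] (ARMv7-M). */
#define CFSR_UNDEFINSTR (1UL << 16)
#define CFSR_INVSTATE   (1UL << 17)
#define CFSR_INVPC      (1UL << 18)
#define CFSR_NOCP       (1UL << 19)
#define CFSR_UNALIGNED  (1UL << 24)
#define CFSR_DIVBYZERO  (1UL << 25)

/* Hypothetical helper: nonzero if the HardFault was escalated from a
 * configurable fault (for example a UsageFault with no enabled handler). */
static int hfsr_is_forced(uint32_t hfsr)
{
    return (hfsr & HFSR_FORCED) != 0u;
}

/* Hypothetical helper: returns only the UsageFault bits of CFSR. */
static uint32_t cfsr_usage_bits(uint32_t cfsr)
{
    return cfsr & 0xFFFF0000UL;
}

/* Hypothetical helper: nonzero if the UNALIGNED flag is set. */
static int cfsr_is_unaligned(uint32_t cfsr)
{
    return (cfsr & CFSR_UNALIGNED) != 0u;
}
```

With the values from this post, hfsr_is_forced(0x40000000) and cfsr_is_unaligned(0x01000000) are both true, and UNALIGNED is the only UsageFault bit set.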
pc (0x0803A758) points to a dead loop of osRtxIdleThread.
This could not be the offending instruction, could it? What am I missing?
I have checked the RM0385 reference manual, and as far as I can see the exception stack frame has been parsed properly.
@2001A198:
00000000 00000000 00000000 00000000 00000000 0803B131 0803A758 01000000 E25A2EA5
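As a cross-check of the parsing, here is a small sketch in C that maps the first eight words of the dump onto the basic exception frame (R0, R1, R2, R3, R12, LR, PC, xPSR, in ascending address order). It assumes the basic 8-word frame, i.e. no floating-point context was stacked; on a Cortex-M7 with the FPU in use the frame can be longer, which this sketch does not handle. The type and function names are illustrative only.

```c
#include <stdint.h>

/* Basic (non-FP) ARMv7-M exception stack frame, lowest address first. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} basic_frame_t;

/* Hypothetical helper: copy eight raw stack words into named fields. */
static basic_frame_t parse_basic_frame(const uint32_t words[8])
{
    basic_frame_t f;
    f.r0   = words[0];
    f.r1   = words[1];
    f.r2   = words[2];
    f.r3   = words[3];
    f.r12  = words[4];
    f.lr   = words[5];
    f.pc   = words[6];
    f.xpsr = words[7];
    return f;
}

/* IPSR field: exception number, bits [8:0] of xPSR (0 = thread mode). */
static uint32_t xpsr_exception_number(uint32_t xpsr)
{
    return xpsr & 0x1FFu;
}

/* T bit (Thumb state), bit 24 of xPSR; must be 1 on Cortex-M. */
static uint32_t xpsr_thumb_bit(uint32_t xpsr)
{
    return (xpsr >> 24) & 1u;
}
```

Applied to the dump, the stacked xPSR 0x01000000 gives exception number 0 (thread mode) with the T bit set, while the xPSR value 0x01000003 read inside the handler gives exception number 3 (HardFault).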
My system is:
Cortex M7 (STM32F746NG..)
Keil RTOS
Once the unit is stalled, I have connected to it using SEGGER J-Link/J-Trace for Cortex M (using Ozone), stopped the program, and examined the memory contents.
The unaligned access fault trap (UNALIGN_TRP) is disabled (per the reference, this means that only multi-word instructions can generate this fault).
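To make that rule concrete, here is a rough model in C of when an access would raise the UNALIGNED UsageFault. It assumes the ARMv7-M behaviour as I understand it: UNALIGN_TRP is bit 3 of the CCR register, multi-word accesses (LDM, STM, LDRD, STRD and similar) always require word alignment, and single accesses are only checked against their own size when the trap is enabled. It deliberately ignores memory type (Device or Strongly-ordered regions) and exclusive accesses, and the function name is made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define CCR_UNALIGN_TRP (1UL << 3)  /* CCR bit 3: trap unaligned accesses */

/* Hypothetical model, not target code.
 * ccr        : value of the CCR register
 * addr       : address being accessed
 * size_bytes : natural size of a single access (1, 2 or 4)
 * multiword  : true for LDM/STM/LDRD/STRD-style accesses */
static bool access_raises_unaligned(uint32_t ccr, uint32_t addr,
                                    uint32_t size_bytes, bool multiword)
{
    if (multiword) {
        /* Always checked, independent of UNALIGN_TRP. */
        return (addr & 3u) != 0u;
    }
    if ((ccr & CCR_UNALIGN_TRP) == 0u) {
        /* Trap disabled: single unaligned accesses are handled in hardware. */
        return false;
    }
    /* Trap enabled: a halfword or word access must be aligned to its size. */
    return (addr & (size_bytes - 1u)) != 0u;
}
```

In my configuration (trap disabled) this means the culprit should be a multi-word access through a pointer that is not a multiple of 4.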
I read:
community.arm.com/.../debugging-a-usage-fault-for-an-unaligned-memory-access
www.keil.com/.../3777.htm (but no external SDRAM is used)
RM0385 Reference manual
also I read:
medium.com/.../the-curious-case-of-unaligned-access-on-arm-5dd0ebe24965
stackoverflow.com/.../unaligned-access-causes-error-on-arm-cortex-m4
stackoverflow.com/.../what-is-non-aligned-access-arm-keil
stackoverflow.com/.../arm-unaligned-memory-access-workaround
Without direct access to your platform it is very hard to guess what is happening. One possible cause is stack overflow, but I am not sure why the stacked PC points to the idle thread. Since you mentioned you have a J-Trace, ideally use it to collect an instruction trace in real time; that can be really useful in solving problems like this.
A few other things you can try:
- Investigate whether there is a stack overflow in your RTX threads: in your RTX_Config.h, enable Stack overrun checking, and optionally enable Stack usage watermark.
If you enable Stack usage watermark, you can use the RTX RTOS viewer (View->Watch windows -> RTX RTOS) to observe actual stack usage. Once the program has run for a bit and has been halted (by you, via the debugger), you can expand the thread information in the RTX RTOS window and see the stack usage details. (Note: enabling these checks increases context switching overhead, so normally they are enabled only during software development.)
- Investigate whether there is an overflow of the main stack. First look at the stack usage report in the html file (in the objects directory) to see the maximum stack usage, and compare it with the main stack allocated in the device startup file.
- Use event trace to observe what combination of exception events happens just before the crash.
- Potentially you can try setting up a data watchpoint at the end of the main stack (add a data variable in the main stack declaration, and set the data watchpoint on it) to see if it is hit. If it is, your main stack has overflowed.
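The watermark idea can be sketched in a few lines of C: fill the stack with a known pattern before use, then count how many words at the low-address end still hold the pattern. This is only an illustration of the technique, not the RTX implementation; the fill value 0xCCCCCCCC and the function names are assumptions chosen for the example, and the stack is assumed to grow downwards as on Cortex-M.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL_PATTERN 0xCCCCCCCCu  /* assumed fill value for this sketch */

/* Hypothetical helper: paint the whole stack area before it is used. */
static void stack_fill(uint32_t *stack, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        stack[i] = STACK_FILL_PATTERN;
    }
}

/* Hypothetical helper: the stack grows towards stack[0], so the number of
 * untouched words is the run of pattern words starting at the low end.
 * A result of 0 means the stack has been used down to its limit. */
static size_t stack_unused_words(const uint32_t *stack, size_t words)
{
    size_t n = 0;
    while (n < words && stack[n] == STACK_FILL_PATTERN) {
        n++;
    }
    return n;
}
```

If stack_unused_words() ever returns 0 for a thread or for the main stack, that stack has been used right down to its lowest word and an overflow is very likely.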
regards,
Joseph
Joseph, thanks for your response.
Without direct access to your platform ...
The system is still powered up and can be examined. Let me know what other information you think would be valuable to extract.
FYI, I have interrupt-based UART transfers and TCP/IP networking enabled.
Re: stack overflow
Usually, in the case of a stack overflow, the system goes through SIGABRT and termination, not to a hard fault handler. Do you think it might be different this time?
Re: use J-Trace to collect instruction trace in real-time
This is what I was planning to do. My trouble is that it sometimes takes 500 hours, sometimes more, to get to a fault. So I am trying to extract as much information from the present case as I can.
And even when I get an instruction trace, in the case of imprecise faults the offending instruction may be many instructions upstream in the execution flow.
Re: in your RTX_Config.h, enable Stack overrun checking and enable Stack usage watermark
Useful, thanks, will do.
Re: use RTX RTOS viewer ... to observe actual stack usage
Will do, thanks.
Re: Use event trace to observe what is the combinations of exception events
Will enable that, great idea
Joseph, which part of the information I provided makes you think it is stack overflow related?