
Unable to determine offending instruction: Usage Fault, illegal unaligned load or store (Cortex-M7, Keil MDK Pro)

Problem

Once in a blue moon (roughly every 500 hours of run time, non-deterministically) I get a Usage Fault: illegal unaligned load or store. Please help me trace the fault to the actual offending instruction and extract additional information.

What I have done so far:

Per ARM AN209 (www.keil.com/.../apnt209.pdf) I have installed a hard fault handler.

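extern "C" void HardFault_Handler(void)
{
    __asm volatile
    (
        " tst lr, #4 \n"            /* EXC_RETURN bit 2: which stack was in use? */
        " ite eq \n"
        " mrseq r0, msp \n"         /* bit clear: frame is on the main stack */
        " mrsne r0, psp \n"         /* bit set: frame is on the process stack */
        " ldr r1, [r0, #24] \n"     /* stacked PC sits at offset 24 in the frame */
        " ldr r2, handler2_address_const \n"
        " bx r2 \n"
        " handler2_address_const: .word prvGetRegistersFromStack...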

Values of the registers extracted from the exception stack frame are:

LR is 0x0803B131
PC is 0x0803A758
PSR is 0x01000000
// Extracted after printf executed
HFSR is 0x40000000, indicating a forced HardFault (I do not have a separate handler for UsageFaults)
CFSR is 0x01000000, indicating a UsageFault, UNALIGNED access
xPSR is 0x01000003
MMFAR is 0
BFAR is 0
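
The decoding of these status values can be sketched in C. This is a minimal, host-runnable sketch, not the code used in the post: the bit positions follow the ARMv7-M definitions of HFSR and UFSR (the upper halfword of CFSR), while the helper names `ufsr_from_cfsr`, `exception_number` and `print_fault_summary` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Bit positions as defined by the ARMv7-M architecture. */
#define HFSR_VECTTBL    (1u << 1)   /* fault on a vector table read */
#define HFSR_FORCED     (1u << 30)  /* configurable fault escalated to HardFault */
#define UFSR_UNDEFINSTR (1u << 0)
#define UFSR_INVSTATE   (1u << 1)
#define UFSR_INVPC      (1u << 2)
#define UFSR_NOCP       (1u << 3)
#define UFSR_UNALIGNED  (1u << 8)
#define UFSR_DIVBYZERO  (1u << 9)

/* UFSR occupies bits [31:16] of CFSR. */
uint32_t ufsr_from_cfsr(uint32_t cfsr)
{
    return (cfsr >> 16) & 0xFFFFu;
}

/* The exception number (IPSR) is the low 9 bits of xPSR. */
uint32_t exception_number(uint32_t xpsr)
{
    return xpsr & 0x1FFu;
}

/* Hypothetical helper: print the name of every flag that is set. */
void print_fault_summary(FILE *out, uint32_t hfsr, uint32_t cfsr)
{
    uint32_t ufsr = ufsr_from_cfsr(cfsr);

    assert(out != NULL);
    if (hfsr & HFSR_VECTTBL)    fprintf(out, "HFSR: VECTTBL\n");
    if (hfsr & HFSR_FORCED)     fprintf(out, "HFSR: FORCED\n");
    if (ufsr & UFSR_UNDEFINSTR) fprintf(out, "UFSR: UNDEFINSTR\n");
    if (ufsr & UFSR_INVSTATE)   fprintf(out, "UFSR: INVSTATE\n");
    if (ufsr & UFSR_INVPC)      fprintf(out, "UFSR: INVPC\n");
    if (ufsr & UFSR_NOCP)       fprintf(out, "UFSR: NOCP\n");
    if (ufsr & UFSR_UNALIGNED)  fprintf(out, "UFSR: UNALIGNED\n");
    if (ufsr & UFSR_DIVBYZERO)  fprintf(out, "UFSR: DIVBYZERO\n");
}
```

With the values listed above, `print_fault_summary(stdout, 0x40000000u, 0x01000000u)` reports FORCED and UNALIGNED, and an xPSR whose low bits are 3 corresponds to exception number 3 (HardFault).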

The stacked PC (0x0803A758) points to the dead loop in osRtxIdleThread.

This could not be the offending instruction, could it? What am I missing?

I have checked the RM0385 reference manual and, as far as I can see, the exception stack frame has been parsed properly.

@2001A198:

00000000 00000000 00000000 00000000   00000000 0803B131 0803A758 01000000
E25A2EA5
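
For cross-checking the dump, here is a minimal C sketch of how the eight words map onto the basic exception stack frame (R0, R1, R2, R3, R12, LR, PC, xPSR, in ascending address order). It assumes a basic frame without floating-point context; on a Cortex-M7 with the FPU in use, an extended frame is possible and would need different handling. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Basic ARMv7-M exception frame, lowest address first. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} basic_frame_t;

/* Copy eight consecutive stack words into a named structure. */
basic_frame_t parse_basic_frame(const uint32_t *words)
{
    basic_frame_t f;

    assert(words != NULL);
    f.r0   = words[0];
    f.r1   = words[1];
    f.r2   = words[2];
    f.r3   = words[3];
    f.r12  = words[4];
    f.lr   = words[5];
    f.pc   = words[6];
    f.xpsr = words[7];
    return f;
}

/* A stacked LR holds a Thumb address with bit 0 set; clear it to get
 * the address to look up in the map file or disassembly. */
uint32_t code_address(uint32_t lr)
{
    return lr & ~1u;
}

/* Bit 9 of the stacked xPSR records whether the hardware inserted a
 * padding word to align the frame to 8 bytes. */
int frame_has_padding(uint32_t stacked_xpsr)
{
    return (int)((stacked_xpsr >> 9) & 1u);
}
```

Applied to the dump at 0x2001A198, this gives PC = 0x0803A758, LR = 0x0803B131 (code address 0x0803B130) and a stacked xPSR of 0x01000000 with no alignment padding.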

My system is:

Cortex M7 (STM32F746NG..)

Keil RTOS

Once the unit had stalled, I connected to it using SEGGER J-Link/J-Trace for Cortex-M (via Ozone), halted the program, and examined the memory contents.

The unaligned access trap (UNALIGN_TRP) is disabled; per the reference, this means that only multi-word instructions can generate this fault.
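
To illustrate the multi-word case: a compiler may use a multi-word instruction such as LDRD or LDM for a 64-bit or structure access made through a cast pointer, and those instructions require word alignment even when UNALIGN_TRP is clear. The sketch below shows one common way to read such data without assuming alignment, by copying bytes with `memcpy`. It is a generic illustration, not code from this project, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if the address is a multiple of 4, else 0. */
int is_word_aligned(uintptr_t address)
{
    return (address & 3u) == 0u;
}

/* Read a 64-bit value from a possibly unaligned location.
 * memcpy makes no alignment assumption about its arguments, whereas
 * *(const uint64_t *)p tells the compiler that p is suitably aligned. */
uint64_t load_u64_unaligned(const void *p)
{
    uint64_t value;

    assert(p != NULL);
    memcpy(&value, p, sizeof value);
    return value;
}
```

Typical places to look for the risky pattern are protocol buffers (for example UART or network payloads) where a byte pointer at an arbitrary offset is cast to a wider type.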

I read:

community.arm.com/.../debugging-a-usage-fault-for-an-unaligned-memory-access

www.keil.com/.../3777.htm (but no external SDRAM is used)

RM0385 Reference manual

I have also read:

medium.com/.../the-curious-case-of-unaligned-access-on-arm-5dd0ebe24965

stackoverflow.com/.../unaligned-access-causes-error-on-arm-cortex-m4

stackoverflow.com/.../what-is-non-aligned-access-arm-keil

stackoverflow.com/.../arm-unaligned-memory-access-workaround


  • Without direct access to your platform, it is very hard to guess what is happening. One possible cause is a stack overflow, but I am not sure why the stacked PC would then point to the idle thread. Since you mentioned that you have a J-Trace, ideally use it to collect an instruction trace in real time; that can be really useful in solving problems like this.

    A few other things you can try:

    - Investigate whether there is a stack overflow in your RTX threads: in your RTX_Config.h, enable Stack overrun checking, and optionally enable Stack usage watermark.

    If you enable Stack usage watermark, you can use the RTX RTOS viewer (View -> Watch Windows -> RTX RTOS) to observe actual stack usage. Once the program has run for a while and has been halted (by you, via the debugger), expand the thread information in the RTX RTOS window to see the stack usage details. (Note: enabling these checks increases context-switching overhead, so they are normally enabled only during software development.)

    - Investigate whether the main stack overflows. First look at the stack usage report in the HTML file (in the objects directory) to see the maximum stack usage, and compare it with the main stack size allocated in the device startup file.

    - Use event trace to observe which combination of exception events happens just before the crash.

    - Potentially, you can set up a data watchpoint at the end of the main stack (add a data variable to the main stack declaration and set the data watchpoint on it) to see whether it is hit. If it is, your main stack has overflowed.

    regards,

    Joseph
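
The watermark idea mentioned in this reply can be sketched generically in C: fill the stack area with a known pattern before use, then count how many words at the low-address end still hold that pattern. This is only an illustration of the technique under stated assumptions, not RTX's implementation; the pattern value and the function names are hypothetical, and a descending stack (growing toward lower addresses) is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fill pattern chosen for this sketch. */
#define STACK_FILL_PATTERN 0xCCCCCCCCu

/* Paint the whole stack area before the thread starts running. */
void stack_fill(uint32_t *stack, size_t words)
{
    size_t i;

    assert(stack != NULL);
    for (i = 0; i < words; i++) {
        stack[i] = STACK_FILL_PATTERN;
    }
}

/* Count untouched words starting from the lowest address (stack[0]).
 * With a descending stack, the first overwritten word marks the deepest
 * point the stack has ever reached. A result of 0 means the stack has
 * been used completely and may have overflowed. */
size_t stack_unused_words(const uint32_t *stack, size_t words)
{
    size_t n = 0;

    assert(stack != NULL);
    while (n < words && stack[n] == STACK_FILL_PATTERN) {
        n++;
    }
    return n;
}
```

A limitation of this approach is that a stacked value that happens to equal the fill pattern is counted as unused, so the result is an estimate of the high-water mark rather than an exact measurement.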

  • Joseph, thanks for your response.

    Without direct access to your platform ...

    The system is still powered up and can be examined. Let me know what other information you think would be valuable to extract.

    FYI, I have interrupt-based UART transfers and TCP/IP networking enabled.

    Re: stack overflow

    Usually, a stack overflow takes the system through SIGABRT and termination, not into a hard fault handler. Do you think it might be different this time?

    Re: use J-Trace to collect instruction trace in real-time

    This is what I was planning to do. My trouble is that it sometimes takes 500 hours, sometimes more, to reach a fault. So I am trying to extract as much information from the present case as I can.

    And even when I get an instruction trace, in the case of imprecise faults the offending instruction may be many instructions upstream in the execution flow.

    Re: in your RTX_Config.h, enable Stack overrun checking and enable Stack usage watermark

    Useful, thanks, will do.

    Re: use RTX RTOS viewer ... to observe actual stack usage

    Will do, thanks.

    Re: use event trace to observe which combination of exception events

    Will enable that, great idea.

    Joseph, which of the information I provided makes you think it is stack-overflow related?