[CM4] Best general way to handle a hardfault/lockup

Over the past few months I've been doing a lot of work on a Kinetis K24 processor, which is a Cortex-M4, running the MQXLITE RTOS. It also has a couple of other SDKs built in and a surprising level of complexity for a CM4 application. All of that leads to a frustrating number of faults, and I still have trouble catching a few of them.

I currently run with the usage fault, memmanage fault, and bus fault handlers disabled because I consider all of these fatal errors. I'm only interested in logging the cause of each fault to persistent storage and rebooting, so I force everything to escalate to hard fault.
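For reference, the three configurable fault handlers are enabled by bits in the System Handler Control and State Register (SHCSR). With those bits clear, which is also their reset value, the faults escalate to hard fault. A minimal sketch of that setup follows. The helper name `disable_configurable_faults` is hypothetical, and it takes a pointer only so that it can be exercised off-target; on the K24 the argument would presumably be `&SCB->SHCSR` from the CMSIS headers.

```c
#include <assert.h>
#include <stdint.h>

/* SHCSR enable bits for the configurable fault handlers (ARMv7-M). */
#define SHCSR_MEMFAULTENA (1u << 16)
#define SHCSR_BUSFAULTENA (1u << 17)
#define SHCSR_USGFAULTENA (1u << 18)
#define SHCSR_FAULT_ENABLE_MASK \
    (SHCSR_MEMFAULTENA | SHCSR_BUSFAULTENA | SHCSR_USGFAULTENA)

static_assert(SHCSR_FAULT_ENABLE_MASK == 0x00070000u, "unexpected SHCSR mask");

/* Hypothetical helper: clear the three enable bits so that MemManage,
 * BusFault and UsageFault all escalate to HardFault. Other bits are kept. */
static void disable_configurable_faults(volatile uint32_t *shcsr)
{
    *shcsr &= ~SHCSR_FAULT_ENABLE_MASK;
}
```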

I currently have a hard fault handler that looks like this:

    __asm volatile
      (
       " ldr r1, =last_fault \n" // get the persistent data address
       " mov r2, #1 \n"  // store the fault type
       " str r2, [r1, #28] \n"
       " tst lr, #4  \n" // EXC_RETURN bit 2 tells which banked stack pointer was in use when the fault occurred
       " ittee eq     \n"
       " mrseq r0, msp  \n" // bit clear: the exception frame is on the main stack
       " andeq r4, r0, #0x80000000  \n" // mark MSP: keeps only bit 31, so r4 is 0 for an SRAM address
       " mrsne r0, psp   \n" // bit set: the exception frame is on the process stack
       " movne r4, r0  \n" // record the PSP value itself
       " str r4, [r1, #16]  \n" // put away the stack register
       " ldr r3, [r0, #20]  \n" // stacked lr
       " ldr r2, [r0, #24]  \n" // stacked pc
       " ldr r5, [r0, #0]   \n" // stacked r0
       " ldr r6, [r0, #4]   \n" // stacked r1
       " str r3, [r1, #12]  \n" // put away the lr
       " str r2, [r1, #8]   \n" // put away the pc
       " str r5, [r1, #20]  \n" // put away stacked r0
       " str r6, [r1, #24]  \n" // put away stacked r1
       " ldr r2, handler2_address_const   \n" // a handler that parses the fault status registers
       " blx r2    \n"
       " bkpt 255  \n"  // force a lockup and reset the chip
       " .align 2  \n"  // literal sits after the bkpt so it is never executed when the call returns
       " handler2_address_const: .word store_fault_info \n"
       );
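To make the offsets in the assembly easier to follow, here is a rough C view of the same copy. The `stacked_frame_t` layout is the basic frame the core pushes on exception entry (without FPU context). The `fault_record_t` layout is only inferred from the offsets used above; the type names, the `record_frame` helper, and the contents of the first 8 bytes are assumptions, not my actual definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Basic frame pushed by the core on exception entry (no FPU context). */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} stacked_frame_t;

/* Assumed layout of last_fault, inferred from the offsets in the assembly.
 * The first 8 bytes are not written by the assembly, so they are left as an
 * opaque placeholder here. */
typedef struct {
    uint32_t reserved[2]; /* offsets 0 and 4: not written by the assembly */
    uint32_t pc;          /* offset 8  */
    uint32_t lr;          /* offset 12 */
    uint32_t sp;          /* offset 16 */
    uint32_t r0;          /* offset 20 */
    uint32_t r1;          /* offset 24 */
    uint32_t type;        /* offset 28 */
} fault_record_t;

static_assert(offsetof(stacked_frame_t, lr) == 20, "stacked lr offset");
static_assert(offsetof(stacked_frame_t, pc) == 24, "stacked pc offset");
static_assert(offsetof(fault_record_t, pc) == 8, "record pc offset");
static_assert(offsetof(fault_record_t, sp) == 16, "record sp offset");
static_assert(offsetof(fault_record_t, type) == 28, "record type offset");

/* Hypothetical C equivalent of the copy done in assembly. sp_marker stands
 * for whatever value the assembly leaves in r4. */
static void record_frame(fault_record_t *rec, const stacked_frame_t *frame,
                         uint32_t sp_marker)
{
    rec->type = 1u;       /* fault type, as in "mov r2, #1" */
    rec->sp = sp_marker;
    rec->lr = frame->lr;
    rec->pc = frame->pc;
    rec->r0 = frame->r0;
    rec->r1 = frame->r1;
}
```

The static assertions only check that the struct offsets agree with the immediate offsets in the assembly, which is the part that is easiest to get wrong when either side changes.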

This has served me well for a lot of simple faults: null pointer dereferences, etc. The handler reads the status information and writes it to a peripheral on the K24 called the "system register file", which persists through any reboot that isn't a power-on reset (POR) or a low-voltage reset. I read it back when I boot up.
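I haven't included `store_fault_info` here, but the parsing it does is roughly of the following kind. This is a simplified sketch, not the real function: `classify_fault` and `fault_class_t` are hypothetical names, and the values passed in would come from `SCB->HFSR` and `SCB->CFSR` on the target. The bit positions are the ARMv7-M ones.

```c
#include <assert.h>
#include <stdint.h>

/* Selected ARMv7-M fault status bits. */
#define HFSR_VECTTBL    (1u << 1)   /* fault while reading the vector table */
#define HFSR_FORCED     (1u << 30)  /* a configurable fault was escalated */
#define CFSR_MMFSR_MASK 0x000000FFu /* MemManage status byte */
#define CFSR_BFSR_MASK  0x0000FF00u /* BusFault status byte */
#define CFSR_UFSR_MASK  0xFFFF0000u /* UsageFault status halfword */

static_assert((CFSR_MMFSR_MASK | CFSR_BFSR_MASK | CFSR_UFSR_MASK) == 0xFFFFFFFFu,
              "CFSR sub-register masks must cover the whole word");

typedef enum {
    FAULT_UNKNOWN,
    FAULT_VECTOR_TABLE,
    FAULT_MEMMANAGE,
    FAULT_BUS,
    FAULT_USAGE
} fault_class_t;

/* Hypothetical classifier: reduce HFSR/CFSR to one coarse cause that is
 * small enough to store alongside the saved registers. */
static fault_class_t classify_fault(uint32_t hfsr, uint32_t cfsr)
{
    if (hfsr & HFSR_VECTTBL) {
        return FAULT_VECTOR_TABLE;
    }
    if (hfsr & HFSR_FORCED) {
        if (cfsr & CFSR_MMFSR_MASK) return FAULT_MEMMANAGE;
        if (cfsr & CFSR_BFSR_MASK)  return FAULT_BUS;
        if (cfsr & CFSR_UFSR_MASK)  return FAULT_USAGE;
    }
    return FAULT_UNKNOWN;
}
```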

However, I still get some faults that do not appear to trigger this handler: I get a reboot, and my persistent data is uninitialized. My core question is: why does my handler sometimes not execute when a hard fault occurs, and how can I make it more general to handle this case?