[CM4] Best general way to handle a hardfault/lockup

Over the past few months I've been doing a lot of work on a Kinetis K24 processor, which is a Cortex-M4, running the MQXLITE RTOS. The project also has a couple of other SDKs built in and a surprising level of complexity for a CM4 application. All of that leads to a frustrating number of faults, a few of which I still have trouble catching.

I currently run with the usage fault, memmanage fault, and bus fault handlers disabled because I consider all of these fatal errors. I'm only interested in logging the causes of the faults to persistent storage and rebooting, so I force everything to escalate to hard fault.
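
As a sketch of what "disabled" means here: on ARMv7-M the three configurable fault handlers are enabled by bits 16 to 18 of the System Handler Control and State Register (SHCSR), and with those bits clear the corresponding faults escalate to HardFault. The helper name below is hypothetical, and it works on a plain value so it can be checked off-target; on the device the result would be written back to the SHCSR.

```c
#include <stdint.h>

/* SHCSR enable bits as defined by the ARMv7-M architecture. */
#define SHCSR_MEMFAULTENA (1u << 16)
#define SHCSR_BUSFAULTENA (1u << 17)
#define SHCSR_USGFAULTENA (1u << 18)

/* Hypothetical helper: returns an SHCSR value with the three configurable
 * fault handlers disabled, so those faults escalate to HardFault.
 * All other bits are left untouched. */
static inline uint32_t shcsr_escalate_all(uint32_t shcsr)
{
    return shcsr & ~(SHCSR_MEMFAULTENA | SHCSR_BUSFAULTENA | SHCSR_USGFAULTENA);
}
```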

I currently have a hard fault handler that looks like this:

    __asm volatile
      (
       " ldr r1, =last_fault \n" // get the persistent data address
       " mov r2, #1 \n"  // store the fault type
       " str r2, [r1, #28] \n" 
       " tst lr, #4  \n" // Determine which banked stack pointer we were using when the fault occurred
       " ittee eq     \n"
       " mrseq r0, msp  \n" // Load the appropriate stack pointer 
       " andeq r4, r0, #0x80000000  \n" // And mark which one it was
       " mrsne r0, psp   \n"
       " movne r4, r0  \n"
       " str r4, [r1, #16]  \n" // put away the stack register
       " ldr r3, [r0, #20]  \n" // stored lr
       " ldr r2, [r0, #24]  \n" // stored pc
       " ldr r5, [r0, #0]   \n" // stored r0
       " ldr r6, [r0, #4]   \n" // stored r1
       " str r3, [r1, #12]  \n" // put away the lr
       " str r2, [r1, #8]   \n" // put away the pc
       " str r5, [r1, #20]  \n" // put away cached r0
       " str r6, [r1, #24]  \n" // put away cached r1
       " ldr r2, handler2_address_const   \n" // a handler that parses the fault status registers
       " blx r2    \n"
       " handler2_address_const: .word store_fault_info            \n"
       " bkpt 255"  // force a lockup and reset the chip
       );

This has served me well for a lot of simple faults, such as null pointer dereferences. The handler reads the status information and writes it to a K24 peripheral called the "system register file", which persists through any reset that isn't a power-on reset (POR) or a low-voltage reset. I read it back when I boot up.
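
The body of store_fault_info is not shown in the post, so the following is only a rough sketch of how a routine like it might turn the fault status registers into a single cause code. The bit positions are the ARMv7-M HFSR and CFSR definitions; the enum, the function name, and the order in which causes are checked are assumptions made for illustration.

```c
#include <stdint.h>

/* Bit positions from the ARMv7-M HFSR and CFSR definitions. */
#define HFSR_VECTTBL     (1u << 1)   /* fault while reading the vector table */
#define HFSR_FORCED      (1u << 30)  /* configurable fault escalated to HardFault */
#define CFSR_IACCVIOL    (1u << 0)
#define CFSR_DACCVIOL    (1u << 1)
#define CFSR_PRECISERR   (1u << 9)
#define CFSR_IMPRECISERR (1u << 10)
#define CFSR_UNSTKERR    (1u << 11)  /* bus fault while unstacking */
#define CFSR_STKERR      (1u << 12)  /* bus fault while stacking */
#define CFSR_UNDEFINSTR  (1u << 16)
#define CFSR_INVSTATE    (1u << 17)  /* e.g. branch to an address with bit 0 clear */
#define CFSR_DIVBYZERO   (1u << 25)

/* Hypothetical cause codes, small enough to fit in one persistent word. */
enum fault_cause {
    CAUSE_UNKNOWN = 0,
    CAUSE_VECTOR_TABLE,
    CAUSE_STACKING,
    CAUSE_BUS,
    CAUSE_MEM_ACCESS,
    CAUSE_INVALID_STATE,
    CAUSE_UNDEF_INSTR,
    CAUSE_DIV_BY_ZERO
};

/* Classify a fault from raw HFSR and CFSR values. Stacking errors are
 * checked first because they mean the stacked frame cannot be trusted. */
enum fault_cause classify_fault(uint32_t hfsr, uint32_t cfsr)
{
    if (hfsr & HFSR_VECTTBL)                       return CAUSE_VECTOR_TABLE;
    if (!(hfsr & HFSR_FORCED))                     return CAUSE_UNKNOWN;
    if (cfsr & (CFSR_STKERR | CFSR_UNSTKERR))      return CAUSE_STACKING;
    if (cfsr & (CFSR_PRECISERR | CFSR_IMPRECISERR)) return CAUSE_BUS;
    if (cfsr & (CFSR_IACCVIOL | CFSR_DACCVIOL))    return CAUSE_MEM_ACCESS;
    if (cfsr & CFSR_INVSTATE)                      return CAUSE_INVALID_STATE;
    if (cfsr & CFSR_UNDEFINSTR)                    return CAUSE_UNDEF_INSTR;
    if (cfsr & CFSR_DIVBYZERO)                     return CAUSE_DIV_BY_ZERO;
    return CAUSE_UNKNOWN;
}
```

On the target, the two arguments would be the values read from the HFSR and CFSR inside the hard fault handler.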

However, I still get some faults that do not appear to trigger this handler - I get a reboot, and my persistent data is uninitialized. My core question is, why does my handler sometimes not execute when a hard fault occurs? And how can I make it more general to handle this case?

Top replies

sfoster over 7 years ago +1 verified

Figured it out: A piece of code from our vendor was implementing a critical section by setting FAULTMASK rather than BASEPRI. This prevented the hard fault handler from firing (when FAULTMASK is set, only...
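
For context, on ARMv7-M setting FAULTMASK raises the execution priority to -1, the same level as HardFault, so a fault taken inside such a critical section cannot enter the hard fault handler. BASEPRI only masks interrupts with configurable priority and leaves HardFault able to fire. The sketch below shows one way a BASEPRI-based critical section could look. It is not the vendor's code: the function names, the assumption of 4 implemented priority bits, and the choice of level 1 as the masking threshold are all illustrative, and the non-ARM branch is only a stand-in so the sketch can be compiled off-target.

```c
#include <stdint.h>

#if defined(__ARM_ARCH_7M__) || defined(__ARM_ARCH_7EM__)
static inline uint32_t read_basepri(void)
{
    uint32_t v;
    __asm volatile ("mrs %0, basepri" : "=r" (v));
    return v;
}
static inline void write_basepri(uint32_t v)
{
    __asm volatile ("msr basepri, %0" : : "r" (v) : "memory");
}
#else
/* Off-target stand-in so the sketch can be compiled and exercised on a host. */
static uint32_t fake_basepri;
static inline uint32_t read_basepri(void) { return fake_basepri; }
static inline void write_basepri(uint32_t v) { fake_basepri = v; }
#endif

/* Assumptions: 4 implemented priority bits, and priority level 1 is the
 * most urgent interrupt level that the critical section masks. */
#define PRIO_BITS        4u
#define CRITICAL_LEVEL   1u
#define CRITICAL_BASEPRI (CRITICAL_LEVEL << (8u - PRIO_BITS))

/* Mask interrupts at CRITICAL_LEVEL and less urgent; returns the previous
 * BASEPRI so that sections can nest. HardFault and NMI stay unmasked. */
static inline uint32_t critical_enter(void)
{
    uint32_t prev = read_basepri();
    /* 0 means "no masking"; otherwise a lower value masks more.
     * Never relax the masking set by an outer section. */
    if (prev == 0u || prev > CRITICAL_BASEPRI) {
        write_basepri(CRITICAL_BASEPRI);
    }
    return prev;
}

/* Restore the BASEPRI value saved by the matching critical_enter(). */
static inline void critical_exit(uint32_t prev)
{
    write_basepri(prev);
}
```

A caller would bracket the protected region with `uint32_t saved = critical_enter();` and `critical_exit(saved);`.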

42Bastian Schick over 7 years ago

Simple question: Do you have a watchdog running which resets the system? If yes, maybe the hardfault handler triggers another hardfault (reading from R0)?
Is your persistent area completely empty? Is even the marker missing?

sfoster over 7 years ago in reply to 42Bastian Schick
I do have a watchdog running, and I have a pre-watchdog-reset handler that writes to the persistent area as well. This pre-watchdog-reset handler consistently works well.

Immediately after boot, I copy data out of the persistent area, and then write a cookie to the persistent area. When I get this fault that appears not to trigger the handler, after the reset the only thing in the persistent area is the cookie.
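
The boot-time step described above could be sketched roughly as follows. The word offsets 8 to 28 mirror the stores in the hard fault handler from the original post; the meaning of the first two words, the cookie value, clearing the area to zero, and all names are assumptions, since the post does not show this code.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout matching the handler's stores at offsets 8..28. The first two
 * words are not described in the post; 'cookie' and 'reserved' are guesses. */
struct fault_record {
    uint32_t cookie;      /* assumed: boot marker at offset 0 */
    uint32_t reserved;    /* assumed: unused word at offset 4 */
    uint32_t pc;          /* offset 8  */
    uint32_t lr;          /* offset 12 */
    uint32_t sp;          /* offset 16 */
    uint32_t r0;          /* offset 20 */
    uint32_t r1;          /* offset 24 */
    uint32_t fault_type;  /* offset 28 */
};

_Static_assert(offsetof(struct fault_record, pc) == 8, "pc must sit at offset 8");
_Static_assert(offsetof(struct fault_record, fault_type) == 28, "type must sit at offset 28");

#define RECORD_WORDS (sizeof(struct fault_record) / sizeof(uint32_t))
#define BOOT_COOKIE  0xC00C1E00u  /* hypothetical marker value */

/* Copy the previous boot's record out of the persistent area word by word,
 * clear the area, and re-arm it with the cookie. Returns 1 if a fault type
 * had been recorded, 0 otherwise. */
int fault_record_collect(volatile uint32_t *area, struct fault_record *out)
{
    uint32_t *dst = (uint32_t *)out;
    for (size_t i = 0; i < RECORD_WORDS; i++) {
        dst[i] = area[i];
        area[i] = 0u;
    }
    area[0] = BOOT_COOKIE;
    return out->fault_type != 0u;
}
```

On the target, `area` would point at the system register file; a record that comes back holding only the cookie corresponds to the case where the handler never ran.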

I suppose my hard fault handler could be triggering another fault, but I'm at a bit of a loss as to where. The first place I would expect this to occur is when dereferencing the stack pointer value I loaded from the core (line 13 in the snippet). However, on line 12 I'm storing the value of that stack pointer into my persistent area, so at least that word should be written before any second fault.

If I induce a hard fault by doing

    // Load a bad value to the core stack pointer
    mov r0, #1
    msr PSP, r0
    // Jump-and-exchange to an ARM address to force a usage fault
    mov r0, #0
    blx r0

then after the reset I see the value of the bad stack pointer after exception stacking (0xffffff-something, because it wraps below zero during exception stacking), and everything else in the persistent area is uninitialized because trying to load from that location causes a second fault.
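
One way to avoid that second fault is to check the stacked pointer before dereferencing it and to skip the frame reads when it is implausible. The sketch below is a minimal version of such a check. The function name is hypothetical, and the RAM bounds are parameters because the right values depend on the part's memory map; only the basic 8-word frame is considered, which is enough for the offsets the handler reads.

```c
#include <stdbool.h>
#include <stdint.h>

/* Size of the basic exception frame: r0-r3, r12, lr, pc, xPSR. */
#define FRAME_BYTES 32u

/* Returns true if a basic exception frame starting at 'sp' is word-aligned
 * and lies entirely inside [ram_start, ram_end). The bounds are supplied by
 * the caller and are placeholders for the target's SRAM range. */
static inline bool frame_is_readable(uint32_t sp, uint32_t ram_start, uint32_t ram_end)
{
    if (sp & 3u) {
        return false;                     /* not word-aligned */
    }
    if (sp < ram_start || sp >= ram_end) {
        return false;                     /* outside RAM, e.g. a wrapped pointer */
    }
    return (ram_end - sp) >= FRAME_BYTES; /* whole frame must fit below ram_end */
}
```

In the handler, a failed check could be recorded as its own fault type so that a corrupted stack pointer is distinguishable from a handler that never ran.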

Thank you for your response!