I'm an embedded software engineer currently working on an nRF51822 target, which is an ARM Cortex-M0. I want to design a fault logger to track errors in our code even if no debugger is attached to the device at the moment of the fault.
I think the best solution is to write a custom HardFault handler that dumps the stack frame and writes it to the internal flash memory. To do that, I need to exit the HardFault handler, which runs in Handler mode with privilege, etc., and return to "normal" Thread mode execution. The reason is that flashing the device requires several interrupts to fire, which is not possible in Handler mode.
My first guess was to corrupt the stack frame, changing the stacked PC and LR so that the exception return jumps to a safe zone. In the safe zone I would just have to flash the saved stack frame and reset the chip.
Here is my pseudo code:
My HardFault handler:
volatile void* msp;
msp=(void*)(__get_MSP()+0x20); //in my example I know i'm using MSP
volatile uint32_t* r0=((uint32_t*)msp);
volatile uint32_t* r1=((uint32_t*)msp+1);
volatile uint32_t* r2=((uint32_t*)msp+2);
volatile uint32_t* r3=((uint32_t*)msp+3);
volatile uint32_t* r12=((uint32_t*)msp+4);
volatile uint32_t* lr=((uint32_t*)msp+5); /* Link register. */
volatile uint32_t* pc=((uint32_t*)msp+6); /* Program counter. */
volatile uint32_t* psr=((uint32_t*)msp+7);/* Program status register. */
//Save stack frame in structure
//Prepare jump to safezone
The problem is that if I single-step in the debugger, the code jumps to the safe zone in Thread mode as intended, but if I place a breakpoint in the safe zone and let the code run freely, the breakpoint is never reached.
What is happening? Is it possible to continue execution after a HardFault? Did I omit some action needed to recover? Is the device in Lockup?
Thanks in advance,
PS: Sorry for my bad English, I'm French
"I think the best solution is to write a custom HardFault handler that dumps the stack frame and writes it to the internal flash memory. To do that, I need to exit the HardFault handler, which runs in Handler mode with privilege, etc., and return to "normal" Thread mode execution. The reason is that flashing the device requires several interrupts to fire, which is not possible in Handler mode."
Actually, this goal cannot be achieved by a simple HardFault handler at the software layer alone. As you know, once a hard fault occurs, the CPU enters the fault handler. For some serious hardware errors, we cannot rely on other peripherals and their interrupts, such as the flash controller, still working, because the whole system may hang if the bus is locked up.
One possible solution is to design the software around the hardware features. When a hard fault occurs, the CPU triggers a silent reset and sets a special flag while the DDR or SRAM content is kept unchanged. After the reboot, the software checks the flag, which may be a special hardware register bit. If a previous hard fault is detected, the software uses a separate debug stack to analyze the DDR/SRAM region where the context of the offending bug was saved.
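As a minimal sketch of this flag-and-reset idea (not tested on an nRF51822): the handler copies the eight stacked words into a RAM record, marks it with a magic value and resets; the startup code then looks for the magic value. The names fault_record_save, fault_record_take and FAULT_MAGIC are made up for illustration. The sketch assumes that the record sits in a RAM section the startup code does not clear and that RAM content survives the reset, which should be checked against the chip's reference manual.

```c
#include <assert.h>   /* static_assert */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* The eight words a Cortex-M0 pushes on exception entry, in stacking order. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, psr;
} stacked_frame_t;

static_assert(sizeof(stacked_frame_t) == 32, "exception frame must be 8 words");

/* Arbitrary marker value (hypothetical) meaning "a fault record is present". */
#define FAULT_MAGIC 0xFA17C0DEu

typedef struct {
    uint32_t        magic;
    stacked_frame_t frame;
} fault_record_t;

/* Assumption: on the target this object is placed in a RAM section that the
 * startup code does not zero (the section name is toolchain specific).
 * Here it is a plain static so the sketch also compiles on a host. */
static fault_record_t g_fault_record;

/* To be called from the HardFault handler with a pointer to the stacked
 * frame. It only writes to RAM, so it needs no interrupts. */
void fault_record_save(const uint32_t *stacked)
{
    memcpy(&g_fault_record.frame, stacked, sizeof g_fault_record.frame);
    g_fault_record.magic = FAULT_MAGIC;   /* written last: marks the record valid */
}

/* To be called early after reset. Returns true and copies the record out if
 * a fault was saved before the reset; clears the flag so the record is
 * reported only once. */
bool fault_record_take(stacked_frame_t *out)
{
    if (g_fault_record.magic != FAULT_MAGIC) {
        return false;
    }
    *out = g_fault_record.frame;
    g_fault_record.magic = 0u;
    return true;
}
```

On the target, the HardFault handler would call fault_record_save() with the stacked frame pointer and then NVIC_SystemReset() (CMSIS). Early in main(), fault_record_take() tells whether there is a record to write to flash, this time from normal Thread mode with interrupts available.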
First of all, thanks for your reply.
Your solution seems good, and I'll look at it more closely in the future to implement it :)
A few questions remain:
In the case of a classic interrupt, let's say a timer interrupt, can we corrupt the stack frame as described in my first post to jump to another section?
Could you explain why the corruption works with the debugger connected and not when it is disconnected?
As we do not know your code in detail, I can only give some generic hints.
For the timer interrupt example, you can intentionally corrupt the current stack frame and jump to another section if you want. However, the timer interrupt fires regularly: if a second timer interrupt enters the handler while the stack frame is still in the state left by the first one, it may crash. Of course you can do the stack frame housekeeping yourself, but it is a tough task, and this kind of code is difficult to port or maintain later.
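To make the housekeeping concrete, here is a minimal sketch of rewriting a stacked frame so that the exception return resumes somewhere else. It is an illustration under assumptions, not a tested recovery routine: frame_redirect and the FRAME_* indices are hypothetical names, the target is assumed never to return, and the handler is assumed to return to Thread mode. The two details that are easy to miss are that the stacked PC must be halfword aligned (a Thumb function address has bit 0 set) and that the stacked xPSR must have the Thumb bit (bit 24) set.

```c
#include <assert.h>   /* static_assert */
#include <stdint.h>

/* Word offsets inside the frame pushed on exception entry. */
enum {
    FRAME_R0, FRAME_R1, FRAME_R2, FRAME_R3,
    FRAME_R12, FRAME_LR, FRAME_PC, FRAME_XPSR
};

static_assert(FRAME_XPSR == 7, "exception frame must be 8 words");

#define XPSR_THUMB_BIT (1u << 24)

/* Rewrite the stacked frame so that the exception return resumes at
 * target_addr instead of at the interrupted instruction. */
void frame_redirect(uint32_t *frame, uint32_t target_addr)
{
    /* The stacked PC must be halfword aligned, so drop the Thumb bit
     * carried by a function address. */
    frame[FRAME_PC] = target_addr & ~1u;

    /* Choice made for this sketch: an invalid return address, so that a
     * stray return from the target faults instead of running old code. */
    frame[FRAME_LR] = 0xFFFFFFFFu;

    /* Thumb state only: flags and the stacked exception number are
     * cleared, because the target starts fresh in Thread mode. */
    frame[FRAME_XPSR] = XPSR_THUMB_BIT;
}
```

On the target, the call would look like frame_redirect(frame, (uint32_t)&safe_zone), where safe_zone is whatever never-returning function you jump to. R0 to R3 and R12 are left untouched, so the target must not rely on them.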
Running the code with or without a debugger differs in whether the CoreSight debug mode is available. If your debugger is connected, the breakpoint can be a hardware breakpoint, which relies on the debug hardware and its special memory-mapped registers. If your debugger is absent, a breakpoint can only be a software breakpoint, and a software breakpoint generally cannot be placed in Flash memory on embedded systems.
Here is the Cortex-M0 TRM: http://infocenter.arm.com/help/topic/com.arm.doc.ddi0432c/DDI0432C_cortex_m0_r0p0_trm.pdf
For more details about HW vs SW breakpoints: https://stackoverflow.com/questions/8878716/what-is-the-difference-between-hardware-and-software-breakpoints
Hope this helps.