Hello,
I am trying to understand how the frame pointer works because I want to unwind the stack in a HardFault handler.
I am looking at a disassembly of code that runs correctly on an Atmel ATSAMV71Q21 (Cortex-M7). It was compiled with GCC in the Atmel Studio 7 IDE. To get a frame pointer, I compiled with -fno-omit-frame-pointer -mtpcs-frame -mtpcs-leaf-frame. It looks like GCC uses register r7 as the frame pointer in Thumb-2 mode.
The function prologue has a push, a sub and an add. I would like to confirm whether the Cortex-M7's superscalar six-stage pipeline stalls because the sub at code address 0x00401dce depends on SP, which is updated by the push at 0x00401dcc.
Why doesn't the frame pointer point to something more predictable, like the previously pushed r7 (the caller's frame pointer) or the value SP had before entering the function?
```
int I2cHW::endTransmission(){
  401dcc: b5b0    push  {r4, r5, r7, lr}  /* SAVE REGISTERS. The stack moves down by 4*4 = 16 bytes. r7 is the frame pointer, LR is the link register */
  401dce: b086    sub   sp, #24           /* allocate 24 bytes on stack. Lower stack pointer by 24 bytes. */
  401dd0: af04    add   r7, sp, #16       /* frame pointer = stack pointer + 16. Why??? */
  .... BODY REMOVED ....
    if (isAckValid){
  401edc: 7a63    ldrb  r3, [r4, #9]
  401ede: b11b    cbz   r3, 401ee8 <_ZN5I2cHW15endTransmissionEv+0x11c>
        return 0;
  401ee0: 2000    movs  r0, #0
    }else{
        return 1;
    }
}
  401ee2: 3708    adds  r7, #8            /* move frame pointer by 8 */
  401ee4: 46bd    mov   sp, r7            /* stack pointer = frame pointer */
  401ee6: bdb0    pop   {r4, r5, r7, pc}  /* restore registers. Move stack up by 4*4 = 16 bytes */
        return 1;
  401ee8: 2001    movs  r0, #1
  401eea: e7fa    b.n   401ee2 <_ZN5I2cHW15endTransmissionEv+0x116>
  401eec: 2044f31c  .word 0x2044f31c
  401ef0: 00451e94  .word 0x00451e94
  401ef4: 00451de4  .word 0x00451de4
  401ef8: 00451fac  .word 0x00451fac
  401efc: 00451e5c  .word 0x00451e5c
  401f00: 0044e5ed  .word 0x0044e5ed
  401f04: 00447a71  .word 0x00447a71
  401f08: 00440f71  .word 0x00440f71
  401f0c: 00451f24  .word 0x00451f24
  401f10: 00451fc4  .word 0x00451fc4
  401f14: 00405421  .word 0x00405421
  401f18: 00452004  .word 0x00452004
  401f1c: 00451fec  .word 0x00451fec
```
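To make the prologue/epilogue arithmetic in the listing above concrete, here is a small host-side model: plain integer arithmetic, no target code. The entry SP value in the test is arbitrary, and the names FrameModel and model_frame are illustrative only. It shows where r7 ends up (8 bytes below the saved registers, i.e. 24 bytes below the entry SP) and that "adds r7, #8; mov sp, r7" restores SP to its post-push value; it does not explain why GCC chose that particular offset.

```c
#include <assert.h>
#include <stdint.h>

/* SP and r7 values at each step of endTransmission()'s prologue/epilogue. */
typedef struct {
    uint32_t sp_after_push;  /* after push {r4, r5, r7, lr}   */
    uint32_t sp_after_sub;   /* after sub sp, #24             */
    uint32_t r7;             /* after add r7, sp, #16         */
    uint32_t sp_before_pop;  /* after adds r7, #8; mov sp, r7 */
    uint32_t sp_after_pop;   /* after pop {r4, r5, r7, pc}    */
} FrameModel;

static FrameModel model_frame(uint32_t sp_on_entry)
{
    FrameModel m;
    m.sp_after_push = sp_on_entry - 4u * 4u;      /* 4 registers x 4 bytes each */
    m.sp_after_sub  = m.sp_after_push - 24u;      /* 24 bytes of locals         */
    m.r7            = m.sp_after_sub + 16u;       /* frame pointer              */
    m.sp_before_pop = m.r7 + 8u;                  /* epilogue undoes the offset */
    m.sp_after_pop  = m.sp_before_pop + 4u * 4u;  /* pop the 4 saved registers  */
    return m;
}
```

For example, with SP = 0x20400100 on entry, r7 becomes 0x204000E8 and the epilogue's SP before the pop is 0x204000F0, exactly the value right after the push.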
When you say "unwind the stack in a HF handler", is this to infer just the function call stack leading to the fault, or do you want more than that, i.e. the values of all local variables in all stack frames leading to the fault?
If the former, I have done something similar. I extract the stack in use, from the LR register. I then search memory from that stack top up to some upper stack address, perhaps the top of RAM. I skip the first 32 bytes, since those were stacked by the processor on entry to the fault handler (I'm working on a Cortex-M3). I then just compare each 32-bit word I see against what COULD be pushed LR values. Such a value must lie between the known start and end of the .text section, and must have bit 0 set (i.e. it is an odd value). I stop after some number of matches, say 8, since 8 is quite a deep call stack, assuming no recursion. Yes, I may get some false positives, but those are easily weeded out once I consult the .map file for the running binary.
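A minimal host-testable sketch of the scan described above, not the code from this thread. It assumes the stacked exception frame is the basic 8-word (32-byte) frame; on a Cortex-M7 with the FPU in use, the frame can be the larger extended frame. The .text bounds are passed in as parameters; on a target they would typically come from linker-script symbols, and the function name scan_for_return_addresses is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Heuristic call-stack scan: walk the words of the active stack and keep any
   that look like a pushed LR, i.e. an odd value (Thumb bit set) inside
   [text_start, text_end). Stops after max_out matches. Returns the count. */
static size_t scan_for_return_addresses(const uint32_t *stack_start,
                                        const uint32_t *stack_end,
                                        uint32_t text_start,
                                        uint32_t text_end,
                                        uint32_t *out,
                                        size_t max_out)
{
    size_t found = 0;

    assert(out != NULL || max_out == 0);
    /* Skip R0-R3, R12, LR, PC, xPSR pushed by the hardware on exception entry. */
    if (stack_end - stack_start <= 8) {
        return 0;
    }
    for (const uint32_t *p = stack_start + 8; p < stack_end && found < max_out; ++p) {
        uint32_t w = *p;
        if ((w & 1u) != 0u && w >= text_start && w < text_end) {
            out[found++] = w;  /* candidate return address; may be a false positive */
        }
    }
    return found;
}
```

As in the description, the results are only candidates: a stale return address or a constant that happens to look like a code address will also match, so the list still has to be checked against the .map file.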
I have some code I can share if that helps.
Thanks for your response.
> When you say "unwind the stack in a HF handler", is this in order to infer just the function call stack leading to the fault,
That would be the most useful starting point. With nested function calls, for example foo() calls bar(), which calls scaleArray(), and there is a fault in scaleArray(), I would like the handler to report at least that scaleArray() caused the fault and trace back to bar(). bar() could have passed bad parameters, while other functions call scaleArray() with good ones. The real bug would then be in bar().
>I have some code I can share if that helps.
Your method sounds interesting and gives me some ideas. Could you share the code, please? I understand most of what you describe; I just need a little help with this part:
>I extract the stack in use, from the LR register.
Here is the code I have so far. In a larger program with interrupts, my HardFault_Handler_c doesn't report the right location: sometimes it is the instruction right after the call in the caller of the caller of the crashing function, other times it is the crashing function itself.
I am still far from the objective of computing the list of nested calls.
```c
#include <atmel_start.h>
#include <stdint.h>
#include <stdio.h>

int getR7(void);  /* defined below; declared here so testHardFault() can call it */

uint32_t testHardFault(uint32_t a, uint32_t b){
    uint32_t c;
    volatile uint32_t reg7;
    c = a + b;
    int32_t *p;
    reg7 = getR7();
    p = (int32_t *) 0x0badcafe;
    *p = 0x0;  /* deliberate write to an invalid address */
    return c;
}

int sub1(int a, int b){
    volatile int buffer[5];
    int c = testHardFault(a, b);
    return c;
}

int add1(int a, int b){
    int c = a + b;
    c = sub1(c, b);
    return c;
}

__attribute__((naked)) int getR7(void)
{
    asm volatile (
        "push {lr}\n\t"
        "MOV R0, R7\n\t"
        "pop {pc}\n\t"
    );
}

int main(void)
{
    /* Initializes MCU, drivers and middleware */
    atmel_start_init();
    volatile int r;
    r = add1(10, 20);
    r = add1(30, 40);
    /* Replace with your application code */
    while (1) {
    }
}

typedef struct {
    uint32_t r0;
    uint32_t r1;
    uint32_t r2;
    uint32_t r3;
    uint32_t r12;
    uint32_t LR;
    uint32_t PC;
    uint32_t xPSR;
} StackContents_t;

void HardFault_Handler_c(int *pStackDump){
    volatile unsigned long _MSP;  /* MAIN STACK POINTER */
    _MSP = __get_MSP();
    /* register 3 has to be 0x0badcafe */
    volatile StackContents_t *stackContents = (volatile StackContents_t *) pStackDump;
    volatile unsigned long _CFSR, _HFSR, _DFSR, _AFSR, _BFAR, _MMAR;  /* unsigned long is uint32_t */

    // Configurable Fault Status Register: MMSR, BFSR and UFSR (SCB->CFSR)
    _CFSR = (*((volatile unsigned long *)(0xE000ED28)));
    // Hard Fault Status Register (SCB->HFSR)
    _HFSR = (*((volatile unsigned long *)(0xE000ED2C)));
    // Debug Fault Status Register
    _DFSR = (*((volatile unsigned long *)(0xE000ED30)));
    // Auxiliary Fault Status Register
    _AFSR = (*((volatile unsigned long *)(0xE000ED3C)));
    // Read the Fault Address Registers. These may not contain valid values.
    // Check BFARVALID/MMARVALID to see if they are valid values.
    // MemManage Fault Address Register
    _MMAR = (*((volatile unsigned long *)(0xE000ED34)));
    // Bus Fault Address Register
    _BFAR = (*((volatile unsigned long *)(0xE000ED38)));

    char hardFaultBuffer[160];  /* consumes stack; each snprintf below overwrites it */
    snprintf(hardFaultBuffer, sizeof hardFaultBuffer,
             "MSP 0x%08lx CFSR 0x%08lx HFSR 0x%08lx\nDFSR 0x%08lx AFSR 0x%08lx MMAR 0x%08lx\nBFAR 0x%08lx\n",
             _MSP, _CFSR, _HFSR, _DFSR, _AFSR, _MMAR, _BFAR);
    /* SOMETIMES TRUE: The LR is the location it crashed. The PC is the location the function returns to if it were successful. */
    snprintf(hardFaultBuffer, sizeof hardFaultBuffer,
             "R0 0x%08lx R1 0x%08lx R2 0x%08lx\nR3 0x%08lx R12 0x%08lx LR 0x%08lx\nPC 0x%08lx xPSR 0x%08lx\n",
             stackContents->r0, stackContents->r1, stackContents->r2, stackContents->r3,
             stackContents->r12, stackContents->LR, stackContents->PC, stackContents->xPSR);
    snprintf(hardFaultBuffer, sizeof hardFaultBuffer, "SOMETIMES HardFault at 0x%08lx\n", stackContents->LR);
    snprintf(hardFaultBuffer, sizeof hardFaultBuffer,
             "System handler and control state SHCSR 0x%08lx \nInterrupt Control and State ICSR 0x%08lx\n",
             SCB->SHCSR, SCB->ICSR);
    while (1) {
    }
}

/* Put the value of the stack pointer as an argument to HardFault_Handler_c().
   The __attribute__((naked)) ensures that there is no assembly code generated
   at all besides what is written in the asm volatile. */
__attribute__((naked)) void HardFault_Handler(void)
{
    asm volatile (
        "MOV R0, SP\n\t"
        "b HardFault_Handler_c\n\t"
    );
}
```
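On the question of extracting the stack in use from LR: on exception entry, LR holds an EXC_RETURN value, and on ARMv7-M (Cortex-M3 and Cortex-M7 included) bit 2 of it says whether the frame was pushed on the MSP (0) or the PSP (1), while bit 4 being 0 means an extended frame with FP registers. The HardFault_Handler stub above always passes SP, which in handler mode is the MSP, so it is only correct when the faulting code ran on the MSP. The sketch below is a common variant, not code from this thread: two helpers (hypothetical names) that decode EXC_RETURN, plus an entry stub, compiled only for ARMv7-M targets, that would replace the HardFault_Handler above.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero if the exception frame was pushed on the process stack (PSP):
   EXC_RETURN bit 2 = 1 -> PSP, 0 -> MSP. */
static int frame_is_on_psp(uint32_t exc_return)
{
    return (exc_return & (1u << 2)) != 0u;
}

/* Nonzero for the basic 8-word frame; EXC_RETURN bit 4 = 0 means an
   extended frame that also holds S0-S15 and FPSCR. */
static int frame_is_basic(uint32_t exc_return)
{
    return (exc_return & (1u << 4)) != 0u;
}

#if defined(__ARM_ARCH_7M__) || defined(__ARM_ARCH_7EM__)
/* Entry stub: pass whichever stack pointer holds the exception frame in R0,
   then branch to the C handler (HardFault_Handler_c from the post above). */
__attribute__((naked)) void HardFault_Handler(void)
{
    __asm volatile(
        "tst   lr, #4               \n"  /* test EXC_RETURN bit 2 */
        "ite   eq                   \n"
        "mrseq r0, msp              \n"  /* bit 2 == 0: frame on MSP */
        "mrsne r0, psp              \n"  /* bit 2 == 1: frame on PSP */
        "b     HardFault_Handler_c  \n");
}
#endif
```

Without an RTOS, everything usually runs on the MSP and both stubs behave the same; the difference only matters once thread code uses the PSP.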
I'll push my code up to Github and post a link here. Might take a day or so...
Cool, thanks!
Hello @tobermory,
I ended up writing a call stack unwind function mostly the way you described it here. Here is my code. It doesn't look at R7 and doesn't care about stack frames.
P.S. This thread seems to have been affected by some edits and deletions. For a while, only some of the replies were visible.
```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Passes the current SP as the third argument (R2) of callStackUnwindIntoBuffer_c(). */
__attribute__((naked)) void callStackUnwindIntoBuffer(char *callstackunwindbuffer, int callstackunwindbufferLength)
{
    asm volatile (
        "MOV R2, SP\n\t"
        "b callStackUnwindIntoBuffer_c\n\t"
    );
}

void callStackUnwindIntoBuffer_c(char *callstackunwindbuffer, int callstackunwindbufferLength, uint32_t *pStack)
{
    extern char _estack;                                  /* linker-script symbol: top of stack */
    volatile uint32_t *const stackEnd = (volatile uint32_t *) &_estack;
    volatile uint32_t *locationOfLR = pStack;             /* was: (uint32_t *) __get_MSP() */
    char *pDest = callstackunwindbuffer;
    int spaceLeft = callstackunwindbufferLength;
    size_t length;
    const int qtyCallStackLevels = 14;
    const int ignoredLevels = 0;
    char localBuffer[40];
    int i = 0;

    while (i < qtyCallStackLevels) {
        /* Linear search for a word that looks like a pushed LR: in the flash range
           0x00400000..0x004CFFFF with bit 0 (Thumb) set. The bound check comes first
           so the loop never reads at or past _estack. The bit-0 test needs the extra
           parentheses: "x & 1 == 0" parses as "x & (1 == 0)", which is always 0. */
        while ((locationOfLR < stackEnd) &&
               ((((*locationOfLR) & 0xFFFF0000u) < 0x00400000u) ||
                (((*locationOfLR) & 0xFFFF0000u) > 0x004C0000u) ||
                (((*locationOfLR) & 1u) == 0u))) {
            locationOfLR++;
        }
        if (locationOfLR >= stackEnd) {
            break;  /* reached the top of the stack */
        }
        if (i >= ignoredLevels) {
            snprintf(localBuffer, sizeof localBuffer, "%08lx: 0x%08lx\r\n",
                     (unsigned long) (uintptr_t) locationOfLR, (unsigned long) *locationOfLR);
            length = strlen(localBuffer);
            if ((int) length < spaceLeft) {
                snprintf(pDest, spaceLeft, "%s", localBuffer);
                spaceLeft -= (int) length;
                pDest += length;
            }
        }
        i++;
        locationOfLR++;
    }

    snprintf(localBuffer, sizeof localBuffer, "END\n");
    length = strlen(localBuffer);
    if ((int) length < spaceLeft) {
        snprintf(pDest, spaceLeft, "%s", localBuffer);
        spaceLeft -= (int) length;
        pDest += length;
    }
}
```
For completeness, here is my code that addresses fault dumps and inferred call stacks:
https://github.com/tobermory/faultHandling-cortex-m.git
Thanks. I had problems logging in, so this is the soonest I could respond.
If you compare your code above with mine (see the GitHub link), you'll see that you do the fault dump data formatting in the fault handler, i.e. after the fault has already occurred. I do it ahead of time, and use minimal code to fill in the register value 'holes'. I bypass sprintf entirely, preferring to format values as hex by hand. I was nervous about calling into arbitrary C library routines once a fault had happened. I think the chance of a lockup (a fault in the fault handler) increases. On my board, a lockup defaults to a reset, and the fault capture would be lost entirely.
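The "format ahead of time, fill in the holes in the handler" idea can be sketched as follows. This is a minimal illustration, not the code from the repository: the report layout and the names put_hex32, fault_report and fill_fault_report are hypothetical. All the fault path does is a loop of shifts and table lookups, with no C library calls and no stack-hungry formatting.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Writes value as exactly 8 uppercase hex digits at dst (no terminator). */
static void put_hex32(char *dst, uint32_t value)
{
    static const char digits[] = "0123456789ABCDEF";
    for (int i = 7; i >= 0; --i) {
        dst[i] = digits[value & 0xFu];  /* lowest nibble goes rightmost */
        value >>= 4;
    }
}

/* Pre-formatted report; each "........" is a hole for one register value. */
static char fault_report[] = "PC ........ LR ........\n";
enum { PC_HOLE = 3, LR_HOLE = 15 };  /* offsets of the holes in fault_report */

/* Called from the fault handler with values taken from the stacked frame. */
static void fill_fault_report(uint32_t pc, uint32_t lr)
{
    put_hex32(&fault_report[PC_HOLE], pc);
    put_hex32(&fault_report[LR_HOLE], lr);
}
```

Because the template lives in static storage and the digits are written in place, the handler never needs a format buffer on a possibly corrupted stack.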
Hello tobermory,
>I bypass sprintf entirely, preferring to hex format values by hand.
sprintf is a function I should avoid in embedded code. It is not MISRA compliant, and it is huge. I had a coworker who used an embedded printf. It was corrupting memory because the implementation used a static buffer of fixed length, and he needed more than that length. He spent a lot of time finding out what went wrong.
In a fault handler, sprintf is even less suitable.
Look at the first hit for a Google search for 'Small printf source code'. I have not tried it, but it could be interesting.
>I think the chance of a lockup (fault in fault handler) increases.
Agreed.