cortex m7 frame pointer in prologue

Hello,

I am trying to understand how the frame pointer works because I want to unwind the stack in a HardFault handler.

I am looking at a disassembly of code that runs perfectly on an Atmel ATSAMV71Q21 (Cortex-M7). It was compiled with GCC in the Atmel Studio 7 IDE. To get a frame pointer, I compiled with -fno-omit-frame-pointer -mtpcs-frame -mtpcs-leaf-frame. It looks like GCC uses register r7 as the frame pointer in Thumb-2 mode.

The function prologue has a push, a sub and an add. I would like to confirm whether the Cortex-M7's superscalar 6-stage pipeline stalls at the instruction at code address 0x00401dce because of its dependency on the SP written by the push at 0x00401dcc.

Why doesn't the frame pointer point to something more predictable like the previously pushed r7 frame pointer or the previous SP value before entering the function? 

int I2cHW::endTransmission(){
  401dcc:	b5b0      	push	{r4, r5, r7, lr}       /* SAVE REGISTERS. The stack moves down by 4*4 = 16 bytes (four 32-bit registers). The frame pointer is r7, the link register is LR */
  401dce:	b086      	sub	sp, #24			/*allocate 24 bytes on stack. Lower stack pointer by 24 bytes.*/
  401dd0:	af04      	add	r7, sp, #16      /* frame pointer = stack pointer + 16. Why???*/
  
  
  
  .... BODY REMOVED ....
  
      if (isAckValid){
  401edc:	7a63      	ldrb	r3, [r4, #9]
  401ede:	b11b      	cbz	r3, 401ee8 <_ZN5I2cHW15endTransmissionEv+0x11c>
        return 0;
  401ee0:	2000      	movs	r0, #0
    }else{
        return 1;
    }
}
  401ee2:	3708      	adds	r7, #8   /* move frame pointer by 8*/
  401ee4:	46bd      	mov	sp, r7		/* stack pointer = frame pointer */
  401ee6:	bdb0      	pop	{r4, r5, r7, pc}  /* restore registers. The stack moves up by 4*4 = 16 bytes */
        return 1;
  401ee8:	2001      	movs	r0, #1
  401eea:	e7fa      	b.n	401ee2 <_ZN5I2cHW15endTransmissionEv+0x116>
  401eec:	2044f31c 	.word	0x2044f31c
  401ef0:	00451e94 	.word	0x00451e94
  401ef4:	00451de4 	.word	0x00451de4
  401ef8:	00451fac 	.word	0x00451fac
  401efc:	00451e5c 	.word	0x00451e5c
  401f00:	0044e5ed 	.word	0x0044e5ed
  401f04:	00447a71 	.word	0x00447a71
  401f08:	00440f71 	.word	0x00440f71
  401f0c:	00451f24 	.word	0x00451f24
  401f10:	00451fc4 	.word	0x00451fc4
  401f14:	00405421 	.word	0x00405421
  401f18:	00452004 	.word	0x00452004
  401f1c:	00451fec 	.word	0x00451fec
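
The SP and r7 arithmetic in the listing above can be traced step by step. The following is only a sketch of that arithmetic, not an explanation confirmed anywhere in this thread: `frame_trace_t` and `trace_frame` are hypothetical names, and the entry SP is whatever value the caller had. It shows that `r7 + 8` lands exactly on the block pushed in the prologue, so in this particular function the saved r7 sits at `r7 + 16` and the saved LR at `r7 + 20`.

```c
#include <stdint.h>

/* SP and r7 values at each step of the prologue/epilogue shown above.
 * Hypothetical helper type, introduced only for illustration. */
typedef struct {
    uint32_t sp_after_push;  /* after push {r4, r5, r7, lr}          */
    uint32_t sp_after_sub;   /* after sub  sp, #24                    */
    uint32_t r7;             /* after add  r7, sp, #16                */
    uint32_t saved_r7_addr;  /* where the caller's r7 was stored      */
    uint32_t saved_lr_addr;  /* where the return address was stored   */
    uint32_t sp_before_pop;  /* after adds r7, #8 ; mov sp, r7        */
    uint32_t sp_after_pop;   /* after pop  {r4, r5, r7, pc}           */
} frame_trace_t;

frame_trace_t trace_frame(uint32_t sp_on_entry)
{
    frame_trace_t t;

    /* push stores the lowest-numbered register at the lowest address:
     * r4 @ sp+0, r5 @ sp+4, r7 @ sp+8, lr @ sp+12 */
    t.sp_after_push = sp_on_entry - 4u * 4u;
    t.saved_r7_addr = t.sp_after_push + 8u;
    t.saved_lr_addr = t.sp_after_push + 12u;

    t.sp_after_sub  = t.sp_after_push - 24u;
    t.r7            = t.sp_after_sub + 16u;

    /* epilogue: r7 + 8 undoes the 24-byte allocation */
    t.sp_before_pop = t.r7 + 8u;
    t.sp_after_pop  = t.sp_before_pop + 4u * 4u;
    return t;
}
```

The offsets 16 and 8 are specific to this function's frame, so a generic unwinder cannot assume them for other functions.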

  • When you say "unwind the stack in a HF handler", is this in order to infer just the function call stack leading to the fault, or do you want more than that, i.e. values of all local variables in all stack frames leading to the fault?

    If the former, I have done something similar. I extract the stack in use, from the LR register. I then search memory from that stack top up to some upper stack address, perhaps the top of RAM. I skip the first 32 bytes, since those were stacked by the fault handler entry itself (I'm working on a Cortex-M3). I then compare each 32-bit word I see against what COULD be a pushed LR value. Such a value must lie between the known start and end of the .text section, and must have bit 0 set (i.e. be an odd value). I stop at some count of matches, say 8, since 8 is quite a deep call stack, assuming no recursion. Yes, I may get some false positives, but those are easily weeded out once I consult the .map file for the running binary.

    I have some code I can share if that helps.
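
The scan described above might look roughly like the sketch below. It is not the code offered in this reply: `looks_like_lr` and `scan_for_lr` are hypothetical names, and the .text bounds are passed in as plain numbers (on a real target they would usually come from linker symbols). The stack is treated as an array of 32-bit words, with `skip_words = 8` covering the 32 bytes stacked on fault entry.

```c
#include <stddef.h>
#include <stdint.h>

/* A word is a candidate return address if it is odd (Thumb bit set)
 * and lies inside [text_start, text_end). */
static int looks_like_lr(uint32_t word, uint32_t text_start, uint32_t text_end)
{
    return (word & 1u) != 0u && word >= text_start && word < text_end;
}

/* Scan stack[skip_words .. n_words) and copy up to max_out candidate
 * return addresses into out[]. Returns the number of candidates found. */
size_t scan_for_lr(const uint32_t *stack, size_t n_words, size_t skip_words,
                   uint32_t text_start, uint32_t text_end,
                   uint32_t *out, size_t max_out)
{
    size_t found = 0;

    for (size_t i = skip_words; i < n_words && found < max_out; i++) {
        if (looks_like_lr(stack[i], text_start, text_end)) {
            out[found++] = stack[i];
        }
    }
    return found;
}
```

False positives are still possible, for example a function pointer stored in a local variable, which is why the results need checking against the .map file.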

  • Hello,

    Thanks for your response.

    > When you say "unwind the stack in a HF handler", is this in order to infer just the function call stack leading to the fault,

    That would be the most useful starting point. In nested function calls, like the example foo() calls bar() that calls scaleArray() and there's a fault in scaleArray(), I would like the handler to report at least that scaleArray() caused the fault and track back to bar(). The function bar() could have passed bad parameters. Other functions could have called scaleArray() without putting bad parameters. The real bug is in bar().

    >I have some code I can share if that helps.

    Your method sounds interesting. It is making me think about ideas. Could you share the code please? I understand most of what you are saying. I need a little help with just this part:

    >I extract the stack in use, from the LR register. 

    Here is the code I have come up with so far. In a larger program with interrupts, my HardFault_Handler_c doesn't give the right location. Sometimes it reports the instruction right after the call in the caller of the caller of the function that crashed; other times it reports the function that crashed.

    I am still far from the objective of computing the list of nested calls.

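    #include <atmel_start.h>
    #include <stdio.h>   /* snprintf */
    
    int getR7(void);     /* defined further down; declared here so testHardFault() can call it */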
    
    
    uint32_t testHardFault(uint32_t a, uint32_t b){
    	uint32_t c ;
    	volatile uint32_t reg7;
    	c=a+b;
    	int32_t *p ;
    	reg7 = getR7();
    	p= (int32_t *) 0x0badcafe;
    	*p  = 0x0;
    	return c;
    	
    }
    
    int sub1(int a, int b){
    	volatile int buffer[5];
    	int c = testHardFault(a,b);
    	return c;
    	
    }
    
    int add1(int a, int b){
    	int c = a +b;
    	c = sub1(c,b);
    	return c;
    	
    }
    
    __attribute__((naked)) int getR7(void)
    {
    	asm volatile (
    	"push   {lr}\n\t"
    	"MOV R0, R7\n\t"
    	"pop    { pc}\n\t"
    	);
    }
    
    
    int main(void)
    {
    	/* Initializes MCU, drivers and middleware */
    	atmel_start_init();
    	volatile int r;
    	r= add1(10,20);
    	r= add1(30,40);
    
    	/* Replace with your application code */
    	while (1) {
    	}
    }
    
    
    
    
    typedef struct {
    	uint32_t r0;
    	uint32_t r1;
    	uint32_t r2;
    	uint32_t r3;
    	uint32_t r12;
    	uint32_t LR;
    	uint32_t PC;
    	uint32_t xPSR;
    		
    } StackContents_t;
    
    
      
    
    void HardFault_Handler_c ( int *pStackDump  ){
    	volatile unsigned long _MSP ;
    	 
    	/*MAIN STACK POINTER*/
    	_MSP =__get_MSP()  ; 
    	 /*register 3 has to be 0x0badcafe*/
    	
    	volatile StackContents_t *stackContents = (volatile StackContents_t *) pStackDump;    
    	
    	
        volatile unsigned long _CFSR ;
        volatile unsigned long _HFSR ;
        volatile unsigned long _DFSR ;
        volatile unsigned long _AFSR ;
        volatile unsigned long _BFAR ;
        volatile unsigned long _MMAR ;
    	/*unsigned long is uint32_t*/
    
    	
        // Configurable Fault Status Register
        // Consists of MMSR, BFSR and UFSR   ( think it is SCB->CFSR )
        _CFSR = (*((volatile unsigned long *)(0xE000ED28))) ;   
    
                                                                                       
        // Hard Fault Status Register (think it is SCB->HFSR)
        _HFSR = (*((volatile unsigned long *)(0xE000ED2C))) ;
    
        // Debug Fault Status Register
        _DFSR = (*((volatile unsigned long *)(0xE000ED30))) ;
    
        // Auxiliary Fault Status Register
        _AFSR = (*((volatile unsigned long *)(0xE000ED3C))) ;
    
        // Read the Fault Address Registers. These may not contain valid values.
        // Check BFARVALID/MMARVALID to see if they are valid values
        // MemManage Fault Address Register
        _MMAR = (*((volatile unsigned long *)(0xE000ED34))) ;
        // Bus Fault Address Register
        _BFAR = (*((volatile unsigned long *)(0xE000ED38))) ;
    
    	 
    	char hardFaultBuffer[160]; /*consumes stack*/
    	
    	
    
    		
    	snprintf(hardFaultBuffer, sizeof hardFaultBuffer, "MSP   0x%08lx  CFSR  0x%08lx  HFSR  0x%08lx\nDFSR  0x%08lx  AFSR  0x%08lx  MMAR  0x%08lx\nBFAR  0x%08lx\n",
    	_MSP,_CFSR,_HFSR,_DFSR,_AFSR,_MMAR,_BFAR  );
    
    	/*SOMETIMES TRUE: The LR is the location it crashed. The PC is the location the function returns to if it were successful.*/
    	snprintf(hardFaultBuffer, sizeof hardFaultBuffer, "R0    0x%08lx  R1    0x%08lx  R2    0x%08lx\nR3    0x%08lx  R12   0x%08lx  LR    0x%08lx\nPC    0x%08lx  xPSR  0x%08lx\n",
    	stackContents->r0,
    	stackContents->r1,
    	stackContents->r2,
    	stackContents->r3,
    	stackContents->r12,
    	stackContents->LR,
    	stackContents->PC,
    	stackContents->xPSR  );
    
    	snprintf(hardFaultBuffer, sizeof hardFaultBuffer, "SOMETIMES HardFault at  0x%08lx\n",
    	stackContents->LR );
    	
    	snprintf(hardFaultBuffer, sizeof hardFaultBuffer, "System handler and control state SHCSR 0x%08lx \nInterrupt Control and State ICSR 0x%08lx\n",
    	SCB->SHCSR,
    	SCB->ICSR );
    
    
    	while (1) {
    	}
    
    }
    
    /*
    __attribute__((naked)) int getR7(void)
    {
    	asm volatile (
    	"push   {lr}\n\t"
    	"MOV R0, R7\n\t"
    	"pop    { pc}\n\t"
    	);
    }
    */
    
    
    /*
    Put the value of the stack pointer as an argument to HardFault_Handler_c().
    The __attribute__((naked)) ensures that there is no assembly code generated at all besides what is written in the asm volatile.
    */
    __attribute__((naked)) void HardFault_Handler ( void ){
    	
    {
    	asm volatile (
    	
    	"MOV R0, SP\n\t"
    	"b HardFault_Handler_c\n\t"
    	
    	);
    }
    }
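
On the "stack in use from the LR register" question: on exception entry LR holds an EXC_RETURN value, and bit 2 of it indicates whether the registers were stacked on the process stack (PSP) or the main stack (MSP). The sketch below shows only that selection as plain C. `faulting_stack_pointer` is a hypothetical name, and on a real target the test is normally done in the naked handler before any C code runs. The handler above always passes SP, which is the MSP in handler mode, so it would pick the wrong frame if the faulting code ran on the PSP.

```c
#include <stdint.h>

/* Bit 2 of EXC_RETURN: 0 = frame was stacked on MSP, 1 = on PSP. */
#define EXC_RETURN_SPSEL (1u << 2)

/* Return the stack pointer that holds the exception frame
 * (r0, r1, r2, r3, r12, LR, PC, xPSR), given the LR value seen on
 * handler entry and the current MSP and PSP values. */
uint32_t faulting_stack_pointer(uint32_t exc_return, uint32_t msp, uint32_t psp)
{
    return (exc_return & EXC_RETURN_SPSEL) ? psp : msp;
}
```

Within that frame, the stacked PC is usually the address to report for a precise fault, while the stacked LR is the return address of the interrupted function, which may explain why printing LR sometimes names the caller instead.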

  • I'll push my code up to Github and post a link here.  Might take a day or so...

  • Hello @tobermory,

    I ended up writing a call stack unwind function mostly in the way you described it here. Here is my code. It doesn't look at R7 and doesn't care about stack frames.

    P.S. This thread seems to have gone through some editing and deletions. For a while, only a portion of the replies were visible.


     
     __attribute__((naked)) void callStackUnwindIntoBuffer ( char *callstackunwindbuffer , int callstackunwindbufferLength){
        

         asm volatile (
        
     
         "MOV R2, SP\n\t"
         "b callStackUnwindIntoBuffer_c\n\t"
        
         );

     }
     
    void callStackUnwindIntoBuffer_c( char *callstackunwindbuffer , int callstackunwindbufferLength, uint32_t *pStack ){

        volatile uint32_t *locationOfLR;
        char *pDest = callstackunwindbuffer;
        int spaceLeft = callstackunwindbufferLength;
        uint32_t length;
        
        const uint32_t qtyCallStackLevels = 14;
        
        const uint32_t ignoredLevels = 0;
        /*locationOfLR =  (uint32_t *) __get_MSP();*/
        locationOfLR = pStack;
        
        char localBuffer[40];
        extern char _sstack, _estack;
        
        int i=0;
        
        while ( i<qtyCallStackLevels ){

            /*linear search for a valid LR addresses */
            
            while(
                    (
                        (( (*locationOfLR)  & 0xFFFF0000 )< 0x00400000)         
                        ||
                        (( (*locationOfLR)  & 0xFFFF0000 )> 0x004C0000)
                        /*|| ( (*locationOfLR) & 1 == 0)*/
                    )
                        && (locationOfLR < &_estack)
                ) {
                locationOfLR++;
            }
            
            if( (i>= ignoredLevels)  && (locationOfLR!=&_estack) ){
                snprintf(localBuffer, sizeof localBuffer, "%08lx: 0x%08lx\r\n",    locationOfLR, *locationOfLR    );
                length = strlen(localBuffer);
                if (length < spaceLeft){
                    snprintf(pDest, spaceLeft, "%s", localBuffer);
                    spaceLeft = spaceLeft - length;
                    pDest += length;
                }
                
            }
            i++;
            
            if ((locationOfLR>=&_estack)){   /* stop at the end of the stack, not one word past it */
                i = qtyCallStackLevels;
            }else {
                locationOfLR++;
            }
        }
        
                
        snprintf(localBuffer, sizeof localBuffer, "END\n");   /* no format specifiers, so no extra arguments */
        length = strlen(localBuffer);
        if (length < spaceLeft){
            snprintf(pDest, spaceLeft, "%s", localBuffer);
            spaceLeft = spaceLeft - length;
            pDest += length;
        }
            
    }

  • For completeness, here is my code that addresses fault dumps and inferred call stacks:

    https://github.com/tobermory/faultHandling-cortex-m.git

  • Thanks. I had problems logging in, so this is the soonest I could respond.

  • If you compare your code above with mine (see the Github link), you'll see that you do the fault dump data formatting in the fault handler, i.e. after the fault has already occurred.  I do it ahead of time, and use minimal code to fill in the register value 'holes'. I bypass sprintf entirely, preferring to hex format values by hand.  I was nervous of calling into arbitrary C library routines once a fault had happened.  I think the chance of a lockup (fault in fault handler) increases. On my board, a lockup defaults to a reset, and the fault capture would be lost entirely.
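
A rough illustration of the "format ahead of time, fill in the holes" idea follows. It is a sketch written for this thread, not code taken from the linked repository: `put_hex32`, `fault_pc_line`, `FAULT_PC_HOLE` and `fill_fault_pc` are all hypothetical names. The message text is a static template, and the handler only writes eight hex digits into a fixed offset, with no C library calls.

```c
#include <stdint.h>

/* Write exactly 8 lowercase hex digits for value into dst[0..7].
 * No terminator is written, so dst can be a hole inside a larger string. */
void put_hex32(char *dst, uint32_t value)
{
    static const char digits[] = "0123456789abcdef";

    for (int i = 7; i >= 0; i--) {
        dst[i] = digits[value & 0xFu];
        value >>= 4;
    }
}

/* Template prepared at build time; the dots mark the hole. */
char fault_pc_line[] = "PC 0x........\n";
#define FAULT_PC_HOLE 5   /* offset of the first '.' in fault_pc_line */

/* Called from the fault handler: patch the stacked PC into the template. */
void fill_fault_pc(uint32_t pc)
{
    put_hex32(&fault_pc_line[FAULT_PC_HOLE], pc);
}
```

After `fill_fault_pc(0x00401ee2)`, `fault_pc_line` holds `PC 0x00401ee2` followed by a newline and can be sent out or saved as-is.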

  • Hello tobermory,

    >I bypass sprintf entirely, preferring to hex format values by hand. 

    sprintf is a function I should avoid for embedded work. It is not MISRA compliant. Also, sprintf is huge. I had a coworker who used an embedded printf. It was corrupting memory because the implementation had a fixed-length static buffer and he needed more than that. He spent a lot of time searching for what went wrong.

    For a fault handler, sprintf is even less ideal.

    Look at the first hit of a Google search for 'Small printf source code'. I have not tried it, but it could be interesting.

    >I think the chance of a lockup (fault in fault handler) increases.

    Agreed.