Cortex-M4 hard fault: finding the root cause on LPC4078 (pc = 0x0)

d-r over 5 years ago

Hi everyone,

I'm getting a hard fault on my LPC4078 on LPCXpresso and would be very glad if you could help me find the root cause.

The µC runs FreeRTOS 8.2.2, but I'm not sure whether the hard fault has anything to do with it.

When the hard fault occurs, it hangs at this position:

The register values are:

r0 volatile uint32_t 0x1 (Hex)
r1 volatile uint32_t 0x300 (Hex)
r2 volatile uint32_t 0x0 (Hex)
r3 volatile uint32_t 0x10008a90 (Hex)
r12 volatile uint32_t 0x0 (Hex)
lr volatile uint32_t 0x12f89 (Hex)
pc volatile uint32_t 0x0 (Hex)
psr volatile uint32_t 0x0 (Hex)
SCB SCB_Type * 0xe000ed00
CPUID const volatile uint32_t 0x410fc241 (Hex)
ICSR volatile uint32_t 0x429803 (Hex)
VTOR volatile uint32_t 0x8000 (Hex)
AIRCR volatile uint32_t 0xfa050000 (Hex)
SCR volatile uint32_t 0x0 (Hex)
CCR volatile uint32_t 0x200 (Hex)
SHP volatile uint8_t [12] 0xe000ed18 (Hex)
SHCSR volatile uint32_t 0x0 (Hex)
CFSR volatile uint32_t 0x20000 (Hex)
HFSR volatile uint32_t 0x40000000 (Hex)
DFSR volatile uint32_t 0x0 (Hex)
MMFAR volatile uint32_t 0xe000edf8 (Hex)
BFAR volatile uint32_t 0xe000edf8 (Hex)
AFSR volatile uint32_t 0x0 (Hex)
PFR const volatile uint32_t [2] 0xe000ed40 (Hex)
PFR[0] const volatile uint32_t 48
PFR[1] const volatile uint32_t 512
DFR const volatile uint32_t 0x100000 (Hex)
ADR const volatile uint32_t 0x0 (Hex)
MMFR const volatile uint32_t [4] 0xe000ed50 (Hex)
MMFR[0] const volatile uint32_t 1048624
MMFR[1] const volatile uint32_t 0
MMFR[2] const volatile uint32_t 16777216
MMFR[3] const volatile uint32_t 0
ISAR const volatile uint32_t [5] 0xe000ed60 (Hex)
ISAR[0] const volatile uint32_t 17830160
ISAR[1] const volatile uint32_t 34676736
ISAR[2] const volatile uint32_t 555950641
ISAR[3] const volatile uint32_t 17895729
ISAR[4] const volatile uint32_t 19988786
RESERVED0 uint32_t [5] 0xe000ed74 (Hex)
RESERVED0[0] uint32_t 0
RESERVED0[1] uint32_t 0
RESERVED0[2] uint32_t 0
RESERVED0[3] uint32_t 0
RESERVED0[4] uint32_t 0
CPACR volatile uint32_t 0xf00000 (Hex)

Unfortunately pc is 0x0. In similar hard faults, the pc value helped me a lot.

How would you proceed to find the cause? Is any information missing, or should I check any other values?

I have already searched Google, but so far I haven't found anything useful, or what I found seemed too complex.

I look forward to any hints or tips.

Best regards,

Daniel

0 tobermory over 5 years ago

I have seen errors like this, pc being set to 0, so can offer some insight. It may be way off base. I haven't used your CPU, nor an M4 at all, but have used M3 and the two are similar enough for what I'm about to describe. Also, I use the GNU toolchain + make (no IDE) but your problem is obviously a runtime one so the build process isn't so relevant.

Let's look at the facts:

pc 0

lr 12f89

psr 0

hfsr 40000000

shcsr 0

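This tells us: the fault was escalated to Hard Fault from a lesser fault, since hfsr[30] (FORCED) is set. pc being 0 is a usage fault: the M4 can't go to ARM mode, only Thumb, and any address loaded into pc by a branch or pop must have bit 0 = 1 to stay in Thumb state. Your CFSR of 0x20000 agrees, since bit 17 is INVSTATE in the usage fault half of that register. So, a usage fault has been escalated to Hard Fault. SHCSR[18] being clear confirms the Usage Fault handler was not enabled at the time of the fault.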

psr[8:0] being zero tells us we weren't in any system exception (SVC,PendSV,Systick) or interrupt handler at time of fault. You say you are running with an RTOS, so I would guess you were in thread mode and using Process stack at time of fault.
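
The bit tests from the two paragraphs above can be written out as a small host-side C sketch. The bit positions are the architectural ones, but the macro names, the fault_dump_t struct and both function names are made up here for illustration and do not come from any vendor header. The sketch also checks CFSR bit 17 (INVSTATE), which the CFSR value in the dump sets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions per the ARMv7-M architecture; macro names are local to this sketch. */
#define HFSR_FORCED       (1u << 30) /* a lesser fault was escalated to Hard Fault */
#define CFSR_INVSTATE     (1u << 17) /* UFSR bit 1: execution attempted with EPSR.T clear */
#define SHCSR_USGFAULTENA (1u << 18) /* Usage Fault handler enabled */
#define XPSR_IPSR_MASK    0x1FFu     /* psr[8:0]: number of the active exception */

/* Hypothetical container for the values copied out of the debugger. */
typedef struct {
    uint32_t hfsr;
    uint32_t cfsr;
    uint32_t shcsr;
    uint32_t psr;   /* stacked xPSR */
} fault_dump_t;

/* Usage fault (INVSTATE) escalated because its own handler was disabled? */
static bool is_escalated_invstate(const fault_dump_t *d)
{
    assert(d != NULL);
    return (d->hfsr & HFSR_FORCED) != 0
        && (d->cfsr & CFSR_INVSTATE) != 0
        && (d->shcsr & SHCSR_USGFAULTENA) == 0;
}

/* IPSR == 0 means the fault hit thread-mode code, not a handler. */
static bool faulted_in_thread_mode(const fault_dump_t *d)
{
    assert(d != NULL);
    return (d->psr & XPSR_IPSR_MASK) == 0;
}
```

Filled with the values from the dump (hfsr 0x40000000, cfsr 0x20000, shcsr 0, psr 0), both functions return true.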

OK, here's my theory...

Some function A includes a call to some other function B, at 0x12f84:

12f84: bl B

12f88: next thing in A

Why do I think this? Because your lr is 12f89, and that is the correct lr value for B to return to A at the instruction after A's call to B (the instruction is at 12f88, but to jump to it, bit 0 of the address must be 1).
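
The address arithmetic above is simple enough to sketch in C. The function names are invented for this example, and it assumes the call was a 4-byte BL as in the listing above.

```c
#include <assert.h>
#include <stdint.h>

/* Size of a Thumb-2 BL instruction in bytes. */
#define THUMB_BL_SIZE 4u

/* Address of the instruction that lr returns to (Thumb bit stripped). */
static uint32_t return_address(uint32_t lr)
{
    assert((lr & 1u) != 0);   /* a Thumb return address always has bit 0 set */
    return lr & ~1u;
}

/* Address of the BL that produced this lr, assuming a 4-byte BL
   (a 2-byte "blx <reg>" call would sit 2 bytes before the return address). */
static uint32_t bl_call_site(uint32_t lr)
{
    return return_address(lr) - THUMB_BL_SIZE;
}
```

For lr = 0x12f89 this gives a return address of 0x12f88 and a call site of 0x12f84, the two addresses to look up in the listing file.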

If you have a listing file (I would do an OBJDUMP on my .axf file to produce a .lst file), you can look for 12f84 and that will tell you both A and B.

So, A has called B. A typical function prolog (for unoptimized GCC Thumb code) is

push {r7, lr}

A function does this so that it preserves the caller's r7 and lr in order to use them itself. r7 (aka fp) is the frame pointer used to refer to the function's own local variables. lr needs saving if B wants to make further calls, i.e. B calls C. On the stack, the saved r7 ends up at the lower address and the saved lr in the slot directly above it.

As well as the prolog above, a function ends with an epilog that reinstates its caller's r7 and lr and of course returns to it. This is

pop {r7, lr}

bx lr

or, the shorter equivalent

pop {r7, pc}

Now, let's imagine B looks something like

B() {

int X[2];

X[3] = 0;

}

With a typical unoptimized frame, the two slots directly ABOVE X[1] on the stack are the ones B's prolog filled: the saved r7 first (where X[2] would be), then A's lr (where X[3] would be). The exact layout depends on the compiler and on what else B keeps on the stack. Of course only X[0] and X[1] are valid, but our code overran the array bounds and did X[3] = 0.

This has trashed the slot on the stack that will be popped into pc by the epilog. The code above will indeed set pc to 0, and would fault. I think that your compiled epilog would have been the

pop {r7, pc}

variant, since if it had been the

pop {r7, lr}

bx lr

variant then you would have had lr = pc = 0 at time of fault, and your lr was not 0.

A tell-tale sign of this kind of error is to examine r7 too. Your dump didn't include it, but if r7 and pc are related, that's a clue. If B had also done

X[2] = 1;

we'd see r7 = 1 at time of fault, since B's epilog would pop the 1 into r7 and the 0 into pc (or into lr, which is then transferred to pc via bx lr).

Note here that this is not a stack overflow; you haven't run too far DOWN in memory. It's actually the opposite: you've written HIGHER in memory than your function's own local variable space.

I see that you also mention Bus Faults. If instead of

X[3] = 0

the code was

X[3] = BIG

then you have loaded BIG into PC and BIG may not be present in the address space, so the processor can't go fetch the instruction there, and if I recall, that is a Bus Fault.
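
The overwrite can be modelled safely on a host PC. This is only a sketch under a stated assumption: the frame is laid out, from low to high address, as X[0], X[1], saved r7, saved lr, which is what push {r7, lr} followed by a two-word local area yields. A real compiler may arrange things differently, and all names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Word offsets within B's frame, lowest address first (assumed layout). */
enum { SLOT_X0, SLOT_X1, SLOT_SAVED_R7, SLOT_SAVED_LR, FRAME_WORDS };

/* Values the epilog loads back into the registers. */
typedef struct {
    uint32_t r7;
    uint32_t pc;
} popped_t;

/* Prolog: "push {r7, lr}" followed by room for X[0] and X[1]. */
static void build_frame(uint32_t frame[FRAME_WORDS],
                        uint32_t caller_r7, uint32_t caller_lr)
{
    frame[SLOT_X0] = 0;
    frame[SLOT_X1] = 0;
    frame[SLOT_SAVED_R7] = caller_r7;
    frame[SLOT_SAVED_LR] = caller_lr;
}

/* Models "X[index] = value"; indices 2 and 3 land on the saved registers. */
static void write_x(uint32_t frame[FRAME_WORDS], size_t index, uint32_t value)
{
    assert(index < FRAME_WORDS);   /* keep the simulation itself in bounds */
    frame[SLOT_X0 + index] = value;
}

/* Epilog: "pop {r7, pc}". */
static popped_t run_epilog(const uint32_t frame[FRAME_WORDS])
{
    popped_t p = { frame[SLOT_SAVED_R7], frame[SLOT_SAVED_LR] };
    return p;
}
```

Building the frame with lr = 0x12f89 and running the epilog returns pc = 0x12f89. After write_x(frame, 3, 0) the same epilog returns pc = 0, and after write_x(frame, 2, 1) it returns r7 = 1 as well.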

I learned all of the above from Joseph Yiu's amazing Definitive Guide to ARM Cortex-M3 and Cortex-M4 Processors, 3rd ed., oh and of course by solving my own pc=0 situations!

0 d-r over 5 years ago in reply to tobermory

Wow, what a lot of information tobermory

I didn't catch all of it, and since I have solved the issue in my device I'm not currently working on this topic. But it may help with similar issues and adds to my experience with this kind of failure. Thank you!
0 tobermory over 5 years ago in reply to d-r
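__attribute__((naked)) void FaultHandler(void) {
    __asm__(
        "TST LR, #4 \n"          /* EXC_RETURN bit 2: 0 = MSP, 1 = PSP */
        "ITE EQ \n"
        "MRSEQ r1, MSP \n"       /* r1 = stack pointer in use at the fault */
        "MRSNE r1, PSP \n"
        "MOV r2, LR \n"          /* r2 = EXC_RETURN */
        "MOV r0, r7 \n"          /* r0 = r7 at the time of the fault */
        "B FaultHandler_C \n"
    );
}

/* used: referenced only from the assembly above, so keep the compiler from discarding it */
__attribute__((used)) static void FaultHandler_C( uint32_t r7, uint32_t* stack, uint32_t excRet ) { ... }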

You are welcome. I've included my own Fault Handler impl, that grabs r7 as I suggested, and also r14, which holds excReturn at time of fault.

And to follow up on the bit about the prolog and epilog, the prolog pushes TO the stack FROM r7, lr. The epilog expects to pop those SAME values FROM the stack TO r7, lr. The array out-of-bounds write shows how surprisingly easy it is to destroy that sequence of stack operations.
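
For anyone wondering what the C side of such a handler might do with its three arguments, here is a minimal host-testable sketch. It is not the elided body of FaultHandler_C above; the struct and function names are assumptions for illustration. It relies only on the fact that the hardware stacks r0-r3, r12, lr, pc and psr, in that order, on exception entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Registers the Cortex-M hardware stacks on exception entry, in memory order. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, psr;
} exc_frame_t;

/* Hypothetical report type for this sketch. */
typedef struct {
    exc_frame_t frame;
    uint32_t    r7;          /* frame pointer captured by the assembly stub */
    bool        used_psp;    /* EXC_RETURN bit 2: frame was on the process stack */
    bool        pc_is_null;  /* stacked pc == 0, the situation in this thread */
} fault_report_t;

/* Same argument order as the stub passes: r0 = r7, r1 = stack, r2 = EXC_RETURN. */
static fault_report_t decode_fault(uint32_t r7, const uint32_t *stack,
                                   uint32_t exc_ret)
{
    assert(stack != NULL);

    fault_report_t rep;
    rep.frame.r0  = stack[0];
    rep.frame.r1  = stack[1];
    rep.frame.r2  = stack[2];
    rep.frame.r3  = stack[3];
    rep.frame.r12 = stack[4];
    rep.frame.lr  = stack[5];
    rep.frame.pc  = stack[6];
    rep.frame.psr = stack[7];

    rep.r7         = r7;
    rep.used_psp   = (exc_ret & 4u) != 0;
    rep.pc_is_null = (rep.frame.pc == 0);
    return rep;
}
```

On the target, the report could be stored in a global or printed before halting. On a host, feeding it the eight stacked values from the original dump reproduces the picture discussed above: lr = 0x12f89 and pc_is_null set.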