Cortex-M4 hard fault on LPC4078: finding the root cause when the stacked pc is 0x0

Hi everyone,

I'm getting a hard fault on my LPC4078 (using LPCXpresso) and would be very glad if you could help me find the root cause.

The µC runs FreeRTOS 8.2.2, but I'm not sure whether the hard fault has anything to do with it.

When the hard fault occurs, it hangs at this position:

The register values are:

r0 volatile uint32_t 0x1 (Hex)
r1 volatile uint32_t 0x300 (Hex)
r2 volatile uint32_t 0x0 (Hex)
r3 volatile uint32_t 0x10008a90 (Hex)
r12 volatile uint32_t 0x0 (Hex)
lr volatile uint32_t 0x12f89 (Hex)
pc volatile uint32_t 0x0 (Hex)
psr volatile uint32_t 0x0 (Hex)
SCB SCB_Type * 0xe000ed00
CPUID const volatile uint32_t 0x410fc241 (Hex)
ICSR volatile uint32_t 0x429803 (Hex)
VTOR volatile uint32_t 0x8000 (Hex)
AIRCR volatile uint32_t 0xfa050000 (Hex)
SCR volatile uint32_t 0x0 (Hex)
CCR volatile uint32_t 0x200 (Hex)
SHP volatile uint8_t [12] 0xe000ed18 (Hex)
SHCSR volatile uint32_t 0x0 (Hex)
CFSR volatile uint32_t 0x20000 (Hex)
HFSR volatile uint32_t 0x40000000 (Hex)
DFSR volatile uint32_t 0x0 (Hex)
MMFAR volatile uint32_t 0xe000edf8 (Hex)
BFAR volatile uint32_t 0xe000edf8 (Hex)
AFSR volatile uint32_t 0x0 (Hex)
PFR const volatile uint32_t [2] 0xe000ed40 (Hex)
PFR[0] const volatile uint32_t 48
PFR[1] const volatile uint32_t 512
DFR const volatile uint32_t 0x100000 (Hex)
ADR const volatile uint32_t 0x0 (Hex)
MMFR const volatile uint32_t [4] 0xe000ed50 (Hex)
MMFR[0] const volatile uint32_t 1048624
MMFR[1] const volatile uint32_t 0
MMFR[2] const volatile uint32_t 16777216
MMFR[3] const volatile uint32_t 0
ISAR const volatile uint32_t [5] 0xe000ed60 (Hex)
ISAR[0] const volatile uint32_t 17830160
ISAR[1] const volatile uint32_t 34676736
ISAR[2] const volatile uint32_t 555950641
ISAR[3] const volatile uint32_t 17895729
ISAR[4] const volatile uint32_t 19988786
RESERVED0 uint32_t [5] 0xe000ed74 (Hex)
RESERVED0[0] uint32_t 0
RESERVED0[1] uint32_t 0
RESERVED0[2] uint32_t 0
RESERVED0[3] uint32_t 0
RESERVED0[4] uint32_t 0
CPACR volatile uint32_t 0xf00000 (Hex)

Unfortunately the stacked pc is 0x0. In similar hard faults the stacked pc helped me a lot.
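
For readers following along, here is a minimal sketch of how the eight stacked values listed above (r0 to psr) can be sanity-checked. It assumes the basic exception frame without FPU state. The helper name `frame_looks_valid` and the code-range parameters are hypothetical and not taken from the original project.

```c
#include <stdbool.h>
#include <stdint.h>

/* Basic exception stack frame as pushed by a Cortex-M core,
 * lowest address first (no floating-point context). */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} stacked_frame_t;

/* Thumb state bit in xPSR; a Cortex-M core can only execute with it set. */
#define XPSR_T_BIT (1u << 24)

/* Hypothetical helper: returns true if the frame could plausibly be resumed.
 * code_start and code_end bound the executable region and are assumptions
 * that must be adapted to the actual memory map. */
static bool frame_looks_valid(const stacked_frame_t *f,
                              uint32_t code_start, uint32_t code_end)
{
    if ((f->xpsr & XPSR_T_BIT) == 0u) {
        return false;               /* returning here would clear the T bit */
    }
    if (f->pc < code_start || f->pc >= code_end) {
        return false;               /* return address is outside the code */
    }
    return true;
}
```

With the values from the dump (pc = 0x0, psr = 0x0) both checks fail, which suggests the frame itself was overwritten and does not reflect a real return address.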

How would you proceed to find the cause? Is any information missing, or should I check any other values?

I have already searched on Google, but so far I haven't found anything useful, or what I found seemed too complex.

I look forward to any hints or tips.
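
Two values in the dump that are worth decoding are CFSR and HFSR. The sketch below uses the bit positions defined by the Armv7-M architecture; the function names are made up for illustration, and only the UsageFault part of CFSR is decoded.

```c
#include <stdbool.h>
#include <stdint.h>

/* CFSR layout: UFSR in bits [31:16], BFSR in [15:8], MMFSR in [7:0]. */
#define CFSR_UNDEFINSTR (1u << 16)
#define CFSR_INVSTATE   (1u << 17)
#define CFSR_INVPC      (1u << 18)
#define CFSR_NOCP       (1u << 19)
#define CFSR_UNALIGNED  (1u << 24)
#define CFSR_DIVBYZERO  (1u << 25)

/* HFSR: FORCED means a configurable fault was escalated to HardFault. */
#define HFSR_FORCED     (1u << 30)

/* Hypothetical helper: name of the first UsageFault flag set in CFSR. */
static const char *usage_fault_name(uint32_t cfsr)
{
    if (cfsr & CFSR_UNDEFINSTR) return "UNDEFINSTR";
    if (cfsr & CFSR_INVSTATE)   return "INVSTATE";
    if (cfsr & CFSR_INVPC)      return "INVPC";
    if (cfsr & CFSR_NOCP)       return "NOCP";
    if (cfsr & CFSR_UNALIGNED)  return "UNALIGNED";
    if (cfsr & CFSR_DIVBYZERO)  return "DIVBYZERO";
    return "none";
}

/* Hypothetical helper: true if the HardFault is an escalated fault. */
static bool hard_fault_is_escalated(uint32_t hfsr)
{
    return (hfsr & HFSR_FORCED) != 0u;
}
```

For the values above, CFSR = 0x20000 decodes to INVSTATE and HFSR = 0x40000000 has FORCED set. That is consistent with a UsageFault that escalated to HardFault (SHCSR = 0x0, so the UsageFault handler is not enabled) after the core tried to run with the Thumb bit cleared.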

Best regards,

Daniel

Parents
  • Ah, sorry! I misread your memory view. I thought 0x13421 was the stacked PC. (I need new glasses!)

    There is a possibility that the DMA handler caused a stack overflow and corrupted a task stack. As a result, the stacked PC in the task's exception stack frame becomes 0. Because the task is not running at that moment, the fault is not triggered immediately. A bit later, the FreeRTOS context switch (PendSV) switches into the task with the corrupted stack, and it crashes. So please check the size of your main stack (which is used by the exception handlers).

    Another thing to check: make sure the task stacks are double-word aligned. (Although in this case that might not be the cause of the problem.)

    regards,
    Joseph

    EDITED: There could be other reasons for the task stack corruption, e.g. incorrect DMA operations or some other bug in the DMA handler.

    Alternatively, the application task that crashed has a stack overflow, so that its stack grows into the main stack. An ISR using the same stack region then corrupts the stack frame inside it, and the task crashes when it is resumed. FreeRTOS has a stack checking feature that can help detect such issues:

    www.freertos.org/Stacks-and-stack-overflow-checking.html
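
In FreeRTOS the check is enabled with `configCHECK_FOR_STACK_OVERFLOW` in FreeRTOSConfig.h and reported through `vApplicationStackOverflowHook()`. The host-runnable sketch below only imitates the idea behind the pattern-based method (fill the stack with a known byte, then check the bytes at the stack limit). It is not the kernel's code, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FILL_BYTE   0xA5u   /* known pattern written before the task starts */
#define GUARD_BYTES 16u     /* number of bytes checked at the stack limit   */

/* Fill the whole stack buffer with the known pattern. */
static void stack_fill(uint8_t *stack, size_t size)
{
    memset(stack, FILL_BYTE, size);
}

/* Stacks grow downward on Cortex-M, so the lowest addresses are used last.
 * Returns true while the guard bytes at the low end are still untouched. */
static bool stack_guard_intact(const uint8_t *stack, size_t size)
{
    if (size < GUARD_BYTES) {
        return false;
    }
    for (size_t i = 0; i < GUARD_BYTES; i++) {
        if (stack[i] != FILL_BYTE) {
            return false;   /* something wrote into the guard area */
        }
    }
    return true;
}
```

The FreeRTOS feature covers task stacks. The same pattern idea can be applied by hand to the main stack used by the exception handlers, for example by checking its guard bytes from an idle hook.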

Children
  • I have now found a software bug, but unfortunately I don't understand how it corrupts the task stack frame.

    In the rare cases where a BusFault occurred instead of a UsageFault, I noticed a distinctive 32-bit value at the stacked PC position. I searched the whole RAM area for this value and found it, besides the PC position, at one other RAM address. The map file shows an array around that address. I examined that array and found that, in the error case, its index counter is too high and exceeds the bounds of the array. A classic programming bug (luckily not one I wrote :-))

    When I enlarge that array, the error no longer occurs.

    I think it's not necessary to understand the "voodoo magic" that happens after a write beyond the array bounds. I see that some variables after the array are set to invalid values, and there is a function call to which values from the array are passed. But that doesn't explain why the passed values end up at the "stacked PC" position and not further back, where local variables etc. are stored in the frame. Stepping over in the debugger isn't enough; I have to let it run until it happens after the out-of-bounds write.

    Is there any way to prevent out-of-bounds array writes? Of course you can add if-conditions to trap them, or comply with the MISRA-C rules. But isn't there an MPU in the LPC4078 that can detect such invalid write accesses?

    Thank you for all the support. Fortunately I don't need to examine more of the ISRs, the DMA handler, etc. It's good to know that I can get support in this forum if I run into similar errors in the future.
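
As a sketch of the "if-condition" approach mentioned above: keeping the index and the array together in one type, and routing every write through a single checked function, makes it harder for the counter to run past the end. The type, the length and the function name are invented for illustration and are not from the original project.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SAMPLE_BUF_LEN 8u   /* hypothetical capacity */

/* The array and its index counter live in one structure. */
typedef struct {
    uint32_t data[SAMPLE_BUF_LEN];
    size_t   count;
} sample_buf_t;

/* Append one value; refuses the write instead of overrunning the array.
 * Returns true on success, false when the buffer is already full. */
static bool sample_buf_push(sample_buf_t *buf, uint32_t value)
{
    if (buf->count >= SAMPLE_BUF_LEN) {
        return false;           /* caller decides how to handle the overflow */
    }
    buf->data[buf->count] = value;
    buf->count++;
    return true;
}
```

A caller can treat a `false` return as an error (log it, assert in debug builds, or drop the sample) rather than letting the write land in whatever follows the array in RAM.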

  • Glad to know that you have made good progress.

    I guess the out-of-bounds array write has corrupted a task stack: the saved PC value of a task, which is in the exception stack frame, is located inside the task's stack.

    During context switching, the processor switches from a task running in Thread mode to an OS exception (e.g. SysTick), and the current PC of the task is saved in the exception stack frame on the task's stack. The OS then uses the PendSV handler to perform the context switch, and switches to another task using an exception return. If a task's stack has been corrupted, then when the OS later context-switches into it, the return address in the exception stack frame is invalid (0x0 in your case) and the processor therefore enters HardFault.

    FreeRTOS can utilize the MPU to help detect this kind of issue:

    www.freertos.org/FreeRTOS-MPU-memory-protection-unit.html

    However, many FreeRTOS projects don't enable the MPU. Enabling the MPU requires you to define the memory regions for each task (and for shared data variables), and the MPU region alignment requirements in Armv7-M make this a bit more complicated. It is much easier to do on Armv8-M processors (e.g. Cortex-M23, Cortex-M33).

    regards,

    Joseph
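
To illustrate the Armv7-M alignment requirement mentioned above: an MPU region must be a power of two in size, at least 32 bytes, and its base address must be aligned to that size. The sketch below checks a candidate region and computes the SIZE field encoding (region size = 2^(SIZE+1)). The function names are hypothetical, and the full 4 GB region is left out because it does not fit in a 32-bit size value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if (base, size) satisfies the Armv7-M MPU region rules:
 * size is a power of two >= 32 bytes and base is a multiple of size. */
static bool mpu_region_ok(uint32_t base, uint32_t size)
{
    if (size < 32u) {
        return false;
    }
    if ((size & (size - 1u)) != 0u) {
        return false;                   /* not a power of two */
    }
    return (base & (size - 1u)) == 0u;  /* base aligned to size */
}

/* SIZE field value for a region size that passed mpu_region_ok():
 * log2(size) - 1, e.g. 32 bytes gives 4 and 1 KB gives 9. */
static uint32_t mpu_size_field(uint32_t size)
{
    uint32_t bits = 0u;
    while ((size >> bits) > 1u) {
        bits++;
    }
    return bits - 1u;
}
```

In practice this means a task stack or array that should be guarded by its own region has to be placed at a size-aligned address, typically with an alignment attribute or a dedicated linker section.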