How to debug the cause of a HardFault on Cortex-M0?

Dear everyone,

I am developing an MCU system using the Cortex-M0 version of DesignStart Pro. I program the DesignStart Pro RTL into an Altera FPGA, and I use Keil C to build the software.

The system had been running properly until now. But after I made a small logic change in the RTL of one peripheral module (called A), a HardFault occurs.

I used a ULINK2 debugger to investigate the problem, and I observed the following.

  1. When I enter a debug session by pressing the “Start/Stop Debug Session” button in Keil, the T bit is already cleared to 0 and the PC (R15) points to 0x20000968. Please see Figure 1 in the attachment.
  2. In the debug session, the first time I press the “Step Over” button, the program enters the HardFault handler.

In “The Definitive Guide to ARM® Cortex®-M0 and Cortex-M0+ Processors” (Second Edition) by Joseph Yiu, I have already read the following information.

  i. If the T bit is cleared, a HardFault exception will be generated when the next instruction is executed.
  ii. The “Fault Handling” section lists accidentally switching to ARM state as a common program error that causes a HardFault. There, the switch to ARM state is described as being caused by software errors only.
  iii. The “Trouble Shooting” section also covers two cases that I think are related to mine: “Program does not run/start” and “Program started, but enter HardFault”.

My questions are as follows.

  a) Regarding observation 1, what causes the T bit to be cleared to 0: a software error or a hardware error?

On the hardware side, I checked my modified RTL logic for peripheral module A. It causes no errors during hardware compilation and does not send a bus error to the CPU. According to my debug information, the T bit is already cleared as soon as I enter the debug session, so no peripheral module in the MCU system has had a chance to run.

On the software side, I am using the same software that ran well before I modified the hardware logic.

From the above, I guess that a hardware error causes the HardFault. But I do not know which hardware error condition could clear the T bit to 0 and cause the HardFault.

  b) Regarding observation 1, the PC points to 0x20000968. In the Cortex-M0 memory map, this address is in the SRAM region (0x20000000-0x3FFFFFFF), not the Code region. Normally, at the first instruction executed, the PC should point into the Code region (0x00000000-0x1FFFFFFF).

Does the T bit being cleared to 0 cause the PC to point to the SRAM region?

  c) Can I find the root cause of the HardFault from the descriptions in iii? I think my HardFault corresponds to “Program started, but enter HardFault”, and the PC is pointing to a valid location (SRAM). But I cannot extract any useful information about the T bit being cleared to 0 from the following description.

“• If the T bit in the stacked xPSR is 0 and the stacked PC is pointing to the beginning of an ISR, check the vector table (all LSB of exception vectors should be set to 1).

• If the stacked IPSR (inside xPSR) is indicating an ISR is running, and the stacked PC is not inside the address range of the ISR code, then you likely to have a stack corruption in that ISR. Look out for data array accesses with unbounded index.”
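The two checks quoted above both depend on decoding the stacked exception frame. The following is only a minimal sketch of that decoding, written so it can be tried on a host PC; the type and function names (`stacked_frame_t`, `stacked_t_bit_set`, and so on) are hypothetical and do not come from the book or from CMSIS. On the target, a pointer to the frame would normally be taken from MSP or PSP inside the HardFault handler, depending on bit 2 of the EXC_RETURN value in LR.

```c
#include <stdbool.h>
#include <stdint.h>

/* The eight words the Cortex-M0 pushes on exception entry, lowest address first. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} stacked_frame_t;                     /* hypothetical name, for illustration */

#define XPSR_T_BIT     (1u << 24)      /* EPSR.T: 1 = Thumb state */
#define XPSR_IPSR_MASK 0x3Fu           /* exception number, bits [5:0] on ARMv6-M */

/* True if the interrupted code was in Thumb state (the only valid state). */
static bool stacked_t_bit_set(const stacked_frame_t *f)
{
    return (f->xpsr & XPSR_T_BIT) != 0u;
}

/* Exception number that was active when the fault hit (0 = Thread mode). */
static uint32_t stacked_exception_number(const stacked_frame_t *f)
{
    return f->xpsr & XPSR_IPSR_MASK;
}

/* True if the stacked PC lies in [start, end), for example the address
 * range of one ISR as listed in the linker map file. */
static bool stacked_pc_in_range(const stacked_frame_t *f,
                                uint32_t start, uint32_t end)
{
    return f->pc >= start && f->pc < end;
}
```

If `stacked_t_bit_set` returns false, the first quoted check applies (look at the vector table); if `stacked_exception_number` is non-zero but `stacked_pc_in_range` fails for that ISR, the second one applies (suspect stack corruption).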

I am sorry for such a long question. I have spent several weeks trying to find the cause, but I have not solved the problem yet. I am also new to ARM, so I do not have much experience debugging HardFaults.

I would appreciate any support from you.

  • Hi Jack,

    I suggest you start some investigations with a working system, then you can become a bit more familiar with the boot process and stepping through code. Particularly, watch the stack pointer and the data in the stack since the most likely cause of a problem like the one you describe is stack corruption.

    It is perfectly possible to execute code from the SRAM region. However, the T bit does need to be set, and there needs to be valid code at the address (neither of which is true in your example).
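To illustrate why a PC in SRAM is not a fault by itself, here is a small sketch that classifies an address against the ARMv6-M default memory map, in which the Code, SRAM and external RAM regions allow instruction fetches while the peripheral, external device and system regions are execute-never. The function name `address_is_executable` is hypothetical, and the sketch says nothing about whether valid code is actually present at the address.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if the ARMv6-M default memory map permits instruction
 * fetches from addr. The top three address bits select a 512 MB block. */
static bool address_is_executable(uint32_t addr)
{
    switch (addr >> 29) {
    case 0:            /* 0x00000000-0x1FFFFFFF: Code */
    case 1:            /* 0x20000000-0x3FFFFFFF: SRAM */
    case 3:            /* 0x60000000-0x7FFFFFFF: external RAM */
    case 4:            /* 0x80000000-0x9FFFFFFF: external RAM */
        return true;
    default:           /* peripheral, external device, system: execute-never */
        return false;
    }
}
```

By this rule 0x20000968 is a legal place to fetch from, so the fault in the question comes from the cleared T bit and the missing code, not from the region.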

    Check the vector table. The value at address 0x00000004 is the reset vector. I could make a guess that it looks a little bit like this:

    [Figure: invalid vector table]

    The debugger will guess that the entry point for the image is 0x20000968, in ARM state, and so sets a breakpoint there for you. This is not valid: the address contains uninitialised RAM, and the T bit is not set either.

    Below is an example of how the vector table should look:

    [Figure: valid vector table]

    The first word is used to initialise the stack pointer; the next contains the reset vector (0x00000190, Thumb state).
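The vector table check can also be automated. The sketch below scans a copy of the table and reports the first populated exception vector whose bit 0 is clear, which is the condition that would start the core with the T bit at 0. The function name `first_non_thumb_vector` is hypothetical, and treating zero entries as unused is an assumption that suits typical Cortex-M0 tables with reserved slots.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the index of the first populated exception vector whose bit 0
 * is clear, or -1 if every populated vector selects Thumb state.
 * Entry 0 is the initial stack pointer, not a vector, so it is skipped.
 * Entries equal to 0 are treated as reserved or unused (an assumption). */
static int first_non_thumb_vector(const uint32_t *table, size_t count)
{
    for (size_t i = 1; i < count; ++i) {
        if (table[i] != 0u && (table[i] & 1u) == 0u) {
            return (int)i;
        }
    }
    return -1;
}
```

On the target, `table` could point at address 0x00000000 (read through the debugger or from the built image); a return value of 1 would mean the reset vector itself is wrong.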
