Dear Experts,
I'm working on a FreeRTOS project running on a Cortex-M4, and I'm troubled by a problem: a hard fault.
The following is my debugging process:
I dump the stacked registers when the hard fault happens.
[Hard fault handler]
R0 = 0x1d8
R1 = 0x2001091f
R2 = 0x398dcd
R3 = 0x0
R4:0x0
R5:0xe4a0
R6:0x0
R7:0x20010928
R8:0xa5a5a5a5
R9:0xa5a5a5a5
R10:0xa5a5a5a5
R11:0xa5a5a5a5
R12 = 0x0
LR = 0x1ffe1987
PC = 0x1ffe199a
PSR = 0x61000000
- FSR/FAR
BFAR = 0xe000ed38
MMFAR = 0xe000ed34
CFSR = 0x10000
HFSR = 0x40000000
DFSR = 0x0
AFSR = 0x0
- Misc
LR/EXC_RETURN = 0xfffffffd
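For readers who want to reproduce this kind of dump, here is a minimal sketch of how the eight stacked words (R0-R3, R12, LR, PC, PSR) can be pulled out of the exception frame. The names `stacked_frame_t`, `read_stacked_frame` and `frame_is_on_psp` are hypothetical and not taken from the poster's handler; the sketch assumes the basic frame without floating-point state.

```c
#include <stdint.h>

/* Basic (non-FPU) frame that a Cortex-M core pushes on exception entry,
   lowest address first. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, psr;
} stacked_frame_t;

/* Hypothetical helper: copy the eight stacked words into a struct.
   sp is the value of the stack pointer that was active when the fault hit. */
static stacked_frame_t read_stacked_frame(const uint32_t *sp)
{
    stacked_frame_t f = { sp[0], sp[1], sp[2], sp[3],
                          sp[4], sp[5], sp[6], sp[7] };
    return f;
}

/* Bit 2 of EXC_RETURN tells which stack holds the frame: 1 = PSP, 0 = MSP. */
static int frame_is_on_psp(uint32_t exc_return)
{
    return (exc_return & (1u << 2)) != 0;
}
```

With EXC_RETURN = 0xfffffffd, as in the dump, the frame is on the process stack, so a handler would pass the PSP value to `read_stacked_frame`.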
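From the HFSR and CFSR, I can tell that the hard fault is caused by an "UNDEFINSTR" error (HFSR = 0x40000000 is the FORCED bit, and CFSR = 0x10000 is the UNDEFINSTR bit of the usage fault status).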
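The reading of the two status registers can be sketched in C as follows. The bit positions are the architectural ones for ARMv7-M; the macro and function names are made up for this illustration, and only a few usage fault bits are checked.

```c
#include <stdint.h>

/* Usage fault flags sit in the upper halfword of CFSR. */
#define CFSR_UNDEFINSTR (1u << 16)  /* undefined instruction executed */
#define CFSR_INVSTATE   (1u << 17)  /* invalid EPSR state, e.g. T bit clear */
#define CFSR_INVPC      (1u << 18)  /* invalid EXC_RETURN / PC load */
#define CFSR_NOCP       (1u << 19)  /* coprocessor access not permitted */

/* HFSR.FORCED: a configurable fault was escalated to a hard fault. */
#define HFSR_FORCED     (1u << 30)

static int is_forced_hardfault(uint32_t hfsr)
{
    return (hfsr & HFSR_FORCED) != 0;
}

static int has_undefinstr(uint32_t cfsr)
{
    return (cfsr & CFSR_UNDEFINSTR) != 0;
}

/* Return the name of the first usage fault bit found among the four checked. */
static const char *usage_fault_name(uint32_t cfsr)
{
    if (cfsr & CFSR_UNDEFINSTR) return "UNDEFINSTR";
    if (cfsr & CFSR_INVSTATE)   return "INVSTATE";
    if (cfsr & CFSR_INVPC)      return "INVPC";
    if (cfsr & CFSR_NOCP)       return "NOCP";
    return "other";
}
```

Feeding in the values from the dump, `is_forced_hardfault(0x40000000)` and `has_undefinstr(0x10000)` are both true, which matches the conclusion above.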
The stacked PC is 0x1FFE199A.
Just before 0x1FFE199A, an LDRD is executed, which is generated because a uint64_t variable is used. If I change the variable to uint32_t, no LDRD instruction is generated and the program runs well.
So I suspect the issue is related to the LDRD, but I don't know the root cause.
One more thing: the hard fault usually happens only after the program has been running for several hours, sometimes 1 hour, sometimes 5 hours.
I'm sure the stack is not full, and the project is compiled with arm-none-eabi-gcc (2018-q4-major).
Can anyone give some suggestions for the next debugging step?
Thanks a lot!
Are you sure the instruction is reached in the normal flow? Or could there have been a jump into the middle of the LDRD (which is 4 bytes, whereas a normal LDR is only 2 bytes)?
Try adding a __asm("nop") before the LDRD to move it.
Hi Expert,
Thanks for your reply, and sorry for my late response!
After adding __asm("nop") before the LDRD as you suggested, the program has been running well for over 20 hours.
But I still have the following questions based on yours:
1. Are you sure the instruction is called in the normal flow? [Gavin]: This is a good question, but how do I know or check whether the instruction is called in the normal flow? Any suggestions?
2. Could it be there was a jump in the middle of the LDRD (which is 4 bytes) whereas the normal LDR is only 2 bytes. [Gavin]: If that is the case, have you met a similar issue before? Is it a potential problem that we need to pay attention to during programming? How do we avoid this issue in our development?
3. Why does the issue not happen (maybe) after adding the NOP before the LDRD?
>how do I know or check if the instruction is called in the normal flow?
If the device you are using has ETM instruction trace, the board has a trace port connection, and you have a debug probe that supports trace, then you can use instruction trace to see what happened to the instruction flow before it crashed.
>3. Why does the issue not happen (maybe) after adding the NOP before the LDRD?
The NOP instruction produces an address offset of two bytes. Assume there is stack corruption somewhere (not necessarily a stack overflow; it could be an array with an unbounded index, or something else) that causes an incorrect jump into the middle of the LDRD. After the change, the jump no longer lands in the middle of an instruction. The program execution is still wrong, but it doesn't necessarily cause a crash.
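To make the two-byte shift concrete, here is a small host-side sketch, not code from the project in question. It walks a sequence of Thumb halfwords and reports whether a given byte offset is the start of an instruction or falls inside a 32-bit one. The function names are hypothetical, and the example instruction sequence is invented for illustration; it also assumes the stray jump target stays at the same address after the rebuild.

```c
#include <stddef.h>
#include <stdint.h>

/* A Thumb halfword begins a 32-bit instruction when bits [15:11]
   are 0b11101, 0b11110 or 0b11111; otherwise it is a 16-bit instruction. */
static int is_thumb32_first_halfword(uint16_t hw)
{
    uint16_t top5 = (uint16_t)(hw >> 11);
    return top5 == 0x1D || top5 == 0x1E || top5 == 0x1F;
}

/* Walk the code from its first instruction. Return 1 if the byte offset is
   the start of an instruction, 0 if it falls inside a 32-bit instruction
   or beyond the end of the buffer. */
static int lands_on_instruction_start(const uint16_t *code,
                                      size_t n_halfwords,
                                      size_t target_byte_offset)
{
    size_t i = 0;
    while (i < n_halfwords) {
        if (i * 2 == target_byte_offset) {
            return 1;
        }
        i += is_thumb32_first_halfword(code[i]) ? 2 : 1;
    }
    return 0;
}
```

As an example, take the sequence `0x4608` (mov r0, r1), `0xE9D2 0x0100` (ldrd r0, r1, [r2]), `0x4770` (bx lr). A stray jump to byte offset 4 lands on the second halfword of the LDRD, which the core then decodes as some unrelated instruction. With a NOP (`0xBF00`) inserted before the LDRD, the same offset 4 is the first halfword of the LDRD, so nothing is decoded out of step, even though the jump itself is still wrong.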
regards,
Joseph
Hi Joseph,
Thanks for your quick reply. I'll check the instruction flow first.