Cortex-M3 Memory management fault recovery

Note: This was originally posted on 2nd June 2011 at http://forums.arm.com

Hi, does the Cortex-M3 expect the user to take manual steps in the memory management fault handler in order to recover from the fault?

After enabling the MPU on my Cortex-M3 processor and stepping into the illegal access instruction, the memory management fault (MMF) exception is generated as expected. But after the normal MMF exception handler has executed, the program goes back to the illegal access instruction again! We inspected the stack frame pushed on entry to the MMF exception, and the stacked PC is exactly the address of the illegal instruction, which would be the cause of the loop. Is this the defined behaviour of the Cortex-M3 family?

Other exceptions, such as SysTick or other IRQs, normally stack the address of the next instruction before entering the handler.

BR
  • Note: This was originally posted on 3rd June 2011 at http://forums.arm.com

    Hi Samsun,

    In most cases (usage fault, bus fault, etc.), the return address for a fault exception is the address of the faulting instruction itself.
    This allows the fault handler to locate the exact cause of the fault.

    The exception to this behaviour is the imprecise bus fault, where the return address is after the faulting instruction (possibly offset by a few instructions).

    In most cases, the Memory Management Fault is used in systems with an embedded OS.  The OS receives the fault information and can then decide whether to terminate the task that caused the fault or, in some cases, adjust the MPU settings and resume the task from the faulting instruction.  It is not normal for the fault handler to return to the faulting task and simply skip the offending instruction, as the program might not work correctly when an instruction is skipped.

    For example, the OS might have allocated several KB of stack for a task and programmed the MPU accordingly. During operation, the task might need more memory. When this happens, the task tries to access a memory location beyond the programmed MPU region and triggers the Memory Management Fault. The OS then looks at the cause of the fault and, if there is enough memory, it might decide to change the MPU setting for this task to give it more memory and return to the faulting instruction.  The task can then continue as if nothing had happened, and no instruction is skipped.

    Of course, the OS could also decide to terminate the task.
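
    The grow-or-terminate decision described above could be sketched roughly as follows. This is only an illustration under several assumptions: the names memmanage_decide, task_stack_t and mm_action_t are hypothetical, the stack is assumed to grow downward from a fixed top address, and growing is modelled as doubling the region size (ARMv7-M MPU regions are a power of two in size and aligned to their size). The CFSR and MMFAR values are passed in as parameters so the logic can be exercised off-target; a real handler would read them from the System Control Block and then reprogram the MPU.

```c
#include <stdint.h>

/* MMFSR bits as they appear in the low byte of CFSR. */
#define CFSR_DACCVIOL   (1u << 1)  /* data access violation */
#define CFSR_MMARVALID  (1u << 7)  /* MMFAR holds the faulting address */

typedef enum { TASK_RESUME, TASK_TERMINATE } mm_action_t;  /* hypothetical */

typedef struct {                   /* hypothetical per-task bookkeeping */
    uint32_t top;       /* one past the highest stack address (fixed) */
    uint32_t size;      /* current region size in bytes, a power of two */
    uint32_t max_size;  /* largest size the OS is willing to grant */
} task_stack_t;

/* Decide whether the task can be resumed at the faulting instruction
 * after doubling its stack region, or must be terminated. */
mm_action_t memmanage_decide(uint32_t cfsr, uint32_t mmfar, task_stack_t *stk)
{
    /* Only a data access violation with a valid address is recoverable here. */
    if (!(cfsr & CFSR_DACCVIOL) || !(cfsr & CFSR_MMARVALID))
        return TASK_TERMINATE;

    uint32_t new_size = stk->size * 2u;
    if (new_size == 0u || new_size > stk->max_size || new_size > stk->top)
        return TASK_TERMINATE;             /* no more memory to give */

    uint32_t old_base = stk->top - stk->size;
    uint32_t new_base = stk->top - new_size;
    if (new_base % new_size != 0u)
        return TASK_TERMINATE;             /* region would be misaligned */

    /* The access must land in the newly granted part of the region. */
    if (mmfar < new_base || mmfar >= old_base)
        return TASK_TERMINATE;

    stk->size = new_size;
    /* A real handler would now reprogram the MPU region (RBAR/RASR) and
     * clear the CFSR bits; omitted because it needs target hardware. */
    return TASK_RESUME;
}
```

    If the function returns TASK_RESUME, the handler simply returns and the faulting instruction is retried against the enlarged region. In every other case the OS would remove the task rather than return to it.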

    Hope this helps.
    regards,
    Joseph
  • Joseph,

    Thanks for the excellent book "The Definitive Guide to the ARM Cortex-M3", which is very informative.

    As for the bus fault, I am a little surprised to learn that the MM bus fault is imprecise.

    I am trying to design a scheme for parity bit error detection and recovery. Since memory reads are precise on the Cortex-M3, is it feasible to use the precise exception to recover from a soft error?

    One scheme is to simply return from the exception handler; I assume the program returns to the instruction that caused the bus fault. If it's a soft error, the retried instruction should then complete successfully.

    Is there any way to skip the instruction that is causing the fault in the case of a hard error?

    Kevin

  • Hi Kevin,

    Memory Management Faults are precise (not imprecise as in your message). So if your fault handler simply returns, the same instruction will be retried.

    For parity bit errors / ECC, you can generate a bus fault when an error is detected. This triggers the bus fault exception (not the Memory Management Fault).

    If you have ECC in your system, you can design it to correct the error in the memory, so that when the handler returns, the read access is retried and hopefully gets the correct data the second time.

    A bus error response for write operations can be precise or imprecise, but normally you don't generate a parity error or ECC error during writes because the write data from the processor is assumed to be correct.

    It is in theory possible to design your handler to skip one instruction, but the handler will need to check first whether the bus fault is precise or imprecise, and also use software to decode whether the faulting instruction is 16-bit or 32-bit so that the return address can be calculated. The handler will need to extract the faulting program counter value from the stack frame (see the HardFault handler section). However, if the fault is caused by stack corruption, you might not be able to do that.
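
    The skip procedure described above might look roughly like this. It is a minimal sketch, not a recommended handler: the function names are hypothetical, the CFSR value, the stacked frame and the first halfword of the faulting instruction are passed in as parameters (a real handler would read CFSR from the System Control Block and fetch the halfword from the address in the stacked PC), and it does not adjust the IT-block state held in the stacked xPSR.

```c
#include <stdbool.h>
#include <stdint.h>

/* BFSR bits as they appear in CFSR (BFSR occupies CFSR bits [15:8]). */
#define CFSR_PRECISERR    (1u << 9)
#define CFSR_IMPRECISERR  (1u << 10)

/* Basic exception stack frame order: r0, r1, r2, r3, r12, lr, pc, xPSR. */
enum { FRAME_PC = 6 };

/* A Thumb instruction is 32 bits wide when bits [15:11] of its first
 * halfword are 0b11101, 0b11110 or 0b11111; otherwise it is 16 bits. */
unsigned thumb_insn_size(uint16_t first_halfword)
{
    return ((first_halfword & 0xF800u) >= 0xE800u) ? 4u : 2u;
}

/* Advance the stacked PC past the faulting instruction.
 * Returns false, leaving the frame untouched, unless the fault is a
 * precise bus fault, because only then does the stacked PC point at
 * the instruction that caused it. */
bool skip_precise_bus_fault(uint32_t cfsr, uint32_t frame[8],
                            uint16_t first_halfword)
{
    if (!(cfsr & CFSR_PRECISERR) || (cfsr & CFSR_IMPRECISERR))
        return false;

    frame[FRAME_PC] += thumb_insn_size(first_halfword);
    return true;
}
```

    For example, with a stacked PC of 0x08000100 and a first halfword of 0xF8D0 (a 32-bit load encoding), the stacked PC becomes 0x08000104, so the exception return resumes at the following instruction.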

    Then finally I would question if skipping a faulting instruction is right. If a bus fault is reported, skipping the faulting instruction doesn't mean the remaining part of the program can continue correctly. As you say, all read operations on Cortex-M3/M4 are precise so you can use bus fault to handle parity / ECC error.

    Hope this helps.

    regards,

    Joseph

  • Joseph,

    Thanks. I don't like the idea of skipping the faulty instruction, either.

    Suppose the M3 is handling multiple tasks in a queue and gets stuck (bus fault on a read access to system memory) within one task.

    Is it possible for the M3 to gracefully abort the remaining operations of the current task after a limited number of retries and move on to the next task in the queue? Or does the M3 have to be stalled until a system reset?

    Kevin

  • Hi Kevin,

    You can possibly implement this in your bus fault handler.

    For example, create a static variable to count the number of retries, and if it exceeds the limit, signal the embedded OS to kill the task. You might also be able to instruct the OS to restart the task, but that depends on the OS you use. Or potentially move the task into a suspended queue.

    But whether the system can still function correctly after the task is killed and restarted is application dependent.
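
    One possible shape for such a retry counter is sketched below. The names bus_fault_policy and MAX_RETRIES and the limit of 3 are assumptions made for illustration, and the faulting PC is passed in as a parameter; in a real handler it would come from the stacked frame, and the "kill" result would be turned into whatever call your OS provides.

```c
#include <stdint.h>

#define MAX_RETRIES 3u  /* hypothetical limit, tune for the application */

typedef enum { FAULT_RETRY, FAULT_KILL_TASK } fault_action_t;

/* Count consecutive bus faults at the same stacked PC. The first
 * MAX_RETRIES faults at an address ask for a retry (the handler just
 * returns); the next one asks the OS to kill the task. */
fault_action_t bus_fault_policy(uint32_t faulting_pc)
{
    static uint32_t last_pc;  /* PC of the most recent fault */
    static unsigned count;    /* consecutive faults seen at last_pc */

    if (count != 0u && faulting_pc == last_pc) {
        count++;
    } else {
        last_pc = faulting_pc;  /* new fault site: restart the count */
        count = 1u;
    }

    if (count > MAX_RETRIES) {
        count = 0u;             /* start fresh for the next task */
        return FAULT_KILL_TASK;
    }
    return FAULT_RETRY;
}
```

    Note that this simple version never forgets a fault site until a different one is seen, so code that faults occasionally at the same address over a long run would eventually hit the limit. A real system would probably also clear the count on a task switch or after some time.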

    Regards,

    Joseph
