
Recovery from an illegal instruction (Undefined Instruction exception)

Hello,

Let's say my embedded software has been thoroughly tested, so there should be no Undefined Instruction exception ("illegal instruction") caused by an incorrect privilege level or by compilation issues.

Besides, the instruction is supposed to be correct at fetch time:

- the L1 instruction cache is protected by parity, and an error would have triggered a synchronous external abort.

- the L2 cache is protected by parity as well, and an error would have triggered a synchronous external abort.

- the DDR memory has SECDED ECC, and an uncorrectable ECC error would have triggered an AXI slave error, which in turn would have generated a synchronous external abort.

So the instruction can only become corrupted and illegal once it is already in the CPU pipeline. Looking at the pipeline structures in which a radiation-induced upset could cause such a corruption, I can think of:

- the re-order buffer (ROB);

- the pipeline registers (fetch - decode - execute);

- the centralized scheduler;

- the program counter register;

- the Global History Buffer (GHB);

- the Branch Target Address Cache (BTAC).

To recover from the illegal instruction, I assume that:

- the pipeline (including the ROB) shall be flushed, which I suppose happens when the Undefined Instruction exception is taken;

- the BTAC and GHB shall be invalidated.
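As a rough illustration of the invalidation step, here is a minimal C sketch. It assumes an ARMv7-A core (the GHB/BTAC naming suggests something like a Cortex-A9) and code running at PL1, e.g. inside the Undefined Instruction handler. BPIALL (`MCR p15, 0, <Rt>, c7, c5, 6`) invalidates all branch predictor entries, and the DSB + ISB pair waits for the maintenance to complete and then flushes the pipeline so that later instructions are re-fetched. The function name is hypothetical, and on non-ARM targets the body compiles to nothing so the file can still be built and tested on a host.

```c
#include <stdint.h>

/* Hypothetical helper. ARMv7-A only, must execute at PL1 (for example
 * from the Undefined Instruction handler). On other targets it is a
 * no-op so the sketch still compiles on a development host. */
static void invalidate_branch_predictors(void)
{
#if defined(__arm__)
    const uint32_t zero = 0;
    /* BPIALL: invalidate all branch predictor entries (BTAC, GHB). */
    __asm__ volatile("mcr p15, 0, %0, c7, c5, 6" : : "r"(zero) : "memory");
    /* Wait for the maintenance operation to complete ... */
    __asm__ volatile("dsb" : : : "memory");
    /* ... then flush the pipeline so subsequent instructions are
     * fetched again with the invalidated predictor state. */
    __asm__ volatile("isb" : : : "memory");
#endif
}
```

Whether BPIALL reaches every predictor structure (and whether the L1 instruction cache should also be invalidated with ICIALLU before replaying) is implementation-specific, so it is worth checking the TRM of the actual core.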

After all that, I would like to replay the instruction by setting the return PC back to the address of the illegal instruction. From my understanding, the instruction would then be re-fetched from the L1 instruction cache (or beyond).
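A sketch of the return-address arithmetic, assuming ARMv7-A (AArch32): on an Undefined Instruction exception, LR_und holds the address of the faulting instruction + 4 in ARM state, or + 2 in Thumb state (also for 32-bit Thumb instructions), and SPSR_und.T (bit 5) tells which state the interrupted code was in. Returning to LR_und as-is skips the instruction; returning to the value below re-executes it. The function and macro names are hypothetical.

```c
#include <stdint.h>

/* SPSR.T (bit 5): interrupted code was executing in Thumb state. */
#define SPSR_T_BIT (1u << 5)

/* Hypothetical helper: compute the address to return to so that the
 * instruction that raised the Undefined Instruction exception is
 * fetched and executed again (ARMv7-A link offsets: +4 ARM, +2 Thumb). */
static uint32_t replay_address(uint32_t lr_und, uint32_t spsr_und)
{
    return (spsr_und & SPSR_T_BIT) ? lr_und - 2u : lr_und - 4u;
}
```

If the handler only ever interrupts ARM-state code, the same effect is usually obtained directly in assembly by returning with `SUBS PC, LR, #4`, which also restores CPSR from SPSR_und.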

Do you think this mitigation scheme is sufficient, or did I miss something?

Thanks a lot for any help.

Florian

  • As for SECDED ECC, it only guarantees detection of double-bit errors; errors affecting three or more bits may go undetected or even be miscorrected. So you cannot be sure that the opcode is still valid in DDR RAM, nor in any of the caches above it.

    So I think an illegal instruction exception indicates a severe problem, and I would not "just" try to continue.

    At the very least, I would record the faulting address to make sure you do not end up in an endless loop.

    Since this is a code area, it should be read-only, so you could also compute a hash (xxHash64, for example) or a CRC32 over it to detect multi-bit errors.
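A minimal C sketch of both suggestions, under stated assumptions: a hypothetical retry guard that allows only one replay per faulting address, and a table-free CRC-32 (the zlib/IEEE 802.3 variant, reflected polynomial 0xEDB88320) to check a read-only code region against a reference value assumed to be computed at build time. CRC-32 is used here only to keep the example dependency-free; xxHash64 would need the xxHash library. How the region's start, length, and reference value are obtained (e.g. from linker symbols) is left to the project.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (IEEE 802.3 / zlib, reflected polynomial 0xEDB88320).
 * Slow but needs no table; fine for an on-fault check of the code area. */
static uint32_t crc32_ieee(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Hypothetical check: 1 if the read-only code region still matches
 * the reference CRC computed at build time, 0 otherwise. */
static int code_region_intact(const uint8_t *start, size_t len,
                              uint32_t expected_crc)
{
    return crc32_ieee(start, len) == expected_crc;
}

/* Hypothetical retry guard: remember the most recent faulting address.
 * A second fault at the same address suggests persistent corruption
 * (e.g. in memory), so the handler should escalate instead of looping. */
static uint32_t last_fault_addr;
static int have_last_fault;

static int may_retry(uint32_t fault_addr)
{
    if (have_last_fault && fault_addr == last_fault_addr)
        return 0; /* same address faulted again: stop replaying */
    last_fault_addr = fault_addr;
    have_last_fault = 1;
    return 1;
}
```

In a handler, `may_retry()` would gate the replay, and a failed `code_region_intact()` check would be a reason to reload the image or reset rather than continue.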
