Hello All,
I am using a Cortex-M0 based controller and want to know whether the following issues can occur, and what possible software solutions there are to handle them:
1. The ALU producing wrong results at run time.
2. Register accesses returning wrong values at run time.
3. Memory accesses (RAM or ROM) returning wrong values at run time.
Thanks in advance!!
Hello,
such issues seem difficult to solve in software.
If you need high reliability, I would recommend adopting the lock-step feature of the Cortex-R series.
Best regards,
Yasuhiko Koumoto.
Thanks for the info.
Solving the issues is the next step.
First I want to know whether these issues can occur at all, and under what scenarios they can be reproduced (if they happen randomly).
Can they be 1. detected, 2. alarmed, 3. solved by some means using software algorithms or hardware?
For example, say an ALU fault can happen if the core supply drops to 0 or spikes high in a very short time (nanoseconds). Then this could be handled by:
- hardware, by avoiding sudden voltage drops or peaks;
- a software filtering algorithm.
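To make the "detect" part of the question concrete, here is a minimal sketch of the kind of software self-test that periodic firmware could run. All names are hypothetical, and the `volatile` operands only discourage the compiler from folding the arithmetic away at build time; a check like this can catch a stuck ALU or a bad scratch RAM cell, but not correct it.

```c
#include <stdint.h>

/* Hypothetical periodic self-test: recompute known results and compare.
 * volatile operands keep the compiler from constant-folding the checks. */
static volatile uint32_t op_a = 0xAAAAAAAAu;
static volatile uint32_t op_b = 0x55555555u;

/* Returns 0 on success, nonzero if the ALU gave an unexpected result. */
int alu_self_test(void)
{
    if ((op_a + op_b) != 0xFFFFFFFFu) return 1;  /* adder check   */
    if ((op_a ^ op_b) != 0xFFFFFFFFu) return 2;  /* XOR check     */
    if ((op_a & op_b) != 0x00000000u) return 3;  /* AND check     */
    if ((op_b << 1)   != 0xAAAAAAAAu) return 4;  /* shifter check */
    return 0;
}

/* A minimal RAM check on one scratch word: write, read back, invert. */
int ram_self_test(volatile uint32_t *scratch)
{
    *scratch = 0x5A5A5A5Au;
    if (*scratch != 0x5A5A5A5Au) return 1;
    *scratch = ~0x5A5A5A5Au;
    if (*scratch != ~0x5A5A5A5Au) return 2;
    return 0;
}
```

On failure the return code could be used to raise an alarm (e.g. log and reset), which addresses points 1 and 2 but not point 3.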
Thanks
Regards
I cannot say it would never happen, but it would be a very rare case.
Wrong ALU results, or wrong values read from registers or memories, would be caused by an unstable supply voltage or by interference such as radio waves.
Preventing them is the responsibility of logic outside the core (i.e. outside the Cortex-M0).
Are you in a situation where you can design the microcontroller into which the Cortex-M0 core is integrated?
If so, you can equip the memories with ECC or parity.
For the ALU or registers, you would have to implement such a mechanism inside the core itself.
However, I don't know whether that would be possible under a normal processor license.
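For readers unfamiliar with the parity idea mentioned above: real ECC/parity is implemented in hardware alongside the memory array, but the detection principle can be illustrated in software. This is a hypothetical emulation for illustration only, not the actual mechanism of any Cortex-M0 device.

```c
#include <stdint.h>

/* Hypothetical software emulation of a parity-protected memory word.
 * Real parity/ECC lives in hardware; this only shows the principle. */
typedef struct {
    uint32_t data;
    uint8_t  parity;   /* even parity over the 32 data bits */
} prot_word_t;

static uint8_t parity32(uint32_t v)
{
    /* Fold all 32 bits down to one parity bit. */
    v ^= v >> 16;
    v ^= v >> 8;
    v ^= v >> 4;
    v ^= v >> 2;
    v ^= v >> 1;
    return (uint8_t)(v & 1u);
}

void prot_store(prot_word_t *w, uint32_t value)
{
    w->data   = value;
    w->parity = parity32(value);
}

/* Returns 0 and writes *out on success; nonzero if the parity no longer
 * matches, i.e. an odd number of bits flipped since the store. */
int prot_load(const prot_word_t *w, uint32_t *out)
{
    if (parity32(w->data) != w->parity) return -1;
    *out = w->data;
    return 0;
}
```

Note that plain parity only detects single-bit (more generally, odd-count) flips; correcting them requires a full ECC code such as SECDED Hamming.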
Hello Yasuhiko,
Can an ECC mechanism also handle register-related issues, or only issues related to memory?
Shashi
Hello Shashi,
if you have an architecture license, then it might be possible to implement ECC for the registers. However, for that purpose triplicated flip-flops are normally used; that would probably be the usual implementation.
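The triplicated flip-flop approach mentioned above is triple modular redundancy (TMR): keep three copies of each state bit and majority-vote on every read, so any single upset is outvoted. In hardware this is done per flip-flop; the following is only a software analogue of the voting logic, with illustrative names.

```c
#include <stdint.h>

/* Software analogue of triple modular redundancy (TMR):
 * three copies of a word, with a bitwise 2-of-3 majority vote on read. */
typedef struct {
    uint32_t copy[3];
} tmr_word_t;

void tmr_store(tmr_word_t *w, uint32_t value)
{
    w->copy[0] = value;
    w->copy[1] = value;
    w->copy[2] = value;
}

/* Bitwise majority: each result bit is 1 iff at least two copies agree,
 * so corruption of any single copy is masked. */
uint32_t tmr_load(const tmr_word_t *w)
{
    uint32_t a = w->copy[0], b = w->copy[1], c = w->copy[2];
    return (a & b) | (a & c) | (b & c);
}
```

Unlike parity, this both detects and corrects a fault in one copy, which is why it suits registers; the cost is tripling the storage plus the voting logic.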
A Cortex-M0 is pretty tiny, and they tend to be built to a fairly conservative specification rather than pushing the limits, so they are very reliable. You've got to measure what your requirements are and what is actually likely - for instance, people are always drilling holes in walls - would that maybe destroy whatever it is? As for software solutions, it sounds like you want something like double-entry bookkeeping: eliminate most errors by having two processors solve the problem in completely different ways and checking whether the results match. This has been done in bigger computers, with fibre optics for the connections, but even separate teams still tend to code things in similar ways and make the same mistakes.
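The double-entry bookkeeping idea above can be sketched even on a single core: compute the same quantity by two deliberately different routines and flag a fault if they disagree. This is an illustrative sketch with hypothetical names; as the post notes, diversity of this kind reduces but does not eliminate common-mode errors.

```c
#include <stdint.h>

/* Illustrative diverse redundancy: compute a sum two different ways
 * and report a fault if the results disagree. Names are hypothetical. */

/* Method 1: straightforward forward-order summation. */
static uint32_t sum_forward(const uint32_t *buf, int n)
{
    uint32_t s = 0;
    for (int i = 0; i < n; i++)
        s += buf[i];
    return s;
}

/* Method 2: same mathematical result reached by a different code path
 * (reverse traversal), so a transient fault is less likely to corrupt
 * both computations identically. */
static uint32_t sum_reverse(const uint32_t *buf, int n)
{
    uint32_t s = 0;
    for (int i = n - 1; i >= 0; i--)
        s += buf[i];
    return s;
}

/* Returns 0 and writes *out if both methods agree, nonzero otherwise. */
int checked_sum(const uint32_t *buf, int n, uint32_t *out)
{
    uint32_t a = sum_forward(buf, n);
    uint32_t b = sum_reverse(buf, n);
    if (a != b)
        return -1;  /* disagreement: raise an alarm, retry, or reset */
    *out = a;
    return 0;
}
```

A mismatch here can only be detected, not arbitrated; with two results there is no majority, so the usual response is to retry or reset rather than pick one.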
Something like ECC logic on small internal blocks such as the registers is likely to add more gates than the logic it is trying to monitor. Given that the most likely cause of this type of random failure is either a power-supply glitch or a device on the threshold of being just within tolerance, any extra logic will be affected too, and adding more logic is statistically likely to make the problem worse, not better. "Failure monitoring" only really helps when it's monitoring quite big blocks ...