Space radiation can cause several kinds of problems in an IC, which manifest as latch-up or Single Event Upsets (SEUs). A chip can be immunized against latch-up by hardening it with a process enhancement, but the right way to address SEUs depends on the type of circuit that the particle strikes.
Engineers tend to worry more about SEUs that affect logic circuits than about those that affect memory cells, because conventional wisdom holds that Error Correcting Code mechanisms can take care of any flipped bits in memory arrays. But be careful: before you are satisfied that your memory array can withstand radiation strikes, it is important to understand how well a given Error Correcting Code system actually works.
Error Correcting Code (ECC) memories can detect a flipped memory bit and correct it, but there are often serious limitations. The first limitation is the word length that the ECC system operates on. The Error Detection & Correction (EDAC) sub-system of the VA10820 ARM® Cortex®-M0 microcontroller implements a Hamming code based solution that detects two errors and corrects one per byte. This means that a 32-bit word can contain up to four flipped bits, one in each byte, and the microcontroller will still operate normally. This level of protection is unusually high, even for hi-rel components.
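To make the per-byte idea concrete, here is a minimal sketch in Python. It assumes a textbook extended Hamming (13,8) code: 8 data bits, 4 Hamming check bits and 1 overall parity bit. The source does not describe the actual bit layout of the VA10820 EDAC, so the layout and the names `encode`, `decode`, `encode_word` and `decode_word` are illustrative only.

```python
# Assumed layout: bit 0 is the overall parity bit, positions 1, 2, 4 and 8
# hold the Hamming check bits, and the remaining positions hold data.
DATA_POSITIONS = [3, 5, 6, 7, 9, 10, 11, 12]
PARITY_POSITIONS = [1, 2, 4, 8]


def encode(byte):
    """Return a 13-bit code word (as an int) protecting one 8-bit value."""
    bits = [0] * 13
    for i, pos in enumerate(DATA_POSITIONS):
        bits[pos] = (byte >> i) & 1
    for p in PARITY_POSITIONS:
        # Check bit p covers every position whose index has bit p set.
        bits[p] = sum(bits[pos] for pos in range(1, 13) if pos & p and pos != p) % 2
    bits[0] = sum(bits[1:]) % 2  # overall parity enables double-error detection
    return sum(b << i for i, b in enumerate(bits))


def decode(word):
    """Return (byte, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    bits = [(word >> i) & 1 for i in range(13)]
    syndrome = 0
    for pos in range(1, 13):
        if bits[pos]:
            syndrome ^= pos  # XOR of set positions is 0 for a valid word
    overall = sum(bits) % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1 and syndrome <= 12:
        # Odd parity: treat as a single flip located at index `syndrome`
        # (index 0 means the overall parity bit itself was hit).
        bits[syndrome] ^= 1
        status = "corrected"
    else:
        # Even parity with a non-zero syndrome indicates two flipped bits.
        status = "uncorrectable"
    byte = sum(bits[pos] << i for i, pos in enumerate(DATA_POSITIONS))
    return byte, status


def encode_word(word32):
    """Protect each byte of a 32-bit word with its own code word."""
    return [encode((word32 >> (8 * i)) & 0xFF) for i in range(4)]


def decode_word(codes):
    """Decode four code words back into a 32-bit value plus per-byte status."""
    value, statuses = 0, []
    for i, code in enumerate(codes):
        byte, status = decode(code)
        value |= byte << (8 * i)
        statuses.append(status)
    return value, statuses
```

Flipping one bit in each of the four code words returned by `encode_word` still decodes to the original 32-bit value, with every byte reported as `corrected`. Flipping two bits inside the same code word is reported as `uncorrectable`, which is why the number of bits covered by each code word matters.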
Another limitation of conventional ECC memory systems is their handling of more than one flipped bit, particularly when the flipped bits sit in the same physical location on the die. There are two ways of protecting against this phenomenon. The first is to implement ECC on a small word size, such as the byte-sized ECC protection on the VA10820. The second is to lay out the memory cells so that bits in the same data word are spaced widely apart, which reduces the risk of ever having two flipped bits in the same byte. As you may have suspected, the memory array layout on the VA10820 has been designed to ensure that a particle strike cannot easily ‘take-out’ two bits of the same word, regardless of the angle of incidence of the particle strike.
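The effect of spacing bits apart can be illustrated with a simple one-dimensional column-interleaving model. This is a generic sketch, not the actual VA10820 array layout, which the source does not describe. The functions `physical_to_logical` and `worst_hits_per_word` and the parameter `words_per_row` are hypothetical names chosen for the example.

```python
from collections import Counter


def physical_to_logical(column, words_per_row):
    """Map a physical column to (word index, bit index).

    With interleaving, neighbouring columns belong to different words, so two
    bits of the same word sit `words_per_row` columns apart.
    """
    return column % words_per_row, column // words_per_row


def worst_hits_per_word(cluster_start, cluster_size, words_per_row):
    """Largest number of bits that any single word loses when a strike upsets
    `cluster_size` adjacent columns starting at `cluster_start`."""
    hits = Counter(
        physical_to_logical(column, words_per_row)[0]
        for column in range(cluster_start, cluster_start + cluster_size)
    )
    return max(hits.values())
```

In this model, a strike that upsets four adjacent cells flips at most one bit per word when eight words are interleaved (`words_per_row=8`), so a single-error-correcting code can repair every affected word. With no interleaving (`words_per_row=1`), all four flips land in the same word. A real layout also has to consider the second dimension of the array and strikes at a shallow angle, which this sketch ignores.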
Yet another limitation of most ECC systems is how they deal with accumulated errors. A typical ECC system checks a memory word only when the CPU reads it. The danger of this approach is that particle strikes can flip bits in areas of the memory array that the CPU does not fetch regularly. Those errors go unnoticed and accumulate, which increases the likelihood that a word will collect more than one bit error and become uncorrectable. The VA10820 addresses this limitation with a ‘Scrub Engine’. The Scrub Engine operates independently of the ECC system and runs in the background of regular CPU activity, periodically examining the contents of each memory location and correcting any bit-flip errors. This prevents errors from building up and so reduces the possibility of an uncorrectable double-bit error. The scrub frequency can be adjusted so that a full memory scrub runs often enough to be effective for the radiation conditions of the environment at any given time. A recommended approach is to measure the number of errors that the EDAC system encounters and use that information to set the scrub rate to a reasonable level.
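A rough model shows why the scrub interval matters and how an observed error count could feed back into it. The sketch below assumes independent bit upsets at a constant rate and uses a leading-order approximation that only holds when upsets are rare within one interval. The 13-bit code word (8 data bits plus 5 check bits) and all function names are illustrative assumptions, not VA10820 register settings or a VORAGO-supplied algorithm.

```python
import math


def double_error_probability(rate_per_bit_day, interval_days, bits_per_word=13):
    """Approximate chance that two bits of one code word flip between visits.

    Leading-order estimate C(n, 2) * (rate * interval)^2, valid only while
    bits_per_word * rate * interval is much smaller than 1.
    """
    p = rate_per_bit_day * interval_days
    return math.comb(bits_per_word, 2) * p * p


def estimate_rate(corrected_errors, total_bits, elapsed_days):
    """Estimate the upset rate per bit-day from the corrected-error count."""
    return corrected_errors / (total_bits * elapsed_days)


def scrub_interval_for(target_probability, rate_per_bit_day, bits_per_word=13):
    """Longest interval (in days) that keeps the per-word, per-interval
    double-error probability at or below the target, in the same model."""
    pairs = math.comb(bits_per_word, 2)
    return math.sqrt(target_probability / pairs) / rate_per_bit_day
```

In this model the double-error probability per interval grows with the square of the time between visits, so halving the scrub interval cuts it by a factor of four. A monitoring loop could call `estimate_rate` with the number of corrections logged over a known period and pass the result to `scrub_interval_for` to choose a new interval when the radiation environment changes.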
VORAGO has determined that the Soft Error Rate (SER), evaluated for geosynchronous orbit at solar minimum with 100 mils of aluminum shielding, improves significantly with the combination of the EDAC and Scrub Engine on the VA10820: from 1.3 × 10^-7 to 1.0 × 10^-15 errors per bit-day. This combination therefore makes a radiation-induced memory problem extremely unlikely.
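To put errors per bit-day into perspective, the figures above can be scaled to a whole memory and a mission lifetime. The memory size and mission length below are hypothetical values chosen for illustration; only the two SER figures come from the text.

```python
def expected_errors(ser_per_bit_day, memory_bits, mission_days):
    """Expected number of soft errors = rate x number of bits x number of days."""
    return ser_per_bit_day * memory_bits * mission_days


SER_UNPROTECTED = 1.3e-7      # errors per bit-day, from the text
SER_PROTECTED = 1.0e-15       # errors per bit-day with EDAC and scrubbing, from the text
MEMORY_BITS = 32 * 1024 * 8   # hypothetical 32 KiB array (assumption)
MISSION_DAYS = 10 * 365       # hypothetical 10-year mission (assumption)

raw = expected_errors(SER_UNPROTECTED, MEMORY_BITS, MISSION_DAYS)
protected = expected_errors(SER_PROTECTED, MEMORY_BITS, MISSION_DAYS)
print(f"unprotected: {raw:.1f} expected errors over the mission")
print(f"protected:   {protected:.2e} expected errors over the mission")
```

Under these assumptions the unprotected array would expect on the order of a hundred upsets over the mission, while the protected figure corresponds to roughly a one-in-a-million chance of a single error over the same period.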
The bottom line is that not all ECC memory systems are created equal. When you see that ECC has been implemented on a chip, don’t treat it as a check-box item. Take a closer look to determine whether it is really adequate for the radiation environment that the device will be exposed to. Also be aware that particle strikes can wreak havoc with logic circuits as well as flip memory bits. The VA10820 has been designed to handle this with triple modular redundancy and other circuit design techniques... but that is a story for another day.