Here is a link to a number of suggestions I have compiled for hardening of firmware.
I'm pretty sure that a lot can be said about the list, so please post coding tips or links to pages with good information on software hardening.
iapetus.neab.net/.../hardening.html
Just an addendum about the use of an MMU.
Running separate tasks protected from each other is a great way of getting separation - one task can't overwrite the data of another task.
But this is not the same as catching a buffer overflow.
Guard pages can catch an out-of-bounds access. But the MMU normally works with pages that may be 4 or 8 kB in size.
An invalid access that is more than one page outside the array may skip the guard page and access other variables owned by the same task, without this access being caught. Only an explicit range test can catch this.
And an array that does not completely fill a number of memory pages will have an unprotected zone between the end of the array and the guard page. Testing of the code may conclude that no guard-page access happens, and fail to notice that the array had an off-by-one access (possibly a read of random data, possibly a write of data to a location that will not at a later time be copied to EEPROM for non-volatile storage).
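To make the "explicit range test" concrete, here is a minimal sketch of a checked array accessor. The names (samples, SAMPLE_COUNT, sample_write, sample_read) are made up for illustration and do not come from the linked list. The point is only that the test catches both the off-by-one access and the access that lands far beyond any guard page.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SAMPLE_COUNT 16u            /* hypothetical buffer size */

static uint16_t samples[SAMPLE_COUNT];

/* Store a value only if the index is inside the array.
 * Returns false for any out-of-range index, whether it is one element
 * or many pages outside, so the caller can log the error or recover. */
bool sample_write(size_t index, uint16_t value)
{
    if (index >= SAMPLE_COUNT) {
        return false;
    }
    samples[index] = value;
    return true;
}

/* Read a value with the same range test; also rejects a NULL output pointer. */
bool sample_read(size_t index, uint16_t *out)
{
    if (out == NULL || index >= SAMPLE_COUNT) {
        return false;
    }
    *out = samples[index];
    return true;
}
```

Since the index is unsigned, a single comparison is enough; a negative value converted to size_t becomes a very large number and is rejected as well.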
In the end, an MMU is very valuable but should not be seen as a magic solution to catching problems. And on the Keil forum, most users don't even have an MMU to activate.
The first and most important line of defense is the developer - making sure every line of code is well designed, and running on a sound hardware design.
The second line of defense is defensive programming, where the code contains guard clauses to catch invalid states, out-of-range values, ...
An MMU would only form a third line of defense. When the MMU catches an error, the problematic task has probably already done a lot of mischief.
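As a rough illustration of the second line of defense, the sketch below uses guard clauses to reject an out-of-range value and to catch an invalid state. The motor example, the type names and the 3000 rpm limit are all invented for this post; the pattern is what matters.

```c
#include <stdint.h>

/* Hypothetical states for illustration only. */
typedef enum {
    MOTOR_STOPPED = 0,
    MOTOR_STARTING,
    MOTOR_RUNNING,
    MOTOR_FAULT
} motor_state_t;

#define SPEED_MAX_RPM 3000u         /* assumed limit, not from any datasheet */

typedef struct {
    motor_state_t state;
    uint16_t      speed_rpm;
} motor_t;

/* Returns 0 on success, -1 if a guard clause rejected the request. */
int motor_set_speed(motor_t *m, uint16_t rpm)
{
    if (m == 0) {
        return -1;                  /* guard: missing object */
    }
    if (rpm > SPEED_MAX_RPM) {
        return -1;                  /* guard: out-of-range value */
    }

    switch (m->state) {
    case MOTOR_RUNNING:
        m->speed_rpm = rpm;
        return 0;
    case MOTOR_STOPPED:
    case MOTOR_STARTING:
    case MOTOR_FAULT:
        return -1;                  /* valid state, but no speed change allowed */
    default:
        /* The state holds a value that is not in the enum: a bug or
         * corrupted memory. Force a safe state instead of carrying on. */
        m->state = MOTOR_FAULT;
        m->speed_rpm = 0;
        return -1;
    }
}
```

The default branch costs a few bytes of code, but it turns a silent corruption into a defined, safe reaction at the point where it is first seen.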
Mmmmm.....
In the introduction I wrote, I mentioned bit errors. So some people may ask: why don't we design very good hardware that protects the system from bit errors? If there are no bit errors, the software doesn't need to handle these issues.
That is possible, but it requires extra circuitry and logic, which means extra cost. These are usually checks that are better done in software, which is also easier to change in the face of changing requirements.
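One common way to do such a check in software is to keep an inverted copy next to a critical variable and compare the two on every read. The sketch below assumes a 32-bit value; guarded_u32_t, guarded_set and guarded_get are names made up here. It detects a flipped bit in either copy, but it cannot correct it.

```c
#include <stdbool.h>
#include <stdint.h>

/* A critical value stored together with its bitwise inverse. */
typedef struct {
    uint32_t value;
    uint32_t inverse;               /* equals ~value while the pair is intact */
} guarded_u32_t;

/* Write both copies. */
void guarded_set(guarded_u32_t *g, uint32_t v)
{
    g->value   = v;
    g->inverse = ~v;
}

/* Read the value only if the two copies still agree.
 * Returns false when a bit error (or a stray write) has damaged either copy;
 * *out is left untouched in that case. */
bool guarded_get(const guarded_u32_t *g, uint32_t *out)
{
    if ((g->value ^ g->inverse) != 0xFFFFFFFFu) {
        return false;
    }
    *out = g->value;
    return true;
}
```

In real firmware the fields may need to be volatile, or the accessors kept out of line, so the compiler does not optimize the comparison away. What to do on a failed read (reload from non-volatile storage, go to a safe state, reboot) depends on the application.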
It isn't uncommon to use ECC to protect internal memory. Most flash memory has ECC, and some RAM has it. But think about all the processors with external memory interfaces without ECC. Small microcontrollers normally have all memory internally and no MMU. Most processors with an MMU require external memory expansions, and most of them do not have an ECC-protected memory interface.
Life can be fun if you happen to get your equipment installed close to a contactor handling hundreds of amperes. With limited distances to other unknown installations, you don't need a war to have industrial equipment suffer very rough EMI abuse. Think what would happen if your equipment is controlling that big contactor, and the contactor has a failed spark suppressor.
Some certifications may require the equipment to be fully operating when hit by ESD or strong electrical or magnetic fields. Some certifications accept a controlled reboot. Some certifications require the rebooted unit to return to the previous state.
To assume (ass-u-me) that the hardware can handle all cases can be a bit premature. If you fail the certification or a pre-compliance test, you will have to redesign the hardware and/or firmware. But a redesign takes a lot of time, and requires a new set of tests before a new certification run.
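For the "return to the previous state" case, one possible approach is to keep a small snapshot of the operating state in memory that survives the reboot, protected by a magic number and a checksum, and to trust it only if both match. The sketch below is just that idea in plain C. The snapshot_t layout, the field names and the magic value are invented, and how the slot is placed in a no-init RAM section or in non-volatile memory is toolchain-specific and not shown.

```c
#include <stdbool.h>
#include <stdint.h>

#define SNAPSHOT_MAGIC 0x5AFEC0DEu  /* arbitrary marker chosen for this sketch */

/* Hypothetical operating state to restore after a controlled reboot. */
typedef struct {
    uint32_t magic;
    uint8_t  output_on;             /* 0 or 1 */
    uint16_t setpoint;
    uint32_t check;
} snapshot_t;

/* Simple check word computed over the fields, not over raw bytes,
 * so struct padding does not matter. A CRC would be stronger. */
static uint32_t snapshot_check(const snapshot_t *s)
{
    return ~(s->magic ^ ((uint32_t)s->output_on << 16) ^ (uint32_t)s->setpoint);
}

/* Record the current state; call whenever the state changes. */
void snapshot_save(snapshot_t *slot, uint8_t output_on, uint16_t setpoint)
{
    slot->magic     = SNAPSHOT_MAGIC;
    slot->output_on = output_on;
    slot->setpoint  = setpoint;
    slot->check     = snapshot_check(slot);
}

/* After reboot: restore only if the snapshot is intact and plausible.
 * Returns false (outputs untouched) so the caller can fall back to defaults. */
bool snapshot_restore(const snapshot_t *slot, uint8_t *output_on, uint16_t *setpoint)
{
    if (slot->magic != SNAPSHOT_MAGIC) {
        return false;               /* never written, or wiped by power loss */
    }
    if (slot->check != snapshot_check(slot)) {
        return false;               /* damaged while the unit was disturbed */
    }
    if (slot->output_on > 1u) {
        return false;               /* out-of-range value despite matching check */
    }
    *output_on = slot->output_on;
    *setpoint  = slot->setpoint;
    return true;
}
```

Whether a failed restore should mean "start with safe defaults" or "refuse to start" is a decision for the application and the certification in question.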
The important thing to remember is that defensive programming does not take extra time. On the contrary. It normally speeds up the debugging process by catching problems early, and sometimes even pinpointing the root cause.
yes yes yes yes and once more - yes.
Doesn't the reliability of a system then depend heavily on defensive code having been written everywhere? Will the system fail if some programmer forgot to put this code somewhere?
At least, the compiler team should guarantee that the stack is put into ECC memory. Then programmers only have to take care of static local variables and global variables. Otherwise, programmers would have to test every variable, one by one, everywhere. Eventually, you will not be able to find people who want to maintain the source code.