Hardening of firmware

Here is a link to a number of suggestions I have compiled for hardening of firmware.

I'm pretty sure that a lot can be said about the list, so please post coding tips or links to pages with good information on software hardening.

iapetus.neab.net/.../hardening.html

  • The greenhouse effect, climate change, and extreme weather have an impact on human society. In the old days, the weather changed mainly with the seasons; now it changes very quickly and very often, and the temperature difference within a single day can sometimes be 10 degrees Celsius. I believe this could make the established reliability test conditions insignificant. I am curious: has any amendment been made to the existing reliability test standards, for example MIL-STD?

  • MIL-STD-883 does contain thermal shock testing. I guess that is what you are getting at (?). [I don't remember what the actual specs are, and don't have it handy here, but this is from my memory of it].

    Think about it for a second, the aircraft/missile/thong/whatever leaves the earth at a balmy 74 degrees F, heats up during the flight, and/or cools off to the high-altitude temperatures. Then suddenly descends back to earth again. Lands, and all is good.

    It does this within minutes, or seconds.

    So a "sudden" 10 deg C per-day change due to the global thermal cycles isn't going to stress out these widgets that are MIL/DoD rated.

    Net effect: I don't think there is a specific revision being made due to any day-to-day temperature change. It's already in there, with +120 to -40 to +120 to -40 deg cycle changes within minutes, not days.

    --Cpt. Vince Foster
    2nd Cannon Place
    Fort Marcy Park, VA

  • "Normal" automotive environment tests are also way harder than anything nature can manage (unless you temperature-cycle your equipment near a volcano or have it hit by lightning).

    For automotive use, you may go from room temperature in a garage to either extreme cold or extreme heat within minutes. And while a rocket may have to suffer a single heat cycle, and high-end jets get picked to pieces quite regularly, the electronics of your car will be virtually untouched until you scrap the car or the electronics fail and have to be replaced.

    You don't have to limit yourself to just the engine electronics. Think about the standard car stereo: maybe a pleasant +22°C when you drive the car, and then down to -30°C or up to +90°C when you leave the car outside in the winter or in direct sunlight. Most companies that work with this kind of equipment have quite impressive climate chambers for cycling temperature, moisture, ... and the tests are done with quite big and quite fast cycles, unless they are long-time tests at one of the extremes.

  • Hi Cpt. Vince and Per,

    Many thanks for your explanations. They are very logical.

    Due to my limited English ability and professional knowledge, I only mentioned the temperature factor, but in fact other factors should be considered as well. I did some Google searching and found that what I had in mind is closer to aging tests. (It appears that the American Society for Testing and Materials holds a lot of data on aging tests.) It is hard to say whether global climate change has an impact on the aging of materials, but it does have an impact on human health.

  • I am not sure what you mean exactly, but of course there is the issue of data retention in non-volatile memory such as EEPROMs. The effect of aging on data stored in them is normally very well documented in the respective data sheets.
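
On the firmware side, one common way to notice retention failures is to store a checksum with each non-volatile record and to keep a second copy. The following is only a minimal sketch of that idea, not something taken from the linked list: `nv_record_t`, `nv_record_seal`, `nv_record_is_valid`, `nv_pick_valid` and the 8-byte payload size are hypothetical names and values chosen for illustration, and the actual EEPROM read/write routines are left out because they depend on the part.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NV_PAYLOAD_SIZE 8u  /* hypothetical payload size */

/* Hypothetical layout of one record as stored in EEPROM. */
typedef struct {
    uint8_t  payload[NV_PAYLOAD_SIZE];
    uint16_t crc;
} nv_record_t;

/* CRC-16/CCITT-FALSE: polynomial 0x1021, initial value 0xFFFF, no reflection. */
static uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFFu;

    assert(data != NULL || len == 0u);
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x1021u)
                                  : (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* Compute and store the CRC before the record is written out. */
static void nv_record_seal(nv_record_t *rec)
{
    rec->crc = crc16_ccitt(rec->payload, NV_PAYLOAD_SIZE);
}

/* Returns nonzero if the stored CRC still matches the payload. */
static int nv_record_is_valid(const nv_record_t *rec)
{
    return rec->crc == crc16_ccitt(rec->payload, NV_PAYLOAD_SIZE);
}

/* Prefer the primary copy, fall back to the backup, or return NULL
 * when both copies are corrupt and the caller must use defaults. */
static const nv_record_t *nv_pick_valid(const nv_record_t *primary,
                                        const nv_record_t *backup)
{
    if (nv_record_is_valid(primary)) {
        return primary;
    }
    if (nv_record_is_valid(backup)) {
        return backup;
    }
    return NULL;
}
```

A CRC only detects the corruption. What to do when both copies fail (factory defaults, refusing to start, logging a service code) is a separate design decision.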

  • Regarding: Materials Aging

    Another thought ...

    I have not seen any mention of firmware ageing.

    Do loops get slower as they get older?

  • No, the loops don't get slower. But it is a well known fact that the quality of the compiled binaries or untouched source code degrades with time.

    Code that has worked for years will suddenly start to misbehave. Code that has passed validation tests will suddenly stop passing them, so the age factor is important.

    That is why so many developers are so very scared of inheriting old and trusted code. The manager says: "You don't have to worry - we haven't needed to release an update in five years." The next week you get two customers with problems. Within a couple of months, a significant percentage of customers have a problem. And despite not having touched the code, the new developer gets the blame, since the problems started after he took over the responsibility. Now is the time for the poor *** to find out that the original compiler was not stored in the source code repository, or has a license registration method that makes it impossible to reinstall, and that the last code changes five years ago somehow weren't committed...

  • An aging plastic IC package might crack and lead to open or short circuits, I guess.

  • There are a huge number of problems you can get with hardware.

    - Oxidation of socketed components or connectors - in some cases melt-downs because the contact resistance gets too high.
    - Wet capacitors drying out (normally from high temperature).
    - Tantalum capacitors exploding because they have been run out of spec.
    - Metal fatigue in the bond wires inside the chips.
    - Electromigration in chips, power transistors or switching regulators because they have run at high currents and high temperature for a long time.
    - Solder joint whiskers.
    - Metal fatigue in solder joints.
    - Factories that haven't baked components, so that moisture cracks the chip.
    - ESD damage (damage done in the factory can take months or years to show up as a failure).
    - Damaged conformal coating, resulting in leakage currents or possibly PCB traces corroding until they break.
    ...

    The problem is to decide which hardware failures should be detectable and what work-arounds there should be in the firmware. Is it enough to warn about a problem, or is the failure critical, requiring the unit to "brick" itself? Should there be redundant hardware? What is the probability of producing incorrect results? What will happen if incorrect results are produced? What will happen if no results at all are produced? What is required by the certification?
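
One way to make those decisions explicit in firmware is to keep them in a single table that maps each detectable fault to a reaction, so the policy can be reviewed separately from the detection code. The sketch below is only an illustration under assumptions: the fault names, the action levels and the mapping between them are all hypothetical, and a real table would come out of the hazard analysis and the certification requirements for the product.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical faults the firmware is able to detect. */
typedef enum {
    FAULT_NONE = 0,
    FAULT_SENSOR_OUT_OF_RANGE,
    FAULT_NV_CRC_MISMATCH,
    FAULT_SUPPLY_LOW,
    FAULT_RAM_TEST_FAILED,
    FAULT_COUNT
} fault_t;

/* Hypothetical reactions, ordered from least to most severe. */
typedef enum {
    ACTION_CONTINUE = 0,  /* nothing to do */
    ACTION_WARN,          /* report, keep running normally */
    ACTION_DEGRADE,       /* keep running with reduced function */
    ACTION_SAFE_HALT      /* enter the safe state and stay there */
} action_t;

/* Illustrative policy only; the real mapping is a product decision. */
static const action_t fault_policy[FAULT_COUNT] = {
    [FAULT_NONE]                = ACTION_CONTINUE,
    [FAULT_SENSOR_OUT_OF_RANGE] = ACTION_WARN,
    [FAULT_NV_CRC_MISMATCH]     = ACTION_DEGRADE,
    [FAULT_SUPPLY_LOW]          = ACTION_DEGRADE,
    [FAULT_RAM_TEST_FAILED]     = ACTION_SAFE_HALT
};

/* Look up the reaction; an unknown fault code is treated as critical. */
static action_t fault_action(fault_t fault)
{
    if ((unsigned)fault >= (unsigned)FAULT_COUNT) {
        return ACTION_SAFE_HALT;
    }
    return fault_policy[fault];
}

/* With several faults active, the most severe reaction wins. */
static action_t worst_action(const fault_t *faults, size_t count)
{
    action_t worst = ACTION_CONTINUE;

    assert(faults != NULL || count == 0u);
    for (size_t i = 0; i < count; i++) {
        action_t a = fault_action(faults[i]);
        if (a > worst) {
            worst = a;
        }
    }
    return worst;
}
```

Treating an unknown fault code as critical is a deliberately conservative choice in this sketch. For a unit where "no results at all" is worse than "possibly wrong results", the opposite default might be the right one.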