Hardening of firmware

Here is a link to a number of suggestions I have compiled for hardening of firmware.

I'm pretty sure that a lot can be said about the list, so please post coding tips or links to pages with good information on software hardening.

iapetus.neab.net/.../hardening.html

  • "Does it cover anything that's *not* blindingly obvious?"

    The text I wrote?
    Or the post from Cpt. Vince?
    Or something from Jack Ganssle's page?
    Or something else?

    What is blindingly obvious tends to vary quite a lot between people, which should be quite obvious if you look at the varying posts to this forum.

    Some think it is obvious to start with a requirements specification.

    Some think it is obvious to read up on a development tool before buying it.

    Some think it is obvious to read datasheets before selecting components.

    ...

    But despite everything obvious, people still manage to run off the road. Isn't the reason for checklists to help people remember all the obvious steps, so they don't try to land their plane without landing gear?

    What is obvious is a function of earlier training and past mistakes.

    But being obvious doesn't stop us from forgetting.

  • The text I wrote?
    Or the post from Cpt. Vince?
    Or something from Jack Ganssle's page?
    Or something else?

    The text I quoted. That is the usual procedure - quote text, post reply. Here's the text again:

    Much of this stuff is covered in a SQEM (Software Quality Engineering Manual) that your company had developed... typically as a result of Mil/Aero/FDA standards.

    If you are having any difficulty finding that text and the lengthy ramble that precedes it please follow the author's advice and scroll up.

    What is blindingly obvious tends to vary quite a lot between people, which should be quite obvious if you look at the varying posts to this forum.

    I would have hoped that anyone working on a project to "Mil/Aero/FDA stadards" would find the text preceding the part I quoted blindingly obvious.

    But then, I would have hoped that anyone working to those standards would be able to express a couple of simple points clearly and concisely.

  • Always so positive.

    And would you care to come up with an estimate of what percentage of the developers who frequent this forum do "Mil/Aero/FDA stadars [sic!]"?

    Are you sure that the rest of the visitors have no interest in, or need for, making their firmwares more robust?

    A big problem for a lot of developers is that there are large numbers of smaller companies that do not even have a Software Quality Engineering Manual, because management does not know anything at all about software development, and sometimes doesn't even know if the employed developers are top-notch or just about able to flash a diode.

    It can take a lot of work for one or a few developers to help themselves to some form of formalized development model, and also manage to get it established in the company. As captain Vince notes, a large part of the development is not the actual coding. How do you get management to realize that a project budget should include significant time for work spent before code start, and significant amount of time after you have a release candidate? And to allocate time/cost for development of test models suitable for the developed product?

    I like situations where I can release one new firmware version every 12-24 months, because a customer requests the addition of a function. But the joke I posted earlier about the development process is quite close to the real world, or we wouldn't laugh at it. Too many companies forget about the cost of maintaining a product that isn't properly designed, not to mention the loss of goodwill when customers constantly have to wait for the next maintenance release just to get an _almost_ working application.

    It is so easy to just say that something is obvious. But that is just a great way of being ignorant.

    In some situations what is obvious can be expressed very clearly and concisely. That is what you do in a checklist. But if you widen your view and realize that everything may not be obvious to every visitor on this forum, then you would note something else. Sometimes you also have to do a bit of selling of a concept, motivating things with examples. Here is a shocker: Most of the things written on this forum are not aimed specifically at your eyes. When you do see errors expressed, you do the forum a service by pointing them out. When you spend your time focusing on how the presentation affects you, people will quickly think: "Oh no, not again", and directly scroll to the next post.

  • And would you care to come up with an estimate of what percentage of the developers who frequent this forum do "Mil/Aero/FDA stadars [sic!]"?

    That's an easy one: none.

    Are you sure that the rest of the visitors have no interest in, or need for, making their firmwares more robust?

    This, along with the rest of your post, misses the point. Try re-reading my original post.

    It is so easy to just say that something is obvious. But that is just a great way of being ignorant.

    I'll take ignorance!

  • And would you care to come up with an estimate of what percentage of the developers who frequent this forum do "Mil/Aero/FDA stadars [sic!]"?

    That's an easy one: none.

    Please enlighten us, Mr. Sprat: How do you know that? Do you indeed carry the gift of telepathy (as Erik once suggested...) or did you use your amazing deduction skills to infer the above?

  • Please enlighten us, Mr. Sprat: How do you know that? Do you indeed carry the gift of telepathy (as Erik once suggested...) or did you use your amazing deduction skills to infer the above?

    The latter, Mr. Michael, the latter.

    I'm afraid that telepathy does not exist.

  • Hi Per,

    May I be allowed to translate a few small parts of your "Some concepts for hardening embedded software" into Chinese, and to post the translated parts with a very short Chinese introduction to a BBS forum in Taiwan (with a link to the original source and your name)? This is to introduce your document to my region.

  • LOL - the first actual comment about the page will be in a language I can't read :)

    Yes, you may translate the text. Name + link to the English text will be fine.

    I should update the page with some form of usage/license information to make it easier to make use of the text.

  • A Traditional Chinese Introduction to Per Westermark's "Some concepts for hardening embedded software"

    www.ptt.cc/.../M.1239466995.A.F65.html

  • Per,

    Thanks for an interesting article.

    John,

    Thanks for the translation. I hope to persuade my (Chinese) wife to read it. Then, maybe, she'll start to understand what my job is about.

    But at the moment, it's throwing up a 404 error :(

  • Which link is giving 404? Both my links are up and working - tested from remote proxy.

    And the link John Linq posted is also working ok.

  • It was John Linq's link that was giving the 404; but it's OK now.

    Thanks.

  • Just a link about problems with bad firmware:
    www.theinquirer.net/.../seagate-barracudas-7200-11-failing

    At least two, but probably three, TB+ disks failed so fast I didn't even have time to transfer the information to empty 1.5TB WD disks I already had lying around.

    These bright guys seem to have intentionally bricked the units to protect the hardware, while at the same time making it impossible to update to fixed firmware, and Seagate will charge full recovery fees for restoring the data from fully functioning hardware.

    I think it's time to update my backup program to not only count number of copies and geographic separation but also media brand/model.

    The ability to accept new firmware should be kept at almost any cost. Bricked units don't exactly help with the goodwill.

  • Some comments from Taiwan. (in Traditional Chinese)

    www.ptt.cc/.../M.1239467345.A.3A8.html

    Will try to persuade them to join the discussion here in KEIL's forum.

    1. sunneo says he implements this kind of hardening with an operating system and multi-layer ISRs. (I don't understand this.)

    2. tinlans says the quoted code is "unreachable code" to the compiler and will in most cases be removed when optimization is enabled. He suggests implementing this kind of hardening in hardware.

    for (idx = 0; idx < BUF_SIZE; idx++) {
        ...
    
        if (idx >= BUF_SIZE) {
            // loop variable has for some reason been corrupted. Take proper
            // action.
            perform_corrective_action();
    
        } else {
            buf[idx] = new_data;
        }
    }
    

  • Mmmmm.....

    My English ability and Technical skills are not good; hope my translation is not very incorrect/improper.

  • If you have a processor with MMU, then you can set up guard pages on either side of arrays, and have the processor generate an exception if your code tries to access any of these pages.

    This is similar to how most full-size operating systems (not RTOS) automatically grow stacks.

    But a very significant percentage of embedded equipment doesn't have the luxury of an MMU.
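
    On a hosted POSIX system the guard-page idea can be tried out from user space. The following is only a sketch under that assumption; `guarded_page_alloc` is a made-up name, and a bare-metal target with an MMU or MPU would program the page or region tables directly instead of calling `mmap`/`mprotect`.

    ```c
    #define _DEFAULT_SOURCE            /* asks glibc to expose MAP_ANONYMOUS */
    #include <stddef.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* Hypothetical helper: returns one read/write page that has an
     * inaccessible guard page directly before and after it, or NULL on
     * failure. Any access that runs off either end of the usable page
     * faults instead of silently corrupting a neighbour. */
    void *guarded_page_alloc(size_t *usable_size)
    {
        long page = sysconf(_SC_PAGESIZE);
        if (page <= 0) {
            return NULL;
        }
        size_t pg = (size_t)page;

        /* Reserve three pages with no access rights at all. */
        unsigned char *base = mmap(NULL, 3 * pg, PROT_NONE,
                                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (base == MAP_FAILED) {
            return NULL;
        }

        /* Open up only the middle page; the outer two stay as guards. */
        if (mprotect(base + pg, pg, PROT_READ | PROT_WRITE) != 0) {
            munmap(base, 3 * pg);
            return NULL;
        }

        *usable_size = pg;
        return base + pg;
    }
    ```

    The granularity is a whole page, so an array that does not end exactly on a page boundary still has unguarded slack after its last element.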

    The ability of the compiler to do dead-code elimination very much depends on the data declarations and the full contents of the loop. Testing an unsigned loop variable for a negative value can be trivially deduced to be meaningless by a compiler. Tests for upper bounds can be eliminated if the compiler can see that a write to the loop variable is followed by multiple identical tests, where one or more of the tests come after the break condition of the loop. The later tests for the same value are then expected to evaluate to the same result, in this case making the guarded code unreachable.

    This is a reason why a software design should avoid aliased accesses to variables, where two different pointers, or a pointer and a direct access, may modify the same variable: the compiler may decide that it knows the contents of a variable even after it has been modified. The program gets tested in a non-optimized debug build and then fails in a release build with full optimization, and then the compiler gets blamed.
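
    The post does not say which kind of aliasing it has in mind. One case where optimizers are known to take advantage is type punning through a cast pointer, which C's effective-type rules do not allow. A minimal sketch of the well-defined alternative, assuming a 32-bit IEEE 754 `float` (the function name `float_bits` is made up for illustration):

    ```c
    #include <stdint.h>
    #include <string.h>

    _Static_assert(sizeof(float) == sizeof(uint32_t),
                   "this sketch assumes a 32-bit float");

    /* Reads the bit pattern of a float without aliasing it through a
     * uint32_t pointer. Writing *(uint32_t *)&f instead would access a
     * float object through an incompatible type, and an optimizing
     * compiler is allowed to assume that never happens. */
    uint32_t float_bits(float f)
    {
        uint32_t u;
        memcpy(&u, &f, sizeof u);   /* byte copy: no aliasing assumption involved */
        return u;
    }
    ```

    Most compilers turn the fixed-size `memcpy` into a plain register move, so the safe form normally costs nothing.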

    This is also a reason why a lot of thought should be put into the use of the volatile keyword. It affects the compiler's ability to decide what is dead code, but will also make an aliased access take effect. A program with an aliasing bug may run perfectly because the compiler caches the relevant data in registers, but a trivial change to the code may exhaust the number of registers. A change of compiler version or compilation options may give the same result even without any code changes. The biggest disadvantage with volatile is of course the slowdown of the code and the increased load on the memory subsystem.
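
    Applied to the bounds-check loop quoted earlier in the thread, one way to keep the inner test from being treated as dead code is to make the loop variable volatile. This is a sketch only: `BUF_SIZE`, the fault counter and the body of `perform_corrective_action` are placeholders, not taken from the original article.

    ```c
    #include <stddef.h>
    #include <stdint.h>

    #define BUF_SIZE 16u                    /* assumed size, for illustration */

    static uint8_t  buf[BUF_SIZE];
    static unsigned corrective_actions;     /* hypothetical fault counter */

    static void perform_corrective_action(void)
    {
        corrective_actions++;               /* placeholder for real recovery */
    }

    /* Fills buf with new_data and returns the number of bytes written. */
    size_t fill_buffer(uint8_t new_data)
    {
        volatile size_t idx;    /* volatile: every read is a real memory load */
        size_t written = 0;

        for (idx = 0; idx < BUF_SIZE; idx++) {
            size_t i = idx;     /* second, independent read of idx */

            /* The compiler may not assume that this read returned the
             * same value the loop condition saw, so the test stays. */
            if (i >= BUF_SIZE) {
                perform_corrective_action();
                break;
            }
            buf[i] = new_data;
            written++;
        }
        return written;
    }
    ```

    The price is the one named above: every access to `idx` goes to memory, so this is better reserved for loops where the extra check is worth the cycles.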

    Not sure about your multi-layer interrupts, but an ISR is expected to be short and fast, so it should not contain any delays or big code constructs. If an interrupt requires a lot of work to be done, then you normally let the ISR trigger an event and have either an RTOS task or possibly a lower-priority interrupt that allows nesting perform the actual work. On Linux, for example, you have the concept of tasklets that you may use to perform the real work after having been triggered by the ISR.
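
    Without an RTOS, the same split can be approximated with a counter that the ISR bumps and the main loop drains. The names below (`uart_rx_isr`, `process_deferred_work`) are hypothetical, and the ISR is an ordinary function here so that the logic can be exercised on a PC.

    ```c
    #include <stdbool.h>
    #include <stdint.h>

    static volatile uint8_t rx_pending;     /* events posted by the ISR, not yet handled */
    static volatile uint8_t rx_last_byte;   /* most recent byte captured by the ISR */
    static uint32_t bytes_processed;        /* stands in for the heavy work */

    /* Hypothetical receive ISR: capture the data, post the event, return. */
    void uart_rx_isr(uint8_t data)
    {
        rx_last_byte = data;
        rx_pending++;
    }

    /* Called from the main loop or a low-priority task.
     * Returns true if one pending event was handled. */
    bool process_deferred_work(void)
    {
        if (rx_pending == 0) {
            return false;
        }
        /* On a real target, disable interrupts around this
         * read-modify-write so that the ISR cannot interleave with it. */
        rx_pending--;
        bytes_processed++;                  /* the slow part would go here */
        return true;
    }
    ```

    A single `rx_last_byte` loses data if two bytes arrive before the main loop runs; a ring buffer would be the usual next step.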

    Another thing is that you may have an ISR separated into a top-half and a bottom-half, where the top-half runs with interrupts disabled and the bottom-half enables interrupts. The first part of the ISR is then a form of critical section, guarding from interference from new interrupts.

    But this is a separate issue from having a stuck interrupt, where you either get no interrupts at all, or you instantly get a new interrupt as soon as the ISR ends. If the interrupt state machine in the processor gets into an invalid state, it may be enough to reinitialize the interrupt source, but you might just as well need to reset the processor. A level-triggered interrupt from a broken sensor would require the interrupt source to be deactivated until the sensor is fixed, possibly polling or inverting the logic of the interrupt input until the stuck condition goes away. A processor that can't invert the polarity of a level-triggered interrupt can make use of an XOR gate between the external hardware and the interrupt input, if polling isn't acceptable.
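
    One possible shape for the "deactivate and poll" fallback is sketched below. The threshold, the function names and the `irq_masked` flag are all assumptions; on real hardware the flag would correspond to disabling the source in the interrupt controller.

    ```c
    #include <stdbool.h>
    #include <stdint.h>

    #define STUCK_LIMIT 100u                /* assumed threshold, tune per product */

    static volatile uint16_t back_to_back;  /* ISR entries since the main loop last ran */
    static volatile bool     irq_masked;    /* true while the source is treated as stuck */

    /* Main loop calls this once per iteration to show it still gets CPU time. */
    void main_loop_made_progress(void)
    {
        back_to_back = 0;
    }

    /* Hypothetical ISR for a level-triggered sensor input. */
    void sensor_isr(void)
    {
        if (++back_to_back >= STUCK_LIMIT) {
            irq_masked = true;      /* on target: mask this interrupt source */
            return;
        }
        /* normal, short interrupt handling would go here */
    }

    /* Polled while masked. line_active is the sampled state of the input.
     * Returns true when the interrupt is enabled (again). */
    bool sensor_poll_recover(bool line_active)
    {
        if (irq_masked && !line_active) {
            back_to_back = 0;
            irq_masked = false;     /* on target: unmask the source */
        }
        return !irq_masked;
    }
    ```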

    But edge-triggered interrupts can also get into trouble because of external hardware. An external failure such as the loss of a pull-up resistor can result in huge numbers of potentially very high-priority interrupts that may starve the main application or lower-priority interrupts.

    If you have a timer tick that clears an event counter, and an event interrupt that increments the counter, then the event interrupt can detect a counter that gets incremented too much. Either the timer interrupt has stopped working (or is starved because it has lower priority), or there are too many events within a time window. This is an example of using watchdogs for individual interrupt sources. It is also an example of why it is problematic to kick the watchdog from an ISR.
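
    The counter scheme just described could look roughly like this. The limit and all names are illustrative, and what to do once `event_storm` is set (mask the source, log, reset) is left open.

    ```c
    #include <stdbool.h>
    #include <stdint.h>

    #define MAX_EVENTS_PER_TICK 50u             /* assumed budget per timer period */

    static volatile uint16_t events_this_tick;  /* cleared by the tick, bumped by events */
    static volatile bool     event_storm;       /* set when the budget is exceeded */

    /* Periodic timer interrupt: opens a new time window. */
    void timer_tick_isr(void)
    {
        events_this_tick = 0;
    }

    /* Event interrupt: counts itself and notices when the window overflows,
     * which means either too many events or a timer tick that stopped running. */
    void event_isr(void)
    {
        if (events_this_tick >= MAX_EVENTS_PER_TICK) {
            event_storm = true;
            return;                             /* skip normal handling */
        }
        events_this_tick++;
        /* normal event handling would go here */
    }
    ```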