I can't imagine anyone reading this posting hasn't already read about the Apple "goto fail" bug in SSL. My reaction was one of incredulity; I really couldn't believe this code could have got into the wild on so many levels.

 

First we've got to consider the testing (or lack thereof) for this codebase. The side effect of the bug was that all SSL certificates passed, even malformed ones. This implies positive testing (i.e. we can demonstrate it works), but no negative testing (i.e. a malformed SSL certificate), or even no dynamic SSL certificate testing at all?

 

What I haven't established* is whether the bug came about through code removal (e.g. there was another 'if' statement before the second goto) or, due to trial-and-error programming, the extra goto got added (with other code) that then didn't get removed on a clean-up. There are, of course, some schools of thought that believe it was deliberately put in as part of prism!

 

Then you have to query regression testing; have they never tested for malformed SSL certificates (I can't believe that; mind you I didn't believe Lance was doping!) or did they use a regression-subset for this release which happened to miss this bug? Regression testing vs. product release is always a massive pressure. Automation of regression testing through continuous integration is key, but even so, for very large code bases it is simplistic to say "rerun all tests"; we live in a world of compromises.

 

Next, if we actually analyse the code then I can imagine the MISRA-C group jumping around saying "look, look, if only they’d followed MISRA-C this couldn't of happened" (yes Chris, it's you I'm envisaging) and of course they're correct. This code breaks a number of MISRA rules, but most notably:

15.6 (Required) The body of an iteration-statement or selection-statement shall be a compound-statement

Which boils down to all if-statements must use a block structure, so the code would go from (ignoring the glaring coding error of two gotos):

       if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
              goto fail;
       if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
              goto fail;
              goto fail;
       if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
              goto fail;

to

       if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0) {
              goto fail;
       }
       if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0) {
              goto fail;
              goto fail;
       }
       if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0) {
              goto fail;
       }

This would then stop the unconditional goto be executed. But would cause a further rule violation:

Rule 2.1 (Required) A project shall not contain unreachable code

Nevertheless, what might surprise you (unless you follow MISRA-C closely) is that the use of the goto statement ISallowed:

Rule 15.2 (Required) The goto statement shall jump to a label declared in the same function.

but discouraged:

Rule 15.1 (Advisory) The goto statement should not be used.

In addition, I would expect all static analysis tools to flag this error; indicating even rudimentary static analysis is not being applied to this codebase. But that's not really what struck me when I first saw the code. My reaction was

"They must have ignored compiler warnings, as any compiler worth its salt  would warn about unreachable code"

Now I do most of my work, here at Feabhas using either the ARM/Keil compiler (armcc) or the IAR Compiler (iccarm) and am very used to seeing this warning as it is common to have infinite loops in multi-tasking code. Sure enough, give the following code:

int calculate_F(void);
int simpleTest(int p) 
{
   int ret_val = 1;
   if(p & 0x000F)
      goto out;
   if(p & 0x00F0)
      goto out;
      goto out;
   if(p & 0x0F00)
      goto out;
      ret_val = calculate_F();
out:
       return ret_val;
}

As expected, the ARM compiler reports a warning regarding unreachable code.

goto fail

So how could this happen? To my utter amazement, compiling this with GCC and -Wall (Warnings all) doesn’t report any warnings. Also apparently neither does Clang (the default compiler on OSX) with -Wall, but does if you specify -Wunreachable-code (which isn’t part of all ???).

 

So what’s my takeaway from this? I’ve always advocated the use of Static Analysis tools as an integral part of the build cycle (i.e. not trying to apply it retrospectively to 100,000’s of lines of code) rather than relying on the compiler to generate appropriate warnings. And if you weren’t convinced before, this is just another, now well documented, example of why you should.

 

* Please let me know if it has