After running for about 4 years, my code controlling an observatory dome began creating dome move direction errors. The problem appeared when the Master Controller (PC) began sending some corrupted data. I thought I had error checking on the control protocol. The code was supposed to catch the bad data and enter a routine to send back an error message. Instead, it caught the error and then went on to run the command parser anyway. The problem? I had failed to initialize a variable. For four years the bad code had run. I'm glad it was not on its way to the moon or some such. I claim to use PC-Lint faithfully, but in this case I guess I lied to myself. Bradford
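The original dome code is not shown, so the following is only a minimal sketch of the defensive pattern that avoids this class of bug. The validity flag starts at the safe value, and the parser is reachable only after the check has passed. The frame layout (payload plus a one-byte additive checksum) and all names (`handle_frame`, `checksum_ok`, `run_command_parser`, `send_error_reply`) are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes; the real protocol is not shown in the post. */
typedef enum { FRAME_REJECTED = 0, FRAME_PARSED = 1 } frame_result_t;

/* Counters standing in for the real parser and error reply (assumed stubs). */
static int parser_calls = 0;
static int error_replies = 0;

static void run_command_parser(const uint8_t *frame, size_t len)
{
    (void)frame;
    (void)len;
    parser_calls++;
}

static void send_error_reply(void)
{
    error_replies++;
}

/* Assumed frame layout: payload bytes followed by a one-byte additive checksum. */
static bool checksum_ok(const uint8_t *frame, size_t len)
{
    if (frame == NULL || len < 2) {
        return false;
    }
    uint8_t sum = 0;
    for (size_t i = 0; i + 1 < len; i++) {
        sum = (uint8_t)(sum + frame[i]);
    }
    return sum == frame[len - 1];
}

frame_result_t handle_frame(const uint8_t *frame, size_t len)
{
    /* Initialized to the safe value: a frame is bad until proven good. */
    bool frame_valid = false;

    if (checksum_ok(frame, len)) {
        frame_valid = true;
    }

    if (!frame_valid) {
        send_error_reply();
        return FRAME_REJECTED;   /* early return: the parser is never reached */
    }

    run_command_parser(frame, len);
    return FRAME_PARSED;
}
```

With the flag defaulting to "invalid" and an early return on the error path, a forgotten assignment fails safe: the worst case is a rejected good frame rather than a parsed bad one.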
The "once in a blue moon" errors can be really problematic. Having something that seems to run OK does not mean that it really is OK, just that no error has been found yet.
I implemented a Windows server program many years ago. I later reused the serial code in another program, and then in a third. Then one day, I got information that this program would unexpectedly die for some users if the serial port was in use when the program started. I could not reproduce it. One day many moons later I realized that the program only died when run on some (all?) non-English versions of Windows. The error message complaining about the failure to open the serial port explicitly requested an English resource string. This resource string lookup failed on non-English versions of Windows, and the failure was itself reported by requesting another English resource string. The program recursed, trying to issue error messages, until the stack couldn't grow anymore.
When I found that error in program #3 and started to ask around about programs #2 and #1, I got some feedback that one or two customers had mentioned an unexpected death of the program. But no one had ever managed to figure out the reason, so no one filed a bug report. Several hundred installations had been running for maybe 10 years before the error was caught. The biggest reason why it wasn't caught earlier was that the installations of programs #1 and #2 ran 24/7/365, so they very, very seldom started the program. And even more seldom with the serial port already busy.
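One common way to stop this kind of runaway recursion is a re-entrancy guard in the error reporter, with a hard-coded fallback message that needs no lookup. The sketch below is not the original program's code. The lookup is a stub that always fails, and every name in it (`report_error`, `lookup_resource_string`, the error IDs) is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical error IDs, for illustration only. */
enum { ERR_SERIAL_OPEN_FAILED = 100, ERR_STRING_LOOKUP_FAILED = 101 };

/* Stub for a localized string-table lookup. It always fails here, to
 * simulate the missing English resource on a non-English system. */
static const char *lookup_resource_string(int id)
{
    (void)id;
    return NULL;
}

static char last_message[128];   /* where the "displayed" message ends up */
static int report_calls = 0;     /* counts entries, to show the bound */

void report_error(int error_id)
{
    /* Re-entrancy guard (single-threaded sketch, not thread-safe). */
    static bool reporting = false;

    report_calls++;

    if (reporting) {
        /* Already inside the reporter: do no further lookups and
         * fall back to text compiled into the program. */
        snprintf(last_message, sizeof last_message,
                 "error %d (message text unavailable)", error_id);
        return;
    }

    reporting = true;
    const char *text = lookup_resource_string(error_id);
    if (text == NULL) {
        /* Without the guard, this call would recurse until the
         * stack was exhausted, as in the story above. */
        report_error(ERR_STRING_LOOKUP_FAILED);
    } else {
        snprintf(last_message, sizeof last_message, "%s", text);
    }
    reporting = false;
}
```

Calling `report_error(ERR_SERIAL_OPEN_FAILED)` with the failing stub enters the function exactly twice and leaves the fallback text in `last_message`. A real multi-threaded program would need a per-thread flag instead of a plain `static`.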
www.obliquity.com/.../bluemoon.html