Debugging embedded software is often a time-consuming activity, both in terms of chasing down a specific bug and as a general project activity. Further, as an activity, it is often an eclectic mix of desperation, perspiration, and a fair bit of magical thinking. In this article, I will cover techniques and tactics that might not completely eliminate all the hassles of debugging but can at least minimize the magical part. If you are a relative newcomer to the embedded software world, you might pick up some useful nuggets of information. If you are a seasoned pro, you are probably aware of these topics, but you might rediscover some techniques that you already know you should be practicing.
We know for a fact that newly written software is seldom, if ever, completely bug-free. However, we also know that there are actions we can take up front to help us reduce the number of issues we have to deal with in our code, which is another way to say that we have less debugging to do. An obvious place to start is to lay down some basic rules for code hygiene. Here is a summary of some rules:
One of the first things to realize (or remember) is that if you are developing and debugging embedded software, you are very likely to do so in an environment where executing code on the target is done through a debugger. For example, if you are working in an IDE, the easiest way to execute your program is by firing up the debugger. This is sort of obvious, but it also means that you have all the powers of the debugger at your fingertips, maybe without realizing it.
To get down to the nitty-gritty, we examine the power of breakpoints. But first, let us throw some shade at the venerable printf as a debugging tool. The most important reason not to use printf is that adding printf statements to your code can have dramatic effects on how your code is compiled. Not only is the printf a function call, but the arguments to the call have to be accounted for. This in turn means that stack and register usage looks completely different and a lot of compiler optimizations will not be performed, especially if the statement is located in a tight loop. This can have unpredictable consequences if your code is complex or relies on C/C++ behavior that is implementation-defined or even undefined by the C/C++ standards. What might happen is that your code behaves perfectly well when you add the printf, but breaks again when you remove the printout. By the way, this is a very good reason to strive for MISRA compliance. A second reason to avoid printf is that it is a pretty weak tool, as it can only display data. The third reason is that to change the behavior of the printout or add more printing statements, you need to rebuild the application and download it to the target again. Finally, at some point you have to go through the code base and remove all the statements you added, even if they are all guarded with #ifdefs.
So, let us take a break from the preaching to look at the different types of breakpoints available. A breakpoint can, in its simplest form, be a stop sign at a particular source statement, so execution breaks unconditionally when reaching the right spot. A decent debugger will then let you examine the content of variables, registers, and the call stack as well as memory in general. Such a code breakpoint can be very useful in itself, but it can also be associated with an expression whose truth value determines whether execution stops or not.
By doing so, you can focus on the interesting cases instead of examining the interesting variables every time execution passes through the breakpoint location. For example, if you want to take a closer look at what is going on in a specific range of values of a loop index variable, you can set up the expression to stop only when the index is in that range rather than stopping each time you hit that code. Of course, you can also construct more complex stop expressions based on any variables that are in scope.
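As a minimal sketch of the loop-index case, consider the hypothetical function below. The function name, the sample buffer, and the index range 40 to 44 are all assumptions made for illustration. The comment marks where a conditional breakpoint could be placed and shows one possible stop expression; the exact way to enter it depends on your debugger.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: sum a window of 16-bit samples. */
uint32_t sum_samples(const uint16_t *samples, size_t count)
{
    assert(samples != NULL || count == 0);

    uint32_t total = 0;
    for (size_t i = 0; i < count; ++i) {
        /* Candidate location for a conditional code breakpoint.
         * With a stop expression such as
         *     i >= 40 && i <= 44
         * execution halts on at most five iterations instead of
         * on every pass through the loop.
         * In GDB, for instance, this could be written as:
         *     break <file>:<line> if i >= 40 && i <= 44
         */
        total += samples[i];
    }
    return total;
}
```

Because the condition is evaluated by the debugger, the source code stays exactly as it would be in the release build, and the range can be changed without rebuilding or downloading the application again.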
Sometimes you really need to see the value of one or more expressions. An easy way to do this is to use a log breakpoint. A log breakpoint is a breakpoint whose only purpose is to print a message in the debug log window without stopping execution. It is essentially a debugger-supplied printf that can also be combined with a Boolean expression to determine if the message should be generated or not.
A very powerful type of breakpoint is the data breakpoint. A data breakpoint will trigger when a specific variable or memory location is accessed. This can be extremely helpful if you are trying to figure out why data values in a specific location are not the data you expect. Why would you need to do that, you say? Well, there can be several reasons you may find yourself in that situation, but one of the sources for such issues is pointers. If you use (or abuse) pointers, there is a fair chance that at some point you get some pointer arithmetic wrong, and while reading from or writing to the wrong address might not make the program fall over, it can still produce very strange results. These kinds of issues can be very tricky to debug, as the actual bug and the place where you experience the effect are often not related in any way.
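To make the pointer scenario concrete, here is a small sketch of the kind of bug a data breakpoint can expose. Everything in it is hypothetical: the device_state struct, the store_rx function, and the packet contents are invented for illustration. The caller passes a length that is one byte too large, so the copy loop silently overwrites the mode field that sits right after the receive buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RX_BUF_SIZE 8u

/* Hypothetical device state. Every member is a uint8_t,
 * so mode sits directly after rx_buf with no padding. */
struct device_state {
    uint8_t rx_buf[RX_BUF_SIZE];
    uint8_t mode;
};

/* Copies len bytes into the state object, starting at its first byte.
 * The object is accessed as raw bytes, so a len larger than
 * RX_BUF_SIZE overwrites whatever follows rx_buf. */
void store_rx(struct device_state *state, const uint8_t *src, size_t len)
{
    assert(len <= sizeof *state);  /* inside the object, not necessarily inside rx_buf */

    uint8_t *dst = (uint8_t *)state;
    for (size_t i = 0; i < len; ++i) {
        /* A write data breakpoint on state->mode would stop here
         * when i == RX_BUF_SIZE (in GDB, for instance: watch state->mode). */
        dst[i] = src[i];
    }
}

/* Deliberately buggy caller: copies one byte too many. */
uint8_t demo_clobbered_mode(void)
{
    struct device_state state = { .mode = 1u };
    const uint8_t packet[RX_BUF_SIZE + 1u] = {
        0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17, 0xFF
    };

    store_rx(&state, packet, sizeof packet);  /* bug: should be RX_BUF_SIZE */
    return state.mode;                        /* no longer 1 */
}
```

Nothing crashes here; the symptom would only show up later, wherever mode is read. A data breakpoint on writes to mode halts inside the copy loop, and the call stack then points back at the caller that passed the wrong length.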
Combining data breakpoints (or any type of breakpoint, for that matter) with the call stack window can be very revealing. The call stack window shows you where you came from, which can sometimes be a bit surprising. It also gives you the opportunity to move up and down the call chain and examine parameter values.
Note that some of these types of breakpoints might not always be available, depending on the exact device you are running your program on, and the specific debug probe.
Some targets support live reading of memory, so that the debugger can continuously display variable values and other information during execution with a standard debug probe.
If you can stand a few extra buzzwords and adjectives, let us spend a few lines talking about a debugging tool that is truly amazing. Trace is a way to record the execution and other types of data flow on your device, like interrupt information and other hardware events. For example, viewing combined event data in a timeline can be very revealing about how your system behaves: are your interrupts firing when they should, and how do they correlate with other activity?
What makes trace a bit more complex than regular debugging is that there are many different types of trace technologies, and different ways to access the trace data. On top of that, you may need a trace-enabled probe. So, to utilize the power of trace in the best way for your needs, it is beneficial to think about what you need to do to use trace at the beginning of your project.
High-quality trace tools are designed to take away the pain of trace complexity and use all available trace information, but you still have to figure out your needs on the hardware side. However, investing some time and resources up front in trace as a debug and code quality tool will pay off when you hit that first tricky issue.
Some of the topics in this article might seem borderline trivial, but the best solutions to tricky problems often are. Finding the root cause of a software problem can take days or even weeks, or it can be a quick and easy process. One way to increase the chance of the latter is not to always reach for a printf statement, but rather spend a moment to think about how to best use your knowledge of the code base in combination with the features of your debugger and trace tools. Over time, you may find that this way of working is a real boost to your productivity and efficiency, not to mention peace of mind.