This blog is for those of you who spend hours every day developing and debugging software, and have experienced the frustration that comes with the question “how on earth did my software get here?”
This is actually a fairly common situation. These days complex software applications are developed by teams, and the larger the applications (and the teams) the more opportunities there are for misunderstandings and coding errors resulting in bugs.
A typical debugging approach is to spot a place in the code that signals that something has gone wrong, then set a breakpoint there. When the breakpoint is hit, you look at the state of the system (call stack, variables, registers) and try to work out for yourself what happened. This requires time and skill, as you need to reconstruct in your mind how the execution got to that point, using nothing but the code and the current state of the application. Effectively, you need to “replay” the execution in your head, often several times, until you find the sequence of executed instructions that points to the software bug.
This methodology often requires the software developer to validate their guesses by adding annotations to the code (e.g. printf statements) to provide an execution trail with which to debug. This comes with several problems:
Every time you add an annotation you need to rebuild and reload your application. This is not much of a problem with small applications, but it becomes a larger one when working on large code bases or as part of a team.
Sometimes it is hard to get the printf in the right place. The longer the distance between the actual bug and the point in the code where you know there has been a bug, the harder this becomes. For complex applications this can be like finding a needle in a haystack.
The dreaded probe effect: the annotations themselves change the behavior of the application (for example, its timing), which may make the problem go away. This is the most annoying of the three.
For ARM-based embedded systems and microcontrollers, instruction and data trace provide a much better debug methodology, as you get a record of all the instructions executed by the processor. You can use this history as a proper, non-intrusive trail of execution. Things do not get much better than this!
Unfortunately, for Linux-based multicore systems, trace is expensive, difficult to use and, more importantly, not always available in production SoCs. It is often the only way to debug system issues (e.g. related to the communication between CPUs, data sharing in L2 caches, etc.), but it is overkill for debugging user-space code. For starters, trace records everything that happens on any core, so it is up to you to track applications as they move between processors and are scheduled in and out. In addition, you may find that you run out of space in your trace buffer before finding the bug (a typical problem with fast CPUs in multicore configurations). All in all, it is just not the right debug methodology for this particular problem.
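To make the trace buffer problem concrete, here is a minimal sketch in Python. It assumes the buffer behaves like a fixed-size circular buffer that overwrites its oldest entries; real trace hardware can be configured in other ways. The event names, the workload and the capacity are all made up for illustration.

```python
from collections import deque


def capture_trace(events, capacity):
    """Keep only the newest `capacity` events, as a circular buffer would."""
    buffer = deque(maxlen=capacity)  # the oldest entries are dropped when full
    for event in events:
        buffer.append(event)
    return list(buffer)


# Hypothetical workload: the faulty write happens early, the symptom much later.
events = ["bad_write"] + [f"insn_{i}" for i in range(1000)] + ["crash"]

captured = capture_trace(events, capacity=256)
bug_still_visible = "bad_write" in captured

print(len(captured), bug_still_visible)
```

The crash is in the captured window, but the write that caused it has already been overwritten. The further the bug is from its symptom, the more likely this becomes.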
It is our belief that user-space application debug should be convenient and easy, as user-space application developers do not have (nor wish to have) a deep understanding of the architecture of their target SoC. They just want to get their code right.
In the ARM® DS-5™ Development Studio v5.16 we have introduced a new feature, called Application Rewind, developed in collaboration with Undo Software. Application Rewind gives you similar benefits to trace when debugging Linux and Android* native code on ARM targets, but it needs no JTAG or CoreSight instruction trace: it simply uses a special debug agent running on the target. *Note that Android support is planned for DS-5 version 5.17.
The idea is that you only need to connect your debugger to the target using an Ethernet or USB connection, the same as you would do when debugging applications with gdbserver or the Android SDK. The debug agent acts like a CCTV camera, recording enough information about the execution that at any time you can “rewind” or “fast forward” through the history of the execution of the application. The information is stored in the target’s memory with minimal overhead: only the application being debugged is slowed down, typically by no more than a factor of 3, depending on the code. The rest of the system runs at normal speed. This overhead would be an issue for real-time embedded systems, but is perfectly acceptable in a Linux or Android environment.
After you hit a breakpoint, you can use the debugger’s controls to navigate the recording. For example, you can run back to a breakpoint (rewind up to a specific instruction), run back to a watchpoint (rewind up to the last time a variable was accessed), or simply step back and forward in the code. All the time, the system views in the debugger show registers, variables and memory as they were at that point in the execution. This gives all the information you may want for your investigation. The only thing you cannot do in rewind mode is modify the system state, because the application is not really running; you are just navigating the record of the execution.
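If it helps to picture what navigating a recording means, here is a toy model in Python. It is only a sketch to illustrate the idea and says nothing about how Application Rewind or the debug agent work internally. The class and method names are invented for this example, each snapshot holds the program state just before a line runs, and the watchpoint is simplified to detect changes of value only, not every access.

```python
class ReadOnlyHistoryError(Exception):
    """Raised when something tries to modify state while navigating history."""


class RecordedRun:
    """Toy recording: one snapshot of the variables before each executed line."""

    def __init__(self):
        self._snapshots = []  # list of (line_number, variables) tuples
        self._pos = -1        # index of the snapshot currently being viewed

    def record(self, line, variables):
        self._snapshots.append((line, dict(variables)))
        self._pos = len(self._snapshots) - 1

    @property
    def line(self):
        return self._snapshots[self._pos][0]

    def read(self, name):
        return self._snapshots[self._pos][1].get(name)

    def write(self, name, value):
        # The recording is history, not a live process, so it cannot be changed.
        raise ReadOnlyHistoryError("cannot modify state while rewinding")

    def step_back(self):
        self._pos = max(self._pos - 1, 0)

    def step_forward(self):
        self._pos = min(self._pos + 1, len(self._snapshots) - 1)

    def run_back_to_line(self, line):
        """Rewind to the most recent earlier stop at `line` (a breakpoint)."""
        for i in range(self._pos - 1, -1, -1):
            if self._snapshots[i][0] == line:
                self._pos = i
                return True
        return False

    def run_back_to_change(self, name):
        """Rewind to the line that most recently changed `name` (a watchpoint)."""
        for i in range(self._pos - 1, -1, -1):
            before = self._snapshots[i][1].get(name)
            after = self._snapshots[i + 1][1].get(name)
            if before != after:
                self._pos = i
                return True
        return False


# Hypothetical recording: line 12 wrongly overwrites `total`, line 14 notices.
run = RecordedRun()
for line, variables in [
    (10, {}),
    (11, {"total": 0}),
    (12, {"total": 5}),
    (13, {"total": -1}),
    (14, {"total": -1}),
]:
    run.record(line, variables)

# We are stopped at line 14; ask where `total` last changed.
run.run_back_to_change("total")
print(run.line, run.read("total"))
```

Starting from the stop at line 14, a single call lands on line 12 with the old value of `total` still visible, which is exactly the “how did my software get here?” answer you were after. From there you can step forward and back through the recording, but any attempt to write raises an error.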
See DS-5 Application Rewind in action below.
As I wrote earlier, this functionality is available as a beta in DS-5 version 5.16. You can download a copy and try it for free with your target for 30 days.