Get the best from ARM debug tools: Stack frames & instruction trace

September 11, 2013


This blog covers the use of two powerful debugging techniques — stack frames and instruction trace — to debug random or timing-related bugs on ARM processor-based targets.

Timing-related and random bugs are a common nightmare for software developers. Any consistent, replicable defect can be easily debugged by stepping through the code until the execution branches to an unexpected path. However, when bugs are random or timing-dependent you could spend your life stepping through the code without ever reaching the error condition at "the right time".

The typical approach to dealing with these problems involves instrumenting the code. The idea is simple: you add printf statements to the path of code you think the processor is executing, and each of those statements provides some information about the state of the software at that point. For example, you can print the value of program variables over time.

This approach often works, but it tends to be time consuming (and let's face it, quite annoying). The reasons are many, and include:

You do not want to rebuild your software every time that you decide you need an extra printf statement. Building software takes time, a lot of it if the software is large enough.
It may take hours to track down the execution path of your application and instrument the software to give you the information you need.
This method affects the replicability of the problem. When you insert a printf statement the problem may go away until you revert to the original code. Even worse, you may decide to leave the printf statement in the code, and the bug stays, but happens less often. This would cause horrible problems in the long term, when the product is in the field.

The "right way", or at least the easy way, to deal with this type of problem involves debug techniques that are enabled by many professional debuggers. The two I will describe here are the use of stack frames (also called backtrace) and the use of instruction trace, which is widely supported by ARM processor-based chips.

In order to illustrate the use of these techniques I will use DS-5™ Debugger and DSTREAM™.

Using stack frames
The call stack is an area of RAM used as temporary storage during function calls. A stack frame is the area of the stack allocated by a specific function. The application binary interface (ABI) for the ARM Architecture has well defined rules for how the call stack should be used, which the ARM Compiler and the GNU Compiler adhere to, and which you should consider when writing assembler code.

The way it works is as follows. Every time there is a non-inlined C function call:

The first 4 parameters of the function call are stored in processor registers R0-R3
Any further parameters are "pushed" (stored) onto the call stack
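The return address in the program is placed in the link register (LR or R14) by the branch-with-link instruction, and the called function pushes it onto the call stack if it needs to make further calls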
The processor branches to the called function
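
As a rough illustration of the first two steps above, the following Python sketch models where the arguments of a call end up. It is a simplified model, not the full ABI: it assumes every argument fits in a single 32-bit register, and the function name `place_arguments` is made up for this example.

```python
# Simplified model of argument passing for a function call.
# Assumption: every argument is integer-sized and fits in one register.
# The real ABI has extra rules for 64-bit values, floating point and structs.

ARG_REGISTERS = ["R0", "R1", "R2", "R3"]


def place_arguments(args):
    """Split the arguments between R0-R3 and the call stack."""
    # The first four arguments go into R0-R3, in order.
    registers = dict(zip(ARG_REGISTERS, args))
    # Anything beyond the fourth argument is stored on the stack.
    stacked = list(args[len(ARG_REGISTERS):])
    return registers, stacked


registers, stacked = place_arguments([10, 20, 30, 40, 50, 60])
print("registers:", registers)
print("stack:    ", stacked)
```

With six arguments, the first four land in registers and the last two are left for the stack, which is why functions with many parameters cost more stack space per call.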

Some functions reserve some further space in the call stack to store local variables. Others don't. However, they all follow the same process when returning to the calling function:

The function's return value is copied to register R0
The stack pointer is restored to its previous location
The processor returns by branching to (or "popping") the return address

The DWARF tables generated by the compilation tools describe the stack frame information for each function in the code. Therefore, when you load the software debug symbols into your debugger, it can decode the contents of the call stack. By making clever use of this information, the debugger can also give you plenty of historical information about how the software got to a particular statement. It shows the "call chain" from the entry point of the application to the current function.

The way it works is as follows:

The processor's stack pointer (SP or R13) points to the stack frame for the current function
The debugger extracts the return address from the stack frame by using DWARF debug information, and therefore is able to navigate to the calling function
Since the debugger knows the size of the stack frame for each function, it can also read the previous return address and navigate to its calling function
And so on...

The result is that without any kind of instrumentation the debugger can provide a path of function calls down to the instruction pointed at by the program counter.
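
The walk described above can be sketched in a few lines of Python. This is only a toy model: the `FUNCTIONS` table stands in for the information a debugger would read from DWARF, and all function names, addresses, frame sizes and offsets are invented for the example.

```python
# Hypothetical unwind data, standing in for what DWARF would provide:
# the address range of each function, the size of its stack frame in bytes,
# and the offset from SP at which its return address was saved.
FUNCTIONS = {
    "main":   {"start": 0x8000, "end": 0x80FF, "frame_size": 8,  "ra_offset": 4},
    "parse":  {"start": 0x8100, "end": 0x81FF, "frame_size": 16, "ra_offset": 12},
    "lookup": {"start": 0x8200, "end": 0x82FF, "frame_size": 8,  "ra_offset": 4},
}


def function_at(address):
    """Return the name of the function containing address, or None."""
    for name, info in FUNCTIONS.items():
        if info["start"] <= address <= info["end"]:
            return name
    return None


def backtrace(pc, sp, stack):
    """Walk the stack from the current frame back towards the entry point."""
    chain = []
    name = function_at(pc)
    while name is not None:
        chain.append(name)
        info = FUNCTIONS[name]
        # Read the saved return address out of the current frame...
        pc = stack.get(sp + info["ra_offset"], 0)
        # ...then step over the frame to reach the caller's frame.
        sp += info["frame_size"]
        name = function_at(pc)
    return chain


# Made-up stack contents: target stopped in lookup(), called from parse(),
# called from main(). Keys are addresses, values are saved return addresses.
stack = {0x1004: 0x8120, 0x1014: 0x8040, 0x101C: 0x0000}
chain = backtrace(pc=0x8210, sp=0x1000, stack=stack)
print(" <- ".join(chain))
```

The loop stops when a return address falls outside every known function, which in this model plays the role of reaching the startup code.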

Some professional debuggers such as DS-5 Debugger go beyond this functionality, and retrieve registers from the call stack to display the processor state at different points in time. This video shows how DS-5 Debugger updates the register, local variable, source and disassembly views by simply selecting a certain stack frame, and lets the user modify the value of variables and registers stored in the call stack.

Instruction trace
The obvious limitation of stack frames is that they do not provide a complete history of the code executed on the target.
For example, imagine that you set a breakpoint in function D to catch a bug, and the software executes the following events:

Function A calls function B
Function B calls function C
Function C returns to function B
Function B calls function D

When the breakpoint in function D is hit, the call stack will show Function A → Function B → Function D. There will be no trace of the call to function C. Similarly, the call stack does not show interrupts that have been taken and handled before the execution stops.
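
The difference between the two views can be shown with a small Python sketch that replays the events above. The event format and the `replay` function are assumptions made for this illustration; the initial entry into function A is added so that the model has a starting frame.

```python
def replay(events):
    """Replay call/return events, tracking the call stack and a full trace."""
    call_stack = []  # what a backtrace would show when execution stops
    trace = []       # what a complete execution history would contain
    for kind, function in events:
        trace.append((kind, function))
        if kind == "call":
            call_stack.append(function)
        elif kind == "return":
            call_stack.pop()
    return call_stack, trace


events = [
    ("call", "A"),    # assumed entry into function A
    ("call", "B"),    # function A calls function B
    ("call", "C"),    # function B calls function C
    ("return", "C"),  # function C returns to function B
    ("call", "D"),    # function B calls function D, breakpoint hit here
]

call_stack, trace = replay(events)
print("call stack:", call_stack)
print("functions seen in history:", sorted({f for _, f in trace}))
```

Function C is popped off the call stack when it returns, so only the history list still remembers it.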

Instruction trace solves exactly this type of problem, as it provides a complete history of the software executed by the target, instruction by instruction. And it does it in a totally non-intrusive way.

Instruction trace requires special hardware on the target, namely an Embedded Trace Macrocell™ (ETM) or a Program Trace Macrocell™ (PTM). This SoC component extracts and compresses information about the software executed by the processor, and redirects it to an on-chip Embedded Trace Buffer™ (ETB) or an off-chip trace port.

Fortunately, most ARM processor-based SoCs have an ETM or PTM. This means you only need a professional debugger and a JTAG run control unit to extract the contents of the ETB, which can normally hold several thousand instructions. The trace stream is decompressed by the debugger and displayed in trace views.

This enables you to go back in time and analyze exactly which code was executed up to the point the bug was caught. This is done with no instrumentation and no intrusiveness.
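
Because the on-chip buffer has a fixed size, it only holds the most recent part of the execution history. The Python sketch below models that behavior with a bounded queue. It is a loose analogy under stated assumptions: a real trace buffer stores compressed trace packets rather than one address per instruction, and the `TraceBuffer` class and its capacity are invented for the example.

```python
from collections import deque


class TraceBuffer:
    """Toy model of a fixed-size trace buffer that keeps the newest entries."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest entry when full.
        self._entries = deque(maxlen=capacity)

    def record(self, address):
        """Store the address of one executed instruction."""
        self._entries.append(address)

    def dump(self):
        """Return the buffered history, oldest entry first."""
        return list(self._entries)


buffer = TraceBuffer(capacity=4)
# Pretend the processor executes ten sequential 4-byte instructions.
for address in range(0x8000, 0x8000 + 4 * 10, 4):
    buffer.record(address)

history = buffer.dump()
print([hex(a) for a in history])
```

Only the last four addresses survive, which is why it matters to stop the target close to the point of failure when relying on a small on-chip buffer.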

This video shows the trace view in DS-5 Debugger, and how trace can be navigated and synchronized with source code. DS-5 Debugger also uses color coding to highlight performance related information, such as which instructions are expensive (e.g. branches or data memory accesses) and which functions take a lot of processor time.

Conclusion
Stack frames and instruction trace are both powerful ways to debug timing-related and random software bugs. Stack frame analysis is cheap and gives you information on variable values at different points in time. Instruction trace provides a complete history of instructions executed by the target in a non-intrusive way.

I hope that this information enables you to make better use of your debugger's functionality in the future.

*Be sure to read my previous blog, Semihosting: a life-saver during SoC and board bring-up, for more on this topic.
