How CoreSight trace helped me debug my RTOS

October 25, 2013

7 minute read time.

Over the past few months, I’ve been working on a Real Time Operating System (RTOS) for the ARM® Cortex®-A9 processor, ported from the Keil® RTX RTOS for Cortex-M series of processors. In this blog, I’d like to share with you how I used the ARM CoreSight™ trace capabilities of the ARM Development Studio 5 (DS-5™) to locate a stack corruption problem, deep down in the innards of RTX’s task switcher.

What can CoreSight trace do?

Trace enables you to non-intrusively collect the sequence of instructions that were executed on the target platform – which is really useful when trying to debug thorny real-time issues. The Cortex-A9 processor core can feature a trace interface to an (optional) CoreSight Program Trace Macrocell (PTM) that allows tracing of program flow (instructions only, not data). The PTM outputs the raw trace information to an on-chip Embedded Trace Buffer (ETB) or in more recent devices to an on-chip Embedded Trace FIFO (ETF), or to a parallel trace port via the Trace Port Interface Unit (TPIU) which can be connected to external trace-capable tools.

Trace data captured in ETB/ETF can be read via JTAG using low-cost probes such as the Keil ULINK™ family, but typically those can only store up to a few kilobytes of data. Trace data from the TPIU, however, can be captured by external hardware such as the ARM DSTREAM™ debug and trace unit in its massive gigabytes-deep buffers to allow much longer trace runs. Debuggers can retrieve the captured data to display a complete trace execution history, including branches, exceptions, and whether conditional instructions were executed or skipped. Timestamps, Context IDs and VMIDs can also be captured on higher-end devices. The debugger can then match the instructions executed to the source code via ELF/DWARF debug information, to show which functions were being executed over time. And because CoreSight trace is non-intrusive, the code runs at full speed on the target – tracing does not alter any timing aspects of the code that might cause the symptoms of the bug to "move", unlike some other debug methods such as inserting printf() statements.

In RTX, services are requested from the RTOS using Supervisor (SVC) calls. In the example below, we can see an osDelay function being implemented by calling into the RTOS using an SVC instruction. This results in an SVC Exception occurring, accompanied by a branch via the SVC entry in the vector table (at offset 8 from address 0x20020000) to an SVC Handler. Every instruction that is executed in this sequence, and the SVC Exception event itself, is captured in the Trace.

An example Trace view in DS-5 Debugger

In the navigational timeline, the colour coding is a "heat" map showing the executed instructions and the amount of instructions each function executes in each timeline - the darker red showing more instructions and the lighter yellow showing fewer instructions. At a scale of 1:1 however, the colour scheme changes to display memory access instructions as a darker red colour, branch instructions as a medium orange colour, and all the other instructions as a lighter green colour.

Collecting Trace in DS-5 Debugger

Configuring DS-5 Debugger to collect trace is really easy, due to the slick user interface into its Jython-based Debug and Trace Services Layer (DTSL). To see this in action, create a new debug launch configuration to your target with Run -> Debug Configurations… - mine is shown in the screenshot below:

Debug launch configuration

Then click on the DTSL Options Edit... button to reveal the trace configuration options. For my investigations, I enabled Cortex-A9 Trace via DSTREAM 4GB Trace Buffer, as shown below:

DTSL Options, enabling Trace

Tracing an RTOS-based Application

I ran the RTX-based application code by clicking on the Debug button but, after a minute or so of running apparently quite happily, the code would crash with a Data Abort:

A Data Abort occurred in the RTX test application

To trap the Data Abort exception in DS-5 Debugger, I configured the Manage Signals dialog on the Breakpoints view menu:

Trapping the Data Abort in DS-5 Debugger

Trying this several times, I noticed the address where the Abort occurred was always the same = 0x2003086C (note that the Abort mode return address stored in LR_ABT points 2 instructions ahead at 0x200308F4). The instruction at 0x2003086C that results in the Abort is “POP {r4,pc}” to return from the osDelay subroutine, by popping the return address off the stack:

The POP instruction triggered the abort – by why?

The Abort is occurring because the stack pointer (saved into SP_ABT on an Abort) was not pointing to a valid stack frame, so garbage is being popped. The stack pointer must have been corrupted at run-time by something, but where?

Digging Deeper into the Trace

Just like other RTOS’s, RTX runs with interrupts enabled for OS timing functions. To support different priority levels of interrupt, RTX’s interrupt handler is re-entrant. In RTX, RTOS services are requested via SVC calls. SVC can be nested, so RTX’s SVC handler is also re-entrant. In my experience, stack corruption can sometimes occur in these handlers if they are not written carefully, so I guessed that looking for occurrences of IRQ and SVC exceptions in the trace stream might be fruitful. IRQ and SVC exceptions are captured in the trace by default, but can be thousands of instructions apart. Thankfully, DS-5 Debugger has configurable filters so that, for example, only exceptions are displayed, so that these can be spotted more easily.

DS-5 Debugger is also able to export the captured trace data into a text file. DS-5 Debugger captures up to 1000 instructions by default but, to ensure I captured the exceptions I was interested in, I ended up capturing 100000 instructions. It turned out that the IRQ that was triggering the problem actually occurred 22,281 instructions before its effects became apparent!

Trace reveals an interrupt occurred at a suspect instruction within the SVC handler

Locating and fixing the bug

The trace above clearly shows an interrupt occurring inside the SVC Handler. The symbol marked “Unknown” is the IRQ entry in the Vector table. The interrupt was taken after the “STMDB sp, {r0}” instruction at address 0x20020734. This code is deep inside the task-switching code of RTX:

Inside the task-switching code

Notice the pair of instructions “STMDB SP, {R0}” and “LDMDB SP, {SP}^”. Together, these result in the User mode Stack Pointer being restored from a saved value in R0. R0 is pushed onto the stack and then popped off into SP_USR (the “^” meaning the user mode register). The first instruction (STMDB) pushes the previously-saved User SP in R0 onto the top of the stack. The next instruction (LDMDB) loads the value from the top of the stack and puts it into SP_USR. Note that the value of the current stack pointer (SP_SVC) is not changed by either of these instructions - there is no "writeback" ("!") form of these instructions. Unfortunately, if an interrupt is taken in-between these two instructions, as is happening in our example, the value that was stored onto the stack will be overwritten by the IRQ_Handler entry sequence, by the instruction at 0x2002341C:

SRSFD SP!, #Mode_SVC ; Save LR_IRQ and SPRS_IRQ to SVC mode stack

So the Stack Pointer is being corrupted at this point, but its catastrophic effects do not become apparent until 22,280 instructions later! It would have been very hard to track down this bug without Trace.

One possible fix is to disable interrupts around these two instructions, but that would degrade interrupt latency. A better fix is to pre-adjust SP (effectively creating a write-back form) like this:

PUSH {R0} ; Push R0 onto stack

LDMIA SP,{SP}^ ; Load R0 off stack into User SP

ADD SP,SP,#4 ; Move SP back

After re-building the code, the Data Abort no longer occurred, even after leaving it to run for hours.

Summary

Here, we’ve seen how effective trace can be to locate bugs in real-time systems, even in the presence of additional complexities such as re-entrant interrupt and supervisor calls, and where the effect of a bug does not become apparent until maybe thousands of instructions later. Being non-intrustive, tracing does not alter any timing aspects of the code that might cause the symptoms of the bug to "move", unlike some other debug methods such as inserting printf() statements, and the code runs at full speed on the target whilst being traced. The powerful visualisation features of DS-5 Debugger enable us to survey and filter through an extensive history of instruction execution, then deep-dive into functions and scrutinize individual instructions and the source code used to generate them.

1 comment
0 members are here

Tools, Software and IDEs blog

GitHub and Arm are transforming development on Windows for developers

Pareena Verma

Develop, test, and deploy natively on Windows on Arm with GitHub-hosted Arm runners—faster CI/CD, AI tooling, and full dev stack, no emulation needed.
- May 20, 2025
What is new in LLVM 20?

Volodymyr Turanskyy

Discover what's new in LLVM 20, including Armv9.6-A support, SVE2.1 features, and key performance and code generation improvements.
- April 29, 2025
Running KleidiAI MatMul kernels in a bare-metal Arm environment

Paul Black

Benchmarking Arm®︎ KleidiAI MatMul kernels on bare-metal with AC6, GCC, and ATfE compilers.
- April 17, 2025

AI blog

Announcements

Architectures and Processors blog

Automotive blog

Embedded and Microcontrollers blog

Internet of Things (IoT) blog

Laptops and Desktops blog

Mobile, Graphics, and Gaming blog

Operating Systems blog

Servers and Cloud Computing blog

SoC Design and Simulation blog