Over the past few months, I’ve been working on a Real Time Operating System (RTOS) for the ARM® Cortex®-A9 processor, ported from the Keil® RTX RTOS for Cortex-M series of processors. In this blog, I’d like to share with you how I used the ARM CoreSight™ trace capabilities of the ARM Development Studio 5 (DS-5™) to locate a stack corruption problem, deep down in the innards of RTX’s task switcher.
Trace enables you to non-intrusively collect the sequence of instructions that were executed on the target platform – which is really useful when trying to debug thorny real-time issues. The Cortex-A9 processor core can feature a trace interface to an (optional) CoreSight Program Trace Macrocell (PTM) that allows tracing of program flow (instructions only, not data). The PTM outputs the raw trace information to an on-chip Embedded Trace Buffer (ETB) or in more recent devices to an on-chip Embedded Trace FIFO (ETF), or to a parallel trace port via the Trace Port Interface Unit (TPIU) which can be connected to external trace-capable tools.
Trace data captured in ETB/ETF can be read via JTAG using low-cost probes such as the Keil ULINK™ family, but typically those can only store up to a few kilobytes of data. Trace data from the TPIU, however, can be captured by external hardware such as the ARM DSTREAM™ debug and trace unit in its massive gigabytes-deep buffers to allow much longer trace runs. Debuggers can retrieve the captured data to display a complete trace execution history, including branches, exceptions, and whether conditional instructions were executed or skipped. Timestamps, Context IDs and VMIDs can also be captured on higher-end devices. The debugger can then match the instructions executed to the source code via ELF/DWARF debug information, to show which functions were being executed over time. And because CoreSight trace is non-intrusive, the code runs at full speed on the target – tracing does not alter any timing aspects of the code that might cause the symptoms of the bug to "move", unlike some other debug methods such as inserting printf() statements.
In RTX, services are requested from the RTOS using Supervisor (SVC) calls. In the example below, we can see an osDelay function being implemented by calling into the RTOS using an SVC instruction. This results in an SVC Exception occurring, accompanied by a branch via the SVC entry in the vector table (at offset 8 from address 0x20020000) to an SVC Handler. Every instruction that is executed in this sequence, and the SVC Exception event itself, is captured in the Trace.
An example Trace view in DS-5 Debugger
In the navigational timeline, the colour coding is a "heat" map showing the executed instructions and the amount of instructions each function executes in each timeline - the darker red showing more instructions and the lighter yellow showing fewer instructions. At a scale of 1:1 however, the colour scheme changes to display memory access instructions as a darker red colour, branch instructions as a medium orange colour, and all the other instructions as a lighter green colour.
Configuring DS-5 Debugger to collect trace is really easy, due to the slick user interface into its Jython-based Debug and Trace Services Layer (DTSL). To see this in action, create a new debug launch configuration to your target with Run -> Debug Configurations… - mine is shown in the screenshot below:
Debug launch configuration
Then click on the DTSL Options Edit... button to reveal the trace configuration options. For my investigations, I enabled Cortex-A9 Trace via DSTREAM 4GB Trace Buffer, as shown below:
DTSL Options, enabling Trace
I ran the RTX-based application code by clicking on the Debug button but, after a minute or so of running apparently quite happily, the code would crash with a Data Abort:
A Data Abort occurred in the RTX test application
To trap the Data Abort exception in DS-5 Debugger, I configured the Manage Signals dialog on the Breakpoints view menu:
Trapping the Data Abort in DS-5 Debugger
Trying this several times, I noticed the address where the Abort occurred was always the same = 0x2003086C (note that the Abort mode return address stored in LR_ABT points 2 instructions ahead at 0x200308F4). The instruction at 0x2003086C that results in the Abort is “POP {r4,pc}” to return from the osDelay subroutine, by popping the return address off the stack:
The POP instruction triggered the abort – by why?
The Abort is occurring because the stack pointer (saved into SP_ABT on an Abort) was not pointing to a valid stack frame, so garbage is being popped. The stack pointer must have been corrupted at run-time by something, but where?
Just like other RTOS’s, RTX runs with interrupts enabled for OS timing functions. To support different priority levels of interrupt, RTX’s interrupt handler is re-entrant. In RTX, RTOS services are requested via SVC calls. SVC can be nested, so RTX’s SVC handler is also re-entrant. In my experience, stack corruption can sometimes occur in these handlers if they are not written carefully, so I guessed that looking for occurrences of IRQ and SVC exceptions in the trace stream might be fruitful. IRQ and SVC exceptions are captured in the trace by default, but can be thousands of instructions apart. Thankfully, DS-5 Debugger has configurable filters so that, for example, only exceptions are displayed, so that these can be spotted more easily.
DS-5 Debugger is also able to export the captured trace data into a text file. DS-5 Debugger captures up to 1000 instructions by default but, to ensure I captured the exceptions I was interested in, I ended up capturing 100000 instructions. It turned out that the IRQ that was triggering the problem actually occurred 22,281 instructions before its effects became apparent!
Trace reveals an interrupt occurred at a suspect instruction within the SVC handler
The trace above clearly shows an interrupt occurring inside the SVC Handler. The symbol marked “Unknown” is the IRQ entry in the Vector table. The interrupt was taken after the “STMDB sp, {r0}” instruction at address 0x20020734. This code is deep inside the task-switching code of RTX:
Inside the task-switching code
Notice the pair of instructions “STMDB SP, {R0}” and “LDMDB SP, {SP}^”. Together, these result in the User mode Stack Pointer being restored from a saved value in R0. R0 is pushed onto the stack and then popped off into SP_USR (the “^” meaning the user mode register). The first instruction (STMDB) pushes the previously-saved User SP in R0 onto the top of the stack. The next instruction (LDMDB) loads the value from the top of the stack and puts it into SP_USR. Note that the value of the current stack pointer (SP_SVC) is not changed by either of these instructions - there is no "writeback" ("!") form of these instructions. Unfortunately, if an interrupt is taken in-between these two instructions, as is happening in our example, the value that was stored onto the stack will be overwritten by the IRQ_Handler entry sequence, by the instruction at 0x2002341C:
SRSFD SP!, #Mode_SVC ; Save LR_IRQ and SPRS_IRQ to SVC mode stack
So the Stack Pointer is being corrupted at this point, but its catastrophic effects do not become apparent until 22,280 instructions later! It would have been very hard to track down this bug without Trace.
One possible fix is to disable interrupts around these two instructions, but that would degrade interrupt latency. A better fix is to pre-adjust SP (effectively creating a write-back form) like this:
PUSH {R0} ; Push R0 onto stack
LDMIA SP,{SP}^ ; Load R0 off stack into User SP
ADD SP,SP,#4 ; Move SP back
After re-building the code, the Data Abort no longer occurred, even after leaving it to run for hours.
Here, we’ve seen how effective trace can be to locate bugs in real-time systems, even in the presence of additional complexities such as re-entrant interrupt and supervisor calls, and where the effect of a bug does not become apparent until maybe thousands of instructions later. Being non-intrustive, tracing does not alter any timing aspects of the code that might cause the symptoms of the bug to "move", unlike some other debug methods such as inserting printf() statements, and the code runs at full speed on the target whilst being traced. The powerful visualisation features of DS-5 Debugger enable us to survey and filter through an extensive history of instruction execution, then deep-dive into functions and scrutinize individual instructions and the source code used to generate them.
Really helpful article to get someone who has never done something like that up and running. Thanks to the techniques described I was able to fix a similar issue