How to debug: CoreSight basics (Part 1)

June 30, 2015

Let's be honest: debug can be a bit of a pain. At the best of times it's a nuisance, and in the worst case it's a complex web of wires that needs to be configured properly in order to diagnose and solve your SoC design problems.

A study conducted by Cambridge University found that the global cost of debugging was $312bn in 2013, a figure that undoubtedly has risen in the past two years. With this much money and effort dedicated to this part of SoC design, it is necessary to be as efficient as possible when debugging. Arm CoreSight technology provides solutions for the Debug and Trace of complex SoC designs. It can take years to become an expert in the finer details of CoreSight, but in this series of blogs I intend to provide readers with a starting point to understand the concepts which will help you to work with CoreSight. Like any good technical introduction, let's start with some definitions!

Debug

This refers to features to observe or modify the state of parts of the design. Features used for debug include the ability to read and modify register values of processors and peripherals. Debug also includes the use of complex triggering and monitoring resources. Debug frequently involves halting execution once a failure has been observed, and collecting state information retrospectively to investigate the problem.

Trace

CoreSight provides features which allow for continuous collection of system information for later off-line analysis. Execution trace generation macrocells exist for use with processors, software can be instrumented with dedicated trace generation, and some peripherals can generate performance monitoring trace streams.

Trace and Debug are used together at all stages in the design flow from initial platform bring-up, through software development and optimization, and even to in-field debug or failure analysis.

Historically, the following methods have existed for debugging an ARM processor-based SoC:

Conventional JTAG debug (external debug)

This is invasive debug with the processor halted using:

Breakpoints and watchpoints to halt the processor on specific activity
A debug connection to examine and modify registers and memory, and provide single step execution

Conventional monitor debug (self-hosted debug)

This is invasive debug with the processor running using a debug monitor that resides in memory.

Trace

This is non-invasive debug with the processor running at full speed using:

A collection of information on instruction execution and data transfers.
Delivery off-chip in real-time, or capture in on-chip memory.
Tools to merge data with source code on a development workstation for future analysis.

CoreSight technology addresses the requirement for a multi-processor debug and trace solution with high bandwidth for entire systems beyond the processor, despite ever increasing SoC complexity and clock speeds. Efficient use of pins made available for debug is crucial.

CoreSight provides:

A library of modular components and interconnects.
Architected discovery and identification methods to allow for flexible system design and easy inclusion of differentiated debug/trace functions.
A standard implementation of the ARM Debug Interface for debug tools to work with.

Elements of a CoreSight design

The CoreSight architecture introduces a number of key concepts which together enable complex systems to be designed. Standardized programming models and feature discovery registers allow debug tools to be largely generic with minimal dependence on the feature set of an individual SoC.

Debug Access Port

The Debug Access Port (DAP) is present on any SoC which presents a physical port to be connected to external debug tools. The DAP is an implementation of the standardized ARM Debug Interface, and provides a bridge between a reliable low pin count interface and on-chip memory mapped peripherals. Check out my next blog for more details on the DAP. Transactions generated by the DAP are referred to as External Debugger Accesses.

The DAP provides (amongst other things) architected top level control for debug domain power control, and fast code download direct to system memory.

CoreSight components implement memory mapped interfaces, but the DAP can also act as a bridge to an on-chip JTAG scan chain where necessary for legacy components. This gives increased flexibility and power savings when working with multiple clock and power domains on the SoC.

Self Hosted Debug

Most processors have direct access to their own debug resources by using dedicated instructions. In addition, it is common for most processors on an SoC to have access to some or all of the remaining debug components. Exact details vary, but there is typically a region in the system memory map which is multiplexed with external accesses to the debug components. Self-hosted debug is typically managed by debug monitor software running on either the target processor or a second processor in the SoC. Access control mechanisms are provided to permit interworking between an external debugger and self-hosted debug such that the external debugger does not need to be aware of the actions of the debug monitor.

Save and Restore sequences can be used by on-chip software to maintain the debug state across power-down cycles, and provide the illusion to the external debugger that the SoC remains powered on. This is particularly important for debug of battery powered devices where infrequent events are being monitored.

Discovery using ROM Tables

All CoreSight systems will include at least one ROM table. This serves the purpose of both uniquely identifying the SoC to an external debugger, and allowing discovery of all of the debug components in a system. Discovery relies on the use of identification registers at architected positions in the memory map of every debug component. All CoreSight components use this standard. This permits discovery sequences to identify at least a sub-set of the feature-set without detailed knowledge of every component. For both external debug and self-hosted debug, there is a pointer to the address of the top-level ROM table from that debug agent. The ROM table provides a list of address offsets which can be used to locate the next level of component. Each entry can point to another ROM table or to an individual component. Provided the system complies with the rule that each component is only referenced once in the ROM tables and there are no loops, it is possible to identify all the debug components which are accessible to each debug agent.
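
The discovery walk described above can be sketched in a few lines of Python. This is a minimal illustration against a simulated memory map, not a debugger implementation: the addresses and topology are invented, `read32` stands in for whatever access mechanism the debug agent has, and the helper names (`rom_entry`, `build_example_memory`, `discover`) are hypothetical. The entry layout assumed here (bit 0 marks an entry as present, bits [31:12] hold a signed offset from the table base, an all-zero word ends the table, and the component class sits in bits [7:4] of the ID register at offset 0xFF4) follows my reading of the 32-bit ROM table format, so check it against the architecture specification before relying on it.

```python
CIDR1 = 0xFF4          # assumed offset of Component ID register 1
CLASS_ROM_TABLE = 0x1  # assumed class value for a ROM table
CLASS_COMPONENT = 0x9  # assumed class value for a CoreSight component
MAX_ENTRIES = 960      # assumed: entries fill offsets 0x000-0xEFC


def rom_entry(offset, present=True):
    """Encode a ROM table entry from a signed, 4 KB aligned byte offset."""
    return (offset & 0xFFFFF000) | (1 if present else 0)


def build_example_memory():
    """Return a made-up memory image: one top-level table, one nested table."""
    top, nested = 0x80000000, 0x80010000
    mem = {
        top + CIDR1: CLASS_ROM_TABLE << 4,
        top + 0x0: rom_entry(0x1000),                 # component at 0x80001000
        top + 0x4: rom_entry(0x10000),                # nested ROM table
        top + 0x8: rom_entry(0x3000, present=False),  # not fitted, skipped
        top + 0xC: 0,                                 # end of table
        nested + CIDR1: CLASS_ROM_TABLE << 4,
        nested + 0x0: rom_entry(0x1000),              # component at 0x80011000
        nested + 0x4: rom_entry(-0x1000),             # component at 0x8000F000
        nested + 0x8: 0,                              # end of table
    }
    for base in (0x80001000, 0x80011000, 0x8000F000):
        mem[base + CIDR1] = CLASS_COMPONENT << 4
    return mem


def discover(read32, base, seen=None):
    """Depth-first walk from a ROM table, returning component base addresses."""
    seen = set() if seen is None else seen
    if base in seen:              # guard against loops or duplicate references
        return []
    seen.add(base)
    component_class = (read32(base + CIDR1) >> 4) & 0xF
    if component_class != CLASS_ROM_TABLE:
        return [base]             # a leaf: an individual debug component
    found = []
    for index in range(MAX_ENTRIES):
        entry = read32(base + 4 * index)
        if entry == 0:            # terminator
            break
        if not entry & 1:         # entry not present
            continue
        offset = entry & 0xFFFFF000
        if offset & 0x80000000:   # sign-extend negative offsets
            offset -= 1 << 32
        found += discover(read32, (base + offset) & 0xFFFFFFFF, seen)
    return found


memory = build_example_memory()
components = discover(lambda addr: memory.get(addr, 0), 0x80000000)
print([hex(c) for c in components])
```

The `seen` set is only a safety net. In a system that follows the "referenced once, no loops" rule it never triggers, but a tool cannot assume every SoC is well behaved.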

Processor debug and monitoring features

The exact features vary between processor designs, and can also vary from one implementation of a processor to another. Processors typically provide a halting debug mode (where architectural state can be observed) and single step execution. Also common are breakpoint units and Performance Monitoring Units (PMU). CoreSight provides an Embedded Cross Trigger mechanism to synchronize or distribute debug requests and profiling information across the SoC.

Cross Triggering

CoreSight Embedded Cross Trigger (ECT) functionality provides modules for connecting and routing arbitrary signals for use by debug tools. Wherever there are signals to sample or drive, a Cross Trigger Interface (CTI) is used to control the selection of which signals are of interest. Most systems will implement a CTI per processor, and at least one CTI for system level components. The CTIs in the system are interconnected using a Cross Trigger Matrix (CTM) which distributes any selected input events across the SoC to every CTI. Each CTI is programmed to use these distributed events to drive local control signals.
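
To make the routing idea concrete, here is a toy software model of the CTI and CTM relationship. It is only a sketch of the concept: the class and method names, the trigger names and the channel numbering are all invented for illustration and do not correspond to real register names or to any tool API. Each CTI maps local trigger inputs onto shared channels and maps channels back onto local trigger outputs, and the CTM simply broadcasts a channel event to every CTI.

```python
class CTI:
    """Toy Cross Trigger Interface: maps local triggers to shared channels."""

    def __init__(self, name):
        self.name = name
        self.input_map = {}    # trigger input name -> set of channel numbers
        self.output_map = {}   # channel number -> set of trigger output names
        self.fired = []        # trigger outputs driven so far

    def route_input(self, trigger_in, channel):
        self.input_map.setdefault(trigger_in, set()).add(channel)

    def route_output(self, channel, trigger_out):
        self.output_map.setdefault(channel, set()).add(trigger_out)

    def channels_for(self, trigger_in):
        return self.input_map.get(trigger_in, set())

    def drive(self, channel):
        # Drive every local output that has been routed from this channel.
        for trigger_out in sorted(self.output_map.get(channel, ())):
            self.fired.append(trigger_out)


class CTM:
    """Toy Cross Trigger Matrix: broadcasts channel events to every CTI."""

    def __init__(self, ctis):
        self.ctis = list(ctis)

    def raise_trigger(self, source, trigger_in):
        for channel in sorted(source.channels_for(trigger_in)):
            for cti in self.ctis:
                cti.drive(channel)


# Hypothetical setup: halt both processors when the trace buffer fills.
cpu0, cpu1, system = CTI("cpu0"), CTI("cpu1"), CTI("system")
matrix = CTM([cpu0, cpu1, system])
system.route_input("trace_buffer_full", channel=0)
cpu0.route_output(0, "debug_halt_request")
cpu1.route_output(0, "debug_halt_request")

matrix.raise_trigger(system, "trace_buffer_full")
print(cpu0.fired, cpu1.fired, system.fired)
```

The system-level CTI sees the event on channel 0 as well, but drives nothing because no output has been routed from that channel. That is the point of per-CTI programming: the event is distributed everywhere, and each CTI decides locally what to do with it.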

For processors and ETM trace units, the event connections to the CTI are standardized (although this does vary from processor to processor, as described in the processor documentation). Typical connections are listed below.

Source	Destination	Example use case
Trace logic External Outputs (4 bits)	CTI Trigger inputs	Trace logic resources to trigger trace capture or debug
Trace logic External Outputs (2 bits)	PMU inputs	PMU counters to extend trace logic counters
PMU Events (~30 bits)	Trace logic External inputs	Filter trace based on processor events such as cache miss
PMU overflow	CTI Trigger inputs	Forward PMU counter overflow to interrupt controller or other clusters
Processor Debug Restart	CTI Trigger input	Synchronized debug restart across clusters (supporting halt and restart)
Trace Buffer Full	CTI Trigger input	Halt processor on trace buffer full
CTI Trigger Output	Processor interrupt input	Cause interrupt based on input to CTI or other CTI in system
CTI Trigger Output	Processor Debug Halt Request	Enter debug state based on input to CTI or other CTI in system
CTI Trigger Output	Trace Port Trigger request	Indicate trace trigger to trace capture device

Table 1 - Cross Trigger Connections

Trace Sources

CoreSight technology provides a standard infrastructure for the transmission and capture of trace data (presented as arbitrary streams of bytes). This allows for optimum sharing of common resources. Various trace sources are available:

Processor Trace Units

Processor trace is implemented by Embedded Trace Macrocells (ETM trace units) or Program Trace Macrocells (PTM trace units), depending on the target processor. Each ETM trace unit or PTM trace unit is specific to the processor it is designed for.

The feature set varies depending on the use cases anticipated for the different processors, but all CoreSight ETM and PTM trace units which use an AMBA Trace Bus (ATB) output can be combined in a system. Trace units might support the following:

Processor execution trace in varying degrees of detail
Resource logic, often useful as an extension to processor performance monitoring resources
Filtering logic to reduce the amount of non-interesting data which is captured

A common feature of trace units is efficient compression and encoding, relying on a copy of the executed code for decompression. Using halting debug, it is possible to extract the code image from program memory.
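
A toy example shows why the decoder needs a copy of the code. In the sketch below, the "trace" is just a start address plus one bit per conditional branch, and everything else is rebuilt from the program image. The instruction set, the addresses and the `reconstruct` function are invented for illustration, and real trace protocols are considerably richer than this.

```python
# Toy program image: address -> (mnemonic, branch target or None).
PROGRAM = {
    0x00: ("mov", None),
    0x04: ("cmp", None),
    0x08: ("beq", 0x14),   # conditional branch: needs one bit of trace
    0x0C: ("add", None),
    0x10: ("b", 0x04),     # unconditional branch: needs no trace data
    0x14: ("ret", None),
}


def reconstruct(program, start, atoms):
    """Rebuild the executed address list from a start address and branch bits."""
    atoms = iter(atoms)
    pc, executed = start, []
    while pc in program:
        mnemonic, target = program[pc]
        executed.append(pc)
        if mnemonic == "ret":
            break
        if mnemonic == "b":
            pc = target
        elif mnemonic == "beq":
            taken = next(atoms, None)
            if taken is None:      # trace ran out: stop decoding here
                break
            pc = target if taken else pc + 4
        else:
            pc += 4
    return executed


# Two bits of trace: first pass falls through, second pass takes the branch.
flow = reconstruct(PROGRAM, 0x00, [False, True])
print([hex(address) for address in flow])
```

Two bits of trace data are enough to recover eight executed instructions here, but only because the decoder can look up what each address contains.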

Instrumentation Trace Units

The instrumentation trace and system trace units provide the ability for running software to be instrumented with messaging (either by the programmer, or through a tool flow). This is more intrusive than using processor trace, but provides information at a higher level. The instrumentation trace macrocells are typically mapped into system memory. Tightly coupled Instrumentation Trace Macrocells (ITM) exist for some processors; the System Trace Macrocell (STM) is a more generic version which can be used in any system.
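
From the software side, instrumentation usually boils down to writing words to a memory-mapped location. The sketch below imitates that with a fake bus so it can run anywhere. The base address, the four-byte spacing between ports, and the names `FakeBus` and `trace_message` are assumptions made up for this example, not the programming model of any particular trace unit.

```python
import struct

STIMULUS_BASE = 0x40000000   # made-up base address for the stimulus ports


class FakeBus:
    """Stands in for memory-mapped writes; records what a trace unit would see."""

    def __init__(self):
        self.writes = []

    def write32(self, address, value):
        self.writes.append((address, value & 0xFFFFFFFF))


def trace_message(bus, port, text):
    """Send a string through one stimulus port, four bytes per write."""
    data = text.encode("ascii")
    data += b"\x00" * (-len(data) % 4)          # pad to a whole word
    for (word,) in struct.iter_unpack("<I", data):
        bus.write32(STIMULUS_BASE + 4 * port, word)


bus = FakeBus()
trace_message(bus, port=1, text="boot ok")
for address, value in bus.writes:
    print(hex(address), hex(value))
```

Each message costs the running software a handful of store instructions, which is the intrusiveness mentioned above. In exchange, the trace carries whatever the programmer considers meaningful.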

Trace (ATB) interconnect

One advantage of using a standard trace bus protocol is that a small set of modular components can be used to generate sophisticated trace infrastructure. These components include bridges for timing closure and for clock and power domain crossing, replicators and funnels which can be used to duplicate and combine data streams, and buffer components. Upsizers and downsizers are used to convert between buses of different data widths. A key feature of the AMBA Trace Bus (ATB) is that the trace source identification is passed with the data, permitting cycle by cycle interleaving of trace data from different sources. CoreSight trace interconnects provide the following features:

Backpressure to stall a trace source based on the ability of downstream infrastructure to collect data
Flushing of any data stored in intermediate buffer components through the interconnect
Transfer of byte orientated data, agnostic to the underlying data protocol
Synchronisation request distribution
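
The value of passing the source identification with the data is easiest to see in code. The sketch below merges two byte streams the way a funnel might, then separates them again at the receiving end. The round-robin arbitration, the ID values and the function names are assumptions chosen for simplicity, and backpressure, flushing and synchronisation are not modelled.

```python
from collections import defaultdict
from itertools import zip_longest


def funnel(sources):
    """Interleave byte streams, tagging each byte with its trace source ID."""
    merged = []
    ids = sorted(sources)
    for column in zip_longest(*(sources[source_id] for source_id in ids)):
        for source_id, byte in zip(ids, column):
            if byte is not None:       # this source has run out of data
                merged.append((source_id, byte))
    return merged


def demux(stream):
    """Split a tagged stream back into one byte string per source ID."""
    out = defaultdict(bytearray)
    for source_id, byte in stream:
        out[source_id].append(byte)
    return {source_id: bytes(data) for source_id, data in out.items()}


# Hypothetical trace IDs 0x10 and 0x20 with arbitrary payloads.
sources = {0x10: b"ETM0", 0x20: b"STM-data"}
merged = funnel(sources)
print(merged[:4])
print(demux(merged))
```

Because every byte carries its ID, the interconnect never needs to understand the payload, which matches the "agnostic to the underlying data protocol" point in the list above.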

Trace Sinks

A trace sink is the final CoreSight component in a trace interconnect. A system can have more than one trace sink, configured to collect overlapping or distinct sets of trace data. Trace sinks can stream data off chip, provide a dedicated buffer, or route trace data into shared system memory. These different solutions cover a wide range of latency and bandwidth capabilities.

If you enjoyed this piece, why not read my next blog, below, which looks at processor trace architectures and debug access ports. If you have any questions then please leave a comment below, I'll get back to you ASAP!

How to debug: part 2
