Let's be honest, debug can be a bit of a pain. At the best of times it's a nuisance, and in the worst-case scenario it's a complex web of wires that needs to be configured properly in order to diagnose and solve your SoC design problems.
A study conducted by Cambridge University found that the global cost of debugging was $312bn in 2013, a figure that undoubtedly has risen in the past two years. With this much money and effort dedicated to this part of SoC design, it is necessary to be as efficient as possible when debugging. Arm CoreSight technology provides solutions for the Debug and Trace of complex SoC designs. It can take years to become an expert in the finer details of CoreSight, but in this series of blogs I intend to provide readers with a starting point to understand the concepts which will help you to work with CoreSight. Like any good technical introduction, let's start with some definitions!
Debug refers to features that observe or modify the state of parts of the design. Features used for debug include the ability to read and modify register values of processors and peripherals. Debug also includes the use of complex triggering and monitoring resources. Debug frequently involves halting execution once a failure has been observed, and collecting state information retrospectively to investigate the problem.
Trace refers to the continuous collection of system information for later off-line analysis, and CoreSight provides features which allow for this. Execution trace generation macrocells exist for use with processors, software can be instrumented with dedicated trace generation, and some peripherals can generate performance monitoring trace streams.
Trace and Debug are used together at all stages in the design flow from initial platform bring-up, through software development and optimization, and even to in-field debug or failure analysis.
Historically, the following methods of debugging an Arm processor-based SoC exist:
This is invasive debug with the processor halted using:
This is invasive debug with the processor running using a debug monitor that resides in memory.
Trace
CoreSight technology addresses the requirement for a multi-processor debug and trace solution with high bandwidth for entire systems beyond the processor, despite ever increasing SoC complexity and clock speeds. Efficient use of pins made available for debug is crucial.
CoreSight provides:
The CoreSight architecture introduces a number of key concepts which together enable complex systems to be designed. Standardized programming models and feature discovery registers allow debug tools to be largely generic with minimal dependence on the feature set of an individual SoC.
The Debug Access Port (DAP) is present on any SoC which presents a physical port to be connected to external debug tools. The DAP is an implementation of the standardized ARM Debug Interface, and provides a bridge between a reliable low pin count interface and on-chip memory mapped peripherals. Check out my next blog for more details on the DAP. Transactions generated by the DAP are referred to as External Debugger Accesses.
The DAP provides (amongst other things) architected top level control for debug domain power control, and fast code download direct to system memory.
CoreSight components implement memory mapped interfaces, but the DAP can also act as a bridge to an on-chip JTAG scan chain where necessary for legacy components. This gives increased flexibility and power savings when working with multiple clock and power domains on the SoC.
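To make the DAP's role as a bridge to memory-mapped resources a little more concrete, here is a minimal sketch of a code download through a memory access port. The register offsets (CSW, TAR, DRW) and the CSW field values follow the ADIv5 MEM-AP layout as I understand it; the `ToyMemAP` class, the `download` helper and the target address are hypothetical names made up for illustration, and a real debugger would reach these registers over JTAG or Serial Wire rather than through Python calls.

```python
# Toy model of a DAP memory access port (MEM-AP).
# CSW/TAR/DRW offsets follow the ADIv5 MEM-AP layout; the class itself
# is a hypothetical simplification, not a model of real hardware timing.
CSW, TAR, DRW = 0x00, 0x04, 0x0C
SIZE_WORD = 0x2                # CSW.Size = 0b010: 32-bit accesses
ADDR_INC_SINGLE = 0x1 << 4     # CSW.AddrInc = 0b01: bump TAR after each DRW access


class ToyMemAP:
    def __init__(self, target_memory):
        self.mem = target_memory   # dict: address -> 32-bit word
        self.csw = 0
        self.tar = 0

    def _advance(self):
        # Auto-increment the transfer address when CSW asks for it.
        if self.csw & (0x3 << 4) == ADDR_INC_SINGLE:
            self.tar = (self.tar + 4) & 0xFFFFFFFF

    def write_reg(self, offset, value):
        if offset == CSW:
            self.csw = value
        elif offset == TAR:
            self.tar = value
        elif offset == DRW:
            self.mem[self.tar] = value & 0xFFFFFFFF
            self._advance()
        else:
            raise ValueError(f"unmodelled register offset {offset:#x}")

    def read_reg(self, offset):
        if offset == DRW:
            value = self.mem.get(self.tar, 0)
            self._advance()
            return value
        if offset == CSW:
            return self.csw
        if offset == TAR:
            return self.tar
        raise ValueError(f"unmodelled register offset {offset:#x}")


def download(ap, address, words):
    """Write a block of words: set up CSW and TAR once, then stream DRW writes."""
    ap.write_reg(CSW, SIZE_WORD | ADDR_INC_SINGLE)
    ap.write_reg(TAR, address)
    for word in words:
        ap.write_reg(DRW, word)


system_memory = {}
ap = ToyMemAP(system_memory)
download(ap, 0x20000000, [0x11111111, 0x22222222, 0x33333333])
```

The point of the sketch is the access pattern: the address is written once and each data write lands at the next word, which is one reason block downloads through the DAP can be fast.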
Most processors have direct access to their own debug resources by using dedicated instructions. In addition, it is common for most processors on a SoC to have access to some or all of the remaining debug components. Exact details vary, but there is typically a region in the system memory map which is multiplexed with external accesses to the debug components. Self hosted debug is typically managed by debug monitor software running on either the target processor or a second processor in the SoC. Access control mechanisms are provided to permit interworking between an external debugger and self-hosted debug such that the external debugger does not need to be aware of the actions of the debug monitor.
Save and Restore sequences can be used by on-chip software to maintain the debug state across power-down cycles, and provide the illusion to the external debugger that the SoC remains powered on. This is particularly important for debug of battery powered devices where infrequent events are being monitored.
All CoreSight systems will include at least one ROM table. This serves both to uniquely identify the SoC to an external debugger and to allow discovery of all of the debug components in a system. Discovery relies on the use of identification registers at architected positions in the memory map of every debug component. All CoreSight components use this standard. This permits discovery sequences to identify at least a subset of the feature set without detailed knowledge of every component. For both external debug and self-hosted debug, there is a pointer to the address of the top-level ROM table from that debug agent. The ROM table provides a list of address offsets which can be used to locate the next level of component. These can be further ROM tables or individual components. Provided the system complies with the rule that each component is only referenced once in the ROM tables and there are no loops, it is possible to identify all the debug components which are accessible to each debug agent.
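The discovery walk described above can be sketched in a few lines. This is a simplified illustration, assuming the classic 32-bit ROM table entry format (bit 0 marks an entry as present, bits [31:12] hold a signed offset from the table base, and an all-zero entry ends the table) and the component class field in bits [7:4] of the Component ID register at offset 0xFF4. The memory contents, base addresses and the `walk_rom_table` helper are hypothetical.

```python
CIDR1 = 0xFF4              # Component ID register 1; bits [7:4] hold the class
CLASS_ROM_TABLE = 0x1      # class value identifying a ROM table
ENTRY_PRESENT = 0x1        # bit 0 of a ROM table entry


def component_class(read32, base):
    return (read32(base + CIDR1) >> 4) & 0xF


def walk_rom_table(read32, base, found=None):
    """Return the base addresses of all non-ROM-table components reachable from base."""
    if found is None:
        found = []
    for index in range(960):                  # this sketch scans offsets below 0xF00
        entry = read32(base + 4 * index)
        if entry == 0:                        # all-zero entry terminates the table
            break
        if not entry & ENTRY_PRESENT:
            continue
        offset = entry & 0xFFFFF000
        if offset & 0x80000000:               # offsets are two's complement
            offset -= 1 << 32
        address = (base + offset) & 0xFFFFFFFF
        if component_class(read32, address) == CLASS_ROM_TABLE:
            walk_rom_table(read32, address, found)   # nested ROM table
        else:
            found.append(address)
    return found


# Hypothetical debug memory map: a top-level table, one nested table,
# and two components (class 0x9).
memory = {
    0x80000000: 0x00001003,   # component at base + 0x1000
    0x80000004: 0x00010003,   # nested ROM table at base + 0x10000
    0x80000008: 0x00003002,   # entry not present, skipped
    0x80000FF4: 0x10,         # top-level table: class 0x1
    0x80001FF4: 0x90,         # component: class 0x9
    0x80010000: 0xFFFF2003,   # negative offset: 0x80010000 - 0xE000
    0x80010FF4: 0x10,         # nested table: class 0x1
    0x80002FF4: 0x90,         # component: class 0x9
}

components = walk_rom_table(lambda addr: memory.get(addr, 0), 0x80000000)
```

Because the walk is recursive and has no visited set, it leans on the rule stated above: each component is referenced only once and the tables contain no loops.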
The exact features vary between processor designs, and can also vary from one implementation of a processor to another. Processors typically provide a halting debug mode (where architectural state can be observed) and single step execution. Also common are breakpoint units and Performance Monitoring Units (PMU). CoreSight provides an Embedded Cross Trigger mechanism to synchronize or distribute debug requests and profiling information across the SoC.
CoreSight Embedded Cross Trigger (ECT) functionality provides modules for connecting and routing arbitrary signals for use by debug tools. Wherever there are signals to sample or drive, a Cross Trigger Interface (CTI) is used to control the selection of which signals are of interest. Most systems will implement a CTI per processor, and at least one CTI for system level components. The CTIs in the system are interconnected using a Cross Trigger Matrix (CTM) which distributes any selected input events across the SoC to every CTI. Each CTI is programmed to use these distributed events to drive local control signals.
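As a rough mental model of the routing just described, the sketch below treats each CTI as two small lookup tables (trigger inputs to channels, channels to trigger outputs) and the CTM as a broadcast of channel events to every CTI. The class names, the trigger names such as `halted` and `debug_request`, and the channel number are all hypothetical; the real CTI is programmed through registers and this is not a register-accurate model.

```python
class ToyCTI:
    def __init__(self, name):
        self.name = name
        self.in_to_channels = {}    # trigger input -> channels it raises
        self.channel_to_outs = {}   # channel -> trigger outputs it drives
        self.fired = []             # trigger outputs asserted so far

    def route_input(self, trigger_in, channel):
        self.in_to_channels.setdefault(trigger_in, set()).add(channel)

    def route_output(self, channel, trigger_out):
        self.channel_to_outs.setdefault(channel, set()).add(trigger_out)

    def on_channel(self, channel):
        # Drive every local output that has been mapped to this channel.
        for out in sorted(self.channel_to_outs.get(channel, ())):
            self.fired.append(out)


class ToyCTM:
    def __init__(self, ctis):
        self.ctis = list(ctis)

    def trigger(self, source, trigger_in):
        # The source CTI maps the input to channels; the matrix then
        # distributes each channel event to every CTI in the system.
        for channel in sorted(source.in_to_channels.get(trigger_in, ())):
            for cti in self.ctis:
                cti.on_channel(channel)


cpu0, cpu1 = ToyCTI("cpu0"), ToyCTI("cpu1")
matrix = ToyCTM([cpu0, cpu1])

cpu0.route_input("halted", channel=0)          # cpu0 halting raises channel 0
cpu1.route_output(0, "debug_request")          # channel 0 asks cpu1 to halt

matrix.trigger(cpu0, "halted")
```

With this routing, halting one processor requests a halt on the other, which is a typical use of cross triggering for synchronized multi-processor debug.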
For processors and ETM trace units, the event connections to the CTI are standardized (although this does vary from processor to processor, as described in the processor documentation). Typical connections are listed below.
Table 1 - Cross Trigger Connections
CoreSight technology provides a standard infrastructure for the transmission and capture of trace data (presented as arbitrary streams of bytes). This allows for optimum sharing of common resources. Various trace sources are available:
Processor trace is implemented by Embedded Trace Macrocells (ETM trace unit) or Program Trace Macrocells (PTM trace unit) depending on the target processor. Each ETM trace unit or PTM trace unit is specific to the processor it is designed for.
The feature set varies depending on the use cases anticipated for the different processors, but all CoreSight ETM and PTM trace units which use an AMBA Trace Bus (ATB) output can be combined in a system. Trace units might support the following:
A common feature of trace units is efficient compression and encoding, relying on a copy of the executed code for decompression. Using halting debug, it is possible to extract the code image from program memory.
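To see why a copy of the executed code is needed for decompression, consider this toy decoder. It assumes the trace records only a start address plus one symbol per conditional branch ("T" for taken, "N" for not taken), and everything else is recovered by walking the program image. The image format, the instruction kinds and the `reconstruct` function are hypothetical and far simpler than any real ETM or PTM packet protocol.

```python
# Hypothetical program image: address -> (kind, branch target or None).
IMAGE = {
    0x100: ("op", None),
    0x104: ("branch_if", 0x110),
    0x108: ("op", None),
    0x10C: ("stop", None),
    0x110: ("op", None),
    0x114: ("branch_if", 0x100),
    0x118: ("stop", None),
}


def reconstruct(image, start, branch_outcomes):
    """Rebuild the list of executed addresses from a start address and T/N symbols."""
    outcomes = iter(branch_outcomes)
    pc, executed = start, []
    while True:
        kind, target = image[pc]
        executed.append(pc)
        if kind == "stop":
            return executed
        if kind == "branch_if":
            outcome = next(outcomes, None)
            if outcome is None:          # trace ended: nothing more is known
                return executed
            pc = target if outcome == "T" else pc + 4
        else:
            pc += 4                      # straight-line code needs no trace data


flow = reconstruct(IMAGE, 0x100, "TN")
```

Two symbols of trace are enough to recover five executed instructions here, because the straight-line instructions between branches are implied by the image rather than traced.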
The instrumentation trace and system trace units provide the ability for running software to be instrumented with messaging (either by the programmer, or through a tool flow). This is more intrusive than using processor trace, but provides information at a higher level. The instrumentation trace macrocells are typically mapped into system memory. Tightly coupled Instrumentation Trace Macrocells (ITM) exist for some processors; the System Trace Macrocell (STM) is a more generic version which can be used in any system.
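Here is a small sketch of what instrumenting software with messaging can look like. It assumes the trace unit is seen by software as a set of numbered, memory-mapped stimulus ports, which I model as a plain Python object; `ToyStimulusPorts`, the `traced` decorator and the channel number are hypothetical, and on real hardware each write would be a store to the trace unit's address range.

```python
class ToyStimulusPorts:
    """Stand-in for memory-mapped stimulus ports; records what software writes."""

    def __init__(self):
        self.stream = []

    def write(self, channel, payload):
        self.stream.append((channel, payload))


def traced(ports, channel):
    """Wrap a function so that entry and exit messages are written to a port."""
    def wrap(fn):
        def inner(*args):
            ports.write(channel, f"enter {fn.__name__}".encode())
            result = fn(*args)
            ports.write(channel, f"exit {fn.__name__}".encode())
            return result
        return inner
    return wrap


ports = ToyStimulusPorts()


@traced(ports, channel=3)
def checksum(data):
    return sum(data) & 0xFF


result = checksum(b"\x01\x02\x03")
```

The decorator plays the part of the tool flow: the function body is untouched, but every call now costs two extra writes, which is the intrusiveness trade-off mentioned above.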
One advantage of using a standard trace bus protocol is that a small set of modular components can be used to generate sophisticated trace infrastructure. These components include bridges for timing closure, clock and power domain crossing, replicators and funnels which can be used to combine data streams, and buffer components. Upsizers and downsizers are used to convert busses of varying data width. A key feature of the AMBA Trace Bus (ATB) is that the trace source identification is passed with the data, permitting cycle by cycle interleaving of trace data from different sources. CoreSight trace interconnects provide the following features:
A trace sink is the final CoreSight component in a trace interconnect. A system can have more than one trace sink, configured to collect overlapping or distinct sets of trace data. Trace sinks can stream data off chip, provide a dedicated buffer, or route trace data into shared system memory. These different solutions cover a wide range of latency and bandwidth capabilities.
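Because the trace source identification travels with the data on the ATB, whatever is captured at a sink can be separated back into per-source streams afterwards. The sketch below shows the idea with a funnel that interleaves byte streams and a demultiplexer that undoes it. The round-robin arbitration, the source ID values and the function names are assumptions for illustration; real funnels arbitrate by programmable priority and real sinks wrap the data in a formatting protocol.

```python
from itertools import zip_longest


def funnel(streams):
    """Interleave {source_id: bytes} into a list of (source_id, byte) beats."""
    ids = sorted(streams)
    merged = []
    # Round-robin across sources, one byte per source per turn.
    for column in zip_longest(*(streams[i] for i in ids)):
        for source_id, byte in zip(ids, column):
            if byte is not None:         # this source has run out of data
                merged.append((source_id, byte))
    return merged


def demux(beats):
    """Split captured (source_id, byte) beats back into per-source byte streams."""
    out = {}
    for source_id, byte in beats:
        out.setdefault(source_id, bytearray()).append(byte)
    return {source_id: bytes(data) for source_id, data in out.items()}


sources = {0x10: b"AB", 0x20: b"xyz"}    # hypothetical trace source IDs
captured = funnel(sources)
recovered = demux(captured)
```

The round trip recovers each source's bytes in order no matter how finely the funnel interleaves them, which is what lets several trace sources share one sink.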
If you enjoyed this piece, why not read my next blog, below, which looks at processor trace architectures and debug access ports. If you have any questions then please leave a comment below, I'll get back to you ASAP!
How to debug: part 2: https://community.arm.com/processors/b/blog/posts/how-to-debug-coresight-basics-part-2
Nice post!! Eoin.
I am lost in the following statement. Could you explain more? Thanks.
Best,
Patrick
Good introduction to Coresight.