The Arm CoreSight ELA-500 Embedded Logic Analyzer provides low-level signal visibility into Arm IP and third-party IP. When used with a processor, it provides visibility of loads, stores, speculative fetches, cache activity and transaction life cycle, none of which are available through instruction tracing.
CoreSight ELA-500 enables swift, hardware-assisted debug of otherwise hard-to-trace issues, including data corruption, deadlocks and livelocks. As well as accelerating debug cycles during complex IP bring-up, it provides extra assistance for post-deployment debug.
CoreSight ELA-500 offers on-chip visibility of both Arm and proprietary IP blocks. Trigger conditions can be programmed over standard debug interfaces, either directly by an on-chip processor or by an external debugger.
This guide is intended to demonstrate how the CoreSight ELA-500 can be used with Arm DS-5 Development Studio to debug a real-world deadlock scenario on a Cortex-A72 + CoreSight ELA-500 based system, caused by a bus transaction hang.
One of the most common deadlock scenarios occurs when a processor initiates memory transactions to a location in the system where no bus slave exists, or where the bus slave has limitations such as not being able to handle burst transactions. This type of incomplete transaction can ultimately lead to the processor locking up (deadlock).
In a perfect world, systems would be designed so that the entire physical memory map is fully populated, meaning that every memory transaction, to any address, responds with either a valid transaction result or a bus fault. For certain designs, however, this is not the case. The aggressive speculation and prefetching performed by Arm processors mean that these memory map “holes” are more likely to be exposed by incorrect software, even if the “holes” are never explicitly referenced by that software.
Software can prevent this by correctly configuring the MMU translation tables to accurately describe the physical memory map. Software should configure any memory map “holes” as Invalid. Configuring the MMU this way prevents the processor from making any physical bus transactions to those locations, and ultimately prevents this type of deadlock scenario.
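As a minimal sketch of the bookkeeping behind this advice, the following Python helper lists the unpopulated ranges of a physical memory map, which are the ranges that should be marked Invalid in the translation tables. The memory map, the region sizes and the function name `find_holes` are illustrative assumptions, not taken from any real SoC or from DS-5.

```python
# Hypothetical physical memory map: (base, size) of regions backed by a bus slave.
# These values are assumptions chosen for illustration only.
POPULATED = [
    (0x00000000, 0x00100000),  # assumed boot ROM
    (0x01000000, 0x00002000),  # assumed 8 KB internal SRAM
]


def find_holes(populated, start, end):
    """Return (base, size) ranges in [start, end) not covered by any populated region."""
    holes = []
    cursor = start
    for base, size in sorted(populated):
        if base > cursor:
            # Gap between the end of the previous region and this one.
            holes.append((cursor, min(base, end) - cursor))
        cursor = max(cursor, base + size)
        if cursor >= end:
            break
    if cursor < end:
        # Trailing gap up to the end of the address range under consideration.
        holes.append((cursor, end - cursor))
    return holes


if __name__ == "__main__":
    for base, size in find_holes(POPULATED, 0x00000000, 0x02000000):
        print(f"mark Invalid: 0x{base:08x} - 0x{base + size - 1:08x}")
```

Each reported range would then be described by Invalid translation table entries, so that neither explicit nor speculative accesses can reach the bus.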
These types of deadlock scenarios are difficult to debug using traditional methods, such as external debug and instruction/data trace. A processor core that has locked up due to an incomplete transaction will likely not be able to enter halt-mode debug, so the external debugger is unable to stop the processor and inspect its internal state. Trace capture may still be available, but it will not provide any record of the speculative or prefetched transaction that may be responsible for the deadlock.
The CoreSight ELA-500 can be used effectively in this scenario to trace the external bus transactions made by the processor (both explicitly and speculatively). This guide intends to showcase the use case scripting capabilities of DS-5 and demonstrate the example CoreSight ELA-500 use case script shipped with Arm DS-5 Development Studio.
NOTE: The scripts required to program the ELA-500 were added to Arm DS-5 in version 5.25. Please ensure this or a later version of DS-5 is installed.
The ELA-500 can be implemented with up to 12 Signal Groups, each containing 64, 128, or 256 signals. Which signals are connected to each Signal Group depends on the system and on the IP that the ELA is connected to. The specific signal interfaces are documented in the relevant IP documentation (low-level signal description documents like this are typically not publicly available and are provided only to licensees of the Arm IP). Arm IP connected to an ELA is supplied with a JSON file that documents and annotates the signal group connections for that particular IP in a machine-readable format. DS-5 can interpret the JSON file to allow seamless debugging of that IP using DS-5 and the ELA.
Signals typically consist of debug signals (status or output), and qualifiers (trigger). Qualifier signals may be required to determine that the debug signal is valid. Debug signals are valid when the qualifier signal(s) are asserted.
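The qualifier relationship can be sketched in a few lines of Python. This is an illustration only: the sample values, the bit position of the qualifier and the function name `valid_samples` are assumptions, not a real ELA-500 signal layout.

```python
def valid_samples(samples, qualifier_bit):
    """Keep only the captured samples whose qualifier bit is asserted (1)."""
    return [s for s in samples if (s >> qualifier_bit) & 1]


if __name__ == "__main__":
    # Hypothetical 3-bit samples: bit 0 is the qualifier, bits 2:1 are debug signals.
    captured = [0b101, 0b100, 0b011]
    for sample in valid_samples(captured, qualifier_bit=0):
        print(f"valid sample: {sample:03b}")
```

Samples captured while the qualifier is deasserted carry no meaningful debug data, so they are dropped before any further analysis.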
For the purposes of this demonstration, the Cortex-A72 + ELA-500 system uses the LAK-500A. The LAK-500A is an integration kit for the ELA-500 and the Cortex-A72, supplied as an add-on to the ELA-500. The LAK-500A exposes a number of pre-defined Cortex-A72 debug observation ports (Signal Groups) and provides the corresponding JSON signal mapping file.
As part of the LAK-500A, one of the Cortex-A72 debug observation ports exposes the physical read address bus, “ARADDR”, and an address valid signal, “ARVALID”.
NOTE: These signal names have been obfuscated for this blog post.
These signals are required to determine the read addresses issued by the core, prior to the “lock-up”. Post analysis of these read transactions will help identify which transaction may have caused the fault.
The ELA-500 can be configured either by writing a use case script or by using a configuration GUI. An application-specific use case script allows a user to script a debug recipe for a particular debug scenario with the ELA-500. An example can be found by navigating to the following use case script:
Scripts window → Use case → DTSLELA-500 → ela_example.py → Configure ELA
For this demonstration, we will use the GUI ELA-500 Configuration Utility to configure the ELA-500 for our specific debug scenario. DS-5 must be connected to the target SoC prior to ELA configuration.
The ELA-500 uses a “one-hot” encoding for the Signal Group in the Signal Select registers. In this case, Signal Group 0 is selected by programming 0x1 in the ‘Select Signal Group’ field. This in effect programs SIGSEL0 = 0x1 (Trigger State 0 is associated with the trigger signals in Signal Group 0).
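For readers unfamiliar with one-hot encoding, the conversion between a group or state index and its field value can be sketched as follows. The helper names are hypothetical and are not part of the DS-5 scripting API.

```python
def one_hot(index):
    """Encode an index as a one-hot value: index 0 -> 0x1, index 1 -> 0x2, index 2 -> 0x4."""
    if index < 0:
        raise ValueError("index must be non-negative")
    return 1 << index


def decode_one_hot(value):
    """Return the index of the single set bit, rejecting values that are not one-hot."""
    if value <= 0 or value & (value - 1):
        raise ValueError("value is not one-hot")
    return value.bit_length() - 1


if __name__ == "__main__":
    print(hex(one_hot(0)))      # field value that selects index 0
    print(decode_one_hot(0x1))  # index selected by field value 0x1
```

The same encoding applies to any field described as one-hot in the text, so selecting index 0 always means writing 0x1.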
We also need to program the Signal Comparison condition. In this case, we want to trigger when the “ARVALID” signal is asserted (active high), so we program ‘Signal Comparison (COMP)’ to “Equal”.
Finally, we need to program the Next state. This is the ELA state we enter when the trigger condition is met. In our case we want to capture on each “ARVALID” assertion. Therefore, we program the ‘Next state’ field to 0x1 (one-hot for Trigger State 0).
The Signal Compare and Signal Mask values of Trigger State 0 for Signal Group 0 need to be programmed to monitor the ‘ARVALID’ signal. The bit position of the ‘ARVALID’ signal is documented in the IP’s corresponding JSON file or documentation.
You will need to scroll down to find the entries for the Signal Mask and Signal Compare fields. In our example, ARVALID is mapped to bit 83, which is bit 19 of the [95:64] word, so we need to enter the value 0x00080000 in the [95:64] entry of both ‘Signal Mask’ and ‘Signal Compare’.
Click ‘Apply’ and then ‘OK’.
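The arithmetic behind this value can be checked with a short Python sketch. It assumes only that the mask and compare values are entered as 32-bit words, as the [95:64] notation suggests; the function name `word_and_mask` is hypothetical.

```python
def word_and_mask(bit, word_bits=32):
    """Map an absolute signal bit position to (high bit, low bit, mask) of its 32-bit word."""
    word = bit // word_bits           # index of the 32-bit word that holds the bit
    low = word * word_bits            # lowest bit position covered by that word
    high = low + word_bits - 1        # highest bit position covered by that word
    mask = 1 << (bit % word_bits)     # the bit's position within the word
    return high, low, mask


if __name__ == "__main__":
    high, low, mask = word_and_mask(83)
    print(f"[{high}:{low}] 0x{mask:08x}")
```

For bit 83 this gives word [95:64] and bit 19 within that word, which corresponds to the 0x00080000 value used for both fields.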
Ensure that Signal group 0 is selected for ‘State 0’ and click ‘OK’.
With the ELA-500 recipe programmed above, the ELA traces each read transaction and stores it in a circular buffer. This circular buffer holds the most recent X read transactions (where X depends on the size of the ELA-500 SRAM and the number of signals). These read transactions are generated by both explicit reads and speculative reads. Post-hang analysis of the read transactions can identify rogue accesses to potential holes in the memory map.
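The circular buffer behaviour can be modelled in Python with a bounded deque. This is a conceptual sketch only: the depth of 4 entries is an arbitrary assumption, and the real capacity depends on the ELA-500 SRAM configuration.

```python
from collections import deque


def capture(transactions, depth):
    """Model a circular trace buffer: keep only the most recent `depth` entries."""
    buffer = deque(maxlen=depth)  # the oldest entry is discarded once the buffer is full
    for transaction in transactions:
        buffer.append(transaction)
    return list(buffer)


if __name__ == "__main__":
    # Ten hypothetical 64-byte reads starting at an assumed base address.
    reads = [0x01001e40 + 0x40 * i for i in range(10)]
    for address in capture(reads, depth=4):
        print(f"0x{address:08x}")
```

Because older entries are overwritten, the buffer always ends with the transactions issued immediately before the hang, which are the ones of interest here.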
The trace capture shows several accesses outside the bounds of the memory copy routine that was explicitly called. The last address explicitly read by the core was 0x01001fc0. The processor prefetcher continued to read memory from 0x01002000, 0x01002040 and 0x01002080. These memory accesses are to addresses that reside outside of the internal SRAM. These addresses should have been configured in the translation tables as Invalid, which would have prevented the prefetcher from prefetching from this region of memory.
Address read valid = 0x1
Shareability = Inner Shareable
Execution state = AARCH64
Cache Attr = Write-back, read/write allocate
Access size = 64 bytes
Read address = 0x01001fc0

Address read valid = 0x1
Shareability = Inner Shareable
Execution state = AARCH64
Cache Attr = Write-back, read/write allocate
Access size = 64 bytes
Read address = 0x01002000

Address read valid = 0x1
Shareability = Inner Shareable
Execution state = AARCH64
Cache Attr = Write-back, read/write allocate
Access size = 64 bytes
Read address = 0x01002040

Address read valid = 0x1
Shareability = Inner Shareable
Execution state = AARCH64
Cache Attr = Write-back, read/write allocate
Access size = 64 bytes
Read address = 0x01002080
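The post-hang analysis of such a capture can be sketched in Python as a simple bounds check. The four read addresses and the 64-byte access size come from the capture above. The SRAM base and size are assumptions: the text only implies that the SRAM ends at 0x01002000. The function name `rogue_reads` is hypothetical.

```python
SRAM_BASE = 0x01000000  # assumed base address of the internal SRAM
SRAM_SIZE = 0x00002000  # assumed size, chosen so that the SRAM ends at 0x01002000
ACCESS_SIZE = 64        # bytes per read, as reported in the capture

# Read addresses taken from the trace capture, oldest first.
CAPTURED_READS = [0x01001fc0, 0x01002000, 0x01002040, 0x01002080]


def rogue_reads(addresses, base, size, access_size):
    """Return the reads that are not fully contained in [base, base + size)."""
    limit = base + size
    return [a for a in addresses if a < base or a + access_size > limit]


if __name__ == "__main__":
    for address in rogue_reads(CAPTURED_READS, SRAM_BASE, SRAM_SIZE, ACCESS_SIZE):
        print(f"read outside SRAM: 0x{address:08x}")
```

Under these assumptions, the read at 0x01001fc0 is the last one inside the SRAM, and the three reads that follow are flagged as falling outside it.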