ARM Cycle Models have long been used to perform design tasks such as IP evaluation, benchmarking, and performance analysis.
In October 2015, ARM acquired the assets of Carbon Design Systems with the primary goal of enabling earlier availability of cycle accurate models for ARM processors and system IP. The announcement of the ARM® Cortex®-R8 is the first step in demonstrating the benefits of early Cycle Model availability. Another goal is to provide Cycle Models which can be used in SystemC simulation environments. The Cortex-R8 model is the first Cycle Model available for use in the Accellera SystemC environment right from the start.
The Cortex-R8 model has been available to lead partners since the beginning of 2016 and will be generally available on ARM IP Exchange this month.
Earlier cycle accurate model availability has led to increased focus on using Cycle Models to understand new processors. This article describes some of the ways the Cycle Model has been used by ARM silicon partners to understand the Cortex-R8.
Prior to early availability of Cycle Models, these tasks would have been performed using RTL simulation or FPGA boards. RTL simulation can be cumbersome, especially for software engineers doing benchmarking tasks, and it lacks software debugging and performance analysis features. FPGA boards are familiar to software engineers, but lack the ability to change CPU build-time parameters such as cache and TCM sizes.
The examples below provide more insight on how Cycle Models are being used.
Benchmarking
A common activity for a new processor such as Cortex-R8 is to run various benchmarks and measure how many cycles are required for various C functions. SoC Designer provides an integrated disassembly view which can be used to set breakpoints to run from point A to point B and measure cycle counts.
DS-5 can also be connected to the Cortex-R8 for a full source code view of the software.
The cycle count is always visible on the toolbar of SoC Designer.
Often, subtracting the cycle count at the first breakpoint from the count at the second is all that is needed to measure the cycles spent between them.
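The subtraction can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not part of SoC Designer: the function names and the two cycle readings are hypothetical values of the kind that would be copied by hand from the toolbar at each breakpoint.

```python
def cycles_between(start_count: int, end_count: int) -> int:
    """Return the cycles elapsed between two breakpoint hits."""
    if end_count < start_count:
        raise ValueError("end count is earlier than start count")
    return end_count - start_count


def cycles_per_iteration(start_count: int, end_count: int, iterations: int) -> float:
    """Average cycles per loop iteration between two breakpoints."""
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    return cycles_between(start_count, end_count) / iterations


# Hypothetical toolbar readings at breakpoint A and breakpoint B.
count_at_a = 120_450
count_at_b = 187_950

print(cycles_between(count_at_a, count_at_b))             # 67500
print(cycles_per_iteration(count_at_a, count_at_b, 100))  # 675.0
```

Dividing by the iteration count is useful when the benchmark function runs in a loop and the per-call cost is the number of interest.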
After the first round of benchmarking is done, the code can be moved from external memory to TCM and execution repeated. The Cortex-R8 cycle model will boot from ITCM when the INITRAM parameters are set to true. Right clicking on the Cortex-R8 model and setting parameters makes it easy to change between external memory and TCM.
Beyond counting cycles, SoC Designer provides further analysis features. One useful feature is a transaction view.
The transaction monitor can be used to make sure the expected transactions are occurring on the bus. For example, when running out of TCM, little or no bus activity is expected on the AXI interface, and any activity usually indicates an incorrect configuration. The transaction view below shows the activity on the AXI interface when running from external memory. Each transaction has a start and end time to indicate how long it takes.
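The two checks described here, per-transaction duration and the absence of AXI traffic when running from TCM, can be sketched as follows. The Transaction record and its field names are assumptions made for illustration; they are not an export format of SoC Designer, and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    """Hypothetical record of one AXI transaction."""
    kind: str     # "read" or "write"
    address: int
    start: int    # cycle at which the transaction started
    end: int      # cycle at which the transaction completed

    @property
    def latency(self) -> int:
        return self.end - self.start


def summarize(transactions):
    """Return count, min, max, and mean latency in cycles."""
    if not transactions:
        return {"count": 0, "min": None, "max": None, "mean": None}
    latencies = [t.latency for t in transactions]
    return {
        "count": len(latencies),
        "min": min(latencies),
        "max": max(latencies),
        "mean": sum(latencies) / len(latencies),
    }


def has_unexpected_axi_activity(transactions, allowed=0):
    """When running from TCM, more than `allowed` transactions is suspicious."""
    return len(transactions) > allowed


# Hypothetical transactions seen while running from external memory.
external_run = [
    Transaction("read", 0x8000_0000, 100, 112),
    Transaction("read", 0x8000_0020, 115, 127),
    Transaction("write", 0x8000_1000, 130, 148),
]

print(summarize(external_run))
print(has_unexpected_axi_activity([]))  # False: a TCM run with a quiet bus
```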
All PMU events are instrumented and can be automatically captured in Cycle Models. These are viewed by enabling the profiling feature and looking at the results using the analyzer view. The hex values to the left of each event correspond to the event codes in the Technical Reference Manual. In addition to raw values, graphs of events over time can be created to identify hotspots.
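Raw PMU counts are often easier to interpret as ratios. The sketch below maps event codes to names and derives a few common metrics. The event numbers follow the ARMv7 architectural common event numbering as an assumption; they should be checked against the Cortex-R8 Technical Reference Manual before use. The counts are hypothetical, and the dictionary layout is not a format produced by the profiler.

```python
# Assumed event codes (ARMv7 common events); confirm against the TRM.
EVENTS = {
    0x03: "dcache_refill",
    0x04: "dcache_access",
    0x08: "instructions_retired",
    0x10: "branch_mispredicted",
    0x11: "cycles",
    0x12: "branch_predictable",
}


def name_counts(raw_counts):
    """Translate {event_code: count} into {event_name: count}."""
    return {EVENTS[code]: value for code, value in raw_counts.items() if code in EVENTS}


def ratio(numerator, denominator):
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def derived_metrics(raw_counts):
    """Compute IPC, data cache miss rate, and branch mispredict rate."""
    c = name_counts(raw_counts)
    return {
        "ipc": ratio(c.get("instructions_retired", 0), c.get("cycles", 0)),
        "dcache_miss_rate": ratio(c.get("dcache_refill", 0), c.get("dcache_access", 0)),
        "branch_mispredict_rate": ratio(
            c.get("branch_mispredicted", 0), c.get("branch_predictable", 0)
        ),
    }


# Hypothetical counts copied from a profiling run.
raw = {0x03: 1_000, 0x04: 40_000, 0x08: 150_000, 0x10: 500, 0x11: 200_000, 0x12: 20_000}

for name, value in derived_metrics(raw).items():
    print(f"{name}: {value:.3f}")
```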
The analysis tools also provide information about bus utilization, latency, transaction counts, retired instructions, branch prediction, and cache metrics as shown below. Custom reports can also be generated.
After observing a benchmark in external memory and TCM, it’s common to change TCM sizes and cache sizes. Models with different cache sizes and TCM sizes can easily be configured and created using ARM IP Exchange and the impact on the benchmark observed. The IP configuration page is shown below. Generating a new model is as simple as selecting new values on the web page and pushing the build button. After the compilation is done the new model is ready for download and can replace the current Cortex-R8 model.
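Once the same benchmark has been run on several model configurations, the results can be compared against a baseline. The following is a minimal sketch of that comparison; the configuration labels and cycle counts are hypothetical and do not come from measurements of the Cortex-R8.

```python
def speedup_table(cycles_by_config, baseline):
    """Return the speedup of each configuration relative to the baseline."""
    base = cycles_by_config[baseline]
    return {name: base / cycles for name, cycles in cycles_by_config.items()}


# Hypothetical cycle counts for one benchmark on three generated models.
results = {
    "external memory, small caches": 80_000,
    "external memory, large caches": 50_000,
    "code in ITCM": 40_000,
}

table = speedup_table(results, baseline="external memory, small caches")
for name, speedup in table.items():
    print(f"{name}: {speedup:.2f}x")
```

Keeping the benchmark binary and breakpoints identical across runs means any difference in the table is attributable to the configuration change alone.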
Another use of the Cortex-R8 Cycle Model is to analyze the performance impact of adding the PL310 L2 cache controller. There is a Cycle Model of the PL310 available from ARM IP Exchange. It can be added into a system and enabled by programming the registers of the cache controller. The register view is shown below.
SoC Designer provides ideal memory models which can be configured for various wait states and delays. The performance of memory accesses using these memory models alone can be compared with the performance after adding the PL310 into the system. The same analysis tools can be used to determine latency values from the L2 cache and the overall performance impact of adding the L2 cache. Right clicking on the PL310 and enabling the profiling features will generate latency and throughput information for the analysis view.
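A rough estimate of what to expect from this comparison can be made with the standard average memory access time formula (hit time plus miss rate times miss penalty) before running the simulation. The sketch below applies it under stated assumptions: the latencies and miss rate are hypothetical inputs, not PL310 or Cortex-R8 figures, and the simulation remains the source of the real numbers.

```python
def average_access_time(hit_latency: float, miss_rate: float, miss_penalty: float) -> float:
    """Average memory access time = hit latency + miss rate * miss penalty."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_latency + miss_rate * miss_penalty


# Hypothetical values, in cycles.
memory_latency = 32   # ideal memory model configured with wait states
l2_hit_latency = 8    # latency of an access that hits in the L2 cache
l2_miss_rate = 0.25   # fraction of L2 accesses that go on to memory

without_l2 = memory_latency
with_l2 = average_access_time(l2_hit_latency, l2_miss_rate, memory_latency)

print(without_l2)  # 32
print(with_l2)     # 16.0
```

If the measured latency from the analysis view is far from such an estimate, the cache controller configuration or the assumed miss rate is worth a second look.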
Example systems using the Cortex-R8 and software to configure the system and run various programs are available from ARM System Exchange. The systems serve as a quick start by providing cycle accurate IP models, fully configured and initialized systems, and software source code. Most users take an example system as a starting point and then modify and customize it to meet particular design tasks.
Previously, the only ways to evaluate performance and understand the details of a new ARM processor were RTL simulation or FPGA boards with fixed configurations. ARM Cycle Models have become the new standard for IP evaluation, early benchmarking, and performance analysis. The Cortex-R8 Cycle Model is available for use in SoC Designer and SystemC simulation. Example systems and software are available, and models of different configurations can be easily generated using ARM IP Exchange. Together with the software debugging and performance analysis features, this makes Cycle Models an easy-to-use environment for evaluating IP and making informed selection decisions.