Using ARM Cycle Models to Understand the Cortex-R8

Jason Andrews
February 18, 2016

ARM Cycle Models have long been used to perform design tasks such as:

  • IP Evaluation
  • System Architecture Exploration
  • Software Development
  • Performance Optimization

In October 2015, ARM acquired the assets of Carbon Design Systems with the primary goal of enabling earlier availability of cycle accurate models for ARM processors and system IP. The announcement of the ARM® Cortex®-R8 is the first step in demonstrating the benefits of early Cycle Model availability. Another goal is to provide Cycle Models which can be used in SystemC simulation environments. The Cortex-R8 model is the first Cycle Model available for use in the Accellera SystemC environment right from the start.

The Cortex-R8 model has been available to lead partners since the beginning of 2016 and will be generally available on ARM IP Exchange this month.

Earlier cycle accurate model availability has led to increased focus on using Cycle Models to understand new processors. This article describes some of the ways the Cycle Model has been used by ARM silicon partners to understand the Cortex-R8.

Before Cycle Models were available early, these tasks would have been performed using RTL simulation or FPGA boards. RTL simulation can be cumbersome, especially for software engineers doing benchmarking tasks, and it lacks software debugging and performance analysis features. FPGA boards are familiar to software engineers, but lack the ability to change CPU build-time parameters such as cache and TCM sizes.

The examples below provide more insight on how Cycle Models are being used.

Benchmarking

A common activity for a new processor such as Cortex-R8 is to run various benchmarks and measure how many cycles are required for various C functions. SoC Designer provides an integrated disassembly view which can be used to set breakpoints to run from point A to point B and measure cycle counts.

diss.PNG

DS-5 can also be connected to the Cortex-R8 for a full source code view of the software.

ds5.PNG

The cycle count is always visible on the toolbar of SoC Designer.

cycle-count.PNG

Many times a simple subtraction is all that is needed to measure cycle count between breakpoints.
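
As a minimal sketch of that subtraction, the Python snippet below computes the cycles spent between two breakpoints. The cycle values are made-up readings standing in for numbers read off the SoC Designer toolbar, and the function name is hypothetical; it is not part of any ARM tool.

```python
def cycles_between(start_cycle: int, end_cycle: int) -> int:
    """Return the number of cycles elapsed between two breakpoint hits."""
    if end_cycle < start_cycle:
        raise ValueError("end_cycle must not be earlier than start_cycle")
    return end_cycle - start_cycle


# Hypothetical toolbar readings at breakpoint A and breakpoint B.
cycle_at_a = 120_450
cycle_at_b = 134_982

elapsed = cycles_between(cycle_at_a, cycle_at_b)
print(f"Code between A and B took {elapsed} cycles")
```

The same helper can be reused for each run, for example once with the code in external memory and once with it in TCM, so the two results can be compared directly.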

After the first round of benchmarking is done, the code can be moved from external memory to TCM and execution repeated. The Cortex-R8 Cycle Model will boot from ITCM when the INITRAM parameters are set to true. Right-clicking on the Cortex-R8 model and setting parameters makes it easy to change between external memory and TCM.

param.PNG

Beyond counting cycles, SoC Designer provides further analysis features. One useful feature is the transaction view.

The transaction monitor can be used to make sure the expected transactions are occurring on the bus. For example, when running out of TCM, little or no bus activity is expected on the AXI interface, and any activity usually indicates an incorrect configuration. The transaction view below shows the activity on the AXI interface when running from external memory. Each transaction has a start and end time to indicate how long it takes.

trans.PNG
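
Because each transaction carries a start and end time, latency figures follow from simple arithmetic. The sketch below assumes the transactions have been copied out by hand into a list of records; the `Transaction` class, its field names, and the sample values are all hypothetical and do not reflect any SoC Designer export format.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Transaction:
    kind: str          # "read" or "write" (hypothetical labels)
    start_cycle: int   # cycle at which the transaction started
    end_cycle: int     # cycle at which the transaction completed

    @property
    def latency(self) -> int:
        return self.end_cycle - self.start_cycle


def summarize(txns):
    """Group latencies by kind and return {kind: (count, mean, max)}."""
    by_kind = {}
    for t in txns:
        by_kind.setdefault(t.kind, []).append(t.latency)
    return {k: (len(v), mean(v), max(v)) for k, v in by_kind.items()}


# Made-up AXI transactions for illustration only.
transactions = [
    Transaction("read", 1000, 1012),
    Transaction("read", 1020, 1030),
    Transaction("write", 1035, 1041),
    Transaction("read", 1050, 1064),
]

for kind, (count, avg, worst) in summarize(transactions).items():
    print(f"{kind}: {count} transactions, mean {avg:.1f} cycles, max {worst} cycles")
```

An empty list for a run that is supposed to execute from TCM would match the expectation of little or no AXI activity.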

All PMU events are instrumented and can be automatically captured in Cycle Models. These are viewed by enabling the profiling feature and looking at the results using the analyzer view. The hex values to the left of each event correspond to the event codes in the Technical Reference Manual. In addition to raw values, graphs of events over time can be created to identify hotspots.

an-pmu.PNG
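
Raw event counts are often easier to interpret once they are combined into ratios. The sketch below derives instructions per cycle and a data cache miss rate from a set of counts. The dictionary keys and the numbers are illustrative assumptions; they are not the event names or codes from the Technical Reference Manual.

```python
def derived_metrics(counts: dict) -> dict:
    """Compute IPC and data cache miss rate from raw event counts."""
    cycles = counts["cycles"]
    instructions = counts["instructions_retired"]
    accesses = counts["dcache_accesses"]
    misses = counts["dcache_misses"]
    return {
        # Guard against division by zero for empty profiling windows.
        "ipc": instructions / cycles if cycles else 0.0,
        "dcache_miss_rate": misses / accesses if accesses else 0.0,
    }


# Hypothetical counts for one profiling window.
sample = {
    "cycles": 20_000,
    "instructions_retired": 15_000,
    "dcache_accesses": 4_000,
    "dcache_misses": 200,
}

metrics = derived_metrics(sample)
print(f"IPC: {metrics['ipc']:.2f}")
print(f"D-cache miss rate: {metrics['dcache_miss_rate']:.1%}")
```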

The analysis tools also provide information about bus utilization, latency, transaction counts, retired instructions, branch prediction, and cache metrics as shown below. Custom reports can also be generated.

sys-met.PNG

After observing a benchmark in external memory and TCM, it’s common to change TCM sizes and cache sizes. Models with different cache sizes and TCM sizes can easily be configured and created using ARM IP Exchange and the impact on the benchmark observed. The IP configuration page is shown below. Generating a new model is as simple as selecting new values on the web page and pushing the build button. After the compilation is done the new model is ready for download and can replace the current Cortex-R8 model.

ip-config.PNG

Cache and Memory Latency

Another use of the Cortex-R8 Cycle Model is to analyze the performance impact of adding the PL310 L2 cache controller. There is a Cycle Model of the PL310 available from ARM IP Exchange. It can be added into a system and enabled by programming the registers of the cache controller. The register view is shown below.

pl310.PNG

SoC Designer provides ideal memory models which can be configured for various wait states and delays. Performance of memory accesses using these memory models alone can be compared with the performance after adding the PL310 into the system. The same analysis tools can be used to determine latency values from the L2 cache and the overall performance impact of adding the L2 cache. Right-clicking on the PL310 and enabling the profiling features will generate latency and throughput information for the analysis view.
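
A rough way to reason about such a comparison before running it is the textbook average-access-time approximation: accesses that reach the L2 cost the L2 hit latency, plus the memory latency for the fraction that miss. The sketch below applies that formula. It is a generic estimate and not something the Cycle Model computes this way, and every number in it is a made-up placeholder to be replaced with values measured in the analysis view.

```python
def average_access_cycles(l2_hit_cycles: float,
                          l2_miss_rate: float,
                          memory_cycles: float) -> float:
    """Estimate the average cycles per access that reaches the L2 cache."""
    if not 0.0 <= l2_miss_rate <= 1.0:
        raise ValueError("l2_miss_rate must be between 0 and 1")
    return l2_hit_cycles + l2_miss_rate * memory_cycles


# Hypothetical latencies, in cycles.
memory_cycles = 40.0   # memory model configured with wait states
l2_hit_cycles = 8.0    # latency of a hit in the L2 cache
l2_miss_rate = 0.2     # fraction of L2 accesses that go on to memory

without_l2 = memory_cycles
with_l2 = average_access_cycles(l2_hit_cycles, l2_miss_rate, memory_cycles)

print(f"Without L2: {without_l2:.1f} cycles per access")
print(f"With L2:    {with_l2:.1f} cycles per access")
print(f"Estimated improvement: {without_l2 / with_l2:.2f}x")
```

The estimate also shows when an L2 cache does not pay off: with a high miss rate or a fast memory, the added hit latency can outweigh the savings.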

Example systems using the Cortex-R8 and software to configure the system and run various programs are available from ARM System Exchange. The systems serve as a quick start by providing cycle accurate IP models, fully configured and initialized systems, and software source code. Most users take an example system as a starting point and then modify and customize it to meet particular design tasks.

Conclusion

Previously, the only ways to evaluate performance and understand the details of a new ARM processor were RTL simulation or FPGA boards with fixed configurations. ARM Cycle Models have become the new standard for IP evaluation, early benchmarking, and performance analysis. The Cortex-R8 Cycle Model is available for use in SoC Designer and SystemC simulation. Example systems and software are available, and models of different configurations can be easily generated using ARM IP Exchange. Together with the software debugging and performance analysis features, this makes Cycle Models an easy-to-use environment for evaluating IP and making informed selection decisions.
