There has been a lot of coverage since the launch of the Arm Cortex-A75, Arm Cortex-A55, and Arm Mali-G72 at Computex 2017, including some information about early software development using Arm tools. The public announcement of new Arm IP is a milestone for Arm Models because it means we can now talk about available models and share additional information about the new IP.
Arm Fast Models and Cycle Models enable virtual prototyping so that partners can explore system architecture and develop software for the new Cortex-A75 and Cortex-A55 before silicon is available. Fast Models and Cycle Models offer different trade-offs between simulation speed and abstraction level, which enables a variety of use cases.
Fast Models provide high simulation speed and a flexible programmer's view to enable development of device drivers, firmware, operating systems and applications prior to silicon availability. Fast Models support software profiling, debug, and trace, and provide a SystemC interface for integration with third-party simulation environments.
Typical use cases for Fast Models include functional software debugging, software profiling and optimization, and software validation and continuous integration. Support for Cortex-A75 and Cortex-A55 is available in the recently released Fast Models 11.0.
Cycle Models are cycle-accurate and enable users to confidently make architecture decisions about IP selection and IP configuration. Cycle Models run in SoC Designer or any SystemC simulator, including simulators from EDA partners.
Typical use cases for Cycle Models include IP selection and configuration, analysis of HW/SW interaction, and benchmarking and system optimization. Support for the Cortex-A75 and Cortex-A55 is available now on Arm IP Exchange.
This article provides some background about what’s new from a modeling perspective and gives some examples of how to use the new IP in an example system. The focus is on the DynamIQ multi-core architecture and models for the Cortex-A75 and Cortex-A55.
Arm DynamIQ technology is the biggest change to CPU subsystems in some time. Over the past decade Arm has introduced numerous Cortex-A CPUs arranged in clusters of 1-4 CPUs, with the ability to expand systems to include multiple clusters. The most common systems included 2 clusters with a total of 6-8 CPUs. A DynamIQ cluster may contain up to 8 CPUs, which can be a mix of different CPU types. A single model of a heterogeneous cluster is a new concept for both Fast Models and Cycle Models, as users have grown accustomed to the homogeneous cluster with 1-4 CPUs.
Until now, the CPU ID and the Cluster ID in the MPIDR register have been laid out in mostly the same way from one CPU to the next: bits [1:0] identified the CPU and bits [11:8] identified the cluster number. The Cortex-A75 and Cortex-A55 make different use of the MPIDR affinity levels.
Affinity level 2 in bits [23:16] identifies different clusters within the system. The value in this field is equal to the value present on the CLUSTERIDAFF2 configuration signal.
Affinity level 1 in bits [15:8] identifies individual cores within the cluster. The value can range from 0x00 for core 0 to 0x07 for core 7.
Affinity level 0 in bits [7:0] identifies individual threads within a multi-threaded core. Since both Cortex-A75 and Cortex-A55 are single-threaded, the value is 0.
This means software which was used to reading 0, 1, 2, and 3 to determine which CPU it was running on will now read 0, 0x100, 0x200, and 0x300.
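The field layout described above can be illustrated with a short Python sketch. This is only a model of the bit extraction, not Arm-provided code: the helper names `bitfield` and `decode_mpidr` are hypothetical, and the sketch looks only at the three affinity fields named in the text, ignoring every other bit of the register.

```python
# Affinity field positions as described in the text for Cortex-A75 / Cortex-A55.
AFF_WIDTH = 8
AFF0_LSB = 0    # thread within a multi-threaded core
AFF1_LSB = 8    # core within the cluster
AFF2_LSB = 16   # cluster within the system


def bitfield(value, lsb, width):
    """Return `width` bits of `value`, starting at bit position `lsb`."""
    return (value >> lsb) & ((1 << width) - 1)


def decode_mpidr(mpidr):
    """Split an MPIDR value into affinity fields (hypothetical helper)."""
    return {
        "aff2_cluster": bitfield(mpidr, AFF2_LSB, AFF_WIDTH),
        "aff1_core": bitfield(mpidr, AFF1_LSB, AFF_WIDTH),
        "aff0_thread": bitfield(mpidr, AFF0_LSB, AFF_WIDTH),
    }


# The four values mentioned in the text for cores 0 to 3 of one cluster.
for raw in (0x000, 0x100, 0x200, 0x300):
    print(hex(raw), decode_mpidr(raw))
```

Software that previously masked the low bits to find its CPU number would see 0 for every core with this layout, which is why the core number has to be taken from affinity level 1 instead.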
To account for the additional flexibility in configuring DynamIQ technology, both the Fast Models and the Cycle Models have been changed fairly dramatically. Fast Models have numerous options for both fixed configuration models, such as ArmCortexA55x4CT_CortexA75x1CT, and for flexible models such as ArmCortexA55CT_CortexA75CT which have parameters to set the number of cores in the Cortex-A75 and the Cortex-A55 sub-cluster. Using a variable number of CPUs enables the model ports to be described as arrays where the array index matches the associated core. Cycle Models also support the numerous configuration options by automating model construction on Arm IP Exchange. More information is given below in the Cycle Model section.
DynamIQ technology greatly expands the possible configuration options and Arm Models have been enhanced to also support the added flexibility.
One great way to learn new Arm IP is to look at the example Fast Model systems which contain the new models. For DynamIQ technology there are a number of examples in $PVLIB_HOME/examples/LISA/FVP_Base which contain the Cortex-A75 and the Cortex-A55.
The Base Platform system models allow early development, distribution, and demonstration of software deliverables for the new CPUs. A range of Base FVPs are supplied with different DynamIQ configurations. The benefit of Base is that it provides a standard peripheral set for software development and porting. Base is available for a wide range of Cortex-A processors so it makes trying system variations very easy.
The Base Platform also helps show what has changed for DynamIQ. For anyone creating a virtual prototype system, a few things stand out:
The GICv3 impact is seen in the CPU affinities parameter:
For Cortex-A73 and Cortex-A53:
"CPU-affinities" = "0.0.0.0, 0.0.0.1, 0.0.0.2, 0.0.0.3"
For Cortex-A75 and Cortex-A55:
"CPU-affinities" = "0.0.0.0, 0.0.1.0, 0.0.2.0, 0.0.3.0"
If simulating power management is of interest, some of the old ports have been removed and replaced by new P-channel ports, pchannel_core and pchannel_cluster, which simplify power-down sequences and require less software intervention.
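Returning to the CPU-affinities parameter in the list above, the difference between the two strings is simply which dotted field carries the core number. The Python sketch below generates both forms for a given core count. It assumes the four dotted fields are ordered Aff3.Aff2.Aff1.Aff0 and that the cluster-level fields stay at 0, as in the examples shown; the function name `cpu_affinities` is hypothetical and is not part of Fast Models.

```python
def cpu_affinities(num_cores, dynamiq):
    """Build a CPU-affinities string for one cluster (illustrative only).

    dynamiq=False puts the core number in the last field, as shown for
                  Cortex-A73 / Cortex-A53.
    dynamiq=True  puts the core number in the third field, as shown for
                  Cortex-A75 / Cortex-A55.
    """
    entries = []
    for core in range(num_cores):
        if dynamiq:
            entries.append(f"0.0.{core}.0")
        else:
            entries.append(f"0.0.0.{core}")
    return ", ".join(entries)


print(cpu_affinities(4, dynamiq=False))
print(cpu_affinities(4, dynamiq=True))
```

A helper like this can be handy when scripting parameter files for custom systems with different core counts, but the exact parameter name and format should be checked against the Base Platform documentation.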
Besides just reading the documentation for new Arm IP, it’s a good learning experience to take a look at the ports and parameters of the Base Systems as they point to configuration and system connection information that is of great help when building custom systems with Fast Models.
Documentation for the Base Platform is available on Arm Developer.
When working with new CPUs it’s helpful to take a look at bare metal software first to understand what is different. Arm DS-5 provides bare metal software examples which can be used for this purpose. They are packaged in $DS5_HOME/examples/Bare-metal_examples_Armv8.zip.
Unzipping this shows a couple of useful directories, startup_Cortex-A55_Cortex-A75/ and fireworks_Cortex-A55_Cortex-A75/
Both of these can be compiled with the latest Arm Compiler 6 included in DS-5 5.27.1. The makefile for each reveals that the Cortex-A75 and Cortex-A55 are indeed Armv8.2 CPUs and the compilation is done using -march=armv8.2-a
Changes to CPU identification are shown below:
    .type GetMPIDR, "function"
    .cfi_startproc
GetMPIDR:
    mrs x0, MPIDR_EL1
    ret
    .cfi_endproc

    .type GetCPUID, "function"
    .cfi_startproc
GetCPUID:
    mrs x0, MIDR_EL1
    ubfx x0, x0, #4, #12 // extract PartNum
    cmp x0, #0xD0A
    b.eq CA75
    cmp x0, #0xD05
    b.eq CA55
    b Others
CA75:
CA55:
    mrs x0, MPIDR_EL1
    ubfx x0, x0, #MPIDR_EL1_AFF1_LSB, #MPIDR_EL1_AFF_WIDTH
    ret
Others:
    mrs x0, MPIDR_EL1
    ubfx x0, x0, #MPIDR_EL1_AFF0_LSB, #MPIDR_EL1_AFF_WIDTH
    ret
    .cfi_endproc
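For readers less familiar with AArch64 assembly, the decision made by GetCPUID can be modeled in a few lines of Python. This is a sketch of the logic only, taking register values as plain integers. The part numbers 0xD0A and 0xD05 come from the listing; the values used for MPIDR_EL1_AFF0_LSB, MPIDR_EL1_AFF1_LSB and MPIDR_EL1_AFF_WIDTH (0, 8 and 8) are assumptions based on the affinity field positions described earlier, since the header that defines them is not shown here.

```python
PART_CORTEX_A75 = 0xD0A   # from the listing
PART_CORTEX_A55 = 0xD05   # from the listing

# Assumed values for the symbolic constants used in the listing.
MPIDR_EL1_AFF0_LSB = 0
MPIDR_EL1_AFF1_LSB = 8
MPIDR_EL1_AFF_WIDTH = 8


def ubfx(value, lsb, width):
    """Model of an unsigned bitfield extract: `width` bits starting at `lsb`."""
    return (value >> lsb) & ((1 << width) - 1)


def get_cpu_id(midr, mpidr):
    """Return the core number, choosing the affinity field by CPU part number."""
    part_num = ubfx(midr, 4, 12)  # PartNum field, bits [15:4] of MIDR
    if part_num in (PART_CORTEX_A75, PART_CORTEX_A55):
        # DynamIQ cores: the core number is in affinity level 1.
        return ubfx(mpidr, MPIDR_EL1_AFF1_LSB, MPIDR_EL1_AFF_WIDTH)
    # Other cores: the core number is in affinity level 0.
    return ubfx(mpidr, MPIDR_EL1_AFF0_LSB, MPIDR_EL1_AFF_WIDTH)


# Illustrative MIDR values with only the PartNum field populated.
midr_a75 = PART_CORTEX_A75 << 4
midr_other = 0xD03 << 4   # any part number other than the two above

print(get_cpu_id(midr_a75, 0x300))   # core number taken from Aff1
print(get_cpu_id(midr_other, 0x3))   # core number taken from Aff0
```

Keying the choice on the part number lets the same startup code run unchanged on both older cores and the new DynamIQ cores.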
To demonstrate how Arm DS-5 and Fast Models work together it's good to review the process. Although Base is a great starting point to experiment with, most users want to create custom Fast Model systems so it's good to review how to compile the Base Systems and use them as a jump start to building custom systems.
First, import the Armv8 software examples described above using File -> Import -> DS-5 -> Examples and Programming Libraries.
Next, compile and run the Fast Model Base Platform for DynamIQ found in $PVLIB_HOME/examples/LISA/FVP_Base/Build_Cortex-A55+Cortex-A75
Since Fast Models on Linux supports multiple versions of gcc, the proper configuration may need to be set. For Ubuntu 16.10 the configuration can be set by loading FVP_Base_Cortex-A55+Cortex-A75.sgproj into sgcanvas and setting the configuration to Linux64-Release-GCC-5.4.
Compile:
$ ./build.sh
$ cd Linux64-Release-GCC-5.4/
Run with the CADI server started:
$ ./isim_system -C bp.secure_memory=false -S
This will start the simulation:
Now use the procedure to connect DS-5 to the Fast Model system as described in Using Arm DS-5 with custom Fast Model systems and start debugging. The image below shows debugging the fireworks example on the Base Platform system.
For Cycle Models, Arm DynamIQ technology brings some changes to Arm IP Exchange model creation. This is the first time multiple CPU types can be combined into a single cluster and a single model created which contains multiple CPU types and the DynamIQ Shared Unit (DSU). This results in thousands of possible configurations for a cluster of up to 8 cores. IP Exchange provides options to build models for the Cortex-A75, the Cortex-A55, and for DynamIQ. The first two options only allow the respective cores to be included, while the last option allows a combination of both CPU types to be specified and a single model generated for the big.LITTLE cluster. The configuration page is shown below.
There are multiple CPAKs (Cycle Model Performance Analysis Kits) available on Arm System Exchange which include the Cortex-A75 and Cortex-A55. There is one CPAK which contains both CPUs in a 2+4 configuration. The system is shown below.
When simulated, right clicking on the DynamIQ cluster shows 6 cores on the register view menu. The screenshot below shows the register menus for the 2+4 configuration.
The DynamIQ Cycle Models have PMU events instrumented for performance analysis. Profiling can be enabled by right-clicking on the CPU model and selecting the Profiling menu or using the Profile button on the top of the GUI. Any or all of the PMU events can be enabled. Any simulation done with profiling enabled will write the selected PMU events into the System Analyzer database. The transaction activity can also be profiled to measure bandwidth and latency of the traffic generated by various configurations. The new heterogeneous cluster may require a fresh analysis of interconnect and memory controller performance to make sure the best architecture is chosen for new designs. The picture below shows just a subset of all of the performance information available in the DynamIQ cluster.
Arm DynamIQ Cycle Models can be used in both SoC Designer and SystemC. The SoC Designer models have software profiling built in and the SystemC models have Tarmac output for all DynamIQ cores.
SoC Designer models also have debugger support in the recently released Arm DS-5 5.27.1, which allows memory and register views along with single stepping and other common debugging features.
The announcement of the first Arm DynamIQ CPUs represents the first heterogeneous cluster design. Immediate support in the Arm toolset makes it possible to start learning about the latest technology far ahead of silicon availability. Arm Models are available now to begin looking at system design, performance analysis, and software development. Please refer to developer.arm.com for more information on Arm Development Tools and the latest information on Arm IP.