The latest high-performance ARMv8-A processor is the Cortex-A72. The press release reports that the A72 delivers CPU performance that is 50x greater than leading smartphones from five years ago and that it will be the anchor in premium smartphones for 2016. The Cortex-A72 delivers 3.5x the sustained performance of an ARM Cortex-A15 design from 2014. Last week ARM began providing more details about the Cortex-A72 architecture. AnandTech has a great summary of the A72 details.
The Carbon model of the A72 is now available on Carbon IP Exchange along with 10 Carbon Performance Analysis Kits (CPAKs). Since current design projects may be considering the A72, it’s a good time to highlight some of the differences between the Cortex-A72 and the Cortex-A57.
Carbon IP Exchange Portal Changes
IP Exchange enables users to configure, build, and download models for ARM IP. There are a few differences between the A57 and the A72. The first difference is the L2 cache size. The A57 can be configured with a 512 KB, 1 MB, or 2 MB L2 cache, while the A72 adds a fourth option of 4 MB.
Another new configuration option available on IP Exchange for the A72 is the ability to disable the GIC CPU interface. Many designs continue to use version 2 of the ARM GIC architecture with IP such as the GIC-400. These designs can take advantage of excluding the GIC CPU interface.
The A72 also offers an option to include or exclude the ACP (Accelerator Coherency Port) interface.
The last new configuration option concerns the number of FEQ (Fill/Evict Queue) entries. The A72 offers options of 20, 24, and 28 entries, compared to the A57, which offers 16 or 20 entries. This feature has been important to Carbon users doing performance analysis and studying the impact of various L2 cache parameters.
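The option sets described above can be captured in a small lookup table, which is handy when scripting comparisons between the two CPUs. This is a minimal sketch: the option values come from the text, but the table layout and the function names `is_valid_config` and `portable_configs` are hypothetical and are not part of Carbon IP Exchange.

```python
# Option values are taken from the text above. The data layout and the
# function names are hypothetical, not a Carbon IP Exchange API.
L2_CACHE_KB = {
    "A57": {512, 1024, 2048},
    "A72": {512, 1024, 2048, 4096},
}
FEQ_ENTRIES = {
    "A57": {16, 20},
    "A72": {20, 24, 28},
}


def is_valid_config(cpu, l2_kb, feq_entries):
    """Return True if both the L2 size and the FEQ depth are offered for this CPU."""
    return l2_kb in L2_CACHE_KB[cpu] and feq_entries in FEQ_ENTRIES[cpu]


def portable_configs():
    """List (L2 KB, FEQ entries) pairs that exist on both the A57 and the A72."""
    l2_sizes = L2_CACHE_KB["A57"] & L2_CACHE_KB["A72"]
    feq_depths = FEQ_ENTRIES["A57"] & FEQ_ENTRIES["A72"]
    return sorted((size, depth) for size in l2_sizes for depth in feq_depths)


if __name__ == "__main__":
    print(is_valid_config("A72", 4096, 28))
    print(is_valid_config("A57", 4096, 20))
    print(portable_configs())
```

For a like-for-like comparison, only a FEQ depth of 20 is common to both CPUs, so the portable configurations differ only in L2 size.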
The Cortex-A72 configuration from IP Exchange is shown below.
ACE Interface Changes
The main change to the A72 interface is that the width of the transaction ID signals has been increased from 6 bits to 7 bits. The wider *IDM signals only apply when the A72 is configured with an ACE interface. The main impact occurs when connecting an A72 to a CCI-400 that was previously used with an A53 or A57. Since those CPUs have 6-bit wide *IDM signals, the CCI-400 will need to be reconfigured for 7-bit wide *IDM signals. All of the A72 CPAKs which use CCI-400 have this change made to them so they operate properly, but it’s something to watch if upgrading existing systems to A72.
This applies to the following signals for A72:
System Register Changes
A number of system registers are updated with new values to reflect the A72. The primary part number field in the Main ID register (MIDR) for A72 is 0xD08 vs the A57 value of 0xD07 and the A53 value of 0xD03. Clearly, the 8 was chosen well before the A72 number was assigned. A number of other ID registers change value from 7 on the A57 to 8 on the A72.
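As an illustration of where the part number sits, the sketch below decodes a MIDR value using the ARMv8-A field layout (implementer in bits [31:24], variant in [23:20], architecture in [19:16], primary part number in [15:4], revision in [3:0]). The part numbers are the ones quoted above. The sample register values and the helper names `decode_midr` and `cpu_name` are assumptions made for the example.

```python
# Part numbers come from the text above. The helper names are hypothetical.
PART_NAMES = {0xD03: "Cortex-A53", 0xD07: "Cortex-A57", 0xD08: "Cortex-A72"}


def decode_midr(midr):
    """Split a 32-bit MIDR value into its ARMv8-A fields."""
    return {
        "implementer": (midr >> 24) & 0xFF,   # 0x41 is ARM
        "variant": (midr >> 20) & 0xF,
        "architecture": (midr >> 16) & 0xF,
        "part_num": (midr >> 4) & 0xFFF,      # primary part number
        "revision": midr & 0xF,
    }


def cpu_name(midr):
    """Map the primary part number to a CPU name, if it is one we know."""
    part = decode_midr(midr)["part_num"]
    return PART_NAMES.get(part, f"unknown part 0x{part:03X}")


if __name__ == "__main__":
    # Illustrative values: ARM implementer, variant 0, revision 0.
    for sample in (0x410FD030, 0x410FD070, 0x410FD080):
        print(f"0x{sample:08X}: {cpu_name(sample)}")
```

Software that checks the part number against a fixed list will need the 0xD08 entry added before it recognizes an A72.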
New PMU Events
There are a number of new events tracked by the Cortex-A72 Performance Monitor Unit (PMU). All of the new events have event numbers 0x100 and greater. There are three main sections covering:
- Branch Prediction
The screenshots below from the Carbon Analyzer show the PMU events. All of these are automatically instrumented by the Carbon model and are recorded without any software programming.
The A72 contains many micro-architecture updates for incremental performance improvement. The most obvious one described here is the L2 FEQ size, and there are certainly many more in the branch prediction, caches, TLB, pre-fetch, and floating point units. As an example, I ran an A57 CPAK and an A72 CPAK with the exact same software program. Both CPUs reported about 21,500 instructions retired. This is the instruction count if the program were viewed as a sequential instruction stream. Of course, both CPUs do a number of speculative operations. The A57 reported about 37,000 instructions speculatively executed and the A72 reported about 35,700.
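One way to read these numbers is as a ratio of speculatively executed instructions to retired instructions. The sketch below works through that arithmetic using the approximate counts quoted above. The function name `speculation_ratio` is hypothetical, and a single short program is not a benchmark, so treat the result as an illustration of the method rather than a general figure.

```python
# Approximate counts from the run described above.
RETIRED = 21_500
SPECULATIVE = {"A57": 37_000, "A72": 35_700}


def speculation_ratio(speculative, retired=RETIRED):
    """Speculatively executed instructions per retired instruction."""
    return speculative / retired


def speculative_reduction(baseline, improved):
    """Fractional drop in speculative instructions from baseline to improved."""
    return 1 - improved / baseline


if __name__ == "__main__":
    for cpu, count in SPECULATIVE.items():
        print(f"{cpu}: {speculation_ratio(count):.2f} speculative per retired")
    drop = speculative_reduction(SPECULATIVE["A57"], SPECULATIVE["A72"])
    print(f"A72 speculatively executes {drop:.1%} fewer instructions")
```

With these counts the A57 executes roughly 1.72 instructions speculatively for every one retired and the A72 roughly 1.66, a reduction of about 3.5% in speculative work for the same program.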
The screenshots of the instruction events are shown below, first A72 followed by A57. All of the micro-architecture improvements of the A72 combine to provide the highest performance CPU created by ARM to date.
Carbon users can easily run the A57, A53, and now the A72 with various configuration options and directly compare the performance results using their own software and systems. The CPAKs available from Carbon System Exchange provide a great starting point and can be easily modified to investigate system performance topics.