Since 2014, an ever-increasing number of devices have shipped with ARMv8-A based Cortex processors, ranging from $65 smartphones to premium flagship devices. This wide range is evidence of how the transition to 64-bit continues the advance in system design and process technology in the mobile space, enabling a fresh wave of innovation on the ARM architecture. I thought now would be a good time to explore the degrees of freedom ARM partners have in building SoCs based on the ARM CPU architecture.
When designing a CPU, partners have two levels of possible differentiation through the ARM licensing model: a proprietary or custom microarchitecture, or an ARM Cortex processor with system design and implementation choices. Both are fully compatible with the ARM architecture.
With the first option, a proprietary or custom microarchitecture, our partners license one of the architectures (e.g. ARMv8-A or ARMv7) and create their own implementation of the ARM ISA. The ISA remains unaltered in these cases, but partners can take their own approach and design a CPU from the ground up that complies with the ARM architecture specification.
ARM partners do this to target unique design points or features that address specific segments of the market, albeit at a higher development cost. It is important to remember that independently developed, proprietary microarchitecture CPUs based on the ARM architecture have to pass an ARM-mandated compliance suite to ensure that they are 100% compatible with the ARM architecture. This ensures the ecosystem value of the ARM partnership is preserved and enhanced: code written for custom ARM architecture CPUs will run on other ARM CPUs.
With the second option, partners license ARM-designed implementations of the ARM architecture, such as the ARM Cortex-A processors. At ARM, we are focused on sustaining and growing the largest ecosystem on the planet for efficient computing. Software developed for one ARM-based SoC will run on any other ARM-based SoC that uses the same or a newer version of the ARM architecture.
When licensing any combination of Cortex Processors, partners configure the cores to suit their applications without modifying the microarchitecture. This retains the strong foundation of software compatibility. We take great care to ensure that no special modifications are made that could break this compatibility – it is extremely important that all ARM SoCs in a given profile (Cortex-A, Cortex-R, Cortex-M) are software compatible so that the ecosystem is as broad and deep as possible.
Even with a “standard” Cortex CPU, there are many ways that partners can in fact differentiate.
Partners who license ARM CPUs can choose the cache size (L1 and L2), bus interface (e.g. AMBA4 or AMBA5), number of cores in a cluster (1 to 4), and how many CPU clusters to use in the design (2 clusters in a big.LITTLE design, for example). We have seen partners build 2+4 big.LITTLE configurations, with 2 high performance cores and 4 max efficiency cores, for midrange and premium smartphone markets, and 4+4 topologies for higher end smartphone and tablet markets. Similarly, we have seen partners build 2 clusters of 4 LITTLE cores to deliver octa-core capabilities at low to mid-range price points.
L2 size is an important factor in performance on many benchmarks, so high-end designs often push L2 sizes to 2MB for the high performance CPU cluster; low-end and mid-range designs can sometimes play this trade-off differently, with a 1MB L2 for the high performance cluster, or 512kB L2 cache size for a high efficiency CPU cluster in a big.LITTLE SoC, trading off performance for cost savings. This range of configurability allows ARM partners to tailor the CPU capacity in their SoC to their target markets, while retaining full compatibility with the ARM architecture and full access to the benefits of the ARM ecosystem.
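To make the configuration space above concrete, here is a minimal sketch that models a CPU subsystem as a set of clusters. The class names (`Cluster`, `SocCpuConfig`) are hypothetical, and the specific pairings of core types and L2 sizes are illustrative assumptions rather than descriptions of any shipping SoC. Only the 1-to-4 cores per cluster limit and the 2+4, 4+4, and dual-LITTLE topologies come from the text.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Cluster:
    """One CPU cluster as configured by a partner (hypothetical model)."""
    core_type: str  # e.g. "Cortex-A57" or "Cortex-A53"
    cores: int      # the text above gives 1 to 4 cores per cluster
    l2_kib: int     # shared L2 cache size in KiB

    def __post_init__(self):
        if not 1 <= self.cores <= 4:
            raise ValueError("a cluster holds 1 to 4 cores")
        if self.l2_kib <= 0:
            raise ValueError("L2 size must be positive")


@dataclass(frozen=True)
class SocCpuConfig:
    """CPU subsystem of an SoC: an ordered tuple of clusters."""
    clusters: Tuple[Cluster, ...]

    @property
    def total_cores(self) -> int:
        return sum(c.cores for c in self.clusters)

    def describe(self) -> str:
        # Render the topology in the "2+4" style used in the text.
        return "+".join(str(c.cores) for c in self.clusters)


# Illustrative configurations; core pairings and L2 sizes are assumptions.
premium = SocCpuConfig((Cluster("Cortex-A57", 4, 2048),
                        Cluster("Cortex-A53", 4, 512)))
midrange = SocCpuConfig((Cluster("Cortex-A57", 2, 1024),
                         Cluster("Cortex-A53", 4, 512)))
octa_little = SocCpuConfig((Cluster("Cortex-A53", 4, 512),
                            Cluster("Cortex-A53", 4, 512)))

for name, cfg in [("premium", premium), ("midrange", midrange),
                  ("octa_little", octa_little)]:
    print(name, cfg.describe(), cfg.total_cores)
```

All three configurations run the same software; only the capacity and cost trade-offs differ, which is the point of keeping the microarchitecture itself unmodified.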
Cortex-A CPU IP comes with optional power domains around each CPU core, the L2 subsystem, and other areas of the design. Partners can choose how to implement these voltage domains, and can choose to share or group some domains. Further to this, ARM introduced state retention modes for CPU cores and for the Advanced SIMD units in some of our more recent CPUs that partners can optionally use to offer finer grained power management in the SoC.
There are of course numerous peripherals and interfaces beyond the CPU, GPU, and other processing subsystems that can differentiate an SoC. By taking standard Cortex-A CPUs, some partners choose to devote more of their engineering resources to optimizing and tuning specific peripherals and interfaces to differentiate their SoCs.
Although every Cortex-A CPU is equivalent to every other Cortex-A CPU of the same type and revision in terms of performance within the CPU, CPU performance often depends quite heavily on memory system performance, and we can observe two Cortex-A CPUs of the same type delivering significantly different performance as a result. As one example, the latency to L2 memory depends on the number of slices a partner uses to meet timing for their target frequency; a partner with lower latency to the L2 will have an advantage in performance benchmarks that spill outside the L1 instruction or data caches. As another example, the latency to main memory can differ a lot from one SoC to another. If one SoC has a memory latency of 100 cycles and the other 140 cycles, the 100-cycle memory system will be a big advantage in many (but not all) of the key benchmarks, and is often an observable advantage in delivered performance on real-world workloads.
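A rough way to see why the 100-cycle versus 140-cycle difference matters is a first-order stall model: effective cycles per instruction equals a base CPI plus the misses per instruction multiplied by the miss latency. The sketch below uses that textbook approximation. The base CPI and the miss rate are assumed values chosen only for illustration, and the model ignores overlapping misses and prefetching, so real CPUs will typically see a smaller gap.

```python
def effective_cpi(base_cpi, misses_per_kilo_instr, memory_latency_cycles):
    """First-order estimate of cycles per instruction with memory stalls.

    Assumes every main-memory miss stalls the core for the full latency,
    with no overlap between misses (a pessimistic simplification).
    """
    stall_cycles_per_instr = misses_per_kilo_instr * memory_latency_cycles / 1000
    return base_cpi + stall_cycles_per_instr


# Assumed workload characteristics (not from the article):
BASE_CPI = 1.0  # CPI when all accesses hit in cache
MPKI = 5        # main-memory misses per 1000 instructions

# Latencies taken from the example in the text.
fast = effective_cpi(BASE_CPI, MPKI, 100)
slow = effective_cpi(BASE_CPI, MPKI, 140)
speedup = slow / fast

print(f"CPI at 100 cycles: {fast:.2f}")
print(f"CPI at 140 cycles: {slow:.2f}")
print(f"Relative advantage of the 100-cycle system: {speedup:.2f}x")
```

Under these assumptions the same CPU is roughly 13% faster on the lower-latency SoC. A workload that fits in cache (an MPKI near zero) would show almost no difference, which matches the "many, but not all" caveat above.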
Partners often seek to differentiate on memory system performance, recognizing the large impact this has on overall performance even against other SoCs with the same Cortex-A CPU. One last point on the topic of memory system performance: CPU performance is very sensitive to latency to main memory, while GPU performance is more sensitive to bandwidth to main memory, so ARM partners will optimize and balance between latency and bandwidth in the design of the memory system for their target applications.
The way in which a given SoC manages power incorporates several different mechanisms to slow down or shut down components when under light or zero demand during different phases of use. With so many different components in an SoC design, ARM partners have a lot of ways in which they can manage power, and some partners differentiate on the power management mechanisms in the SoC, the big.LITTLE tuning and power management framework, or the software that organizes the management of component shutdown and presents it to the OS or middleware.
There are several system and implementation choices which further offer ways for ARM partners to differentiate when using Cortex-A standard CPUs:
As a result of all of these opportunities for differentiation, any two Cortex-A57, Cortex-A72, or Cortex-A53-based processors can be quite different in their system, power, and performance characteristics, while still being identical from a software perspective.
A quick listing of ways the performance can differ (summarizing some of the points made above):
Beyond all these of course, our partners innovate around the core with their own IP blocks and design techniques.
To sum up, ARM prioritizes the value of the ecosystem, meaning the ability to write code for all ARM-based CPUs of a given architecture release, and offers partners two ways in which to achieve this: through a proprietary microarchitecture or by licensing standard ARM Cortex CPUs. There remain numerous important ways in which ARM partners can differentiate. Because our partners can and do differentiate along all of these dimensions, it is very important to analyze these characteristics when assessing one SoC based on an ARM Cortex-A core against another.
A benefit of this range of configurability and differentiation is that ARM CPU IP can scale to address a broad range of different markets, and the ARM partnership can respond quickly as new markets start to emerge. An example of this is the recent emergence of the wearables market. ARM partners have repurposed low-end smartphone SoCs, based on the very low-power Cortex-A7, to service the initial wave of watches, along with even lower power Cortex-M CPUs (an order of magnitude less power) for fitness bands and other wearables that don’t require a UI, complex display, or MMU-based OS. Now we are starting to see Cortex-A7 based designs optimized specifically for wearable products, where a targeted physical implementation enables a low-power wearable design that runs under 10mW at 100MHz for a full Apps core. This comes from the same Cortex-A CPU that is shipping in 8-core 2GHz versions for low-cost and mid-range smartphones.
Clearly it is important for OEMs to assess the many differentiating factors when choosing between ARM-based SoCs for devices, and it is even more critical for ARM partners to differentiate along each of these paths in the competitive market for SoCs in the ARM ecosystem. It is through this freedom of choice that the ARM partnership has innovated so rapidly and will continue to do so as the ARM ecosystem expands to more fully serve other markets.
Good summary of how different ARM-based solutions can be fine-tuned by partners based on the target market, cost, and end application requirements. Possible innovations are many! Thanks for sharing.