How the ARM Architecture has fostered differentiation through diversity
Since 2014, an ever-increasing number of devices have shipped with ARMv8-A-based Cortex processors, ranging from $65 smartphones to premium flagship devices. This wide range is evidence of how the transition to 64-bit continues to advance system design and process technology in the mobile space, enabling a fresh wave of innovation on the ARM architecture. I thought now would be a good time to explore the degrees of freedom ARM partners have in building SoCs based on the ARM CPU architecture.
When designing a CPU, ARM partners have two levels of possible differentiation through the ARM licensing model: a proprietary (custom) microarchitecture, or an ARM Cortex processor combined with their own system design and implementation choices. Both are fully compatible with the ARM architecture.
Proprietary Microarchitecture
With an architecture license, our partners license one of the architectures (e.g. ARMv8-A or ARMv7) and create their own implementation of the ARM ISA. The ISA itself remains unaltered, but partners can design a CPU from the ground up in their own way, provided it complies with the ARM architecture specification.
ARM partners do this to target unique design points or features that address specific segments of the market, albeit at a higher development cost. It is important to remember that independently developed, proprietary-microarchitecture CPUs based on the ARM architecture must pass an ARM-mandated compliance suite to ensure that they are 100% compatible with the ARM architecture. This preserves and enhances the ecosystem value of the ARM partnership: code written for custom ARM architecture CPUs will run on other ARM CPUs.
ARM Cortex Processor
Partners license ARM-designed implementations of the ARM architecture, such as the ARM Cortex-A processors. At ARM, we are focused on sustaining and growing the largest ecosystem on the planet for efficient computing. Software developed for one ARM-based SoC will run on any other ARM-based SoC that uses the same or a newer version of the ARM architecture.
When licensing any combination of Cortex processors, partners configure the cores to suit their applications without modifying the microarchitecture. This retains the strong foundation of software compatibility. We take great care to ensure that no special modifications are made that could break this compatibility. It is extremely important that all ARM SoCs in a given profile (Cortex-A, Cortex-R, Cortex-M) are software compatible, so that the ecosystem is as broad and deep as possible.
Innovation and differentiation within the ARM ecosystem
Even with a “standard” Cortex CPU, there are many ways that partners can in fact differentiate.
- CPU configuration:
Partners who license ARM CPUs can choose the cache sizes (L2, and L1 on some CPUs), the bus interface (e.g. AMBA4 or AMBA5), the number of cores in a cluster (1 to 4), and how many CPU clusters to use in the design (for example, 2 clusters in a big.LITTLE design). Partners have built 2+4 big.LITTLE configurations, with 2 high-performance cores and 4 high-efficiency cores, for mid-range and premium smartphone markets, and 4+4 topologies for higher-end smartphone and tablet markets. Similarly, partners have built 2 clusters of 4 LITTLE cores to deliver octa-core capabilities at low to mid-range price points.
L2 size is an important factor in performance on many benchmarks, so high-end designs often push the L2 to 2MB for the high-performance CPU cluster. Low-end and mid-range designs sometimes make a different choice, giving up some performance in exchange for cost savings: a 1MB L2 for the high-performance cluster, or a 512KB L2 for a high-efficiency cluster in a big.LITTLE SoC. This range of configurability allows ARM partners to tailor the CPU capacity in their SoC to their target markets, while retaining full compatibility with the ARM architecture and full access to the benefits of the ARM ecosystem.
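To make this configuration space concrete, here is a minimal Python sketch that models a big.LITTLE SoC as a set of clusters and enforces the 1-to-4-cores-per-cluster rule. The class names, fields, and the core names attached to each cluster are hypothetical illustrations, and the L2 sizes are simply the figures quoted above; this is not an ARM configuration tool or an RTL parameter set.

```python
from dataclasses import dataclass

# Hypothetical model of licensee-configurable CPU parameters (illustration only).

@dataclass(frozen=True)
class Cluster:
    core: str    # assumed example core name, e.g. "Cortex-A57" (big) or "Cortex-A53" (LITTLE)
    cores: int   # cores in this cluster, 1 to 4
    l2_kb: int   # L2 cache size in KB

    def __post_init__(self):
        # Reject cluster sizes outside the 1-to-4 range described in the text.
        if not 1 <= self.cores <= 4:
            raise ValueError(f"{self.core}: a cluster holds 1 to 4 cores")


@dataclass(frozen=True)
class SoCConfig:
    name: str
    clusters: tuple

    def total_cores(self) -> int:
        return sum(c.cores for c in self.clusters)

    def topology(self) -> str:
        # Render the cluster layout in the "2+4" / "4+4" notation used above.
        return "+".join(str(c.cores) for c in self.clusters)


configs = [
    SoCConfig("premium", (Cluster("Cortex-A57", 4, 2048), Cluster("Cortex-A53", 4, 512))),
    SoCConfig("mid-range", (Cluster("Cortex-A57", 2, 1024), Cluster("Cortex-A53", 4, 512))),
    SoCConfig("octa-core LITTLE", (Cluster("Cortex-A53", 4, 512), Cluster("Cortex-A53", 4, 512))),
]

for cfg in configs:
    print(f"{cfg.name}: {cfg.topology()} = {cfg.total_cores()} cores")
```

Every configuration in the list runs the same software; only capacity, area, and cost differ, which is the point the paragraphs above make.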
- Power domains:
Cortex-A CPU IP comes with optional power domains around each CPU core, the L2 subsystem, and other areas of the design. Partners can choose how to implement these power domains, and can choose to share or group some of them. In addition, some of our more recent CPUs introduce state retention modes for the CPU cores and for the Advanced SIMD units, which partners can optionally use to offer finer-grained power management in the SoC.
- Peripherals and interfaces:
There are, of course, numerous peripherals and interfaces beyond the CPU, GPU, and other processing subsystems that can differentiate an SoC. By using standard Cortex-A CPUs, some partners choose to devote more of their engineering resources to optimizing and tuning specific peripherals and interfaces.
- Memory system performance:
Although two Cortex-A CPUs of the same type and revision are identical in terms of performance within the CPU itself, CPU performance often depends heavily on memory system performance, so two Cortex-A CPUs of the same type can deliver significantly different performance. As one example, the latency to the L2 depends on the number of slices a partner uses to meet timing at their target frequency; a partner with lower L2 latency will have an advantage in benchmarks whose working sets spill out of the L1 instruction or data caches. As another example, the latency to main memory can differ a lot from one SoC to another. If one SoC has a memory latency of 100 cycles and another has 140 cycles, the 100-cycle memory system will be a big advantage in many (but not all) of the key benchmarks, and is often an observable advantage in delivered performance on real-world workloads.
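One way to see why a 100-cycle versus 140-cycle memory latency matters for some benchmarks far more than others is the textbook average memory access time (AMAT) model. The sketch below is a simplified, illustrative model: the hit latencies, miss rates, and workload names are assumed values, not measurements of any real Cortex-A SoC, and AMAT is only a proxy for run time because out-of-order cores can hide part of the latency.

```python
def amat(l1_hit, l1_miss_rate, l2_hit, l2_miss_rate, mem_latency):
    """Average memory access time, in CPU cycles, for a two-level cache hierarchy.

    l2_miss_rate is the local miss rate: the fraction of L1 misses that also miss in the L2.
    A lower l2_hit models the L2-latency advantage discussed above.
    """
    return l1_hit + l1_miss_rate * (l2_hit + l2_miss_rate * mem_latency)


# Assumed workload profiles (illustrative only).
WORKLOADS = {
    "cache-resident": {"l1_miss_rate": 0.01, "l2_miss_rate": 0.05},
    "memory-bound": {"l1_miss_rate": 0.10, "l2_miss_rate": 0.50},
}

L1_HIT, L2_HIT = 4, 20  # assumed hit latencies in cycles, identical core on both SoCs

for name, p in WORKLOADS.items():
    fast = amat(L1_HIT, p["l1_miss_rate"], L2_HIT, p["l2_miss_rate"], 100)
    slow = amat(L1_HIT, p["l1_miss_rate"], L2_HIT, p["l2_miss_rate"], 140)
    print(f"{name}: {fast:.2f} vs {slow:.2f} cycles ({(slow / fast - 1) * 100:.1f}% slower)")
```

With these assumed numbers, the cache-resident workload goes from 4.25 to 4.27 cycles (about 0.5% slower), while the memory-bound workload goes from 11.0 to 13.0 cycles (about 18% slower), which is why a latency advantage shows up in many, but not all, benchmarks.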
Partners often seek to differentiate on memory system performance, recognizing the large impact it has on overall performance, even against other SoCs with the same Cortex-A CPU. One last point on memory system performance: CPU performance is very sensitive to latency to main memory, while GPU performance is more sensitive to bandwidth to main memory, so ARM partners optimize and balance latency against bandwidth in the memory system design for their target applications.
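The latency-versus-bandwidth split can be illustrated with Little's law: the data a requester can move per unit time is bounded by the number of requests it keeps in flight, times the bytes per request, divided by the latency of each request. The Python sketch below assumes a 64-byte line, a hypothetical 12.8 GB/s DRAM peak, 8 outstanding misses for a CPU core, and 256 for a GPU; none of these figures describe a specific SoC.

```python
LINE_BYTES = 64    # assumed cache line size in bytes
PEAK_GBPS = 12.8   # hypothetical DRAM peak bandwidth in GB/s


def achievable_gbps(outstanding, latency_ns, line_bytes=LINE_BYTES, peak_gbps=PEAK_GBPS):
    """Little's law: bandwidth = requests in flight * bytes per request / latency.

    Bytes per nanosecond equals GB/s (decimal); the result is capped at the DRAM peak.
    """
    return min(outstanding * line_bytes / latency_ns, peak_gbps)


for name, outstanding in (("CPU core", 8), ("GPU", 256)):
    for latency in (100, 140):
        print(f"{name}, {latency} ns: {achievable_gbps(outstanding, latency):.2f} GB/s")
```

Under these assumptions the CPU core drops from 5.12 GB/s to about 3.66 GB/s (roughly 29% less) when latency rises from 100 ns to 140 ns, because it is latency-bound, while the GPU stays pinned at the DRAM peak in both cases, so only more bandwidth would speed it up.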
- SoC level power management:
SoC power management combines several different mechanisms that slow down or shut down components under light or zero demand during different phases of use. With so many different components in an SoC design, ARM partners have many ways to manage power, and some partners differentiate on the power management mechanisms in the SoC, on the big.LITTLE tuning and power management framework, or on the software that coordinates component shutdown and presents it to the OS or middleware.
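As an illustration of one such mechanism, the Python sketch below picks a dynamic voltage and frequency scaling (DVFS) operating point for a cluster: the slowest frequency that covers the current demand plus some headroom, or no operating point at all when demand is zero and the cluster is a candidate for power-down. The operating-point table, the 25% headroom, and the function name `select_opp` are hypothetical; real governors, tables, and policies are SoC- and OS-specific, and this is where partners differentiate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    freq_mhz: int
    volt_mv: int


# Hypothetical operating-point table for one cluster (illustrative values only).
OPP_TABLE = [
    OperatingPoint(400, 800),
    OperatingPoint(800, 900),
    OperatingPoint(1200, 1000),
    OperatingPoint(1600, 1100),
]


def select_opp(demand_mhz, table=OPP_TABLE, headroom=1.25):
    """Return the slowest operating point covering demand * headroom.

    Returns None for zero demand (cluster could be powered down), and the
    fastest point when demand exceeds every entry in the table.
    """
    if demand_mhz <= 0:
        return None
    target = demand_mhz * headroom
    for opp in sorted(table, key=lambda o: o.freq_mhz):
        if opp.freq_mhz >= target:
            return opp
    return max(table, key=lambda o: o.freq_mhz)


print(select_opp(500))  # demand 500 MHz with 25% headroom needs at least 625 MHz
```

Choosing the slowest adequate point matters because lower frequencies also allow lower supply voltages, which reduces power more than proportionally.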
Several system and implementation choices offer ARM partners further ways to differentiate when using standard Cortex-A CPUs:
- Process node: ARM IP is shipped as synthesizable RTL that can be implemented on several different process nodes. Today (early 2015), partners at the highest premium end of the market are building with ARM IP on 16nm and 14nm, many premium designs are being built and shipped on 20nm, and a range of designs target 28nm for lower-cost premium, mid-range, and entry-level SoC platforms. The frequency and power characteristics of the same ARM CPU can vary significantly across process nodes, so the choice of process remains one of the main (and most obvious) ways that partners differentiate on ARM IP.
- Physical implementation: The time and effort spent on physical placement, routing, and optimization of the logic and RAM arrays in a design can significantly differentiate one Cortex-A CPU from another. For example, investment in physical design can produce a higher maximum frequency for the same design, lower power at the same maximum frequency, or some combination of the two. Partners also sometimes iterate on the physical design of a CPU, so that the 2nd or 3rd generation of a product is significantly improved in power, performance, and area (cost) thanks to a better physical implementation of the same Cortex-A CPU, providing further differentiation for the partner. ARM POP IP has helped partners improve the quality of results they achieve in physical design, and it also helps with the next differentiation factor in this list: time to market.
- Time to market: Release windows are critical in markets like high-end smartphones and premium tablets, where a one-month delay can mean missing a whole year's design cycle for devices with an annual refresh. Some partners differentiate by being very fast to market with designs based on Cortex-A CPUs. Often in those fast-moving markets, the initial SoC is followed by a revised version that improves on the original.
- GPU, ISP, video and audio subsystems: In a modern mobile SoC, the performance of the chip is often influenced even more strongly by the graphics processor, the image processing, the video and audio subsystems, and of course the way these components all work together. ARM provides industry-leading IP in the Mali GPU and video subsystems, but we allow our partners to mix and match our IP, their own IP, and that of 3rd parties. This allows the ARM partnership to experiment with different combinations of IP, iterate rapidly, and compete for the best combination in each device generation. This competitive iteration has driven rapid innovation in smartphones and tablets and is a key benefit of the ARM ecosystem; the same benefit is now well established in networking markets and is making inroads into server markets.
- System design: The way in which the CPU, GPU, ISP, video subsystem, coherent interconnect, and memory system work together as a combined system is an increasingly important factor in modern SoCs, and a key way for partners to differentiate their chips. Examples of differentiation in the system design include the use and configuration of cache coherent interconnect, next level cache memories, dynamic memory controllers, and the software that configures the system and optimizes things like power down modes and operating points at run-time.
- Software: Beyond the hardware IP and custom components in an SoC, there is of course the software that configures and operates the SoC. The key attribute we have been discussing is the compatibility of all ARM-based designs, so that the Linux kernel, middleware, and application software all run the same on ARM-based CPUs. ARM partners can differentiate along all of the dimensions listed above and still maintain full software compatibility, which lets them tap into the vast wealth of software written for the ARM architecture. The chip support and board support packages shipped with a given SoC can also be a point of differentiation for ARM partners that invest there.
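The process-node and physical-implementation points in the list above both act on the first-order CMOS dynamic power relation, shown here as a general textbook approximation rather than an ARM-published model: activity factor α, switched capacitance C, supply voltage V, and clock frequency f. A process node changes C and the voltage needed to reach a given f, and physical design effort changes both the achievable f and the capacitance being switched. Leakage power, which also varies strongly by process, is not captured by this term.

```latex
P_{\text{dyn}} \approx \alpha \, C \, V^{2} f,
\qquad
\frac{P_{\text{dyn}}(0.9\,V)}{P_{\text{dyn}}(V)} = 0.9^{2} = 0.81
```

Because of the squared voltage term, an implementation that reaches the same frequency at a 10% lower supply voltage cuts dynamic power by about 19%, all else being equal.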
As a result of all these opportunities for differentiation, any two SoCs built on the same Cortex-A57, Cortex-A72, or Cortex-A53 CPU can be quite different in their system, power, and performance characteristics, while still being identical from a software perspective.
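From software's point of view, a Cortex-A CPU is identified by the part number in its Main ID Register (MIDR), which Linux on ARM exposes as the "CPU part" field of /proc/cpuinfo. The Python sketch below assumes that Linux text format and maps the MIDR part numbers of the cores named in this post; two SoCs built on the same core report the same part number, however differently they were configured and implemented. The sample cpuinfo text is made up for illustration.

```python
# MIDR primary part numbers for the Cortex-A cores mentioned in this post.
PART_NAMES = {
    0xC07: "Cortex-A7",
    0xD03: "Cortex-A53",
    0xD07: "Cortex-A57",
    0xD08: "Cortex-A72",
}


def core_types(cpuinfo_text):
    """Return the core name for every 'CPU part' line in Linux /proc/cpuinfo text."""
    cores = []
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "CPU part":
            cores.append(PART_NAMES.get(int(value.strip(), 16), "unknown"))
    return cores


# Made-up excerpt in the Linux arm64 format; on a device you would pass
# open("/proc/cpuinfo").read() instead.
SAMPLE = """processor\t: 0
CPU implementer\t: 0x41
CPU part\t: 0xd03
processor\t: 4
CPU implementer\t: 0x41
CPU part\t: 0xd07
"""

print(core_types(SAMPLE))
```

Nothing in this view reveals cache sizes chosen at configuration time, the process node, or the memory latency, which is why SoCs must be compared on the characteristics listed below rather than on the core name alone.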
A quick listing of ways the performance can differ (summarizing some of the points made above):
- Max frequency (and max sustainable frequency, influenced by power)
- Power (affects sustained frequency in a thermally constrained environment)
- Latency to the L2
- Latency to main memory
- Bandwidth to main memory
- L2 size (and L1 size for some ARM CPUs)
- big.LITTLE topology - number of cores
- big.LITTLE tuning and scheduling policy
- Coherent interconnect
Beyond all of these, of course, our partners innovate around the core with their own IP blocks and design techniques.
To sum up, ARM prioritizes the value of the ecosystem, that is, the ability to write code that runs on all ARM-based CPUs of a given architecture release, and offers partners two ways to achieve this: a proprietary microarchitecture, or licensing standard ARM Cortex CPUs. There remain numerous important ways in which ARM partners can differentiate, and because our partners can and do differentiate along all of these dimensions, it is very important to analyze these characteristics when comparing one SoC based on an ARM Cortex-A core against another.
A benefit of this range of configurability and differentiation is that ARM CPU IP can scale to address a broad range of markets, and the ARM partnership can respond quickly as new markets emerge. One example is the recent emergence of the wearables market. ARM partners repurposed low-end smartphone SoCs based on the very low-power Cortex-A7 to serve the initial wave of watches, along with even lower-power Cortex-M CPUs (an order of magnitude less power) for fitness bands and other wearables that don’t require a UI, a complex display, or an MMU-based OS. Now we are starting to see Cortex-A7-based designs optimized specifically for wearable products, where a targeted physical implementation lets a full applications core run at under 10mW at 100MHz. This is the same Cortex-A CPU that ships in 8-core, 2GHz versions for low-cost mid-range smartphones.
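The wearable figure above can be restated as energy per clock cycle, which makes it easier to compare with other operating points. Dividing the quoted power by the quoted frequency gives the following upper bound (a back-of-the-envelope calculation from those two numbers, not a measured result):

```latex
E_{\text{cycle}} = \frac{P}{f} < \frac{10\ \text{mW}}{100\ \text{MHz}}
= \frac{10^{-2}\ \text{J/s}}{10^{8}\ \text{s}^{-1}}
= 10^{-10}\ \text{J} = 100\ \text{pJ per cycle}
```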
Clearly it is important for OEMs to assess the many differentiating factors when choosing between ARM-based SoCs for devices, and it is even more critical for ARM partners to differentiate along each of these paths in the competitive market for SoCs in the ARM ecosystem. It is through this freedom of choice that the ARM partnership has innovated so rapidly and will continue to do so as the ARM ecosystem expands to more fully serve other markets.