The announcement of the ARM Cortex-A35 processor marked the beginning of a new family of ultra high efficiency application processors from ARM. Today, ARM announced the second member of that family, the Cortex-A32, a new 32-bit processor.
In this blog, I’ll provide the market context and some highlights of the Cortex-A32 while answering the question: Why did we create the Cortex-A32?
The embedded market is incredibly diverse. It covers innumerable products – almost everything that is not a phone, a PC, or a server – and spans a huge range of processing requirements. The diversity of requirements in embedded is well served by the three major processor families from ARM: Cortex-A, Cortex-R and Cortex-M. The fundamental differences between the A, R, and M families are shown below:
Much has been written about Cortex-M processors in the embedded market – they are incredibly prevalent. Less attention has been given so far to the growing use of Cortex-A processors in embedded applications. This blog focuses on these rich-embedded applications, where a full OS is required. These are the sweet spot for Cortex-A.
Two fundamental aspects make rich-embedded applications different from the traditional embedded applications using Cortex-R and Cortex-M processors. The first is rich operating system support, which requires virtual memory and a memory management unit. The vast majority of Cortex-A based embedded products run full virtual memory based OSes like Linux, Android, and Windows. The second aspect is higher performance. The performance needed is again very diverse, and in some cases embedded applications need performance approaching that of smartphones and laptops, which of course Cortex-A processors can deliver.
The rich embedded market is already well established. According to VDC estimates, ARM based devices occupy over 70% market share in the rich embedded segment (SoCs). Just like the embedded market as a whole, the rich embedded market is extremely diverse. There are many use cases, some high performance and others more cost and power sensitive. Let’s look at a few examples - industrial devices, smart watches, smart glasses, and a whole range of products for the home - from thermostats to media hubs. These devices all use Cortex-A, and deliver a richer experience to users.
The rich embedded market is growing rapidly, fueled by two key drivers:
Today, more than 100 Cortex-A based Single Board Computers (SBCs) are available at various performance and cost points. Rich operating systems, open source and proprietary, have become more accessible, and this has opened up embedded development to a wider range of developers. The software ecosystem for Cortex-A processors also includes support from the leading RTOS and embedded tools vendors. Their interest in Cortex-A is driven by demand from their customers, who want to take advantage of Cortex-A performance, compatibility, wide availability, and the benefits of multiple suppliers and price/performance points.
Much has been said lately about 64-bit, which is a driving force in the smartphone and open compute markets. In embedded, however, the majority of the software ecosystem is focused on 32-bit software. While there are some embedded applications that are moving to 64-bit, like high-end SBCs, NAS, and ADAS systems, many embedded applications are sticking with 32-bit to keep costs and complexity low. We can expect a significant number of embedded devices to remain 32-bit for the foreseeable future.
We built the Cortex-A32 for embedded, first and foremost. Embedded is an exciting market, and we wanted to continue to deliver processors that accelerate the innovation in this market. So, what benefits does the Cortex-A32 processor offer for rich embedded?
Let us look at some details for each one of these key offerings.
Cortex-A32 is the only ARMv8-A processor optimized for 32-bit compute. As such, the Cortex-A32 offers an ARMv8 upgrade path for applications that today use ARMv7-A processors like Cortex-A5 and Cortex-A7 or classic ARM processors like ARM926 and ARM1176.
The ARMv8-A architecture supports both 32-bit and 64-bit compute capabilities in the AArch32 and AArch64 execution states. Cortex-A32 is optimized to support the A32/T32 instruction set in the AArch32 execution state, which is ideal for 32-bit rich embedded applications that need the lowest cost and power. Even in AArch32, ARMv8-A adds more than 100 new instructions – and the Cortex-A32 benefits from all of these.
Cortex-A32 is 25% more efficient (more performance per mW) than Cortex-A7 in the same process node. Cortex-A32 delivers this efficiency through performance improvements and power reduction, two often conflicting design goals that the Cortex-A32 team managed to deliver in tandem.
The Cortex-A32 also delivers performance improvements compared to Cortex-A5 and Cortex-A7 processors. The performance improvements relative to the Cortex-A5 range from 30% to a massive 1300% across a range of benchmarks relevant to embedded markets. Streaming and crypto are key benchmarks at the top end of this scale. Compared to the Cortex-A7, the Cortex-A32 offers 5% to 25% higher performance. To put things in perspective, the Cortex-A32 delivers similar performance to the Cortex-A9, which was the premium smartphone standard just a few years ago. That performance is coming to the lowest cost rich embedded devices now, and at significantly less power.
For integer workloads, the combination of performance improvements and power reduction provided by the Cortex-A32 translates into a greater than 25% efficiency gain over the Cortex-A7 and more than 30% efficiency gain over the Cortex-A5. Compared to the Cortex-A35, the Cortex-A32 offers the same 32-bit performance but consumes 10% less power and has a 13% smaller core. This means that the Cortex-A32 is 10% more efficient than the Cortex-A35 processor in the 32-bit world.
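To make the efficiency comparison concrete, here is a minimal sketch of the arithmetic, assuming "efficiency" means performance per mW as defined earlier in this blog. The function name `relative_efficiency` is purely illustrative, and the only inputs are the ratios quoted above (equal 32-bit performance, 10% less power than the Cortex-A35).

```python
def relative_efficiency(perf_ratio: float, power_ratio: float) -> float:
    """Return performance-per-mW of design A relative to design B.

    perf_ratio  = performance of A / performance of B
    power_ratio = power of A / power of B
    """
    if power_ratio <= 0:
        raise ValueError("power_ratio must be positive")
    return perf_ratio / power_ratio


# Figures quoted in the text: same 32-bit performance, 10% less power.
a32_vs_a35 = relative_efficiency(perf_ratio=1.0, power_ratio=0.90)
gain_percent = (a32_vs_a35 - 1.0) * 100

print(f"Cortex-A32 vs Cortex-A35 efficiency gain: {gain_percent:.1f}%")
# prints "Cortex-A32 vs Cortex-A35 efficiency gain: 11.1%"
```

Strictly computed, equal performance at 90% of the power works out to about 11% more performance per mW, which is consistent with the rounded 10% figure quoted above.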
Given the diversity of embedded applications, we knew we had to make the Cortex-A32 scalable. Cortex-A32 therefore offers a wide range of configuration options. The diagram below shows two configurations of Cortex-A32 but there is a range of possibilities in between.
The configuration on the left in the diagram above shows a typical performance-optimized multi-core configuration: quad core, with larger cache sizes and optional features like NEON and Crypto engines. This configuration provides excellent performance for most rich embedded applications and retains ARM’s low power leadership, consuming less than 75 mW per processor core when running at 1.0 GHz on a 28nm process node. At the other extreme, the smallest configuration of the Cortex-A32 processor, with a physical implementation optimized for area, occupies less than a quarter of a mm² and consumes less than 4 mW at 100 MHz in the same 28nm process node. With this scalability, the Cortex-A32 is suitable for a wide range of rich embedded applications.
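As a rough way to read the two power figures side by side, the sketch below normalizes them to mW per MHz and adds up a quad-core bound. This is illustrative arithmetic only: the quoted numbers are upper bounds ("less than"), the two configurations differ in features and physical implementation, and the quad-core total assumes the per-core figure simply adds up and ignores any shared logic. The helper name `mw_per_mhz` is hypothetical.

```python
def mw_per_mhz(power_mw: float, freq_mhz: float) -> float:
    """Normalize a power figure to milliwatts per MHz of clock frequency."""
    if freq_mhz <= 0:
        raise ValueError("freq_mhz must be positive")
    return power_mw / freq_mhz


# Upper bounds quoted in the text, both on a 28nm process node.
perf_cfg = mw_per_mhz(power_mw=75.0, freq_mhz=1000.0)  # per core at 1.0 GHz
min_cfg = mw_per_mhz(power_mw=4.0, freq_mhz=100.0)     # smallest config at 100 MHz

# Assumption: four cores each stay under the per-core bound, cores only.
quad_core_bound_mw = 4 * 75.0

print(f"Performance config: < {perf_cfg:.3f} mW/MHz per core")
print(f"Smallest config:    < {min_cfg:.3f} mW/MHz")
print(f"Quad-core cores:    < {quad_core_bound_mw:.0f} mW at 1.0 GHz")
```

Normalizing this way shows why the area-optimized configuration suits the most power-sensitive devices, while the quad-core configuration still stays within a few hundred mW for the cores.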
In summary, the lowest cost rich embedded applications are about to get a lot more exciting. Cortex-A is already the number 1 CPU architecture for rich embedded. The Cortex-A32 expands the Cortex-A family and adds our most efficient 32-bit application processor yet. The Cortex-A32 is set to drive future innovation in rich embedded and IoT – I can’t wait to see what our partners will build with it.
Learn more about the Cortex-A32: https://developer.arm.com/products/processors/cortex-a/cortex-a32
Next to expect: an AArch64-only CPU. I wonder if this wouldn't be the smallest 64-bit core around if the AArch32 deadwood was removed.