Consider this: The performance of smartphones, nearly all of which are powered by ARM processors, has grown by 100x since 2009. One hundred times in seven years! With that has emerged entirely new functionality, lightning-fast user responsiveness, and immersive user experiences – all in the same power footprint. It’s really an unrivaled engineering achievement, given the challenging design constraints in the mobile space.
This performance, functionality and user experience dynamic has driven a truly remarkable market, which will see more than 1.5 billion handsets sold in 2016.
With this consumer embrace, smartphone design has become, in many ways, the platform for future innovation. Augmented and virtual reality, ultra-HD visualization, object-based audio processing and computer vision all drive the demand for extra system performance. At the same time, smartphone designs have slimmed considerably in recent years, which limits thermal dissipation and ratchets up the need for thoughtful power management design. Battery capacity cannot keep growing, as smartphones have gotten as large as they practically can. To continue delivering more immersive user experiences and staying on the smartphone innovation path we’ve blazed in the past decade, we need to deliver more sustained performance with higher efficiency.
To this end, ARM has announced its latest high-performance processor, the Cortex-A73. After introducing Cortex-A72 just last year, ARM is accelerating its innovation pace with the Cortex-A73 processor, which will power premium smartphones by early 2017.
The Cortex-A73 is designed and optimized specifically for mobile and consumer devices. The aspects of Cortex-A73 that I’m most excited about are all about efficient performance.
I’ve had the privilege of sitting alongside the design team that has created the Cortex-A73, with the specific intent of meeting this challenge: to be the most efficient and highest performance ARM processor. What follows is an overview of the main features and key enhancements of the Cortex-A73 and their resulting benefits.
The Cortex-A73 includes a 128-bit AMBA 4 ACE interface enabling integration in ARM big.LITTLE systems, either with the highly efficient Cortex-A53 in premium designs or with our latest ultra-efficient Cortex-A35 processor in mid-range and more cost-constrained designs.
The Cortex-A73 processor is designed for your next-generation premium smartphone. When implemented in advanced 10nm technology, the Cortex-A73 delivers 30% more sustained performance than our previous high-performance CPU, the Cortex-A72. Running at frequencies up to 2.8GHz, the Cortex-A73 also delivers the highest peak performance, and thanks to its extreme energy efficiency, its sustained performance almost matches that peak. What you’ll notice in the chart below is that the Cortex-A73 can sustain operation at nearly peak frequency, a rarity in mobile phone processors today, where real-world frequencies get throttled back.
The Cortex-A73 micro-architecture includes several interesting performance optimizations that I can share (and quite a few others that I can’t share). It supports a 64kB instruction cache, state-of-the-art branch prediction based on the most advanced algorithms, and high-performance instruction prefetching. The main performance improvements are actually implemented in the data memory system. It uses advanced L1 and L2 data prefetchers with complex pattern detection. We have also optimized the store buffer for continuous write streams and increased the data cache to 64kB without any timing impact.
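To give a feel for what "pattern detection" means in a data prefetcher, here is a minimal sketch of a textbook stride prefetcher. It is not the Cortex-A73 design, whose details are not public; the class names, the confidence threshold and the prefetch degree are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class StrideEntry:
    last_addr: int       # address of the previous access from this load
    stride: int = 0      # most recently observed address delta
    confidence: int = 0  # how many times in a row that delta repeated


class StridePrefetcher:
    """Toy per-load stride detector (illustrative, not ARM's implementation)."""

    def __init__(self, threshold: int = 2, degree: int = 2) -> None:
        self.table: Dict[int, StrideEntry] = {}  # keyed by the load's PC
        self.threshold = threshold  # repeats needed before prefetching
        self.degree = degree        # how many addresses to fetch ahead

    def access(self, pc: int, addr: int) -> List[int]:
        """Record one load and return the addresses to prefetch, if any."""
        entry = self.table.get(pc)
        if entry is None:
            self.table[pc] = StrideEntry(last_addr=addr)
            return []
        stride = addr - entry.last_addr
        if stride != 0 and stride == entry.stride:
            entry.confidence += 1
        else:
            # Pattern broken (or not yet established): start over.
            entry.stride = stride
            entry.confidence = 0
        entry.last_addr = addr
        if entry.confidence >= self.threshold:
            return [addr + entry.stride * i for i in range(1, self.degree + 1)]
        return []


prefetcher = StridePrefetcher()
# One load instruction walking an array in 64-byte steps.
requests = [prefetcher.access(pc=0x40, addr=a) for a in range(0, 320, 64)]
```

In this toy model the first few accesses only train the table; once the same stride has repeated often enough, each access requests the next lines ahead of the program. Real prefetchers track far richer patterns than a single constant stride.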
These enhancements translate into a performance uplift of up to 10% in mobile use cases compared to Cortex-A72 at iso-frequency. We expect silicon designs with Cortex-A73 to push further on frequency than in previous generations, a venture that is assisted by the increased efficiency. Moreover, the Cortex-A73 consistently beats Cortex-A72 in all memory workloads by at least 15%, increasing performance across multiple applications, operating system operations and complex compute execution such as NEON processing.
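As a rough way to read "iso-frequency", the sketch below uses the common first-order model in which performance scales with per-clock throughput times clock frequency. The function name and the 2.5GHz baseline clock are hypothetical, chosen only to show how a per-clock gain and a frequency gain would compound; real workloads rarely scale this cleanly.

```python
def relative_performance(per_clock_uplift: float, freq_ratio: float = 1.0) -> float:
    """First-order model: performance ~ work per clock x clock frequency."""
    return (1.0 + per_clock_uplift) * freq_ratio


# At iso-frequency (same clock), only the per-clock uplift counts.
iso_frequency = relative_performance(0.10)

# Hypothetical: the older design runs at 2.5 GHz, the newer one at 2.8 GHz.
baseline_ghz = 2.5  # assumption for illustration, not a figure from this post
with_higher_clock = relative_performance(0.10, 2.8 / baseline_ghz)

print(f"iso-frequency: {iso_frequency:.2f}x, with higher clock: {with_higher_clock:.2f}x")
```

Under these assumptions a 10% per-clock gain combined with a 12% higher clock compounds to roughly 1.23x, which is why micro-architecture and frequency improvements are usually quoted separately.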
While delivering this uplift in performance, the Cortex-A73 requires less power than the Cortex-A72. The Cortex-A73 implements several optimizations to reduce power, such as an aggressive clock-gating scheme, power-optimized RAM organization, and optimal resource sharing for AArch32 and AArch64 execution.
Compared to Cortex-A72, the power saving for a combination of integer workloads is above 20%, and even higher for workloads such as floating-point or memory access. This power efficiency enables a better user experience and extends battery life. Alternatively, it can be used to give extra headroom to the rest of the SoC, enabling the overall system and the graphics processor to increase performance and to provide better visual effects, higher frame rates or new functionality.
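The two ways of spending a power saving can be shown with simple arithmetic. The sketch below assumes a hypothetical CPU cluster drawing 1.5W for a 60-second task; only the 20% saving comes from the text above, and the helper names are made up for this example.

```python
def energy_joules(avg_power_w: float, duration_s: float) -> float:
    """Energy consumed is average power multiplied by time."""
    return avg_power_w * duration_s


def freed_headroom_w(cpu_power_w: float, saving_fraction: float) -> float:
    """Power released to the rest of the SoC under a fixed total budget."""
    return cpu_power_w * saving_fraction


cpu_power_w = 1.5   # hypothetical CPU cluster power, not a published figure
task_seconds = 60   # hypothetical task duration
saving = 0.20       # "above 20%" for integer workloads, per the text

# Option 1: keep the saving, so the same task costs less battery energy.
energy_before = energy_joules(cpu_power_w, task_seconds)
energy_after = energy_joules(cpu_power_w * (1 - saving), task_seconds)

# Option 2: hand the saved power to other blocks such as the GPU.
headroom_w = freed_headroom_w(cpu_power_w, saving)
```

With these assumed numbers the task drops from 90 to about 72 joules, or equivalently about 0.3W becomes available to the rest of the system.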
In addition to delivering the highest sustained and peak performance, the Cortex-A73 is even more compelling as it delivers this performance in the smallest area for an ARMv8-A premium processor. This translates into a premium experience at mid-range costs for the increasingly important mid-range smartphone market. The Cortex-A73 is smaller than the ARMv7-A Cortex-A15; when compared to the Cortex-A57 and Cortex-A72, it offers 70% and 46% area reduction respectively, well over the benefit of the technology itself. At iso-process, the Cortex-A73 core is up to 25% smaller than the Cortex-A72. Optimal for implementation in advanced technology nodes such as 16nm and 10nm, the Cortex-A73 also scales very efficiently in mass-market nodes such as 28nm to provide significant performance uplift for mid-range devices. The reduced footprint frees silicon area for integrating more functionality or increasing the performance of the other IP in premium systems, or for decreasing SoC and device costs in mid-range systems.
With our big.LITTLE technology and CoreLink CCI, ARM provides great scalability to enable our partners to differentiate and optimize their systems. What does that mean? SoC designers can create designs with 1 or 2 big cores and 2 or 4 LITTLE cores that rival the performance and user experience of premium designs. An exclusive L2 cache can scale down to 1MB and still provide enough cache to support the big cores in real-world high-performance workloads. big.LITTLE software can adapt to all of these scalable configurations by placing work optimally based on an energy model.
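To illustrate what "placing work based on an energy model" can look like, here is a deliberately simplified sketch. It is not the actual big.LITTLE scheduling software; the capacity and energy-per-unit numbers are hypothetical and every name is invented for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CoreType:
    name: str
    capacity: int           # max work units the core can deliver (hypothetical scale)
    energy_per_unit: float  # hypothetical energy cost per work unit


def place_task(demand: int, cores: List[CoreType]) -> Optional[CoreType]:
    """Pick the core type that can meet `demand` at the lowest estimated energy."""
    candidates = [c for c in cores if c.capacity >= demand]
    if not candidates:
        return None  # no single core is fast enough for this task
    return min(candidates, key=lambda c: demand * c.energy_per_unit)


# Hypothetical energy model for a two-cluster system.
LITTLE = CoreType("LITTLE", capacity=400, energy_per_unit=1.0)
BIG = CoreType("big", capacity=1024, energy_per_unit=2.5)

light = place_task(200, [LITTLE, BIG])  # fits on LITTLE, which is cheaper
heavy = place_task(800, [LITTLE, BIG])  # only the big core has the capacity
```

The point of the model is that the same placement logic keeps working when the number or type of cores changes: only the table of capacities and energy costs needs to be updated for each configuration.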
big.LITTLE technology is widely deployed in the mobile market today. The Cortex-A73, combined with Cortex-A53, will power the next generation of premium smartphones, typically in an octa-core configuration. In addition, Cortex-A73 provides the opportunity to boost the mid-range user experience to a higher level. For example, in a hexa-core big.LITTLE configuration, a dual-core Cortex-A73 and quad-core Cortex-A53 or Cortex-A35 enable a significant performance uplift in the same or less area than an octa-core Cortex-A53, a common topology that has been very successful in entry and mid-range devices. In comparison to an octa-core Cortex-A53, the Cortex-A73 hexa-core delivers 30% more multi-core performance and twice the single-thread peak performance, resulting in a considerable improvement in user experience thanks to reduced response times for applications such as web browsing and interface scrolling.
In summary, I am proud to have worked alongside the team that has developed the most efficient high-performance processor, all in pursuit of the continuous improvement of user experience that has come to characterize mobile devices based on the ARM architecture. With the Cortex-A73 processor, you get more for less: more performance and more battery life, for less power and less area. Later this year and in 2017, our partners will integrate the Cortex-A73, bringing new functionality and new innovation into premium smartphones, tablets, clamshells, DTVs, and a wide range of consumer devices. I can’t wait to see what they will build.
Congratulations to ARM on the launch of the new ARM premium mobile suite of IP. Synopsys is pleased to stand in support of ARM and its early adopters with their successful tape-outs using Synopsys tools.