At the heart of everything we do at Arm is the need to deliver performance with efficiency. We need the performance improvements to enable and improve the range of digital immersion use-cases on mobile devices. These cover common productivity, communication, security and camera-based tasks, right through to advanced gaming, XR (augmented reality and virtual reality) and machine learning (ML) based experiences. However, we also need to be mindful of efficiency demands, such as shrinking SoC areas, new mobile form factors and demands for longer battery life.
Arm Cortex-A78 - designed for high-end performance at best efficiency
The new Arm Cortex-A78 CPU represents the very best of Arm’s drive for high-end performance at best-in-class efficiency. Cortex-A78 transforms next-generation user experiences on smartphones through double-digit improvements in sustained performance. It provides a 20 percent sustained performance improvement over the Arm Cortex-A77 CPU in the same mobile thermal power envelope¹. Higher sustained performance is important for mobile devices that have a limited capacity to dissipate power. It avoids power throttling in applications that demand a lot of performance, which improves the user experience by avoiding lag or frame-rate drops.
The major push on power efficiency translates into higher energy efficiency. At high-performance points, such as those that are the peak for current mobile devices, Cortex-A78 offers 50 percent energy savings over 2019 devices at the same performance as Cortex-A77¹. This makes Cortex-A78 the most energy-efficient premium Cortex-A CPU ever designed.
The focus on sustained performance through the Cortex-A78 supports the next wave of mobile innovation, from new device form factors coming to market to improved digital immersion experiences supported by 5G.
Power efficiency and energy efficiency combined
Today’s smartphones are the hub of everything we do as consumers, from everyday tasks, such as messaging, shopping and banking, through to entertainment, such as video streaming, gaming, and XR. As a result, we are now living in a world of digital immersion through our smartphones. These use-cases and experiences will only continue to advance in the future, but at an even grander scale.
AAA mobile gaming is one exciting use-case that is further improved by Cortex-A78, especially when combined with Arm’s Mali GPUs. In fact, the new Arm Mali-G78 GPU is helping to bring high-fidelity gaming experiences to mobile. The greater performance of the new Cortex-A78 and Mali-G78, coupled with the fast speeds and high bandwidth of 5G, will enable premium gaming experiences on mobile. Moreover, the efficiency benefits of Cortex-A78 provide longer battery life on smartphones for extended and enhanced ‘all-day-play’. Through our ecosystem work, we are further enhancing performance and building richer gaming experiences. One example is our work with Unity to bring the power of the Burst Compiler to Android, further enhancing multiprocessor performance and power management.
As the CPU sits at the heart of the compute system, it has the flexibility to run any type of ML-based workload and task. Therefore, the CPU is the first-choice processor for ML computing on mobile. Our CPUs support the most popular real-world applications and use-cases on smartphones, such as social media filters, dictation, translation, and security. This year we are making efficiency improvements alongside the performance uplift which has accelerated in recent years. Compared to Cortex-A77, Cortex-A78 uses 8 percent less power, on average, for ML-based tasks, leading to 10 percent efficiency improvements overall.
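As a rough illustration of how a power reduction relates to an efficiency gain, the sketch below assumes that efficiency means performance per watt. The post does not define the metric, so this is an assumption, and the function name is hypothetical. Under that assumption, 8 percent less power at unchanged performance gives a gain of a little under 9 percent, so a 10 percent figure would also include a small performance uplift.

```python
def perf_per_watt_gain(perf_ratio: float, power_ratio: float) -> float:
    """Fractional change in performance per watt versus a baseline of 1.0.

    Both arguments are ratios against the baseline CPU (assumed metric).
    """
    return perf_ratio / power_ratio - 1.0


# Same performance at 8 percent less power (power ratio 0.92).
same_perf_gain = perf_per_watt_gain(1.00, 0.92)
print(f"Gain at equal performance: {same_perf_gain:.1%}")

# Performance ratio that a 10 percent gain would imply at that power level.
implied_perf_ratio = 1.10 * 0.92
print(f"Implied performance ratio: {implied_perf_ratio:.3f}")
```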
Cortex-A78 has the same architecture as the previous generation. However, we have added microarchitectural features that push performance in an area- and power-efficient manner. Essentially, we are saving area and power, while maintaining the performance needed.
Microarchitecture for improved performance at lower power and area
The performance benefits are enabled through additional microarchitectural features that optimize width and depth. We have improved branch prediction bandwidth and accuracy, and added more instruction fusion cases. These microarchitecture improvements enable a 7 percent increase in single-thread performance over the Cortex-A77².
We have maximized efficiency by shrinking structures that offer low performance for their area, such as the L1-I and L1-D caches. We have then optimized existing structures to consume less power, such as the branch prediction structures. This leads to 4 percent less power for performance per mW and 5 percent less area for performance per mm² compared to Cortex-A77².
At a cluster level, Cortex-A78 keeps the focus on sustained performance at best-in-class efficiency. The DynamIQ cluster of 4x Cortex-A77 CPUs and 4x Cortex-A55 CPUs can be upgraded to 4x Cortex-A78 CPUs and 4x Cortex-A55 CPUs. This provides 20 percent sustained performance improvements in 15 percent less area³. The sustained performance push through the DynamIQ cluster makes a big difference for any applications that require several high-performance threads in parallel, such as high-fidelity gaming.
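To see what the two cluster figures mean together, the sketch below combines them into a single performance-per-area ratio. This is a back-of-the-envelope illustration rather than an Arm metric. It assumes the 20 percent and 15 percent figures can be combined directly, even though footnote ³ measures them differently (single-core performance at 1 watt versus 4+4 cluster area). The function name is hypothetical.

```python
def relative_perf_per_area(perf_gain: float, area_reduction: float) -> float:
    """Performance per unit area relative to a baseline of 1.0.

    perf_gain and area_reduction are fractions, e.g. 0.20 for 20 percent.
    """
    return (1.0 + perf_gain) / (1.0 - area_reduction)


# 20 percent more sustained performance in 15 percent less area.
density_ratio = relative_perf_per_area(0.20, 0.15)
print(f"Performance per area vs. baseline: {density_ratio:.2f}x")
```

Under these assumptions, the ratio is about 1.41, which is the kind of headroom that matters when SoC area is constrained.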
Over the past 25 years, the mobile form factor has evolved to a point where it is completely unrecognizable from the ‘brick-style’ mobile phones of the mid-90s. This is highlighted through this infographic showing how key Arm milestones have coincided with the development of new phone features and designs. The smartphone design of one large touchscreen has remained largely the same for the past decade. However, we are beginning to see form factor innovation through foldable phones and multiple, larger screens. The improvements in sustained performance are ideal for the increasing compute demands of these new smartphone designs, where SoC space could be limited. This is supported by the enhanced area efficiency of the Cortex-A78 DynamIQ cluster compared to the previous generation.
The new foldable form factor of smartphones
The 5G rollout is accelerating throughout 2020 in large urban areas worldwide. An Arm-commissioned report by Newzoo on the impact of 5G on gaming predicts that over one billion 5G-ready smartphones will be in the market by the end of 2022. 5G provides far faster speeds, far lower latency and more ubiquitous connectivity for mobile devices, especially for high-bandwidth applications. This is driving the need for higher-performing smartphones and more performant and efficient IP.
Cortex-A78 aims to get smartphones 5G-ready through the range of performance and efficiency improvements. The performance uplift of Cortex-A78 provides smartphones with the compute power needed to deliver new digital immersion experiences and deploy existing applications quicker. At the same time, the efficiency improvements allow smartphones to use the performance in a sustainable and effective manner.
We believe that Cortex-A78 represents the perfect balance of performance and efficiency. On the performance front, it provides the capabilities to enable quicker and more advanced digital immersion experiences that are coming from the 5G rollout and ML improvements. It also provides the power efficiency boost to provide sustained performance for longer battery life on smartphones, while helping to facilitate new and emerging mobile form factors.
Mobile innovation is continuing to accelerate. Through 5G, new device designs and new and improved digital immersion experiences, expect to see smartphones continue to evolve and innovate. Arm is at the center of this continuous innovation, with Cortex-A78’s drive for sustained performance getting smartphones ready for this new future.
Learn more about Cortex-A78: https://www.arm.com/products/silicon-ip-cpu/cortex-a/cortex-a78
Visit the Arm newsroom blog: https://www.arm.com/company/news/2020/05/new-arm-ip-delivers-true-digital-immersion-for-the-5g-era
¹ Comparing Arm single-core performance on Cortex-A78 to Cortex-A77 at 1W and energy consumption for 30 SPECint2006, including architectural and process improvements (compared to 2019 devices). Measured estimates on SPECint*_base2006 (SPECspeed* Integer component of SPEC CPU* 2006). Arm single-core performance estimated for mobile platform. Results are measured estimates using specific computer systems, software, components, operations, and functions, and changes to any of these factors will cause the results to vary.
² Measured estimates on SPECint*_base2006 (SPECspeed* Integer component of SPEC CPU* 2006). Arm single-core performance estimated for mobile platform. Results are measured estimates using specific computer systems, software, components, operations, and functions, and changes to any of these factors will cause the results to vary.
³ Comparing Arm single-core performance at 1 watt on Cortex-A78 and Cortex-A77 and comparing cluster area on Cortex-A78/Cortex-A55 4+4 topology to Cortex-A77/Cortex-A55 4+4 topology, including architectural and process improvements (compared to 2019 devices).