Following the great success of last year’s Arm Cortex-A76 CPU, we are excited to launch the third-generation DynamIQ ‘big’ core, the Arm Cortex-A77 CPU. It is the second in the family of high-performance, highly efficient CPU products built on the Cortex-A76 design template, continuing our path towards compute performance leadership. Our CPU roadmap from August 2018 showed unmatched year-over-year performance gains and, with 20 percent more single-thread performance than Cortex-A76, Cortex-A77 provides a significant performance boost for next-generation devices.
Cortex-A77 is built with the next generation of smartphones, laptops and other mobile devices in mind. It stands ready to take advantage of new and emerging opportunities that will fundamentally improve the user experience on devices, such as the 5G rollout beginning this year, the growth of augmented reality (AR) and a number of advanced machine learning (ML) use cases.
Cortex-A77 is built to fit within smartphone power budgets while delivering maximum performance. This means best-in-class power efficiency, so performance holds steady on sustained workloads. Compared to Cortex-A76, Cortex-A77 delivers a number of performance improvements, including 20-plus percent more integer performance, 35-plus percent more floating-point (FP) performance and 15-plus percent more memory bandwidth. This continued performance gain comes with the second generation of 7nm designs, following on from Cortex-A76.
These high-performance capabilities make Cortex-A77 ideal for advanced use cases and premium experiences, such as ML and AR. Its performance puts more compute power on the device for secure ML at the edge, with DynamIQ technology accelerating ML performance through Arm’s Project Trillium platform. The CPU is the common denominator for ML experiences from edge to cloud; in fact, the Cortex-A range of CPUs has delivered continuous ML performance improvements on both big and little cores. ML use cases on mobile devices are becoming more complex, so a CPU that can keep up with this growing compute demand is vital. On-device use cases include AI cameras, visual scene detection, 3D scanning, biometric user ID (face recognition), voice recognition, and ML in gaming and AR.
The greater performance also means better responsiveness for new apps and virtual experiences enabled by AR. This includes mobile gaming apps that are increasingly utilizing AR technology to enhance the overall gaming experience for users. You only have to read the Arm-commissioned report from Newzoo to see that AR in mobile gaming is on track for big growth in the next few years. More performant mobile devices are therefore needed to meet the increasing compute demand that will come from more advanced AR.
Cortex-A77 will support the range of 5G-ready devices set to come to market following the 5G rollout in 2019. Ericsson’s Mobility Report from November 2018 predicts that by 2024 there will be around 1.5 billion 5G-capable smartphones. For compute-intensive ML, AR and other new and emerging use cases on devices, 5G is an essential requirement. It will bring faster speeds, hyper capacity (between 5 and 20 Gbps), and new viewing experiences such as 8K resolution streaming and 360-degree video.
While Cortex-A77 provides a number of performance and efficiency improvements, it also continues a range of features introduced with Cortex-A76. Cast your mind back to 2018: a battery life of more than 20 hours was a key feature of Cortex-A76, and Cortex-A77 continues this trend. This means the same multi-day battery life as Cortex-A76, but with extra performance. Users get greater productivity on the go without needing to charge their devices every three to four hours. Longer battery life also benefits prolonged high-fidelity mobile gaming, which quickly drains the battery of mobile devices.
In addition, devices remain Always On, Always Connected. Cortex-A77 enables a range of mobility features on devices with LTE connectivity. Greater connectivity presents many opportunities for users and, as I mentioned previously, the imminent 5G rollout is set to accelerate what is possible on devices, making user experiences faster and better. Moreover, using the device’s LTE connection rather than public Wi-Fi networks provides greater security and privacy protection.
Micro-architecture upgrades across the entire design, from the front-end through the core to the back-end, enable the performance and efficiency improvements of Cortex-A77 across a range of workloads. Key features that make this possible include the Armv8.2 architecture, AArch32 and AArch64 support, 64KB L1 instruction and data caches, a 256KB or 512KB private L2 cache, and up to 4MB of shared L3 cache.
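As a rough illustration, the cache options listed above can be captured in a small data structure that rejects unsupported sizes. This is a minimal Python sketch: the class name `CortexA77Config`, its fields and its validation rules are hypothetical, written only from the sizes given in this post, and do not correspond to any Arm tool or API.

```python
from dataclasses import dataclass

KB = 1024
MB = 1024 * KB

# Sizes as listed in the text; everything else here is illustrative.
ALLOWED_L2_BYTES = (256 * KB, 512 * KB)  # private L2 per core
MAX_L3_BYTES = 4 * MB                    # shared L3, "up to 4MB"


@dataclass(frozen=True)
class CortexA77Config:
    """Hypothetical per-core cache description (not an Arm API)."""
    l1i_bytes: int = 64 * KB
    l1d_bytes: int = 64 * KB
    l2_bytes: int = 512 * KB
    l3_bytes: int = 4 * MB  # shared across the cluster, not per core

    def __post_init__(self) -> None:
        # Reject sizes outside the options described in the post.
        if self.l2_bytes not in ALLOWED_L2_BYTES:
            raise ValueError(f"L2 must be 256KB or 512KB, got {self.l2_bytes // KB}KB")
        if not 0 <= self.l3_bytes <= MAX_L3_BYTES:
            raise ValueError(f"L3 must be at most 4MB, got {self.l3_bytes // KB}KB")

    def private_bytes(self) -> int:
        """L1 instruction + L1 data + private L2 for one core."""
        return self.l1i_bytes + self.l1d_bytes + self.l2_bytes


print(CortexA77Config().private_bytes() // KB)                   # 640
print(CortexA77Config(l2_bytes=256 * KB).private_bytes() // KB)  # 384
```

The shared L3 is kept out of `private_bytes` because it serves the whole DynamIQ cluster rather than a single core.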
The front-end changes aim to push CPU performance further. Branch-prediction bandwidth is doubled, a range of next-generation improvements raise branch-prediction accuracy (reducing the number of costly branch mispredictions), and branch target buffer (BTB) capacity grows, with a 33 percent larger L2 BTB and a 4x larger L1 BTB. Meanwhile, the new Macro-op (Mop) cache is a fundamental enabler of performance. It delivers higher fetch bandwidth, lower fetch latency, and dynamic code optimizations that alter the instruction sequence so it runs more efficiently on the downstream core and back-end.
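To show why caching decoded macro-ops can lower fetch latency, here is a toy Python model of a Mop cache: on a hit, already-decoded operations are returned without going back through the decoders; on a miss, the block is decoded and then cached. The LRU replacement policy, the cycle costs and the names `ToyMopCache` and `run` are illustrative assumptions, not a description of how the Cortex-A77 Mop cache is actually organized.

```python
from collections import OrderedDict

# Hypothetical cycle costs, chosen only to illustrate the trade-off;
# they are not Cortex-A77 figures.
DECODE_PATH_CYCLES = 4  # fetch from the instruction cache, then decode
MOP_CACHE_CYCLES = 2    # fetch already-decoded macro-ops


class ToyMopCache:
    """A toy LRU cache of decoded macro-ops, keyed by fetch address."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()  # fetch address -> decoded macro-ops
        self.hits = 0
        self.misses = 0

    def fetch(self, pc: int, decode):
        """Return (macro_ops, cycles) for the instruction block at pc."""
        if pc in self.entries:
            self.entries.move_to_end(pc)  # mark as most recently used
            self.hits += 1
            return self.entries[pc], MOP_CACHE_CYCLES
        self.misses += 1
        mops = decode(pc)  # slow path: run the decoders
        self.entries[pc] = mops
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return mops, DECODE_PATH_CYCLES


def run(trace, capacity):
    """Replay a trace of fetch addresses; return (cycles, hits, misses)."""
    cache = ToyMopCache(capacity)
    fake_decode = lambda addr: (f"mop@{addr:#x}",)  # stand-in decoder
    total = sum(cache.fetch(pc, fake_decode)[1] for pc in trace)
    return total, cache.hits, cache.misses
```

For a four-block loop executed 100 times, `run([0x1000, 0x1010, 0x1020, 0x1030] * 100, capacity=8)` returns `(808, 396, 4)`, versus `(1600, 0, 400)` with a zero-capacity cache, because only the first pass pays the decode cost.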
The out-of-order core has also been improved to push microarchitecture width and depth. Dispatch bandwidth increases by 50 percent, to up to 6 instructions per cycle, and the out-of-order window grows by 25 percent, to 160 instructions.
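The percentages above also imply the Cortex-A76 starting points. The short Python calculation below simply inverts new = old × (1 + increase) using the figures quoted in this post; the helper name `baseline_from_increase` is hypothetical, and the results are derived from the stated percentages rather than quoted from a specification.

```python
def baseline_from_increase(new_value: float, increase_pct: float) -> float:
    """Invert new = old * (1 + pct / 100) to recover the old value."""
    return new_value / (1 + increase_pct / 100)


dispatch_a76 = baseline_from_increase(6, 50)  # 6 / 1.5 instructions per cycle
window_a76 = baseline_from_increase(160, 25)  # 160 / 1.25 instructions
print(dispatch_a76, window_a76)  # 4.0 128.0
```

In other words, Cortex-A77 widens dispatch from an implied 4 to 6 instructions per cycle and deepens the out-of-order window from an implied 128 to 160 instructions.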
The execution core gains a 50 percent increase in integer execution bandwidth. Integer multiply latency has also been reduced, and a second AES encryption pipeline has been added.
Finally, the memory subsystem has targeted performance enhancements. The window for in-flight loads and stores grows by up to 25 percent, which exposes more memory-level parallelism, and dedicated load-store issue bandwidth is doubled. There are also many new and improved data-prefetching features that increase performance and power efficiency: the data prefetchers can now dynamically alter their behavior based on the memory subsystem configuration and on utilization within the DynamIQ cluster.
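Why more in-flight memory operations matter can be seen with Little's law: when throughput is limited by latency, sustained bandwidth equals the number of requests in flight multiplied by the bytes per request, divided by latency. The Python sketch below uses hypothetical numbers (64-byte lines, 100 ns latency, 20 versus 25 outstanding requests) chosen only to show the proportionality; they are not Cortex-A77 measurements, and the function name is illustrative.

```python
def latency_bound_bandwidth(in_flight: int, bytes_per_request: int, latency_ns: float) -> float:
    """Little's law: throughput = (requests in flight) / latency.

    Returns GB/s, since 1 byte per nanosecond is 1e9 bytes per second.
    """
    return in_flight * bytes_per_request / latency_ns


# Hypothetical inputs: 64-byte cache lines and 100 ns average memory latency.
before = latency_bound_bandwidth(20, 64, 100.0)  # 12.8 GB/s
after = latency_bound_bandwidth(25, 64, 100.0)   # 25 percent more requests in flight
print(f"{before:.1f} GB/s -> {after:.1f} GB/s")  # 12.8 GB/s -> 16.0 GB/s
```

The 25 percent gain carries through only while the core is latency-bound; once DRAM or interconnect bandwidth becomes the limit, extra in-flight requests stop helping.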
Cortex-A77 shows that continuous innovation for more performance at great efficiency is still achievable.
It will allow OEMs and SiPs to deliver products for more intelligent devices through greater performance and best-in-class efficiency for a range of more complex workloads on mobile devices. Whether it’s prolonged mobile gaming and greater productivity ‘on-the-go’ through the extended battery life, utilizing new and more complex AR and ML workloads, or taking advantage of the opportunities from the 5G rollout, Cortex-A77 aims to significantly enhance and improve the user experience on mobile devices.
We are not slowing in our efforts to redefine mobile device performance, as we continue on the path to compute performance leadership through our Premium Cortex-A CPUs.
Read our newsroom blog about the launch of the Premium IP suite on Arm.com.
Learn more about Cortex-A77: https://www.arm.com/products/silicon-ip-cpu/cortex-a/cortex-a77