Cortex-A76: Laptop-Class Performance With Mobile Efficiency

Last year we brought you the very first CPUs based on Arm’s innovative new DynamIQ technology. By adding even greater flexibility to the already popular big.LITTLE model, DynamIQ gives partners greater differentiation and scalability to target the specific requirements of their SoCs. We demonstrated this added flexibility earlier this year when we launched our Mainstream Solutions, with proposed 1+7, 2+6 or even 4 LITTLE-core configurations. Now it’s time to revisit the high end and change the face of intelligent mobile computing as we introduce our latest premium CPU, Arm Cortex-A76.

Built on the same Armv8.2-A architecture as its predecessor, Cortex-A76 features a brand-new microarchitecture designed from day one for extreme performance and power efficiency. Cortex-A76 is also the second-generation premium core in the DynamIQ big.LITTLE combination.

Mobile convenience, laptop performance

As mobile compute becomes ever more complex with new virtual experiences and AI/ML applications, performance and efficiency are key to meeting next-generation requirements whilst still operating within the constraints of the mobile form factor. It’s not just about mobile, though: Cortex-A76 also continues Arm’s innovations in the laptop space. As our smartphones have become capable of so much more than the basic call and text functions they were originally intended for, they’ve also become more and more central to our lives, allowing us to complete tasks we never could have dreamed of on a mobile phone. The flip side of this impressive growth is that our laptops have arguably become less impressive. We can’t realistically expect to work untethered for a whole day without running into serious battery issues, and that is no longer acceptable to users who are accustomed to having serious compute permanently at their fingertips.

Addressing this need has been a key focus for us, and this is where Cortex-A76’s 35% performance uplift over the current generation makes a real-world difference to the tasks it can perform. Add the equally vital 40% improvement in power efficiency, and you can run these complex use cases for longer than ever before.
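To make the two headline figures concrete, here is a back-of-envelope sketch. It assumes (our reading, not a statement from Arm) that the 35% uplift means 1.35x throughput and the 40% efficiency improvement means 1.40x performance per watt, and it looks only at the CPU’s share of energy. The function names are hypothetical.

```python
# Back-of-envelope reading of the headline Cortex-A76 figures.
# Assumption (ours): "35% performance uplift" = 1.35x throughput and
# "40% power efficiency improvement" = 1.40x performance per watt,
# both relative to the previous generation, CPU energy only.

PERF_GAIN = 1.35
PERF_PER_WATT_GAIN = 1.40


def relative_time(perf_gain: float) -> float:
    """Time for a fixed task: t = work / perf, so t scales as 1 / perf."""
    return 1.0 / perf_gain


def relative_energy(perf_per_watt_gain: float) -> float:
    """Energy for a fixed task: E = P * t = work / (perf / P)."""
    return 1.0 / perf_per_watt_gain


if __name__ == "__main__":
    t = relative_time(PERF_GAIN)
    e = relative_energy(PERF_PER_WATT_GAIN)
    print(f"time per task:   {t:.2f}x")   # 0.74x
    print(f"energy per task: {e:.2f}x")   # 0.71x
    print(f"tasks per charge (CPU share only): {1 / e:.2f}x")  # 1.40x
```

Under these assumptions a fixed task finishes in roughly 0.74x the time and costs roughly 0.71x the CPU energy, so the same battery budget covers about 1.4x as much of that work. Display, radio and memory power are ignored here, so whole-device battery gains would typically be smaller.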

Cortex-A76 is the best fit for the laptop space because its performance uplift delivers the most important productivity apps, such as the Microsoft Office suite, with a much faster, smoother user experience. Cortex-A76-based laptops are expected to deliver twice the performance of the current Arm-based generation. Whilst it would be comparatively easy to achieve this uplift if power weren’t a concern, providing mobile-style battery longevity was key. By focusing on this delicate balance, we’ve bridged the performance gap without compromising on efficiency, enabling a responsive, always-on mobile experience on a laptop for the very first time. So you get not only a better user experience, but also much longer battery life.

On smartphones, we’ve targeted the more traditional annual innovation beat: as use cases gain complexity, our partners’ SoCs must be able to keep up, and the convergence of mobile and laptop functionality means the same priorities apply to both types of device. As you’ll have seen from Arm’s Project Trillium, we recognise that ML is key across all tiers, and Arm’s ability to support it across all the major processors of an SoC allows our partners to differentiate and optimize for the specific trade-offs and priorities of their individual markets. Vital to this always-on lifestyle is the ability to perform ML inference at the edge, avoiding the latency and security concerns of constant interaction with the cloud. Cortex-A76 achieves 4x the edge ML performance of the previous product for low-precision inference algorithms.

Given how challenging it is to improve single-threaded performance, the 35% uplift represents both a traditional innovation step and a serious gain for the form factor. Even more important, compared with standard Cortex-A75 systems, Cortex-A76 delivers 40% more performance within the same power budget. That is the element critical to significantly improving user experience, and it is why we work so hard to maximise gains without compromising on the efficiency we’re famous for.
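Arm does not say here which mechanism delivers the 4x figure for low-precision inference. To show why low precision packs more work into each vector operation, the sketch below models a 128-bit signed int8 dot-product-accumulate in the style of the Armv8.2-A SDOT instruction. SDOT is our choice of example and is not named in the text. One such operation performs 16 multiply-accumulates, compared with 4 for a 128-bit FP32 fused multiply-add. The function names are hypothetical, and this is a scalar model of the semantics, not a description of the hardware.

```python
# Scalar model of a 128-bit signed int8 dot-product-accumulate,
# in the style of the Armv8.2-A SDOT (vector) instruction (our example).
# A 128-bit register holds 16 int8 values; the accumulator holds four
# int32 lanes. Lane i adds the dot product of bytes 4*i .. 4*i+3.

LANES_32 = 4        # 128 bits / 32-bit accumulator lanes
BYTES_PER_LANE = 4  # int8 pairs reduced into each accumulator lane


def _check_int8(values):
    if len(values) != LANES_32 * BYTES_PER_LANE:
        raise ValueError("expected 16 int8 elements")
    for v in values:
        if not -128 <= v <= 127:
            raise ValueError(f"{v} is not a signed 8-bit value")


def _wrap_int32(x):
    """Wrap to signed 32 bits, matching modular register arithmetic."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def sdot_128(acc, a, b):
    """Return acc plus four 4-way int8 dot products (16 MACs in total)."""
    if len(acc) != LANES_32:
        raise ValueError("expected four int32 accumulator lanes")
    _check_int8(a)
    _check_int8(b)
    out = []
    for lane in range(LANES_32):
        base = lane * BYTES_PER_LANE
        dot = sum(a[base + j] * b[base + j] for j in range(BYTES_PER_LANE))
        out.append(_wrap_int32(acc[lane] + dot))
    return out
```

For example, `sdot_128([0, 0, 0, 0], list(range(16)), [1] * 16)` returns `[6, 22, 38, 54]`, because each accumulator lane sums only its own group of four bytes.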

Whilst the architecture is the same as in the previous generation, Cortex-A76 benefits from ground-up microarchitecture improvements, providing the foundation for a new family of performance-efficient processors. Significant gains have been achieved by removing a series of performance bottlenecks and by optimizing microarchitectural area and power, resetting the design and providing a huge step up in performance for both mobile and laptop devices.

Cortex-A76 microarchitecture: Big changes, big gains 

Several major microarchitectural improvements in Cortex-A76 increase performance, either through instructions-per-cycle (IPC) uplift or through deeper memory-level parallelism.

Some of the key enhancements include: 

  • Decoupled branch prediction and instruction fetch: Built to hide latency at high bandwidth, the in-order Cortex-A76 front end can fetch 4 to 8 instructions per cycle, using multi-level branch target caches and a hybrid indirect branch predictor to sustain maximum throughput.
  • A wider machine: Cortex-A76 is Arm’s first 4-wide decode core, increasing the maximum instructions-per-cycle capability. Up to 8 operations per cycle can then be dispatched to the out-of-order core, which supports a wider, area- and power-optimized instruction window.
  • More integer and vector execution throughput: The core integrates quad-issue integer execution, with three simple ALUs and one multi-cycle integer unit. Cortex-A76 also supports dual-issue native 16B (128-bit) vector and floating-point units, twice the throughput of any previous Arm CPU. Vitally, these units help deliver the 4x ML performance improvement mentioned earlier.
  • Enhanced memory system: The full cache hierarchy is co-optimized for latency and bandwidth, with a sophisticated fourth-generation prefetcher and deep memory-level parallelism.
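Peak vector throughput figures like the one for the dual-issue 128-bit units come from simple lane arithmetic: pipes × (vector width ÷ element width). The sketch below is idealized. It assumes every pipe starts one full-width operation each cycle, which real code does not sustain, and the function names are hypothetical.

```python
# Idealized peak throughput from SIMD width and pipe count.
# Assumption (ours): each vector pipe starts one full-width operation
# every cycle; sustained throughput on real code is lower.


def lanes_per_vector(vector_bits: int, element_bits: int) -> int:
    """Number of elements that fit in one vector register."""
    if element_bits <= 0 or vector_bits % element_bits:
        raise ValueError("element width must evenly divide the vector width")
    return vector_bits // element_bits


def peak_element_ops_per_cycle(pipes: int, vector_bits: int, element_bits: int) -> int:
    """Peak element operations per cycle = pipes x lanes per vector."""
    return pipes * lanes_per_vector(vector_bits, element_bits)


if __name__ == "__main__":
    # Dual-issue 128-bit units, as described for Cortex-A76.
    for name, bits in [("FP64", 64), ("FP32", 32), ("FP16", 16), ("INT8", 8)]:
        ops = peak_element_ops_per_cycle(2, 128, bits)
        print(f"{name}: {ops} lanes/cycle")
    # Hypothetical single 128-bit pipe, for comparison.
    print("FP32, one pipe:", peak_element_ops_per_cycle(1, 128, 32))
```

With two 128-bit pipes this gives 8 FP32 or 32 int8 lanes per cycle, twice the 4 FP32 lanes of a hypothetical single 128-bit pipe.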

Designing an SoC includes being able to implement it quickly to your Performance-Power-Area (PPA) targets. Alongside the new suite of system IP, Arm offers POP technology that supports Cortex-A76 and its LITTLE companion core, Cortex-A55, on the process technologies that matter most to our customers. The Cortex-A76 POP IP for TSMC 16FFC delivers the fastest performance in one of the most cost-effective process technologies available, bringing a great user experience to mass-market devices. For customers looking for leading-edge process technologies and targeting premium and high-end applications, the Cortex-A76 and Cortex-A55 POP IPs for TSMC 7FF will also be available by Q4 2018. In addition to helping meet PPA targets, Arm POP IP accelerates the implementation cycle, reducing time-to-market so that partners can take advantage of the flexibility of DynamIQ big.LITTLE.

With these improvements in the underlying microarchitecture, and the ever-advancing ability to integrate seamlessly across the Cortex and Mali product families, Cortex-A76 represents a significant step forward for mobile computing, whether your device of choice is a smartphone or a laptop.
