ARM Cortex-A75: ground-breaking performance for intelligent solutions

It seems like only yesterday that we announced the ARM Cortex-A73, the most power-efficient high-performance Cortex application processor to date. Today Cortex-A73 is in production, powering cutting-edge devices in the mobile and consumer world. Silicon vendors have combined it with the Cortex-A53 in big.LITTLE configurations. The combination of Cortex-A73 and Cortex-A53 delivers excellent performance with great battery life, so you can use your devices all day. This lets designers and OEMs create products in the slimmest form factors available on the market today.

This blend of efficiency and performance, together with the work of OS and app developers in the mobile ecosystem, has brought great new use cases: immersing yourself in augmented or virtual reality with your phone, shooting DSLR-like photos, or turning your smartphone into a fully-fledged desktop computer with a compact docking station. These and other performance-intensive use cases have created strong and growing demand for more compute performance. In response, mobile SoC performance has skyrocketed in recent years, and we don’t see it stopping or slowing down.

Today we are announcing two new processors, the Cortex-A75 high-performance processor and the Cortex-A55 high-efficiency processor, to deliver the performance upgrade your devices need:

Introducing the new Cortex-A75 and Cortex-A55 processors

 

Both Cortex-A75 and Cortex-A55 are built for DynamIQ technology, ARM's new multi-core technology that we announced in March 2017. Cortex-A75 brings a brand-new microarchitecture that continues the rapid upward trend in processor performance while retaining the critical power efficiency of its highly successful predecessor. Beyond extending performance, Cortex-A75 also expands the CPU's capabilities to handle the advanced workloads that are transforming applications and businesses in exciting ways.

AI – the technical innovation that will transform businesses and industries

Artificial intelligence (AI) and machine learning (ML) are coming to your device, on the ‘edge’, in addition to being a key capability delivered from the data center or the ‘cloud’. This is one of the strongest new trends we see emerging across every kind of edge device. From your connected thermostat to your autonomous car, your mobile handset and your wearable tech, ML algorithms are starting to enable machines to help us live better lives.

There are many ways developers can handle the rise of ML workloads on devices. Modern SoCs contain several processing units: a CPU, a GPU (such as the newly introduced Mali-G72), and sometimes a DSP and dedicated acceleration units, all of which can help speed up convolutional neural networks (CNNs), recurrent neural networks (RNNs) and other ML workloads. However, software developers and silicon vendors face several challenges:

  1. Additional hardware on the silicon is expensive: every additional component on the die costs the silicon vendor a lot of money, so die area must be spent very, and I mean very, carefully. Premium devices may include specialized accelerators, but mainstream mobile devices, which drive a significant share of worldwide volume, do not usually include specialized hardware for ML. Yet app developers want to deploy ML capabilities that work across all deployed devices in each generation.

  2. Shifting tasks around the system is challenging for software developers, because it takes time and can be costly in terms of performance. A fixed task, such as graphics running on a GPU, has a fixed mapping and drivers that are locally optimized for efficient graphics processing. The same is true for an accelerator or DSP that handles fixed computational functions, which can be written once and deployed in the edge-device firmware, where they often operate more efficiently than they would on a CPU. Yet there is a grey zone of workloads where the CPU's accessibility to developers makes it the simplest choice for deploying ML capabilities across the range of mobile edge devices.

  3. New workloads and their processing requirements are still evolving, so fixed-function dedicated hardware accelerators may not address the newest algorithms. It makes sense, in that case, to have general CPU capacity to augment the optimized acceleration blocks found on premium devices.

These characteristics have led us to conclude that heterogeneous distribution of workloads is the right approach; no one-size-fits-all solution addresses the challenges outlined above. Combining general-purpose processing, dedicated accelerators and GPU compute technology can allow SoCs to reach the highest system efficiency. This enables scalability from premium designs that feature multiple compute units down to cost-constrained, lower-end devices that drop some blocks but still benefit from some heterogeneity.

Software is critical to enabling ML. You might have seen the announcement of the free, open-source ARM Compute Library, which boosts the performance of AI and ML workloads by 10x-15x on the CPU alone. This is great news for all existing devices out there: ARM-based SoCs can make immediate use of this new library. It is a great example of how, with proper tuning, better software can get more performance out of existing hardware. ARM has been innovating on both software and hardware, so let’s now dive into the details of our latest hardware improvements for ML and general compute requirements.

DynamIQ – technology for more scalability, enabling new performance levels for broad markets

We recently announced ARM DynamIQ processor technology, which enables new levels of performance, efficiency, scalability and responsiveness. It is a new CPU cluster architecture and memory hierarchy that brings a new hardware design paradigm for wider scalability. It also delivers new features that, through a combination of software and hardware, can increase ML algorithm performance for AI by 50x over the next 3-5 years.

Increased performance for AI workloads: a key focus for current and future ARM IP

DynamIQ also represents one of the biggest steps since the introduction of the multi-processor design that brought dual- and quad-core processors to the mobile industry. With DynamIQ, a single cluster can now contain up to 8 processors, potentially with different physical design characteristics (power, frequency, area) and independent voltage and power rails for individual CPUs or groups of cores. This flexibility and scalability allow silicon vendors to target a wide range of markets, including smartphones, increasingly autonomous cars, servers and network infrastructure, home automation, smarter DTVs and more. My colleague Govind Wathan has a great write-up with more details on DynamIQ.

The Cortex-A75 processor: The first high-performance processor based on DynamIQ, delivering ground-breaking performance and efficiency

I am proud to introduce the new Cortex-A75 processor: ARM's latest and highest-performance application CPU, and the first high-performance CPU based on the new DynamIQ technology. Cortex-A75 improves performance by 20% over Cortex-A73 at the same frequency. This additional compute capability, combined with the significant improvements we have made for ML and other advanced use cases, will let demanding applications run more smoothly and provide a new baseline for developing even more complex workloads.

Cortex-A75 delivers the new levels of performance for mobile and infrastructure systems

Cortex-A75 will bring more compelling applications and user experiences to its target markets, making it a great follow-on to Cortex-A73. It targets a broad selection of markets from the edge to the cloud: beyond mobile phones and laptop/clamshell devices, it enables new performance in network infrastructure, automotive designs and potentially even servers. The efficiency of Cortex-A75 is still best in class: we took many of the insights gathered while building Cortex-A73 and applied them to Cortex-A75.

Some of the key microarchitecture enhancements in Cortex-A75 include:

  • Superscalar processor core that decodes, issues and executes more instructions per cycle than our previous generations, with full out-of-order processing, non-blocking high-throughput L1 caches, and advanced instruction and data prefetching.
  • Private L2 caches close to the processing cores. Configurable in size, these private L2s shorten latencies to memory and keep workloads closer to the cores for faster processing and lower power consumption.
  • Unified shared L3 cache in the DynamIQ Shared Unit (DSU) that can be used across all processors in the cluster, including the Cortex-A75 and Cortex-A55. 

ARM partners can use Cortex-A75 either standalone, with up to 4 high-performance processors, or in a big.LITTLE combination with the Cortex-A55 processor, with up to 8 processors in total. The final system configuration is chosen by the integrator (usually the silicon supplier) and depends on the tradeoffs between performance and cost.
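To make the configuration rules above concrete, here is a minimal Python sketch of a checker for a single DynamIQ cluster. It encodes only the limits stated in this post (up to 8 cores per cluster, up to 4 Cortex-A75 cores) and assumes that the 4-core Cortex-A75 limit also holds in big.LITTLE configurations. `ClusterConfig` and `validate` are hypothetical names for illustration, not an ARM tool or API.

```python
from dataclasses import dataclass

# Limits as described in this post; treated here as illustrative assumptions,
# not an authoritative list of every licensable configuration.
MAX_CORES_PER_CLUSTER = 8  # DynamIQ: up to 8 processors in a single cluster
MAX_BIG_CORES = 4          # Cortex-A75: up to 4 cores (assumed to apply in big.LITTLE too)


@dataclass(frozen=True)
class ClusterConfig:
    """Hypothetical description of one DynamIQ cluster: big + LITTLE core counts."""
    big_a75: int
    little_a55: int

    def label(self) -> str:
        # Conventional "big+LITTLE" notation, e.g. "1+7".
        return f"{self.big_a75}+{self.little_a55}"


def validate(cfg: ClusterConfig) -> list[str]:
    """Return a list of rule violations; an empty list means the config passes."""
    errors = []
    if cfg.big_a75 < 0 or cfg.little_a55 < 0:
        errors.append("core counts must be non-negative")
    total = cfg.big_a75 + cfg.little_a55
    if total == 0:
        errors.append("cluster must contain at least one core")
    if total > MAX_CORES_PER_CLUSTER:
        errors.append(f"total of {total} cores exceeds the {MAX_CORES_PER_CLUSTER}-core cluster limit")
    if cfg.big_a75 > MAX_BIG_CORES:
        errors.append(f"Cortex-A75 count {cfg.big_a75} exceeds the limit of {MAX_BIG_CORES}")
    return errors


if __name__ == "__main__":
    # Standalone 4-core, 4+4 and 1+7 should pass; 5+0 and 2+7 should not.
    for cfg in [ClusterConfig(4, 0), ClusterConfig(4, 4), ClusterConfig(1, 7),
                ClusterConfig(5, 0), ClusterConfig(2, 7)]:
        problems = validate(cfg)
        print(cfg.label(), "OK" if not problems else "; ".join(problems))
```

Keeping the limits as named constants makes it easy to adjust them if a given integrator's configuration options differ from these assumptions.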

Ground-breaking performance delivered at uncompromised efficiency

Cortex-A75 provides a significant boost in single-thread performance that will benefit all markets. With over 20% more integer core performance than last year's CPU at the same clock frequency, Cortex-A75 gives a new generation of devices a substantial uplift. At its expected top frequencies of up to 3GHz, the performance advantage over other devices grows even further, as illustrated below.
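As a rough, first-order illustration of why higher clocks widen the gap: if single-thread performance is approximated as per-clock performance multiplied by frequency (an idealization that ignores memory-bound effects), and assuming, purely hypothetically, a Cortex-A73 device clocked at 2.4GHz compared with a Cortex-A75 device at 3GHz, the two gains compound:

```latex
\frac{\mathrm{Perf}_{\mathrm{A75}}}{\mathrm{Perf}_{\mathrm{A73}}}
  \approx \frac{\mathrm{IPC}_{\mathrm{A75}}}{\mathrm{IPC}_{\mathrm{A73}}}
          \times \frac{f_{\mathrm{A75}}}{f_{\mathrm{A73}}}
  = 1.2 \times \frac{3.0\,\mathrm{GHz}}{2.4\,\mathrm{GHz}}
  = 1.2 \times 1.25 = 1.5
```

Under these assumptions, the roughly 20% per-clock gain combines with a 25% clock increase to give about 50%; real workloads, especially memory-bound ones, generally scale less than linearly with frequency.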

On other metrics, such as floating point, NEON SIMD processing and memory performance, Cortex-A75 provides even greater improvements, with some, such as the Octane benchmark, approaching 50%. On memory copy, Cortex-A75 also delivers 15% more memory throughput than Cortex-A73. Memory performance matters because memory copies are used extensively in operating systems and applications.

Cortex-A75 delivers significant performance enhancements across a broad range of workloads

DynamIQ big.LITTLE – Cortex-A75 and Cortex-A55 combined

Cortex-A75 delivers great performance at market-leading efficiency. However, many applications don’t require the performance of a high-performance processor, and even in high-performance applications, peak CPU performance is sometimes required only about 10% of the time. This is a great fit for big.LITTLE technology, which can save power in the high hundreds of milliwatts, extending battery life and letting the big cores go even faster because the LITTLE cores handle the low-level work. Cortex-A55, ARM's most efficient LITTLE processor to date, provides exactly that. A follow-on to the successful Cortex-A53, Cortex-A55 is the ideal LITTLE companion to Cortex-A75. DynamIQ enables area-efficient combinations such as 1+7 (one big core plus seven LITTLE cores), which will provide a great upgrade path for mid-range devices:

DynamIQ big.LITTLE enables new performance points for mid-range devices
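A simplified way to see where the big.LITTLE savings come from: let d be the fraction of time peak big-core performance is needed (about 10%, as noted above), P_big,light the power a big core draws while running the remaining light work, and P_LITTLE the power a LITTLE core draws running the same light work. Assuming the LITTLE core can absorb all of that light work, and ignoring migration and idle overheads, average power compares as follows:

```latex
\begin{aligned}
P_{\text{big only}}   &= d\,P_{\text{big,peak}} + (1-d)\,P_{\text{big,light}} \\
P_{\text{big.LITTLE}} &= d\,P_{\text{big,peak}} + (1-d)\,P_{\text{LITTLE}} \\
\Delta P &= P_{\text{big only}} - P_{\text{big.LITTLE}}
          = (1-d)\left(P_{\text{big,light}} - P_{\text{LITTLE}}\right) \\
         &\overset{d = 0.1}{=} 0.9\left(P_{\text{big,light}} - P_{\text{LITTLE}}\right)
\end{aligned}
```

Because the light work dominates in this simplified model, the savings depend almost entirely on how much less power the LITTLE core draws for that work.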

Make sure to also check out Govind’s blogs on Cortex-A55 and DynamIQ big.LITTLE to get more details.

Cortex-A75 from the edge to the cloud, and nearly everywhere in between

Cortex-A75 is widely applicable across markets. Many of the features built into the processor and the DynamIQ cluster extend far beyond mobile and consumer use cases. For example, we expect Cortex-A75 to be used in demanding networking and server applications. With 40% more infrastructure performance than Cortex-A72-based systems, Cortex-A75 will bring a significant boost in infrastructure system performance:

Increased infrastructure performance using the new Cortex-A75 and CMN-600

Features such as cache stashing, atomic transactions between agents, cache way allocation and prioritization, and advanced RAS (reliability, availability and serviceability) capabilities make Cortex-A75 an ideal fit for large-scale systems, where its efficiency leads to higher compute density.

What about the system-on-chip (SoC) designs the new DynamIQ processors will be used in?

Designing an SoC includes implementing it quickly to meet your Performance-Power-Area (PPA) targets. Along with the new suite of system IP, ARM offers POP technology that supports Cortex-A75 and Cortex-A55 on the process technologies that matter most to our customers. The Cortex-A75 POP IP for TSMC 16FFC offers the fastest performance in one of the most cost-effective process technologies available. For customers looking for leading-edge process technologies, Cortex-A75 and Cortex-A55 POP IP for TSMC 7FF will also be available by Q4 2017. In addition to helping meet PPA targets, ARM POP IP can help customers accelerate the implementation cycle and take advantage of the flexibility of DynamIQ big.LITTLE. The Cortex-A75 and Cortex-A55 POP IP offers the most common configurations for SoC designs focused on applications from the edge to the cloud.

ARM also has a long-standing investment in validating our IP in example SoC designs. As the ARM IP portfolio has grown, so have the complexity and scope of these example systems. This work covers everything from SoC architecture to detailed pre-silicon analysis, and ARM is delivering this knowledge as System Guidance. Alongside the new CPUs comes a range of new System Guidance deliverables covering both mobile and infrastructure systems. CoreLink SGM-775 System Guidance for Mobile has been designed and optimized with Cortex-A75, Cortex-A55 and Mali-G72. CoreLink SGI-775 System Guidance for Infrastructure describes the types of infrastructure SoC that can be built using the new ARM IP. Both deliverables come with documentation, models and software, and are available free to ARM partners.

When will these processors arrive in the market?

We are very excited about Cortex-A75, Cortex-A55 and the capabilities of DynamIQ. I believe the new flexibility, combined with the performance of the new CPUs, is going to meaningfully increase the capabilities of the devices we all rely on. It is going to be great to see the new waves of devices that will bring richer experiences and differentiation to the marketplace.

With more than 10 licensees across both CPUs and DynamIQ, we should see some cool new devices soon. I expect initial devices sometime in early 2018 and can’t wait to get my hands on one.

Related content

Accelerating AI experiences from edge to cloud

ARM Cortex-A55: Efficient performance from edge to cloud

Mali-G72: Enabling tomorrow's technology today

How to start developing software for ARM Cortex-A55 and Cortex-A75 processors now
