Accelerating AI experiences from edge to cloud

May 29, 2017


- First processors based on Arm DynamIQ technology take a big step towards boosting AI performance by more than 50x over the next 3-5 years
- Arm Cortex-A75 delivers massive single-thread compute uplift for premium performance points
- Arm Cortex-A55 is the world’s most versatile high-efficiency processor
- Arm Mali-G72 GPU expands VR, gaming and Machine Learning capabilities on premium mobile devices with 40 percent more performance
- New supporting IP includes Arm Compute Library, comprehensive suite of developer tools and POP IP

Artificial intelligence (AI) is already simplifying and transforming many of our lives, and it seems that every day I read about or see proofs of concept for potentially life-saving AI innovations. However, replicating the learning and decision-making functions of the human brain starts with algorithms that often require cloud-intensive compute power. Unfortunately, a cloud-centric approach is not an optimal long-term solution if we want to make the life-changing potential of AI ubiquitous and bring it closer to the user for real-time inference and greater privacy. In fact, survey data we will share in the coming weeks shows that 85 percent of global consumers are concerned about securing AI technology, a key indicator that more personal data needs to be processed and stored on edge devices to instill greater confidence in AI privacy.

Enabling secure and ubiquitous AI is a fundamental guiding design principle for Arm, considering our technologies currently reach 70 percent of the global population. As such, Arm has a responsibility to rearchitect the compute experience for AI and other human-like compute experiences. To do this, we need to enable faster, more efficient and more secure distributed intelligence between computing at the edge of the network and the cloud.

Arm DynamIQ technology, which we first previewed back in March, was the first milestone on the path to distributing intelligence from chip to cloud. Today we hit another key milestone, launching our first products based on DynamIQ technology, the Arm Cortex-A75 and Cortex-A55 processors. Both processors include:

- Dedicated instructions for AI performance tasks via DynamIQ technology, setting Arm on a trajectory to deliver 50x AI performance increases over the next 3-5 years
- Increased multicore functionality and flexibility in a single compute cluster with DynamIQ big.LITTLE
- The secure foundation for billions of devices, Arm TrustZone technology, to fortify the SoC in edge devices
- Increased functional safety capabilities for ADAS and autonomous driving
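
To put the 50x-over-3-5-years trajectory in perspective, here is a rough back-of-the-envelope sketch of the yearly improvement it would imply. It assumes the gain compounds at a constant annual rate, which is a simplification made for illustration and not something the announcement states; the function name is hypothetical.

```python
def implied_annual_factor(total_gain: float, years: float) -> float:
    """Constant yearly multiplier that compounds to total_gain over years.

    Assumes steady compounding, which is an illustrative simplification.
    """
    if total_gain <= 0 or years <= 0:
        raise ValueError("total_gain and years must be positive")
    return total_gain ** (1.0 / years)


# The 50x target and the 3-5 year window come from the announcement;
# the constant-rate assumption does not.
for years in (3, 4, 5):
    factor = implied_annual_factor(50, years)
    print(f"{years} years: about {factor:.2f}x per year")
```

Under that assumption, hitting 50x in three years needs roughly a 3.7x gain every year, while spreading it over five years needs roughly 2.2x per year.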

To further optimize SoCs for distributed intelligence and device-based Machine Learning (ML), we are also launching the latest premium version of the world’s No. 1 shipping GPU, the Mali-G72. The new Mali-G72 graphics processor, based on the Bifrost architecture, is designed for the new and demanding use cases of ML on device, as well as High Fidelity mobile gaming and mobile VR.

Cortex-A75: Breakthrough single-threaded performance

I have been at Arm for over a dozen years and can't remember being this excited about a product delivering such a boost to single-threaded performance without compromising our efficiency leadership. The Cortex-A75 delivers a massive 50 percent uplift in performance and greater multicore capabilities, enabling our partners to address multiple high-performance use cases including laptops, networking and servers, all within a smartphone power profile. Additional performance data and a deep dive on technical features can be found in this blog from Stefan Rosinger.

Cortex-A55: The new industry leader in high-efficiency processing

SoCs based on Cortex-A53 came to market in 2013 and since then Arm partners have shipped a staggering 1.5 billion units, and that volume is continuing to grow rapidly. That’s an extremely high bar for any follow-on product to surpass. Yet, the Cortex-A55 is not your typical follow-on product. With dedicated AI instructions and up to 2.5x the performance-per-milliwatt efficiency relative to today's Cortex-A53 based devices, the Cortex-A55 is the world’s most versatile high-efficiency processor. For more performance data and technical details, visit this blog from Govind Wathan.

Flexible big.LITTLE performance for more everyday devices

When distributing intelligence from the edge to the cloud, there is a diverse spectrum of compute needs to consider. DynamIQ big.LITTLE provides more multicore flexibility across more tiers of performance and user experiences by enabling configuration of big and LITTLE processors on a single compute cluster for the first time.

Figure: New big.LITTLE performance levels - Arm DynamIQ

The flexibility of DynamIQ big.LITTLE is at the heart of the system-level approach that distributed intelligence requires. Flexible CPU clusters, GPU compute technology, dedicated accelerators, and the new Arm Compute Library work together to efficiently enhance and scale AI performance. The free, open-source Arm Compute Library is a collection of low-level software functions optimized for Cortex CPU and Mali GPU architectures. This is just the latest example of Arm’s commitment to investing more in software to get the most performance out of hardware without compromising efficiency. On the CPU alone, Arm Compute Library can boost performance of AI and ML workloads by 10x-15x on both new and existing Arm-based SoCs.

Mali-G72: Optimized for next-generation real-world content

Our system-level approach enables innovation across multiple blocks of compute IP, including the GPU. The Mali-G72 GPU builds on the success of its predecessor, the Mali-G71. Bifrost architecture enhancements boost the Mali-G72's performance by up to 40 percent, enabling our partners to advance the mobile VR experience and push High Fidelity mobile gaming into the next realm. We have also designed the Mali-G72 to provide the most efficient and performant ML, thanks to arithmetic optimizations and increased caches that reduce bandwidth for a 17 percent ML efficiency gain.

Figure: Mali-G72 efficiency gain

With 25 percent higher energy efficiency, 20 percent better performance density, and the new ML optimizations, Arm can distribute intelligence more efficiently across the SoC. To read additional technical details on the Mali-G72, visit this blog by Freddi Jeffries.

Distributed intelligence starts here

Today we’ve announced the next generation of CPU and GPU IP engines designed to power the most advanced compute. The image below represents the most optimized Arm-based SoC for your edge device: a full suite of compute, media, display, security and system IP designed and validated together to deliver the highest-performing and most efficient mobile compute experience. This suite of IP is supported by the new System Guidance for Mobile (SGM-775), which includes everything from SoC architecture to detailed pre-silicon analysis documentation, models and software, and is available for free to Arm partners. For accelerated time-to-market and optimized implementations that ensure the highest performance and efficiency, Arm POP IP is available for the Cortex-A75.

Figure: Arm DynamIQ - distributed intelligence

Leading software ecosystem, from edge to cloud

Software is central to future highly efficient and secure distributed intelligence. The Arm ecosystem is uniquely positioned to deliver the breadth of disruptive software innovation required to kickstart the AI revolution. To further support our latest CPU and GPU IP, we are also releasing Arm's complete software development environment. Our ecosystem now has the opportunity to develop software optimized for DynamIQ ahead of hardware availability through a combination of Arm virtual prototypes and DS-5 Development Studio.

As Arm prepares to work with its partners to ship the next 100 billion Arm-based chips by 2021, we are more agile than ever in enabling our ecosystem to guide the transformation from a physical computing world into a more natural computing world that’s always on, intuitive and, of course, intelligent. Today’s launch puts us one step closer to our vision of Total Computing and transforming intelligent solutions everywhere compute happens.

Top Comments

Carl Williamson over 6 years ago

We've updated this blog to include a video where Nandan Nayampally, VP & GM of ARM Compute Product Group, introduces the next generation of Arm CPUs Cortex-75 and Cortex-55, new cores for new artificial intelligence experiences everywhere.
