Arm Cortex-R82: Combining high-performance 64-bit real-time and applications processing for the next generation of storage devices

September 3, 2020

Real-time embedded systems, such as Solid-State Drives (SSDs), have relied heavily on the proven 32-bit Arm Cortex-R5 and Cortex-R8 processors across generations of products. These systems have historically required less than 4GB of DRAM and addressable space and have not needed to run Linux. As storage capacities grow and performance requirements rise to saturate the increasing throughput of storage host interfaces, the 4GB limit and the inability to run Linux are adding complexity and, in some cases, becoming barriers.

There is a need for higher performance, real-time compute with more addressable space and the ability to run Linux to enable the next generation of computational storage devices.

Introducing Arm’s first 64-bit high-performance real-time and applications processor

The Cortex-R82 processor is optimized for systems where high-performance real-time is required. Designed to easily handle demanding workloads and provide more addressable space, it enables new capabilities of future storage devices, bringing:

Faster response time and reduced latency

The Cortex-R82 processor maintains its heritage as a classic Cortex-R real-time processor with:

Deterministic, lower latency response times
A wide range of ports for peripherals and memories, including Tightly Coupled Memories (TCMs), caches, and low latency ports

Increased capacity

The Cortex-R82 processor is a high-performance 64-bit real-time processor capable of addressing up to 1TB of address space, meeting the requirements of growing capacities and emerging memory technologies.

More flexibility

The Cortex-R82 processor enables higher-performance storage devices and, with Linux support, paves the way for simplified computational storage architectures and flexible SoC designs that can reallocate compute resources dynamically based on changing workloads or different products. Cortex-R82 leverages the Linux ecosystem that has been ported, optimized, and validated on Arm, a development effort accelerated through the Linaro partnership that started in 2010. Linux, or any other high-level operating system that works on Arm Cortex-A series processors today, will work seamlessly on Cortex-R82.

Applications

The Cortex-R82 processor is able to run high-level operating systems, such as Linux, and other application code by including an optional Memory Management Unit (MMU).

Machine learning

The Cortex-R82 processor optionally supports Arm Neon technology for accelerating ML workloads that will be at the heart of computational storage applications.

Cortex-R82 is optimized for high-performance real time and high-level applications

Cortex-R82 is the first Armv8-R 64-bit processor that retains classic Cortex-R real-time compute but provides the higher compute performance needed to run new workloads such as machine learning (ML). It is also Arm’s first Cortex-R processor to support a trusted and robust ecosystem of rich operating systems and software components that already exist in the Linux and cloud development ecosystem. 

The Cortex-R82 processor represents a significant uplift compared to Cortex-R8 and Cortex-R5, implementing a whole range of new features and enhancements. Let’s review a few key features in detail.

First Arm processor that can combine MPU and optional MMU for real-time and Linux

Cortex-R82 is the first Arm processor that combines both real-time contexts and MMU-based contexts in a single core.

As with traditional Cortex-R processors, a Cortex-R82 core can be configured with a Memory Protection Unit (MPU) to run bare-metal code or an RTOS. That same core can also be configured with an optional MMU to allow a high-level operating system, like Linux, to execute. Both the real-time and MMU contexts can be handled by the same core simultaneously, or selected cores in a cluster can be dedicated to real-time or Linux, which increases the flexibility of an SoC design to accommodate multiple products and markets. This choice is made in software and can be changed at run time, so the balance can be adjusted as demand changes.

Cortex-R82 has three Exception levels (ELs). EL2, the highest level, enables a secure enclave and the separation and isolation of virtual machines for OEM code and customer code. More specifically, an MPU (real-time) context running at EL2 handles switches between the MPU and MMU contexts at EL1, which run OEM and/or OS code, while user code runs at EL0. Linux can be running when a real-time event occurs; the processor switches to handle the event, then switches back to Linux. This isolation protects the main firmware and lets end customers of Cortex-R82-based devices add custom software, either real-time or Linux-based.
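The switching behavior described above can be pictured with a toy model. The following is only an illustrative sketch, not Arm software: the function name `run_timeline`, the tick-based timing, and the rule that pending real-time work always preempts Linux are assumptions made for this example, and switch latency is ignored.

```python
def run_timeline(ticks, rt_events):
    """Return which EL1 context runs on each tick (toy model).

    rt_events maps a tick number to the number of ticks of real-time
    work arriving at that tick. In this model, pending real-time work
    always preempts Linux, and Linux resumes once the work is done.
    """
    pending = 0
    timeline = []
    for tick in range(ticks):
        pending += rt_events.get(tick, 0)
        if pending > 0:
            timeline.append("RT")      # real-time (MPU) context runs
            pending -= 1
        else:
            timeline.append("Linux")   # MMU context runs
    return timeline


# Two ticks of real-time work arrive at tick 2 while Linux is running.
print(run_timeline(6, {2: 2}))
# ['Linux', 'Linux', 'RT', 'RT', 'Linux', 'Linux']
```

In a real system the decision would be driven by interrupts and the software running at EL2 rather than by a polling loop; the model only shows the ordering of events.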

64-bit processor with 40-bit addressing to access up to 1TB of address space

Cortex-R82 is the first 64-bit real-time-capable Arm processor with 40 address bits, which allow the processor to directly address up to 1TB. Direct addressability enables real-time systems with very large memories or device spaces, and improves performance over windowing solutions. This large address space can be accessed over either AXI or CHI to enable additional capabilities including atomics and cache stashing.
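The arithmetic behind the 4GB and 1TB figures is easy to check. A minimal sketch, assuming binary units (1GB = 2^30 bytes, 1TB = 2^40 bytes) and byte addressing; the helper name `addressable_bytes` is made up for this example:

```python
def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses reachable with the given address width."""
    return 1 << address_bits


GIB = 1 << 30  # binary gigabyte
TIB = 1 << 40  # binary terabyte

print(addressable_bytes(32) // GIB)                    # 4   (GB with 32 address bits)
print(addressable_bytes(40) // TIB)                    # 1   (TB with 40 address bits)
print(addressable_bytes(40) // addressable_bytes(32))  # 256 (times more address space)
```

Each extra address bit doubles the reachable space, so the eight additional bits give 256 times the address space of a 32-bit processor.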

Major performance uplift over Cortex-R8 on standard benchmarks and 2x on real partner code

The Cortex-R82 processor provides a performance uplift over Cortex-R8 on standard benchmarks and an even higher uplift on actual partner code, which shows a 74% to 125% performance uplift compared with Cortex-R8. The Cortex-R82 processor also provides a 21% performance uplift over Cortex-A55 when running SPECINT2006 benchmarks. This uplift satisfies the most demanding real-time embedded workloads and is enough to run full Linux distributions.
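To relate the percentage figures to the "2x" in the heading, a percentage uplift can be converted to a speedup factor. A small sketch, assuming "uplift" means the relative increase in performance over the baseline; the helper name `uplift_to_speedup` is hypothetical:

```python
def uplift_to_speedup(uplift_percent: float) -> float:
    """Convert a percentage uplift into a speedup factor (100% uplift = 2x)."""
    return 1.0 + uplift_percent / 100.0


figures = [
    ("partner code, low end", 74),
    ("partner code, high end", 125),
    ("SPECINT2006 vs Cortex-A55", 21),
]
for label, pct in figures:
    print(f"{label}: {uplift_to_speedup(pct):.2f}x")
# partner code, low end: 1.74x
# partner code, high end: 2.25x
# SPECINT2006 vs Cortex-A55: 1.21x
```

The partner-code range therefore spans roughly 1.7x to 2.3x the Cortex-R8 result.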

Neon for ML

The Cortex-R82 processor optionally includes the latest Neon instructions to greatly accelerate machine learning (ML) workloads with capabilities such as Dot Product support. This is especially useful for computational storage where the Arm Compute Library and Arm NN library can be accelerated by Neon, for example to search for a specific image in a drive full of images.
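To make the Dot Product capability concrete, here is a plain-Python model of a four-way 8-bit dot-product-accumulate of the kind the Neon Dot Product instructions perform on 128-bit vectors. It is a sketch of the arithmetic only, not Neon code: the function names are made up for this example, 32-bit wraparound is not modeled, and `dot_int8` assumes the input length is a multiple of 16.

```python
def dot_accumulate(acc, a, b):
    """Model one dot-product-accumulate step over 16-element int8 vectors.

    acc holds four accumulator lanes; a and b hold sixteen signed 8-bit
    values each. Lane i gains a[4i]*b[4i] + ... + a[4i+3]*b[4i+3].
    """
    assert len(acc) == 4 and len(a) == 16 and len(b) == 16
    assert all(-128 <= x <= 127 for x in list(a) + list(b))
    return [
        acc[i] + sum(a[4 * i + j] * b[4 * i + j] for j in range(4))
        for i in range(4)
    ]


def dot_int8(a, b):
    """Dot product of two int8 sequences, processed 16 elements per step."""
    acc = [0, 0, 0, 0]
    for k in range(0, len(a), 16):
        acc = dot_accumulate(acc, a[k:k + 16], b[k:k + 16])
    return sum(acc)  # final reduction across the four lanes


print(dot_accumulate([0] * 4, list(range(16)), [1] * 16))  # [6, 22, 38, 54]
print(dot_int8(list(range(32)), [2] * 32))                 # 992
```

Quantized ML inference spends most of its time in loops like `dot_int8`, which is why performing sixteen multiply-accumulates per instruction helps these workloads.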

Read also our Guide to Computational Storage for more insight.

Creating value across a range of applications through flexibility

The ability to run both real-time and Linux on the same core or cluster of cores is key in emerging technologies such as computational storage. The real-time capability is required for the data transfers through the SSD, just as in traditional SSDs. Running Linux and associated software tools directly on the drive facilitates computational workload management and filesystem recognition, so the drive can perform the computation and generate insight locally, greatly reducing data movement, latencies, and energy consumption.

This same capability could be achieved with, for example, a cluster of Cortex-R8 cores for real-time and a cluster of Cortex-A cores for Linux, but the overall system architecture is simplified with Cortex-R82 since it can handle both. This reduces die size and cost and, most importantly, enables flexibility. The same SoC can be used for an ordinary enterprise SSD and reconfigured for a computational storage drive (CSD) product, saving the large mask-set costs that smaller process nodes incur when multiple SoCs are created. The same product can even be dynamically configured through software to run SSD functions during the day and switch to computational storage at night.

One storage controller tapeout for both pure storage and computational storage applications with Cortex-R82 cores

Adjusting the types of workload running on the storage controller based on external demands with Cortex-R82 cores
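One way to picture this software-driven rebalancing is a simple policy that splits a cluster between real-time and Linux roles according to demand. This is a hypothetical sketch: the function `allocate_cores`, the `rt_demand` input, and the rule of always keeping at least one real-time core are assumptions for illustration, not part of any Arm software.

```python
import math


def allocate_cores(total_cores, rt_demand, min_rt_cores=1):
    """Split a cluster between real-time and Linux roles (toy policy).

    rt_demand is the fraction (0.0 to 1.0) of cluster capacity that the
    real-time storage path currently needs. At least min_rt_cores stay
    on real-time duty so host I/O is always served.
    """
    if not 0.0 <= rt_demand <= 1.0:
        raise ValueError("rt_demand must be between 0.0 and 1.0")
    rt_cores = max(min_rt_cores, math.ceil(rt_demand * total_cores))
    rt_cores = min(rt_cores, total_cores)
    return {"real_time": rt_cores, "linux": total_cores - rt_cores}


# Daytime: host I/O needs the whole four-core cluster.
print(allocate_cores(4, 1.0))   # {'real_time': 4, 'linux': 0}
# Night: light host I/O frees cores for on-drive computation under Linux.
print(allocate_cores(4, 0.25))  # {'real_time': 1, 'linux': 3}
```

A production policy would also have to account for the cost of moving work between cores and for latency guarantees on the real-time path.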

The performance required for the next generation of storage solutions

The Cortex-R82 processor provides a significant performance uplift over the Cortex-R8 processor.

Cortex-R82 Performance

With Arm Compiler 6.14 at the O3 optimization level, the EEMBC Consumer benchmark improves significantly thanks to the Neon SIMD instructions. Note that the generic benchmarks, which typically exercise only the core pipeline, do not all demonstrate the major system enhancements that greatly improve real-world applications. What really matters are the actual customer code benchmarks, which show a 74% to 125% improvement over Cortex-R8.

Performance measurements when running Linux with the MMU also show a 21% SPECINT2006 improvement over Cortex-A55 and a 23% improvement on SPECFP2006. These results, taken from our performance model, show a significant uplift compared with the current high-efficiency Cortex-A cores.

Processor power, performance, and area are highly dependent on process, libraries, and optimizations. The following table estimates a typical four-core cluster implementation of the Cortex-R82 processor on mainstream low-power process technology (5 nm) with standard-performance cell libraries. Each core is configured with:

32KB L1 instruction cache
32KB L1 data cache
32KB of ITCM
32KB of DTCM
Full floating-point and Advanced SIMD engine

The processor cluster is configured with an integrated 1MB L2 shared cache.

Cortex-R82 four-core cluster	5 nm
Maximum clock frequency	Above 1.8 GHz
Performance (Dhrystone)	3.41 / 4.32 / 8.67 DMIPS/MHz*
Performance (CoreMark)	5.82 CoreMark/MHz**
Total area (including Cluster+Cores+RAM+Routing)	From 2.0 mm²***
Efficiency	From 30 DMIPS/mW***

* Benchmark built with GCC 9.2. The first result abides by all of the 'ground rules' laid out in the Dhrystone documentation, the second permits inlining of functions (not just the permitted C string libraries), and the third additionally permits link-time optimizations. All use version 2.1 of Dhrystone and ANSI-C-style function declarations.

** Benchmark built with Green Hills Software compiler 2020.1.4 using “-Ospeed -Omax -OI -OB -OV”, among other options.

*** Preliminary estimates, subject to refinement once the product is released.
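The per-MHz scores in the table can be turned into absolute throughput by multiplying by the clock frequency. A rough sketch, assuming the scores are per core and using 1.8 GHz as a lower bound on the clock (the table only says "above 1.8 GHz"); the helper name `score_at_clock` is made up for this example:

```python
def score_at_clock(score_per_mhz: float, clock_mhz: float) -> float:
    """Scale a per-MHz benchmark score to an absolute score at a given clock."""
    return score_per_mhz * clock_mhz


CLOCK_MHZ = 1800.0  # assumed lower bound taken from "Above 1.8 GHz"

DHRYSTONE_PER_MHZ = {
    "ground rules": 3.41,
    "with inlining": 4.32,
    "with inlining and link-time optimization": 8.67,
}

for label, score in DHRYSTONE_PER_MHZ.items():
    print(f"{label}: about {score_at_clock(score, CLOCK_MHZ):.0f} DMIPS")

print(f"CoreMark: about {score_at_clock(5.82, CLOCK_MHZ):.0f}")
```

Under these assumptions the three Dhrystone variants come to about 6138, 7776, and 15606 DMIPS, and the CoreMark figure to about 10476.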

Hear how Cortex-R82 can meet the most demanding real-time workloads in the storage market at Arm DevSummit.

A full solution for faster development

Arm has a suite of technologies and tools to support, speed up, and reduce the risk of developing Cortex-R82-based storage controllers. Arm Development Studio and Fast Models enable early hardware and software co-development, and Cycle Models allow custom benchmarking and performance optimization ahead of silicon availability. Arm training and design review services, together with Cortex-R82 Artisan® Physical IP and POP IP, accelerate time to market and reduce risk. Arm is developing a TSMC 7FF POP to deliver the best PPA required for Cortex-R82 use cases.

Accelerating the next revolution in data storage

Cortex-R82, Arm’s first 64-bit Cortex-R processor, is accelerating the computational storage revolution.

With 40 bits of addressing, it enables the higher performance and larger RAMs required for the next generation of storage solutions
The optional MMU enables rich operating systems and cloud native workloads to run in the storage controller, enabling generation of insight local to the drive
The optional Arm Neon technology accelerates ML workloads that will be at the center of computational storage applications
Versatile, Cortex-R82 provides the flexibility to deliver a range of next generation storage solutions from one tape-out

Visit our website for more information on the Arm Computational Storage solution.

Learn more about Cortex-R82
