Redefining storage with Arm Cortex-R82 and Neoverse CMN-S3

September 30, 2025

The changing face of storage

In today’s digital economy, storage is about more than capacity. Modern applications such as cloud computing, AI/ML, edge analytics, and 5G services demand low latency, high throughput, security, and scalability. Storage devices must act not only as repositories but also as active participants in the data pipeline. This shift is redefining what storage architectures need to deliver.
Arm’s Cortex-R82 processor and Neoverse CMN-S3 interconnect are a powerful combination for building the next generation of storage systems. Together, they address deterministic control paths and the massive data movement challenges that define storage workloads today.

Cortex-R82 and Neoverse CMN-S3: The core of modern storage

At the heart of modern SSDs and storage devices is the controller. Increasingly, the interconnect fabric ties the controller to memory and accelerators. The Cortex-R82 delivers the performance and determinism storage demands. Neoverse CMN-S3 ensures that data can move seamlessly across the SoC.

Cortex-R82 brings:

64-bit execution (Armv8-R AArch64), with support for up to 2 TB of memory for large flash arrays and caching DRAM.
Optional MMU for dual-mode operation: bare-metal for deterministic I/O, or Linux for computational storage workloads.
Advanced memory subsystem with private L1 caches, an optional shared L2 cache up to 4 MB, and tightly coupled memories (up to 1 MB each per core) for ultra-low latency paths.
Up to 8 cores per cluster, enabling SoC designers to build multi-core real-time compute islands.
Optional support for AMBA CHI, enabling Cortex-R82 clusters to connect directly into CMN-S3 fabrics for coherent, high-bandwidth communication.
Full ECC protection (SECDED/DED) across caches and memories for enterprise-grade reliability.
Trace and debug features for profiling, QoS enforcement, and system validation.
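
To make the SECDED term in the list above concrete, here is a minimal sketch of a textbook extended Hamming (8,4) code: single-error correction, double-error detection. It is an illustration only. The source does not describe how Cortex-R82 implements ECC, and real designs protect much wider words in hardware. The function names `secded_encode` and `secded_decode` are hypothetical.

```python
def secded_encode(nibble):
    """Encode 4 data bits into an 8-bit SECDED codeword (list of bits).

    Index 0 holds the overall parity bit; indices 1..7 are a Hamming(7,4)
    codeword with parity bits at positions 1, 2, and 4.
    """
    data = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = data
    code[1] = code[3] ^ code[5] ^ code[7]  # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]  # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]  # covers positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2            # makes the total parity even
    return code


def secded_decode(code):
    """Return (nibble, status); status is 'ok', 'corrected', or 'uncorrectable'."""
    # The syndrome is the XOR of the positions of all set bits in 1..7.
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    overall_odd = sum(code) % 2 == 1

    fixed = list(code)
    if syndrome == 0 and not overall_odd:
        status = "ok"
    elif overall_odd:
        # Exactly one bit flipped; the syndrome names its position
        # (0 means the overall parity bit itself was hit).
        fixed[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even overall parity: two bits flipped.
        return None, "uncorrectable"

    nibble = fixed[3] | (fixed[5] << 1) | (fixed[6] << 2) | (fixed[7] << 3)
    return nibble, status


if __name__ == "__main__":
    word = secded_encode(0b1011)
    word[6] ^= 1                      # inject a single-bit fault
    print(secded_decode(word))
```

Flipping one bit of a codeword yields the original value with status `corrected`, while flipping two bits is reported as `uncorrectable` rather than silently returning wrong data. That second property is what the "DED" half of SECDED provides.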

CMN-S3 complements Cortex-R82 by:

Delivering low-latency, high-bandwidth links between cores, accelerators, and memory controllers.
Maintaining system-wide coherency, so CPUs, encryption engines, and compression blocks can share data seamlessly.
Complying with the AMBA CHI protocol, which makes it the natural fabric to tie together multiple Cortex-R82 clusters (up to 8 cores per cluster) into a single coherent system alongside accelerators, memory controllers, and I/O.
Supporting CHI-C2C & CXL, enabling memory expansion and pooling across servers.
Embedding RAS and security features, to ensure data integrity at hyperscale.

Together, Cortex-R82 and CMN-S3 provide both the deterministic control and the scalable data movement needed for modern storage architectures. These range from SSD controllers to high-bandwidth flash memory modules and multi-cluster storage SoCs.

Demonstrating the advantage

The impact of this combination is clear in both compute and memory benchmarks.

CPU efficiency: Cortex-R82 achieves 3.71 DMIPS/MHz and 6.28 CoreMark/MHz. This delivers around a 48% uplift in DMIPS and 36% uplift in CoreMark compared to the previous generation Cortex-R8. This uplift gives storage controllers headroom to manage complex I/O pipelines and support Linux-based services.
Memory throughput: On the STREAM 256KB benchmark, Cortex-R82 shows up to 4x higher sustained bandwidth across copy, scale, sum, and triad kernels. With CMN-S3 providing coherency and efficient data sharing, this bandwidth uplift accelerates data transfers between caches, flash, and host interfaces. These gains are critical for next-generation high-bandwidth flash modules.

STREAM 256K benchmark showing Cortex-R82 performance
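
For readers unfamiliar with STREAM, the four kernels named above can be sketched as follows. This is a minimal illustration of what each kernel computes and how bytes moved are conventionally counted, not the benchmark itself and not the configuration used for the results in this post. The function name `run_stream`, the array length, and the scalar value are assumptions, and interpreter overhead means the bandwidth it reports says nothing about the hardware.

```python
from array import array
from time import perf_counter


def run_stream(n=4096, scalar=3.0):
    """Run copy, scale, sum, and triad once over arrays of n doubles.

    Returns the three arrays and a dict of bytes-per-second figures.
    """
    a = array("d", [1.0] * n)
    b = array("d", [2.0] * n)
    c = array("d", [0.0] * n)
    word = a.itemsize  # bytes per element
    elapsed = {}

    start = perf_counter()
    for i in range(n):            # copy: c = a
        c[i] = a[i]
    elapsed["copy"] = perf_counter() - start

    start = perf_counter()
    for i in range(n):            # scale: b = scalar * c
        b[i] = scalar * c[i]
    elapsed["scale"] = perf_counter() - start

    start = perf_counter()
    for i in range(n):            # sum: c = a + b
        c[i] = a[i] + b[i]
    elapsed["sum"] = perf_counter() - start

    start = perf_counter()
    for i in range(n):            # triad: a = b + scalar * c
        a[i] = b[i] + scalar * c[i]
    elapsed["triad"] = perf_counter() - start

    # Copy and scale touch two arrays per iteration; sum and triad touch three.
    arrays_touched = {"copy": 2, "scale": 2, "sum": 3, "triad": 3}
    bandwidth = {
        name: arrays_touched[name] * n * word / max(elapsed[name], 1e-9)
        for name in elapsed
    }
    return a, b, c, bandwidth


if __name__ == "__main__":
    _, _, _, bw = run_stream()
    for name, value in bw.items():
        print(f"{name:>5}: {value / 1e6:.1f} MB/s")
```

The useful part for reading the chart is the byte accounting: copy and scale each read one array and write one, while sum and triad read two and write one, so the same elapsed time corresponds to 1.5x the reported bandwidth.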

Latency-sensitive operations: LMbench results show strong memory copy, zeroing, and streaming performance, with bandwidths exceeding 12 GB/s. These results are particularly relevant for storage workloads like garbage collection, wear leveling, and metadata updates. In these cases, R82’s determinism and CMN-S3’s fabric efficiency work hand in hand.

Cortex-R82 LMbench score
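
As a quick sanity check on the CPU efficiency figures quoted earlier in this section, the stated scores and uplifts imply a previous-generation baseline. The sketch below assumes the percentages apply to the per-MHz scores and that "uplift" means (new - old) / old. The helper name `implied_baseline` is hypothetical, and the results are derived values, not measurements from this post.

```python
def implied_baseline(new_score, uplift_percent):
    """Back out the older score from a new score and a percentage uplift."""
    return new_score / (1 + uplift_percent / 100)


if __name__ == "__main__":
    # Figures taken from the text: 3.71 DMIPS/MHz (+48%), 6.28 CoreMark/MHz (+36%).
    dmips_base = implied_baseline(3.71, 48)
    coremark_base = implied_baseline(6.28, 36)
    print(f"Implied baseline: {dmips_base:.2f} DMIPS/MHz")
    print(f"Implied baseline: {coremark_base:.2f} CoreMark/MHz")
```

Because the post says "around" 48% and 36%, the derived baselines are approximate to the same degree.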

Why Cortex-R82 + CMN-S3 are perfect for storage

Individually, each IP block is powerful. Together, they address the two key challenges of control-plane determinism and data-plane scalability:

SSD Controllers
- Cortex-R82 provides predictable latency for NVMe command handling and wear-leveling.
- CMN-S3 ensures inline accelerators for compression, deduplication, or encryption remain coherent with minimal overhead.
Computational Storage Devices
- Cortex-R82 runs real-time control tasks alongside Linux applications.
- CMN-S3 links CPUs with AI/ML accelerators for near-data processing. This reduces data movement bottlenecks.
Hyperscale & Distributed Storage Systems
- Cortex-R82 delivers deterministic execution for control-plane tasks such as transaction processing, flash translation, and error management. These capabilities are essential for scaling storage across thousands of nodes.
- CMN-S3 integrates multiple Cortex-R82 clusters (up to 8 cores per cluster) over the AMBA CHI protocol. It provides the coherent mesh backbone needed for hyperscale deployments.
- Together, they enable multi-cluster, multi-die storage architectures. These support advanced services such as erasure coding, replication, tiered caching, and disaggregated storage.
CXL Memory Pooling Devices
- With CMN-S3’s CXL support, Arm partners are already building memory pooling solutions that allow memory to be treated as a shared resource across servers. This unlocks new levels of efficiency for storage architectures.
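
Erasure coding is listed above among the services these architectures support. As a minimal sketch of the idea, the simplest such code is single XOR parity, which tolerates the loss of any one block. The source does not say which codes a given design would use, and production systems typically use stronger schemes such as Reed-Solomon. The names `xor_parity` and `recover_block` are hypothetical.

```python
def xor_parity(blocks):
    """Return the byte-wise XOR of equally sized blocks."""
    size = len(blocks[0])
    if any(len(block) != size for block in blocks):
        raise ValueError("all blocks must have the same length")
    parity = bytearray(size)
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)


def recover_block(surviving_blocks, parity):
    """Rebuild the single missing data block from the survivors and the parity."""
    return xor_parity(list(surviving_blocks) + [parity])


if __name__ == "__main__":
    data = [b"node-A..", b"node-B..", b"node-C.."]
    parity = xor_parity(data)
    # Simulate losing the second block and rebuilding it.
    rebuilt = recover_block([data[0], data[2]], parity)
    print(rebuilt == data[1])
```

Recovery works because XOR is its own inverse: XOR-ing the parity with every surviving block cancels their contributions and leaves only the missing block.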

Looking ahead

Storage is no longer passive. From NVMe SSDs to computational storage devices and datacentre storage nodes, the requirements are converging:

Deterministic performance to meet strict I/O SLAs.
Scalability to handle exponential data growth.
Security and reliability to ensure trust at scale.

The Arm Cortex-R82 processor and Neoverse CMN-S3 interconnect provide the building blocks for storage solutions that are predictable, scalable, efficient, and future-proof.
Arm enables partners to design differentiated controllers, multi-cluster high-bandwidth flash memory modules, CXL-enabled memory pooling devices, and advanced storage nodes. This powers the transformation of storage into an active, intelligent part of the compute fabric.

Partner perspective

As one of our customers summarized it best:
