We are moving towards a world of 1 trillion devices, and the new technologies that will enable this change are beginning to be rolled out today. 5G wireless technology is a key piece of this coming transformation, promising lower latency, faster response times, and higher bandwidth at the edge, which in turn gives customers a better user experience.
The new 5G end devices will cover a wide range of use cases and demands, not just the mobile devices we use today. These devices will come in all shapes and sizes, including autonomous vehicles, intelligent connected cameras, and tiny Internet of Things (IoT) sensors, and they will generate a massive amount of data. 5G will require a rethinking of how data is transported from edge to core, as well as new infrastructure to handle this coming data deluge.
Throughput requirements over the next 10 years are estimated to reach 10 times what we consume today. Assuming a two-year deployment cadence for new infrastructure devices, this translates into a requirement for roughly 1.6x more throughput performance per generation. This deployment will take time. New economic drivers and business models will be tested, which requires the infrastructure to be flexible enough to adapt to these changes. That flexibility will come in the form of Software Defined Networking (SDN), and the end solution will require a mix of compute and accelerators.
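A quick check on the 1.6x figure above: at the stated two-year cadence, ten years gives five product generations, so each generation must deliver a factor of about 10^(1/5) ≈ 1.58, which rounds to 1.6x (and indeed 1.6^5 ≈ 10.5, slightly more than the required 10x).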
5G technology will require new infrastructure and, along with it, a new class of compute capability to keep up with the increased throughput demands. The Neoverse E1 platform is a highly efficient platform designed for next-generation throughput compute workloads. The microarchitecture for Neoverse E1 was developed around two design goals: maximizing throughput while balancing compute and efficiency requirements. The Neoverse E1 offers 2.7x throughput performance with 2.4x throughput-to-power efficiency and 2.1x compute performance over the popular Cortex-A53. Its impressive compute and throughput performance in a highly efficient package allows for deployment in locations where power is limited, and where general-purpose server processors do not fit. For example, an 8-core Power-over-Ethernet (PoE) driven wireless access device or a low-power 5G edge transport node would be ideal use cases for Neoverse E1. In addition, the highly flexible and scalable architecture of Neoverse E1 allows the platform to scale up to multi-port 100Gbps devices such as firewall appliances.
For throughput workloads, we investigated system behavior and found that cache misses dominate, accounting for up to 80% of processing cycles on a conventional small, high-efficiency core such as Cortex-A53 or Cortex-A55. During a cache miss, the core is stalled waiting for data to become available. One method to address this inefficiency is the out-of-order pipeline in the Neoverse E1 microarchitecture, which allows instructions that do not depend on the missing data to execute ahead and reduces stall cycles to around 50%. The approach taken in the out-of-order design remains consistent with a highly efficient core design, for example limiting the reorder buffer to 40 instructions and the reservation stations to 8 entries, providing the benefit of out-of-order issue and retire without the power and area cost of a long OoO window.
Our engineers did not stop there. We also incorporated simultaneous multithreading (SMT) into the design. The Neoverse E1 can process two threads concurrently, resulting in higher aggregate system throughput and improved efficiency. From a software perspective, the two threads appear as two individual CPUs. They can be at different exception levels and even run different operating systems altogether. On typical throughput workloads, SMT lowers the stall cycles to around 30%, a big reduction from the 70-80% seen on Cortex-A55, which has neither the out-of-order pipeline nor SMT.
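To make the "two individual CPUs" point concrete, here is a minimal sketch, assuming a Linux system and the GNU pthread affinity extension, of how throughput software might pin one worker per hardware thread so that both SMT threads of each core stay busy. The thread count, CPU numbering, and worker body are illustrative assumptions, not details of any Arm software.

```c
/* Sketch: one worker thread pinned to each logical CPU.
 * Assumes 2 cores x 2 SMT threads; adjust NUM_HW_THREADS for the real system. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

#define NUM_HW_THREADS 4

static void *worker(void *arg)
{
    long hw_thread = (long)arg;
    /* ... poll a packet queue and process packets here ... */
    printf("worker running on hardware thread %ld\n", hw_thread);
    return NULL;
}

int main(void)
{
    pthread_t tid[NUM_HW_THREADS];

    for (long i = 0; i < NUM_HW_THREADS; i++) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(i, &set);          /* Linux exposes each SMT thread as a logical CPU */

        pthread_attr_t attr;
        pthread_attr_init(&attr);
        pthread_attr_setaffinity_np(&attr, sizeof(set), &set);
        pthread_create(&tid[i], &attr, worker, (void *)i);
        pthread_attr_destroy(&attr);
    }
    for (long i = 0; i < NUM_HW_THREADS; i++)
        pthread_join(tid[i], NULL);
    return 0;
}
```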
Additional infrastructure features that help improve throughput performance are cache stashing and the Accelerator Coherency Port (ACP). The ACP gives an accelerator a low-latency, closely coupled tie into the processor cluster. An accelerator can use the cache stashing feature by sending a cache stashing hint that places data into a core's private L2 cache or the cluster's L3 cache, making the data available to the CPUs before an instruction needs it. This in effect reduces the number of cache misses and the overall latency.
To demonstrate the performance of the Neoverse E1 platform, we developed a 5G small cell transport software prototype which simulates the packet processing workload at a 5G base station. The simulated device processes data packets in two directions: from the wireless interface upstream to an aggregator or edge cloud (uplink), and in the opposite, downlink direction. The prototype runs on the Linux operating system and utilizes several open-source libraries such as the Data Plane Development Kit (DPDK) and OpenSSL.
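As a rough illustration of the structure of such a prototype, the following is a minimal DPDK-style receive-process-transmit loop. It assumes EAL and port initialization have already been done, and process_packet() is a hypothetical stand-in for the pipeline stages described in the next two paragraphs; it is a sketch, not the prototype's actual code.

```c
/* Minimal DPDK poll-mode loop: pull a burst of packets, process them, send them on. */
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32

/* Placeholder for the uplink/downlink processing stages (translate, encrypt, ...). */
static void process_packet(struct rte_mbuf *m) { (void)m; }

static void poll_loop(uint16_t port_id, uint16_t queue_id)
{
    struct rte_mbuf *bufs[BURST_SIZE];

    for (;;) {
        /* Pull a burst of packets from the NIC receive queue. */
        uint16_t nb_rx = rte_eth_rx_burst(port_id, queue_id, bufs, BURST_SIZE);

        for (uint16_t i = 0; i < nb_rx; i++)
            process_packet(bufs[i]);

        /* Send the processed burst back out; free anything the NIC refused. */
        uint16_t nb_tx = rte_eth_tx_burst(port_id, queue_id, bufs, nb_rx);
        for (uint16_t i = nb_tx; i < nb_rx; i++)
            rte_pktmbuf_free(bufs[i]);
    }
}
```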
For uplink processing, the device must translate cellular packets received from the cell tower into IP packets, perform IPsec encryption, and process the packets according to the network rules, which may include fragmenting larger packets into smaller pieces. Since Neoverse E1 can process the data in parallel across multiple threads and cores, the software needs to reorder the egress packets so they leave in the same sequence they arrived at ingress.
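For illustration, here is a minimal sketch of what the encryption step could look like using OpenSSL's EVP interface with AES-128-GCM, a cipher commonly used for IPsec ESP. The function name and parameters are ours, and ESP framing, key management, and the prototype's real error handling are not shown.

```c
/* Encrypt one payload with AES-128-GCM via OpenSSL EVP.
 * Assumes a 16-byte key and a 12-byte IV; returns ciphertext length or -1 on failure. */
#include <openssl/evp.h>

int encrypt_payload(const unsigned char *key, const unsigned char *iv,
                    const unsigned char *plain, int plain_len,
                    unsigned char *cipher, unsigned char tag[16])
{
    EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
    int len = 0, cipher_len = 0;

    if (!ctx)
        return -1;

    /* Authenticated encryption, as used by IPsec ESP. */
    if (EVP_EncryptInit_ex(ctx, EVP_aes_128_gcm(), NULL, key, iv) != 1)
        goto err;
    if (EVP_EncryptUpdate(ctx, cipher, &len, plain, plain_len) != 1)
        goto err;
    cipher_len = len;
    if (EVP_EncryptFinal_ex(ctx, cipher + cipher_len, &len) != 1)
        goto err;
    cipher_len += len;
    /* Retrieve the 16-byte authentication tag carried in the ESP trailer. */
    if (EVP_CIPHER_CTX_ctrl(ctx, EVP_CTRL_GCM_GET_TAG, 16, tag) != 1)
        goto err;

    EVP_CIPHER_CTX_free(ctx);
    return cipher_len;
err:
    EVP_CIPHER_CTX_free(ctx);
    return -1;
}
```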
In the downlink direction, the device must reassemble the encrypted IP packets received from the edge cloud and perform decryption before converting them into cellular packets suitable for radio transmission. Packet reordering also applies in this direction.
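The reordering step in both directions can be pictured as a small sequence-number window. The sketch below is a simplified, single-threaded illustration of the idea; the packet structure, window size, and emit() callback are assumptions for the example, and a real data-plane implementation would also need synchronization between workers and handling for lost packets.

```c
/* Toy sequence-number reorder stage: packets may arrive out of order from
 * parallel workers and are released strictly in ingress order. */
#include <stddef.h>
#include <stdint.h>

#define REORDER_WINDOW 64

struct pkt { uint32_t seq; /* ... payload ... */ };

struct reorder_buf {
    struct pkt *slot[REORDER_WINDOW];  /* indexed by seq % REORDER_WINDOW */
    uint32_t next_seq;                 /* next sequence number to release */
};                                     /* assumed zero-initialized by the caller */

static void reorder_insert(struct reorder_buf *rb, struct pkt *p,
                           void (*emit)(struct pkt *))
{
    rb->slot[p->seq % REORDER_WINDOW] = p;

    /* Drain from the head of the window while consecutive packets are present. */
    while (rb->slot[rb->next_seq % REORDER_WINDOW] &&
           rb->slot[rb->next_seq % REORDER_WINDOW]->seq == rb->next_seq) {
        emit(rb->slot[rb->next_seq % REORDER_WINDOW]);
        rb->slot[rb->next_seq % REORDER_WINDOW] = NULL;
        rb->next_seq++;
    }
}
```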
With this software prototype of a 5G base station, we put the Neoverse E1 Edge Reference Design through its paces. The Neoverse E1 Edge Reference Design includes sixteen Neoverse E1 cores arranged in two clusters of eight, connected through the high-performance CMN-600 mesh interconnect, along with the MMU-600 system MMU and 2-channel DDR4-3200 memory.
The Neoverse E1 Edge Reference Design reaches more than 50Gbps of aggregate throughput on all-software packet processing. In terms of power, the Neoverse E1 cores in the Edge Reference Design consume less than 4W at 2.3GHz. This is significant because power availability at the edge of the network is very limited; usually less than 15W is available for the SoC power budget. Lower CPU power consumption means more power is available for other functions such as the radio, DSPs, or other accelerators.
When the requirement calls for both control plane and data plane functionality in the same device, the Neoverse E1 can provide generous compute performance for light control plane tasks. In a multi-cluster Neoverse E1 design, we can set aside a single cluster for control plane workloads while the remaining Neoverse E1 clusters are dedicated to the data plane. Fixed-function accelerators, such as a crypto engine, can tie into a cluster via the ACP to offload work and shorten packet processing time.
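As a hypothetical sketch of how such a partitioning might be expressed in a DPDK application, the EAL can be told to confine its data-plane lcores to one cluster's CPUs, leaving the other cluster for control-plane software. The core numbers below are assumptions about CPU enumeration made for illustration, not a property of the reference design.

```c
/* Sketch: restrict the DPDK data plane to CPUs 8-15 (assumed to be the second
 * cluster), leaving CPUs 0-7 free for control-plane processes. */
#include <rte_eal.h>

int main(int argc, char **argv)
{
    (void)argc;
    /* "-l 8-15" restricts the EAL's lcores to that CPU list. */
    char *eal_args[] = { argv[0], "-l", "8-15" };
    int eal_argc = (int)(sizeof(eal_args) / sizeof(eal_args[0]));

    if (rte_eal_init(eal_argc, eal_args) < 0)
        return -1;

    /* ... configure ports and launch data-plane loops on the worker lcores ... */

    return 0;
}
```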
Neoverse E1 and Neoverse N1 processors can also be combined in a heterogeneous design for high-performance systems. In this scenario, the Neoverse N1 clusters are assigned to the control plane and the Neoverse E1 clusters are used for data plane functions. Examples of devices built on this kind of system include firewall appliances with deep packet inspection and intrusion detection capabilities, multi-100Gbps routers, and SmartNICs.
The Neoverse E1 platform will help transform networking infrastructure to meet the throughput demands of the next decade as that infrastructure evolves to enable a world of 1 trillion devices. Neoverse E1 delivers powerful throughput performance and efficiency. The flexibility of the Neoverse E1 platform allows it to scale from low-power devices to high-performance 100Gbps+ appliances, all while leveraging common Arm AArch64 software and a diverse ecosystem of cloud-native software that is increasingly being optimized for Arm. We are very excited to see the innovative solutions that our partners will bring to market using the Neoverse E1 platform. To learn more, visit the Neoverse E1 page on Arm Developer, and read about our recent product announcement.