Introduced just over two years ago, the Neoverse N1 has been hugely successful for Arm and our partners. Users of a growing number of cloud providers are seeing phenomenal performance per watt and performance per dollar. On the 5G and networking side, partners are building differentiated designs that are already powering the next wave of 5G solutions.
Neoverse N2 continues where Neoverse N1 and Neoverse V1 leave off, pulling in new features from Armv8.4, Armv8.5, Armv8.6, and Armv9. In the next few sections, I would like to highlight a few of these features.
Neoverse N2 is our first infrastructure core with the Scalable Vector Extension version two (SVE2). SVE2 builds on the foundations of the Scalable Vector Extension (SVE) to bring scalable SIMD vector performance and advanced auto-vectorization capabilities to a wider range of software, including ML, DSP, regular expressions, and 5G RAN. With traditional SIMD architectures, every time a new vector length is introduced in hardware, code must be rebuilt and re-optimized to take advantage of the additional vector bandwidth. Both SVE and SVE2 are vector-length-agnostic SIMD instruction sets that let users write and optimize code once, compile once, and run on a diverse set of hardware: the same code automatically takes full advantage of whatever vector bandwidth is available. As new technology enables us to build larger vector machines, code written and compiled today using SVE/SVE2 will automatically scale to those larger machines.

SVE2's simpler programming model, along with a vector-length-agnostic counterpart to the Neon instructions, makes it easier for compilers to auto-vectorize your code, so programmers can benefit from vectorization without doing anything special. Although Neoverse N2 continues to fully support Neon for pre-existing, pre-optimized code, we recommend focusing new development and optimization efforts on SVE2. Because it is vector-length agnostic, SVE2 greatly increases the useful life of the software and the ROI on the development effort. For more information, visit our SVE/SVE2 developer page.
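To make the vector-length-agnostic style concrete, here is a minimal sketch (not from the original post) using the Arm C Language Extensions (ACLE) intrinsics for SVE; the function name daxpy_sve and the loop structure are illustrative choices, not Neoverse N2-specific code.

```c
#include <arm_sve.h>   /* ACLE intrinsics for SVE/SVE2 */
#include <stddef.h>
#include <stdint.h>

/* Vector-length-agnostic DAXPY: y[i] += a * x[i].
 * The same binary runs unchanged on any SVE vector length
 * (128 to 2048 bits); svcntd() and the governing predicate adapt
 * to the hardware width, so there is no scalar tail loop. */
void daxpy_sve(double a, const double *x, double *y, size_t n)
{
    for (size_t i = 0; i < n; i += svcntd()) {
        /* Predicate enables only the lanes that are still in bounds. */
        svbool_t pg = svwhilelt_b64_u64((uint64_t)i, (uint64_t)n);
        svfloat64_t vx = svld1_f64(pg, &x[i]);
        svfloat64_t vy = svld1_f64(pg, &y[i]);
        vy = svmla_n_f64_x(pg, vy, vx, a);   /* vy += vx * a */
        svst1_f64(pg, &y[i], vy);
    }
}
```

Compiled once with, for example, GCC or Clang and an SVE-enabled target such as -march=armv8-a+sve, the same object code runs on 128-bit implementations and on wider ones, which is the write-once, scale-everywhere property described above.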
Neoverse N2 brings a very large 40 percent IPC uplift (SPECint2006 estimated) over Neoverse N1, and it achieves this uplift while retaining power and area efficiency very similar to Neoverse N1, keeping the CPU balanced. This performance gain does not come from any one microarchitectural feature, but rather from improvements across the board. Nor is the uplift limited to synthetic benchmarks: we also see strong gains on real server workloads.

As CPU designers push for performance, it becomes increasingly difficult to gain performance without paying an exponentially higher penalty in power and area efficiency. When designing Neoverse N2, we were extremely focused on maintaining the power and area efficiency of the CPU, but not at the cost of performance. To reconcile these competing goals, we positioned Neoverse N2 at the knee of this curve. We set an extremely high bar for new microarchitectural features in N2: each had to deliver a strong return on its power and area cost. Additionally, we spent significant time optimizing existing structures to improve performance and efficiency.

Relative to Neoverse V1, Neoverse N2 relies less on the width and depth of the pipeline to achieve its performance, and it has a more modest amount of speculation, vector bandwidth, and load/store bandwidth. Neoverse N2 retains many of the efficient features that went into Neoverse V1, including its branch-prediction algorithms, data-prefetching algorithms, and replacement policies. Neoverse N2 also includes the macro-op (MOP) cache introduced in Neoverse V1, which provides strong performance gains on the small kernels often found in infrastructure workloads. All of this is done to maintain the balanced nature of the core while achieving a strong performance uplift on the workloads that matter to the cloud-to-edge segment.
As part of our N-Series, Neoverse N2 has to be a very scalable CPU and give our partners latitude across the cloud-to-edge space. Partners can build low-core-count, low-frequency systems optimized for a tight power envelope, or take the same N2 core and build high-core-count, high-frequency, high-memory-bandwidth monsters for the datacenter. In such large systems, the efficiency profile allows partners to fit more threads per socket. As infrastructure SoCs grow, it is becoming increasingly important to manage shared resources, so I would like to introduce a few new features that we have added to further improve the scalability of these large systems. For more information on the performance and scalability of Neoverse N2, see our blog "Breaking down Arm Neoverse performance leadership".
Memory System Resource Partitioning and Monitoring (MPAM) bounds process interaction and interference in shared resources, providing a mechanism to track and control access to shared system resources such as cache capacity and memory bandwidth at process granularity. Neoverse N2 assigns a tag to transactions as they leave the CPU, and these tags stay with each transaction as it works its way through the system's shared resources. This gives intelligent agents along the path a mechanism to monitor, limit, or guarantee the share of resources available to a process. In large systems, MPAM can help mitigate noisy neighbors and help implement service-level agreements (SLAs).
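As a rough illustration of how system software might use such partitioning (the original post does not include code), the sketch below assumes a Linux resctrl-style interface; the mount point, group name, and budget string are hypothetical and depend entirely on kernel and platform MPAM support.

```c
/* Hypothetical sketch: placing a process into an MPAM partition
 * through a resctrl-style filesystem.  All paths, the group name,
 * and the "schemata" budget format are illustrative only. */
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

static int write_str(const char *path, const char *s)
{
    FILE *f = fopen(path, "w");
    if (!f) return -1;
    int rc = (fputs(s, f) >= 0) ? 0 : -1;
    fclose(f);
    return rc;
}

int main(void)
{
    /* 1. Create a resource group, which maps to one partition ID. */
    mkdir("/sys/fs/resctrl/batch_jobs", 0755);

    /* 2. Cap the share of shared cache this group may use
     *    (illustrative budget string). */
    write_str("/sys/fs/resctrl/batch_jobs/schemata", "L3:0=30\n");

    /* 3. Move the current process into the group; its memory
     *    transactions now carry that group's tag through the
     *    shared caches and memory controllers. */
    char pid[32];
    snprintf(pid, sizeof pid, "%d\n", getpid());
    return write_str("/sys/fs/resctrl/batch_jobs/tasks", pid);
}
```

The point of the sketch is the flow: tag a group of tasks, attach a budget to that tag, and let the hardware enforce it along the path; the exact interface will vary by OS and SoC.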
Completer Busy (CBusy) provides a mechanism to automatically regulate CPU traffic requests based on overall system congestion. As we all know from our local freeway systems, when there is a traffic jam nobody makes much forward progress. CBusy provides a means to regulate the CPUs in the system to prevent such congestion and the associated retries, which only compound the problem. CBusy signals start by throttling speculative transactions, but if congestion is sufficiently severe, they can throttle all transactions. The goal of CBusy is to maintain optimal use of queuing resources while limiting congestion. This allows Neoverse N2 to use all the available bandwidth in an uncongested environment, while in a congested situation it can throttle back its usage to improve system-level performance. We have observed a 15 percent performance uplift using CBusy in our reference design.
Neoverse V1 introduced two key mechanisms to help with power management: the Max Power Mitigation Mechanism (MPMM) and Dispatch Throttling (DT). These mechanisms allow partners to build systems that maximize performance within a power budget, and both are included in Neoverse N2. In Neoverse N2, we have also added a new mechanism called Performance Defined Power Management (PDP). PDP aims to right-size power consumption to the workload; that is, it allows the CPU to scale the microarchitecture dynamically to maximize power efficiency for a given workload. PDP has several levers spread across the CPU that let it change the width, depth, and speculation of the machine to match the microarchitecture to the workload being run. Not only can PDP influence the total power of the core, it can also improve the core's power efficiency.
The importance of security in our infrastructure cannot be overstated. Whether we are talking about cloud servers or 5G base stations, security is a primary factor in any design. Neoverse N2 provides many new features to help address this need.
Today, the secure world is all or nothing. If an application is trusted and placed in the secure world, it is very hard to limit its access within the trusted zone. However, as we move to a world with many different applications provided by different vendors, many of which live in the secure world, it becomes increasingly important to be able to isolate these applications from each other. The Secure EL2 extension adds support for virtualization in the secure world, bringing the virtualization features available in the non-secure state to the secure state and enabling Secure Partition Managers. Secure Partition Managers support partitioning of the secure space, giving each partition the secure access it needs while isolating partitions from each other.
As a part of our N2 platform, we have also developed optimal physical implementations under the Neoverse POP IP umbrella to accelerate time to market. Neoverse N2 POP IP is available on cutting-edge 5nm processes, a transition many partners are in the midst of making. If we compare N1 on the left to N2 on the right, we are looking at a very large 40 percent IPC uplift. In addition, the jump to 5nm offers the potential for a 10 percent frequency uplift while keeping power and area roughly equivalent. If Neoverse N1 PPA was a good fit for your workload and power envelope on 7nm, then Neoverse N2 on 5nm is a great fit.
Along with our Neoverse cores, we offer partners the key resources needed to rapidly kick off designs. We are offering the Neoverse N2 reference design along with the CPU. This reference design is targeted at 5G, networking, SmartNIC, and hyperscale applications.
Included in the reference design:
Our goal with these reference designs is to enable partners to boot an OS on day one. To learn more about our reference design, visit the Neoverse Reference Design page.
The Neoverse N2 platform significantly raises the bar for cloud-to-edge performance efficiency. It builds on the phenomenal traction that the Arm Neoverse ecosystem has built with Neoverse N1, while bringing key performance, power-efficiency, and security upgrades. We anticipate that Neoverse N2-based silicon from our partners will be sampling by the end of 2021, and we can’t wait for customers to experience the added benefits Neoverse N2 will bring to cloud-to-edge solutions. To learn more:
[CTAToken URL = "https://www.arm.com/products/silicon-ip-cpu/neoverse/neoverse-n2" target="_blank" text="Explore Neoverse N2" class ="green"]
[CTAToken URL = "https://www.arm.com/company/news/2021/04/transforming-compute-for-next-generation-infrastructure" target="_blank" text="Read Chris Bergey's Launch Blog" class ="green"]