The Arm Neoverse V1 Performance Analysis Methodology whitepaper is now available to help you optimize your application code for V1-based production systems.
The whitepaper is an update to the previous "Arm Neoverse N1: Performance Analysis Methodology," and it covers the new features and updates from N1 to V1 cores. This resource can be used to understand and optimize the performance of your application on the V1 platform.
To make the most of your time spent profiling and optimizing, it is important to select the right PMU events and follow a structured methodology with user-friendly software metrics. In the whitepaper, we present the Arm top-down analysis methodology for Neoverse V1.
In this blog, we outline the updates from N1 to V1 cores and provide an overview of the contents of this whitepaper. We also include references to other useful resources to take full advantage of the Neoverse V1 platform.
Arm Neoverse V1 is the first Arm core to support the full set of PMU events and derived metrics needed for level 1 of the top-down methodology. These metrics are a valuable starting point for performance analysis and optimization.
They break down processor pipeline utilization at the slot level, which makes it possible to evaluate processor efficiency and identify bottlenecks. This is a major enhancement to the performance analysis capabilities of the Neoverse V1 platform, and it complements the micro-architecture exploration metrics that can be used for further analysis.
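To illustrate how slot-level counts turn into level 1 categories, here is a simplified Python sketch. It assumes 8 dispatch slots per cycle and leaves out any correction terms used in the whitepaper's published formulas, so treat it as an illustration and use the derived metrics in Appendix C for real analysis. The event names (CPU_CYCLES, STALL_SLOT, STALL_SLOT_FRONTEND, STALL_SLOT_BACKEND, OP_SPEC, OP_RETIRED) are Arm PMU events; the function name is hypothetical.

```python
# Simplified sketch of top-down level 1 on Neoverse V1.
# Assumptions (not taken from the whitepaper): 8 slots per cycle and no
# correction terms. Use the Appendix C formulas for real analysis.

SLOTS_PER_CYCLE = 8  # assumed number of dispatch slots per cycle


def topdown_level1(counts: dict[str, float], slots_per_cycle: int = SLOTS_PER_CYCLE) -> dict[str, float]:
    """Return the four level 1 categories as percentages of all slots."""
    total_slots = counts["CPU_CYCLES"] * slots_per_cycle
    # Slots in which nothing was dispatched, split by where the stall came from.
    frontend = counts["STALL_SLOT_FRONTEND"] / total_slots
    backend = counts["STALL_SLOT_BACKEND"] / total_slots
    # Fraction of slots that were not stalled.
    used = 1.0 - counts["STALL_SLOT"] / total_slots
    # Fraction of speculatively executed operations that eventually retired.
    retire_ratio = counts["OP_RETIRED"] / counts["OP_SPEC"]
    return {
        "frontend_bound": 100 * frontend,
        "backend_bound": 100 * backend,
        "retiring": 100 * used * retire_ratio,
        "bad_speculation": 100 * used * (1 - retire_ratio),
    }
```

In this simplified form, the four categories add up to 100% whenever STALL_SLOT equals the sum of the frontend and backend stall slots: stalled slots are split between frontend bound and backend bound, and the remaining slots are split between retiring and bad speculation according to the fraction of speculative operations that retired.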
The Arm Neoverse V1 telemetry specification, including product-specific event descriptions and derived metrics for analysis, can be found in Appendices B and C of the Arm Neoverse V1 Performance Analysis Methodology whitepaper.
The telemetry data, provided in machine readable JSON files, and stress workload suite referenced in the whitepaper are now available in the GitLab telemetry-solution repository.
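Because the derived metrics are published as formulas over event names, they can be evaluated programmatically. The sketch below assumes a hypothetical JSON layout (a "metrics" object whose entries carry a "formula" string); the real files in the telemetry-solution repository may be structured differently, so check their schema before adapting it. It parses formulas with Python's ast module instead of eval(), so only arithmetic over event counts is accepted.

```python
import ast
import json
import operator

# Hypothetical JSON layout, for illustration only; the real telemetry files
# may use different field names and structure.
EXAMPLE_JSON = """
{
  "metrics": {
    "ipc": {"formula": "INST_RETIRED / CPU_CYCLES"},
    "branch_mpki": {"formula": "BR_MIS_PRED_RETIRED / INST_RETIRED * 1000"}
  }
}
"""

# Arithmetic operators allowed in a formula.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def eval_formula(formula: str, counts: dict[str, float]) -> float:
    """Evaluate an arithmetic formula over event counts without eval()."""
    def walk(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return float(node.value)
        if isinstance(node, ast.Name):
            return float(counts[node.id])  # event name -> measured count
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")
    return walk(ast.parse(formula, mode="eval"))


def evaluate_metrics(json_text: str, counts: dict[str, float]) -> dict[str, float]:
    """Evaluate every metric formula in the (hypothetical) JSON document."""
    spec = json.loads(json_text)
    return {name: eval_formula(m["formula"], counts)
            for name, m in spec["metrics"].items()}
```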
Familiarity with the Arm Neoverse microarchitecture, including its complex pipelines and multi-level memory hierarchy, helps when interpreting PMU data. Because Neoverse cores provide over 100 hardware events to choose from, it is important to prioritize which events to focus on. To help with this, we have created cheat sheets that list the key events and their corresponding derived metrics.
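One reason to prioritize is that the PMU can only count a handful of events at once. When perf is asked for more, it time-multiplexes the counters, which can skew ratios whose events were counted in different time slices. The hypothetical helper below packs each metric's events into the same counter group using first-fit decreasing. It assumes six programmable counters (check PMCR_EL0.N or the PMU Guide for your system) and that CPU_CYCLES is counted on the dedicated cycle counter.

```python
# Hypothetical helper: pack the events needed by each metric into counter
# groups so that every metric's events are counted together.
# Assumptions: 6 programmable counters, and CPU_CYCLES uses the dedicated
# cycle counter, so it does not occupy a programmable counter.

NUM_COUNTERS = 6


def pack_metric_groups(metrics: dict[str, set[str]], capacity: int = NUM_COUNTERS) -> list[set[str]]:
    groups: list[set[str]] = []
    # Place metrics that need the most counters first (first-fit decreasing).
    ordered = sorted(metrics.items(), key=lambda kv: -len(kv[1] - {"CPU_CYCLES"}))
    for name, events in ordered:
        needed = events - {"CPU_CYCLES"}
        if len(needed) > capacity:
            raise ValueError(f"{name} needs {len(needed)} counters, only {capacity} available")
        for group in groups:
            # Events already in the group are shared, so count the union.
            if len(group | needed) <= capacity:
                group |= needed
                break
        else:
            groups.append(set(needed))
    return groups
```

Each resulting group could then be passed to perf as an event group (perf's curly-brace group syntax) so that its events are scheduled together.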
Table 1. Neoverse V1 core events cheat sheet
The following two documents provide all the necessary information for conducting performance analysis on Neoverse V1 and are our recommended go-to reference:
1) Arm Neoverse V1 Performance Analysis Methodology whitepaper: This whitepaper presents a performance analysis methodology and shows how to conduct workload characterization on the Arm Neoverse V1 platform. It updates the earlier whitepaper that covers the same methodology for the Arm Neoverse N1 platform. If you are new to Arm platforms and performance analysis tools such as Linux perf, we recommend reading this whitepaper first.
2) Arm Neoverse V1 PMU Guide (direct download): This document provides a comprehensive overview of all hardware PMU events, including micro-architecture and architecture details that are necessary for using the events effectively in performance analysis.
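When collecting counts with Linux perf, running perf stat with the -x, option prints one CSV line per event (counter value, unit, event name, then run-time fields) to stderr. The sketch below turns that output into a dictionary keyed by upper-case event names, matching the names used in the PMU Guide. It assumes this field order and that the kernel exposes the events by name (for example cpu_cycles or inst_retired); event names and availability vary with kernel and perf versions. The function name is hypothetical.

```python
import csv
import io


def parse_perf_stat_csv(text: str) -> dict[str, float]:
    """Parse `perf stat -x,` output into {EVENT_NAME: count}."""
    counts: dict[str, float] = {}
    for row in csv.reader(io.StringIO(text)):
        # Skip blank lines, comments, and malformed rows.
        if not row or row[0].startswith("#") or len(row) < 3:
            continue
        value, event = row[0], row[2]
        if value.startswith("<"):  # "<not counted>" or "<not supported>"
            continue
        # Normalize "armv8_pmuv3_0/cpu_cycles/" or "cpu_cycles:u" to "CPU_CYCLES".
        name = event.strip("/").split("/")[-1].split(":")[0].upper()
        counts[name] = float(value)
    return counts
```

The resulting dictionary can be fed directly into metric formulas written in terms of PMU Guide event names.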
Arm Neoverse V1 is a core designed to deliver maximum single-threaded performance for demanding cloud, HPC, and AI/ML-assisted workloads. Neoverse V1 is the first Neoverse processor to include the Scalable Vector Extension (SVE) for maximum vector performance, HPC code reuse, and longevity. Neoverse V1 also supports BFloat16 and Int8 MatMul instructions, which can offer up to 3x the performance of Neoverse N1 for machine learning frameworks such as TensorFlow, PyTorch, and oneDNN. The Neoverse V1 CPU is available today on AWS EC2 instances powered by the AWS Graviton3 and AWS Graviton3E processors.
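BFloat16 keeps the sign bit and 8-bit exponent of an IEEE float32 and shortens the mantissa to 7 bits, so it has the same range as float32 with less precision. The pure-Python sketch below converts between the two formats with round-to-nearest-even; NaN handling is omitted. It only illustrates the number format and does not model the V1 hardware instructions; the function names are hypothetical.

```python
import struct


def float_to_bfloat16_bits(x: float) -> int:
    """Round a float32 value to the nearest bfloat16 (ties to even); NaNs not handled."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add 0x7FFF, plus 1 if the kept part is odd, so exact ties round to even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bfloat16_bits_to_float(b: int) -> float:
    """Widen bfloat16 bits back to float32 by appending 16 zero bits."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]
```

For example, 3.14159 becomes the bfloat16 value 3.140625, showing the precision lost in the mantissa while the exponent range is unchanged.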
Our top-down methodology analysis and telemetry specification are now available for the Neoverse V1 platform, and we will begin upstreaming this information to the Linux perf tool soon. V-series cores, like V1, are designed to deliver maximum single-threaded performance within the Neoverse family of CPU IP. The Neoverse V1 Performance Analysis Methodology whitepaper and the V1 PMU Guide can help developers extract maximum performance from Neoverse V1. We encourage all developers using V1-based platforms (including AWS Graviton3 and Graviton3E) to check them out.