Arm Neoverse V1 – Top-down Methodology for Performance Analysis & Telemetry Specification

February 6, 2023

The Arm Neoverse V1 Performance Analysis Methodology whitepaper is now available to help you optimize your application code for V1-based production systems.

The whitepaper is an update to the previous Arm Neoverse N1: Performance Analysis Methodology, and it covers the new features and updates from N1 to V1 cores. This resource can be used to understand and optimize the performance of your application on the V1 platform.

To make the most of the time you spend profiling and optimizing, it is important to select the right PMU events and follow a structured methodology built on user-friendly software metrics. In the whitepaper, we present the Arm top-down analysis methodology for Neoverse V1.

In this blog, we outline the updates from N1 to V1 cores and provide an overview of the contents of this whitepaper. We also include references to other useful resources to take full advantage of the Neoverse V1 platform.

Arm Neoverse V1 supports top-down level 1 metrics

Arm Neoverse V1 is the first Arm core to support the full set of events and metrics for level 1 of the top-down methodology. These metrics add significant value to performance analysis and optimization.

These metrics provide a detailed breakdown of processor pipeline utilization at the SLOT level, enabling the evaluation of processor efficiency and identification of bottlenecks. This feature is a major enhancement to the performance analysis capabilities of the Arm Neoverse V1 platform, in addition to other micro-architecture exploration metrics that can be used for further analysis.
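To make the idea of slot-level accounting concrete, here is a minimal Python sketch of a level 1 breakdown built from the Topdown Level 1 events (CPU_CYCLES, STALL_SLOT_FRONTEND, STALL_SLOT_BACKEND, OP_RETIRED, OP_SPEC). It is a simplified illustration, not the whitepaper's formulas: the function name, the `slots_per_cycle` parameter, and the way non-stalled slots are split between retiring and bad speculation are assumptions made for this example. The authoritative metric definitions are in the whitepaper appendices.

```python
from dataclasses import dataclass


@dataclass
class TopdownCounts:
    """Raw PMU counts collected over the same measurement interval."""
    cpu_cycles: int
    stall_slot_frontend: int
    stall_slot_backend: int
    op_retired: int
    op_spec: int


def topdown_l1_sketch(c: TopdownCounts, slots_per_cycle: int) -> dict:
    """Split all issue slots into four level 1 buckets (percentages).

    slots_per_cycle is an assumed parameter: take the real value for
    your core from the whitepaper rather than from this example.
    """
    total_slots = c.cpu_cycles * slots_per_cycle
    if total_slots <= 0 or c.op_spec <= 0:
        raise ValueError("cycle and speculative-op counts must be positive")

    # Fraction of slots lost to frontend and backend stalls.
    frontend = c.stall_slot_frontend / total_slots
    backend = c.stall_slot_backend / total_slots

    # Simplification: slots that were not stalled did some work; the share
    # of speculative ops that retired counts as useful, the rest as wasted.
    used = 1.0 - frontend - backend
    retiring = used * (c.op_retired / c.op_spec)
    bad_speculation = used - retiring

    return {
        "frontend_bound": 100.0 * frontend,
        "backend_bound": 100.0 * backend,
        "retiring": 100.0 * retiring,
        "bad_speculation": 100.0 * bad_speculation,
    }
```

With this sketch the four buckets always sum to 100%, so a large `frontend_bound` or `backend_bound` value points to where deeper analysis should start.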

Arm Neoverse V1 telemetry specification
Events & metrics for performance analysis

The Arm Neoverse V1 telemetry specification, including product-specific event descriptions and the derived metrics used for analysis, can be found in Appendices B and C of the Arm Neoverse V1 Performance Analysis Methodology whitepaper.

Download the whitepaper

Arm Telemetry solution repository

The telemetry data, provided in machine readable JSON files, and stress workload suite referenced in the whitepaper are now available in the GitLab telemetry-solution repository.

Neoverse V1 PMU events and metrics cheat-sheets

Familiarity with the Arm Neoverse microarchitecture, including its complex pipelines and multi-level memory hierarchy, helps when deciding what to measure. Because the Neoverse cores provide over 100 hardware events to select from, it is important to prioritize which events to focus on. To assist with this task, we have created cheat sheets that list events and their corresponding derived metrics.

Events cheat sheet

Metric group	Events
Topdown Level 1	BR_MIS_PRED (r10), CPU_CYCLES (r11), OP_RETIRED (r3a), OP_SPEC (r3b), STALL_SLOT (r3f), STALL_SLOT_BACKEND (r3d), STALL_SLOT_FRONTEND (r3e)
Cycle accounting	CPU_CYCLES (r11), STALL_BACKEND (r24), STALL_FRONTEND (r23)
Misses per kilo instructions	BR_MIS_PRED_RETIRED (r22), DTLB_WALK (r34), INST_RETIRED (r08), ITLB_WALK (r35), L1D_CACHE_REFILL (r03), L1D_TLB_REFILL (r05), L1I_CACHE_REFILL (r01), L1I_TLB_REFILL (r02), L2D_CACHE_REFILL (r17), L2D_TLB_REFILL (r2d), LL_CACHE_MISS_RD (r37)
Branch effectiveness	BR_MIS_PRED_RETIRED (r22), BR_RETIRED (r21), INST_RETIRED (r08)
Instruction TLB effectiveness	INST_RETIRED (r08), ITLB_WALK (r35), L1I_TLB (r26), L1I_TLB_REFILL (r02), L2D_TLB (r2f), L2D_TLB_REFILL (r2d)
Miss ratio	BR_MIS_PRED_RETIRED (r22), BR_RETIRED (r21), DTLB_WALK (r34), ITLB_WALK (r35), L1D_CACHE (r04), L1D_CACHE_REFILL (r03), L1D_TLB (r25), L1D_TLB_REFILL (r05), L1I_CACHE (r14), L1I_CACHE_REFILL (r01), L1I_TLB (r26), L1I_TLB_REFILL (r02), L2D_CACHE (r16), L2D_CACHE_REFILL (r17), L2D_TLB (r2f), L2D_TLB_REFILL (r2d), LL_CACHE_MISS_RD (r37), LL_CACHE_RD (r36)
Data TLB effectiveness	DTLB_WALK (r34), INST_RETIRED (r08), L1D_TLB (r25), L1D_TLB_REFILL (r05), L2D_TLB (r2f), L2D_TLB_REFILL (r2d)
L1 instruction cache effectiveness	INST_RETIRED (r08), L1I_CACHE (r14), L1I_CACHE_REFILL (r01)
L1 data cache effectiveness	INST_RETIRED (r08), L1D_CACHE (r04), L1D_CACHE_REFILL (r03)
L2 unified cache effectiveness	INST_RETIRED (r08), L2D_CACHE (r16), L2D_CACHE_REFILL (r17)
Last level cache effectiveness	INST_RETIRED (r08), LL_CACHE_MISS_RD (r37), LL_CACHE_RD (r36)
Speculation operation mix	ASE_SPEC (r74), BR_IMMED_SPEC (r78), BR_INDIRECT_SPEC (r7a), CRYPTO_SPEC (r77), DP_SPEC (r73), INST_SPEC (r1b), LD_SPEC (r70), ST_SPEC (r71), VFP_SPEC (r75)

Table 1. Neoverse V1 core events cheat sheet
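As an illustration of how the entries in Table 1 might be used, the Python sketch below turns a few event names into a raw event list in the `rNN` form shown in the table, and derives two of the listed metric types from the resulting counts. The helper names and the sample counts are hypothetical. The formulas are the conventional definitions of misses per kilo instructions (MPKI) and miss ratio, so check them against the whitepaper before relying on them.

```python
# Hex event numbers for a few events, as listed in Table 1.
EVENTS = {
    "INST_RETIRED": 0x08,
    "L1D_CACHE": 0x04,
    "L1D_CACHE_REFILL": 0x03,
    "BR_RETIRED": 0x21,
    "BR_MIS_PRED_RETIRED": 0x22,
}


def perf_event_list(names):
    """Build a comma-separated raw event list such as 'r08,r04,r03'."""
    return ",".join(f"r{EVENTS[name]:02x}" for name in names)


def mpki(misses, instructions):
    """Misses per kilo instructions: misses per 1000 retired instructions."""
    if instructions <= 0:
        raise ValueError("instruction count must be positive")
    return 1000.0 * misses / instructions


def miss_ratio(misses, accesses):
    """Fraction of accesses that missed."""
    if accesses <= 0:
        raise ValueError("access count must be positive")
    return misses / accesses


if __name__ == "__main__":
    group = ["INST_RETIRED", "L1D_CACHE", "L1D_CACHE_REFILL"]
    # The string can be passed to a profiler that accepts raw event codes,
    # for example: perf stat -e <list> -- ./your_app
    print(perf_event_list(group))

    # Hypothetical counts, for illustration only.
    inst_retired, l1d_cache, l1d_refill = 2_000_000, 100_000, 5_000
    print("L1D MPKI:", mpki(l1d_refill, inst_retired))
    print("L1D miss ratio:", miss_ratio(l1d_refill, l1d_cache))
```

Grouping events by metric, as the cheat sheet does, keeps each measurement run small. Every event needed for one derived metric should be collected in the same run so that the ratios are computed over the same interval.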

Key references

The following two documents provide all the necessary information for conducting performance analysis on Neoverse V1 and are our recommended go-to references:

1) Arm Neoverse V1 Performance Analysis Methodology whitepaper: This whitepaper presents a performance analysis methodology and shows how to conduct workload characterization on the Arm Neoverse V1 platform. It is an update to the previous whitepaper, which covered the same ground for the Arm Neoverse N1 platform. If you are new to Arm platforms and performance analysis tools such as Linux perf, we recommend reading this whitepaper first.

2) Arm Neoverse V1 PMU Guide (direct download): This document provides a comprehensive overview of all hardware PMU events, including micro-architecture and architecture details that are necessary for using the events effectively in performance analysis.

Arm Neoverse V1 core

Arm Neoverse V1 is a core designed to deliver maximum single-threaded performance for demanding cloud, HPC, and AI/ML-assisted workloads. Neoverse V1 is the first Neoverse processor to include the Scalable Vector Extension (SVE) for maximum vector performance, HPC code reuse, and longevity. Neoverse V1 also supports Bfloat16 and Int8 MatMul instructions, which can offer up to 3x the performance of Neoverse N1 for machine learning frameworks such as TensorFlow, PyTorch, and oneDNN. The Neoverse V1 CPU is available today on AWS EC2 instances powered by the AWS Graviton3 and AWS Graviton3E processors.

Conclusion

Our top-down analysis methodology and telemetry specification are now available for the Neoverse V1 platform. We will begin upstreaming this information to the Linux perf tool soon. V-series cores, like V1, are designed to deliver maximum single-threaded performance within the Neoverse family of CPU IP. The Neoverse V1 Performance Analysis Methodology whitepaper and the V1 PMU Guide can help developers extract maximum performance from the V1 architecture. We encourage all developers using V1-based platforms (including AWS Graviton3 and Graviton3E) to check them out.
