Arm Neoverse V1 – Top-down Methodology for Performance Analysis & Telemetry Specification

Jumana Mundichipparakkal
February 6, 2023

The Arm Neoverse V1 Performance Analysis Methodology whitepaper is now available to help you optimize your application code for V1-based production systems.

The whitepaper is an update to the previous "Arm Neoverse N1: Performance Analysis Methodology," and it covers the new features and updates from N1 to V1 cores. This resource can be used to understand and optimize the performance of your application on the V1 platform.

To make the most of your time spent profiling and optimizing, it is important to select the right PMU events and follow a structured methodology with user-friendly SW metrics. In the whitepaper, we present the Arm top-down analysis methodology for Neoverse V1.

In this blog, we outline the updates from N1 to V1 cores and provide an overview of the contents of this whitepaper. We also include references to other useful resources to help you take full advantage of the Neoverse V1 platform.

Arm Neoverse V1 supports top-down level 1 metrics.

Arm Neoverse V1 is the first Arm core to support the full set of events needed for level 1 of the top-down methodology. The resulting metrics add significant value for performance analysis and optimization.

These metrics break down processor pipeline utilization at the slot level, which makes it possible to evaluate processor efficiency and identify bottlenecks. They are a major enhancement to the performance analysis capabilities of the Arm Neoverse V1 platform, and they complement the other micro-architecture exploration metrics that can be used for further analysis.
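The slot-level breakdown described above can be sketched in a few lines of Python. This is a simplified illustration, not the whitepaper's exact formulas: it assumes the counters `CPU_CYCLES`, `STALL_SLOT`, `STALL_SLOT_FRONTEND`, `STALL_SLOT_BACKEND`, `OP_RETIRED`, and `OP_SPEC` have already been collected, it takes the number of slots per cycle as a parameter (the default of 8 is an assumption to check against the whitepaper), and it leaves out any correction terms the published metrics may apply. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SlotCounts:
    """Raw PMU counts from one measurement interval."""
    cpu_cycles: int
    stall_slot: int            # slots with no operation sent, any cause
    stall_slot_frontend: int   # stalled slots attributed to the frontend
    stall_slot_backend: int    # stalled slots attributed to the backend
    op_retired: int            # operations architecturally retired
    op_spec: int               # operations speculatively executed


def topdown_level1(c: SlotCounts, slots_per_cycle: int = 8) -> dict:
    """Split all available slots into the four level 1 categories.

    slots_per_cycle is an assumed value; confirm it in the whitepaper.
    """
    total_slots = c.cpu_cycles * slots_per_cycle
    frontend_bound = c.stall_slot_frontend / total_slots
    backend_bound = c.stall_slot_backend / total_slots
    # Slots that were not stalled did some work, useful or wasted.
    used = 1.0 - c.stall_slot / total_slots
    retired_share = c.op_retired / c.op_spec
    return {
        "frontend_bound": frontend_bound,
        "backend_bound": backend_bound,
        "retiring": used * retired_share,
        "bad_speculation": used * (1.0 - retired_share),
    }


if __name__ == "__main__":
    sample = SlotCounts(1000, 4000, 2000, 2000, 3000, 4000)
    for name, value in topdown_level1(sample).items():
        print(f"{name}: {value:.1%}")
```

When `STALL_SLOT` equals the sum of the frontend and backend stall counts, the four fractions add up to 1, which is a quick sanity check on collected data.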

Arm Neoverse V1 telemetry specification: Events & metrics for performance analysis.

The Arm Neoverse V1 telemetry specification, including product-specific event descriptions for software and derived metrics for analysis, can be found in Appendices B and C of the Arm Neoverse V1 Performance Analysis Methodology whitepaper.

Download the whitepaper

Arm Telemetry Solution Repository

The telemetry data, provided in machine readable JSON files, and stress workload suite referenced in the whitepaper are now available in the GitLab telemetry-solution repository.
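As an illustration of why machine-readable telemetry data is useful, the sketch below works out which PMU events must be counted to compute a chosen set of metrics. The JSON layout used here (a top-level `metrics` object whose entries list their `events`) is a hypothetical example invented for this sketch and is not taken from the repository; check the actual files for their real schema. The function name `events_for` is also hypothetical.

```python
import json

# Hypothetical sample data; the real repository files may be laid out differently.
SAMPLE = """
{
  "metrics": {
    "ipc": {"events": ["INST_RETIRED", "CPU_CYCLES"]},
    "branch_mpki": {"events": ["BR_MIS_PRED_RETIRED", "INST_RETIRED"]}
  }
}
"""


def events_for(metrics_doc: dict, metric_names: list) -> list:
    """Return the de-duplicated events needed for the given metrics,
    in first-seen order."""
    needed = []
    for name in metric_names:
        for event in metrics_doc["metrics"][name]["events"]:
            if event not in needed:
                needed.append(event)
    return needed


if __name__ == "__main__":
    doc = json.loads(SAMPLE)
    print(events_for(doc, ["ipc", "branch_mpki"]))
```

A list built this way could then be handed to a profiling tool, so that only the counters a given analysis needs are programmed.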

Neoverse V1 PMU events and metrics cheat-sheets

Familiarizing yourself with the Arm Neoverse microarchitecture, including its complex pipelines and multi-level memory hierarchy, is helpful when doing performance analysis. Because the Neoverse cores provide over 100 hardware counters to select from, it is important to prioritize which events to focus on. To assist with this task, we have created cheat sheets that list events and their corresponding derived metrics.

V1 Events Cheat Sheet

Table 1. Neoverse V1 core events cheat sheet
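To show how events map to derived metrics, here is a minimal sketch that turns a handful of raw counts into three commonly used ratios. It assumes the Arm architectural PMU events `INST_RETIRED`, `CPU_CYCLES`, `BR_MIS_PRED_RETIRED`, `L1D_CACHE_REFILL`, and `L1D_CACHE` were collected over the same interval. The metric names and the function `derived_metrics` are illustrative choices; the whitepaper's appendices remain the authoritative source for the exact metric definitions.

```python
def derived_metrics(counts: dict) -> dict:
    """Compute a few illustrative derived metrics from raw event counts."""
    inst = counts["INST_RETIRED"]
    return {
        # Instructions retired per cycle.
        "ipc": inst / counts["CPU_CYCLES"],
        # Mispredicted branches per 1000 retired instructions.
        "branch_mpki": 1000 * counts["BR_MIS_PRED_RETIRED"] / inst,
        # Fraction of L1 data cache accesses that caused a refill.
        "l1d_cache_miss_ratio": counts["L1D_CACHE_REFILL"] / counts["L1D_CACHE"],
    }


if __name__ == "__main__":
    sample = {
        "INST_RETIRED": 2000,
        "CPU_CYCLES": 1000,
        "BR_MIS_PRED_RETIRED": 10,
        "L1D_CACHE_REFILL": 50,
        "L1D_CACHE": 500,
    }
    for name, value in derived_metrics(sample).items():
        print(f"{name}: {value:.3f}")
```

Ratios like these are easier to compare across runs and workloads than raw counts, which is the reason the cheat sheets pair each event with its derived metrics.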

Key References

The following two documents provide all the necessary information for conducting performance analysis on Neoverse V1 and are our recommended go-to references:

1) Arm Neoverse V1 Performance Analysis Methodology whitepaper: This whitepaper presents a performance analysis methodology and shows how to conduct workload characterization on the Arm Neoverse V1 platform. It updates the previous whitepaper, which covered the same ground for the Arm Neoverse N1 platform. If you are new to Arm platforms and performance analysis tools such as Linux perf, we recommend reading this whitepaper first.

Download the whitepaper

2) Arm Neoverse V1 PMU Guide (direct download): This document provides a comprehensive overview of all hardware PMU events, including micro-architecture and architecture details that are necessary for using the events effectively in performance analysis.

Arm Neoverse V1 Core

Arm Neoverse V1 is a core designed to deliver maximum single-threaded performance for demanding cloud, HPC, and AI/ML-assisted workloads. Neoverse V1 is the first Neoverse processor to include the Scalable Vector Extension (SVE) for maximum vector performance, HPC code reuse, and longevity. Neoverse V1 also supports Bfloat16 and Int8 MatMul instructions. Compared to Neoverse N1, these instructions can offer up to 3x the performance for machine learning frameworks and libraries such as TensorFlow, PyTorch, and oneDNN. The Neoverse V1 CPU is available today on AWS EC2 instances powered by the AWS Graviton3 and AWS Graviton3E processors.

Conclusion

Our top-down analysis methodology and telemetry specification are now available for the Neoverse V1 platform. We will begin upstreaming this information to the Linux perf tool soon. V-series cores, like V1, are designed to deliver maximum single-threaded performance within the Neoverse family of CPU IP. The Neoverse V1 Performance Analysis Methodology whitepaper and the V1 PMU Guide can help developers extract maximum performance from the V1 architecture. We encourage all developers using V1-based platforms (including AWS Graviton3 and Graviton3E) to check them out.

Download the whitepaper
