Arm Neoverse N1 – Performance Analysis Methodology to Tune Production Systems and Application Code

Jumana Mundichipparakkal
October 7, 2021
2 minute read time.

Software developers rely on performance profilers to collect detailed performance data during program execution. Performance profilers measure important parameters, like instructions, cycles, cache hits, and branch misses, so developers can characterize CPU workloads and analyze code execution. The results of profiling make it easier to fine-tune the efficiency of prototypes, production systems, and application code, and even help plan for the future, by identifying design requirements for next-generation systems.

The Arm Neoverse N1 CPU is truly revolutionary, delivering industry-leading socket performance at half the power, with server-class thread performance. Neoverse N1 powers leading cloud provider infrastructure such as the AWS Graviton2 processors and Oracle OCI Ampere A1. Cloud providers choose Neoverse N1 due to its clear benefit of delivering 40% better price performance over comparable current generation x86-based instances for a wide variety of workloads.

To deliver on these performance gains and power savings, we need to understand the methodology behind performance analysis. For software workload analysis, raw hardware events and the metrics derived from them can be correlated to produce actionable insights. To do that, we use the Performance Monitoring Unit (PMU), a hardware feature that gathers execution data while the application is running. The PMU doesn’t increase overhead or impact performance, because profiling is done in hardware, outside the application’s process. Nothing is inserted into the code, and the order of execution remains unchanged.

The Neoverse N1 PMU is designed for use with the Linux perf tool, a performance tool that collects metrics from the hardware counters. Linux perf also annotates code with event samples, which makes it easy to correlate micro-architectural behavior with software execution. A profiling setup that uses Linux perf for counting and sampling during workload execution takes developers from high-level, big-picture analysis to detailed, event-specific examination that identifies the root causes of performance issues.
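As a rough illustration of the counting side of such a setup, the sketch below parses the machine-readable output that `perf stat` produces with its `-x,` field-separator option into a dictionary of event counts. The function name `parse_perf_stat_csv` and the `SAMPLE` text are hypothetical, and the sample numbers are made up for illustration, not measurements. The sketch assumes that the first field of each line is the count and the third is the event name; check `man perf-stat` for the exact layout in your perf version.

```python
def parse_perf_stat_csv(text, sep=","):
    """Return {event_name: count} from `perf stat -x,` style output.

    Assumed layout per line: count, unit, event name, then further fields.
    """
    counts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split(sep)
        if len(fields) < 3:
            continue  # not a counter line
        value, event = fields[0], fields[2]
        try:
            counts[event] = int(value)
        except ValueError:
            try:
                counts[event] = float(value)  # e.g. time-based counters
            except ValueError:
                # perf prints "<not counted>" or "<not supported>"
                # when an event could not be measured
                counts[event] = None
    return counts


# Made-up sample text for illustration only (not real measurements).
SAMPLE = """\
2000000,,cycles,1000000,100.00,,
3000000,,instructions,1000000,100.00,,
<not supported>,,br_mis_pred_retired,0,100.00,,
"""

if __name__ == "__main__":
    for event, count in parse_perf_stat_csv(SAMPLE).items():
        print(event, count)
```

Events that perf could not count are kept with a value of `None` rather than dropped, so a missing counter is visible to whatever consumes the dictionary.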

Effective Profiling

Choosing the right PMU events and following a methodology makes the time spent on profiling and optimization more effective. It helps to have a high-level understanding of the Arm Neoverse N1 micro-architecture, since it includes complex pipelines and uses a multi-level memory hierarchy. It also helps to know which events to focus on, since the Neoverse cores support more than 100 hardware events.

Insider Tips

To help you save time and effort, so you can quickly refine your analysis and go deeper into the details of software optimization, we’ve put together two key documents that tell you what you need to know.

1) Arm Neoverse N1 PMU Guide: This document describes all the hardware PMU events in detail, with the micro-architecture and architecture details needed to use the events in performance analysis.

2) Arm Neoverse N1 Performance Analysis Methodology White paper: This white paper presents the performance analysis methodology and demonstrates how to conduct workload characterization on Arm Neoverse N1.

Download the white paper

In the white paper, we introduce the three subsystems of the CPU, suggest the raw events and derived metrics to use for initial workload characterization, identify the four key perf functions used for counting and event-based sampling, and include a case study that demonstrates the methodology.
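To make the idea of derived metrics concrete, here is a minimal sketch that turns raw event counts into three commonly used ratios: instructions per cycle (IPC), branch mispredictions per thousand instructions (MPKI), and L1 data cache miss rate. These are generic metrics chosen for illustration and are not necessarily the set the white paper recommends. The function name `derived_metrics` is hypothetical, the event mnemonics are assumed to match the Arm PMU event names (verify them against the PMU guide), and the sample counts are made up.

```python
def derived_metrics(counts):
    """Compute a few common derived metrics from raw PMU event counts.

    `counts` maps event mnemonics (assumed names) to integer counts.
    """
    instructions = counts["INST_RETIRED"]
    cycles = counts["CPU_CYCLES"]
    l1d_accesses = counts["L1D_CACHE"]
    if instructions <= 0 or cycles <= 0 or l1d_accesses <= 0:
        raise ValueError("instruction, cycle and L1D access counts must be positive")
    return {
        # Instructions retired per CPU cycle
        "ipc": instructions / cycles,
        # Branch mispredictions per 1000 retired instructions
        "branch_mpki": 1000 * counts["BR_MIS_PRED_RETIRED"] / instructions,
        # Fraction of L1 data cache accesses that caused a refill
        "l1d_miss_rate": counts["L1D_CACHE_REFILL"] / l1d_accesses,
    }


# Made-up counts for illustration only (not real measurements).
EXAMPLE_COUNTS = {
    "INST_RETIRED": 3_000_000,
    "CPU_CYCLES": 2_000_000,
    "BR_MIS_PRED_RETIRED": 6_000,
    "L1D_CACHE": 1_000_000,
    "L1D_CACHE_REFILL": 50_000,
}

if __name__ == "__main__":
    for name, value in derived_metrics(EXAMPLE_COUNTS).items():
        print(f"{name}: {value:.3f}")
```

Ratios like these are easier to compare across runs of different lengths than raw counts, which is why characterization usually starts from them.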
