Arm Neoverse N1 – Performance Analysis Methodology to Tune Production Systems and Application Code

Jumana Mundichipparakkal
October 7, 2021
2 minute read time.

Software developers rely on performance profilers to collect detailed performance data during program execution. Performance profilers measure important parameters, like instructions, cycles, cache hits, and branch misses, so developers can characterize CPU workloads and analyze code execution. The results of profiling make it easier to fine-tune the efficiency of prototypes, production systems, and application code, and even help plan for the future, by identifying design requirements for next-generation systems.

The Arm Neoverse N1 CPU is truly revolutionary, delivering industry-leading socket performance at half the power, with server-class thread performance. Neoverse N1 powers leading cloud provider infrastructure such as the AWS Graviton2 processors and Oracle OCI Ampere A1. Cloud providers choose Neoverse N1 because it delivers 40% better price-performance than comparable current-generation x86-based instances for a wide variety of workloads.

To deliver on these performance gains and power savings, we need to understand the methodology behind the performance analysis. For software workload analysis, both raw hardware events and metrics derived from them can be correlated to produce actionable insights. To collect them, we use the Performance Monitoring Unit (PMU), a hardware feature that gathers execution data while the application is running. Because events are counted in hardware, outside the application’s process, the PMU adds little overhead: nothing is inserted in the code, and the order of execution remains unchanged.

The Neoverse N1 PMU is designed for use with Linux perf, a performance analysis tool that collects metrics from the hardware counters. Linux perf can also annotate code with event samples, which makes it easy to correlate micro-architectural behavior with software execution. A profiling setup that combines counting and sampling with Linux perf during workload execution takes developers from high-level, big-picture analysis to detailed, event-specific examination that identifies the root causes of performance issues.
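As a rough illustration of the counting half of that setup, the sketch below reads counter values from `perf stat` CSV output (the `-x,` option). The sample text, the values in it, and the helper name `parse_perf_stat_csv` are assumptions made for illustration, and the exact columns can differ between perf versions, so treat this as a starting point rather than a reference parser.

```python
import csv
import io

# Hypothetical output of something like
#   perf stat -x, -e cycles,instructions,branch-misses -- <workload>
# Assumed column order: value, unit, event name, run time, % of time counted, ...
# All values below are made up for illustration.
SAMPLE = """\
1200000000,,cycles,500000000,100.00,,
1800000000,,instructions,500000000,100.00,1.50,insn per cycle
<not counted>,,branch-misses,0,0.00,,
"""


def parse_perf_stat_csv(text):
    """Return {event_name: count} for rows that carry an integer count."""
    counts = {}
    for row in csv.reader(io.StringIO(text)):
        # Skip blank lines, comment lines, and rows too short to hold an event.
        if len(row) < 3 or row[0].startswith("#"):
            continue
        value, event = row[0], row[2]
        try:
            counts[event] = int(value)
        except ValueError:
            # Non-integer fields such as "<not counted>" are ignored here.
            continue
    return counts


counts = parse_perf_stat_csv(SAMPLE)
print(counts)
```

Events that could not be counted are dropped rather than reported as zero, so a missing key signals that the measurement needs to be repeated, not that the event never occurred.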

Effective Profiling

Choosing the right PMU events and following a methodology can make time spent on profiling and optimization more effective. It helps to have a high-level understanding of the Arm Neoverse N1 micro-architecture, since it includes complex pipelines and uses a multi-level memory hierarchy. It also helps to know which events to focus on, since the Neoverse cores support more than 100 PMU events.

Insider Tips

To help save time and effort, so you can quickly refine your analysis and go deeper into the details of software optimization, we’ve put together two key documents that tell you what you need to know.

1) Arm Neoverse N1 PMU Guide: This document gives a detailed description of all the hardware PMU events, with the micro-architecture and architecture details needed to use the events in performance analysis.

2) Arm Neoverse N1 Performance Analysis Methodology White paper: This white paper presents the performance analysis methodology and demonstrates how to conduct workload characterization on Arm Neoverse N1.

Download - Neoverse N1 Performance Analysis Methodology Whitepaper

In the white paper, we introduce the three subsystems of the CPU, suggest the raw events and derived metrics to use for initial workload characterization, identify the four key perf functions used for counting and event-based sampling, and include a case study that demonstrates the methodology.
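To make the idea of derived metrics concrete, here is a minimal sketch that turns raw event counts into a few common ratios. The event names follow Arm PMU mnemonics, but the choice of metrics, the function name `derived_metrics`, and the sample counts are assumptions for illustration and are not necessarily the set the white paper recommends.

```python
def derived_metrics(c):
    """Compute a few ratios from a dict of raw PMU event counts."""
    inst = c["INST_RETIRED"]
    cycles = c["CPU_CYCLES"]
    return {
        # Instructions retired per cycle.
        "ipc": inst / cycles,
        # Misses per kilo-instruction (MPKI) for branches and the L1 data cache.
        "branch_mpki": 1000 * c["BR_MIS_PRED_RETIRED"] / inst,
        "l1d_mpki": 1000 * c["L1D_CACHE_REFILL"] / inst,
        # Fraction of L1 data cache accesses that caused a refill.
        "l1d_miss_ratio": c["L1D_CACHE_REFILL"] / c["L1D_CACHE"],
        # Fraction of cycles in which the frontend or backend was stalled.
        "frontend_stall_ratio": c["STALL_FRONTEND"] / cycles,
        "backend_stall_ratio": c["STALL_BACKEND"] / cycles,
    }


# Hypothetical counts, not measurements from a real system.
sample_counts = {
    "INST_RETIRED": 2_000_000_000,
    "CPU_CYCLES": 1_000_000_000,
    "BR_MIS_PRED_RETIRED": 4_000_000,
    "L1D_CACHE_REFILL": 10_000_000,
    "L1D_CACHE": 500_000_000,
    "STALL_FRONTEND": 100_000_000,
    "STALL_BACKEND": 300_000_000,
}

metrics = derived_metrics(sample_counts)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Normalizing by instructions or cycles makes runs of different lengths comparable, which is why ratios like these are usually a more useful starting point than the raw counts alone.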
