Using Perf to enable PMU functionality on Armv8-A CPUs: Enable Arm PMU support for the kernel and install Linux Perf tool

Jiaming Guo
August 8, 2023
Part 1 of 3 Blog Series


Perf is a performance monitoring tool for Linux systems. You can use Perf to:

  • Measure performance metrics of the system, such as CPU usage, memory usage, disk I/O, and network activity
  • Identify performance bottlenecks, diagnose system performance issues, and optimize application performance

Many features of Perf require hardware support. For Armv8-A CPUs, the hardware support is provided by the Performance Monitor Unit (PMU), an optional extension to Armv8-A CPUs that provides hardware performance monitoring capabilities.

This blog focuses on how to enable and use PMU-related functionalities. It describes a basic performance analysis workflow on a real Armv8-A platform.

Example platform

This blog uses the Juno r2 development platform to show you how to enable PMU-related functionalities in Perf. The following table shows the version information of the hardware and software on the Juno r2 platform.

Item      Version
CPU       Cortex-A53 (Armv8.0, PMUv3) and Cortex-A72 (Armv8.0, PMUv3)
OS        Debian GNU/Linux 11 (bullseye)
Kernel    Linux 6.1.0
Compiler  GCC 10.2.1

Prerequisites

This section describes the system requirements for performance analysis using Perf:

  • Arm PMU support is enabled in the kernel
  • The Linux Perf tool is installed with the same version as the kernel

Enable Arm PMU support for the kernel

Check whether Arm PMU support is enabled in the kernel. To enable Arm PMU support, the kernel configurations ARM_PMU and HW_PERF_EVENTS must be enabled. By default, these two configurations are enabled when the architecture is specified as ARM64.

If ARM_PMU and HW_PERF_EVENTS are not enabled, set the kernel configurations as follows and rebuild the kernel.

CONFIG_HAVE_PERF_EVENTS=y
CONFIG_PROFILING=y
CONFIG_PERF_EVENTS=y
CONFIG_ARM_PMU=y
CONFIG_HW_PERF_EVENTS=y
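
As a minimal sketch of this check, the following shell function reports which of the options listed above are set to y in a kernel configuration file. The function name check_pmu_config is hypothetical, and the location of the configuration file is an assumption: on many distributions it is /boot/config-$(uname -r), but your system may keep it elsewhere.

```shell
# Hypothetical helper: report whether each PMU-related option is set to "y"
# in the kernel configuration file passed as $1.
# Returns 0 if all options are enabled, 1 otherwise.
check_pmu_config() {
    config_file="$1"
    missing=0
    for opt in CONFIG_HAVE_PERF_EVENTS CONFIG_PROFILING CONFIG_PERF_EVENTS \
               CONFIG_ARM_PMU CONFIG_HW_PERF_EVENTS; do
        if grep -q "^${opt}=y\$" "$config_file"; then
            echo "$opt: enabled"
        else
            echo "$opt: NOT enabled"
            missing=1
        fi
    done
    return "$missing"
}

# Example call (the path is an assumption, adjust it for your system):
#   check_pmu_config "/boot/config-$(uname -r)"
```

If any option is reported as NOT enabled, the kernel needs to be reconfigured and rebuilt as described above.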

After the kernel boots, check whether the Arm PMU driver is loaded. An example of a successful load on the Juno r2 platform is as follows:

$ dmesg | grep "PMU driver"
[ 0.488888] hw perfevents: enabled with armv8_pmuv3_0 PMU driver, 7 counters available
[ 0.489832] hw perfevents: enabled with armv8_pmuv3_1 PMU driver, 7 counters available

The Juno r2 platform is based on a big.LITTLE configuration with four Cortex-A53 cores and two Cortex-A72 cores. Each core has its own PMU. The log above shows two Arm PMU driver instances, armv8_pmuv3_0 and armv8_pmuv3_1, one for each processor type. For each core, seven counters are available for the driver to use: one cycle counter and six event counters.

Note: In this example, the Juno r2 platform boots using ACPI tables. If you boot using device tree information instead, you may get the following log:

$ dmesg | grep "PMU driver"
[ 0.461637] hw perfevents: enabled with armv8_cortex_a53 PMU driver, 7 counters available
[ 0.461894] hw perfevents: enabled with armv8_cortex_a72 PMU driver, 7 counters available
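
The driver names and counter counts can be pulled out of these messages with a small filter. The following is a sketch only: the function name list_pmu_drivers is hypothetical, and the pattern assumes the exact message format shown in the two logs above.

```shell
# Hypothetical helper: read kernel log lines on standard input and print
# "<pmu_name> <counter_count>" for each PMU driver message found.
list_pmu_drivers() {
    sed -n 's/.*enabled with \([^ ]*\) PMU driver, \([0-9]*\) counters available.*/\1 \2/p'
}

# Example call:
#   dmesg | list_pmu_drivers
```

Lines that do not match the message format are ignored, so the whole dmesg output can be piped in without a separate grep.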

Install Linux Perf tool

The version of the Linux Perf tool must be the same as the kernel version. To check the versions of Perf and the Linux kernel, use the following commands:

$ uname -r #check kernel version
$ perf --version #check perf version
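
The comparison can also be scripted. The sketch below rests on two assumptions: the version strings begin with a major.minor number after any leading text (for example, 6.1.0-13-arm64 or perf version 6.1.0), and matching on major.minor is precise enough for your purposes. The function names major_minor and versions_match are hypothetical.

```shell
# Hypothetical helper: extract the first "major.minor" number from a string.
major_minor() {
    printf '%s\n' "$1" | sed -n 's/[^0-9]*\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Hypothetical helper: succeed if both strings carry the same major.minor.
versions_match() {
    a=$(major_minor "$1")
    b=$(major_minor "$2")
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Example call:
#   versions_match "$(uname -r)" "$(perf --version)" && echo "versions match"
```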

If the installed Perf version matches the kernel version, go to the section Performance analysis workflow.

If Perf is not installed, or its version does not match the kernel version, use one of the following methods to install it.

Method 1: Install from packages

# For Ubuntu (on Debian, Perf is typically provided by the linux-perf package)
$ sudo apt-get update
$ sudo apt-get install linux-tools-common linux-tools-generic linux-tools-$(uname -r)

Method 2: Install from sources

For some kernel versions, you must build and install Perf from the kernel sources. Depending on your host, you can choose one of the following compiling methods:

  • Native compile on Arm64 platform
  • Cross-compile for Arm64 on x86 platform

Native compile on Arm64 platform

An example of compiling on the Juno r2 platform is as follows:

$ cd /usr/src/linux-source-6.1/tools/perf 
$ sudo apt-get install gcc flex bison
$ make
$ make install

The commands above assume that the kernel source matching the current kernel version of the Juno r2 platform, that is, linux-source-6.1, is already available. They install the packages needed for building C code, and then build and install the Perf tool.

Many Perf features depend on additional packages. If a package is missing from the host environment, the build detects this and reports the corresponding feature as [ OFF ] during the make phase. You can install the corresponding package and run make again. An example log on the Juno r2 platform is as follows:

...

Makefile.config:1019: No libcap found, disables capability support, please install libcap-devel/libcap-dev
Makefile.config:1032: No numa.h found, disables 'perf bench numa mem' benchmark, please install numactl-devel/libnuma-devel/libnuma-dev
Makefile.config:1091: No libbabeltrace found, disables 'perf data' CTF format support, please install libbabeltrace-dev[el]/libbabeltrace-ctf-dev

Auto-detecting system features:
...                                   dwarf: [ OFF ]
...                      dwarf_getlocations: [ OFF ]
...                                   glibc: [ on  ]
...                                  libbfd: [ OFF ]
...                          libbfd-buildid: [ OFF ]
...                                  libcap: [ OFF ]
...                                  libelf: [ OFF ]
...                                 libnuma: [ OFF ]
...                  numa_num_possible_cpus: [ OFF ]
...                                 libperl: [ OFF ]
...                               libpython: [ OFF ]
...                               libcrypto: [ on  ]
...                               libunwind: [ OFF ]
...                      libdw-dwarf-unwind: [ OFF ]
...                                    zlib: [ OFF ]
...                                    lzma: [ OFF ]
...                               get_cpuid: [ OFF ]
...                                     bpf: [ on  ]
...                                  libaio: [ on  ]
...                                 libzstd: [ OFF ]
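
To see at a glance which features were disabled, the feature list can be filtered. This sketch assumes that the make output has been saved to a file or is piped in, and that it uses the layout shown in the log above. The function name list_disabled_features and the file name build.log are hypothetical.

```shell
# Hypothetical helper: read the make output on standard input and print the
# name of every auto-detected feature that is reported as [ OFF ].
list_disabled_features() {
    sed -n 's/^\.\.\. *\([A-Za-z0-9_-]*\): \[ OFF ].*/\1/p'
}

# Example call (build.log is an assumed file name):
#   make 2>&1 | tee build.log
#   list_disabled_features < build.log
```

Each printed name corresponds to a feature whose package can be installed before running make again.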

Cross-compile for Arm64 on x86 platform

The features that Perf supports depend on external libraries, which are dynamically linked by default. Therefore, when cross-compiling Perf, specify static linking as follows:

$ sudo apt-get install gcc-aarch64-linux-gnu flex bison
$ cd <kernel_source_path>/tools/perf 
$ make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- LDFLAGS=-static

After installation, you can use the command perf test to test the features that Perf supports.

Performance analysis workflow

To develop the performance analysis workflow, first clarify the goal. In this example, the goal is to optimize the application. The following code is a simple example of the application:

#define COL_LINE 64
#define ROW_LINE 512000
 
long array[ROW_LINE][COL_LINE]; 
 
void compute_squares()
{    
    int i, j;
    
    for (i=0; i<COL_LINE; i++) {
        for (j=0; j<ROW_LINE; j++) {
            array[j][i] = array[j][i] * array[j][i];
        }
    }
}
 
void array_assign()
{
    int i, j;

    for (i=0; i<COL_LINE; i++) {
        for (j=0; j<ROW_LINE; j++) {
            array[j][i] = i+j;
        }
    }
}
 
int main()
{
    array_assign();
    compute_squares();
    return 0;
}

This application includes two functions:

  • array_assign assigns initial values to a two-dimensional array
  • compute_squares calculates the square of each element in the array

Application optimization is performed on a system with a determined CPU microarchitecture and operating system. In this example, the related information is described in the section Example platform.

When you use the Linux Perf tool for performance analysis, the basic workflow consists of four stages, as the following figure shows.

Figure 1: Basic performance analysis workflow

Part 2, released on 15 August, and Part 3, released on 22 August, describe these four stages.
