Perf is a performance monitoring tool for Linux systems. You can use Perf to:
Many features of Perf require hardware support. For Armv8-A CPUs, this support is provided by the Performance Monitor Unit (PMU), an optional extension to Armv8-A CPUs that provides hardware performance monitoring capabilities.
This blog focuses on how to enable and use PMU-related functionalities. It describes a basic performance analysis workflow on a real Armv8-A platform.
This blog uses the Juno r2 development platform to show you how to enable PMU-related functionalities in Perf. The following table shows the version information of the hardware and software on the Juno r2 platform.
This section describes the system requirements for performing performance analysis using Perf:
Check whether Arm PMU support is enabled in the kernel. To enable Arm PMU support, the kernel configurations ARM_PMU and HW_PERF_EVENTS must be enabled. By default, these two configurations are enabled when the architecture is specified as ARM64.
If ARM_PMU and HW_PERF_EVENTS are not enabled, set the kernel configurations as follows and rebuild the kernel.
CONFIG_HAVE_PERF_EVENTS=y
CONFIG_PROFILING=y
CONFIG_PERF_EVENTS=y
CONFIG_ARM_PMU=y
CONFIG_HW_PERF_EVENTS=y
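As a quick way to check these options before rebuilding, the following sketch reports which of them are not set to y in a kernel configuration file. The function name missing_perf_options is hypothetical, and the location of the configuration file is an assumption: many distributions ship it as /boot/config-<version>, but your system may differ.

```shell
#!/bin/sh
# Print every required option that is NOT set to "y" in the given
# kernel configuration file. Prints nothing if all are enabled.
# (missing_perf_options is a hypothetical helper name.)
missing_perf_options() {
    config_file=$1
    for opt in CONFIG_HAVE_PERF_EVENTS CONFIG_PROFILING CONFIG_PERF_EVENTS \
               CONFIG_ARM_PMU CONFIG_HW_PERF_EVENTS; do
        # -x matches the whole line and -F treats the pattern as a fixed
        # string, so "# CONFIG_ARM_PMU is not set" does not count as enabled.
        grep -qxF "${opt}=y" "$config_file" || echo "$opt"
    done
}

# Example (the config path is an assumption and varies by distribution):
#   missing_perf_options "/boot/config-$(uname -r)"
```

An empty result suggests that all five options are enabled. Any option that is printed needs to be set before you rebuild the kernel.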
After the kernel boots, check whether the Arm PMU driver is loaded. An example of a successful load on the Juno r2 platform is as follows:
$ dmesg | grep "PMU driver"
[ 0.488888] hw perfevents: enabled with armv8_pmuv3_0 PMU driver, 7 counters available
[ 0.489832] hw perfevents: enabled with armv8_pmuv3_1 PMU driver, 7 counters available
The Juno r2 platform is based on a big.LITTLE configuration with four Cortex-A53 processors and two Cortex-A72 processors. Each processor has its own PMU. The log above shows that the Arm PMU drivers are named armv8_pmuv3_0 and armv8_pmuv3_1. For each processor, seven counters are available to the driver: one cycle counter and six event counters.
Note: In this example, the Juno r2 platform boots using ACPI tables. If you boot using device tree information, you may get the following log instead.
$ dmesg | grep "PMU driver"
[ 0.461637] hw perfevents: enabled with armv8_cortex_a53 PMU driver, 7 counters available
[ 0.461894] hw perfevents: enabled with armv8_cortex_a72 PMU driver, 7 counters available
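If you want to use the PMU names in a script, the following sketch extracts the driver name and counter count from log lines in the format shown above. The function name list_pmu_drivers is hypothetical, and the pattern assumes the exact wording of the two example logs. Other kernel versions may word the message differently.

```shell
#!/bin/sh
# Read kernel log lines on stdin and print "<pmu-name> <counters>"
# for each line that reports a loaded PMU driver.
# (list_pmu_drivers is a hypothetical helper name.)
list_pmu_drivers() {
    sed -n 's/.*enabled with \([^ ]*\) PMU driver, \([0-9][0-9]*\) counters available.*/\1 \2/p'
}

# Example:
#   dmesg | list_pmu_drivers
```

With either of the logs above as input, this prints one line per PMU, for example the name armv8_pmuv3_0 followed by the counter count 7. Lines that do not match the pattern are ignored.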
The version of the Linux Perf tool must be the same as the kernel version. To check the versions of Perf and the Linux kernel, use the following commands:
$ uname -r          # check kernel version
$ perf --version    # check perf version
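The comparison can also be scripted. The following sketch compares only the leading major.minor pair of the two version strings. That is a simplifying assumption, because distributions append their own suffixes to both. The function names major_minor and same_major_minor are hypothetical.

```shell
#!/bin/sh
# Extract the first "major.minor" pair from a version string,
# for example "6.1" from a string that starts with or contains "6.1.".
# (major_minor and same_major_minor are hypothetical helper names.)
major_minor() {
    printf '%s\n' "$1" | sed -n 's/^[^0-9]*\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Succeed only if both strings contain the same major.minor version.
same_major_minor() {
    a=$(major_minor "$1")
    b=$(major_minor "$2")
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Example:
#   same_major_minor "$(uname -r)" "$(perf --version)" && echo "versions match"
```

If your setup requires an exact match including the patch level, compare the full strings instead.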
If the installed version of Perf matches the kernel version, go to the section Performance analysis workflow.
If Perf is not installed, or its version does not match the kernel version, use one of the following methods to install it.
# For Ubuntu, Debian
$ sudo apt update
$ sudo apt-get install linux-tools-common linux-tools-generic linux-tools-`uname -r`
For some kernel versions, you must build and install Perf from the kernel sources. Depending on your host, you can choose one of the following compiling methods:
An example of compiling on the Juno r2 platform is as follows:
$ cd /usr/src/linux-source-6.1/tools/perf
$ sudo apt-get install gcc flex bison
$ make
$ make install
In the commands above, we first enter the kernel source tree that matches the current kernel version of the Juno r2 platform, that is, linux-source-6.1. Next, we install the packages needed to build C code. Then, we build and install the Perf tool.
Many Perf features depend on additional packages. If a package is missing from the host environment, the build detects this and reports the corresponding feature as [OFF] during the make phase. An example log on the Juno r2 platform is as follows. You can install the missing package and run make again.
...
Makefile.config:1019: No libcap found, disables capability support, please install libcap-devel/libcap-dev
Makefile.config:1032: No numa.h found, disables 'perf bench numa mem' benchmark, please install numactl-devel/libnuma-devel/libnuma-dev
Makefile.config:1091: No libbabeltrace found, disables 'perf data' CTF format support, please install libbabeltrace-dev[el]/libbabeltrace-ctf-dev

Auto-detecting system features:
... dwarf: [ OFF ]
... dwarf_getlocations: [ OFF ]
... glibc: [ on ]
... libbfd: [ OFF ]
... libbfd-buildid: [ OFF ]
... libcap: [ OFF ]
... libelf: [ OFF ]
... libnuma: [ OFF ]
... numa_num_possible_cpus: [ OFF ]
... libperl: [ OFF ]
... libpython: [ OFF ]
... libcrypto: [ on ]
... libunwind: [ OFF ]
... libdw-dwarf-unwind: [ OFF ]
... zlib: [ OFF ]
... lzma: [ OFF ]
... get_cpuid: [ OFF ]
... bpf: [ on ]
... libaio: [ on ]
... libzstd: [ OFF ]
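To see at a glance which features still need a package, you can filter the build output. The following sketch prints the name of every feature reported as OFF. The function name disabled_features is hypothetical, and the pattern assumes uncolored output in the format of the log above, so it may need adjusting if your terminal adds color codes.

```shell
#!/bin/sh
# Read perf build output on stdin and print the name of every feature
# that the auto-detection step reports as OFF, one per line.
# (disabled_features is a hypothetical helper name.)
disabled_features() {
    sed -n 's/^\.\.\. *\([^ :]*\): *\[ *OFF *].*/\1/p'
}

# Example (build.log is an arbitrary file name):
#   make 2>&1 | tee build.log | disabled_features
```

Each printed name corresponds to a feature line in the log. The Makefile.config messages in the same log suggest which package to install for some of them.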
The features that Perf supports depend on packages. These packages are dynamically linked by default. Therefore, when cross-compiling Perf, you need to specify the linking option as static as follows:
$ sudo apt-get install gcc-aarch64-linux-gnu flex bison
$ cd <kernel_source_path>/tools/perf
$ make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- LDFLAGS=-static
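Before copying the cross-compiled binary to the target, it can be worth confirming that it really is a statically linked AArch64 executable. The following sketch inspects the description printed by the file utility. The function name is_static_aarch64 is hypothetical, and the exact wording that file prints is an assumption that can vary between versions.

```shell
#!/bin/sh
# Read a description produced by file(1) on stdin and succeed only if it
# describes a statically linked AArch64 ELF binary.
# (is_static_aarch64 is a hypothetical helper name; the matched wording
# is an assumption about file(1) output.)
is_static_aarch64() {
    desc=$(cat)
    case $desc in
        *ELF*aarch64*"statically linked"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example, run in the directory that contains the built perf binary:
#   file ./perf | is_static_aarch64 && echo "static AArch64 binary"
```

If the check fails, look at the raw output of file to see whether the binary is dynamically linked or built for a different architecture.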
After installation, you can use the command perf test to test the features that perf supports.
To develop the performance analysis workflow, first clarify the goal. In this example, the goal is to optimize the application. The following code is a simple example of the application:
#define COL_LINE 64
#define ROW_LINE 512000

long array[ROW_LINE][COL_LINE];

void compute_squares()
{
    int i, j;

    for (i=0; i<COL_LINE; i++) {
        for (j=0; j<ROW_LINE; j++) {
            array[j][i] = array[j][i] * array[j][i];
        }
    }
}

void array_assign()
{
    int i, j;

    for (i=0; i<COL_LINE; i++) {
        for (j=0; j<ROW_LINE; j++) {
            array[j][i] = i+j;
        }
    }
}

int main()
{
    array_assign();
    compute_squares();
    return 0;
}
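This application includes two functions: array_assign(), which sets each element of the array to i+j, and compute_squares(), which replaces each element with its square.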
Application optimization is performed on a system with a specific CPU microarchitecture and operating system. In this example, the related information is described in the section Example platform.
When you use the Linux Perf tool for performance analysis, the basic workflow consists of four stages, as the following figure shows.
Figure 1: Basic performance analysis workflow
Part 2, released on 15 August, and Part 3, released on 22 August, describe these four stages.