
Using the ARM Performance Monitor Unit (PMU) Linux Driver

Jason Andrews
March 8, 2015

The Linux kernel provides an ARM PMU driver for counting events such as cycles, instructions, and cache metrics. My previous article covered how to access data from the PMU automatically within SoC Designer by enabling hardware profiling events. It also discussed how to enable access from a Linux application so the application can directly access the PMU information. This article covers how to use the ARM Linux PMU driver to gather performance information. In the previous article, the Linux application was accessing the PMU hardware directly using system control coprocessor instructions, but this time a device driver and a system call will be used. As before, I used a Carbon Performance Analysis Kit (CPAK) for a Cortex-A53 system running 64-bit Linux.

The steps covered are:

  • Configure Linux kernel for profiling
  • Confirm the device tree entry for the ARM PMU driver is included in the kernel
  • Insert system calls into the Linux application to access performance information

Kernel Configuration

The first step is to enable profiling in the Linux kernel. It is not always easy to identify the minimal set of options needed to enable a kernel feature, but in this case I enabled "Kernel performance events and counters", which is found under "General setup" and then under "Kernel Performance Events and Counters".

Arm Performance Monitor Kernel

I also enabled "Profiling support" on the "General setup" menu.

Arm Performance Monitor Kernel Configuration

Once these options are enabled, recompile the kernel as usual by following the instructions provided in the CPAK.
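As a rough cross-check, the two menu entries above normally correspond to the following symbols in the kernel .config file. This mapping is an assumption based on mainline Kconfig naming and is not taken from the CPAK itself, so confirm it against your own kernel tree:

```
CONFIG_PERF_EVENTS=y
CONFIG_PROFILING=y
```

Searching the generated .config for these two names after running menuconfig is a quick way to confirm the options were saved.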

Device Tree Entry

Below is the device tree entry for the PMU driver. All Carbon Linux CPAKs for Cortex-A53 and Cortex-A57 include this entry, so no modification is needed. If you are working with your own Linux configuration, confirm the pmu entry is present in the device tree.

Arm Performance Monitor device tree

When the kernel boots, the driver prints out a message:

hw perfevents: enabled with arm/armv8-pmuv3 PMU driver, 7 counters available

If this message is not in the kernel boot log, check both the PMU driver device tree entry and the kernel configuration parameters listed above. If either is incorrect, the driver message will not appear.

Performance Information from a Linux Application

One way to get performance information from a Linux application is to use the perf_event_open system call. This system call does not have a glibc wrapper, so it is called directly using syscall. Most of the available examples, including the one shown in the man page, create a wrapper function to make it easier to use.
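A minimal sketch of such a wrapper is shown here, modeled on the pattern in the perf_event_open(2) man page. It simply forwards its arguments to syscall(); the function name matches the system call only by convention.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Thin wrapper around the raw system call (glibc does not provide one).
 * Returns a file descriptor on success, or -1 with errno set on failure. */
static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
                            int cpu, int group_fd, unsigned long flags)
{
    return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}
```

A pid of 0 with a cpu of -1 means "the calling process on any CPU", and a group_fd of -1 creates a standalone event.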

Arm Performance Monitor information
 

The process is similar to many other Linux system calls. First, get a file descriptor, much as you would with open(), and then use the file descriptor for other operations such as ioctl() and read(). The perf_event_open system call uses a number of parameters to configure the events to be counted. Sticking with the simple case of instruction count, the perf_event_attr data structure needs to be filled in with the desired details.

It contains information about:

  • Start enabled or disabled
  • Trace child processes or not
  • Include hypervisor activity or not
  • Include kernel activity or not

The perf_event_attr structure also selects which event to count (such as instructions). The remaining system call arguments specify the process id to trace and which CPUs to trace on.

A setup function to count instructions could look like this:

Arm Performance Monitor Unit set up
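In text form, a setup function along these lines might look like the following sketch. It is not the original listing: the function name start_instruction_count() is hypothetical, and the choices to exclude kernel and hypervisor activity and to not trace child processes are assumptions made for illustration.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: open an instruction counter for the calling
 * process on any CPU, then reset and start it.
 * Returns the file descriptor, or -1 if the event could not be opened. */
static int start_instruction_count(void)
{
    struct perf_event_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_INSTRUCTIONS;
    attr.disabled = 1;        /* start disabled, enable explicitly below */
    attr.inherit = 0;         /* do not trace child processes */
    attr.exclude_kernel = 1;  /* do not include kernel activity */
    attr.exclude_hv = 1;      /* do not include hypervisor activity */

    /* pid 0 = this process, cpu -1 = any CPU, group_fd -1 = no group */
    int fd = (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL);
    if (fd < 0)
        return -1;

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    return fd;
}
```

The call can fail for reasons unrelated to the code, for example when the PMU driver is not present or when the perf_event_paranoid setting restricts access, so the return value should always be checked.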

At the end of the test or interesting section of code, it’s easy to disable the instruction count and read the current value. In this code example, get_totaltime() uses a Linux timer to time the interesting work, and this is combined with the instruction count from the PMU driver to print some metrics at the end of the test.

Arm Performance Monitor Unit test
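The stop-and-report step can be sketched as follows. The helper names are hypothetical, and now_seconds() is only a stand-in for get_totaltime(), whose implementation is not shown in this article; it assumes a monotonic clock is an acceptable timer. The sketch expects a file descriptor that was opened with the default read format, so that read() returns a single 64-bit count.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <time.h>
#include <unistd.h>

/* Stand-in for get_totaltime(): monotonic time in seconds. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Disable the counter behind fd and read its 64-bit value.
 * Returns 0 on success, -1 on failure. */
static int stop_and_read_count(int fd, uint64_t *count)
{
    if (ioctl(fd, PERF_EVENT_IOC_DISABLE, 0) < 0)
        return -1;
    if (read(fd, count, sizeof(*count)) != (ssize_t)sizeof(*count))
        return -1;
    return 0;
}

/* Instructions per second, or 0 if no time elapsed. */
static double instructions_per_second(uint64_t count, double seconds)
{
    return seconds > 0.0 ? (double)count / seconds : 0.0;
}

static void print_metrics(uint64_t count, double seconds)
{
    printf("instructions: %llu\n", (unsigned long long)count);
    printf("elapsed:      %.6f s\n", seconds);
    printf("rate:         %.0f instructions/s\n",
           instructions_per_second(count, seconds));
}
```

A typical use is to record now_seconds() before and after the interesting work, call stop_and_read_count() on the counter's file descriptor, and pass the count and the elapsed time to print_metrics().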

Conclusion

The ARM PMU driver and perf_event_open system call provide a far more robust solution for accessing the ARM PMU from Linux applications than programming the PMU hardware directly. The driver takes care of all of the accounting and event counter overflow handling, and provides many flexible options for tracing. For situations where tracing many events is required, it may be overly cumbersome to use the perf_event_open system call.

One of the features of perf_event_open is the ability to use a group file descriptor to create groups of events, with one group leader and other group members, so that all events are traced as a group. While all of this is possible, it may be helpful to look at the perf command, which comes with the Linux kernel and provides the ability to control the counters for entire applications.
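To make the group mechanism concrete, here is a minimal sketch that groups a cycle counter (the leader) with an instruction counter (a member) and reads both values in one read() using PERF_FORMAT_GROUP. The helper names and the choice of events are assumptions for illustration only.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Layout returned by read() on the leader when read_format is
 * PERF_FORMAT_GROUP with no other flags: a count, then one value per event. */
struct group_counts {
    uint64_t nr;
    uint64_t values[2];   /* [0] = leader (cycles), [1] = member (instructions) */
};

/* Open one hardware event; group_fd of -1 makes it a group leader. */
static int open_hw_event(uint64_t config, int group_fd)
{
    struct perf_event_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = config;
    attr.disabled = (group_fd == -1);  /* only the leader starts disabled */
    attr.exclude_kernel = 1;
    attr.exclude_hv = 1;
    attr.read_format = PERF_FORMAT_GROUP;
    return (int)syscall(SYS_perf_event_open, &attr, 0, -1, group_fd, 0UL);
}

/* Create the two-event group. Returns 0 on success, -1 on failure. */
static int open_group(int *leader_fd, int *member_fd)
{
    *leader_fd = open_hw_event(PERF_COUNT_HW_CPU_CYCLES, -1);
    if (*leader_fd < 0)
        return -1;
    *member_fd = open_hw_event(PERF_COUNT_HW_INSTRUCTIONS, *leader_fd);
    if (*member_fd < 0) {
        close(*leader_fd);
        return -1;
    }
    return 0;
}

/* Read both counters through the leader. Returns 0 on success. */
static int read_group(int leader_fd, struct group_counts *out)
{
    ssize_t n = read(leader_fd, out, sizeof(*out));
    return (n == (ssize_t)sizeof(*out) && out->nr == 2) ? 0 : -1;
}
```

The whole group is started and stopped through the leader, by passing PERF_IOC_FLAG_GROUP as the argument to the PERF_EVENT_IOC_ENABLE and PERF_EVENT_IOC_DISABLE ioctl() calls.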
