Modern computing systems often run multiple applications on the same hardware. These applications share critical resources such as the CPU, memory, storage, and other I/O devices. Without proper controls, one application’s heavy usage of these resources can degrade the performance and stability of others. To ensure predictable behavior for each workload, resource isolation becomes essential.
This need is especially important in cloud computing environments. Service providers must deliver Quality of Service (QoS) guarantees to different tenants. It also matters in hybrid deployments where high-priority workloads must coexist with lower-priority or background tasks.
The Memory System Resource Partitioning and Monitoring (MPAM) architecture addresses this challenge by enabling fine-grained control over system-level cache (SLC) and memory bandwidth. Partitioning enables the system to prevent noisy neighbors from monopolizing resources. By monitoring usage, administrators can detect contention and enforce performance policies.
In this blog post, I show how to configure and verify MPAM on Ubuntu Linux. I cover kernel support, configuration steps, and validation tools so you can achieve reliable resource isolation in your environment.
MPAM needs hardware, firmware, and kernel support. A reliable way to check if a platform supports MPAM is by examining its ACPI tables. You can dump and inspect the ACPI MPAM table to confirm that the necessary entries are present.
# The MPAM table should be present in the following directory
$ ls /sys/firmware/acpi/tables/
...
MPAM
...
# Dump and decode the MPAM table
$ apt install acpica-tools   # provides acpidump and iasl
$ cat /sys/firmware/acpi/tables/MPAM > /tmp/MPAM
$ iasl -d /tmp/MPAM          # generates the decoded file /tmp/MPAM.dsl
$ cat /tmp/MPAM.dsl
# In the table, the Revision field is important because the MPAM
# kernel patches we use only support MPAM 2.0.
Revision : 00   # MPAM 1.0
Revision : 01   # MPAM 2.0
After confirming that your platform supports MPAM, the next step is to build a compatible kernel. At the time of writing, the MPAM-related kernel patches have not yet been merged into the mainline kernel. They are still under review.
You must obtain them from James Morse's kernel fork, which contains the latest MPAM enablement patches.
$ git clone https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git
$ cd linux
$ git checkout mpam/snapshot/v6.16-rc5
$ git rev-parse HEAD
b8e4905233fe45814b3c73be7e091f172cfb86ce
$ cp /boot/config-6.8.0-64-generic .config
$ make menuconfig
# Make sure CONFIG_RESCTRL_FS and CONFIG_ARM64_MPAM are set to y in menuconfig
$ make -j80 && make modules -j80 && make modules_install INSTALL_MOD_STRIP=1 -j80 && make install INSTALL_MOD_STRIP=1 -j80
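After booting into the newly built kernel, a quick sanity check is to confirm that the resctrl file system type is registered and that the MPAM driver probed successfully. The exact dmesg output varies between patch snapshots, so treat the commands below as a rough guide rather than definitive output.

$ uname -r                          # confirm the patched kernel is running
$ grep resctrl /proc/filesystems    # resctrl should be listed as a nodev file system
nodev	resctrl
$ dmesg | grep -i mpam              # look for MPAM probe messages, if any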
The Resource Control (resctrl) file system is a kernel interface for allocating and monitoring shared CPU resources such as caches and memory bandwidth. Both Arm and Intel architectures use the resctrl interface for managing resource allocation policies.
In MPAM, resource groups are represented as directories within the resctrl file system. Before using MPAM, you must mount the resctrl file system.
$ mount -t resctrl resctrl /sys/fs/resctrl
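You can confirm the mount and, if you like, make it persistent with an fstab entry. The lines below show one possible approach; the mount output is an example.

$ mount -t resctrl                   # verify the file system is mounted
resctrl on /sys/fs/resctrl type resctrl (rw,relatime)
# Optionally mount it automatically at boot
$ echo "resctrl /sys/fs/resctrl resctrl defaults 0 0" >> /etc/fstab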
Before testing MPAM, it is helpful to have a general understanding of how resctrl works. This includes applying resource allocation policies and monitoring MPAM status. The directory structure of resctrl has specific files for configuration and monitoring, each serving a distinct purpose.
$ cd /sys/fs/resctrl
$ tree
.
├── cpus
├── cpus_list
├── info
│   ├── L3
│   │   ├── bit_usage
│   │   ├── cbm_mask
│   │   ├── min_cbm_bits
│   │   ├── num_closids
│   │   ├── shareable_bits
│   │   └── sparse_masks
│   ├── L3_MON
│   │   ├── max_threshold_occupancy
│   │   ├── mon_features
│   │   └── num_rmids
│   ├── MB
│   │   ├── bandwidth_gran
│   │   ├── delay_linear
│   │   ├── min_bandwidth
│   │   └── num_closids
│   ├── MB_MON
│   │   ├── mon_features
│   │   └── num_rmids
│   └── last_cmd_status
├── mode
├── mon_data
│   ├── mon_L3_01
│   │   └── llc_occupancy
│   ├── mon_L3_02
│   │   └── llc_occupancy
│   ├── mon_MB_01
│   │   └── mbm_local_bytes
│   └── mon_MB_02
│       └── mbm_local_bytes
├── mon_groups
├── schemata
├── size
└── tasks
Below, we describe each key file and its role in resource control.
A resctrl group can be created only in the root directory. Nesting is not supported. This is a key difference from cgroups. To create a group, make a new directory under the resctrl mount point using mkdir.
mkdir /sys/fs/resctrl/p0
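When a group is no longer needed, remove it with rmdir; the tasks and CPUs that belonged to it fall back to the root (default) group.

rmdir /sys/fs/resctrl/p0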
The L3 field represents the Last Level Cache (LLC) bitmask. For example, ffff divides the cache into 16 equal portions, with each bit corresponding to 1/16 of the cache. The L3 bitmask must be contiguous and contain at least the minimum number of consecutive bits specified by min_cbm_bits in the info directory.
# Allocate 3/4 of the L3 cache of NUMA nodes 0 and 1 to the p0 group
echo "L3:0=0fff;1=0fff" > /sys/fs/resctrl/p0/schemata
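The valid mask width and the minimum number of contiguous bits can be read from the info directory; the values below are examples and will differ by platform.

$ cat /sys/fs/resctrl/info/L3/cbm_mask        # full bitmask supported by the hardware
ffff
$ cat /sys/fs/resctrl/info/L3/min_cbm_bits    # minimum number of contiguous bits per allocation
1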
Memory bandwidth allocation is expressed as a percentage. For example, 100 means full bandwidth. Smaller values restrict a group’s available memory throughput.
# Limit the memory bandwidth to 50%
echo "MB:0=50;1=50" > /sys/fs/resctrl/p0/schemata
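The allowed granularity and lower bound for bandwidth values are also exposed in the info directory; again, the numbers shown are examples.

$ cat /sys/fs/resctrl/info/MB/bandwidth_gran   # percentages are rounded to this granularity
5
$ cat /sys/fs/resctrl/info/MB/min_bandwidth    # smallest percentage that can be allocated
10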
Assign CPUs by writing a bitmask to the cpus file or a range to the cpus_list file under the group directory (e.g., p0).
# Add cores 10 to 20 to the p0 group
echo "10-20" > /sys/fs/resctrl/p0/cpus_list
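Reading the files back confirms the assignment; the cpus file holds the equivalent hexadecimal bitmask (the exact formatting depends on the number of CPUs in the system).

$ cat /sys/fs/resctrl/p0/cpus_list
10-20
$ cat /sys/fs/resctrl/p0/cpus
1ffc00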
If a CPU is already assigned to a different non-root resctrl group, it is automatically removed from that group when reassigned. CPUs released from a group are returned to the root resctrl group. When CPUs are reassigned, all monitoring groups (mon_groups) under the current resctrl group have their CPU lists cleared.
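As a small illustration of this behavior, moving a core into a second group silently removes it from the first (p1 is just an example name):

$ mkdir /sys/fs/resctrl/p1
$ echo "10" > /sys/fs/resctrl/p1/cpus_list
$ cat /sys/fs/resctrl/p0/cpus_list    # core 10 has been removed from p0
11-20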
By default, new tasks belong to the root resctrl group. Each task can belong to only one resctrl group. Binding a task to a non-root group applies that group’s resource allocation policy.
echo <pid> > /sys/fs/resctrl/p0/tasks
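A convenient pattern is to bind the current shell to the group, so every process launched from that shell inherits the group's policy; membership can then be checked by reading the tasks file.

$ echo $$ > /sys/fs/resctrl/p0/tasks
$ grep -w $$ /sys/fs/resctrl/p0/tasks   # the shell's PID should be listed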
1. Create a resource partition group and bind CPU cores 0-10 to it.
2. Run stress-ng on cores 0-10 and check mon_data. Without limits, stress-ng should fully utilize the SLC cache.
3. Apply an L3 cache portion limit to the group.
4. Check mon_data again to confirm that SLC cache usage has decreased.
$ cd /sys/fs/resctrl
$ mkdir p0 && cd p0
# Put cores 0 to 10 into the p0 group
$ echo "0-10" > cpus_list
$ cat schemata
MB:1=100;2=100
L3:1=ffff;2=ffff
# Run a cache-intensive workload on cores 0 to 10 and check the SLC usage on node 0
$ numactl -m 0 -N 0 taskset -c 0-10 stress-ng --cache 10 --aggressive &
$ cat mon_data/mon_L3_01/llc_occupancy
115977280
# Reduce the SLC cache portion for the p0 group, then check the SLC usage again
$ echo "L3:1=ff;2=ff" > schemata
$ cat mon_data/mon_L3_01/llc_occupancy
56918632
# 56918632/115977280 = 0.49, roughly 50% of the unrestricted occupancy,
# which is aligned with halving the allocation from ffff to ff
To verify that the SLC partitioning works correctly, use the lat_mem_rd memory latency benchmark from lmbench. This tool measures read latency while gradually increasing the working set size.
In theory, read latency should spike when the working set exceeds the SLC capacity, because data must then be fetched from main memory. By setting different SLC cache partitions, you can observe distinct latency curves: smaller allocations cause the latency increase to occur earlier. Be aware that the CMC prefetcher can influence lat_mem_rd results, so disable it before testing.
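A possible test sequence is sketched below. It assumes lmbench has been built locally so lat_mem_rd is available, and the 512 MB working-set limit and 128-byte stride are only example parameters.

# Full SLC allocation (16 ways) for p0, then record the latency curve
$ echo "L3:1=ffff;2=ffff" > /sys/fs/resctrl/p0/schemata
$ numactl -m 0 -N 0 taskset -c 0-10 ./lat_mem_rd 512 128

# Shrink p0 to a single way and repeat; the latency knee should appear
# at a much smaller working-set size
$ echo "L3:1=1;2=1" > /sys/fs/resctrl/p0/schemata
$ numactl -m 0 -N 0 taskset -c 0-10 ./lat_mem_rd 512 128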
Our test results confirm this behavior. The one-way SLC allocation showed an earlier latency jump. The 16-way allocation maintained low latency until much larger working set sizes.
We use STREAM to measure memory bandwidth:
1. Create a resource partition group and bind cores 0–10 to it.
2. Run STREAM on cores 0–10 and check the reported bandwidth. Without limits, STREAM should fully saturate the available memory bandwidth.
3. Apply a memory bandwidth limit to the group.
4. Re-run STREAM and confirm that the reported bandwidth decreases.
# Restore the previous (full) settings for p0
# We use STREAM (https://github.com/jeffhammond/STREAM.git) to measure memory bandwidth
# Check the memory bandwidth of cores 0 to 10
$ numactl -m 0 -N 0 taskset -c 0-10 ./stream_openmp
...
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:          476414.0     0.008310     0.008060     0.020359
Scale:         470018.6     0.008485     0.008170     0.031410
Add:           482046.2     0.012295     0.011949     0.029877
Triad:         478523.0     0.012297     0.012037     0.022810
...
# The bandwidth is roughly 475 GB/s
# Reduce the bandwidth portion for the p0 group, then measure again
$ echo "MB:1=15;2=15" > schemata
$ numactl -m 0 -N 0 taskset -c 0-10 ./stream_openmp
...
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           82787.4     0.046935     0.046384     0.047483
Scale:          83205.3     0.046706     0.046151     0.047387
Add:            67912.5     0.085259     0.084815     0.085782
Triad:          67826.1     0.085414     0.084923     0.086275
...
# The bandwidth is now roughly 75 GB/s; 75/475 = 0.157, close to the 15% limit,
# so the MPAM memory bandwidth partitioning works
To simulate a noisy neighbor scenario, we run both a measurement workload and a competing workload:
We split the CPU cores of NUMA node 0 into two equal groups, each assigned to a separate resctrl group. STREAM is bound to the first group, stress-ng to the second.
We then dynamically adjust the memory bandwidth allocations, increasing bandwidth for the first group while decreasing it for the second. The STREAM results change accordingly, showing that MPAM enforces bandwidth isolation between workloads.
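The setup looks roughly like the sketch below. The group names, the core split for NUMA node 0 (0-31 and 32-63 here), and the choice of stress-ng stressor are examples; adjust them to your topology.

$ cd /sys/fs/resctrl
$ mkdir stream_grp noisy_grp
$ echo "0-31"  > stream_grp/cpus_list
$ echo "32-63" > noisy_grp/cpus_list

# Start the noisy neighbor in the background
$ numactl -m 0 taskset -c 32-63 stress-ng --stream 32 --aggressive &

# Favor the measurement group, then run STREAM and note the rate
$ echo "MB:1=80;2=80" > stream_grp/schemata
$ echo "MB:1=20;2=20" > noisy_grp/schemata
$ numactl -m 0 taskset -c 0-31 ./stream_openmp

# Invert the allocation and run STREAM again; the reported bandwidth should drop
$ echo "MB:1=20;2=20" > stream_grp/schemata
$ echo "MB:1=80;2=80" > noisy_grp/schemata
$ numactl -m 0 taskset -c 0-31 ./stream_openmp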
In conclusion, enabling and validating MPAM on Ubuntu requires careful alignment between hardware capabilities, firmware configuration, and kernel support. By using the resctrl interface, MPAM provides a practical and unified way to manage CPU cache and memory bandwidth resources. This ensures performance isolation and predictability in multi-tenant or mixed-workload environments.
The verification results demonstrate that, when configured correctly, MPAM can be a powerful tool for fine-grained resource control. It is valuable for performance tuning, workload consolidation, and real-time application scenarios on Arm platforms.