Modern computing systems often run multiple applications on the same hardware. These applications share critical resources such as the CPU, memory, storage, and other I/O devices. Without proper controls, one application’s heavy usage of these resources can degrade the performance and stability of others. To ensure predictable behavior for each workload, resource isolation becomes essential.
This need is especially important in cloud computing environments. Service providers must deliver Quality of Service (QoS) guarantees to different tenants. It also matters in hybrid deployments where high-priority workloads must coexist with lower-priority or background tasks.
The Memory System Resource Partitioning and Monitoring (MPAM) architecture addresses this challenge by enabling fine-grained control over system-level cache (SLC) and memory bandwidth. Partitioning enables the system to prevent noisy neighbors from monopolizing resources. By monitoring usage, administrators can detect contention and enforce performance policies.
In this blog post, I show how to configure and verify MPAM on Ubuntu Linux. I cover kernel support, configuration steps, and validation tools so you can achieve reliable resource isolation in your environment.
MPAM needs hardware, firmware, and kernel support. A reliable way to check if a platform supports MPAM is by examining its ACPI tables. You can dump and inspect the ACPI MPAM table to confirm that the necessary entries are present.
# The MPAM table should be present in the following directory
$ ls /sys/firmware/acpi/tables/
...
MPAM
...
# Dump and decode the MPAM table
$ apt install acpica-tools   # provides acpidump and iasl
$ cat /sys/firmware/acpi/tables/MPAM > /tmp/MPAM
$ iasl -d /tmp/MPAM          # generates the decoded file /tmp/MPAM.dsl
$ cat /tmp/MPAM.dsl
# In the table, the Revision field is important because the MPAM
# kernel patches we use only support MPAM 2.0.
Revision : 00   # MPAM 1.0
Revision : 01   # MPAM 2.0
After confirming that your platform supports MPAM, the next step is to build a compatible kernel. At the time of writing, the MPAM-related kernel patches have not yet been merged into the mainline kernel. They are still under review.
You must obtain them from James Morse's kernel fork, which contains the latest MPAM enablement patches.
$ git clone https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git
$ cd linux
$ git checkout mpam/snapshot/v6.16-rc5
$ git rev-parse HEAD
b8e4905233fe45814b3c73be7e091f172cfb86ce
$ cp /boot/config-6.8.0-64-generic .config
$ make menuconfig
# Make sure CONFIG_RESCTRL_FS and CONFIG_ARM64_MPAM are set to y in menuconfig
$ make -j80 && make modules -j80 && make modules_install INSTALL_MOD_STRIP=1 -j80 && make install INSTALL_MOD_STRIP=1 -j80
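After booting into the newly built kernel, a quick sanity check is to confirm that the resctrl file system type is registered and that the MPAM driver probed successfully. The exact dmesg output varies between patch snapshots, so treat the commands below as a rough guide rather than definitive output.

$ uname -r                          # confirm the patched kernel is running
$ grep resctrl /proc/filesystems    # resctrl should be listed as a nodev file system
nodev	resctrl
$ dmesg | grep -i mpam              # look for MPAM probe messages, if any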
The Resource Control (resctrl) file system is a kernel interface for allocating and monitoring shared CPU resources such as caches and memory bandwidth. Both Arm and Intel architectures use the resctrl interface for managing resource allocation policies.
In MPAM, resource groups are represented as directories within the resctrl file system. Before using MPAM, you must mount the resctrl file system.
$ mount -t resctrl resctrl /sys/fs/resctrl
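You can confirm the mount and, if you like, make it persistent with an fstab entry. The lines below show one possible approach; the mount output is an example.

$ mount -t resctrl                   # verify the file system is mounted
resctrl on /sys/fs/resctrl type resctrl (rw,relatime)
# Optionally mount it automatically at boot
$ echo "resctrl /sys/fs/resctrl resctrl defaults 0 0" >> /etc/fstab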
Before testing MPAM, it is helpful to have a general understanding of how resctrl works. This includes applying resource allocation policies and monitoring MPAM status. The directory structure of resctrl has specific files for configuration and monitoring, each serving a distinct purpose.
$ cd /sys/fs/resctrl
$ tree
.
├── cpus
├── cpus_list
├── info
│   ├── L3
│   │   ├── bit_usage
│   │   ├── cbm_mask
│   │   ├── min_cbm_bits
│   │   ├── num_closids
│   │   ├── shareable_bits
│   │   └── sparse_masks
│   ├── L3_MON
│   │   ├── max_threshold_occupancy
│   │   ├── mon_features
│   │   └── num_rmids
│   ├── MB
│   │   ├── bandwidth_gran
│   │   ├── delay_linear
│   │   ├── min_bandwidth
│   │   └── num_closids
│   ├── MB_MON
│   │   ├── mon_features
│   │   └── num_rmids
│   └── last_cmd_status
├── mode
├── mon_data
│   ├── mon_L3_01
│   │   └── llc_occupancy
│   ├── mon_L3_02
│   │   └── llc_occupancy
│   ├── mon_MB_01
│   │   └── mbm_local_bytes
│   └── mon_MB_02
│       └── mbm_local_bytes
├── mon_groups
├── schemata
├── size
└── tasks
Below, we describe each key file and its role in resource control.
A resctrl group can be created only in the root directory. Nesting is not supported. This is a key difference from cgroups. To create a group, make a new directory under the resctrl mount point using mkdir.
mkdir /sys/fs/resctrl/p0
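When a group is no longer needed, remove it with rmdir; the tasks and CPUs that belonged to it fall back to the root (default) group.

rmdir /sys/fs/resctrl/p0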
The L3 field represents the Last Level Cache (LLC) bitmask. For example, ffff divides the cache into 16 equal portions, with each bit corresponding to 1/16 of the cache. The L3 bitmask must be contiguous and contain at least the minimum number of consecutive bits specified by min_cbm_bits in the info directory.
# Allocate 3/4 of the L3 cache of NUMA nodes 0 and 1 to the p0 group
echo "L3:0=0fff;1=0fff" > /sys/fs/resctrl/p0/schemata
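The valid mask width and the minimum number of contiguous bits can be read from the info directory; the values below are examples and will differ by platform.

$ cat /sys/fs/resctrl/info/L3/cbm_mask        # full bitmask supported by the hardware
ffff
$ cat /sys/fs/resctrl/info/L3/min_cbm_bits    # minimum number of contiguous bits per allocation
1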
Memory bandwidth allocation is expressed as a percentage. For example, 100 means full bandwidth. Smaller values restrict a group’s available memory throughput.
# Limit the memory bandwidth to 50%
echo "MB:0=50;1=50" > /sys/fs/resctrl/p0/schemata
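The allowed granularity and lower bound for bandwidth values are also exposed in the info directory; again, the numbers shown are examples.

$ cat /sys/fs/resctrl/info/MB/bandwidth_gran   # percentages are rounded to this granularity
5
$ cat /sys/fs/resctrl/info/MB/min_bandwidth    # smallest percentage that can be allocated
10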
Assign CPUs by writing a bitmask to the cpus file or a range to the cpus_list file under the group directory (e.g., p0).
# Add cores 10 to 20 to the p0 group
echo "10-20" > /sys/fs/resctrl/p0/cpus_list
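Reading the files back confirms the assignment; the cpus file holds the equivalent hexadecimal bitmask (the exact formatting depends on the number of CPUs in the system).

$ cat /sys/fs/resctrl/p0/cpus_list
10-20
$ cat /sys/fs/resctrl/p0/cpus
1ffc00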
If a CPU is already assigned to a different non-root resctrl group, it is automatically removed from that group when reassigned. CPUs released from a group are returned to the root resctrl group. When CPUs are reassigned, all monitoring groups (mon_groups) under the current resctrl group have their CPU lists cleared.
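As a small illustration of this behavior, moving a core into a second group silently removes it from the first (p1 is just an example name):

$ mkdir /sys/fs/resctrl/p1
$ echo "10" > /sys/fs/resctrl/p1/cpus_list
$ cat /sys/fs/resctrl/p0/cpus_list    # core 10 has been removed from p0
11-20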
By default, new tasks belong to the root resctrl group. Each task can belong to only one resctrl group. Binding a task to a non-root group applies that group’s resource allocation policy.
echo <pid> > /sys/fs/resctrl/p0/tasks
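A convenient pattern is to bind the current shell to the group, so every process launched from that shell inherits the group's policy; membership can then be checked by reading the tasks file.

$ echo $$ > /sys/fs/resctrl/p0/tasks
$ grep -w $$ /sys/fs/resctrl/p0/tasks   # the shell's PID should be listed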
1. Create a resource partition group and bind CPU cores 0-10 to it.
2. Run stress-ng on cores 0-10 and check mon_data. Without limits, stress-ng should fully utilize the SLC cache.
3. Apply an L3 cache portion limit to the group.
4. Check mon_data again to confirm that SLC cache usage has decreased.
$ cd /sys/fs/resctrl
$ mkdir p0 && cd p0
# Put cores 0 to 10 into the p0 group
$ echo "0-10" > cpus_list
$ cat schemata
MB:1=100;2=100
L3:1=ffff;2=ffff
# Run a cache-intensive workload on cores 0 to 10 and check the SLC usage on node 0
$ numactl -m 0 -N 0 taskset -c 0-10 stress-ng --cache 10 --aggressive &
$ cat mon_data/mon_L3_01/llc_occupancy
115977280
# Reduce the SLC cache portion for the p0 group, then check the SLC usage again
$ echo "L3:1=ff;2=ff" > schemata
$ cat mon_data/mon_L3_01/llc_occupancy
56918632
# 56918632/115977280 = 0.49, roughly 50% of the unrestricted occupancy,
# which is aligned with halving the allocation from ffff to ff
To verify that the SLC partitioning works correctly, use the lat_mem_rd memory latency benchmark from lmbench. This tool measures read latency while gradually increasing the working set size.
In theory, read latency should spike when the working set exceeds the SLC capacity, because data must then be fetched from main memory. By setting different SLC cache partitions, you can observe distinct latency curves: smaller allocations cause the latency increase to occur earlier. Be aware that the CMC prefetcher can influence lat_mem_rd results, so disable it before testing.
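A possible test sequence is sketched below. It assumes lmbench has been built locally so lat_mem_rd is available, and the 512 MB working-set limit and 128-byte stride are only example parameters.

# Full SLC allocation (16 ways) for p0, then record the latency curve
$ echo "L3:1=ffff;2=ffff" > /sys/fs/resctrl/p0/schemata
$ numactl -m 0 -N 0 taskset -c 0-10 ./lat_mem_rd 512 128

# Shrink p0 to a single way and repeat; the latency knee should appear
# at a much smaller working-set size
$ echo "L3:1=1;2=1" > /sys/fs/resctrl/p0/schemata
$ numactl -m 0 -N 0 taskset -c 0-10 ./lat_mem_rd 512 128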
Our test results confirm this behavior. The one-way SLC allocation showed an earlier latency jump. The 16-way allocation maintained low latency until much larger working set sizes.
We use STREAM to measure memory bandwidth:
1. Create a resource partition group and bind cores 0–10 to it.
2. Run STREAM on cores 0–10 and check the reported bandwidth. Without limits, STREAM should fully saturate the available memory bandwidth.
3. Apply a memory bandwidth limit to the group.
4. Re-run STREAM and confirm that the reported bandwidth decreases.
# Restore the previous (full) settings for p0
# We use STREAM (https://github.com/jeffhammond/STREAM.git) to measure memory bandwidth
# Check the memory bandwidth of cores 0 to 10
$ numactl -m 0 -N 0 taskset -c 0-10 ./stream_openmp
...
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:          476414.0     0.008310     0.008060     0.020359
Scale:         470018.6     0.008485     0.008170     0.031410
Add:           482046.2     0.012295     0.011949     0.029877
Triad:         478523.0     0.012297     0.012037     0.022810
...
# The bandwidth is roughly 475 GB/s
# Reduce the bandwidth portion for the p0 group, then measure again
$ echo "MB:1=15;2=15" > schemata
$ numactl -m 0 -N 0 taskset -c 0-10 ./stream_openmp
...
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           82787.4     0.046935     0.046384     0.047483
Scale:          83205.3     0.046706     0.046151     0.047387
Add:            67912.5     0.085259     0.084815     0.085782
Triad:          67826.1     0.085414     0.084923     0.086275
...
# The bandwidth is now roughly 75 GB/s; 75/475 = 0.157, close to the 15% limit,
# so the MPAM memory bandwidth partitioning works
To simulate a noisy neighbor scenario, we run both a measurement workload and a competing workload:
We split the CPU cores of NUMA node 0 into two equal groups, each assigned to a separate resctrl group. STREAM is bound to the first group, stress-ng to the second.
We then dynamically adjust the memory bandwidth allocations, increasing bandwidth for the first group while decreasing it for the second. The STREAM results change accordingly, showing that MPAM enforces bandwidth isolation between workloads.
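The setup looks roughly like the sketch below. The group names, the core split for NUMA node 0 (0-31 and 32-63 here), and the choice of stress-ng stressor are examples; adjust them to your topology.

$ cd /sys/fs/resctrl
$ mkdir stream_grp noisy_grp
$ echo "0-31"  > stream_grp/cpus_list
$ echo "32-63" > noisy_grp/cpus_list

# Start the noisy neighbor in the background
$ numactl -m 0 taskset -c 32-63 stress-ng --stream 32 --aggressive &

# Favor the measurement group, then run STREAM and note the rate
$ echo "MB:1=80;2=80" > stream_grp/schemata
$ echo "MB:1=20;2=20" > noisy_grp/schemata
$ numactl -m 0 taskset -c 0-31 ./stream_openmp

# Invert the allocation and run STREAM again; the reported bandwidth should drop
$ echo "MB:1=20;2=20" > stream_grp/schemata
$ echo "MB:1=80;2=80" > noisy_grp/schemata
$ numactl -m 0 taskset -c 0-31 ./stream_openmp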
In conclusion, enabling and validating MPAM on Ubuntu requires careful alignment between hardware capabilities, firmware configuration, and kernel support. By using the resctrl interface, MPAM provides a practical and unified way to manage CPU cache and memory bandwidth resources. This ensures performance isolation and predictability in multi-tenant or mixed-workload environments.
The verification results demonstrate that, when configured correctly, MPAM can be a powerful tool for fine-grained resource control. It is valuable for performance tuning, workload consolidation, and real-time application scenarios on Arm platforms.