Arm Community
Arm Community
  • Site
  • User
  • Site
  • Search
  • User
Arm Community blogs
Arm Community blogs
Architectures and Processors blog Expanding the Memory Model Tools to System-level architecture
  • Blogs
  • Mentions
  • Sub-Groups
  • Tags
  • Jump...
  • Cancel
More blogs in Arm Community blogs
  • AI blog

  • Announcements

  • Architectures and Processors blog

  • Automotive blog

  • Embedded and Microcontrollers blog

  • Internet of Things (IoT) blog

  • Laptops and Desktops blog

  • Mobile, Graphics, and Gaming blog

  • Operating Systems blog

  • Servers and Cloud Computing blog

  • SoC Design and Simulation blog

  • Tools, Software and IDEs blog

Tell us what you think
Tags
  • Memory Model Tool
  • Architecture
  • Memory Architecture
Actions
  • RSS
  • More
  • Cancel
Related blog posts
Related forum threads

Expanding the Memory Model Tools to System-level architecture

Jade Alglave
Jade Alglave
October 26, 2020
11 minute read time.
Please note: This blog post was written in tandem with Nikos Nikoleris.

In a previous post, we used litmus7 to generate executable binaries and run litmus tests on existing hardware. These executable binaries run as standalone applications in Exception Level 0 (EL0).

Recently, we developed litmus tests to exercise and demonstrate the ordering semantics of the Virtual Memory System Architecture. Such tests assume full control of the translation tables and how the Memory Management Unit (MMU) handles virtual addresses translation and protection. This functionality is typically controlled by the operating system that runs in Exception Level 1 (EL1). To run such litmus tests on existing hardware, just like operating systems, we need to able to generate and run executable binaries in EL1.

To achieve this, we extended litmus7 and added support for building binaries that run in EL1. More specifically, litmus7 in -mode kvm generates source code that compiles and links with kvm-unit-tests [1].  kvm-unit-tests, as its name indicates, is a collection of unit tests for the Kernel-based Virtual Machine (KVM) [2]. For litmus7, kvm-unit-tests provides the code necessary to boot, initialise and fully control the system. By doing that, we can run tests that modify Translation Table Descriptors, raise MMU faults and report or even handle such faults.

To run litmus tests, litmus7 in -mode kvm relies on hardware virtualisation, a feature implemented by many Arm CPUs and supported by most operating systems. In this post, we explain how we use litmus7 on Linux based operating system with Kernel-based Virtual Machine (KVM) to build and run tests.

Prerequisites

The following instructions have been tested on Ubuntu 22.04.2 LTS. However, the concepts are similar for other Linux distributions.

  1. Install the packages build-essentials and qemu-system-arm:

    sudo apt install build-essential qemu-system-arm

  2. Make sure your user has permissions to use KVM, typically, this is achieved by adding the user to the kvm Unix group:
    sudo usermod -a -G kvm ${USER}
  3. Download a copy of the kvm-unit-tests and build it.
    • Fetch the latest copy of kvm-unit-tests from: https://gitlab.com/kvm-unit-tests/kvm-unit-tests.
    • Follow the instructions in Readme.md to build the source code.
    To test that kvm-unit-tests works in your system, run:
    ./run_tests.sh
  4. Install litmus7.
    • Download a copy of herdtools7, and
    • Follow the instructions in INSTALL.md (section "Source build").

Using litmus7 in -mode kvm to run tests

As a first example, we take our traditional MP litmus test and run it with -mode kvm.

  1. Use litmus7 to generate the sources for MP inside a folder in kvm-unit-tests:
    mkdir litmus-tests
    ~/bin/litmus7 -mach kvm-aarch64 -variant precise -o litmus-tests mp.litmus
    
      
    litmus7 has generated the source code for the litmus test inside the folder “litmus-tests”.
  2. Run make to generate the binary MP.flat
    cd litmus-tests && make && cd ..
  3. Run the MP litmus test:
    sh ./litmus-tests/run.sh

This will produce the output:

Wed 21 Oct 11:56:20 UTC 2020
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Results for MP.litmus %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
AArch64 MP
"PodWW Rfe PodRR Fre"

{0:X1=x; 0:X3=y; 1:X1=y; 1:X3=x;}

 P0          | P1          ;
 MOV W0,#1   | LDR W0,[X1] ;
 STR W0,[X1] | LDR W2,[X3] ;
 MOV W2,#1   |             ;
 STR W2,[X3] |             ;

exists (1:X0=1 /\ 1:X2=0)
Generated assembler
#START _litmus_P0
	nop
	mov w3,#1
	str w3,[x0]
	mov w2,#1
	str w2,[x1]
	nop
#START _litmus_P1
	nop
	ldr w4,[x1]
	ldr w5,[x0]
	nop
/usr/bin/qemu-system-aarch64 -nodefaults -machine virt,gic-version=host,accel=kvm -cpu host -device virtio-serial-device -device virtconsole,chardev=ctd -chardev testdev,id=ctd -device pci-testdev -display none -serial stdio -kernel MP/MP.flat -smp 4 -append -q # -initrd /tmp/tmp.g5R4XsS1BN
Test MP Allowed
Histogram (4 states)
910548:>1:X0=0; 1:X2=0;
56482 :>1:X0=0; 1:X2=1;
38531 *>1:X0=1; 1:X2=0;
994439:>1:X0=1; 1:X2=1;
Ok

Witnesses
Positive: 38531, Negative: 1961469
Condition exists (1:X0=1 /\ 1:X2=0) is validated
Hash=211d5b298572012a0869d4ded6a40b7f
Cycle=Rfe PodRR Fre PodWW
Generator=diycross7 (version 7.54+01(dev))
Com=Rf Fr
Orig=PodWW Rfe PodRR Fre
Observation MP Sometimes 38531 1961469
Time MP 3.75
Revision 8a99be9348124b52f116a26453a3cf3da5c5fb59, version 7.56+02~dev
Command line: /home/niknik01/workspace/mm/herdtools7/_build/litmus/litmus.native -mach kvm-aarch64 -variant precise -o mp mp.litmus
Parameters
#define SIZE_OF_TEST 5000
#define NUMBER_OF_RUN 200
#define AVAIL 4
/* gcc options: -Wall -std=gnu99 -std=gnu99 -ffreestanding -I $(SRCDIR)/lib -I $(SRCDIR)/libfdt -Wall -Werror  -fomit-frame-pointer -Wno-frame-address   -fno-pic  -no-pie -Wmissing-parameter-type -Wold-style-declaration -Woverride-init -O2 $(call optional-ccopt, -mno-outline-atomics) -pthread */
/* barrier: userfence */
/* launch: changing */
/* affinity: none */
/* alloc: dynamic */
/* memory: direct */
/* stride: 1 */
/* safer: write */
/* preload: random */
/* speedcheck: no */
/* proc used: 4 */
LITMUSOPTS=
Wed 21 Oct 11:56:24 UTC 2020

As previously, litmus7 prints the litmus test, the generated assembly code, the command used to run the binary

/usr/bin/qemu-system-aarch64 -nodefaults -machine virt,gic-version=host,accel=kvm -cpu host -device virtio-serial-device -device virtconsole,chardev=ctd -chardev testdev,id=ctd -device pci-testdev -display none -serial stdio -kernel ./litmus-tests/MP.flat -smp 4 -append -q # -initrd /tmp/tmp.yjTA8uEN6k

and the final state of all observed executions of the MP litmus test as before:

Test MP Allowed
Histogram (4 states)
1023984:>1:X0=0; 1:X2=0;
8226  :>1:X0=0; 1:X2=1;
6604  *>1:X0=1; 1:X2=0;
961186:>1:X0=1; 1:X2=1;
Ok

Witnesses
Positive: 6604, Negative: 1993396
Condition exists (1:X0=1 /\ 1:X2=0) is validated

Extended litmus tests syntax – first step

In the previous section, we run the traditional MP test, using litmus7 in -mode kvm. In this section, we use a new variant of MP which uses features available only to software running in EL1. First, we give an example of such a litmus test, and explain the new syntax.

Consider the following test:

AArch64 MP-pte
{ 
 [PTE(x)]=(valid:0);
 [y]=1;
 0:X1=(oa:PA(y),valid:1); 0:X2=PTE(x);
 0:X8=z; 1:X8=z;
 1:X3=x;
}       
 P0          | P1;
 STR X1,[X2] | LDR W7,[X8]   ;
 MOV W7,#1   | L0:LDR W4,[X3];
 STR W7,[X8] |               ;
locations [1:X4;]
exists(1:X7=1 /\ fault(P1:L0,x))

There are several new points worth noting:

  • The initial state references a new type of symbolic location which corresponds to a Translation Table Descriptor. For example, in the initial state for MP-pte, we have: [PTE(x)]=(valid:0,oa:PA(x)). In words, this says that the Translation Table Descriptor for location x is initialised such that its valid bit is 0, and the output address is physical address PA(x)
  • The post-condition 1:X7=1 /\ fault(P1:L0,x), asks if there are any executions where the register X7 of P1 had the value 1 and the execution of the instruction with the label L0 in P1 raised a fault due to a memory access to location x.

In summary, the test MP-pte is asking if the following execution can be happen:

  • We start with the Translation Table Descriptor PTE(x) being invalid (because the valid bit is set to 0), and the location y set to 1
  • A thread P0 updates the Translation Table Descriptor PTE(x), setting the valid bit and changing its output address to PA(y), and sets the flag z to 1
  • A thread P1 reads the flag z and then executes a load to location x

While executing the load of location x, if P1 read from the initial value of PTE(x) it will raise fault because initially PTE(x) had valid:0.

Using litmus7 to run the MP-pte test

We can now run the MP-pte litmus test in the same way as MP. Assuming this test is stored in a file MP-pte.litmus:

~/bin/litmus7 -mach kvm-aarch64 -variant fatal MP-pte.litmus

After executing the test in a VM, we will again see a summary of the final states that were observed.

Tue 19 Sep 2023 12:23:24 BST
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Results for ../catflap/tests/vmsa/MP-pte.litmus %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
AArch64 MP-pte

{
 [PTE(x)]=(oa:PA(x), valid:0);
 [y]=1;
 pteval_t 0:X1=(oa:PA(y)); 0:X2=PTE(x); 0:X8=z;
 1:X8=z; 1:X3=x;
}
 P0          | P1              ;
 STR X1,[X2] | LDR W7,[X8]     ;
 MOV W7,#1   | L0: LDR W4,[X3] ;
 STR W7,[X8] |                 ;

locations [1:X4;]
exists (1:X7=1 /\ fault(P1:L0,x))
Generated assembler
#START _litmus_P0
	nop
	str x1,[x2]
	mov w0,#1
	str w0,[x3]
	nop
#START _litmus_P1
	nop
	ldr w1,[x3]
1:
	ldr w0,[x2]
	nop
/opt/homebrew/bin/qemu-system-aarch64 -nodefaults -machine virt -accel hvf -cpu host -device virtio-serial-device -device virtconsole,chardev=ctd -chardev testdev,id=ctd -device pci-testdev -display none -serial stdio -kernel t/./MP-pte.flat -smp 4 -append -q # -initrd /var/folders/cl/7m7_nx7j0yn5tnsnwydtglcr0000gn/T/tmp.s1ggNsAc
Test MP-pte Allowed
Histogram (4 states)
12672 :>1:X4=1; 1:X7=1; ~fault(P1:L0,x);
1669479:>1:X4=0; 1:X7=0; fault(P1:L0,x,MMU:Translation);
317847:>1:X4=1; 1:X7=0; ~fault(P1:L0,x);
2     *>1:X4=0; 1:X7=1; fault(P1:L0,x,MMU:Translation);
Ok

Witnesses
Positive: 2, Negative: 1999998
Condition exists (1:X7=1 /\ fault(P1:L0,x)) is validated
Hash=0effe3c5c9512116cdc3770032f132fd
Observation MP-pte Sometimes 2 1999998
Topology 2     :> part=0 [[0],[1]]
Faults MP-pte 1669481 P1:1669481
Time MP-pte 6.65

EXIT: STATUS=1
Revision 7ea1aaf8e1e99e8007d4670667648037624dae1f, version 7.56+03
Command line: litmus7 -a 4 -variant fatal,self -mach kvm-armv8.1 -kinds /Users/niknik01/workspace/mm/catflap/tests/vmsa.kinds -o /Users/niknik01/workspace/mm/kvm-unit-tests/t ../catflap/tests/vmsa/MP-pte.litmus
Parameters
#define SIZE_OF_TEST 5000
#define NUMBER_OF_RUN 200
#define AVAIL 4
/* gcc options: -Wall -std=gnu99 -std=gnu99 -ffreestanding -I $(SRCDIR)/lib -I $(SRCDIR)/libfdt -Wall -Werror  -fomit-frame-pointer -Wno-frame-address   -fno-pic  -no-pie -Wmissing-parameter-type -Wold-style-declaration -Woverride-init -O2 $(call optional-ccopt, -mno-outline-atomics) -march=armv8.1-a */
/* barrier: userfence */
/* alloc: static */
/* proc used: 4 */
LITMUSOPTS=
Tue 19 Sep 2023 12:23:31 BST

Extended litmus test syntax – Access flag, dirty bit and hardware management

In general a Translation Table Descriptor such as PTE(x) is a tuple (oa, valid, af, db, dbm, attrs), where:

  • oa is the output address
  • valid is the valid bit
  • af is the access flag
  • db is the dirty bit
  • dbm is the dirty bit management bit
  • attrs is a tuple of the memory attributes

The af, db and dbm bits behave as follows.

  • When FEAT_HAFDBS is not implemented or when TCR_EL1.{HA,HD}=={0,0}:
    • When PTE(x) has the access flag bit af set to 0, a load or store to x raises an Access Flag fault.
    • When PTE(x) has the dirty bit db set to 0, a store to x raises a Permission fault.
    In these cases, software is expected to deal with these faults by updating the Translation Table Descriptor with explicit stores or atomics.
  • When FEAT_HAFDBS is implemented and TCR_EL1.HA==1:
    • When PTE(x) has the access flag af set to 0, a load or store to x results in the MMU setting the access flag af to 1 of PTE(x).
    • When PTE(x) has the dirty bit db set to 0, a store to x raises a Permission fault.
  • When FEAT_HAFDBS is implemented and TCR_EL1.{HA,HD}=={1,1}:
    • When PTE(x) has the access flag af set to 0, a load or store to x results in:
      1. the MMU setting the access flag af of PTE(x) to 1, and
      2. and the execution continuing to the next instruction without a fault.
    • When PTE(x) has the dirty bit db set to 0 and the dirty bit management dbm bit set to 0, a store to x raises a Permission fault.
    • When PTE(x) has the dirty bit db set to 0 and the dirty bit management dbm bit set to 1, a store to x results in:
      1. the MMU setting the dirty bit db of PTE(x) to 1, and
      2. and the execution continuing to the next instruction without a fault.

For stage 1 translation, the dirty bit in our litmus tests corresponds to the AP[2] bit of the Translation Table Descriptor.

The default value for PTE(x) is (oa:PA(x), valid:1, af:1, db:1, dbm:0, attrs:(Normal,Inner-shareable,Inner-write-back,Outer-write-back)). When not explicitly initialised, PTE(x) will have its default value. If certain fields are omitted (for example, in the initial state or the exists clause), they are assumed to have their default values.

When running tests with litmus7, the directive TTHM at the beginning of the test enables hardware management of the accesss flag bit and/or the dirty bit. For example, in the presence of TTHM=HD, will check if FEAT_HAFDBS is implemented and set TCR_EL1.{HD} to 1, if the underlying hardware does not implement FEAT_HAFDBS, it cannot run this test, an error message will be printed and the execution will stop.

Now, consider the following test:

AArch64 LDR-AF
TTHM=HA
{
 int x=1;
 [PTE(x)]=(af:0);
 0:X2=x;
}
P0              ;
L0: LDR W0,[X2] ;
forall 0:X0=1 /\ [PTE(x)]=(af:1) /\ ~fault(P0:L0,x)

Upon execution of this test, we check that the underlying hardware implements FEAT_HAFDBS. Then we initialise TCR_EL1.HA to 1, x to 1, and the access flag af of PTE(x) to 0 (since the other fields are omitted, they will be set to their default values; PTE(x) will be valid and its output address oa will be PA(x)).

P0 performs a load of x.

The post-condition checks that at the end of all executions (see keyword “forall”):

  • The register X0 of P0 is 1
  • PTE(x) has the access flag af set to 1
  • There was no fault at label L0 raised by accessing location x.

Extended litmus test syntax – Memory attributes

As briefly mentioned before, the tuple PTE(x) has the field attrs which controls the memory attributes for accesses to memory locations.

In VMSAv8-64 a memory location can be mapped as normal memory or device memory. A memory location in normal memory can have the following attributes:

  • Write-back
  • Write-through
  • Non-cacheable

Which are specified separately for caches in the inner-shareability domain and the outer-shareability domain. As an example, a location in normal memory can be Inner-write-through and Outer-write-back.

A memory location in device memory is always Non-cacheable and, in addition, it can have one of the following attributes:

  • Device-nGnRnE: non-Gathering, non-Reordering, No Early write Acknowledgement.
  • Device-nGnRE: non-Gathering, non-Reordering, Early write Acknowledgement.
  • Device-nGRE: non-Gathering, Reordering, Early write Acknowledgement.
  • Device-GRE: Gathering, Reordering, Early write Acknowledgement.

For further details on the memory attributes, please refer to the Arm Architecture Reference Manual for A-profile architecture [4].

As an example, take this litmus test:
 

AArch64 MP-xWB-yWT
{
 PTE(x)=(attrs:(Inner-shareable,Inner-write-back,Outer-write-back));
 PTE(y)=(attrs:(Inner-shareable,Inner-write-through,Outer-write-through));
 0:X1=x; 0:X3=y;
 1:X1=y; 1:X3=x;
}
 P0          | P1          ;
 MOV W0,#1   | LDR W0,[X1] ;
 STR W0,[X1] | LDR W2,[X3] ;
 MOV W2,#1   |             ;
 STR W2,[X3] |             ;
exists (1:X0=1 /\ 1:X2=0)

MP-xWB-yWT is very similar to the MP test we used before, the location is x is mapped as a Normal, Inner-Shareable, Inner-write-back and Outer-write-back memory location which is the default memory attributes, however, y is mapped as a Normal, Inner-Shareable, Inner-write-back and Outer-write-back memory location.

litmus7 generates the code to initialise the memory attributes of PTE(x) and PTE(y) before executing the test. We can use the same steps as before to run the MP-xWB-yWT test:

Tue 19 Sep 2023 11:57:58 BST
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Results for ../../misc/blog/MP-xWB-yWT.litmus %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
AArch64 MP-xWB-yWT

{
 [PTE(x)]=(oa:PA(x), Inner-shareable, Inner-write-back, Outer-write-back);
 [PTE(y)]=(oa:PA(y), Inner-shareable, Inner-write-through, Outer-write-through);
 0:X1=x; 0:X3=y;
 1:X1=y; 1:X3=x;
}
 P0          | P1          ;
 MOV W0,#1   | LDR W0,[X1] ;
 STR W0,[X1] | LDR W2,[X3] ;
 MOV W2,#1   |             ;
 STR W2,[X3] |             ;

exists (1:X0=1 /\ 1:X2=0)
Generated assembler
#START _litmus_P0
	nop
	mov w0,#1
	str w0,[x2]
	mov w1,#1
	str w1,[x3]
	nop
#START _litmus_P1
	nop
	ldr w0,[x2]
	ldr w1,[x3]
	nop
/opt/homebrew/bin/qemu-system-aarch64 -nodefaults -machine virt -accel hvf -cpu host -device virtio-serial-device -device virtconsole,chardev=ctd -chardev testdev,id=ctd -device pci-testdev -display none -serial stdio -kernel t/./MP-xWB-yWT.flat -smp 4 -append -q # -initrd /var/folders/cl/7m7_nx7j0yn5tnsnwydtglcr0000gn/T/tmp.tsyINJEb
Test MP-xWB-yWT Allowed
Histogram (4 states)
55877 :>1:X0=0; 1:X2=1;
38767 *>1:X0=1; 1:X2=0;
1112085:>1:X0=0; 1:X2=0;
793271:>1:X0=1; 1:X2=1;
Ok

Witnesses
Positive: 38767, Negative: 1961233
Condition exists (1:X0=1 /\ 1:X2=0) is validated
Hash=62d5771f8f693f5ddae49b9f14aa6dca
Observation MP-xWB-yWT Sometimes 38767 1961233
Topology 38767 :> part=0 [[0],[1]]
Faults MP-xWB-yWT 1041965 P0:1041965
Time MP-xWB-yWT 5.68

EXIT: STATUS=1
Revision 0216f2eeedc4e0a313b4b778a465b3a61a9dfde9, version 7.56+03
Command line: litmus7 -a 4 -variant fatal,self -mach kvm-armv8.1 -kinds /Users/niknik01/workspace/mm/catflap/tests/vmsa.kinds -o /Users/niknik01/workspace/mm/kvm-unit-tests/t ../../misc/blog/MP-xWB-yWT.litmus
Parameters
#define SIZE_OF_TEST 5000
#define NUMBER_OF_RUN 200
#define AVAIL 4
/* gcc options: -Wall -std=gnu99 -std=gnu99 -ffreestanding -I $(SRCDIR)/lib -I $(SRCDIR)/libfdt -Wall -Werror  -fomit-frame-pointer -Wno-frame-address   -fno-pic  -no-pie -Wmissing-parameter-type -Wold-style-declaration -Woverride-init -O2 $(call optional-ccopt, -mno-outline-atomics) -march=armv8.1-a */
/* barrier: userfence */
/* alloc: static */
/* proc used: 4 */
LITMUSOPTS=
Tue 19 Sep 2023 11:58:05 BST

At this point, it is worth noting that when running such tests using KVM or other hypervisors, one has to consider the effects of stage2 translation. For more information on the semantics of memory attributes at stage1 and stage2 translation please refer to the Arm Architecture Reference Manual for A-profile architecture [4].

Acknowledgments

We thank Mark Rutland for the description of FEAT_HAFDBS.

[1] https://github.com/herd/herdtools7/tree/KVM

[2] https://www.linux-kvm.org/page/KVM-unit-tests

[3] https://www.linux-kvm.org/page/Main_Page

[4] https://developer.arm.com/documentation/ddi0487/latest/

Anonymous
Architectures and Processors blog
  • Introducing GICv5: Scalable and secure interrupt management for Arm

    Christoffer Dall
    Christoffer Dall
    Introducing Arm GICv5: a scalable, hypervisor-free interrupt controller for modern multi-core systems with improved virtualization and real-time support.
    • April 28, 2025
  • Getting started with AARCHMRS Features.json using Python

    Joh
    Joh
    A high-level introduction to the Arm Architecture Machine Readable Specification (AARCHMRS) Features.json with some examples to interpret and start to work with the available data using Python.
    • April 8, 2025
  • Advancing server manageability on Arm Neoverse Compute Subsystem (CSS) with OpenBMC

    Samer El-Haj-Mahmoud
    Samer El-Haj-Mahmoud
    Arm and 9elements Cyber Security have brought a prototype of OpenBMC to the Arm Neoverse Compute Subsystem (CSS) to advancing server manageability.
    • January 28, 2025