In a previous post, we used litmus7 to generate executable binaries and run litmus tests on existing hardware. These executable binaries run as standalone applications in Exception Level 0 (EL0).
litmus7
Recently, we developed litmus tests to exercise and demonstrate the ordering semantics of the Virtual Memory System Architecture. Such tests assume full control of the translation tables and how the Memory Management Unit (MMU) handles virtual addresses translation and protection. This functionality is typically controlled by the operating system that runs in Exception Level 1 (EL1). To run such litmus tests on existing hardware, just like operating systems, we need to able to generate and run executable binaries in EL1.
EL1
To achieve this, we extended litmus7 and added support for building binaries that run in EL1. More specifically, litmus7 in -mode kvm generates source code that compiles and links with kvm-unit-tests [1]. kvm-unit-tests, as its name indicates, is a collection of unit tests for the Kernel-based Virtual Machine (KVM) [2]. For litmus7, kvm-unit-tests provides the code necessary to boot, initialise and fully control the system. By doing that, we can run tests that modify Translation Table Descriptors, raise MMU faults and report or even handle such faults.
-mode kvm
To run litmus tests, litmus7 in -mode kvm relies on hardware virtualisation, a feature implemented by many Arm CPUs and supported by most operating systems. In this post, we explain how we use litmus7 on Linux based operating system with Kernel-based Virtual Machine (KVM) to build and run tests.
The following instructions have been tested on Ubuntu 22.04.2 LTS. However, the concepts are similar for other Linux distributions.
sudo apt install build-essential qemu-system-arm
kvm
sudo usermod -a -G kvm ${USER}
kvm-unit-tests
./run_tests.sh
As a first example, we take our traditional MP litmus test and run it with -mode kvm.
MP
mkdir litmus-tests ~/bin/litmus7 -mach kvm-aarch64 -variant precise -o litmus-tests mp.litmus
make
MP.flat
cd litmus-tests && make && cd ..
sh ./litmus-tests/run.sh
This will produce the output:
Wed 21 Oct 11:56:20 UTC 2020 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Results for MP.litmus % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% AArch64 MP "PodWW Rfe PodRR Fre" {0:X1=x; 0:X3=y; 1:X1=y; 1:X3=x;} P0 | P1 ; MOV W0,#1 | LDR W0,[X1] ; STR W0,[X1] | LDR W2,[X3] ; MOV W2,#1 | ; STR W2,[X3] | ; exists (1:X0=1 /\ 1:X2=0) Generated assembler #START _litmus_P0 nop mov w3,#1 str w3,[x0] mov w2,#1 str w2,[x1] nop #START _litmus_P1 nop ldr w4,[x1] ldr w5,[x0] nop /usr/bin/qemu-system-aarch64 -nodefaults -machine virt,gic-version=host,accel=kvm -cpu host -device virtio-serial-device -device virtconsole,chardev=ctd -chardev testdev,id=ctd -device pci-testdev -display none -serial stdio -kernel MP/MP.flat -smp 4 -append -q # -initrd /tmp/tmp.g5R4XsS1BN Test MP Allowed Histogram (4 states) 910548:>1:X0=0; 1:X2=0; 56482 :>1:X0=0; 1:X2=1; 38531 *>1:X0=1; 1:X2=0; 994439:>1:X0=1; 1:X2=1; Ok Witnesses Positive: 38531, Negative: 1961469 Condition exists (1:X0=1 /\ 1:X2=0) is validated Hash=211d5b298572012a0869d4ded6a40b7f Cycle=Rfe PodRR Fre PodWW Generator=diycross7 (version 7.54+01(dev)) Com=Rf Fr Orig=PodWW Rfe PodRR Fre Observation MP Sometimes 38531 1961469 Time MP 3.75 Revision 8a99be9348124b52f116a26453a3cf3da5c5fb59, version 7.56+02~dev Command line: /home/niknik01/workspace/mm/herdtools7/_build/litmus/litmus.native -mach kvm-aarch64 -variant precise -o mp mp.litmus Parameters #define SIZE_OF_TEST 5000 #define NUMBER_OF_RUN 200 #define AVAIL 4 /* gcc options: -Wall -std=gnu99 -std=gnu99 -ffreestanding -I $(SRCDIR)/lib -I $(SRCDIR)/libfdt -Wall -Werror -fomit-frame-pointer -Wno-frame-address -fno-pic -no-pie -Wmissing-parameter-type -Wold-style-declaration -Woverride-init -O2 $(call optional-ccopt, -mno-outline-atomics) -pthread */ /* barrier: userfence */ /* launch: changing */ /* affinity: none */ /* alloc: dynamic */ /* memory: direct */ /* stride: 1 */ /* safer: write */ /* preload: random */ /* speedcheck: no */ /* proc used: 4 */ LITMUSOPTS= Wed 21 Oct 11:56:24 UTC 2020
As previously, litmus7 prints the litmus test, the generated assembly code, the command used to run the binary
/usr/bin/qemu-system-aarch64 -nodefaults -machine virt,gic-version=host,accel=kvm -cpu host -device virtio-serial-device -device virtconsole,chardev=ctd -chardev testdev,id=ctd -device pci-testdev -display none -serial stdio -kernel ./litmus-tests/MP.flat -smp 4 -append -q # -initrd /tmp/tmp.yjTA8uEN6k
and the final state of all observed executions of the MP litmus test as before:
Test MP Allowed Histogram (4 states) 1023984:>1:X0=0; 1:X2=0; 8226 :>1:X0=0; 1:X2=1; 6604 *>1:X0=1; 1:X2=0; 961186:>1:X0=1; 1:X2=1; Ok Witnesses Positive: 6604, Negative: 1993396 Condition exists (1:X0=1 /\ 1:X2=0) is validated
In the previous section, we run the traditional MP test, using litmus7 in -mode kvm. In this section, we use a new variant of MP which uses features available only to software running in EL1. First, we give an example of such a litmus test, and explain the new syntax.
Consider the following test:
AArch64 MP-pte { [PTE(x)]=(valid:0); [y]=1; 0:X1=(oa:PA(y),valid:1); 0:X2=PTE(x); 0:X8=z; 1:X8=z; 1:X3=x; } P0 | P1; STR X1,[X2] | LDR W7,[X8] ; MOV W7,#1 | L0:LDR W4,[X3]; STR W7,[X8] | ; locations [1:X4;] exists(1:X7=1 /\ fault(P1:L0,x))
There are several new points worth noting:
MP-pte
[PTE(x)]=(valid:0,oa:PA(x))
x
valid
0
PA(x)
1:X7=1 /\ fault(P1:L0,x)
X7
P1
1
L0
In summary, the test MP-pte is asking if the following execution can be happen:
PTE(x)
y
P0
PA(y)
z
While executing the load of location x, if P1 read from the initial value of PTE(x) it will raise fault because initially PTE(x) had valid:0.
valid:0
We can now run the MP-pte litmus test in the same way as MP. Assuming this test is stored in a file MP-pte.litmus:
MP-pte.litmus
~/bin/litmus7 -mach kvm-aarch64 -variant fatal MP-pte.litmus
After executing the test in a VM, we will again see a summary of the final states that were observed.
Tue 19 Sep 2023 12:23:24 BST %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Results for ../catflap/tests/vmsa/MP-pte.litmus % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% AArch64 MP-pte { [PTE(x)]=(oa:PA(x), valid:0); [y]=1; pteval_t 0:X1=(oa:PA(y)); 0:X2=PTE(x); 0:X8=z; 1:X8=z; 1:X3=x; } P0 | P1 ; STR X1,[X2] | LDR W7,[X8] ; MOV W7,#1 | L0: LDR W4,[X3] ; STR W7,[X8] | ; locations [1:X4;] exists (1:X7=1 /\ fault(P1:L0,x)) Generated assembler #START _litmus_P0 nop str x1,[x2] mov w0,#1 str w0,[x3] nop #START _litmus_P1 nop ldr w1,[x3] 1: ldr w0,[x2] nop /opt/homebrew/bin/qemu-system-aarch64 -nodefaults -machine virt -accel hvf -cpu host -device virtio-serial-device -device virtconsole,chardev=ctd -chardev testdev,id=ctd -device pci-testdev -display none -serial stdio -kernel t/./MP-pte.flat -smp 4 -append -q # -initrd /var/folders/cl/7m7_nx7j0yn5tnsnwydtglcr0000gn/T/tmp.s1ggNsAc Test MP-pte Allowed Histogram (4 states) 12672 :>1:X4=1; 1:X7=1; ~fault(P1:L0,x); 1669479:>1:X4=0; 1:X7=0; fault(P1:L0,x,MMU:Translation); 317847:>1:X4=1; 1:X7=0; ~fault(P1:L0,x); 2 *>1:X4=0; 1:X7=1; fault(P1:L0,x,MMU:Translation); Ok Witnesses Positive: 2, Negative: 1999998 Condition exists (1:X7=1 /\ fault(P1:L0,x)) is validated Hash=0effe3c5c9512116cdc3770032f132fd Observation MP-pte Sometimes 2 1999998 Topology 2 :> part=0 [[0],[1]] Faults MP-pte 1669481 P1:1669481 Time MP-pte 6.65 EXIT: STATUS=1 Revision 7ea1aaf8e1e99e8007d4670667648037624dae1f, version 7.56+03 Command line: litmus7 -a 4 -variant fatal,self -mach kvm-armv8.1 -kinds /Users/niknik01/workspace/mm/catflap/tests/vmsa.kinds -o /Users/niknik01/workspace/mm/kvm-unit-tests/t ../catflap/tests/vmsa/MP-pte.litmus Parameters #define SIZE_OF_TEST 5000 #define NUMBER_OF_RUN 200 #define AVAIL 4 /* gcc options: -Wall -std=gnu99 -std=gnu99 -ffreestanding -I $(SRCDIR)/lib -I $(SRCDIR)/libfdt -Wall -Werror -fomit-frame-pointer -Wno-frame-address -fno-pic -no-pie -Wmissing-parameter-type -Wold-style-declaration -Woverride-init -O2 $(call optional-ccopt, -mno-outline-atomics) -march=armv8.1-a */ /* barrier: userfence */ /* alloc: static */ /* proc used: 4 */ LITMUSOPTS= Tue 19 Sep 2023 12:23:31 BST
In general a Translation Table Descriptor such as PTE(x) is a tuple (oa, valid, af, db, dbm, attrs), where:
(oa, valid, af, db, dbm, attrs)
oa
af
db
dbm
attrs
The af, db and dbm bits behave as follows.
TCR_EL1.{HA,HD}=={0,0}
TCR_EL1.HA==1
TCR_EL1.{HA,HD}=={1,1}
For stage 1 translation, the dirty bit in our litmus tests corresponds to the AP[2] bit of the Translation Table Descriptor.
The default value for PTE(x) is (oa:PA(x), valid:1, af:1, db:1, dbm:0, attrs:(Normal,Inner-shareable,Inner-write-back,Outer-write-back)). When not explicitly initialised, PTE(x) will have its default value. If certain fields are omitted (for example, in the initial state or the exists clause), they are assumed to have their default values.
(oa:PA(x), valid:1, af:1, db:1, dbm:0, attrs:(Normal,Inner-shareable,Inner-write-back,Outer-write-back))
When running tests with litmus7, the directive TTHM at the beginning of the test enables hardware management of the accesss flag bit and/or the dirty bit. For example, in the presence of TTHM=HD, will check if FEAT_HAFDBS is implemented and set TCR_EL1.{HD} to 1, if the underlying hardware does not implement FEAT_HAFDBS, it cannot run this test, an error message will be printed and the execution will stop.
TTHM
TTHM=HD
TCR_EL1.{HD}
Now, consider the following test:
AArch64 LDR-AF TTHM=HA { int x=1; [PTE(x)]=(af:0); 0:X2=x; } P0 ; L0: LDR W0,[X2] ; forall 0:X0=1 /\ [PTE(x)]=(af:1) /\ ~fault(P0:L0,x)
Upon execution of this test, we check that the underlying hardware implements FEAT_HAFDBS. Then we initialise TCR_EL1.HA to 1, x to 1, and the access flag af of PTE(x) to 0 (since the other fields are omitted, they will be set to their default values; PTE(x) will be valid and its output address oa will be PA(x)).
TCR_EL1.HA
P0 performs a load of x.
The post-condition checks that at the end of all executions (see keyword “forall”):
“forall”
X0
As briefly mentioned before, the tuple PTE(x) has the field attrs which controls the memory attributes for accesses to memory locations.
In VMSAv8-64 a memory location can be mapped as normal memory or device memory. A memory location in normal memory can have the following attributes:
Write-back
Write-through
Non-cacheable
Which are specified separately for caches in the inner-shareability domain and the outer-shareability domain. As an example, a location in normal memory can be Inner-write-through and Outer-write-back.
Inner-write-through
Outer-write-back
A memory location in device memory is always Non-cacheable and, in addition, it can have one of the following attributes:
Device-nGnRnE
Device-nGnRE
Device-nGRE
Device-GRE
For further details on the memory attributes, please refer to the Arm Architecture Reference Manual for A-profile architecture [4].
As an example, take this litmus test:
AArch64 MP-xWB-yWT { PTE(x)=(attrs:(Inner-shareable,Inner-write-back,Outer-write-back)); PTE(y)=(attrs:(Inner-shareable,Inner-write-through,Outer-write-through)); 0:X1=x; 0:X3=y; 1:X1=y; 1:X3=x; } P0 | P1 ; MOV W0,#1 | LDR W0,[X1] ; STR W0,[X1] | LDR W2,[X3] ; MOV W2,#1 | ; STR W2,[X3] | ; exists (1:X0=1 /\ 1:X2=0)
MP-xWB-yWT is very similar to the MP test we used before, the location is x is mapped as a Normal, Inner-Shareable, Inner-write-back and Outer-write-back memory location which is the default memory attributes, however, y is mapped as a Normal, Inner-Shareable, Inner-write-back and Outer-write-back memory location.
MP-xWB-yWT
litmus7 generates the code to initialise the memory attributes of PTE(x) and PTE(y) before executing the test. We can use the same steps as before to run the MP-xWB-yWT test:
PTE(y)
Tue 19 Sep 2023 11:57:58 BST %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Results for ../../misc/blog/MP-xWB-yWT.litmus % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% AArch64 MP-xWB-yWT { [PTE(x)]=(oa:PA(x), Inner-shareable, Inner-write-back, Outer-write-back); [PTE(y)]=(oa:PA(y), Inner-shareable, Inner-write-through, Outer-write-through); 0:X1=x; 0:X3=y; 1:X1=y; 1:X3=x; } P0 | P1 ; MOV W0,#1 | LDR W0,[X1] ; STR W0,[X1] | LDR W2,[X3] ; MOV W2,#1 | ; STR W2,[X3] | ; exists (1:X0=1 /\ 1:X2=0) Generated assembler #START _litmus_P0 nop mov w0,#1 str w0,[x2] mov w1,#1 str w1,[x3] nop #START _litmus_P1 nop ldr w0,[x2] ldr w1,[x3] nop /opt/homebrew/bin/qemu-system-aarch64 -nodefaults -machine virt -accel hvf -cpu host -device virtio-serial-device -device virtconsole,chardev=ctd -chardev testdev,id=ctd -device pci-testdev -display none -serial stdio -kernel t/./MP-xWB-yWT.flat -smp 4 -append -q # -initrd /var/folders/cl/7m7_nx7j0yn5tnsnwydtglcr0000gn/T/tmp.tsyINJEb Test MP-xWB-yWT Allowed Histogram (4 states) 55877 :>1:X0=0; 1:X2=1; 38767 *>1:X0=1; 1:X2=0; 1112085:>1:X0=0; 1:X2=0; 793271:>1:X0=1; 1:X2=1; Ok Witnesses Positive: 38767, Negative: 1961233 Condition exists (1:X0=1 /\ 1:X2=0) is validated Hash=62d5771f8f693f5ddae49b9f14aa6dca Observation MP-xWB-yWT Sometimes 38767 1961233 Topology 38767 :> part=0 [[0],[1]] Faults MP-xWB-yWT 1041965 P0:1041965 Time MP-xWB-yWT 5.68 EXIT: STATUS=1 Revision 0216f2eeedc4e0a313b4b778a465b3a61a9dfde9, version 7.56+03 Command line: litmus7 -a 4 -variant fatal,self -mach kvm-armv8.1 -kinds /Users/niknik01/workspace/mm/catflap/tests/vmsa.kinds -o /Users/niknik01/workspace/mm/kvm-unit-tests/t ../../misc/blog/MP-xWB-yWT.litmus Parameters #define SIZE_OF_TEST 5000 #define NUMBER_OF_RUN 200 #define AVAIL 4 /* gcc options: -Wall -std=gnu99 -std=gnu99 -ffreestanding -I $(SRCDIR)/lib -I $(SRCDIR)/libfdt -Wall -Werror -fomit-frame-pointer -Wno-frame-address -fno-pic -no-pie -Wmissing-parameter-type -Wold-style-declaration -Woverride-init -O2 $(call optional-ccopt, -mno-outline-atomics) -march=armv8.1-a */ /* barrier: userfence */ /* alloc: static */ /* proc used: 4 */ LITMUSOPTS= Tue 19 Sep 2023 11:58:05 BST
At this point, it is worth noting that when running such tests using KVM or other hypervisors, one has to consider the effects of stage2 translation. For more information on the semantics of memory attributes at stage1 and stage2 translation please refer to the Arm Architecture Reference Manual for A-profile architecture [4].
We thank Mark Rutland for the description of FEAT_HAFDBS.
[1] https://github.com/herd/herdtools7/tree/KVM
[2] https://www.linux-kvm.org/page/KVM-unit-tests
[3] https://www.linux-kvm.org/page/Main_Page
[4] https://developer.arm.com/documentation/ddi0487/latest/