Arm Community
Arm Community
  • Site
  • User
  • Site
  • Search
  • User
  • Groups
    • Arm Research
    • DesignStart
    • Education Hub
    • Innovation
    • Open Source Software and Platforms
  • Forums
    • AI and ML forum
    • Architectures and Processors forum
    • Arm Development Platforms forum
    • Arm Development Studio forum
    • Arm Virtual Hardware forum
    • Automotive forum
    • Compilers and Libraries forum
    • Graphics, Gaming, and VR forum
    • High Performance Computing (HPC) forum
    • Infrastructure Solutions forum
    • Internet of Things (IoT) forum
    • Keil forum
    • Morello Forum
    • Operating Systems forum
    • SoC Design and Simulation forum
    • 中文社区论区
  • Blogs
    • AI and ML blog
    • Announcements
    • Architectures and Processors blog
    • Automotive blog
    • Graphics, Gaming, and VR blog
    • High Performance Computing (HPC) blog
    • Infrastructure Solutions blog
    • Innovation blog
    • Internet of Things (IoT) blog
    • Mobile blog
    • Operating Systems blog
    • Research Articles
    • SoC Design and Simulation blog
    • Smart Homes
    • Tools, Software and IDEs blog
    • Works on Arm blog
    • 中文社区博客
  • Support
    • Open a support case
    • Documentation
    • Downloads
    • Training
    • Arm Approved program
    • Arm Design Reviews
  • Community Help
  • More
  • Cancel
Arm Community blogs
Arm Community blogs
Architectures and Processors blog Expanding the Memory Model Tools to System-level architecture
  • Blogs
  • Mentions
  • Sub-Groups
  • Tags
  • Jump...
  • Cancel
More blogs in Arm Community blogs
  • AI and ML blog

  • Announcements

  • Architectures and Processors blog

  • Automotive blog

  • Embedded blog

  • Graphics, Gaming, and VR blog

  • High Performance Computing (HPC) blog

  • Infrastructure Solutions blog

  • Internet of Things (IoT) blog

  • Operating Systems blog

  • SoC Design and Simulation blog

  • Tools, Software and IDEs blog

Tell us what you think
Tags
  • Architecture
  • Memory Architecture
Actions
  • RSS
  • More
  • Cancel
Related blog posts
Related forum threads

Expanding the Memory Model Tools to System-level architecture

Jade Alglave
Jade Alglave
October 26, 2020
Please note: This blog post was written in tandem with Nikos Nikoleris.

In a previous post, we used litmus7 to generate source code and run litmus tests on hardware. However, in certain cases, a user might need to control more than just that the instructions that execute within a standalone application.

In addition to generating application-level source code as before, we can use litmus7 to control several aspects of the system software. For example, of particular interest to the memory model are the implications of address translation to execution of applications. In such cases, litmus7 can generate tests with full control of the translation tables and how the Memory Management Unit handles virtual addresses translation and protection. Such functionality requires access to instructions and memory that are typically available to the operating system that runs in a higher Exception Level (EL1).

The litmus7 tool can also generate the sources for a binary that runs as a Virtual Machine (VM). As a result, litmus7 can control the code that executes EL0 as well as in EL1 and control services such as memory allocation and therefore virtual address translation.

To run litmus tests in a VM, litmus7 builds on kvm-unit-tests [1]. The kvm-unit-tests project was originally intended to test the Kernel-based Virtual Machine (KVM) [2] features, and litmus7 uses its basic functionality to encapsulate litmus tests into small operating systems. By doing that, we can control and make changes to functionality that typically would be provided by an operating system running in EL1. For example, the test sets up and control translation tables, the Memory Management Unit (MMU) and any device that we make available to the VM.

Prerequisites

The following instructions have been tested on Ubuntu 18.04. However, the concepts are similar for other Linux distributions and operating systems with support for Virtual Machines.

  1. Install the KVM branch of herdtools7:

https://github.com/herd/herdtools7/tree/KVM

  1. Install KVM setup

You need to have essential build tools installed as well as QEMU. In Ubuntu 18.04 you can install these dependencies by executing:

sudo apt install build-essential qemu-system-arm

Also your user needs to have permissions to run VMs using KVM and typically that can be achieved by making sure that your user is a member of the kvm Unix group:

sudo usermod -a -G kvm ${USER}

First, we first need to download a copy of the kvm-unit-tests and configure it in our system. We can fetch the latest copy of kvm-unit-tests using the following link:
https://gitlab.com/kvm-unit-tests/kvm-unit-tests/-/archive/master/kvm-unit-tests-master.tar.gz

And extract the files from the compressed tarball:

tar xzvf kvm-unit-tests-master.tar.gz && cd kvm-unit-tests-master

Finally, we need to build the source code in kvm-unit-tests:

./configure && make

To test that kvm-unit-tests work in our system, we can simply run:

./run_tests.sh

Which will run the tests which are included by default in kvm-unit-tests. If at least one of them was run successfully we know that our setup is working correctly, and we can move on to next part which is running litmus tests using KVM.

Using litmus7 to run tests inside a VM

To begin with, we take our traditional MP litmus test and run it inside a VM. We use litmus7 to generate the sources for MP inside a folder in kvm-unit-tests:

mkdir litmus-tests
~/bin/litmus7 -mach kvm-aarch64 -variant precise -o litmus-tests mp.litmus

The litmus7 tool has now produced the sources for the litmus test inside the folder “litmus-tests”. We will then use make and the generated Makefile to produce a binary that we can then run inside a VM:

cd litmus-tests && make && cd ..

In this example, this binary is stored in the file “MP.flat” and resides inside the folder “litmus-tests”. Then we can run the MP inside a VM using the command:

sh ./litmus-tests/run.sh

Which gives the output:

Wed 21 Oct 11:56:20 UTC 2020
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Results for MP.litmus %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
AArch64 MP
"PodWW Rfe PodRR Fre"

{0:X1=x; 0:X3=y; 1:X1=y; 1:X3=x;}

 P0          | P1          ;
 MOV W0,#1   | LDR W0,[X1] ;
 STR W0,[X1] | LDR W2,[X3] ;
 MOV W2,#1   |             ;
 STR W2,[X3] |             ;

exists (1:X0=1 /\ 1:X2=0)
Generated assembler
#START _litmus_P0
	nop
	mov w3,#1
	str w3,[x0]
	mov w2,#1
	str w2,[x1]
	nop
#START _litmus_P1
	nop
	ldr w4,[x1]
	ldr w5,[x0]
	nop
/usr/bin/qemu-system-aarch64 -nodefaults -machine virt,gic-version=host,accel=kvm -cpu host -device virtio-serial-device -device virtconsole,chardev=ctd -chardev testdev,id=ctd -device pci-testdev -display none -serial stdio -kernel MP/MP.flat -smp 4 -append -q # -initrd /tmp/tmp.g5R4XsS1BN
Test MP Allowed
Histogram (4 states)
910548:>1:X0=0; 1:X2=0;
56482 :>1:X0=0; 1:X2=1;
38531 *>1:X0=1; 1:X2=0;
994439:>1:X0=1; 1:X2=1;
Ok

Witnesses
Positive: 38531, Negative: 1961469
Condition exists (1:X0=1 /\ 1:X2=0) is validated
Hash=211d5b298572012a0869d4ded6a40b7f
Cycle=Rfe PodRR Fre PodWW
Generator=diycross7 (version 7.54+01(dev))
Com=Rf Fr
Orig=PodWW Rfe PodRR Fre
Observation MP Sometimes 38531 1961469
Time MP 3.75
Revision 8a99be9348124b52f116a26453a3cf3da5c5fb59, version 7.56+02~dev
Command line: /home/niknik01/workspace/mm/herdtools7/_build/litmus/litmus.native -mach kvm-aarch64 -variant precise -o mp mp.litmus
Parameters
#define SIZE_OF_TEST 5000
#define NUMBER_OF_RUN 200
#define AVAIL 4
/* gcc options: -Wall -std=gnu99 -std=gnu99 -ffreestanding -I $(SRCDIR)/lib -I $(SRCDIR)/libfdt -Wall -Werror  -fomit-frame-pointer -Wno-frame-address   -fno-pic  -no-pie -Wmissing-parameter-type -Wold-style-declaration -Woverride-init -O2 $(call optional-ccopt, -mno-outline-atomics) -pthread */
/* barrier: userfence */
/* launch: changing */
/* affinity: none */
/* alloc: dynamic */
/* memory: direct */
/* stride: 1 */
/* safer: write */
/* preload: random */
/* speedcheck: no */
/* proc used: 4 */
LITMUSOPTS=
Wed 21 Oct 11:56:24 UTC 2020

As previously, the generated output, includes the test itself, the generated assembly code, followed by the command used to run the VM:

/usr/bin/qemu-system-aarch64 -nodefaults -machine virt,gic-version=host,accel=kvm -cpu host -device virtio-serial-device -device virtconsole,chardev=ctd -chardev testdev,id=ctd -device pci-testdev -display none -serial stdio -kernel ./litmus-tests/MP.flat -smp 4 -append -q # -initrd /tmp/tmp.yjTA8uEN6k

And the final states of all executions of the MP litmus test as before:

Test MP Allowed
Histogram (4 states)
1023984:>1:X0=0; 1:X2=0;
8226  :>1:X0=0; 1:X2=1;
6604  *>1:X0=1; 1:X2=0;
961186:>1:X0=1; 1:X2=1;
Ok

Witnesses
Positive: 6604, Negative: 1993396
Condition exists (1:X0=1 /\ 1:X2=0) is validated

So far, we have only run the MP test again, using a different litmus7 mode. But now we can move on to running tests which do make interesting use of our new capability to run code in EL1. We will first look at how the litmus test syntax is extended to ask questions about scenarios relative to address translation and page tables.

Extended litmus tests syntax – first step

Consider the following variant of the message passing idiom:

AArch64 MP-pte
{ 
pte_x=(valid:0,oa:phy_x);
0:X2=pte_x;
0:X1=(valid:1,oa:phy_y);
1:X3=x;  
y=1;      
0:X8=z; 1:X8=z;
}       
P0          | P1;
STR X1,[X2] | LDR W7,[X8]   ;
MOV W7,#1   | L0:LDR W4,[X3];
STR W7,[X8] |               ;
locations [1:X4;]
exists(1:X7=1 /\ fault(P1:L0,x))

There are several new points worth noting:

  • The initial state can now refer to a new type of value, which corresponds to page table entries. For example, in the initial state for MP-pte above, we have: pte_x = (valid:0,oa:phy_x). In words, this says that the page table entry for x is initialized so that its valid bit is 0, and so that it points to a physical address phy_x
  • In the final clause, we can now specify whether an access is expected to fault. We can do this using labels in the body of the test (see label L0 on the previous thread P1). For example, in the final clause for MP-pte above, we have: fault(P1:L0,x) In words, this asks whether the access at line L0 on thread P1 can provoke a fault relative to address x.

In summary, the test MP-pte is describing the following scenario:

  • Starting from an initial state where the page table entry pte_x (which by default points to the physical address phy_x) is invalid (because the valid bit is set to 0), and where y holds the value 1
  • A thread P0 updates the page table entry pte_x, making it not only valid but also pointing to a different physical address phy_y, and sets up a flag z to 1
  • A thread P1 reads the flag, and reads x

What is P1 to see? In particular, can it be that P1 reads the initial value for pte_x? If that is the case, then the access at L0 on P1 will fault, because the initial value for pte_x is invalid.

Using litmus7 to run the MP-pte test

We can now run our MP-pte litmus test using the previous instructions, or by calling litmus7 directly. Assuming this test is in the file MP-pte.litmus:

~/bin/litmus7 -mach kvm-aarch64 -variant precise MP-pte.litmus

After executing the test in a VM, we will again see a summary of the final states that were observed.

Wed Oct 21 10:19:09 UTC 2020
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Results for MP-pte.litmus %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
AArch64 MP-pte

{pteval_t pte_x=(oa:phy_x, af:1, db:1, dbm:0, valid:0); 0:X2=pte_x; pteval_t 0:X1=(oa:phy_y, af:1, db:1, dbm:0, valid:1); 1:X3=x; y=1; 0:X8=z; 1:X8=z;}

 P0          | P1              ;
 STR X1,[X2] | LDR W7,[X8]     ;
 MOV W7,#1   | L0: LDR W4,[X3] ;
 STR W7,[X8] |                 ;

locations [1:X4;]
exists (1:X7=1 /\ fault(P1:L0,x))
Generated assembler
#START _litmus_P0
        nop
        str x2,[x1]
        mov w3,#1
        str w3,[x0]
        nop
#START _litmus_P1
        nop
        ldr w5,[x1]
0:
        ldr w4,[x0]
        nop
/usr/bin/qemu-system-aarch64 -nodefaults -machine virt,gic-version=host,accel=kvm -cpu host -device virtio-serial-device -device virtconsole,chardev=ctd -chardev testdev,id=ctd -device pci-testdev -display none -serial stdio -kernel J/./MP-pte.flat -smp 4 -append -q -r 20k -s 500 # -initrd /tmp/tmp.OCQCGXnK5v
Test MP-pte Allowed
Histogram (4 states)
332   *>1:X4=0; 1:X7=1; fault(P1:L0,x);
13634195:>1:X4=1; 1:X7=0;
2172395:>1:X4=0; 1:X7=0; fault(P1:L0,x);
4193078:>1:X4=1; 1:X7=1;
Ok

Witnesses
Positive: 332, Negative: 19999668
Condition exists (1:X7=1 /\ fault(P1:L0,x)) is validated
Hash=0effe3c5c9512116cdc3770032f132fd
Observation MP-pte Sometimes 332 19999668
Faults MP-pte 2172727 P1:2172727
Time MP-pte 57.45
Revision 8a99be9348124b52f116a26453a3cf3da5c5fb59, version 7.56+02~dev
Command line: /opt/KVM/bin/litmus7 -mach kvm-aarch64 -variant precise -o J.tar MP-pte.litmus
Parameters
#define SIZE_OF_TEST 5000
#define NUMBER_OF_RUN 200
#define AVAIL 4
/* gcc options: -Wall -std=gnu99 -std=gnu99 -ffreestanding -I $(SRCDIR)/lib -I $(SRCDIR)/libfdt -Wall -Werror  -fomit-frame-pointer -Wno-frame-address   -fno-pic  -no-pie -Wmissing-parameter-type -Wold-style-declaration -Woverride-init -O2 $(call optional-ccopt, -mno-outline-atomics) -pthread */
/* barrier: userfence */
/* launch: changing */
/* affinity: none */
/* alloc: dynamic */
/* memory: direct */
/* stride: 1 */
/* safer: write */
/* preload: random */
/* speedcheck: no */
/* proc used: 4 */
LITMUSOPTS=-r 20k -s 500
Wed Oct 21 10:20:08 UTC 2020

Extended litmus test syntax – Access flag, dirty bit and hardware management

In general a page table entry value such as pte_x is a tuple (oa, valid, af, db, dbm), where:

  • oa is the “output address”
  • valid is the valid bit
  • af is the Access Flag
  • db is the Dirty Bit
  • dbm is the Dirty Bit Management bit

Those af, db and dbm bits behave as follows.

  • Without hardware management (on older CPUs, or when ARMv8.1-TTHM is implemented yet TCR_ELx.{HA,HD} == {0,0}):
  • A load or store to x where pte_x has the access flag af set to 0 raises a permission fault;
  • A store to x where pte_x has the dirty bit db set to 0 raises a permission fault. Software is expected to deal with this by updating the translation tables with explicit stores or atomics.
  • With hardware management (that is, when ARMv8.1-TTHM is implemented) where TCR_ELx.HA = 1:
  • A load or store to x where pte_x has the access flag af set to 0 results in the Memory Management Unit (MMU) updating the translation table entry to set the access flag af to 1, and continuing without a fault;
  • A store where pte_x has the dirty bit db set to 0 raises a permission fault.
  • With hardware management (that is, when ARMv8.1-TTHM is implemented) where TCR_ELx.{HA,HD} == {1,1}:
  • A load or store to x where pte_x has the access flag af set to 0 results in the MMU:
    • updating the translation table entry to set the access flag af to 1,
    • and continuing without a fault;
  • A store to x where pte_x has the dirty bit db set to 0 and also has the dirty bit management dbm bit set to 0 raises a permission fault;
  • A store to x where pte_x has the dirty bit db set to 0 and has dbm set to 1 results in the MMU:
    • updating the translation table entry to set the dirty bit db to 1,
    • and continuing without a fault.

The dirty bit corresponds to hardware-level write permissions in page table entries. Hence in the simpler stage 1 case, we have AP[2] == 0b1 for clean, and AP[2] == 0b0 for dirty. The default value for pte_x is (oa:phy_x, valid:1, af:1, db:1, dbm:0). If certain fields are omitted (for example in the initial state or the exists clause), they are assumed to have their default values. We can now specify whether ARMv8.1-TTHM is implemented, and whether TCR_ELx.{HA, HD} are equal to 1, with metadata at the top of the test. By default, none of those are equal to 1.

For example, consider the following test:

AArch64 LDR-AF
TTHM=HA
{
int x=1;
pte_x = (af:0);
0:X2=x;
}
P0            ;
L0: LDR   W0,[X2] ;
forall 0:X0=1 /\ pte_x=(af:1) /\ ~fault(P0:L0,x)

Here we have a test which assumes that ARMv8.1-TTHM is implemented, and states that TCR_ELx.HA is equal to 1. The initial value for x is 1, and initially the page table entry for x has the Access Flag af set to 0 (since the other values are omitted, this entails that pte_x is valid and points to phy_x).

At line L0 the thread P0 does a load of x.

In the final clause, we state that it must always be the case (see keyword “forall”) that:

  • The register X0 holds the value 1
  • The page table entry for x now has the Access Flag af set to 1
  • There was no fault at line L0 relative to address x.

Extended litmus test syntax – Cacheability attributes

We can also manage attributes that determine Cacheability for memory locations in litmus tests.

In VMSAv8-64 a memory location can be mapped as normal memory or device memory. A memory location in normal memory can have the following attributes:

  • Write-back
  • Write-through
  • Non-cacheable

Which are specified separately for caches in the inner-shareability domain of the PE and the outer Shareability domain of the PE. As an example, a location in normal memory can be Inner Write-through and Outer Write-back.

A memory location in device memory is always Non-cacheable but in addition it can have the following attributes:

  • nGnRnE: non-Gathering, non-Reordering, No Early write Acknowledgement.
  • nGnRE: non-Gathering, non-Reordering, Early write Acknowledgement.
  • nGRE: non-Gathering, Reordering, Early write Acknowledgement.
  • GRE: Gathering, Reordering, Early write Acknowledgement.

Please note this is not an exhaustive and detailed explanation of the Cacheability attributes. For further details, please refer to the ARMv8-A Architecture Reference Manual [4].

By default, litmus7 uses Normal, Inner write-back and Outer write-back for all memory locations. However, we can alter the Cacheability attributes for each location at will. For example, we can change the Cacheability attributes for the locations x and y in the MP litmus test:

AArch64 mp-xWB-yWT
{
0 :X1=x ; 0 :X3=y ;
1 :X1=y ; 1 :X3=x ;
}
P0          | P1          ;
MOV W0,#1   | LDR W0,[X1] ;
STR W0,[X1] | LDR W2,[X3] ;
MOV W2,#1   |             ;
STR W2,[X3] |             ;
regions: x: normal inner-write-back outer-write-back, y: normal inner-write-through outer-write-through
exists (1:X0=1 /\ 1:X2=0)

The previous litmus test is similar to the traditional MP litmus with the addition of the regions section:

regions: x: normal inner-write-back outer-write-back, y: normal inner-write-through outer-write-through

In the regions section, we have specified memory attributes for x and y. In this example, x is mapped as Normal Memory, Inner Write-back and Outer Write-back and y is mapped as Normal Memory, Inner Write-through and Outer-Write-through.

The litmus7 tool generates the code to set these Cacheability attributes before executing the test. We can use the same flow to run the MP-xWB-yWT test:

Wed 21 Oct 12:25:50 BST 2020
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Results for mp-xWB-yWT.litmus %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
AArch64 mp-xWB-yWT

{0:X1=x; 0:X3=y; 1:X1=y; 1:X3=x;}

 P0          | P1          ;
 MOV W0,#1   | LDR W0,[X1] ;
 STR W0,[X1] | LDR W2,[X3] ;
 MOV W2,#1   |             ;
 STR W2,[X3] |             ;

exists (1:X0=1 /\ 1:X2=0)
Generated assembler
#START _litmus_P0
	nop
	mov w3,#1
	str w3,[x0]
	mov w2,#1
	str w2,[x1]
	nop
#START _litmus_P1
	nop
	ldr w4,[x1]
	ldr w5,[x0]
	nop
/usr/bin/qemu-system-aarch64 -nodefaults -machine virt,gic-version=host,accel=kvm -cpu host -device virtio-serial-device -device virtconsole,chardev=ctd -chardev testdev,id=ctd -device pci-testdev -display none -serial stdio -kernel ./mp-xWB-yWT/mp-xWB-yWT.flat -smp 4 -append -q # -initrd /tmp/tmp.Df13ZMOkLP
Test mp-xWB-yWT Allowed
Histogram (4 states)
972907:>1:X0=0; 1:X2=0;
7     :>1:X0=0; 1:X2=1;
673912*>1:X0=1; 1:X2=0;
353174:>1:X0=1; 1:X2=1;
Ok

Witnesses
Positive: 673912, Negative: 1326088
Condition exists (1:X0=1 /\ 1:X2=0) is validated
Hash=211d5b298572012a0869d4ded6a40b7f
Observation mp-xWB-yWT Sometimes 673912 1326088
Time mp-xWB-yWT 10.18
Revision 2e1f66a892506e3027ac381048928efe3020e120, version 7.56+02~dev
Command line: /home/niknik01/workspace/mm/herdtools7/_build/litmus/litmus.native -mach kvm-aarch64 -variant precise -o mp-xWB-yWT mp-xWB-yWT.litmus
Parameters
#define SIZE_OF_TEST 5000
#define NUMBER_OF_RUN 200
#define AVAIL 4
/* gcc options: -Wall -std=gnu99 -std=gnu99 -ffreestanding -I $(SRCDIR)/lib -I $(SRCDIR)/libfdt -Wall -Werror  -fomit-frame-pointer -Wno-frame-address   -fno-pic  -no-pie -Wmissing-parameter-type -Wold-style-declaration -Woverride-init -O2 $(call optional-ccopt, -mno-outline-atomics) -pthread */
/* barrier: userfence */
/* launch: changing */
/* affinity: none */
/* alloc: dynamic */
/* memory: direct */
/* stride: 1 */
/* safer: write */
/* preload: random */
/* speedcheck: no */
/* proc used: 4 */
LITMUSOPTS=
Wed 21 Oct 12:26:02 BST 2020

Acknowledgments

We thank Mark Rutland for the description of TTHM.

[1] https://github.com/herd/herdtools7/tree/KVM

[2] https://www.linux-kvm.org/page/KVM-unit-tests

[3] https://www.linux-kvm.org/page/Main_Page

[4] https://developer.arm.com/documentation/ddi0487/latest/

Anonymous
Architectures and Processors blog
  • A closer look at Arm A-profile support for non-maskable interrupts

    Christoffer Dall
    Christoffer Dall
    Arm is adding support in both the CPU and Generic Interrupt Controller (GIC) architecture for NMIs. But what is an NMI? how does operating systems software use these features?
    • May 23, 2022
  • Arm announces Ampere Mt Jade as first certification for the SystemReady LS band

    Samer El-Haj-Mahmoud
    Samer El-Haj-Mahmoud
    The Arm SystemReady compliance certification program has reached a significant milestone today with the Arm SystemReady LS certification of the Ampere Altra Arm-based Mt. Jade server.
    • May 17, 2022
  • Arm SystemReady certifications reach 50 including the Morello System Development Platform

    Dong Wei
    Dong Wei
    The Arm SystemReady program has reached a significant milestone with over 50 certifications since its launch only two years ago.
    • May 9, 2022