In previous installments of this series of blogs, we learned how we can use herd7 to simulate litmus tests and diy7 to automate the generation of litmus tests. In this post, we are going to learn how to run litmus tests on hardware using litmus7.
litmus7
Running litmus tests on hardware allows verification engineers, for example, to test whether their implementation follows the Arm architecture, more precisely the Arm memory model.
After downloading and installing the herd+diy tool suite [1] (see our previous blogs for instructions), we can start using the litmus7 tool. We start with the simple MP litmus test that we previously generated using diy7.
herd+diy
AArch64 MP { 0:X1=x; 0:X3=y; 1:X1=y; 1:X3=x; } P0 | P1 ; MOV W0,#1 | LDR W0,[X1] ; STR W0,[X1] | LDR W2,[X3] ; MOV W2,#1 | ; STR W2,[X3] | ; exists (1:X0=1 /\ 1:X2=0)
We assume that this litmus test is in a file called “mp.litmus”. To run the litmus test in the file mp.litmus, we execute the command:
“mp.litmus”
mp.litmus
~/bin/litmus7 mp.litmus
In our terminal, litmus7 now prints a lot of information about this run. First, we see the input test mp.litmus what we specified in the command line.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Results for mp.litmus % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% AArch64 MP "PodWW Rfe PodRR Fre" {0:X1=x; 0:X3=y; 1:X1=y; 1:X3=x;} P0 | P1 ; MOV W0,#1 | LDR W0,[X1] ; STR W0,[X1] | LDR W2,[X3] ; MOV W2,#1 | ; STR W2,[X3] | ; exists (1:X0=1 /\ 1:X2=0)
Then, litmus7 prints the assembly it generated for the test. This code corresponds to the code in the input file, but is in a form that the assembler can read and use to produce a binary:
Generated assembler #START _litmus_P1 ldr w4,[x1] ldr w5,[x2] #START _litmus_P0 mov w7,#1 str w7,[x0] mov w6,#1 str w6,[x2]
What follows in our screen is the observed outcomes form this run. For many users, this is the most interesting part.
Test MP Allowed Histogram (4 states) 500141:>1:X0=0; 1:X2=0; 53 *>1:X0=1; 1:X2=0; 482 :>1:X0=0; 1:X2=1; 499324:>1:X0=1; 1:X2=1; Ok Witnesses Positive: 53, Negative: 999947 Condition exists (1:X0=1 /\ 1:X2=0) is validated
The litmus7 tool ran the test one million times and recorded the outcomes.
Since the condition in mp.litmus involves two registers X0 and X2 of P1, there are four possible final states. In the histogram, litmus7 reports the observed final states and how many times each of them was observed. This is given by the number to the left of the “>” sign on each line.
X0
X2
P1
Also, litmus7 summarizes the results, reporting the overall number of times the final state of the execution validated the condition (Positive) and the number of times it did not (Negative). If there was at least one Positive execution, it reports that the condition “is validated”.
Positive
Negative
“is validated”
In addition, and after reporting information about the observed states, litmus7 will also output more information about the test itself, as well as information about the hardware and the parameters of this run.
Hash=211d5b298572012a0869d4ded6a40b7f Cycle=Rfe PodRR Fre PodWW Generator=diycross7 (version 7.54+01(dev)) Com=Rf Fr Orig=PodWW Rfe PodRR Fre Observation MP Sometimes 423 999577 Time MP 0.17 Machine:kylo processor : 0 BogoMIPS : 100.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp CPU implementer : 0x41 CPU architecture: 8 CPU variant : 0x1 CPU part : 0xd0c CPU revision : 0 processor : 1 BogoMIPS : 100.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp CPU implementer : 0x41 CPU architecture: 8 CPU variant : 0x1 CPU part : 0xd0c CPU revision : 0 processor : 2 BogoMIPS : 100.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp CPU implementer : 0x41 CPU architecture: 8 CPU variant : 0x1 CPU part : 0xd0c CPU revision : 0 processor : 3 BogoMIPS : 100.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp CPU implementer : 0x41 CPU architecture: 8 CPU variant : 0x1 CPU part : 0xd0c CPU revision : 0 Revision 8a99be9348124b52f116a26453a3cf3da5c5fb59, version 7.56+02~dev Command line: ~/bin/litmus.native mp.litmus Parameters #define SIZE_OF_TEST 100000 #define NUMBER_OF_RUN 10 #define AVAIL 1 #define STRIDE (-1) #define MAX_LOOP 0 /* gcc options: -Wall -std=gnu99 -O2 -pthread */ /* barrier: user */ /* launch: changing */ /* affinity: none */ /* alloc: dynamic */ /* memory: direct */ /* safer: write */ /* preload: random */ /* speedcheck: no */
To understand how litmus7 runs the test first, we need to understand the test itself. A litmus test typically contains four distinct sections. The test begins with the header:
AArch64 MP
The header contains the target architecture, followed by the name of the test. When using diyone7 to generate the litmus test, there may also be some metadata about, for example, the version of tool. The litmus7 tool ignores this part.
diyone7
In the following section, a litmus test contains the initialization list:
{ 0:X1=x; 0:X3=y; 1:X1=y; 1:X3=x; }
Here, we specify the initial state (for example, registers or memory locations) before the execution of the test. In the example above,
X1
x
X3
y
In addition, litmus7 will initialize all used memory locations to 0 unless specified otherwise.
In the next section, a litmus test contains the assembly code of the test.
P0 | P1 ; MOV W0,#1 | LDR W0,[X1] ; STR W0,[X1] | LDR W2,[X3] ; MOV W2,#1 | ; STR W2,[X3] | ;
The litmus7 tool uses this code in the code generation in a form that is used by the C compiler. In the example above, we have two processes P0 and P1, followed by the instructions that each process will execute. Based on this code and the initialization list, litmus7 allocates the necessary memory and execute the code in the test. By default, litmus7 executes the test one million times.
P0
Finally, the litmus test contains a condition:
exists (1:X0=1 /\ 1:X2=0)
The condition includes predicates on the state of each process after executing the test. The litmus7 tool records the state at the end of each execution of the test and compare it against it to determine if any of them validates the condition.
In addition to running a test, we can also use litmus7 to generate source code. This source code can be used to compile a standalone application that encapsulates the test. This can be useful when we want to run tests on a different computer where the diy+herd toolsuite may not be installed.
diy+herd
To do that we use a command similar to the one we used before:
~/bin/litmus7 -o mp.tar mp.litmus
The litmus7 generates the file “mp.tar”, a tarball with the source code. Then we can copy the file mp.tar to the computer we would like to run the test. To run the test, we have to extract the files in a directory of our choosing:
“mp.tar”
mkdir mp cd mp tar xvf ~/mp.tar
Compile the source code using make and the included Makefile:
Makefile
make
And run the test using the included script “run.sh”:
“run.sh”
sh ./run.sh
The results are presented in the same as way before and the outcome should be equivalent to running the test directly with litmus.
Please note that when running on different computer, litmus7 cannot identify the target operating system. By default, it assumes that the standalone programs are compiled and run on Linux. If the target operating system is different, we can instruct litmus7 to guide its source code generation, for example:
~/bin/litmus7 -ps mac -o mp.tar mp.litmus
Often, we need to run more than one test. For example, we might want to understand how adding memory barriers alters the specified condition and want to try various combinations on a machine. Adding a DMB SY instruction on P0 between the two stores, leaving P1 unchanged, would lead to a test called MP+dmb.sy+po:AArch64 MP+dmb.sy+po { 0:X1=x; 0:X3=y; 1:X1=y; 1:X3=x; } P0 | P1 ; MOV W0,#1 | LDR W0,[X1] ; STR W0,[X1] | LDR W2,[X3] ; DMB SY | ; MOV W2,#1 | ; STR W2,[X3] | ; exists (1:X0=1 /\ 1:X2=0)
DMB SY
MP+dmb.sy+po:AArch64 MP+dmb.sy+po { 0:X1=x; 0:X3=y; 1:X1=y; 1:X3=x; } P0 | P1 ; MOV W0,#1 | LDR W0,[X1] ; STR W0,[X1] | LDR W2,[X3] ; DMB SY | ; MOV W2,#1 | ; STR W2,[X3] | ; exists (1:X0=1 /\ 1:X2=0)
AArch64 MP+dmb.sy+po { 0:X1=x; 0:X3=y; 1:X1=y; 1:X3=x; } P0 | P1 ; MOV W0,#1 | LDR W0,[X1] ; STR W0,[X1] | LDR W2,[X3] ; DMB SY | ; MOV W2,#1 | ; STR W2,[X3] | ; exists (1:X0=1 /\ 1:X2=0)
Adding a DMB SY instruction on P1 between the two loads, leaving P0 unchanged, would lead to a test called MP+po+dmb.sy:AArch64 MP+po+dmb.sy { 0:X1=x; 0:X3=y; 1:X1=y; 1:X3=x; } P0 | P1 ; MOV W0,#1 | LDR W0,[X1] ; STR W0,[X1] | DMB SY ; MOV W2,#1 | LDR W2,[X3] ; STR W2,[X3] | ; exists (1:X0=1 /\ 1:X2=0)
MP+po+dmb.sy:AArch64 MP+po+dmb.sy { 0:X1=x; 0:X3=y; 1:X1=y; 1:X3=x; } P0 | P1 ; MOV W0,#1 | LDR W0,[X1] ; STR W0,[X1] | DMB SY ; MOV W2,#1 | LDR W2,[X3] ; STR W2,[X3] | ; exists (1:X0=1 /\ 1:X2=0)
AArch64 MP+po+dmb.sy { 0:X1=x; 0:X3=y; 1:X1=y; 1:X3=x; } P0 | P1 ; MOV W0,#1 | LDR W0,[X1] ; STR W0,[X1] | DMB SY ; MOV W2,#1 | LDR W2,[X3] ; STR W2,[X3] | ; exists (1:X0=1 /\ 1:X2=0)
Finally, adding a DMB SY instruction both P0 between the two stores and on P1 between the two loads would lead to a test called MP+dmb.sys:AArch64 MP+dmb.sys { 0:X1=x; 0:X3=y; 1:X1=y; 1:X3=x; } P0 | P1 ; MOV W0,#1 | LDR W0,[X1] ; STR W0,[X1] | DMB SY ; DMB SY | LDR W2,[X3] ; MOV W2,#1 | ; STR W2,[X3] | ; exists (1:X0=1 /\ 1:X2=0)
MP+dmb.sys:AArch64 MP+dmb.sys { 0:X1=x; 0:X3=y; 1:X1=y; 1:X3=x; } P0 | P1 ; MOV W0,#1 | LDR W0,[X1] ; STR W0,[X1] | DMB SY ; DMB SY | LDR W2,[X3] ; MOV W2,#1 | ; STR W2,[X3] | ; exists (1:X0=1 /\ 1:X2=0)
AArch64 MP+dmb.sys { 0:X1=x; 0:X3=y; 1:X1=y; 1:X3=x; } P0 | P1 ; MOV W0,#1 | LDR W0,[X1] ; STR W0,[X1] | DMB SY ; DMB SY | LDR W2,[X3] ; MOV W2,#1 | ; STR W2,[X3] | ; exists (1:X0=1 /\ 1:X2=0)
To run all those tests all together, we can ask litmus7 to compile more than one test at a time. For example, the command will generate a tarball containing the source code:
~/bin/litmus7 -o mp.tar mp.litmus mp+dmb.sys.litmus mp+po+dmb.sy.litmus mp+dmb.sy+po.litmus
However, this time, when compiling the source code, we will generate one standalone application for each of the four input tests. When invoked, the generated run.sh script will run all four tests one after the other.
run.sh
Often, it might be desirable to control how litmus7 runs a test. The litmus7 provides a number of command-line options that can offer a way to change the parameters of a run. We give a brief explanation of the most commonly used. However, we refer to the full documentation of the tool [1] for a more detailed explanation.
One of the parameters we can control is the number of times litmus7 executes a test. By default, litmus7 performs one million executions. The total number of executions is determined by the number of repetitions and the array size that it uses for each memory location that we use in the test.
For example, for the MP test, litmus7 allocates one array of 100,000 elements for each of the memory locations in the test (array_x[100000] for x and array_y[100000] for y). In each execution, litmus7 uses an element from the corresponding array as the memory location x and y (array_x[i] as x and array_y[i] as y). It will repeat this sweep for ten times and execute the test for one million times in total.
MP
array_x[100000]
array_y[100000]
array_x[i]
array_y[i]
We can control the size of the arrays that litmus7 uses and goes through using the command-line option:
-s <n>
In addition, we can control the number of times we repeat this using the command-line option:
-r <n>
Executing the command:
sh ./run.sh -s 100000 -r 10
Or the command:
sh ./run.sh -s 10000 -r 100
Both commands result in one million executions. However, the first command executes ten iterations; in each iteration, the tool goes through an array of 100000 elements for each memory location. In the case of the second command, litmus7 executes one 100 iterations; in each iteration, it goes through an array of 100000 elements for each memory location.
In addition, we can also instruct litmus7 to perform multiple executions of the tests concurrently. To do this we can use the argument:
Where n is the total number of threads we make available to the test.
For example, the MP test requires two threads (P0 and P1); if we set -a 4, we would execute two instances of the MP test concurrently.
-a <n>
In all examples above, we used the “run.sh” scripts, however, we can use the same parameters (and more) with litmus7 when we generate sources or run the test directly. For example, if we would like to run the MP litmus test for a total of two million executions, with an array size of one million and two repetitions, we can use the command:
~/bin/litmus7 -s 1000000 -r 2 mp.litmus
Or alternatively, we generate a tarball with the source code with the command:
~/bin/litmus7 -s 1000000 -r 2 -o mp.tar mp.litmus
In this case, after extracting the source code from the generated tarball, and building the source code, we can run the test using the previous script “run.sh”. It runs by default two iterations and with arrays of one million elements.
[1] http://diy.inria.fr
[2] https://github.com/herd/herdtools7/tree/KVM
[3] https://www.linux-kvm.org/page/KVM-unit-tests
[4] https://www.linux-kvm.org/page/Main_Page
[5] https://developer.arm.com/documentation/ddi0487/latest/
For below example:sh ./run.sh -s 10000 -r 100The descriptions "In the case of the second command, litmus7 executes one 100 iterations; in each iteration, it goes through an array of 100000 elements for each memory location." have a typo, 100000 should be 10000, right?