How to generate litmus tests automatically with the diy7 tool

June 11, 2020

15 minute read time.

This blog is the second installment in a series of blogs looking at how to use the Memory Model Tool. It provides an opportunity to experiment with the model and develop an intuitive understanding of how it works. The information is useful to software developers, compiler writers, and verification engineers. To read the first installment in this blog series, go to A working example of how to use the herd7 Memory Model Tool. You can access the tool through the Arm Developer website and at the bottom of this blog.

2. Generating tests

Here we are going to learn how to automatically generate litmus tests. This is similar to the message passing one (MP) that we studied in the previous blog post.

2.0 Installing the tools

To start with, we need to install the diy tool. More precisely, we are going to install all the herd+diy toolsuite. This will give you:

The herd7 tool, which, given a cat model and a litmus test, outputs all possible executions (up to a certain bound on loop unrolling if necessary) of that program under that model;
The diy7 tool, which generates systematic families of litmus tests which highlight the discrepancies between weak consistency models and the interleaving model;
The litmus7 tool, which runs such litmus tests on hardware chips and collects the observed outcomes.

Of course we have experimented with the herd7 tool in the previous blog, with its web interface. You now get the command line version.

Please follow the instructions at http://diy.inria.fr/sources/index.html. You may also want to follow the development version from https://github.com/herd/herdtools7/tree/master.

2.1 Generating a specific test

Let us generate the message passing example. As a starting point, if you do:

~/bin/diyone7 -arch AArch64 "PodWW Rfe PodRR Fre"

You get the message passing example of the previous blog post on your terminal:

Fullscreen

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
AArch64 A
"PodWW Rfe PodRR Fre"
Generator=diyonedev (version 7.51+4(dev))
Prefetch=0:x=F,0:y=W,1:y=F,1:x=R
Com=Rf Fr
Orig=PodWW Rfe PodRR Fre
{
0:X1=x; 0:X3=y;
1:X1=y; 1:X3=x;
}
P0          | P1          ;
MOV W0,#1   | LDR W0,[X1] ;
STR W0,[X1] | LDR W2,[X3] ;
MOV W2,#1   |             ;
STR W2,[X3] |             ;
exists (1:X0=1 /\ 1:X2=0)
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

AArch64 A
"PodWW Rfe PodRR Fre"
Generator=diyonedev (version 7.51+4(dev))
Prefetch=0:x=F,0:y=W,1:y=F,1:x=R
Com=Rf Fr
Orig=PodWW Rfe PodRR Fre
{
0:X1=x; 0:X3=y;
1:X1=y; 1:X3=x;
}
P0          | P1          ;
MOV W0,#1   | LDR W0,[X1] ;
STR W0,[X1] | LDR W2,[X3] ;
MOV W2,#1   |             ;
STR W2,[X3] |             ;
exists (1:X0=1 /\ 1:X2=0)

You can name the test as you would like using a command-line option, for example:

~/bin/diyone7 -arch AArch64 "PodWW Rfe PodRR Fre" -name mp

This will not display the test on your terminal, but rather it will create a file called "mp.litmus".

2.2. Syntax used by diyone7

The tool understands the input "PodWW Rfe PodRR Fre" as a cycle of semantics relations (the ones that are displayed in the execution figures in the previous blog). More precisely, the tool takes the input string as a directed path between memory events, which ends up cycling back to its initial event. Let me explain this a bit:

PodWW means program order (Po) between two writes (WW) with different locations (d)
Rfe means read-from (Rf) between two events that belong to different threads (e stands for external)
PodRR means program order (Po) between two reads (RR) with different locations (d)
Fre means from-read (Fr) between two events that belong to different threads (e stands for external).

Recall from the previous blog that a read R is in from-read before a write W if there exists a write W’ such that: 1. R reads-from W’ and 2. W’ is in coherence order before W, that is, hit the memory before W. In other words, R is in from-read before all the writes W that overwrite the value read by R, which comes from W’.

A graphic which shows from-read.

Figure 1

So, the tool starts by reading PodWW and builds a first thread P0 which has two stores to different locations. Graphically, this corresponds to:

A diagram which shows PodWW.

Figure 2

Then the Rfe indicates that it must create a new thread P1, which starts with a read that will read from the bottom write on P0. Note that this read mandates the final value 1:X0=1. Graphically, this corresponds to:

A graphic which shows the Rfe.

Figure 3

Then PodRR indicates that the new thread, P1, has to have two reads to different locations. The first one comes from Rfe, and the second one appears at the bottom of P1. Graphically, this corresponds to:

A graphic which shows PodRR.

Figure 4

Finally, Fre closes the cycle, and mandates the final value 1:X2=0. Graphically, this corresponds to:

A graphic which shows Fre.

Figure 5

For more details on the input syntax, please see the documentation for the diy tool at http://diy.inria.fr/doc/gen.html.

2.3 Memory Barriers

Now, you might want to generate a test which should not be observed on hardware, for example to test whether that piece of hardware is respecting the architecture. For instance, let us generate the Message Passing example with barriers on each thread, so that the test we obtain is correctly synchronized as per the architecture.

The architecture states that a DMB ST will prevent two store instructions from being reordered. There are two reference points which allow you to check this:

The formal executable model given as a cat file (see http://diy.inria.fr/www/?record=aarch64)
It is English transliteration in the Arm Architecture Reference Manual (Arm ARM, see section B2.3 of https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile).

In the cat file (see http://diy.inria.fr/www/?record=aarch64), the definition of the relation bob (Barrier-ordered-before) features the following clause:

| [W] ; po; [dmb.st]; po; [W]

This can be read as follows: starting from a write ([W]), if I take one or several steps in program order (po), land on a DMB ST ([dmb.st]), then take one or several steps in program order which lands on another write ([W]), then the two extremity writes are ordered. Graphically, this corresponds to:

A graphic which shows DMB ST.

Figure 6

In the Arm Architecture Reference Manual (Arm ARM) (see https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile, section B2.3), the corresponding text reflects the formal cat clause:

RW1 is a write W1, RW2 is a write W2, and W1 appears in program order before a DMB ST that appears in program order before W2.

Similarly, a DMB LD will prevent two loads from being reordered (NB: a DMB LD will also ensure that a load-store pair will not be reordered, although we won't be needing that ordering in this example).

The cat clause (see relation bob in http://diy.inria.fr/www/?record=aarch64) is:

| [R] ; po; [dmb.ld]; po

The English transliteration in the Arm ARM (see relation Barrier-ordered-before in Section B2.3 of https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile) is:

RW1 is a read R1 and R1 appears in program order before a DMB LD that appears in program order before RW2.

Now, let us try:

~/bin/diyone7 -arch AArch64 "DMB.STdWW Rfe DMB.LDdRR Fre" -name mp+dmb.st+dmb.ld

You get the following test on your terminal:

Fullscreen

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
AArch64 mp+dmb.st+dmb.ld
"DMB.STdWW Rfe DMB.LDdRR Fre"
Generator=diyone7 (version 7.55+01(dev))
Prefetch=0:x=F,0:y=W,1:y=F,1:x=T
Com=Rf Fr
Orig=DMB.STdWW Rfe DMB.LDdRR Fre
{
0:X1=x; 0:X3=y;
1:X1=y; 1:X3=x;
}
 P0          | P1          ;
 MOV W0,#1   | LDR W0,[X1] ;
 STR W0,[X1] | DMB LD      ;
 DMB ST      | LDR W2,[X3] ;
 MOV W2,#1   |             ;
 STR W2,[X3] |             ;
exists (1:X0=1 /\ 1:X2=0)
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

AArch64 mp+dmb.st+dmb.ld
"DMB.STdWW Rfe DMB.LDdRR Fre"
Generator=diyone7 (version 7.55+01(dev))
Prefetch=0:x=F,0:y=W,1:y=F,1:x=T
Com=Rf Fr
Orig=DMB.STdWW Rfe DMB.LDdRR Fre
{
0:X1=x; 0:X3=y;
1:X1=y; 1:X3=x;
}
 P0          | P1          ;
 MOV W0,#1   | LDR W0,[X1] ;
 STR W0,[X1] | DMB LD      ;
 DMB ST      | LDR W2,[X3] ;
 MOV W2,#1   |             ;
 STR W2,[X3] |             ;
exists (1:X0=1 /\ 1:X2=0)

Observe that in the previous test we were using "PodWW" and "PodRR"; this time around we are using their siblings - "DMB.STdWW" and "DMB.LDdRR" - which will require the tool to generate the corresponding barriers on each thread. As before, DMB.LDdRR means using a DMB LD barrier (DMB.LD) between two reads (RR) with different locations (d).

2.4 Dependencies

On the reading thread P1, we can use an address dependency instead of the barrier DMB.LD. The corresponding cat clause (see relation dob in http://diy.inria.fr/www/?record=aarch64) is:

| addr | data

The English transliteration in the Arm ARM (see relation Dependency-ordered-before in Section B2.3 of https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile) is:

There is an Address dependency or a Data dependency from R1 to RW2.

Now, try:

~/bin/diyone7 -arch AArch64 "DMB.STdWW Rfe DpAddrdR Fre" -name mp+dmb.st+addr

You should get:

Fullscreen

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
AArch64 mp+dmb.st+addr
"DMB.STdWW Rfe DpAddrdR Fre"
Generator=diyone7 (version 7.55+01(dev))
Prefetch=0:x=F,0:y=W,1:y=F,1:x=T
Com=Rf Fr
Orig=DMB.STdWW Rfe DpAddrdR Fre
{
0:X1=x; 0:X3=y;
1:X1=y; 1:X4=x;
}
 P0          | P1                  ;
 MOV W0,#1   | LDR W0,[X1]         ;
 STR W0,[X1] | EOR W2,W0,W0        ;
 DMB ST      | LDR W3,[X4,W2,SXTW] ;
 MOV W2,#1   |                     ;
 STR W2,[X3] |                     ;
exists (1:X0=1 /\ 1:X3=0)
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

AArch64 mp+dmb.st+addr
"DMB.STdWW Rfe DpAddrdR Fre"
Generator=diyone7 (version 7.55+01(dev))
Prefetch=0:x=F,0:y=W,1:y=F,1:x=T
Com=Rf Fr
Orig=DMB.STdWW Rfe DpAddrdR Fre
{
0:X1=x; 0:X3=y;
1:X1=y; 1:X4=x;
}
 P0          | P1                  ;
 MOV W0,#1   | LDR W0,[X1]         ;
 STR W0,[X1] | EOR W2,W0,W0        ;
 DMB ST      | LDR W3,[X4,W2,SXTW] ;
 MOV W2,#1   |                     ;
 STR W2,[X3] |                     ;
exists (1:X0=1 /\ 1:X3=0)

Note that I have used "DpAddrdR" to require the tool to generate an address dependency; previously we have used "DMB.LDdRR". As before, DpAddrdRR means using an address dependency (DpAddr) between two reads (R) with different locations (d). Note that, unlike in the PodRR and DMB.LDdRR, we only need to specify that the second access is a read because dependencies always start with a read---hence, specifying the first one is superfluous.

2.5 Annotations

Annotations allow you to indicate which flavors of accesses you would like your test to feature: plain, acquire or release, exclusives.

If you use the “P” annotation (plain), you will get the same MP example again, because you are asking diy to give you only plain accesses:

~/bin/diyone7 -arch AArch64 "PodWW P Rfe P PodRR Fre" -name mp

This can be read as follows: we are asking the tool to generate a test where we start with a first thread featuring two writes in program order (PodWW), where the second one is plain (P). Then the tool is asked to exit the first thread with Rfe, which means it needs to create a read which will read the last write on the first thread. This read needs to be plain too (P). The second thread should feature two reads in program order with different locations (PodRR), and we close the cycle with Fre as before.

If you want acquire accesses you can use the “A” annotation, and for release accesses, the “L” annotation; for example:

~/bin/diyone7 -arch AArch64 "PodWW L Rfe A PodRR Fre" -name mp+rel+acq

This should give you a variation on the message passing example, where the write of the flag is a release write, and the read of the flag is an acquire read. In other words, this will give you the message passing example with the second store on P0 being a release and the first load on P1 being an acquire.

More precisely, this can be read as follows: we are asking the tool to generate a test where we start with a first thread featuring two writes in program order (PodWW), where the second one is release (L). Then the tool is asked to exit the first thread with Rfe, which means it needs to create a read which will read the last write on the first thread. This read needs to be an acquire (A). The second thread should feature two reads in program order with different locations (PodRR), and we close the cycle with Fre as before.

Fullscreen

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
AArch64 mp+rel+acq
"PodWWPL RfeLA PodRRAP Fre"
Generator=diyone7 (version 7.55+01(dev))
Prefetch=0:x=F,0:y=W,1:y=F,1:x=T
Com=Rf Fr
Orig=PodWWPL RfeLA PodRRAP Fre
{
0:X1=x; 0:X3=y;
1:X1=y; 1:X3=x;
}
 P0           | P1           ;
 MOV W0,#1    | LDAR W0,[X1] ;
 STR W0,[X1]  | LDR W2,[X3]  ;
 MOV W2,#1    |              ;
 STLR W2,[X3] |              ;
exists (1:X0=1 /\ 1:X2=0)
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

AArch64 mp+rel+acq
"PodWWPL RfeLA PodRRAP Fre"
Generator=diyone7 (version 7.55+01(dev))
Prefetch=0:x=F,0:y=W,1:y=F,1:x=T
Com=Rf Fr
Orig=PodWWPL RfeLA PodRRAP Fre
{
0:X1=x; 0:X3=y;
1:X1=y; 1:X3=x;
}
 P0           | P1           ;
 MOV W0,#1    | LDAR W0,[X1] ;
 STR W0,[X1]  | LDR W2,[X3]  ;
 MOV W2,#1    |              ;
 STLR W2,[X3] |              ;
exists (1:X0=1 /\ 1:X2=0)

Note that the input cycle (see "Orig") is rendered slightly differently, in a canonical manner, with the annotations directly on the relaxation. For example, what was "L Rfe A" in our input cycle is shown as "RfeLA" in the "Orig" cycle. You can use either syntax equivalently.

Using a store release on thread P0 ensures that the two stores on P0 will not be reordered. In the cat file this appears in the relation bob again:

| po; [L]

This can be read as follows: if I take one or several steps in program order (po), land on a release write ([L]), then the two extremities are ordered.

In the Arm ARM this corresponds to:

RW2 is a write W2 and W2 is generated by an instruction with Release semantics.

Using a load acquire on thread P1 ensures that the two loads on P1 will not be reordered. In the cat file this appears in the relation bob:

| [A | Q]; po

Note that this clause also applies to AcquirePC ([Q]) instructions, viz, LDAPR instructions. This can be read as follows: if I start from an acquire or acquirePC read, and take one or several steps in program order (po), then the two extremities are ordered.

In the Arm ARM this corresponds to:

RW1 is a read R1 and R1 is generated by an instruction with Acquire or AcquirePC semantics.

2.6 Load and Store Exclusive

The diyone tool will let you generate tests which involve Load and Store Exclusive instructions. The relation LxSx asks the tool to generate a read-modify-write implemented with a load-store exclusive pair. Try for example:

~/bin/diyone7 -arch AArch64 LxSx Rfe PosRW Rfe -oneloc

You should get:

Fullscreen

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
AArch64 A
"LxSx Rfe PosRW Rfe"
Generator=diyone7 (version 7.55+01(dev))
Com=Rf Rf
Orig=LxSx Rfe PosRW Rfe
{
0:X0=x;
1:X1=x;
}
 P0              | P1          ;
 MOV W2,#1       | LDR W0,[X1] ;
 Loop00:         | MOV W2,#2   ;
 LDXR W1,[X0]    | STR W2,[X1] ;
 STXR W3,W2,[X0] |             ;
 CBNZ W3,Loop00  |             ;
exists (x=2 /\ 0:X1=0 /\ 1:X0=1)
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

AArch64 A
"LxSx Rfe PosRW Rfe"
Generator=diyone7 (version 7.55+01(dev))
Com=Rf Rf
Orig=LxSx Rfe PosRW Rfe
{
0:X0=x;
1:X1=x;
}
 P0              | P1          ;
 MOV W2,#1       | LDR W0,[X1] ;
 Loop00:         | MOV W2,#2   ;
 LDXR W1,[X0]    | STR W2,[X1] ;
 STXR W3,W2,[X0] |             ;
 CBNZ W3,Loop00  |             ;
exists (x=2 /\ 0:X1=0 /\ 1:X0=1)

Note the use of "PosRW": program order (Po) between a read and a write (RW) with the same location (s).

The "-oneloc" option ensures that the tool is aware that the test being generated only involves one location, it will complain otherwise.

You can ask the tool to unroll the loop of the read-modify-write, as follows:

~/bin/diyone7 -arch AArch64 LxSx Rfe PosRW Rfe -oneloc -unrollatomic 1

You should get:

Fullscreen

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
AArch64 A
"LxSx Rfe PosRW Rfe"
Generator=diyone7 (version 7.55+01(dev))
Com=Rf Rf
Orig=LxSx Rfe PosRW Rfe
{
0:X0=x; 0:X5=ok00;
ok00=1;
1:X1=x;
}
 P0              | P1          ;
 MOV W2,#1       | LDR W0,[X1] ;
 LDXR W1,[X0]    | MOV W2,#2   ;
 STXR W3,W2,[X0] | STR W2,[X1] ;
 CBNZ W3,Fail00  |             ;
 B Exit00        |             ;
 Fail00:         |             ;
 MOV W4,#0       |             ;
 STR W4,[X5]     |             ;
 Exit00:         |             ;
exists (x=2 /\ 0:X1=0 /\ 1:X0=1 /\ ok00=1)
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

AArch64 A
"LxSx Rfe PosRW Rfe"
Generator=diyone7 (version 7.55+01(dev))
Com=Rf Rf
Orig=LxSx Rfe PosRW Rfe
{
0:X0=x; 0:X5=ok00;
ok00=1;
1:X1=x;
}
 P0              | P1          ;
 MOV W2,#1       | LDR W0,[X1] ;
 LDXR W1,[X0]    | MOV W2,#2   ;
 STXR W3,W2,[X0] | STR W2,[X1] ;
 CBNZ W3,Fail00  |             ;
 B Exit00        |             ;
 Fail00:         |             ;
 MOV W4,#0       |             ;
 STR W4,[X5]     |             ;
 Exit00:         |             ;
exists (x=2 /\ 0:X1=0 /\ 1:X0=1 /\ ok00=1)

The option "-unrollatomic" takes a natural number, which will correspond to the number of times the loop is unrolled. Note that the tool added a new variable ok00 in the final state, which stems from unrolling the loop.

Finally, you can use annotations on the LxSx relaxation too. For example (note the use of LxSXAP):

~/bin/diyone7 -arch AArch64 LxSxAP Rfe PosRW Rfe -oneloc -unrollatomic 1

You should get:

Fullscreen

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
AArch64 A
"LxSxAP Rfe PosRW RfePA"
Generator=diyone7 (version 7.55+01(dev))
Com=Rf Rf
Orig=LxSxAP Rfe PosRW RfePA
{
0:X0=x; 0:X5=ok00;
ok00=1;
1:X1=x;
}
 P0              | P1          ;
 MOV W2,#1       | LDR W0,[X1] ;
 LDAXR W1,[X0]   | MOV W2,#2   ;
 STXR W3,W2,[X0] | STR W2,[X1] ;
 CBNZ W3,Fail00  |             ;
 B Exit00        |             ;
 Fail00:         |             ;
 MOV W4,#0       |             ;
 STR W4,[X5]     |             ;
 Exit00:         |             ;
exists (x=2 /\ 0:X1=0 /\ 1:X0=1 /\ ok00=1)
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

AArch64 A
"LxSxAP Rfe PosRW RfePA"
Generator=diyone7 (version 7.55+01(dev))
Com=Rf Rf
Orig=LxSxAP Rfe PosRW RfePA
{
0:X0=x; 0:X5=ok00;
ok00=1;
1:X1=x;
}
 P0              | P1          ;
 MOV W2,#1       | LDR W0,[X1] ;
 LDAXR W1,[X0]   | MOV W2,#2   ;
 STXR W3,W2,[X0] | STR W2,[X1] ;
 CBNZ W3,Fail00  |             ;
 B Exit00        |             ;
 Fail00:         |             ;
 MOV W4,#0       |             ;
 STR W4,[X5]     |             ;
 Exit00:         |             ;
exists (x=2 /\ 0:X1=0 /\ 1:X0=1 /\ ok00=1)

Note that the LDXR instruction has become LDAXR, as per the Acquire (A) annotation given as input to diyone.

2.7 Atomic operations

The diyone tool will let you generate tests which involve atomic operations such as CAS or SWP. The following relaxations are available:

Amo.Swp
Amo.Cas
Amo.Ld<Op>
Amo.St<Op>
with <Op> being Add, Eor, Clr, Set

Try for example:

~/bin/diyone7 -arch AArch64 Amo.LdClr PodWR Amo.Swp Rfe Amo.StAdd PodWR Amo.Cas Rfe

You should get:

Fullscreen

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
AArch64 A
"Amo.LdClr PodWR Amo.Swp Rfe Amo.StAdd PodWR Amo.Cas Rfe"
Generator=diyone7 (version 7.55+01(dev))
Prefetch=0:x=F,0:y=W,1:y=F,1:x=W
Com=Rf Rf
Orig=Amo.LdClr PodWR Amo.Swp Rfe Amo.StAdd PodWR Amo.Cas Rfe
{
0:X0=x; 0:X3=y;
1:X0=y; 1:X2=x;
}
 P0               | P1             ;
 MOV W2,#2        | MOV W1,#2      ;
 LDCLR W2,W1,[X0] | STADD W1,[X0]  ;
 MOV W5,#1        | MOV W3,#0      ;
 SWP W5,W4,[X3]   | MOV W4,#1      ;
                  | CAS W3,W4,[X2] ;
exists (x=2 /\ y=2 /\ 0:X1=1 /\ 0:X4=0 /\ 1:X3=0)
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

AArch64 A
"Amo.LdClr PodWR Amo.Swp Rfe Amo.StAdd PodWR Amo.Cas Rfe"
Generator=diyone7 (version 7.55+01(dev))
Prefetch=0:x=F,0:y=W,1:y=F,1:x=W
Com=Rf Rf
Orig=Amo.LdClr PodWR Amo.Swp Rfe Amo.StAdd PodWR Amo.Cas Rfe
{
0:X0=x; 0:X3=y;
1:X0=y; 1:X2=x;
}
 P0               | P1             ;
 MOV W2,#2        | MOV W1,#2      ;
 LDCLR W2,W1,[X0] | STADD W1,[X0]  ;
 MOV W5,#1        | MOV W3,#0      ;
 SWP W5,W4,[X3]   | MOV W4,#1      ;
                  | CAS W3,W4,[X2] ;
exists (x=2 /\ y=2 /\ 0:X1=1 /\ 0:X4=0 /\ 1:X3=0)

You may use annotations on those atomic operations as well. For example, (note the use of SwpAL):

~/bin/diyone7 -arch AArch64 Amo.LdClr PodWR Amo.SwpAL Rfe Amo.StAdd PodWR Amo.Cas Rfe

You should get:

Fullscreen

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
AArch64 A
"Amo.LdClr PodWRPA Amo.SwpAL RfeLP Amo.StAdd PodWR Amo.Cas Rfe"
Generator=diyone7 (version 7.55+01(dev))
Prefetch=0:x=F,0:y=W,1:y=F,1:x=W
Com=Rf Rf
Orig=Amo.LdClr PodWRPA Amo.SwpAL RfeLP Amo.StAdd PodWR Amo.Cas Rfe
{
0:X0=x; 0:X3=y;
1:X0=y; 1:X2=x;
}
 P0               | P1             ;
 MOV W2,#2        | MOV W1,#2      ;
 LDCLR W2,W1,[X0] | STADD W1,[X0]  ;
 MOV W5,#1        | MOV W3,#0      ;
 SWPAL W5,W4,[X3] | MOV W4,#1      ;
                  | CAS W3,W4,[X2] ;
exists (x=2 /\ y=2 /\ 0:X1=1 /\ 0:X4=0 /\ 1:X3=0)
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

AArch64 A
"Amo.LdClr PodWRPA Amo.SwpAL RfeLP Amo.StAdd PodWR Amo.Cas Rfe"
Generator=diyone7 (version 7.55+01(dev))
Prefetch=0:x=F,0:y=W,1:y=F,1:x=W
Com=Rf Rf
Orig=Amo.LdClr PodWRPA Amo.SwpAL RfeLP Amo.StAdd PodWR Amo.Cas Rfe
{
0:X0=x; 0:X3=y;
1:X0=y; 1:X2=x;
}
 P0               | P1             ;
 MOV W2,#2        | MOV W1,#2      ;
 LDCLR W2,W1,[X0] | STADD W1,[X0]  ;
 MOV W5,#1        | MOV W3,#0      ;
 SWPAL W5,W4,[X3] | MOV W4,#1      ;
                  | CAS W3,W4,[X2] ;
exists (x=2 /\ y=2 /\ 0:X1=1 /\ 0:X4=0 /\ 1:X3=0)

Note that the SWP instruction has become SWPAL as per the input annotation.

2.8 Sequences of relations

The diy tool will let you specify certain sequences of relaxations as well. As an example, the architecture specifies the following in the definition of Dependency-ordered-before:

RW2 is a write W2 appearing in program order after a read or a write RW3 and there is an Address dependency from R1 to RW3.

This means that a sequence of an address dependency followed by a step of program order should be maintained by any implementation.

Now let us generate a test which will probe for this. Try for example (observe the syntax [DpAddrdR, PodRR] which stands for the sequence):

~/bin/diyone7 -arch AArch64 "DMB.STdWW Rfe [DpAddrdR, PodRR] Fre"

You should get:

Fullscreen

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
AArch64 A
"DMB.STdWW Rfe DpAddrdR PodRR Fre"
Generator=diyone7 (version 7.55+01(dev))
Prefetch=0:x=F,0:y=W,1:y=F,1:x=T
Com=Rf Fr
Orig=DMB.STdWW Rfe DpAddrdR PodRR Fre
{
0:X1=x; 0:X3=y;
1:X1=y; 1:X4=z; 1:X6=x;
}
 P0          | P1                  ;
 MOV W0,#1   | LDR W0,[X1]         ;
 STR W0,[X1] | EOR W2,W0,W0        ;
 DMB ST      | LDR W3,[X4,W2,SXTW] ;
 MOV W2,#1   | LDR W5,[X6]         ;
 STR W2,[X3] |                     ;
exists (1:X0=1 /\ 1:X5=0)
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

AArch64 A
"DMB.STdWW Rfe DpAddrdR PodRR Fre"
Generator=diyone7 (version 7.55+01(dev))
Prefetch=0:x=F,0:y=W,1:y=F,1:x=T
Com=Rf Fr
Orig=DMB.STdWW Rfe DpAddrdR PodRR Fre
{
0:X1=x; 0:X3=y;
1:X1=y; 1:X4=z; 1:X6=x;
}
 P0          | P1                  ;
 MOV W0,#1   | LDR W0,[X1]         ;
 STR W0,[X1] | EOR W2,W0,W0        ;
 DMB ST      | LDR W3,[X4,W2,SXTW] ;
 MOV W2,#1   | LDR W5,[X6]         ;
 STR W2,[X3] |                     ;
exists (1:X0=1 /\ 1:X5=0)

2.9 Generating many tests at once

Now that we know how to generate one test at a time, and are a bit more familiar with the syntax needed to do so, let us see how we can generate many tests at once. This proves useful when wanting to test a piece of hardware and check whether it respects the memory model.

To generate several tests at once, we can write a configuration file and feed it to diy7, the sibling tool of diyone7.

In a configuration file, we should specify the following items:

The architecture of interest, with "-arch"
The maximum number of threads we want our tests to have, with "-nprocs".
The maximum number of relaxations we want our tests to involve, with "-size".
The list of relaxations we are interested in with "-relaxlist"
A common prefix for naming all the tests being generated with "-name".

Here is a small example, where I have put in the relaxlist a selection of the relaxations we have studied in this tutorial. You can also download small.conf:

Download small.conf

-arch AArch64

-nprocs 2

-size 8

-name example

-relaxlist Rfe Fre Coe DMB.STdWW DMB.LDdRR DpAddrdR PodWWPL PodRRAP LxSxAP Amo.SwpAL [DpAddrdR, PodRR]

Calling:

~/bin/diy7 -conf small.conf

should give you:

Generator produced 359 tests

All those tests are called "exampleXYZ.litmus". Let us open one at random:

Fullscreen

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
AArch64 example249
"DMB.LDdRRPA LxSxAP PodWWPL RfeLP DpAddrdR PodRRPA Amo.SwpAL RfeLP"
Cycle=DpAddrdR PodRRPA Amo.SwpAL RfeLP DMB.LDdRRPA LxSxAP PodWWPL RfeLP
Relax=
Safe=Rfe DMB.LDdRR [DpAddrdR,PodRR] PodWWPL LxSxAP Amo.SwpAL
Generator=diy7 (version 7.55+01(dev))
Prefetch=0:x=F,0:z=W,1:z=F,1:x=W
Com=Rf Rf
Orig=DMB.LDdRRPA LxSxAP PodWWPL RfeLP DpAddrdR PodRRPA Amo.SwpAL RfeLP
{
0:X1=x; 0:X2=y; 0:X7=z;
1:X1=z; 1:X4=a; 1:X5=x;
}
 P0              | P1                  ;
 LDR W0,[X1]     | LDR W0,[X1]         ;
 DMB LD          | EOR W2,W0,W0        ;
 MOV W4,#1       | LDR W3,[X4,W2,SXTW] ;
 Loop00:         | MOV W7,#1           ;
 LDAXR W3,[X2]   | SWPAL W7,W6,[X5]    ;
 STXR W5,W4,[X2] |                     ;
 CBNZ W5,Loop00  |                     ;
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

AArch64 example249
"DMB.LDdRRPA LxSxAP PodWWPL RfeLP DpAddrdR PodRRPA Amo.SwpAL RfeLP"
Cycle=DpAddrdR PodRRPA Amo.SwpAL RfeLP DMB.LDdRRPA LxSxAP PodWWPL RfeLP
Relax=
Safe=Rfe DMB.LDdRR [DpAddrdR,PodRR] PodWWPL LxSxAP Amo.SwpAL
Generator=diy7 (version 7.55+01(dev))
Prefetch=0:x=F,0:z=W,1:z=F,1:x=W
Com=Rf Rf
Orig=DMB.LDdRRPA LxSxAP PodWWPL RfeLP DpAddrdR PodRRPA Amo.SwpAL RfeLP
{
0:X1=x; 0:X2=y; 0:X7=z;
1:X1=z; 1:X4=a; 1:X5=x;
}
 P0              | P1                  ;
 LDR W0,[X1]     | LDR W0,[X1]         ;
 DMB LD          | EOR W2,W0,W0        ;
 MOV W4,#1       | LDR W3,[X4,W2,SXTW] ;
 Loop00:         | MOV W7,#1           ;
 LDAXR W3,[X2]   | SWPAL W7,W6,[X5]    ;
 STXR W5,W4,[X2] |                     ;
 CBNZ W5,Loop00  |                     ;
 MOV W6,#1       |                     ;
 STLR W6,[X7]    |                     ;
exists (0:X0=1 /\ 0:X3=0 /\ 1:X0=1 /\ 1:X6=0)

2.10 An Arm forbidden configuration file

And now, as a finishing note: we distribute a configuration file forbidden.conf which aims to list all the possible relaxations which are guaranteed to be ordered by the architecture. We make no guarantee as to its exhaustivity, but using this file should give you a good set of tests to probe designs, silicon, or other tools with. The user can of course vary the bounds "nprocs" and size, but please be aware that the number of tests generated can become very large.

Download forbidden.conf

Access the Memory Model Tool via the Arm Developer website:

Memory Model Tool

0 comments
0 members are here

Architectures and Processors blog

Introducing GICv5: Scalable and secure interrupt management for Arm

Christoffer Dall

Introducing Arm GICv5: a scalable, hypervisor-free interrupt controller for modern multi-core systems with improved virtualization and real-time support.
- April 28, 2025
Getting started with AARCHMRS Features.json using Python

Joh

A high-level introduction to the Arm Architecture Machine Readable Specification (AARCHMRS) Features.json with some examples to interpret and start to work with the available data using Python.
- April 8, 2025
Advancing server manageability on Arm Neoverse Compute Subsystem (CSS) with OpenBMC

Samer El-Haj-Mahmoud

Arm and 9elements Cyber Security have brought a prototype of OpenBMC to the Arm Neoverse Compute Subsystem (CSS) to advancing server manageability.
- January 28, 2025

AI blog

Announcements

Architectures and Processors blog

Automotive blog

Embedded and Microcontrollers blog

Internet of Things (IoT) blog

Laptops and Desktops blog

Mobile, Graphics, and Gaming blog

Operating Systems blog

Servers and Cloud Computing blog

SoC Design and Simulation blog

Tools, Software and IDEs blog

How to generate litmus tests automatically with the diy7 tool

2. Generating tests

2.0 Installing the tools

2.1 Generating a specific test

2.2. Syntax used by diyone7

2.3 Memory Barriers

| [W] ; po; [dmb.st]; po; [W]

RW1 is a write W1, RW2 is a write W2, and W1 appears in program order before a DMB ST that appears in program order before W2.

| [R] ; po; [dmb.ld]; po

RW1 is a read R1 and R1 appears in program order before a DMB LD that appears in program order before RW2.

2.4 Dependencies

| addr | data

There is an Address dependency or a Data dependency from R1 to RW2.

2.5 Annotations

| po; [L]

RW2 is a write W2 and W2 is generated by an instruction with Release semantics.

| [A | Q]; po

RW1 is a read R1 and R1 is generated by an instruction with Acquire or AcquirePC semantics.

2.6 Load and Store Exclusive

2.7 Atomic operations

2.8 Sequences of relations

RW2 is a write W2 appearing in program order after a read or a write RW3 and there is an Address dependency from R1 to RW3.

2.9 Generating many tests at once

2.10 An Arm forbidden configuration file

Introducing GICv5: Scalable and secure interrupt management for Arm

Getting started with AARCHMRS Features.json using Python

Advancing server manageability on Arm Neoverse Compute Subsystem (CSS) with OpenBMC