This blog is the second installment in a series of blogs looking at how to use the Memory Model Tool. It provides an opportunity to experiment with the model and develop an intuitive understanding of how it works. The information is useful to software developers, compiler writers, and verification engineers. To read the first installment in this blog series, go to A working example of how to use the herd7 Memory Model Tool. You can access the tool through the Arm Developer website and at the bottom of this blog.
Here we are going to learn how to automatically generate litmus tests. This is similar to the message passing one (MP) that we studied in the previous blog post.
To start with, we need to install the diy tool. More precisely, we are going to install all the herd+diy toolsuite. This will give you:
Of course we have experimented with the herd7 tool in the previous blog, with its web interface. You now get the command line version.
Please follow the instructions at http://diy.inria.fr/sources/index.html. You may also want to follow the development version from https://github.com/herd/herdtools7/tree/master.
Let us generate the message passing example. As a starting point, if you do:
~/bin/diyone7 -arch AArch64 "PodWW Rfe PodRR Fre"
You get the message passing example of the previous blog post on your terminal:
AArch64 A "PodWW Rfe PodRR Fre" Generator=diyonedev (version 7.51+4(dev)) Prefetch=0:x=F,0:y=W,1:y=F,1:x=R Com=Rf Fr Orig=PodWW Rfe PodRR Fre { 0:X1=x; 0:X3=y; 1:X1=y; 1:X3=x; } P0 | P1 ; MOV W0,#1 | LDR W0,[X1] ; STR W0,[X1] | LDR W2,[X3] ; MOV W2,#1 | ; STR W2,[X3] | ; exists (1:X0=1 /\ 1:X2=0)
You can name the test as you would like using a command-line option, for example:
~/bin/diyone7 -arch AArch64 "PodWW Rfe PodRR Fre" -name mp
This will not display the test on your terminal, but rather it will create a file called "mp.litmus".
The tool understands the input "PodWW Rfe PodRR Fre" as a cycle of semantics relations (the ones that are displayed in the execution figures in the previous blog). More precisely, the tool takes the input string as a directed path between memory events, which ends up cycling back to its initial event. Let me explain this a bit:
Recall from the previous blog that a read R is in from-read before a write W if there exists a write W’ such that: 1. R reads-from W’ and 2. W’ is in coherence order before W, that is, hit the memory before W. In other words, R is in from-read before all the writes W that overwrite the value read by R, which comes from W’.
Figure 1
So, the tool starts by reading PodWW and builds a first thread P0 which has two stores to different locations. Graphically, this corresponds to:
Figure 2
Then the Rfe indicates that it must create a new thread P1, which starts with a read that will read from the bottom write on P0. Note that this read mandates the final value 1:X0=1. Graphically, this corresponds to:
Figure 3
Then PodRR indicates that the new thread, P1, has to have two reads to different locations. The first one comes from Rfe, and the second one appears at the bottom of P1. Graphically, this corresponds to:
Figure 4
Finally, Fre closes the cycle, and mandates the final value 1:X2=0. Graphically, this corresponds to:
Figure 5
For more details on the input syntax, please see the documentation for the diy tool at http://diy.inria.fr/doc/gen.html.
Now, you might want to generate a test which should not be observed on hardware, for example to test whether that piece of hardware is respecting the architecture. For instance, let us generate the Message Passing example with barriers on each thread, so that the test we obtain is correctly synchronized as per the architecture.
The architecture states that a DMB ST will prevent two store instructions from being reordered. There are two reference points which allow you to check this:
In the cat file (see http://diy.inria.fr/www/?record=aarch64), the definition of the relation bob (Barrier-ordered-before) features the following clause:
This can be read as follows: starting from a write ([W]), if I take one or several steps in program order (po), land on a DMB ST ([dmb.st]), then take one or several steps in program order which lands on another write ([W]), then the two extremity writes are ordered. Graphically, this corresponds to:
Figure 6
In the Arm Architecture Reference Manual (Arm ARM) (see https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile, section B2.3), the corresponding text reflects the formal cat clause:
Similarly, a DMB LD will prevent two loads from being reordered (NB: a DMB LD will also ensure that a load-store pair will not be reordered, although we won't be needing that ordering in this example).
The cat clause (see relation bob in http://diy.inria.fr/www/?record=aarch64) is:
The English transliteration in the Arm ARM (see relation Barrier-ordered-before in Section B2.3 of https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile) is:
Now, let us try:
~/bin/diyone7 -arch AArch64 "DMB.STdWW Rfe DMB.LDdRR Fre" -name mp+dmb.st+dmb.ld
You get the following test on your terminal:
AArch64 mp+dmb.st+dmb.ld "DMB.STdWW Rfe DMB.LDdRR Fre" Generator=diyone7 (version 7.55+01(dev)) Prefetch=0:x=F,0:y=W,1:y=F,1:x=T Com=Rf Fr Orig=DMB.STdWW Rfe DMB.LDdRR Fre { 0:X1=x; 0:X3=y; 1:X1=y; 1:X3=x; } P0 | P1 ; MOV W0,#1 | LDR W0,[X1] ; STR W0,[X1] | DMB LD ; DMB ST | LDR W2,[X3] ; MOV W2,#1 | ; STR W2,[X3] | ; exists (1:X0=1 /\ 1:X2=0)
Observe that in the previous test we were using "PodWW" and "PodRR"; this time around we are using their siblings - "DMB.STdWW" and "DMB.LDdRR" - which will require the tool to generate the corresponding barriers on each thread. As before, DMB.LDdRR means using a DMB LD barrier (DMB.LD) between two reads (RR) with different locations (d).
On the reading thread P1, we can use an address dependency instead of the barrier DMB.LD. The corresponding cat clause (see relation dob in http://diy.inria.fr/www/?record=aarch64) is:
The English transliteration in the Arm ARM (see relation Dependency-ordered-before in Section B2.3 of https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile) is:
Now, try:
~/bin/diyone7 -arch AArch64 "DMB.STdWW Rfe DpAddrdR Fre" -name mp+dmb.st+addr
You should get:
AArch64 mp+dmb.st+addr "DMB.STdWW Rfe DpAddrdR Fre" Generator=diyone7 (version 7.55+01(dev)) Prefetch=0:x=F,0:y=W,1:y=F,1:x=T Com=Rf Fr Orig=DMB.STdWW Rfe DpAddrdR Fre { 0:X1=x; 0:X3=y; 1:X1=y; 1:X4=x; } P0 | P1 ; MOV W0,#1 | LDR W0,[X1] ; STR W0,[X1] | EOR W2,W0,W0 ; DMB ST | LDR W3,[X4,W2,SXTW] ; MOV W2,#1 | ; STR W2,[X3] | ; exists (1:X0=1 /\ 1:X3=0)
Note that I have used "DpAddrdR" to require the tool to generate an address dependency; previously we have used "DMB.LDdRR". As before, DpAddrdRR means using an address dependency (DpAddr) between two reads (R) with different locations (d). Note that, unlike in the PodRR and DMB.LDdRR, we only need to specify that the second access is a read because dependencies always start with a read---hence, specifying the first one is superfluous.
Annotations allow you to indicate which flavors of accesses you would like your test to feature: plain, acquire or release, exclusives.
If you use the “P” annotation (plain), you will get the same MP example again, because you are asking diy to give you only plain accesses:
~/bin/diyone7 -arch AArch64 "PodWW P Rfe P PodRR Fre" -name mp
This can be read as follows: we are asking the tool to generate a test where we start with a first thread featuring two writes in program order (PodWW), where the second one is plain (P). Then the tool is asked to exit the first thread with Rfe, which means it needs to create a read which will read the last write on the first thread. This read needs to be plain too (P). The second thread should feature two reads in program order with different locations (PodRR), and we close the cycle with Fre as before. If you want acquire accesses you can use the “A” annotation, and for release accesses, the “L” annotation; for example:
~/bin/diyone7 -arch AArch64 "PodWW L Rfe A PodRR Fre" -name mp+rel+acq
This should give you a variation on the message passing example, where the write of the flag is a release write, and the read of the flag is an acquire read. In other words, this will give you the message passing example with the second store on P0 being a release and the first load on P1 being an acquire.
More precisely, this can be read as follows: we are asking the tool to generate a test where we start with a first thread featuring two writes in program order (PodWW), where the second one is release (L). Then the tool is asked to exit the first thread with Rfe, which means it needs to create a read which will read the last write on the first thread. This read needs to be an acquire (A). The second thread should feature two reads in program order with different locations (PodRR), and we close the cycle with Fre as before.
AArch64 mp+rel+acq "PodWWPL RfeLA PodRRAP Fre" Generator=diyone7 (version 7.55+01(dev)) Prefetch=0:x=F,0:y=W,1:y=F,1:x=T Com=Rf Fr Orig=PodWWPL RfeLA PodRRAP Fre { 0:X1=x; 0:X3=y; 1:X1=y; 1:X3=x; } P0 | P1 ; MOV W0,#1 | LDAR W0,[X1] ; STR W0,[X1] | LDR W2,[X3] ; MOV W2,#1 | ; STLR W2,[X3] | ; exists (1:X0=1 /\ 1:X2=0)
Note that the input cycle (see "Orig") is rendered slightly differently, in a canonical manner, with the annotations directly on the relaxation. For example, what was "L Rfe A" in our input cycle is shown as "RfeLA" in the "Orig" cycle. You can use either syntax equivalently.
Using a store release on thread P0 ensures that the two stores on P0 will not be reordered. In the cat file this appears in the relation bob again:
This can be read as follows: if I take one or several steps in program order (po), land on a release write ([L]), then the two extremities are ordered.
In the Arm ARM this corresponds to:
Using a load acquire on thread P1 ensures that the two loads on P1 will not be reordered. In the cat file this appears in the relation bob:
Note that this clause also applies to AcquirePC ([Q]) instructions, viz, LDAPR instructions. This can be read as follows: if I start from an acquire or acquirePC read, and take one or several steps in program order (po), then the two extremities are ordered.
The diyone tool will let you generate tests which involve Load and Store Exclusive instructions. The relation LxSx asks the tool to generate a read-modify-write implemented with a load-store exclusive pair. Try for example:
~/bin/diyone7 -arch AArch64 LxSx Rfe PosRW Rfe -oneloc
AArch64 A "LxSx Rfe PosRW Rfe" Generator=diyone7 (version 7.55+01(dev)) Com=Rf Rf Orig=LxSx Rfe PosRW Rfe { 0:X0=x; 1:X1=x; } P0 | P1 ; MOV W2,#1 | LDR W0,[X1] ; Loop00: | MOV W2,#2 ; LDXR W1,[X0] | STR W2,[X1] ; STXR W3,W2,[X0] | ; CBNZ W3,Loop00 | ; exists (x=2 /\ 0:X1=0 /\ 1:X0=1)
Note the use of "PosRW": program order (Po) between a read and a write (RW) with the same location (s).
The "-oneloc" option ensures that the tool is aware that the test being generated only involves one location, it will complain otherwise.
You can ask the tool to unroll the loop of the read-modify-write, as follows:
~/bin/diyone7 -arch AArch64 LxSx Rfe PosRW Rfe -oneloc -unrollatomic 1
AArch64 A "LxSx Rfe PosRW Rfe" Generator=diyone7 (version 7.55+01(dev)) Com=Rf Rf Orig=LxSx Rfe PosRW Rfe { 0:X0=x; 0:X5=ok00; ok00=1; 1:X1=x; } P0 | P1 ; MOV W2,#1 | LDR W0,[X1] ; LDXR W1,[X0] | MOV W2,#2 ; STXR W3,W2,[X0] | STR W2,[X1] ; CBNZ W3,Fail00 | ; B Exit00 | ; Fail00: | ; MOV W4,#0 | ; STR W4,[X5] | ; Exit00: | ; exists (x=2 /\ 0:X1=0 /\ 1:X0=1 /\ ok00=1)
The option "-unrollatomic" takes a natural number, which will correspond to the number of times the loop is unrolled. Note that the tool added a new variable ok00 in the final state, which stems from unrolling the loop.
Finally, you can use annotations on the LxSx relaxation too. For example (note the use of LxSXAP):
~/bin/diyone7 -arch AArch64 LxSxAP Rfe PosRW Rfe -oneloc -unrollatomic 1
AArch64 A "LxSxAP Rfe PosRW RfePA" Generator=diyone7 (version 7.55+01(dev)) Com=Rf Rf Orig=LxSxAP Rfe PosRW RfePA { 0:X0=x; 0:X5=ok00; ok00=1; 1:X1=x; } P0 | P1 ; MOV W2,#1 | LDR W0,[X1] ; LDAXR W1,[X0] | MOV W2,#2 ; STXR W3,W2,[X0] | STR W2,[X1] ; CBNZ W3,Fail00 | ; B Exit00 | ; Fail00: | ; MOV W4,#0 | ; STR W4,[X5] | ; Exit00: | ; exists (x=2 /\ 0:X1=0 /\ 1:X0=1 /\ ok00=1)
Note that the LDXR instruction has become LDAXR, as per the Acquire (A) annotation given as input to diyone.
The diyone tool will let you generate tests which involve atomic operations such as CAS or SWP. The following relaxations are available:
Amo.SwpAmo.CasAmo.Ld<Op>Amo.St<Op>with <Op> being Add, Eor, Clr, Set
Try for example:
~/bin/diyone7 -arch AArch64 Amo.LdClr PodWR Amo.Swp Rfe Amo.StAdd PodWR Amo.Cas Rfe
AArch64 A "Amo.LdClr PodWR Amo.Swp Rfe Amo.StAdd PodWR Amo.Cas Rfe" Generator=diyone7 (version 7.55+01(dev)) Prefetch=0:x=F,0:y=W,1:y=F,1:x=W Com=Rf Rf Orig=Amo.LdClr PodWR Amo.Swp Rfe Amo.StAdd PodWR Amo.Cas Rfe { 0:X0=x; 0:X3=y; 1:X0=y; 1:X2=x; } P0 | P1 ; MOV W2,#2 | MOV W1,#2 ; LDCLR W2,W1,[X0] | STADD W1,[X0] ; MOV W5,#1 | MOV W3,#0 ; SWP W5,W4,[X3] | MOV W4,#1 ; | CAS W3,W4,[X2] ; exists (x=2 /\ y=2 /\ 0:X1=1 /\ 0:X4=0 /\ 1:X3=0)
You may use annotations on those atomic operations as well. For example, (note the use of SwpAL):
~/bin/diyone7 -arch AArch64 Amo.LdClr PodWR Amo.SwpAL Rfe Amo.StAdd PodWR Amo.Cas Rfe
AArch64 A "Amo.LdClr PodWRPA Amo.SwpAL RfeLP Amo.StAdd PodWR Amo.Cas Rfe" Generator=diyone7 (version 7.55+01(dev)) Prefetch=0:x=F,0:y=W,1:y=F,1:x=W Com=Rf Rf Orig=Amo.LdClr PodWRPA Amo.SwpAL RfeLP Amo.StAdd PodWR Amo.Cas Rfe { 0:X0=x; 0:X3=y; 1:X0=y; 1:X2=x; } P0 | P1 ; MOV W2,#2 | MOV W1,#2 ; LDCLR W2,W1,[X0] | STADD W1,[X0] ; MOV W5,#1 | MOV W3,#0 ; SWPAL W5,W4,[X3] | MOV W4,#1 ; | CAS W3,W4,[X2] ; exists (x=2 /\ y=2 /\ 0:X1=1 /\ 0:X4=0 /\ 1:X3=0)
Note that the SWP instruction has become SWPAL as per the input annotation.
The diy tool will let you specify certain sequences of relaxations as well. As an example, the architecture specifies the following in the definition of Dependency-ordered-before:
This means that a sequence of an address dependency followed by a step of program order should be maintained by any implementation.
Now let us generate a test which will probe for this. Try for example (observe the syntax [DpAddrdR, PodRR] which stands for the sequence):
~/bin/diyone7 -arch AArch64 "DMB.STdWW Rfe [DpAddrdR, PodRR] Fre"
AArch64 A "DMB.STdWW Rfe DpAddrdR PodRR Fre" Generator=diyone7 (version 7.55+01(dev)) Prefetch=0:x=F,0:y=W,1:y=F,1:x=T Com=Rf Fr Orig=DMB.STdWW Rfe DpAddrdR PodRR Fre { 0:X1=x; 0:X3=y; 1:X1=y; 1:X4=z; 1:X6=x; } P0 | P1 ; MOV W0,#1 | LDR W0,[X1] ; STR W0,[X1] | EOR W2,W0,W0 ; DMB ST | LDR W3,[X4,W2,SXTW] ; MOV W2,#1 | LDR W5,[X6] ; STR W2,[X3] | ; exists (1:X0=1 /\ 1:X5=0)
Now that we know how to generate one test at a time, and are a bit more familiar with the syntax needed to do so, let us see how we can generate many tests at once. This proves useful when wanting to test a piece of hardware and check whether it respects the memory model.
To generate several tests at once, we can write a configuration file and feed it to diy7, the sibling tool of diyone7.In a configuration file, we should specify the following items:
Here is a small example, where I have put in the relaxlist a selection of the relaxations we have studied in this tutorial. You can also download small.conf:
[CTAToken URL = "https://developer.arm.com/-/media/B999B42FCCD7409A9B76C41CB06BC84E.ashx?revision=28b0b179-a63f-4bf2-aabb-d029be5fe9a9" target="_blank" text="Download small.conf" class ="green"]
-arch AArch64
-nprocs 2
-size 8
-name example
-relaxlist Rfe Fre Coe DMB.STdWW DMB.LDdRR DpAddrdR PodWWPL PodRRAP LxSxAP Amo.SwpAL [DpAddrdR, PodRR]
Calling:
~/bin/diy7 -conf small.conf
should give you:
Generator produced 359 tests
All those tests are called "exampleXYZ.litmus". Let us open one at random:
AArch64 example249 "DMB.LDdRRPA LxSxAP PodWWPL RfeLP DpAddrdR PodRRPA Amo.SwpAL RfeLP" Cycle=DpAddrdR PodRRPA Amo.SwpAL RfeLP DMB.LDdRRPA LxSxAP PodWWPL RfeLP Relax= Safe=Rfe DMB.LDdRR [DpAddrdR,PodRR] PodWWPL LxSxAP Amo.SwpAL Generator=diy7 (version 7.55+01(dev)) Prefetch=0:x=F,0:z=W,1:z=F,1:x=W Com=Rf Rf Orig=DMB.LDdRRPA LxSxAP PodWWPL RfeLP DpAddrdR PodRRPA Amo.SwpAL RfeLP { 0:X1=x; 0:X2=y; 0:X7=z; 1:X1=z; 1:X4=a; 1:X5=x; } P0 | P1 ; LDR W0,[X1] | LDR W0,[X1] ; DMB LD | EOR W2,W0,W0 ; MOV W4,#1 | LDR W3,[X4,W2,SXTW] ; Loop00: | MOV W7,#1 ; LDAXR W3,[X2] | SWPAL W7,W6,[X5] ; STXR W5,W4,[X2] | ; CBNZ W5,Loop00 | ; MOV W6,#1 | ; STLR W6,[X7] | ; exists (0:X0=1 /\ 0:X3=0 /\ 1:X0=1 /\ 1:X6=0)
And now, as a finishing note: we distribute a configuration file forbidden.conf which aims to list all the possible relaxations which are guaranteed to be ordered by the architecture. We make no guarantee as to its exhaustivity, but using this file should give you a good set of tests to probe designs, silicon, or other tools with. The user can of course vary the bounds "nprocs" and size, but please be aware that the number of tests generated can become very large.
[CTAToken URL = "https://github.com/herd/herdtools7/blob/master/gen/libdir/forbidden.conf" target="_blank" text="Download forbidden.conf" class ="green"]
Access the Memory Model Tool via the Arm Developer website:
[CTAToken URL = "https://developer.arm.com/architectures/cpu-architecture/a-profile/memory-model-tool?_ga=2.196536111.26066257.1591604084-1566025861.1574767315" target="_blank" text="Memory Model Tool" class ="green"]