This post presents the implementation of Morello in the Memory Model Tool. The reader is expected to have some understanding of the tool, which is documented in previous posts: A working example of how to use the herd7 Memory Model Tool and How to generate litmus tests automatically with the diy7 tool.
Morello is a security architecture developed by Arm, based on Capability Hardware Enhanced RISC Instructions (CHERI). More information can be found in the Arm Morello program.
We have extended the Memory Model Tools to support Morello:
https://github.com/herd/herdtools7/commit/ef79f222ed8ad77c79457c3c9c77a945f12bf9fb
Extending the Memory Model Tools to include the Morello instruction set increases the coverage of the tools. This benefits hardware developers and software developers during the prototyping of Morello evaluation boards and the code that runs on them. There are several aspects that are of interest in the implementation of Morello in the Memory Model Tool, namely the addition of:
Before we dive into Morello, let us have a brief introduction about the Arm Memory Tagging Extension (MTE), since it is useful later in this post. MTE implements lock and key access to memory. More information can be found in the Armv8-A security features.
MTE has been part of the Memory Model Tools for some time now (introduced with https://github.com/herd/herdtools7/commit/6378f2fb0ce9a489d76544dd07bfcb32116d6e76).
Both MTE and Morello have objects called “tags”. In both MTE and Morello, the goal of tags is to act as labels attached to the rest of the data. As a consequence, the implementation of the MTE and Morello tags in the tool have some similarities. Therefore we briefly describe the way the MTE variant works here, to support the presentation of the Morello variant.
In the Memory Model Tool, the different values the MTE tag can take are represented as colors: “green”, “red”, “blue”, … They can be read with the “LDG” instruction, and written with the “STG” instruction.
Here is an example of an MTE test that can be generated with diyone7:
> diyone7 -arch AArch64 -variant memtag PodWWTT Rfe PodRRTT Fre -name MP-MTE AArch64 MP-MTE { 0:X0=x:red; 0:X1=x:green; 0:X2=y:red; 0:X3=y:green; 1:X1=y:red; 1:X3=x:green; } P0 | P1 ; STG X0,[X1] | MOV X0,X1 ; STG X2,[X3] | LDG X0,[X1] ; | MOV X2,X3 ; | LDG X2,[X3] ; exists (1:X0=y:red /\ 1:X2=x:green)
Note the use of the new “memtag” variant and the new “T” annotation to generate MTE instructions operating on the MTE tags instead of the data.
One can then run MTE tests using the herd7 tool:
> herd7 -variant memtag MP-MTE.litmus Test MP-MTE Allowed States 4 1:X0=y:green; 1:X2=x:green; 1:X0=y:green; 1:X2=x:red; 1:X0=y:red; 1:X2=x:green; 1:X0=y:red; 1:X2=x:red; Ok Witnesses Positive: 1 Negative: 3 Condition exists (1:X0=y:red /\ 1:X2=x:green) Observation MP-MTE Sometimes 1 3 Time MP-MTE 0.03 Hash=8955f3049e5ad93e9d858b99d7d6ef22
Morello has two instruction sets: A64 and C64. The difference between the two is the interpretation of the operands in some instructions. Take the example of a load instruction:
So far only C64 is implemented in the tool.
In Armv8 without extensions, an address and a numeric value are indistinguishable. The choice to see a register as holding an address or a numeric value depends on the programmer’s intent.
That is not the case in the Memory Model Tool, where memory locations are distinguished from numeric values: memory locations are represented as strings. This allows for a clear distinction between what is read or written in memory and where that happens. Notations such as “Wx=1” then clearly means that the value 1 is written to memory location “x”, and “Rx=1” that the same memory location was read with value 1. In other words, the tool works on symbolic rather than byte memory locations.
When dealing with mixed memory access sizes (for example, 32-bit and 64-bit accesses to the same memory location), it is important to be able to refer to only part of a memory location. Depending on the litmus test, the tool identifies a “smallest access size”, and divides and memory locations into sublocations of that size. An offset can be added to the base memory location to access the other sublocations. If a mixed-size litmus test containing W and X sized memory accesses, what would have been a single 64-bit “x” location gets divided in two 32-bit sublocations: “x” and “x+4”.
But it is impossible to reference a symbolic memory location “y” from another symbolic memory location “x” by adding any amount of offset. This is a design choice of the tool, which we refer to in the remainder of this post. This allows us to explain certain choices made in the implementation of the Morello variant, as well as distinctions in the way numeric values and addresses are handled.
Morello introduces a new data type, the capability. This data type is held in Capability registers, which are 129-bit wide. Those registers are composed of the following fields:
The Flags and the lower bits of the Capability Bounds share encoding with the Capability Value.
Here is a graph showing the organization of the fields in a capability, from the Arm Architecture Reference Manual Supplement - Morello, version A.f, page 58:
The capability registers are stored in a capability-tagged memory, where a capability tag location exists for every 16 byte locations.
This has to be reflected in the Memory Model Tool when using it with the Morello variant enabled:
> herd7 -variant morello test.litmus
With the Morello variant:
This is implemented by adding a memory sublocation containing the capability tag, similarly to what has been done for the MTE tag. A significant difference being that the MTE tag of a particular memory location is accessed independently from the value held in this location. That is not the case for Morello. For Morello, this additional memory sublocation is treated in a way that is closer to the mixed-size variant, in that they are accessed together in memory.
For performance reasons, as well as simplicity of implementation, at the time of writing using the Morello variant in the tool triggers the use of the mixed variant. For the user, it means that reading graphs of the execution can potentially show a “smallest access size” (as described in 3.1 Memory locations) smaller than what the litmus test would suggest.
The Morello architecture ensures that certain checks guard every memory access. Those checks check whether the access is permitted or not. If a check fails, a synchronous Data Abort is generated, with a specific fault depending on what caused it. The implemented faults are the capability tag fault, the capability sealed fault and the capability permission fault. The capability bounds fault is not implemented because bounds cannot be represented in the Memory Model Tool at the moment.
In order not to fault, memory accesses need to use a valid and unsealed capability that has the necessary permissions to perform the relevant type of access (load or store).
The other types of faults introduced by Morello are not implemented in the tool. Those faults are:
The Morello instruction set expands the “classic” memory instructions (STR, LDR, SWP, CAS, …) to use capability registers, creating 128-bit aligned accesses. Note that capability registers are 129-bit wide, but that virtual addresses are 128-bit aligned.
It also adds two new instructions that operate specifically on capability tags: STCT (store capability tags) and LDCT (load capability tags). They have a unique interaction with memory and are therefore of special interest for the tool. It is also for that reason that they get their own annotation in the diy tool (more later). The instructions STCT and LDCT theoretically operate on 4 consecutive capability locations in memory. However the tool cannot reference a memory location “y” from a memory location “x” (see 3.1 Memory locations). Therefore those instructions only operate on the specified location. This limitation would translate to a real system where variables are 512-bit aligned.
Behind the scene, the diy tool uses different “banks” to manage the values written and read in the cycles it creates. Banks are unique to each memory location. There is only one bank used when no variant is specified: “Ord”.
The tool increments the bank of a memory location by 1 when a write occurs, and the final state of the cycle is generated from the expected values. In the example above, the read of “y” happens after the write of y with value 1, so the expected value of the read of y is 1. The read of “x” happens before the write of x with value, so the expected value of the read of x is 0.
The memtag variant of the tool added a second bank: “Tag”, storing the value of the MTE tag independently from the Ord bank. Reading and writing from and to the Tag bank is done by using the “T” annotation with the diy tool.
In the same spirit, the Morello variant introduces new banks to handle the different fields of capabilities, “CapaTag” for the capability tag and “CapaSeal” for the object type.
Here you can see that writing to different banks of a same memory location (here with “Wy.seal” followed by “Wy”) keeps all the written values in memory, and that reading from one of the banks (noted as “Ry.Seal”) only retrieves the value of the specified bank:
Unlike the MTE Tag bank, the capability banks act, along with Ord, as a tuple of values. The reason being that writing a capability tag or an object type to a register needs to be associated with the rest of the banks in order not to lose information. That can result in many more instructions in litmus tests.
The previous figure showing the Morello cycle is associated to the following litmus:
> diyone7 -arch AArch64 -variant morello PodWW Cs PosWW Rfe Cs PodRR Fre AArch64 A { uint128_t y; uint128_t x; 0:X1=0xffffc0000:x:1; 0:X2=0xffffc0000:y:1; 1:X1=0xffffc0000:y:1; 1:X3=0xffffc0000:x:1; } P0 | P1 ; MOV X0,#1 | LDR C0,[C1] ; STR X0,[C1] | GCTYPE X0,C0 ; MOV X3,#0 | LDR X2,[C3] ; MOV X4,#1 | ; SCVALUE C3,C2,X3 | ; SEAL C3,C3,C4 | ; STR C3,[C2] | ; MOV X5,#1 | ; STR X5,[C2] | ; exists (1:X0=1 /\ 1:X2=0)
We now cover the Morello-specific initializations and annotations.
The current representation is as follows, but may be subject to change.
When using “-variant morello”, capability registers holding values that are not memory locations are represented as:
“C:T”
Where C is an integer representing bits <127:0> of the capability and T is the tag (bit <128>). When “T” is 0, only “C” is displayed.
The tag is separated from the rest of the fields to simplify the reading when using cycles focused on tags. For example, a tagged capability with value 2 is represented as “0x2:1” instead of a more classical “0x100000000000000000000000000000002”.
The representation of the other types of registers (W and X) does not change.
Note that it might be useful to use “-hexa true” to get a more intelligible reading of the permission and object type fields of capability registers. For example, a tagged capability with all permissions and value 3 is represented as:
Memory locations represents a symbol in the tool: “x”, “y”, … (see 3.1 Memory locations). Bits <91:0> of capabilities holding a memory location are amalgamated as the letter representing this location. This is why the tool uses the following representation:
“C:m:T” (e.g. “0xffffc0000:x:1”)
Where:
When both “C” and “T” are 0, only “m” is displayed.
In the previous example, “0xffffc0000:x:1” denotes a tagged and unsealed capability with full permissions, containing the memory location “x”.
Note that in this case “C” is always represented as a hexadecimal value, ignoring the “-hexa” flag.
The initialization section of the litmus tests contains the initial state of the registers for each processing element. The input format matches the output format previously described.
For example, using the diy tool to get a message-passing cycle with “-variant morello” gives the following litmus:
> diyone7 -arch AArch64 -variant morello PodWW Rfe PodRR Fre AArch64 MP { uint128_t y; uint128_t x; 0:X1=0xffffc0000:x:1; 0:X3=0xffffc0000:y:1; 1:X1=0xffffc0000:y:1; 1:X3=0xffffc0000:x:1; } P0 | P1 ; MOV X0,#1 | LDR X0,[C1] ; STR X0,[C1] | LDR X2,[C3] ; MOV X2,#1 | ; STR X2,[C3] | ; exists (1:X0=1 /\ 1:X2=0)
Compared to the previous MP-MTE litmus test, this cycle does not contain additional annotations. The memory accesses are unchanged compared to the no-variant version of the message passing litmus test. Using “-variant morello” merely initializes the addresses so that they do not fault when executed in a Morello environment.
It may be interesting to observe the behavior of mixed Morello and non-Morello instructions in the same program. Classic “P”, “L”, “A” and “Q” annotations are receiving optional modifiers to reflect this. They can be suffixed with a lower-cased “c” to generate a Morello variation of the specified access.
For example:
“PodRW L Rfe A PodRW L Rfe A”
can be changed to:
“PodRW L Rfe Ac PodRW Lc Rfe A”
to replace memory accesses made by the second processing element to capability ones.
> diyone7 -arch AArch64 -variant morello PodRW L Rfe A PodRW L Rfe A AArch64 A { uint128_t y; uint128_t x; 0:X1=0xffffc0000:x:1; 0:X3=0xffffc0000:y:1; 1:X1=0xffffc0000:y:1; 1:X3=0xffffc0000:x:1; } P0 | P1 ; LDAR X0,[C1] | LDAR X0,[C1] ; MOV X2,#1 | MOV X2,#1 ; STLR X2,[C3] | STLR X2,[C3] ; exists (0:X0=1 /\ 1:X0=1) > diyone7 -arch AArch64 -variant morello PodRW L Rfe Ac PodRW Lc Rfe A AArch64 A { uint128_t y; uint128_t x; 0:X1=0xffffc0000:x:1; 0:X3=0xffffc0000:y:1; 1:X1=0xffffc0000:y:1; 1:X3=0xffffc0000:x:1; } P0 | P1 ; LDAR X0,[C1] | LDAR C0,[C1] ; MOV X2,#1 | GCVALUE X0,C0 ; STLR X2,[C3] | MOV X2,#1 ; | STLR C2,[C3] ; exists (0:X0=1 /\ 1:X0=1)
Notice the use of capability registers in P1 in the second example, and that “GCVALUE” is added to the load to extract the Capability Value field from the capability.
There are also new annotations that act on specific fields of a capability: “Ct” and “Cs”. Those annotations are the user-side of the banks concept developed in paragraph 3.5: Introducing new Banks. They correspond respectively to the “CapaTag” and “CapaSeal” banks. “Ct” to feature memory accesses modifying the Tag. “Cs” to feature memory accesses modifying the ObjectType. Note that because a capability tag is either 0 or 1, a “Ct” write can only be performed once per memory location.
> diyone7 -arch AArch64 -variant morello PodWWCtCt Rfe PodRRCtCt Fre AArch64 A { uint128_t y; uint128_t x; 0:X1=0xffffc0000:x:1; 0:X3=0xffffc0000:y:1; 1:X1=0xffffc0000:y:1; 1:X3=0xffffc0000:x:1; } P0 | P1 ; MOV X0,#1 | LDCT X0,[C1] ; STCT X0,[C1] | LDCT X2,[C3] ; MOV X2,#1 | ; STCT X2,[C3] | ; exists (1:X0=1 /\ 1:X2=0)
When no variant is selected, the diy tool creates cycles where the memory locations (or addresses) and the values are two completely separate entities. The registers containing memory locations are initialized once with the string representing that location (“x”, “y”, ...); their content does not change during the execution of the litmus test. On the other hand, values are numbers that are incremented for each write to the same memory location.
Morello offers a memory protection in the form of checks on every memory access. If those checks fail, this results in a capability fault (see 3.3 Capability memory protection). We handle this behavior in the tool.
But creating diy cycles that display a possibility of fault depending on the execution order requires to move away from the separation of values and memory locations.
When no variant is selected, the diy tool creates a dependency using an “exclusive or” operation. That way the address is unchanged, but the register value does depend on the previous load.
This can be produces with:
> diyone7 -arch AArch64 DpAddrdW Rfe PodRW Rfe
The Morello variant builds on the address dependency to introduce scenarios where a fault can happen. When used with one of the new “Ct” or “Cs” annotations, the dependency of DpAddr still exists but under a different form. In this context, the content of the capability used for the second memory access still depends on the value read during the first access. However, the difference between the value read and the expected value in the cycle is used to fill the specified capability field of the address: the tag when specifying “DpAddr(s|d)(R|W)Ct”, the object type when specifying “DpAddr(s|d)(R|W)Cs”.This can be observed in the tool with:
> diyone7 -arch AArch64 -variant morello DpAddrdWPCs Rfe PodRW Rfe
Produces the following test:
AArch64 A "DpAddrdWPCs RfeCsP PodRW Rfe" Generator=diyone7 (version 7.56+02~dev) Prefetch=0:x=F,0:y=W,1:y=F,1:x=W Com=Rf Rf Orig=DpAddrdWPCs RfeCsP PodRW Rfe { uint128_t y; uint128_t x; 0:X1=0xffffc0000:x:1; 0:X3=0xffffc0000:y:1; 1:X1=0xffffc0000:y:1; 1:X3=0xffffc0000:x:1; } P0 | P1 ; LDR X0,[X1] | LDR X0,[X1] ; SUBS X4,X0,#1 | MOV X2,#1 ; AND X4,X4,#4095 | STR X2,[X3] ; SCVALUE C4,C3,X4 | ; CSEAL C4,C3,C4 | ; MOV X5,#0 | ; MOV X6,#1 | ; SCVALUE C5,C3,X5 | ; SEAL C5,C5,C6 | ; STR C5,[X4] | ; exists (0:X0=1 /\ 1:X0=0) /\ ~fault(P0,y)
Because there are many more fields to consider as compared to, for example, a plain write, the creation of the capability requires several instructions.
For example,
> diyone7 -arch AArch64 -variant morello DpAddrdWPCs Rfe PodRWCsP Rfe
AArch64 A { uint128_t y; uint128_t x; 0:X1=0xffffc0000:x:1; 0:X3=0xffffc0000:y:1; 1:X1=0xffffc0000:y:1; 1:X3=0xffffc0000:x:1; } P0 | P1 ; LDR X0,[C1] | LDR C0,[C1] ; SUBS X4,X0,#1 | GCTYPE X0,C0 ; AND X4,X4,#4095 | MOV X2,#1 ; SCVALUE C4,C3,X4 | STR X2,[C3] ; CSEAL C4,C3,C4 | ; MOV X5,#0 | ; MOV X6,#1 | ; SCVALUE C5,C3,X5 | ; SEAL C5,C5,C6 | ; STR C5,[C4] | ; exists (0:X0=1 /\ 1:X0=1) /\ ~fault(P0,y)