Why Study Consistency Models in Distributed Storage Systems?

Shale Xiong
August 17, 2020
7 minute read time.

Cloud computing has become ubiquitous. A storage subsystem is a key component of many cloud computing infrastructures, and many systems have adopted so-called “NoSQL” databases, in which data is often organized in a key-value structure. The adoption of these databases is driven by the need to store unstructured data such as pictures, videos, and documents.

Similar to traditional single-machine relational databases, a transactional storage model is still the default in NoSQL databases. However, for availability, performance, and fault tolerance, storage may now be replicated across many servers distributed around the world. In certain situations, each replica is allowed to process transactions independently, without first synchronizing with the other replicas. This replicated model leads to weak consistency models - analogous to Arm’s weak memory consistency model [1, 2, 3] - such as parallel snapshot isolation and causal consistency.

Weak consistency permits observable behaviors that are forbidden by serializability, the usual behavior of a transactional database operating on a single machine. Despite extensive documentation detailing weak behaviors for programmers, working with these consistency models can be difficult, since most programmers are only familiar with transactions under serializability. Moreover, many well-known weak consistency models were originally defined by reference implementations, and implementations over different configurations may refer to the same model. This hinders understanding, makes comparing the different behavioral models more difficult, and creates a risk of inadvertent data loss.

In this early-stage research work with Andrea Cerone, Azalea Raad and Philippa Gardner, I propose an operational semantics - a precise, mathematical description of how a storage system should behave - that is parametric in the choice of the consistency model.

Our semantics

We use a multi-versioning abstract key-value store (kv-store) to model the overall state of a database. An example kv-store is given in Figure 1; it contains one key, k, with two versions carrying values 0 and 1 respectively. Each version is associated with metadata, namely the identifier of the transaction that wrote the version, for example t0, and the set of identifiers of transactions that have read the version, for example {t1}. This metadata is used when defining consistency models later. Formally, a kv-store is a mapping from keys to lists of versions, for example K(k) = [ (0, t0, {t1}), (1, t1, Ø) ].

Figure 1: An example kv-store
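
To make the structure concrete, here is a minimal sketch of such a kv-store in Python; the representation (tuples inside a dictionary) is purely illustrative and is not the formal encoding used in the paper.

# A multi-versioned kv-store as a mapping from keys to lists of versions,
# where each version is (value, writer, set of readers).
K = {
    "k": [
        (0, "t0", {"t1"}),   # value 0, written by t0, read by t1
        (1, "t1", set()),    # value 1, written by t1, not yet read
    ],
}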

In our semantics, we focus exclusively on client observations, that is, what somebody interacting with a storage system can observe.

Figure 2: An example of two clients having different views on a kv-store.

Each client has a partial view, v, on this abstract kv-store. For example, in Figure 2, client B is aware of both values of the kv-store, while client A is only able to observe the leftmost one. Formally, a view is a mapping from keys to sets of indices in the kv-store, for example vB(k) = { 0, 1 }.
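
Continuing the same illustrative style, a view can be sketched as a mapping from keys to sets of version indices; the small helper below is hypothetical and simply lists the values a client can observe.

# Views from Figure 2 as mappings from keys to sets of version indices.
K = {"k": [(0, "t0", {"t1"}), (1, "t1", set())]}
view_A = {"k": {0}}       # client A sees only the leftmost version
view_B = {"k": {0, 1}}    # client B sees both versions

def visible_values(K, view, key):
    """Values of `key` observable by a client holding `view`."""
    return [K[key][i][0] for i in sorted(view[key])]

print(visible_values(K, view_A, "k"))  # [0]
print(visible_values(K, view_B, "k"))  # [0, 1]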

Our kv-store abstraction hides the details of any particular storage implementation, making it applicable to a wide range of different storage technologies: for example, the two versions carrying values 0 and 1 in Figure 2 may in fact be hosted on different shards, or even duplicated among several replicas. In the latter case, clients A and B may connect to distinct replicas that experience a temporary network partition, or similar; this can be represented by different views on the abstract kv-store in our semantics.

Henceforth, we use an increment transaction on this kv-store, inc(k), as a running example to explain our semantics in intuitive terms. This is defined by:

transaction { x = read(k); write(k, x+1); }

Suppose, now, that both clients, A and B, invoke inc(k).

As a transaction is atomic, by definition, we assume that client B executes inc(k) to completion and collects its effects. Specifically, client B reads value 1 of key k (the latest value in client B’s view) and writes back 2; this can be represented as a set, FB = { read(k,1), write(k,2) }, which we call a fingerprint.
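
As a rough illustration, client B’s fingerprint for inc(k) could be computed from its view as follows; the tuple encoding of read and write entries is an assumption made here for readability, not the paper’s notation.

K = {"k": [(0, "t0", {"t1"}), (1, "t1", set())]}
view_B = {"k": {0, 1}}

def inc_fingerprint(K, view, key):
    """Fingerprint of inc(k) for a client with the given view (illustrative)."""
    latest = max(view[key])    # newest version index visible to this client
    x = K[key][latest][0]      # x = read(k)
    return {("read", key, x), ("write", key, x + 1)}

print(inc_fingerprint(K, view_B, "k"))
# {('read', 'k', 1), ('write', 'k', 2)}, that is, FB = { read(k,1), write(k,2) }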

The particular consistency model in use now determines whether client B, with view vB, is allowed to commit its fingerprint to the kv-store, in what we call the execution test. In this particular case, where view vB includes all existing versions of k, the execution test passes, yielding a new kv-store and a new view for client B. This is presented in Figure 3.

Figure 3: Client B commits increment transaction.

However, Figure 3 captures the fact that this update is not yet observable by client A, who still observes only the leftmost value.

Suppose now that client A also executes inc(k) given their view, and as a result, client A wants to commit the fingerprint, FA = { read(k,0), write(k,1) }, to the database. What happens now varies according to the consistency model in use. In causal consistency, client A’s fingerprint will pass the check and yield a new kv-store shown in Figure 4. Of note here is that the increment from client B is lost, as the final value of k is 1 instead of 3 after two increments. This is known as a lost update anomaly and may be undesirable, depending on the application.

Figure 4: Lost update anomaly
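
To see the anomaly end to end, here is a toy re-enactment of Figures 3 and 4. Both clients run inc(k), but client A commits against its stale view; the commit helper below deliberately skips the execution test, and the transaction identifiers t2 and t3 are made up for the example.

K = {"k": [(0, "t0", {"t1"}), (1, "t1", set())]}

def commit_inc(K, view, key, txn):
    """Apply inc(k) for a client with `view`, with no execution test (illustrative)."""
    latest = max(view[key])
    value, writer, readers = K[key][latest]
    readers.add(txn)                        # record the read in the version metadata
    K[key].append((value + 1, txn, set()))  # install the new version

commit_inc(K, {"k": {0, 1}}, "k", "t2")     # client B: reads 1, writes 2 (Figure 3)
commit_inc(K, {"k": {0}}, "k", "t3")        # client A: stale view, reads 0, writes 1
print(K["k"][-1][0])                        # 1, not 3: client B's increment is lost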

In contrast, under parallel snapshot isolation, where the lost update anomaly is forbidden, the execution test fails, and client A cannot commit their fingerprint to the database. Instead, client A must advance their view to include all values prior to executing inc(k). As a result, client A reads 2 and writes back 3.

What does an execution test look like? An execution test states that, prior to updating, the initial view, v, must be closed with respect to a relation, R, defined over the initial kv-store, K, and the fingerprint, F. Specifically, the relation R is a combination of other, more primitive relations, which we call WW, WR, RW, and SO. Here:

  • The Write-Read relation from t0 to t1, written (t0, t1) ∈ WR(k), means that t1 directly reads a version written by t0.
  • The Write-Write relation from t0 to t2, written (t0, t2) ∈ WW(k), means that t2 installs a new value of a key previously written by t0.
  • The Read-Write relation from t1 to t2, written (t1, t2) ∈ RW(k), means that t1 reads an old value of a key that is later overwritten by t2.
  • Last, the Session Order relation, SO, is encoded into the transaction identifiers.

Figure 5 gives a graphical intuition of the WW, WR, and RW relations on a kv-store:

Figure 5: WW, WR, and RW relations (values are omitted)
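
The per-key relations can be read off a kv-store’s metadata. The sketch below mirrors the graphical intuition of Figure 5; the exact formal definitions, and the way SO is recovered from transaction identifiers, are given in the paper.

def relations(K, key):
    """WW, WR, and RW for a single key, as sets of (t, t') pairs (illustrative)."""
    versions = K[key]             # list of (value, writer, readers), oldest first
    WW, WR, RW = set(), set(), set()
    for i, (_, wi, readers_i) in enumerate(versions):
        for r in readers_i:
            WR.add((wi, r))       # r reads the version written by wi
        for j in range(i + 1, len(versions)):
            wj = versions[j][1]
            WW.add((wi, wj))      # wj installs a newer version than wi
            for r in readers_i:
                if r != wj:
                    RW.add((r, wj))   # r read a version that wj later overwrote
    return WW, WR, RW

# The kv-store of Figure 5 (values omitted there): t0 writes, t1 reads, t2 overwrites.
K = {"k": [(0, "t0", {"t1"}), (1, "t2", set())]}
print(relations(K, "k"))   # ({('t0', 't2')}, {('t0', 't1')}, {('t1', 't2')})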

In some consistency models, the fingerprint is used to project the general relation on a kv-store onto specific keys. We now explain the notion of a view being closed with respect to a relation R. Let us take Figure 5 and R = WW as an example. A view, v, being closed with respect to WW can be described as follows:

  1. Suppose a transaction t is the writer of a value included in view v, for example transaction t2 in Figure 5.
  2. If another transaction, t0, is related to t in WW, that is (t0, t2) ∈ WW(k), then any value written by transaction t0 must also be included in view v.

In the case of Figure 5, this means that if a view includes the second version of key k, then it must also include the first version of k under the relation R = WW. Note that a transaction may update several keys. This means that an edge on one key, for example (t0, t2) ∈ WW(k), may affect the view on a different key.

Given this, we can now define the execution tests for consistency models, for example causal consistency. Causal consistency (CC) is captured by the relation RCC = WR ∪ SO. Consider client A’s view, vA, in Figure 3, and let us apply the two conditions above. View vA contains (indicated by the arrow) only one value, written by t0 (condition 1), and there is no other transaction, t, such that (t, t0) ∈ RCC (condition 2). Hence, this view is immediately closed with respect to RCC = WR ∪ SO, and the execution test for causal consistency passes.
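
The closure check itself can be sketched as below. This is a simplification made for illustration: it only considers direct R-predecessors, and it ignores both the fingerprint projection and the transitive closure handled by the formal definition.

def closed(K, view, R):
    """Is `view` closed with respect to relation R (a set of (t, t') pairs)? Simplified."""
    for key, indices in view.items():
        visible_writers = {K[key][i][1] for i in indices}
        for i, (_, writer, _) in enumerate(K[key]):
            # If this version's writer R-precedes a writer already visible in the view,
            # then this version must also be visible.
            if any((writer, w) in R for w in visible_writers) and i not in indices:
                return False
    return True

# The kv-store of Figure 3, with client B's committing transaction called "t2" here.
K = {"k": [(0, "t0", {"t1"}), (1, "t1", {"t2"}), (2, "t2", set())]}
view_A = {"k": {0}}
R_CC = {("t0", "t1"), ("t1", "t2")}   # WR edges; SO is empty in this small example
print(closed(K, view_A, R_CC))        # True: nothing R-precedes t0, so client A may commit

Under parallel snapshot isolation, as described above, this stale view fails the stricter execution test, so client A must first advance their view to include every version of k.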

Given these formal semantics, programmers can derive the behaviors of their code executing under weak consistency models, and database implementers can check whether their implementations are correct. We have shown a few examples of these two tasks via hand-written proofs. In future work, we plan to develop tools to work with real-world examples.

Read more

A conference paper describing this work will be presented at ECOOP 2020, and the pre-print can be found here (distributed under Creative Commons License CC-BY 3.0, ECOOP 2020). It provides full details of our semantics and execution tests for different consistency models, including parallel snapshot isolation, consistent prefix, snapshot isolation, and serializability. If you have any questions about the work, please reach out to me.

Contact Shale Xiong

Read the Full Paper

If you are interested in hearing more from our security team at Arm Research, look at our other blogs, ranging from securing home IoT devices to hardware side-channel mitigations.

References

[1] Memory Model Tool

[2] A working example of how to use the herd7 Memory Model Tool  

[3] How to generate litmus tests automatically with the diy7 tool 
