By the Veracruz Development Team: Basma El Gaabouri, Christopher Haster, Derek Miller, Dominic Mulligan, Guilhem Bryant, Nick Spinale, Hugo Vincent, Shale Xiong.
Data exists in one of three modes: in transit, at rest, and in use. Today, we understand how to protect data in transit, that is, when it is being sent from computer to computer. This protection is achieved using protocols like the Transport Layer Security protocol (TLS), which is commonly deployed in web browsers to protect the confidentiality and integrity of our internet traffic. Likewise, we generally understand how to protect data at rest, that is, when it is stored on a computer’s disk or similar, using standardized block ciphers like the Advanced Encryption Standard (AES) and the full-disk encryption tools built around them.
However, how to protect data in use, that is, when data is being fed as input into a computation, potentially in a collaborative setting, is not well understood. Cryptographers have made great strides in developing techniques for protecting data in use, for example Fully Homomorphic Encryption schemes and protocols for effecting Secure Multiparty Computations. But the unfortunate truth is that, barring exceptional cases, these techniques are not widely deployed. There are many reasons for this, but chief among them is that Advanced Cryptographic techniques are slow, hard to use, and even harder to understand. What is more, these techniques tend to be brittle, requiring significant reconfiguration whenever the underlying computation changes.
Strong Isolation Technologies offer a potentially interesting, and pragmatic, alternative to pure cryptography for protecting data in use. Here, we use the phrase Strong Isolation Technology to denote a range of hardware-based and high-assurance software-based isolates. These isolates provide strong confidentiality and integrity guarantees to the software running within them, even in the face of a privileged attacker (for example, an attacker able to wield the capabilities of the Operating System or Hypervisor). Strong Isolation Technologies are also typically accompanied by a remote attestation procedure, which allows a third party on a potentially remote machine to reliably check the authenticity of an isolate and the integrity of the software loaded within it. Together, remote attestation and the confidentiality and integrity guarantees of isolates allow a third party to establish an execution environment in a known good state, safe from prying eyes or interference, on somebody else’s machine.
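To make the shape of a remote attestation check more concrete, here is a minimal Rust sketch of the checks a challenger might perform on an attestation report. It is not Veracruz's API: the `AttestationReport` type, its fields, and the `SignatureCheck` trait are hypothetical. Real schemes define their own evidence formats and verify signatures against a certificate chain rooted in the hardware vendor, which is abstracted behind the trait here.

```rust
/// Hypothetical attestation evidence produced by an isolate.
/// Real evidence formats differ per isolation technology.
pub struct AttestationReport {
    /// Measurement (e.g. a hash) of the software loaded into the isolate.
    pub measurement: [u8; 32],
    /// Nonce chosen by the challenger and echoed back, proving freshness.
    pub nonce: [u8; 16],
    /// Signature over the report by a hardware-rooted key.
    pub signature: Vec<u8>,
}

/// Hypothetical abstraction over signature verification, which in practice
/// means checking the signature against a vendor-rooted certificate chain.
pub trait SignatureCheck {
    fn is_valid(&self, report: &AttestationReport) -> bool;
}

#[derive(Debug, PartialEq)]
pub enum AttestationError {
    BadSignature,
    StaleNonce,
    UnexpectedMeasurement,
}

/// Accepts the isolate only if the report is authentic, fresh, and shows
/// that exactly the expected software was loaded.
pub fn check_attestation<V: SignatureCheck>(
    report: &AttestationReport,
    challenge_nonce: &[u8; 16],
    expected_measurement: &[u8; 32],
    verifier: &V,
) -> Result<(), AttestationError> {
    // 1. Is the report genuinely from an isolate?
    if !verifier.is_valid(report) {
        return Err(AttestationError::BadSignature);
    }
    // 2. Is it a response to *our* challenge, not a replayed old report?
    if &report.nonce != challenge_nonce {
        return Err(AttestationError::StaleNonce);
    }
    // 3. Is the software inside the isolate the software we expect?
    if &report.measurement != expected_measurement {
        return Err(AttestationError::UnexpectedMeasurement);
    }
    Ok(())
}
```

The nonce check guards against replay of an old report, while the measurement comparison is what tells the challenger that the correct software, and not merely some software, is running inside the isolate.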
Veracruz is an Arm Research project exploring how novel, data-intensive distributed systems can be built using Strong Isolation Technologies and remote attestation.
Veracruz allows programmers to quickly (and easily!) design collaborative, privacy-preserving computations amongst a group of mutually mistrusting individuals, using Strong Isolation Technologies as a shared “neutral ground” within which a collaborative computation takes place. Participants in a Veracruz computation first authenticate the isolate and its contents using remote attestation, then use standard Transport Layer Security to feed their secrets directly into the isolate. Veracruz harnesses a range of Strong Isolation Technologies (including Arm TrustZone, AWS Nitro Enclaves, Intel SGX Secure Enclaves, and the seL4 high-assurance hypervisor) as a mechanism by which groups of collaborators can securely pool their data without necessarily revealing it to each other, or to anybody else. Once pooled inside an isolate, this data is fed as input to a program, and the result is made retrievable by the principals named in a global policy file.
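The role of the global policy file can be illustrated with a small Rust sketch. The `Policy` and `Role` types below are hypothetical stand-ins, not Veracruz's actual policy format; they only show the idea that every participant is assigned explicit roles up front, and that the runtime consults those roles before accepting inputs or releasing results.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical roles a principal may hold in a collaborative computation.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Role {
    ProgramProvider,
    DataProvider,
    ResultReader,
}

/// Hypothetical in-memory form of a global policy: principal -> granted roles.
pub struct Policy {
    roles: HashMap<String, HashSet<Role>>,
}

impl Policy {
    pub fn new() -> Self {
        Policy { roles: HashMap::new() }
    }

    /// Grants `role` to `principal`.
    pub fn grant(&mut self, principal: &str, role: Role) {
        self.roles
            .entry(principal.to_string())
            .or_default()
            .insert(role);
    }

    /// Returns true only if the policy explicitly grants `role` to
    /// `principal`; anyone not named in the policy is denied by default.
    pub fn permits(&self, principal: &str, role: Role) -> bool {
        self.roles
            .get(principal)
            .map_or(false, |granted| granted.contains(&role))
    }
}
```

Denying by default means that the host of the isolate, who appears nowhere in the policy, can neither provision inputs nor retrieve the result.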
Whilst Veracruz aims to provide strong security and privacy guarantees to principals engaging in collaborative computation, our guarantees are naturally not as strong as those offered by Advanced Cryptography. On the other hand, Veracruz is more efficient, easier to deploy and configure, and much easier to explain than pure cryptography.
Note that Veracruz can be used to effect several interesting privacy-preserving collaborative computations, including:
Let us focus on one of the use cases mentioned above, privacy-preserving machine learning (ML), and describe how Veracruz can be used to design a distributed computation that allows Alice and Bob, representatives of two competing companies, to collaborate in a limited way.
Specifically, Alice and Bob want to pool their private customer click-through data to derive a more effective ML-based customer recommendation system than either could hope to achieve separately. Importantly, neither wishes to divulge their data set to the other, or to anybody else: the only thing that should be divulged by the computation, and only to Alice and Bob, is the ML model learnt from their pooled data sets.
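As a toy illustration of “only the model is divulged”, the Rust sketch below pools two click-through data sets and returns nothing but aggregate per-item click-through rates. The `Click` type, the `train_on_pooled_data` function, and the use of click-through rate as the “model” are hypothetical simplifications; a real recommendation system would train a far richer model, but the shape is the same: raw records go in, and only the derived model comes out.

```rust
use std::collections::BTreeMap;

/// One hypothetical click-through record: a user was shown an item
/// and either clicked it or did not.
pub struct Click {
    pub user_id: u64,
    pub item_id: u64,
    pub clicked: bool,
}

/// A deliberately simple "model": the click-through rate of each item.
pub type CtrModel = BTreeMap<u64, f64>;

/// Pools both parties' records and returns only the per-item
/// click-through rates; individual records never leave this function.
pub fn train_on_pooled_data(alice: &[Click], bob: &[Click]) -> CtrModel {
    // item_id -> (times shown, times clicked)
    let mut counts: BTreeMap<u64, (u64, u64)> = BTreeMap::new();
    for record in alice.iter().chain(bob.iter()) {
        let entry = counts.entry(record.item_id).or_insert((0, 0));
        entry.0 += 1;
        if record.clicked {
            entry.1 += 1;
        }
    }
    // Every item in `counts` was shown at least once, so `shown` is never 0.
    counts
        .into_iter()
        .map(|(item, (shown, clicked))| (item, clicked as f64 / shown as f64))
        .collect()
}
```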
Figure 1: Alice and Bob securely provision their data, and their agreed algorithm, into the Veracruz runtime after using remote attestation to verify the authenticity of the runtime and of the isolate containing it. Once the computation is complete, Alice and Bob (and nobody else!) get access to the learnt ML model.
To achieve this, Alice and Bob first agree on the ML algorithm to use, its parameters, and the format in which their data sets must be stored. One of them, say Alice, implements this algorithm (in Rust, for example) and shares it with Bob for vetting. Once Bob is happy with the algorithm, the two start an isolate on a host machine and load the Veracruz runtime into it. Alice and Bob then each use a remote attestation procedure to check that the isolate has indeed been started, and that the correct Veracruz runtime has been loaded within it.
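Agreeing on a data format up front lets the program reject malformed inputs rather than silently training on them. The Rust sketch below assumes, purely for illustration, that Alice and Bob settle on one `user_id,item_id,clicked` line per record, with `clicked` being `0` or `1`; both the format and the `parse_dataset` function are hypothetical.

```rust
/// One click-through record in the (hypothetical) agreed format.
#[derive(Debug, PartialEq)]
pub struct Click {
    pub user_id: u64,
    pub item_id: u64,
    pub clicked: bool,
}

/// Parses a data set in the agreed `user_id,item_id,clicked` format,
/// reporting the first malformed line (1-indexed) on failure.
pub fn parse_dataset(text: &str) -> Result<Vec<Click>, String> {
    let mut records = Vec::new();
    for (index, line) in text.lines().enumerate() {
        let line_no = index + 1;
        if line.trim().is_empty() {
            continue; // tolerate blank lines
        }
        let fields: Vec<&str> = line.split(',').map(str::trim).collect();
        if fields.len() != 3 {
            return Err(format!("line {line_no}: expected 3 fields, found {}", fields.len()));
        }
        let user_id = fields[0]
            .parse::<u64>()
            .map_err(|e| format!("line {line_no}: bad user_id: {e}"))?;
        let item_id = fields[1]
            .parse::<u64>()
            .map_err(|e| format!("line {line_no}: bad item_id: {e}"))?;
        let clicked = match fields[2] {
            "0" => false,
            "1" => true,
            other => {
                return Err(format!("line {line_no}: clicked must be 0 or 1, got {other:?}"))
            }
        };
        records.push(Click { user_id, item_id, clicked });
    }
    Ok(records)
}
```

Reporting the offending line number gives whichever party supplied a bad data set a concrete pointer to fix, without the error revealing anything beyond their own input.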
Once the isolate and its software have been authenticated using remote attestation, Alice and Bob know that the isolate is genuine and contains the software that they expect. Accordingly, the two make a secure TLS connection to the isolate itself and provision their data sets and the ML algorithm into it. Note that whoever controls the host machine can neither see inside nor influence the behavior of the isolate, nor break the encryption of the TLS connections used to provision the data sets and algorithm. At this point, the data sets and the algorithm are in one place, without Alice, Bob, or the host of the computation having learnt anything that they should not have. All that remains is for the computation to run and produce a result, which is then made retrievable by both Alice and Bob, again over a secure TLS link.
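The provisioning flow described above can be sketched as a small state machine in Rust. The `Session` type, its methods, and the rule that the computation triggers automatically once the program and every expected data set have arrived are assumptions made for illustration, not Veracruz's actual interface; attestation and TLS are taken to have already happened before any of these calls.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical lifecycle of one collaborative computation.
pub struct Session {
    expected_data_providers: HashSet<String>,
    result_readers: HashSet<String>,
    program: Option<fn(&[Vec<u8>]) -> Vec<u8>>,
    data: HashMap<String, Vec<u8>>,
    result: Option<Vec<u8>>,
}

impl Session {
    pub fn new(data_providers: &[&str], result_readers: &[&str]) -> Self {
        Session {
            expected_data_providers: data_providers.iter().map(|s| s.to_string()).collect(),
            result_readers: result_readers.iter().map(|s| s.to_string()).collect(),
            program: None,
            data: HashMap::new(),
            result: None,
        }
    }

    pub fn provision_program(&mut self, program: fn(&[Vec<u8>]) -> Vec<u8>) {
        self.program = Some(program);
        self.try_run();
    }

    /// Accepts a data set only from a principal expected to supply one.
    pub fn provision_data(&mut self, principal: &str, bytes: Vec<u8>) -> Result<(), String> {
        if !self.expected_data_providers.contains(principal) {
            return Err(format!("{principal} may not provision data"));
        }
        self.data.insert(principal.to_string(), bytes);
        self.try_run();
        Ok(())
    }

    /// Runs the program once, as soon as it and all expected inputs are present.
    fn try_run(&mut self) {
        if self.result.is_some() || self.data.len() != self.expected_data_providers.len() {
            return;
        }
        if let Some(program) = self.program {
            // Sort by provider name so the input order is deterministic.
            let mut providers: Vec<&String> = self.data.keys().collect();
            providers.sort();
            let inputs: Vec<Vec<u8>> = providers.iter().map(|p| self.data[*p].clone()).collect();
            self.result = Some(program(&inputs));
        }
    }

    /// Releases the result only to principals listed as result readers.
    pub fn retrieve_result(&self, principal: &str) -> Option<&[u8]> {
        if self.result_readers.contains(principal) {
            self.result.as_deref()
        } else {
            None
        }
    }
}
```

Keying inputs by principal, rather than appending them in arrival order, means a participant who provisions twice before the run replaces their own data set instead of adding a second one.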
Veracruz is now an open source project, with all design and development discussion taking place in public. Moreover, Veracruz was recently adopted as a project by the Confidential Computing Consortium (CCC), an industry-led Linux Foundation consortium that aims to promote hardware-based confidential computing technologies. The Veracruz team welcomes contributions from interested third parties, and we have listed several issues in our GitHub issue tracker that are suitable for newcomers to the project.
To find out more about the project, and how to contribute, you can consult the following resources:
Veracruz on GitHub
Questions? Contact Dominic Mulligan