The year 2018 will be remembered in computing history for the discovery of the Spectre and Meltdown exploits. This was underlined by a great shift in interest within the academic community, and by the way industry course-corrected its approach to securing designs. A year on, we find ourselves with a total of 13 Spectre and 14 Meltdown variants, a strong indication of how pervasively the underlying vulnerabilities have woven themselves into modern computer systems.
All of the known exploits belonging to the Spectre and Meltdown families as of January 2019. Graph data: Canella et al.
There are several reasons why last year's attacks were reported so widely. Spectre and Meltdown both neatly demonstrate remotely deployable vulnerabilities to which many internet-facing systems are subject. They also revealed an area that processor designers, at least in retrospect, had not given enough thought to: unarchitected behavior. The two exploits highlighted that non-architectural performance optimizations can have a programmer-visible impact, for example through timing, and can therefore reveal details of the internal state of the system. This creates loopholes that attackers can exploit by operating outside the documented architectural envelope.
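To make that timing angle concrete, the short C sketch below measures the latency of a single load before and after its cache line is flushed on an x86 machine. It is an illustrative example of my own, not taken from any of the published attacks: the intrinsics, the measurement setup and the exact numbers are assumptions of the sketch, and results will vary between cores. The point is simply that microarchitectural state, which the ISA says nothing about, is visible to software through the clock.

```c
/*
 * Illustrative sketch only (not from any published attack): timing a load
 * before and after flushing its cache line shows that unarchitected,
 * microarchitectural state is programmer-visible through timing.
 * Build with GCC or Clang on an x86 machine, e.g. gcc -O0 timing_demo.c
 */
#include <stdio.h>
#include <stdint.h>
#include <x86intrin.h>   /* _mm_clflush, _mm_lfence, __rdtscp */

static uint8_t probe[64];

/* Time one load of *addr in timestamp-counter units. */
static uint64_t time_load(volatile uint8_t *addr)
{
    unsigned aux;
    _mm_lfence();                    /* keep the timed region ordered */
    uint64_t start = __rdtscp(&aux);
    (void)*addr;                     /* the load being measured */
    uint64_t end = __rdtscp(&aux);
    _mm_lfence();
    return end - start;
}

int main(void)
{
    volatile uint8_t *line = probe;

    (void)*line;                     /* warm the line: now cached */
    uint64_t hit = time_load(line);

    _mm_clflush(probe);              /* evict the line from the caches */
    _mm_lfence();
    uint64_t miss = time_load(line);

    /* On most parts the uncached access is noticeably slower, so an
     * attacker who can influence what a victim brings into the cache can
     * infer secrets from timings alone. */
    printf("cached access: %llu cycles, uncached access: %llu cycles\n",
           (unsigned long long)hit, (unsigned long long)miss);
    return 0;
}
```

Running it a few times typically shows the flushed access costing several times more than the cached one, which is the whole side channel in miniature.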
Even though there are probably other security vulnerabilities that are easier to exploit today, side-channel attacks are bound to become more prominent in the future and therefore require our immediate attention. Undocumented architectural corner cases can no longer be discounted by the computing community. Rather than treating hardware security as an optional extra, it should be considered at least as important and as high a priority as performance. So, a natural question arises...
Like most computer architects, I have predominantly focused on "making things go faster". The challenge of addressing both performance and security concerns is a huge and exciting opportunity that will directly benefit billions of users in a world of increasingly connected devices.
Our recent work tries to tackle part of this, with techniques that secure branch prediction against some of these vulnerabilities. The motivation behind this work is based on a simple premise: what if, instead of designing purely for performance, secure operation were viewed as a necessary constraint?
Designing in security from the outset is also the key premise of the Arm Platform Security Architecture (PSA), a common framework for securing connected devices throughout the IoT ecosystem.
In our paper, we discuss how software patches cannot address all of these vulnerabilities, and even where they can, they tend to result in unwieldy and fragile solutions. Most processors today rely heavily on structures that are shared between different contexts (programs, threads and privilege levels) to deliver high performance. This sharing makes information leakage between contexts possible, and it is precisely what the recent side-channel attacks exploit.
Hardware mitigation techniques, on the other hand, seem like the preferred option. However, designers should strive to make them as universal as possible. If the approach is simply to fill in the cracks that are visible today, chances are we will spend an increasing amount of time filling new cracks in the future. A more elegant approach is to reason about the problem from first principles: while side-channel attacks vary greatly, they are all based on leaking information across a boundary, so isolation should be the basis for any secure design. Unfortunately, the reality is never that simple. Because most processors rely heavily on shared structures to deliver high performance, flushing the branch predictor state at every context switch to guarantee isolation can lead to a significant performance drop, in some cases measured as high as 40%.
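As a rough illustration of what that blunt form of isolation looks like, here is a conceptual sketch of a context-switch path that simply discards all predictor state before the next context runs. The flush_branch_predictor() call is a hypothetical placeholder for whatever invalidation or barrier mechanism the hardware provides, not a real API, and the surrounding code is deliberately simplified.

```c
/*
 * Conceptual sketch, not kernel code: the blunt mitigation is to destroy
 * all branch-predictor state whenever the CPU moves to a new context, so
 * one context can never observe or steer predictions trained by another.
 */
#include <stdio.h>

struct context { int id; /* ... registers, address space, etc. ... */ };

/* Hypothetical primitive: stands in for a hardware predictor
 * invalidation or barrier operation; here it only prints. */
static void flush_branch_predictor(void)
{
    printf("  [all predictor state discarded]\n");
}

static void switch_to(struct context *next)
{
    /* Isolation by flushing: every switch throws away the direction,
     * target and history state the outgoing context trained, so the
     * incoming context starts cold and mispredicts heavily until it
     * re-trains. That re-training cost is where the performance drop,
     * measured as high as 40% in some cases, comes from. */
    flush_branch_predictor();
    printf("now running context %d (predictor cold)\n", next->id);
}

int main(void)
{
    struct context a = { .id = 1 }, b = { .id = 2 };
    switch_to(&a);
    switch_to(&b);   /* b cannot reuse or probe a's predictor entries */
    return 0;
}
```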
Our investigation focuses on securing the branch predictor by retaining some minimal state per context while flushing the rest of the predictor. This ensures isolation between contexts and limits the negative effects on performance and area. The figure above shows that retaining minimal vital state (red line) improves branch predictor accuracy when compared to a system that frequently flushes the state to secure context switches (orange region). Overall, this approach can recover as much as 25% of the lost performance, with minimal to no added area cost.
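To give a flavour of the idea, here is a toy software model of a predictor that wipes a large shared table on every context switch but keeps a tiny per-context slice, so a returning context does not start completely cold. The sizes, the indexing and the choice of what counts as vital state are illustrative simplifications of my own, not the design evaluated in the paper.

```c
/*
 * Toy model of partial state retention: a large shared table of 2-bit
 * counters is flushed on every context switch (for isolation), while a
 * tiny per-context "vital" table survives. Sizes and indexing are
 * illustrative choices, not the paper's actual design.
 */
#include <stdint.h>
#include <string.h>
#include <stdio.h>

#define SHARED_ENTRIES 4096   /* large shared structure, flushed on switch */
#define VITAL_ENTRIES    16   /* small per-context slice that is retained */
#define MAX_CONTEXTS      8

static uint8_t shared[SHARED_ENTRIES];             /* 2-bit counters (0..3) */
static uint8_t vital[MAX_CONTEXTS][VITAL_ENTRIES]; /* retained per context */
static int current_ctx;

/* Predict taken if either the retained per-context counter or the shared
 * counter currently leans taken (counter value 2 or 3). */
static int predict(uint64_t pc)
{
    uint8_t v = vital[current_ctx][pc % VITAL_ENTRIES];
    uint8_t s = shared[pc % SHARED_ENTRIES];
    return (v >= 2) || (s >= 2);
}

/* Train both tables with the actual outcome (1 = taken). */
static void update(uint64_t pc, int taken)
{
    uint8_t *v = &vital[current_ctx][pc % VITAL_ENTRIES];
    uint8_t *s = &shared[pc % SHARED_ENTRIES];
    if (taken) { if (*v < 3) (*v)++; if (*s < 3) (*s)++; }
    else       { if (*v > 0) (*v)--; if (*s > 0) (*s)--; }
}

/* Context switch: the shared table is wiped, so nothing trained by the
 * outgoing context is observable by the incoming one, but each context
 * keeps its own small vital slice and does not restart fully cold. */
static void context_switch(int next_ctx)
{
    memset(shared, 0, sizeof shared);
    current_ctx = next_ctx;
}

int main(void)
{
    /* Context 0 trains a strongly-taken branch, we switch away and back,
     * and the retained slice still predicts it correctly. */
    for (int i = 0; i < 8; i++) update(0x400123, 1);
    context_switch(1);
    context_switch(0);
    printf("prediction after switches: %s\n",
           predict(0x400123) ? "taken" : "not taken");
    return 0;
}
```

The trade-off the model is meant to show: the expensive-to-rebuild shared state never crosses a context boundary, while the few retained counters belong only to their own context, which is how isolation and a warm start can coexist.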
Whilst partial state retention cannot be considered a silver bullet, it is a reasonable starting point that allows future systems to establish some notion of ground truth. It provides a tangible solution that exemplifies how isolation can coexist with performance optimizations, in a way that flushing or hard partitioning cannot. This only scratches the surface of the problem, and new issues, much harder to tackle, will surface in the future. As an aspiring researcher, I am hugely excited by this new challenge: developing future architectures that are both secure and performant, and that will make a difference to people's connected lives.
Read our paper to find out how we identify the smallest part of the branch predictor to partition, and for more detailed performance breakdowns.
Read the paper