Arm's CE-Software team is delighted to announce the release of Chromium M115, with experimental support for Arm’s Memory Tagging Extension (MTE). M115 represents over five years of work by Arm's engineering teams to enable this breakthrough technology, bringing better spatial and temporal memory safety to existing C and C++ codebases. Approximately 70% of Chromium’s serious security bugs are related to memory safety problems, so enabling MTE experimentally is an important first step for end-user security.
C and C++ extensively rely on manual memory management. Programmers request memory via the malloc, calloc, or C++ new APIs, use it to do interesting computations, and may eventually hand it back to the allocator with free. For simple programs, this approach works, but Chromium is complex. Thousands of changes land per day, with hundreds of people collaborating on refactoring and improving it. One person cannot know all of Chromium’s design or the lifecycle of all its C++ objects. Mistakes happen, and innocuous changes can accidentally introduce security issues.
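There are two main kinds of memory safety issues: spatial issues, where code accesses memory outside the bounds of an allocation (for example, a buffer overflow), and temporal issues, where code accesses memory after it has been freed (for example, a use-after-free).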
Both attack vectors can be the first step in an exploit chain, eventually granting control over the Chromium web content renderer process and, perhaps, the rest of the system.
Finding vulnerabilities before they reach most users is critically important for Chromium. Chromium makes heavy use of custom technologies like its managed C++ garbage collector (cppgc), guard pages (GWP-ASan) and type separation in its memory allocator (PartitionAlloc), as well as traditional technologies like Clang's AddressSanitizer (ASan). MTE adds a new, powerful, defense-in-depth tool to help detect memory safety issues earlier in development. With early detection, fixes cost less to deploy, and security exploits become more difficult (and expensive) to develop.
Under Clang's AddressSanitizer, each 8-byte granule of memory given out by malloc is also backed by a byte of metadata. ASan's extra compiler pass rewrites standard memory loads and stores to check the metadata associated with an address. Accesses to non-addressable memory are caught and reported by code in the compiler runtime (compiler-rt). Deployed in systems like Google's ClusterFuzz, this technique finds most memory safety issues today.
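The shadow-metadata check described above can be modelled in a few lines. This is a minimal sketch, not compiler-rt's implementation: `ToyShadow`, `kPoisoned`, and the 256-byte arena are hypothetical names and sizes chosen for illustration. It keeps one shadow byte per 8-byte granule, where 0 means the whole granule is addressable, a value from 1 to 7 means only that many leading bytes are, and a negative value means none are.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kGranule = 8;      // bytes described by one shadow byte
constexpr std::size_t kArenaSize = 256;  // hypothetical toy arena size
constexpr std::int8_t kPoisoned = -1;    // hypothetical "not addressable" marker

struct ToyShadow {
  std::array<std::int8_t, kArenaSize / kGranule> shadow;

  ToyShadow() { shadow.fill(kPoisoned); }

  // Mark [offset, offset + size) addressable, as an allocator would on malloc.
  void Unpoison(std::size_t offset, std::size_t size) {
    assert(offset % kGranule == 0 && offset + size <= kArenaSize);
    std::size_t g = offset / kGranule;
    for (; size >= kGranule; size -= kGranule) shadow[g++] = 0;
    if (size > 0) shadow[g] = static_cast<std::int8_t>(size);  // partial granule
  }

  // Mark every granule touched by [offset, offset + size) non-addressable,
  // as an allocator would on free.
  void Poison(std::size_t offset, std::size_t size) {
    assert(offset % kGranule == 0 && offset + size <= kArenaSize);
    const std::size_t end = (offset + size + kGranule - 1) / kGranule;
    for (std::size_t g = offset / kGranule; g < end; ++g) shadow[g] = kPoisoned;
  }

  // The check that instrumentation conceptually runs before a 1-byte access.
  bool IsAddressable(std::size_t offset) const {
    if (offset >= kArenaSize) return false;
    const std::int8_t s = shadow[offset / kGranule];
    if (s == 0) return true;   // whole granule is addressable
    if (s < 0) return false;   // whole granule is poisoned
    return offset % kGranule < static_cast<std::size_t>(s);
  }
};
```

In this model, every access costs an extra shadow load plus a comparison and branch, which is the overhead the real instrumentation has to pay on each load and store.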
However, ASan is too expensive to use as a mitigation. Most single loads and stores become two loads and stores, and extra comparisons and branching can significantly slow memory-intensive workloads. Another challenge is that ASan's extra compiler pass only works on C and C++ code, meaning any low-level assembly code will not be covered. This could potentially exclude sensitive areas like audio and video codecs from protection. ASan also covers only the standard malloc and free APIs, so apps that implement custom allocators can still have bugs that ASan cannot detect. HWASan is a faster, hardware-backed alternative, but it only implements a subset of ASan's detection capabilities and has similar drawbacks.
MTE works like ASan, but memory loads and stores to tagged memory are now checked by the Arm architecture. While MTE only offers probabilistic detection, it is also far more flexible. Any one of 16 tags can be assigned to any 16-byte granule, allowing custom allocators to make different choices about using MTE to meet their design goals. MTE can also be turned on or off at runtime with a choice of three modes (synchronous, asymmetric, asynchronous) offering a trade-off of performance versus precision. Even in the debug-friendly synchronous mode, MTE is much faster than AddressSanitizer and has much lower memory overhead. MTE’s major advantages mean that we can add ASan-style checking to Chromium’s production memory allocator (PartitionAlloc) and switch it off and on dynamically. MTE’s capabilities also extend to additional system software (such as the Linux kernel) that the traditional AddressSanitizer cannot reach.
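To make the tag-matching idea concrete, here is a small software model of it. This is only a sketch under stated assumptions: real MTE performs the comparison in hardware, and `ToyTaggedHeap`, `TagRange`, `AccessOk`, and the 256-byte heap are hypothetical names and sizes invented for this illustration. The model stores one 4-bit tag per 16-byte granule and places the pointer's tag in address bits 59:56, which is where the architecture keeps it.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kTagGranule = 16;  // bytes covered by one allocation tag
constexpr std::size_t kHeapSize = 256;   // hypothetical toy heap size
constexpr unsigned kTagShift = 56;       // logical tag lives in address bits 59:56

struct ToyTaggedHeap {
  std::array<std::uint8_t, kHeapSize / kTagGranule> tags{};  // all start at tag 0

  // Tag every granule touched by [offset, offset + size) and return a
  // simulated pointer that carries the same tag in its top bits.
  std::uint64_t TagRange(std::uint64_t offset, std::size_t size, std::uint8_t tag) {
    assert(offset % kTagGranule == 0 && offset + size <= kHeapSize && tag < 16);
    const std::size_t end = (offset + size + kTagGranule - 1) / kTagGranule;
    for (std::size_t g = offset / kTagGranule; g < end; ++g) tags[g] = tag;
    return offset | (static_cast<std::uint64_t>(tag) << kTagShift);
  }

  // The comparison the hardware conceptually makes on each load or store.
  bool AccessOk(std::uint64_t ptr) const {
    const std::uint64_t offset = ptr & ((std::uint64_t{1} << kTagShift) - 1);
    const std::uint8_t ptr_tag = (ptr >> kTagShift) & 0xF;
    if (offset >= kHeapSize) return false;
    return tags[offset / kTagGranule] == ptr_tag;
  }
};
```

For example, after `auto p = heap.TagRange(32, 20, 0x5)`, `heap.AccessOk(p + 19)` holds and `heap.AccessOk(p + 32)` does not, because the next granule still carries tag 0. Retagging the range with a different value on free makes the stale pointer `p` fail the check. If an allocator happened to reuse the same tag, the stale access would pass, which is why detection with only 16 tags is probabilistic.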
Arm co-develops its architecture with its partners, most of whom have big C and C++ codebases. It’s up to Arm’s Architecture and Technology Group (ATG) to combine their ideas to create something with enough functionality and performance for everyone, which can also be feasibly made into silicon. After ATG created a first, alpha specification in 2018, Arm's software teams got started adding support to vital components like Clang and the Linux kernel, and our CPU Engineering team got to work adding it to our first-generation Armv9 parts. The specification reached near-final beta status in April 2018, and the Android team began adding support to jemalloc, bionic, and all the other components. Back then, we mostly focussed on figuring out any software design changes and detecting issues and ambiguities in the specification before it was finished later in 2018. Investigation work for Chromium began in 2019, alongside enablement work on the Android system and the kernel.
By 2020, Arm’s CPU division had created FPGA prototypes of the Arm Cortex-A710 and Cortex-A510 CPUs, which we used to check the likely performance characteristics of the finished systems. We spent time tuning the Scudo allocator and doing deep optimization of basic routines like strcmp and memcpy, as well as optimizing the Linux kernel to make best use of MTE.
In 2020, we began outreach to Chromium’s upstream community explaining the technology and our plans for deploying it. We also began bringing up AOSP 12's user interface via Arm's Fast Model platform and began running unit tests. We merged initial support for detecting MTE in 2020, initial functional changes to map pages in early 2021, and initial support for Chrome’s partition allocator in late 2021.
With MTE support merged, we focused on testing and deploying Arm’s other security features (PAC and BTI) to Chromium. We realized that the detection of memory safety issues was not enough. We also needed to capture a buffer of allocation and free calls so that Chromium developers could understand why a use-after-free was happening. This required some complicated changes to the way Chromium’s allocator hooks subsystem works.
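The idea of keeping a buffer of recent allocation and free calls can be sketched as a fixed-size ring buffer. This is an illustrative assumption about the general technique, not Chromium's allocator hooks code: `EventRing`, `AllocEvent`, and `LastEventFor` are hypothetical names. When a tag check fails on some address, looking up the most recent event for that address shows whether it was freed, and by which call.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

enum class EventKind { kAlloc, kFree };

struct AllocEvent {
  EventKind kind;
  std::uintptr_t address;
  std::size_t size;
};

// Fixed-capacity ring buffer: once full, the oldest events are overwritten.
template <std::size_t N>
class EventRing {
  static_assert(N > 0, "ring needs at least one slot");

 public:
  void Record(const AllocEvent& e) {
    events_[next_ % N] = e;
    ++next_;
  }

  // Returns the newest retained event for `address`, or nullptr if the
  // buffer no longer holds one.
  const AllocEvent* LastEventFor(std::uintptr_t address) const {
    const std::size_t count = next_ < N ? next_ : N;
    for (std::size_t i = 0; i < count; ++i) {
      const AllocEvent& e = events_[(next_ - 1 - i) % N];  // newest first
      if (e.address == address) return &e;
    }
    return nullptr;
  }

 private:
  std::array<AllocEvent, N> events_{};
  std::size_t next_ = 0;  // total number of events recorded so far
};
```

The trade-off in a design like this is capacity versus memory: a small ring is cheap to keep in production, but a free that happened long before the faulting access may already have been overwritten.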
Performance and security are both important to us, so we also began a programme to show the benefits of adopting profile-guided optimization (PGO) and using higher levels of compiler optimization. This outreach paid off in May with the release of M114. M114 offers a massive overall boost in performance, which more than pays for the overhead of asynchronous MTE.
For M116 and beyond, we continue to add allocation and free trace reporting to Chromium, so in-the-wild reports can be uploaded via Google’s Crashpad reporting system. We are also ensuring that MTE fits well with PartitionAlloc’s plans for their BackupRefPtr security mitigation, as well as optimizing performance on upcoming hardware. Arm continues to refine the underlying architecture and develop our next-generation hardware to further improve MTE’s performance and security.
Chromium M115 is an important next step on our never-ending mission to improve the security, robustness, and integrity of every device. We now need your help to get your apps ready for MTE. Check out our MTE User Guide for Android, join the conversation on the Arm Developer Discord, and build and run a Linux example to learn more about how it works. We are incredibly excited about MTE's potential to help you ship great software more quickly, and we cannot wait to see what you build.