StickyTags

TL;DR

StickyTags is an efficient Arm MTE-based solution that mitigates bounded spatial memory errors (also known as buffer overflows, or out-of-bounds violations). Additionally, we show that MTE is vulnerable to speculative probing, where attackers can use a contention-based side channel to deduce whether or not a tag check results in a mismatch.

Buffer Overflows

The MITRE CWE (Common Weakness Enumeration) ranks “out-of-bounds writes” as the most severe software weakness of 2023 [1]. In fact, the out-of-bounds writes category was already ranked 2nd in 2020, and has been ranked 1st every year since [2]. Out-of-bounds violations, also commonly known as “buffer overflows”, introduce a wide variety of security risks, such as arbitrary code execution, denial of service, information leakage, and privilege escalation. The following code snippet shows a simple buffer overflow:

#include <stdlib.h>

int main(void) {
  char *obj = (char*) malloc(16); // 16-byte heap object
  return obj[32]; // index 32 is out of bounds
}

While there have been large efforts to improve software testing (for example through bug sanitizers and automated fuzz testing), many bugs still make it to production (that is, live, deployed) systems. Anecdotally, the GWP-ASan project has already found over thirty buffer overflows in the live build of Google Chrome [3], highlighting the need for tools that can prevent (i.e., mitigate) exploitation of such vulnerabilities. However, existing post-deployment solutions have seen little adoption in the field due to their high overhead. Recent reports indicate that mitigations only see real-world deployment if their performance overhead stays below 5% [4], which renders existing bounds checking solutions impractical. In response, contemporary memory error detection and mitigation systems are shifting towards hardware-assisted solutions to reduce the overhead, with Arm’s Memory Tagging Extension (MTE) being a strong contender as a building block.

Arm MTE

MTE can be described as a ‘lock’ and ‘key’ mechanism with hardware checks on memory accesses. MTE associates every memory location and every pointer with a tag, with the hardware disallowing any dereference if the pointer and memory tags do not match. The following figure shows how MTE works: each memory object has a memory tag (lock), and the pointers to the objects have corresponding pointer tags (keys). Memory tags are stored in a dedicated area of (physical) memory, while pointer tags are stored in the upper bits of pointers. The figure highlights that when pointer-1 is out-of-bounds, the hardware checks prevent pointer-1 from accessing the memory of object-2, because the pointer tag (7) does not match the memory tag (4). MTE provides a total of 16 distinct tags.

Figure 1: Overview of tagging memory with Arm MTE
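The lock-and-key check described above can be modelled in plain software. The following is a minimal sketch, not how MTE is implemented: real MTE performs the comparison in hardware on every access. The 16-byte tag granule and the placement of the pointer tag in bits 59:56 follow the Arm architecture; the 256-byte toy address space and all function names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Software model only: real MTE performs this check in hardware. */
#define GRANULE_SIZE 16u     /* MTE tags memory in 16-byte granules */
#define TAG_SHIFT    56u     /* pointer tag lives in address bits 59:56 */
#define TAG_MASK     0xFull  /* 4 bits, hence 16 distinct tags */
#define MODEL_BYTES  256u    /* size of the toy address space (assumption) */

/* One memory tag ("lock") per granule of the toy address space. */
static uint8_t memory_tags[MODEL_BYTES / GRANULE_SIZE];

/* Place a tag ("key") in the upper bits of an address. */
uint64_t set_pointer_tag(uint64_t addr, uint8_t tag) {
  uint64_t cleared = addr & ~(TAG_MASK << TAG_SHIFT);
  return cleared | (((uint64_t)tag & TAG_MASK) << TAG_SHIFT);
}

/* Extract the tag stored in the upper bits of a pointer. */
uint8_t get_pointer_tag(uint64_t ptr) {
  return (uint8_t)((ptr >> TAG_SHIFT) & TAG_MASK);
}

/* Tag every granule covered by [addr, addr + size). */
void tag_memory(uint64_t addr, size_t size, uint8_t tag) {
  assert(addr % GRANULE_SIZE == 0 && addr + size <= MODEL_BYTES);
  for (uint64_t a = addr; a < addr + size; a += GRANULE_SIZE)
    memory_tags[a / GRANULE_SIZE] = (uint8_t)(tag & TAG_MASK);
}

/* Returns true if an access through ptr would pass the tag check. */
bool access_allowed(uint64_t ptr) {
  uint64_t addr = ptr & ~(TAG_MASK << TAG_SHIFT);
  assert(addr < MODEL_BYTES);
  return get_pointer_tag(ptr) == memory_tags[addr / GRANULE_SIZE];
}
```

Mirroring the figure: if object-1 at offset 0 is tagged 7 and object-2 at offset 16 is tagged 4, a pointer created with set_pointer_tag(0, 7) passes access_allowed in bounds, but the same pointer advanced by 16 bytes fails, because its tag (7) no longer matches the memory tag (4).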

Although MTE is a powerful mechanism, even state-of-the-art MTE solutions remain costly due to the need for frequent memory (re)tagging: LLVM’s MemTagSanitizer incurs average and worst-case overheads on SPEC CPU 2006 of 15.2% and 267% (respectively) just for protecting the stack alone (meaning, no heap protection). Existing MTE solutions (for example, the Scudo heap allocator, MemTagSanitizer for the stack, but also KASAN for the Linux kernel) heavily rely on random tags provided by the hardware. These solutions assign a random tag to each memory allocation, and our research shows the resulting high tagging frequency quickly becomes a performance bottleneck.

Spectre-MTE

For probabilistic MTE solutions based on random tagging (for instance, the Linux kernel), the assumption is that, even if attackers manage to hijack a tagged victim pointer (for example via a buffer overflow) to reference a target object, they cannot predict whether the tag of the target object matches the pointer tag, which hinders reliable exploitation. However, even without brute-forcing capabilities, if attackers can deduce which tags are assigned at runtime, then the randomness of the tags offers no added benefit.

We show that attackers can find MTE pointer/memory tag matches through speculative probing. More specifically, we show attackers can use a contention-based side channel to deduce whether or not a tag check results in a violation (i.e., tag mismatch).

In summary, the contention caused by tag mismatches provides attackers with a convenient side channel to determine whether a tag mismatch occurred. Crafting probe gadgets is relatively simple: an attacker needs to trigger the target software vulnerability on a speculative path and, unlike standard (and mitigated) Spectre, observe a microarchitectural signal from any independent memory operation within the speculation window.

StickyTags

StickyTags aims to reduce the overhead of protecting memory with MTE by significantly decreasing the number of times we have to tag the memory. StickyTags achieves this by reorganizing memory into regions, each containing objects of a particular size class (see Figure 2). It tags memory at the first use of an object slot, allowing the tag to persist across the lifetimes of different objects allocated in the same slot. As a result, StickyTags eliminates the need for retagging memory. It therefore performs well on both the stack, where the allocation (and hence retagging) frequency is typically high, and the heap, where large allocations are not uncommon and retagging is costly.

Figure 2: Memory organization in StickyTags. The tags are persistent: they remain in place when new objects reuse the memory.
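The effect of persistent tags on tagging frequency can be illustrated with a toy model of one size-class region. This is a sketch under stated assumptions, not the StickyTags allocator: the region_t type, the eight-slot region, the lowest-free-slot policy, and the tag_writes counter are all hypothetical and exist only to count how often memory tags would have to be written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_SLOTS 8u   /* slots in one toy size-class region (assumption) */
#define NUM_TAGS  16u  /* MTE provides 16 distinct tags */

typedef struct {
  bool in_use[NUM_SLOTS];
  bool tagged[NUM_SLOTS];        /* has this slot's memory been tagged yet? */
  unsigned char tag[NUM_SLOTS];  /* memory tag currently on the slot */
  size_t tag_writes;             /* how often memory tags had to be written */
} region_t;

/* Allocate the lowest free slot; returns its index, or -1 if the region is full. */
int region_alloc(region_t *r) {
  for (size_t i = 0; i < NUM_SLOTS; i++) {
    if (r->in_use[i])
      continue;
    r->in_use[i] = true;
    if (!r->tagged[i]) {  /* first use: tag the slot's memory exactly once */
      r->tag[i] = (unsigned char)(i % NUM_TAGS);
      r->tagged[i] = true;
      r->tag_writes++;
    }
    return (int)i;
  }
  return -1;
}

/* Free a slot; the memory tag stays in place ("sticky"). */
void region_free(region_t *r, int slot) {
  assert(slot >= 0 && (size_t)slot < NUM_SLOTS && r->in_use[slot]);
  r->in_use[slot] = false;
}
```

In this model, allocating and freeing the same slot a thousand times in a row leaves tag_writes at 1, whereas a scheme that assigns a fresh tag per allocation would write tags a thousand times.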

StickyTags employs a completely deterministic tagging layout, where we assign tags to size class slots in a round-robin fashion. This way, StickyTags protects against buffer under- and overflows bounded by the number of tags times the slot size. Within a size class the objects always use the same slot size, hence the tag layout in a region is constant. After an object is deallocated, a new object can reuse the slot while the underlying memory tags remain unchanged.

Whenever StickyTags allocates a memory object, it determines the tag to use for the pointer (i.e., the address) based on the location of the underlying memory. The key insight for tagging is that StickyTags can deterministically calculate the correct tags for all allocation pointers and memory objects such that both correspond to the same tagging layout. 
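One way to picture the deterministic calculation is a function that maps an address to its round-robin tag using only the region's base address and slot size. The sketch below is an assumption about how such a mapping could look, not the actual StickyTags code: the size_class_region_t type and both function names are hypothetical, and the sketch uses all 16 tags, whereas a real deployment may reserve some.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_TAGS 16u  /* MTE provides 16 distinct tags */

/* Hypothetical description of one size-class region. */
typedef struct {
  uint64_t base;       /* start address of the region */
  uint64_t slot_size;  /* size of every slot in this size class */
} size_class_region_t;

/* Round-robin tag of the slot containing addr, derived from the address alone. */
uint8_t tag_for_address(const size_class_region_t *r, uint64_t addr) {
  assert(r->slot_size > 0 && addr >= r->base);
  uint64_t slot_index = (addr - r->base) / r->slot_size;
  return (uint8_t)(slot_index % NUM_TAGS);
}

/* Distance in bytes after which the round-robin tag sequence repeats. */
uint64_t tag_period_bytes(const size_class_region_t *r) {
  return (uint64_t)NUM_TAGS * r->slot_size;
}
```

Because the same function yields both the memory tag of a slot and the pointer tag of an allocation in that slot, the two always agree without storing per-object metadata. For a 64-byte size class, tag_period_bytes returns 1024, which is the number of tags times the slot size.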

Evaluation

We evaluate the performance of StickyTags and related state-of-the-art systems using a Google Pixel 8 Pro, which is the first consumer device with MTE hardware. We compare StickyTags against the Scudo heap allocator and MemTagSanitizer’s stack instrumentation, both of which employ random tagging. We measure performance using the SPEC CPU 2006 benchmark suite. The table below displays our main findings.

System                     Heap     Stack    Both
StickyTags                 3.1%     1.2%     4.0%
Scudo + MemTagSanitizer    5.8%     15.2%    20.2%

Table 1: Runtime overhead comparison between StickyTags, MemTagSanitizer, and Scudo using SPEC CPU 2006 (Pixel 8 Pro)

On the heap, StickyTags incurs 3.1% runtime overhead, nearly half of Scudo’s 5.8%. On the stack the improvement is even more pronounced: at only 1.2%, the overhead of StickyTags is over 12x lower than MemTagSanitizer’s 15.2%. Overall, StickyTags protects the heap and stack (combined) at a runtime cost of only 4.0%, which stays below the 5% threshold reported for real-world deployment [4] and can therefore be considered suitable for protecting live systems. In comparison, combining Scudo with MemTagSanitizer results in a total overhead of 20.2%. Notably, 95% of MemTagSanitizer’s overhead on the stack (14.4 of the 15.2 percentage points) comes from tagging memory, which is exactly the bottleneck StickyTags relieves.

Paper

Acknowledgements

We disclosed speculative probing of random tags to Arm, which in turn disclosed it to affected licensees. In response, Arm published an advisory to offer guidance on the impact of speculative oracles on memory tagging.

This work was supported by Intel Corporation through the “Allocamelus” project, by NWO through the projects “INTERSECT” and “Theseus”, and by the European Union’s Horizon Europe programme under grant agreement No. 101120962 (“Rescale”).

References

[1] “2023 CWE Top 25 Most Dangerous Software Weaknesses”

[2] “Trends in Real-World CWEs: 2019 to 2023”

[3] V. Tsyrklevich, “GWP-ASan: Sampling heap memory error detection in-the-wild”

[4] D. Song, J. Lettner, P. Rajasekaran, Y. Na, S. Volckaert, P. Larsen, and M. Franz, “SoK: Sanitizing for security,” in IEEE S&P, 2019.