GhostRace

GhostRace: CVE-2024-2193

Race conditions arise when multiple threads attempt to access a shared resource without proper synchronization, often leading to vulnerabilities such as concurrent use-after-free. To mitigate their occurrence, operating systems rely on synchronization primitives such as mutexes and spinlocks.

In this work, we present GhostRace, the first security analysis of these primitives on speculatively executed code paths.

Our key finding is that all the common synchronization primitives implemented using conditional branches (Figure 1) can be microarchitecturally bypassed on speculative paths using a Spectre-v1 attack. This turns all architecturally race-free critical regions into Speculative Race Conditions (SRCs), which allow attackers to leak information from the target software.

Figure 1: Top part: The core implementation of the mutex_lock synchronization primitive in the Linux x86-64 kernel, with the conditional branch that can be abused to craft SRCs in red. Bottom part: The branch ultimately checks the outcome of the lock cmpxchgq instruction, which does not serialize the execution.
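The pattern described for Figure 1 can be sketched in user-space C. This is a minimal illustration, not the kernel's mutex_lock code: `struct sketch_mutex`, `sketch_mutex_trylock`, and `sketch_mutex_unlock` are hypothetical names, and the lock is reduced to a single owner word. The point is only that the decision to enter the critical region is a conditional branch on the result of a compare-and-exchange, which is the branch an attacker would mistrain.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified lock: owner == 0 means "unlocked". */
struct sketch_mutex {
    _Atomic uintptr_t owner;
};

/*
 * Fast path: one compare-and-exchange followed by a conditional branch
 * on its outcome. On x86-64, compilers typically emit lock cmpxchg here.
 */
bool sketch_mutex_trylock(struct sketch_mutex *m, uintptr_t self)
{
    uintptr_t expected = 0;

    assert(m != NULL && self != 0);
    if (atomic_compare_exchange_strong(&m->owner, &expected, self))
        return true;   /* architecturally: lock acquired */
    return false;      /* architecturally: lock is held by someone else */
}

void sketch_mutex_unlock(struct sketch_mutex *m)
{
    assert(m != NULL);
    atomic_store(&m->owner, 0);
}
```

Architecturally, a second caller always takes the "lock is held" path while the first one owns the lock. If the branch is mispredicted, the CPU may still transiently execute the code that follows a successful acquisition before the misprediction is resolved.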

Our GhostRace paper (PDF) has been accepted for publication at the 33rd USENIX Security Symposium 2024. This is a joint project with the Systems Security Research Group at IBM Research Europe.


Speculative Synchronization Primitives

Our analysis shows all the other common write-side synchronization primitives in the Linux kernel are ultimately implemented through a conditional branch and are therefore vulnerable to speculative race conditions.

To experimentally confirm this intuition, we tested all such synchronization primitives under speculative execution after mistraining the vulnerable branch. In all cases, we confirmed transient execution of the guarded critical region despite another victim thread already architecturally executing in the region. To determine the transient window size, we measured the maximum number of load instructions we could speculatively execute inside the critical region (Figure 2).

Figure 2: Transient window size for different write-side synchronization mechanisms, i.e., the number of speculative loads that leave an observable microarchitectural trace.

SCUAF Gadget Scanner

To investigate the severity of SRCs, we concentrate on Speculative Concurrent Use-After-Free (SCUAF) and statically scan the Linux kernel with Coccinelle (Figure 3), discovering 1,283 potentially exploitable gadgets.

Figure 3: Simplified Cocci scripts (left Free and right Use) scanning for SCUAF gadgets in the Linux kernel.
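To make the shape of such a gadget concrete, here is a hypothetical user-space sketch. It is not taken from the kernel or from the Coccinelle scripts: `struct dev`, `struct msg`, `dev_release`, `dev_work`, and `mark_handled` are all invented for illustration, and a pthread mutex stands in for a kernel lock. One function frees an object inside a critical region, and another dereferences the same pointer and calls through it under the same lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical object with a callback; all names are illustrative. */
struct msg {
    void (*cb)(struct msg *self);
    int handled;
};

struct dev {
    pthread_mutex_t lock;
    struct msg *pending;
};

/* Example callback for callers of this sketch. */
void mark_handled(struct msg *self)
{
    self->handled = 1;
}

/* "Free" side: releases the object inside the critical region. */
void dev_release(struct dev *d)
{
    assert(d != NULL);
    pthread_mutex_lock(&d->lock);
    free(d->pending);      /* d->pending dangles until the next line */
    d->pending = NULL;
    pthread_mutex_unlock(&d->lock);
}

/* "Use" side: dereferences the same pointer under the same lock. */
int dev_work(struct dev *d)
{
    int ran = 0;

    assert(d != NULL);
    pthread_mutex_lock(&d->lock);
    if (d->pending) {
        d->pending->cb(d->pending);   /* indirect call through the object */
        ran = 1;
    }
    pthread_mutex_unlock(&d->lock);
    return ran;
}
```

Architecturally, the lock makes the two regions mutually exclusive, so `dev_work` never observes the dangling pointer. Under the assumption that the lock's conditional branch is bypassed speculatively, the use side could transiently run while the free side sits between `free` and the `NULL` assignment.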

IPI Storming: CVE-2024-26602

To win an SRC, we need to interrupt the victim process at the right point (i.e., when the dangling pointer is created) and keep it there indefinitely so that the attacker can perform the SCUAF attack. To achieve this, we created a new exploitation technique called Inter-Processor Interrupt (IPI) Storming: once the victim is interrupted, we continuously flood its CPU core with IPIs so that it never finishes handling the incoming interrupts. This creates an unbounded exploitation window in which the attacker can execute an arbitrary number of SCUAF invocations, mounting an end-to-end attack within a single race window. Figure 4 shows how increasing the number of storming SMTs widens the UAF exploitation window.

Figure 4: Size of the UAF exploitation window vs. number of IPI storming cores targeting the victim core. Our test CPU contains 16 cores and 24 SMTs.

SCUAF Information Disclosure Attacks

Furthermore, we show that SCUAF information disclosure attacks (Figure 5) on the kernel are feasible and can match the reliability of typical Spectre attacks, with our proof of concept leaking kernel memory at 12 KB/s.

Figure 5: Speculative information disclosure attack exploiting a speculative race condition. Steps 1-4 and 8-10 run in user mode, issuing syscalls to trigger the relevant kernel code. The other steps run in kernel mode. Our gadget scanner identified the nfc_hci_msg_tx_work function as a SCUAF gadget in the Linux kernel.

Code

You can find a minimalistic PoC exemplifying the concept of SRC in a step-by-step single-threaded fashion, Coccinelle SCUAF-scanning scripts, and 1200+ SCUAF gadgets found in the Linux kernel at https://github.com/vusec/ghostrace

Affected Hardware & Software

While we have explicitly focused on x86 and Linux in the paper, SRCs also affect other hardware and software targets.

Hardware: We have confirmed that all the major hardware vendors are affected by SRCs since, regardless of the particular compare-and-exchange instruction implementation, the conditional branch that follows is subject to branch (mis)prediction. In other words, all the microarchitectures affected by Spectre-v1 are also affected by SRCs.

Software: Any target relying on conditional branches to determine whether to enter critical regions—a common design pattern that extends well beyond Linux—is vulnerable to SRCs.

In summary, any software (e.g., an operating system or hypervisor) that implements synchronization primitives through conditional branches without a serializing instruction on that path is vulnerable to SRCs when running on any microarchitecture (e.g., x86, ARM, RISC-V) that speculatively executes conditional branches. As in other speculative execution attacks, this allows leaking data from the target software.

Mitigation

To address the new attack surface, we also propose a generic SRC mitigation that serializes all the affected synchronization primitives on Linux (i.e., by adding an lfence instruction after the lock cmpxchgq in Figure 1). Our mitigation requires minimal kernel changes (i.e., 2 LoC) and incurs only ≈5% geomean performance overhead on LMBench.
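The idea of serializing after the compare-and-exchange can be sketched in user-space C as follows. This is an illustration under stated assumptions, not the proposed kernel patch: `speculation_barrier` and `hardened_trylock` are hypothetical names, the lock is a single owner word, and the barrier is only a real `lfence` on x86-64 (other architectures would need their own speculation barrier, which this sketch does not provide).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: stop later instructions from running ahead. */
static inline void speculation_barrier(void)
{
#if defined(__x86_64__)
    __asm__ __volatile__("lfence" ::: "memory");
#else
    /* Assumption: placeholder only; this is just a compiler barrier. */
    __asm__ __volatile__("" ::: "memory");
#endif
}

/* Hypothetical lock word: 0 means "unlocked". */
bool hardened_trylock(_Atomic uintptr_t *owner, uintptr_t self)
{
    uintptr_t expected = 0;
    bool acquired;

    assert(owner != NULL && self != 0);
    acquired = atomic_compare_exchange_strong(owner, &expected, self);

    /*
     * Barrier placed between the compare-and-exchange and the caller's
     * branch on its result, so the critical region is not entered
     * before the outcome is known.
     */
    speculation_barrier();
    return acquired;
}
```

A caller would branch on the return value exactly as before; the only change is the barrier on the path between the atomic operation and that branch, which is also where the performance overhead comes from.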

Disclosure

We disclosed Speculative Race Conditions to the major hardware vendors (Intel, AMD, ARM, and IBM) and the Linux kernel in late 2023.

Hardware vendors have further notified other affected software (OS/hypervisor) vendors, and all parties have acknowledged the reported issue (CVE-2024-2193). Specifically, AMD responded with an explicit impact statement (i.e., “existing [Spectre-v1] mitigations apply”), noting that the attacks rely on conditional branch mis-speculation, like Spectre-v1.

The Linux kernel developers have no immediate plans to implement our proposed serialization of synchronization primitives due to performance concerns. However, they confirmed the IPI storming issue (CVE-2024-26602) and implemented an IPI rate-limiting feature that addresses the CPU saturation issue by adding a synchronization mutex on the sys_membarrier path, preventing its concurrent execution on multiple cores. Unfortunately, as our experiments show (Figure 4), hindering IPI storming primitives (i.e., 0 storming cores) is insufficient to close the attack surface completely.

Acknowledgments

We would like to thank the anonymous reviewers for their feedback, Andrew Cooper for his early comments on the paper, Julia Lawall for the Coccinelle clarifications, and Alessandro Sorniotti for the early discussions about the project. This work was partially supported by Intel Corporation through the “Allocamelus” project, by the Dutch Research Council (NWO) through project “INTERSECT”, and by the European Union’s Horizon Europe program under grant agreement No. 101120962 (“Rescale”).