Dedup Est Machina Returns

On the Effectiveness of Same-Domain Memory Deduplication

TL;DR

In this work, we examine the effectiveness of same-domain memory deduplication, i.e., a defense deployed in modern operating systems to mitigate the memory deduplication side channel. In particular, we present two case studies that highlight one key flaw: that it is non-trivial to separate programs into separate security domains. In the first case study, we examine a client-server scenario—a scenario that inherently requires a server to read data from an untrusted client—and demonstrate that the client can control the alignment of data in memory to disclose the server’s secret data. In the second case study, we examine a recent version of Firefox (v83.0)—a browser that has undergone massive efforts to ensure that data from different origins are separated into different domains—and demonstrate that nonetheless, a malicious webpage can exploit the browser’s partial implementation of site isolation to leak secret data across tabs. We conclude that same-domain memory deduplication as a defense is difficult to implement correctly, and hence, is insufficient.

Overview

Memory deduplication is an OS memory optimization technique that merges identical pages into a single Copy-on-Write (CoW) page to reduce the memory footprint of a running system. The first write to a CoW page, however, is significantly slower than a write to a normal page, because the page must be copied before it can be modified. As such, memory deduplication has been shown to be susceptible to a variety of timing side channel attacks, such as the original Dedup Est Machina attacks.

To mitigate this, operating systems, such as Windows 10 from v1903 onwards, only merge pages that are considered safe (e.g., pages filled with 0x00 or 0xff whose content will not change throughout their lifetime), and pages that are in the same security domain. Further, browser vendors piggyback on this defense mechanism by adopting site isolation, whereby each open tab resides in its own process, and hence security domain.

Such defenses assume that an attacker abuses the memory deduplication side channel only in cross-process scenarios. However, separating programs into different security domains is a non-trivial task. As a result, different processes may still share memory, and hence a security domain, which enables memory deduplication of potentially untrusted, attacker-controlled data.

In this work, we examine the effectiveness of same-domain memory deduplication by presenting two case studies that show how an attacker can still leverage the deduplication side channel to leak secrets. In the first case study, we examine a client-server scenario. The server reads data from an untrusted client, and the client can control the alignment of data in memory to disclose the server’s secret data. In the second case study, we examine Firefox’s partial implementation of site isolation (in versions prior to Firefox v94.0) and demonstrate how a malicious website can leak secrets across tabs.

Client-Server Case Study

Alignment Probing Primitive. We employ the same alignment probing primitive as in the Dedup Est Machina attacks, which provides an attacker with the ability to control the alignment of secret data based on the provided input. As such an attacker can incrementally leak secret data over several deduplication passes by manipulating the input provided to shift the secret data up and down in memory, and consequently in or out of the memory page.

Exploitation. For our client-server use case we utilize two local native processes which communicate via sockets. We assume that the client is attacker-controlled and can send multiple requests to the server at any given time. The server receives the client requests and allocates memory based on the attacker-provided input, whereby it stores the client input (known data) at a page-aligned location next to secret data (Figure 1). Knowing that the server stores the provided data next to secret data and that they are page-aligned, the attacker sends an initial request of 4095 known bytes. Because of the weak alignment properties of the server, this equips the attacker with the alignment probing primitive: the server creates the “secret” page, which contains the 4095 attacker-provided bytes followed by the first byte of the secret data.

*Figure 1. How the server stores the client-provided input*

The attacker sends multiple requests to the server containing the same 4095 bytes and 1 guessed byte at the end, spraying the memory of the server with “probe” pages (Figure 2). In this case, the server-allocated memory will contain only attacker-provided input. The attacker provides inputs to the server every 10 seconds in order to write to the “probe” pages, and measures how long it takes for the server to respond. When the response takes longer than a moving average, the attacker deduces that the deduplication mechanism combined a “probe” page with the “secret” page.

*Figure 3. Deduplication of the secret page and a probe page.*
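
The moving-average check described above can be sketched as follows. This is a minimal illustration rather than the code used in the case study: the function name, the window size, and the threshold factor are assumptions chosen for readability.

```python
from collections import deque


def detect_slow_responses(latencies, window=5, factor=3.0):
    """Return the indices of responses that are much slower than the
    moving average of the preceding `window` normal responses.

    `window` and `factor` are illustrative assumptions, not measured values.
    """
    recent = deque(maxlen=window)   # baseline of recent "normal" latencies
    flagged = []
    for i, latency in enumerate(latencies):
        if len(recent) == window and latency > factor * (sum(recent) / window):
            # Likely a write to a merged (CoW) page: record it, and keep it
            # out of the baseline so it does not inflate the average.
            flagged.append(i)
        else:
            recent.append(latency)
    return flagged
```

Keeping flagged samples out of the baseline is a design choice: a single slow copy-on-write response would otherwise raise the average and could hide a later signal.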

The client then repeats the process, sending one fewer byte of padding in the initial and probe requests and appending the first leaked byte of the secret to the known data. In this way the attacker discloses the secret data byte by byte over several deduplication passes, the number of which depends on the length of the secret data (Figure 4).

*Figure 4. Example of incremental disclosure of secret data.*
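
The incremental disclosure can be simulated offline. The sketch below is a model under stated assumptions, not the server or client from the case study: `server_first_page` is a hypothetical stand-in for the server's memory layout, and a plain equality check between pages replaces the timing signal that a real deduplication pass would produce.

```python
PAGE_SIZE = 4096


def server_first_page(client_input: bytes, secret: bytes) -> bytes:
    """Hypothetical layout: client input starts at a page boundary and the
    secret follows immediately. Returns the first page of that region."""
    return (client_input + secret)[:PAGE_SIZE]


def leak_secret(secret: bytes, filler: int = 0x41) -> bytes:
    """Recover `secret` one byte per round (assumes len(secret) < PAGE_SIZE)."""
    leaked = b""
    for _ in range(len(secret)):
        # One fewer padding byte per round shifts one more secret byte
        # into the first page.
        padding = bytes([filler]) * (PAGE_SIZE - 1 - len(leaked))
        secret_page = server_first_page(padding, secret)
        # A real attacker sprays all 256 probe pages at once and waits for
        # a deduplication pass; here we simply compare page contents.
        for guess in range(256):
            probe_page = padding + leaked + bytes([guess])
            if probe_page == secret_page:   # stand-in for "write was slow"
                leaked += bytes([guess])
                break
    return leaked
```

Each round needs at most 256 probe pages, so a secret of n bytes costs at most 256 × n probe pages in this model, rather than the 256^n guesses needed to match the whole secret at once.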

Browser Cross-Tab Case Study

SharedArrayBuffer timer. Modern browsers severely crippled the precision (from 5μs to 20μs) of native JavaScript timers (e.g., performance.now()), which prevented an attacker from accurately detecting deduplication passes. To accurately measure the write operation and detect the deduplication signal, an attacker can craft a custom JavaScript timer utilizing the SharedArrayBuffer JavaScript object. SharedArrayBuffer allows two threads to share state. The first thread operates as the timer and the second thread reads the time. The timer thread utilizes Atomics to perform increment operations, and the reader thread can read the time at any point without the risk of a race condition. Using a SharedArrayBuffer timer, an attacker can still measure the time needed for an operation to complete. As such an attacker can poll the timer, perform a write operation and then poll the timer again in order to find out how long it takes to perform a write operation on a page. When the attacker sees a higher number of increments, it means that the deduplication thread combined the attacker-controlled page with a victim page.
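
The counting-timer idea is not specific to JavaScript. The following Python analogue is only meant to show the structure (one thread increments a shared counter, another reads it before and after an operation). It does not use SharedArrayBuffer or Atomics, the class and function names are assumptions, and its resolution under CPython is far coarser than what the browser attack requires.

```python
import threading
import time


class CountingTimer:
    """A free-running counter incremented by a background thread."""

    def __init__(self):
        self._ticks = [0]                 # shared state with a single writer
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        ticks = self._ticks
        while not self._stop.is_set():
            ticks[0] += 1                 # the "clock" is just this increment

    def start(self):
        self._thread.start()
        return self

    def stop(self):
        self._stop.set()
        self._thread.join()

    def now(self):
        return self._ticks[0]


def ticks_elapsed(timer, operation):
    """Poll the timer, run `operation`, poll again, return the difference."""
    start = timer.now()
    operation()
    return timer.now() - start
```

A usage example would be `ticks_elapsed(timer, lambda: time.sleep(0.05))` after `timer = CountingTimer().start()`. A larger tick count means a slower operation.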

Exploitation. We utilize one attacker tab and 8 victim tabs in Firefox. We assume that a victim visits an attacker-controlled website and that all 9 tabs remain open throughout the attack. By default, Firefox uses 8 content processes, and each open tab runs its web content in one of them. Only the first 8 open tabs, however, reside in their own content process, which enforces memory isolation between them. When a further tab opens, Firefox arbitrarily assigns it to one of the 8 content processes, so it shares memory with one of the 8 open tabs. As such, the attacker-controlled tab resides in the same content process as a victim tab, which enables the deduplication of attacker-controlled memory and victim memory. By exploiting this limitation in content processes, we bypass Firefox’s partial site isolation mitigation and force deduplication of attacker and victim memory. To bypass the timer limitation, we utilize a SharedArrayBuffer timer, which allows us to create an accurate baseline for how long a write operation takes via JavaScript.

In this use case we load 8 victim tabs in which we encode multiple fingerprints in secret pages using Uint8Array objects; the fingerprints are large enough to be page-aligned. The attacker-controlled tab creates several probe pages, also using Uint8Array objects, that contain the same fingerprints, and waits for a deduplication pass (Figure 5). Because fingerprints are encoded in both secret pages and probe pages, a deduplication pass inevitably merges one of the probe pages with a secret page; which one depends on which content process is shared between the victim and the attacker tabs.
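
To make the fingerprint setup concrete, the following Python model shows why exactly one probe page ends up mergeable. Everything here is an assumption for illustration: the case study uses Uint8Array objects in JavaScript, while `fingerprint_page` and `mergeable_probes` are hypothetical helpers and the hash-based page content is just one way to obtain distinct, reproducible pages.

```python
import hashlib

PAGE_SIZE = 4096


def fingerprint_page(tab_id: int) -> bytes:
    """Deterministic page-sized content unique to a (hypothetical) tab id."""
    seed = hashlib.sha256(f"tab-{tab_id}".encode()).digest()   # 32 bytes
    return seed * (PAGE_SIZE // len(seed))                     # 4096 bytes


def mergeable_probes(victims_in_process, probe_ids):
    """Return the probe ids whose page content equals a victim page that
    lives in the same content process as the attacker tab."""
    victim_pages = {fingerprint_page(v) for v in victims_in_process}
    return [p for p in probe_ids if fingerprint_page(p) in victim_pages]
```

If the attacker tab shares a content process with victim tab 5, then `mergeable_probes([5], range(8))` contains only that tab's id, which is the page the timing measurement later singles out.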

We then write to every attacker-controlled page that contains a fingerprint and measure how long each write operation takes. The write that is significantly slower than the others reveals which probe page was deduplicated with a victim’s secret page and, since that page contains the tab’s fingerprint, which victim tab resides in the same content process as the attacker tab (Figure 6).

*Figure 6. Attacker tab polls the timer, writes to a probe page and then polls the timer again to calculate how long the write operation takes; and repeats the process for all probe pages, to detect which probe page is deduplicated with the tab fingerprint.*
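
Once a tick count has been collected for each probe page, picking out the deduplicated page is an outlier test. The sketch below assumes the measurements are already available as a mapping from a fingerprint label to the number of timer increments. The function name, the labels, and the threshold factor are illustrative assumptions.

```python
from statistics import median


def deduplicated_probes(write_ticks, factor=4.0):
    """Return the labels of probe pages whose write took far longer than
    the median write. `write_ticks` maps a label to a tick count."""
    if not write_ticks:
        return []
    # Guard against a zero baseline when the timer is too coarse.
    baseline = max(median(write_ticks.values()), 1)
    return [label for label, ticks in write_ticks.items()
            if ticks > factor * baseline]
```

The median is used instead of the mean so that the single slow copy-on-write write does not distort the baseline it is compared against.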

Code

You can find the implementation of our case studies on GitHub.

Paper

Disclosure

We disclosed our findings to Microsoft on Jan 28, 2022. Firefox mitigated the issue we discovered prior to our paper submission.

Frequently Asked Questions

I use Windows. Am I affected? You can check if your Windows system supports combining memory pages via the Get-MMAgent PowerShell command. PageCombining: true means that memory pages are deduplicated on your system.
I don’t use Windows. Am I affected? Most other operating systems and hypervisors implement memory deduplication (e.g., same-page merging on Linux and KVM, transparent page merging on VMware, page fusion on VirtualBox). As far as we know, they do not deploy defenses such as same-domain memory deduplication at all, so they are still vulnerable to the original Dedup Est Machina attack.
I don’t use Firefox <v94.0 as my browser. Am I affected? Since Firefox patched their implementation of partial site isolation in v94.0, it is no longer vulnerable. Moreover, other browsers such as Edge and Chrome already have full site isolation deployed, so they are not affected. However, our research is applicable to software beyond browsers. In particular, it applies to any program that stores trusted data alongside untrusted data.
I am an end-user. How do I mitigate this? Unfortunately, the most complete defense is to disable memory deduplication entirely (e.g., on Windows via the PowerShell command: Disable-MMAgent -PageCombining).
I develop an operating system or hypervisor that implements memory deduplication. How do I mitigate this? We suggest adopting a low-complexity, low-overhead mitigation such as VUsion.
I develop user-level software. How do I mitigate this? If possible, re-write your software to either: (1) sanitize all untrusted input, or (2) separate processes that handle untrusted data from processes that handle trusted data and ensure that they use different security domains via the IsolateSecurityDomain flag (like site isolation does in browsers). If this is not possible, then disable deduplication entirely for your program (e.g., on Windows 10 ≥v1903, via the DisablePageCombine flag in the PROCESS_MITIGATION_SIDE_CHANNEL_ISOLATION_POLICY structure).

Acknowledgements

We thank our shepherd, Alessandro Sorniotti, and the anonymous reviewers for their valuable comments. This work was supported by the EU’s Horizon 2020 research and innovation programme under grant agreement No. 825377 (UNICORE) and by Netherlands Organisation for Scientific Research through project “Intersect”. This paper reflects only the authors’ view. The funding agencies are not responsible for any use that may be made of the information it contains.

vusec