A spotlight on the XZ Utils backdoor

I’d like to focus a bit on a supply chain attack that, in my opinion, definitely did not receive the media coverage it deserved.

In late March 2024, a Microsoft engineer named Andres Freund was doing something engineers do all the time: chasing a performance anomaly that was probably nothing. SSH logins on his Debian test machine were taking around 500ms instead of the usual 100ms. Valgrind (a memory diagnostics tool) was throwing unexplained errors. Without knowing it, he was looking at one of the most sophisticated supply chain attacks ever attempted against open source infrastructure, one that had been quietly in motion for over two years, and that had very nearly reached the servers running the backbone of the internet.

What was it about?

Before we get into the attack itself, a quick primer on the two pieces of software at the center of this.

XZ Utils is a compression library. You’ve almost certainly used it without knowing it: it handles the .xz and .lzma compression formats that are baked into virtually every Linux distribution. Package managers use it, backup tools use it, system update mechanisms use it. It’s the kind of foundational utility that nobody thinks about.

SSH (Secure Shell) is a cryptographic network protocol for operating network services securely over an unsecured network. In plain terms: it’s how you talk to remote machines safely, without worrying that someone in the middle might be listening in, since SSH encrypts the entire session. When a developer connects to a production server, when a deployment pipeline pushes code, when an automated system reaches out to a cloud instance, that connection almost certainly runs over SSH.

I think it’s important to have a fundamental understanding of SSH, because it connects directly to how the backdoor worked. SSH uses a combination of asymmetric and symmetric encryption. When you first connect to a server, the two sides perform a key exchange (a mathematical handshake that lets them agree on a shared secret without ever transmitting that secret over the wire). Once they’ve established that shared secret, they switch to symmetric encryption for the rest of the session, which is much faster. The asymmetric step is used for authentication: the server proves its identity, and the client proves theirs, either via password or, more commonly in professional environments, via a public/private key pair.
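To make the key exchange less abstract, here is a minimal sketch of a classic finite-field Diffie-Hellman exchange in Python. It is a toy under stated assumptions: the prime `P` and generator `G` are chosen for readability and are far too weak for real use, the function names `make_keypair` and `shared_secret` are hypothetical, and modern OpenSSH typically negotiates elliptic-curve or hybrid methods rather than this exact arithmetic. The point is only that both sides arrive at the same value while exchanging nothing but public numbers.

```python
import secrets

# Toy parameters for illustration only (assumption: not what SSH actually uses).
P = 2**127 - 1   # a Mersenne prime; real groups are 2048 bits or more
G = 3

def make_keypair():
    """Return (private, public) where public = G^private mod P."""
    private = secrets.randbelow(P - 3) + 2   # uniform in [2, P-2]
    public = pow(G, private, P)
    return private, public

def shared_secret(my_private, their_public):
    """Combine our private value with the peer's public value."""
    return pow(their_public, my_private, P)

# Each side generates a keypair and sends only the public half over the wire.
client_priv, client_pub = make_keypair()
server_priv, server_pub = make_keypair()

client_secret = shared_secret(client_priv, server_pub)
server_secret = shared_secret(server_priv, client_pub)

print(client_secret == server_secret)
```

An eavesdropper sees `client_pub` and `server_pub` but neither private value, which is what lets the two sides derive a symmetric session key over an untrusted network.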

OpenSSH is the dominant open source implementation of the SSH protocol (the actual software that runs on the vast majority of Linux servers in the world). It’s what listens for incoming connections, performs the key exchange, handles authentication, and opens the shell session. It’s been around since 1999, it’s been audited extensively, and it’s trusted implicitly by essentially everyone running Linux infrastructure.

The reason XZ Utils enters the picture is a detail specific to certain Linux distributions. Upstream OpenSSH doesn’t use liblzma (the core library component of XZ Utils) at all, but several systemd-based distributions, Debian and Fedora among them, patch sshd to link against libsystemd for service notifications. libsystemd in turn links against liblzma, primarily for journal compression: the journald log format supports compressing log entries to save disk space. If you can compromise liblzma, you get code running inside the SSH process itself, which means you can intercept and manipulate authentication before any of OpenSSH’s own logic runs.
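What makes this exposure easy to miss is that it is transitive: sshd never asks for liblzma by name, it inherits it through another library. A minimal sketch of that idea in Python follows. The `DEPS_PATCHED` and `DEPS_UPSTREAM` graphs are deliberately tiny, incomplete illustrations (an assumption, not a dump from a real system), and `transitive_deps` is a hypothetical helper.

```python
def transitive_deps(graph, root):
    """Return every library reachable from root by following dependency edges."""
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        for dep in graph.get(node, ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen

# Illustrative, incomplete graphs (assumption): a distribution-patched sshd
# that links libsystemd, versus an unpatched upstream build.
DEPS_PATCHED = {
    "sshd": ["libcrypto", "libsystemd"],
    "libsystemd": ["liblzma"],
}
DEPS_UPSTREAM = {
    "sshd": ["libcrypto"],
}

print("liblzma" in transitive_deps(DEPS_PATCHED, "sshd"))
print("liblzma" in transitive_deps(DEPS_UPSTREAM, "sshd"))
```

On a real machine the equivalent question is answered by inspecting the binary's shared-library dependencies; the graph walk above only shows why auditing direct dependencies is not enough.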

That’s what Jia Tan spent two years trying to do.

Building a reputation

On 19 October 2021, a GitHub account named JiaT75 made its first public contribution to an open source project. It was a small patch for libarchive, not XZ, not anything critical. Just a modest fix from someone who seemed to know what they were doing.

Over the following months, JiaT75 (presenting as a developer named Jia Tan) continued making legitimate contributions across multiple projects. He submitted more than 500 patches across GitHub, all of them fixing real issues users had reported.

In late October 2021, Jia Tan made their first contribution to XZ Utils, a simple .editorconfig file sent to the mailing list. Then more patches followed over the next few months, each one building a little more trust with the project’s sole maintainer, Lasse Collin.

Collin had been running XZ Utils essentially alone since creating it over a decade earlier. It was used everywhere, but like so many foundational open source projects, it ran on volunteer effort with no real organisational backing. The project had a small community (around ten active members on its IRC channel).

Jia Tan was helpful: he filed good bug reports, submitted patches that improved things, and reviewed code thoughtfully. For about a year, nothing about his involvement raised any flags.

In April 2022, Jia Tan submitted another patch to the mailing list. A previously unknown account named “Jigar Kumar” appeared and began complaining that the patch wasn’t being merged fast enough: “Patches spend years on this mailing list. There is no reason to think anything is coming soon.” Later in the same email thread, he added: “Is there any progress on this? Jia I see you have recent commits. Why can’t you commit this yourself?”

Then, in May 2022, another new account (“Dennis Ens”) showed up asking whether XZ for Java was still maintained, citing a year of silence in the commit log. Collin apologised for the delays. And in his reply, he mentions for the first time that Jia Tan had been helping him off-list, and that Jia Tan “might have a bigger role in the future at least with XZ Utils.”

The pressure kept building. On June 7, Jigar Kumar commented: “Progress will not happen until there is new maintainer. XZ for C has sparse commit log too. Dennis you are better off waiting until new maintainer happens or fork yourself. Submitting patches here has no purpose these days. The current maintainer lost interest or doesn’t care to maintain anymore.”

Collin’s reply was: “I haven’t lost interest but my ability to care has been fairly limited mostly due to longterm mental health issues but also due to some other things. Recently I’ve worked off-list a bit with Jia Tan on XZ Utils and perhaps he will have a bigger role in the future, we’ll see.”

Dennis Ens then responded by using Collin’s own admission against him: “I am sorry about your mental health issues, but its important to be aware of your own limits […] the community desires more.”

By the end of June, after further pressure, Collin wrote again: “As I have hinted in earlier emails, Jia Tan may have a bigger role in the project in the future. He has been helping a lot off-list and is practically a co-maintainer already. :-)”

This is textbook sock puppetry: fake accounts coordinating to create the appearance of community pressure. The goal wasn’t to annoy Collin. It was to make him feel that the community was frustrated while positioning Jia Tan as the obvious solution. In other words, the attackers engineered conditions that made handing over commit access feel overdue.

By October 2022, Jia Tan was added to the Tukaani organisation on GitHub. By early 2023, he was merging commits directly without review. By late 2023, Jia Tan was signing official releases and the operation had achieved its objective: a fabricated identity now had the cryptographic keys to distribute software that would end up on millions of Linux systems.

Consolidating control

Once inside, Jia Tan moved methodically. Rather than immediately planting anything malicious, he spent months continuing to make legitimate improvements.

In July 2023, Jia Tan updated XZ’s configuration with Google’s OSS-Fuzz (an automated fuzzing infrastructure that continuously tests open source projects for vulnerabilities). The update disabled ifunc support during fuzzing builds. ifunc (indirect function) is a glibc feature that allows runtime selection of optimised function implementations based on hardware capabilities. It’s also, as it would turn out, the mechanism the backdoor would use to hook into OpenSSH’s authentication process. By disabling it during automated testing, Jia Tan proactively ensured that Google’s security tooling would never catch what was coming.
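Real ifunc resolvers are a C and dynamic-loader feature, so the following is only a loose Python analogy of the pattern, not the mechanism itself. It shows a resolver that runs once and decides which implementation a name is bound to. The names `crc32_generic`, `crc32_fast`, and `resolve_crc32` are hypothetical, and `zlib.crc32` stands in for a hardware-accelerated variant.

```python
import zlib

def crc32_generic(data: bytes) -> int:
    """Portable bit-by-bit CRC-32 (slow reference implementation)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right; fold in the reflected polynomial when the low bit is set.
            crc = (crc >> 1) ^ (0xEDB88320 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

def crc32_fast(data: bytes) -> int:
    """Stand-in for an optimised, hardware-specific implementation."""
    return zlib.crc32(data)

def resolve_crc32(has_fast_path: bool):
    """Resolver: runs once and returns the implementation to bind."""
    return crc32_fast if has_fast_path else crc32_generic

# Bound once at import time, loosely like the loader calling an ifunc resolver.
crc32 = resolve_crc32(has_fast_path=True)

print(hex(crc32(b"123456789")))
print(crc32_generic(b"123456789") == crc32_fast(b"123456789"))
```

The security-relevant property is that the resolver is code that runs early, before any caller uses the function, and its return value decides what every later call actually executes.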

Also in 2023, a separate identity appeared: Hans Jansen. This account contributed a single performance optimisation to XZ Utils (legitimate code, replacing the crc64 constructor with an ifunc resolver) and then went quiet. Nine months later, Jansen reappeared on a Debian bug report, arguing urgently that distributions should adopt the newest XZ Utils release. The timing of this reactivation would later make a lot more sense.

By January 2024, the XZ Utils project website had been migrated to a GitHub Pages subdomain under Jia Tan’s direct control. The operation had gone from an unknown GitHub account to effective ownership of the project’s public face, its release process, and its cryptographic signing keys.

[Figure: XZ Utils backdoor timeline]

How the backdoor actually worked — and what it could have done

In February 2024, with full infrastructure control established, Jia Tan introduced the malicious code. Because of its sophistication, I’ll go into some detail on the technique used here.

The backdoor wasn’t visible as code in the Git repository itself (the place anyone would look if they were auditing the project). Instead, the piece that activated it was hidden in the release tarballs: the source archives that distributions actually download and package. These tarballs contain generated build scripts that don’t exist in version control, and one of them had been modified. The obfuscated payload did sit in Git, but disguised as binary test files, so anyone reviewing the Git history would see clean-looking commits while the distributed software contained the backdoor.
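The gap between what is in version control and what is in a release archive can be checked mechanically. Below is a minimal sketch in Python that compares an extracted tarball against a checkout and reports files that exist only in the tarball or whose contents differ. The helper names `file_digests` and `compare_trees` are hypothetical, and the demo directories are created on the fly with placeholder contents. Release tarballs legitimately contain generated files, so a non-empty result is a prompt for review, not proof of tampering.

```python
import hashlib
import tempfile
from pathlib import Path

def file_digests(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }

def compare_trees(tarball_dir, git_dir):
    """Return (only_in_tarball, differing) as sorted lists of relative paths."""
    tar, git = file_digests(tarball_dir), file_digests(git_dir)
    only_in_tarball = sorted(set(tar) - set(git))
    differing = sorted(p for p in set(tar) & set(git) if tar[p] != git[p])
    return only_in_tarball, differing

# Demo with throwaway directories (placeholder contents, not real project files).
with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp, "git")
    tarball_dir = Path(tmp, "tarball")
    for root in (git_dir, tarball_dir):
        (root / "src").mkdir(parents=True)
        (root / "src" / "example.c").write_text("int main(void) { return 0; }\n")
    (tarball_dir / "m4").mkdir()
    (tarball_dir / "m4" / "build-to-host.m4").write_text("# tarball-only script\n")
    only_in_tarball, differing = compare_trees(tarball_dir, git_dir)

print(only_in_tarball)
print(differing)
```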

Specifically, the malicious payload was embedded in what appeared to be binary test files (the kind of opaque blobs that are normal in any project with a test suite). The build process was modified (via a tampered build-to-host.m4 file that only existed in the tarball) to decode these files at compile time and link the resulting payload into liblzma.
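The idea of a "test file" that only becomes meaningful after a small transformation can be shown in a few lines. This is a sketch of the general trick, assuming a simple byte swap followed by decompression. The specific bytes swapped, the names `disguise` and `recover`, and the sample data are illustrative and not taken from the real build script.

```python
import lzma

# Swap a handful of byte values; applying the same table twice undoes it.
SWAP = bytes.maketrans(b"\t -_", b" \t_-")

def disguise(data: bytes) -> bytes:
    """Compress data, then scramble a few byte values so it looks corrupt."""
    return lzma.compress(data).translate(SWAP)

def recover(blob: bytes) -> bytes:
    """Undo the byte swap, then decompress the now-valid stream."""
    return lzma.decompress(blob.translate(SWAP))

hidden = b"echo stage two\n" * 20
blob = disguise(hidden)

print(recover(blob) == hidden)
```

A file like `blob` sits comfortably in a test directory full of intentionally malformed inputs, which is part of why opaque binary fixtures deserve the same provenance scrutiny as source code.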

The payload itself leveraged the ifunc resolver mechanism (the same one Jia Tan had quietly disabled in the fuzzer months earlier). At runtime, when sshd started and loaded liblzma, the malicious resolver redirected calls to the RSA_public_decrypt function to the attacker’s code. The replacement implementation checked incoming SSH authentication attempts for a specific cryptographic signature. Connections signed with a particular Ed448 private key (held only by the attacker) would bypass normal authentication entirely and execute arbitrary code with the privileges of the SSH daemon. The vulnerability was assigned CVE-2024-3094 with a CVSS score of 10.0, the maximum possible.
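The shape of that check explains why nobody else noticed anything: the replacement only deviates when the input carries a valid proof from the attacker, and otherwise defers to the original function. The sketch below is a conceptual Python toy of that gate. It swaps in an HMAC for the Ed448 signature check (an important difference: with a real signature only the private key holder can trigger it, even if the binary is fully reverse engineered), and every name in it (`ATTACKER_KEY`, `original_check`, `make_hook`) is hypothetical.

```python
import hashlib
import hmac

ATTACKER_KEY = b"hypothetical-embedded-key"   # stand-in for the embedded public key

def original_check(credential: bytes) -> bool:
    """Stand-in for the legitimate authentication path."""
    return credential == b"valid-user-credential"

def make_hook(original, on_trigger):
    """Wrap `original` so it behaves normally unless a valid tag is presented."""
    def hooked(credential: bytes) -> bool:
        tag, _, command = credential.partition(b":")
        expected = hmac.new(ATTACKER_KEY, command, hashlib.sha256).hexdigest().encode()
        if command and hmac.compare_digest(tag, expected):
            on_trigger(command)          # attacker path: act on the payload
            return True
        return original(credential)      # everyone else: unchanged behaviour
    return hooked

triggered = []
check = make_hook(original_check, triggered.append)

normal_ok = check(b"valid-user-credential")
normal_bad = check(b"wrong-credential")
tag = hmac.new(ATTACKER_KEY, b"cmd", hashlib.sha256).hexdigest().encode()
gated = check(tag + b":cmd")

print(normal_ok, normal_bad, gated, triggered)
```

Because ordinary logins take the fall-through branch, functional testing and everyday use look identical with or without the hook.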

[Figure: XZ backdoor attack chain]

To understand what that actually means in practice: whoever held that Ed448 private key would have had silent, authenticated, root-level access to any server running a compromised distribution, with no credentials, no log entries, and no forensic trail. The compromised versions were tracking toward Debian stable, Ubuntu LTS, Fedora, and Red Hat Enterprise Linux (the overwhelming majority of production Linux infrastructure in the world), and at the time of discovery had only reached a handful of bleeding-edge distributions. Had detection happened a few weeks later, stable releases would have shipped the backdoored library to millions of production systems worldwide.

The most immediate use would have been exfiltration: access to classified government documents, military communications, intelligence agency databases, and diplomatic cables, none of it leaving any trace. Internal government and military networks are designed to be unreachable from the public internet, but they’re connected internally to servers that are reachable. The backdoor gets you the first foothold, and from there experienced operators can move laterally through internal networks, reaching systems that were never supposed to be exposed.

Perhaps more alarming than any single act of exfiltration is what a careful attacker would have done first: nothing obvious. Before touching any sensitive data, they would quietly install secondary backdoors on the most valuable targets so that even after the vulnerability was discovered and patched, access would be maintained indefinitely. SentinelOne’s analysis found evidence that Jia Tan had already been preparing exactly this kind of follow-on infrastructure, which suggests the XZ backdoor was likely intended as an opening move rather than an end goal.

How it was caught

Back to Andres Freund and his 500ms SSH latency.

Freund was benchmarking PostgreSQL on Debian sid (Debian’s rolling unstable branch, which, crucially, had already picked up XZ Utils 5.6.x). He traced the Valgrind errors to liblzma. That was odd, since there shouldn’t be any compression going on during an SSH login. He then compared the XZ release tarballs with the Git repository and found that they didn’t match.

On 29 March 2024, Freund published his findings on the oss-security mailing list hosted by the Openwall Project, having privately alerted distribution security teams the day before. He did a great reverse engineering job: he’d traced the entire attack chain, from the performance anomaly to the obfuscated binary blobs to the tampered build scripts to the final payload, and he didn’t publish until he understood exactly what he was looking at.

The key detail about Freund’s discovery is that it wasn’t a security audit that caught this. There was no formal review process that flagged the tampered tarballs. It was one engineer’s refusal to accept a minor performance regression as unexplained, combined with deep enough systems knowledge to follow the thread all the way down. The detection was an accident of engineering rigour, not a feature of any system designed to prevent it.

What this tells us

There are a few obvious lessons that will appear in every post-mortem about this incident: open source projects need more funding, more maintainers, more formal security review processes. All of that is true and worth saying.

But I think the more uncomfortable observation is this: the attack didn’t exploit a weakness in the code, it exploited a weakness in how open source software is actually built and maintained. The internet’s foundational infrastructure (the compression libraries, the authentication protocols, the package managers) is maintained largely by individuals operating without institutional support, driven by intrinsic motivation and a sense of responsibility to a community that mostly doesn’t know they exist.

And the trend is not moving in the right direction. The number of critical vulnerabilities has been climbing steadily, with a sharp acceleration in the last couple of months:

[Figure: Number of high impact CVEs]

Part of that increase reflects better tooling and more researchers looking, but a significant part reflects an increase in the attack surface: there is just more software being written every day.

What concerns me more is where this is heading. AI coding tools (Copilot, Cursor, Claude, Codex, etc.) are dramatically accelerating how much code gets written and contributed. Open source repositories are already seeing a surge in AI-assisted pull requests, issues, and patches. Most of them are legitimate. But what made Jia Tan’s operation so effective, a long and patient stream of plausible, helpful contributions backed by supporting personas, is exactly what AI makes dramatically cheaper and more scalable. A two-and-a-half-year manual infiltration campaign run by what was likely a well-resourced nation-state actor could, in principle, be replicated at far lower cost and at far greater scale with the right tooling.

I don’t think AI tools are net-negative for security: they’re also being used to find vulnerabilities, fuzz-test codebases, and flag anomalous contributions. At the same time, they make socially engineered supply chain attacks like the one explored above easier to perform.

References

Andres Freund — Original disclosure to the oss-security mailing list (March 29, 2024)
XZ Utils mailing list archive — xz-devel@tukaani.org
- Jia Tan’s first XZ contribution — the .editorconfig patch (October 2021)
- Jia Tan’s April 2022 patch that triggered the pressure campaign
- Jigar Kumar’s first complaint on the same thread (April 2022)
- Dennis Ens’s opening message asking if XZ for Java was maintained (May 2022)
- Lasse Collin’s reply disclosing his mental health struggles and mentioning Jia Tan (June 8, 2022)
- Dennis Ens’s response weaponising that disclosure (June 2022)
- Lasse Collin’s reply calling Jia Tan “practically a co-maintainer already” (June 29, 2022)
JiaT75’s first libarchive PR (October 2021)
Russ Cox — Timeline of the xz open source attack — the most comprehensive chronological reconstruction available
Evan Boehs — Everything I Know About the XZ Backdoor — an early and detailed community writeup
Kaspersky / Securelist — Social engineering aspect of the XZ incident — deep analysis of the sock puppet operation
Akamai — XZ Utils Backdoor: Everything You Need to Know
The Intercept — The Other Players Who Helped (Almost) Make the World’s Biggest Backdoor Hack
Wikipedia — XZ Utils backdoor — useful for cross-referencing disputed details
SoftwareSeni — The XZ Utils Backdoor CVE-2024-3094 and the Multi-Year Social Engineering Campaign Behind It
Hunted Labs — A Complete Analysis of Jia Tan’s GitHub History
Fortune — After a failed Linux backdoor attempt, open-source leaders warn of more attacks