The myth of tainted blockchain addresses [part I]

[Full disclosure: This blogger is not an attorney and what follows is not legal advice.]

Unsolicited gifts on chain

In the aftermath of the OFAC sanctions against Tornado Cash, it has become an article of faith in the cryptocurrency community that banning blockchain addresses sets a dangerous precedent. Some have argued that blacklisting Tornado addresses and everyone who interacts with them will have dangerous downstream effects due to the interconnectedness of networks. Funds flow from one address to another, the argument goes, often merging with unrelated pools of capital before being split off again. Once we decide one pool of funds is tainted by virtue of being associated with a bad actor or event—a scam, rug-pull or garden-variety theft—that association propagates unchecked and continues to taint funds belonging to innocent bystanders who were not in any way involved with the original crime. As if to illustrate the point, shortly after US residents were banned from interacting with the Tornado mixer, some joker decided to use that very mixer to send unsolicited funds to prominent blockchain addresses. These were either addresses with unusually high balances (“whales” in industry parlance) or addresses previously tagged as belonging to celebrities or well-known cryptocurrency businesses such as exchanges. Here is an unsolicited “donation” sent to the Kraken cold-wallet through the Tornado mixer.

That raises the question: are these unwitting recipients also in violation of OFAC sanctions? Are all funds in those wallets now permanently tainted because of an inbound transaction, a transaction they neither asked for nor had any realistic means to prevent given the way blockchains operate? With few exceptions, anyone can send funds from their own address to any other address on most blockchains; the recipient cannot prevent this. Granted, there are a few special cases where the recipient can limit unwanted transfers. For example, Algorand requires the recipient to opt in to a specific ASA before they can receive assets of that type. But that does not in any way prevent free transfer of the native currency ALGO. Ethereum smart-contracts make it possible to act on incoming transfers and reject them based on sender identity. Of course, this assumes the recipients have a way to identify “bad” addresses. Often such labels are introduced only after the offending address has been active and transacting. Even if there were a 100% reliable way to flag and reject tainted transfers, requiring it would place an undue burden on every blockchain participant: each would have to implement expensive measures (including the use of smart-contracts and integration with live data feeds of addresses currently sanctioned by every regulator around the world) to defend against a hypothetical scenario that few will encounter.

Given that inbound transfers from blacklisted addresses cannot be prevented in any realistic setup, does that mean blacklisting Tornado Cash also incidentally blacklists all of these downstream recipients by association? While compelling on its face, this logic ignores the complexity of how distributed ledgers track balances and adopts one possible convention among many plausible ones for deciding how to track illicit funds in motion. This blog post will argue that there are equally valid conventions that make it easier to isolate funds associated with illicit activity and prevent this type of uncontrolled propagation of “taint” through the network. To make this case, we will start with the simple scenario where tainted funds are clearly isolated from legitimate financial activity and then work our way up to more complex situations where commingling of funds requires choosing a specific convention for separation.

Easy case: simple Bitcoin transactions

UTXO model

The Bitcoin network makes it easy to separate different pools of money within an address, because the blockchain organizes funds into distinct lumps of assets called “unspent transaction outputs” or UTXOs. The concept of an address “balance” does not exist natively in the Bitcoin ledger; it is a synthetic metric created by aggregating all UTXOs that share the same destination address.

Conceptually, one can speculate whether this was a consequence of the relentless focus on privacy Satoshi advocated. The Bitcoin whitepaper warns about the dangers of address reuse, urging participants to use each address only once. In this extreme model, it does not make sense to track balances over time, since each address appears on the blockchain only twice: first when funds are deposited at that address, temporarily creating a non-zero balance, and a second and final time when funds are withdrawn, after which point the balance will always be zero. This is not how most bitcoin wallets operate in reality. Address reuse is common and often necessary for improving operational controls around funds movement. Address whitelisting is a very common security feature used to restrict transfers to known, previously defined trusted destinations. That model can only scale if each participant has a handful of fixed blockchain addresses, so that all counterparties interacting with that person can record those entries in their whitelist of “safe” destinations.

For these reasons it is convenient to speak of an address “balance” as a single figure and draw charts depicting how that number varies over time. But it is important to remember that single number is a synthetic creation representing an aggregate over discrete, individual UTXOs. In fact the same balance may behave differently depending on the organization of its spend candidates. Consider these two addresses:

  • The first is composed of a thousand UTXOs, each worth 0.00001 BTC
  • The second is a single 0.01 BTC UTXO.

On paper, both addresses have a balance of “0.01 bitcoin.” In reality, the second address is far more useful for commercial activity. Recall that each bitcoin transaction pays a mining fee proportional to the size of the transaction in bytes—not proportional to the value transacted, the way most payment networks price transfers. Inputs typically account for the bulk of transaction size due to the presence of cryptographic signatures, even after accounting for the artificial discount introduced by segregated witness. That means scrounging together dozens of inputs is less efficient than supplying the same amount with a single large UTXO. In the extreme case of “dust outputs,” the mining fees required to include a UTXO as an input may exceed the amount of funds that input contributes. Including the UTXO would effectively be a net negative. Such a UTXO is economically unusable unless network fees decline.
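To put rough numbers on that, here is a small sketch of the economic check, assuming a fee rate quoted in satoshis per virtual byte and roughly 68 vbytes for a typical segwit (P2WPKH) input; both figures are illustrative assumptions rather than protocol constants:

// Sketch: is a UTXO even worth spending at the current fee rate?
// inputVbytes is an assumed size for one P2WPKH input (~68 vbytes).
function isEconomicallyUnusable(utxoValueSats, feeRateSatPerVbyte, inputVbytes = 68) {
  const costToSpend = feeRateSatPerVbyte * inputVbytes;
  // If the fee to include this input exceeds its value, spending it is a net negative.
  return utxoValueSats <= costToSpend;
}

// A 1,000-satoshi (0.00001 BTC) output at 20 sat/vbyte costs ~1,360 sats to spend.
console.log(isEconomicallyUnusable(1000, 20));    // true: dust at current fees
console.log(isEconomicallyUnusable(1000000, 20)); // false: the single 0.01 BTC UTXO is fine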

Isolating tainted funds

This organization of funds into such distinct lumps makes it easy to isolate unsolicited contributions. Every UTXO stands on its own. If funds are sent from a sanctioned actor, the result is a distinct UTXO that stands apart on the bitcoin ledger from all the other UTXOs sharing the recipient address. That UTXO should be considered tainted in its entirety, subject to asset freeze/seizure or whatever remedy the powers-that-be deem appropriate for the situation. Everything else is completely free of taint.

One way to implement this is for the wallet software to exclude such UTXOs when calculating available balances or picking candidate UTXOs to prepare a new outbound transaction. The wallet owner effectively acts as if that UTXO did not exist. This model extends naturally to downstream transfers. If the tainted UTXO is used as an input into a subsequent bitcoin transaction (perhaps because the wallet owner did not know it was tainted) it will end up creating another tainted UTXO while leaving everything else untouched.
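A sketch of what that exclusion could look like inside wallet software, assuming UTXOs are identified by their txid:vout outpoint and the wallet keeps a locally maintained set of flagged outpoints (both are assumptions about the wallet’s data model, not anything mandated by the protocol):

// Sketch: pick untainted UTXOs to fund a payment, largest-first.
function selectSpendableUtxos(utxos, taintedOutpoints, targetSats) {
  // Pretend the flagged UTXOs do not exist.
  const usable = utxos.filter(u => !taintedOutpoints.has(`${u.txid}:${u.vout}`));
  usable.sort((a, b) => b.valueSats - a.valueSats); // fewer, larger inputs keep fees down
  const selected = [];
  let total = 0;
  for (const utxo of usable) {
    if (total >= targetSats) break;
    selected.push(utxo);
    total += utxo.valueSats;
  }
  if (total < targetSats) throw new Error("insufficient untainted funds");
  return selected;
}

The same filter would apply when reporting the “available balance” to the user: sum over the usable set rather than over every UTXO at the address.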

Mixed bag: transactions with mismatched inputs/outputs

The previous section glossed over an important aspect of bitcoin transactions: they can have multiple inputs and outputs. The more general case is illustrated by transaction C here:

Transaction graph example from the Bitcoin wiki

Consider an example along the lines of transaction C, but using a different arrangement of inputs and outputs:
Inputs:

  • The first input is 5 BTC of clean funds
  • The second input is 1 BTC originating from a sanctioned address

Outputs:

  • First output is designated to receive 3 BTC
  • Second output receives 2.999 BTC (leaving 0.001 BTC for mining fees)

Due to the different amounts, there is no way to map inputs to outputs in a one-to-one relationship. There is a total of 1 tainted bitcoin on the input side that must somehow be passed through to the outputs. (There is of course the question of where the mining fees came from. It would be convenient to argue that those were paid out of the tainted funds, reducing the total. But here we will make the worst-case assumption: tainted funds must be propagated 100% to outputs, and mining fees are assumed to come out of the “clean” portion of inputs.)

A problem of convention

Clearly there are different ways taint can be allocated among the outputs within those constraints. Here are some examples:

  1. Allocate evenly, with 50% assigned to each output. This will result in partial taint of both outputs.
  2. Allocate 100% to the first output. That output is now partially tainted while the remaining output is clean. (More generally, we may need to allocate to multiple outputs until the entire tainted input is fully accounted for. If the first input of 5 BTC had been the one from a sanctioned address, it would have required both outputs to fully cover the amount.)
  3. Same as #2 but select the tainted outputs in a different order. The adjectives “first” and “second” refer to the transaction layout on the blockchain, where inputs and outputs are strictly ordered. But for the purposes of tracking tainted funds, we do not have to follow that order. Here are two other reasonable criteria (a sketch of the first appears after this list):
     • FIFO or first-in-first-out. Match inputs and outputs in order. In this case, since the first output can be paid entirely out of the first, clean input, it is considered clean. But the second output requires the additional tainted 1 BTC, so it is partially tainted.
     • Highest balance first. To reduce the number of tainted outputs, consume outputs in decreasing order of value until the tainted input is fully accounted for.
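To make the FIFO convention concrete, here is a minimal sketch that charges the mining fee to the clean portion (per the worst-case assumption above) and then streams inputs into outputs in order, recording how much taint lands on each output. The data shapes are illustrative:

// Sketch: FIFO allocation of taint from inputs to outputs.
// inputs:  [{ amount, tainted }]   outputs: [{ amount }]   (amounts in BTC)
function allocateTaintFifo(inputs, outputs, feeBtc) {
  const queue = inputs.map(i => ({ ...i }));   // working copy
  // Worst-case assumption from the text: fees come out of the clean portion first.
  let fee = feeBtc;
  for (const inp of queue) {
    if (!inp.tainted && fee > 0) {
      const used = Math.min(inp.amount, fee);
      inp.amount -= used;
      fee -= used;
    }
  }
  // Match inputs to outputs in order, recording any tainted overlap.
  return outputs.map(out => {
    let remaining = out.amount;
    let taint = 0;
    while (remaining > 1e-9 && queue.length > 0) {
      const inp = queue[0];
      const used = Math.min(inp.amount, remaining);
      if (inp.tainted) taint += used;
      inp.amount -= used;
      remaining -= used;
      if (inp.amount <= 1e-9) queue.shift();
    }
    return taint;
  });
}

// Example from the text: a clean 5 BTC input, a tainted 1 BTC input,
// outputs of 3 and 2.999 BTC with a 0.001 BTC fee.
// Result: [0, 1] — the first output is clean, the second absorbs the full tainted bitcoin.
console.log(allocateTaintFifo(
  [{ amount: 5, tainted: false }, { amount: 1, tainted: true }],
  [{ amount: 3 }, { amount: 2.999 }],
  0.001
));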

Regardless of which convention is adopted, one conclusion stands: No matter how one slices and dices the outputs, there are scenarios where some UTXO will be partially tainted, even if the starting state follows the all-or-none characterization. Previous remedies to quarantine or otherwise exclude that UTXO in its entirety from usable assets are no longer appropriate.

Instead of trying to solve for this special case in bitcoin, we look at how the comparable situation arises in Ethereum, which differs from Bitcoin in a crucial way: it does not have the concept of UTXO. Here the concept of “balance” is native to the blockchain and associated with every address. That means the neat separation of funds into discrete “clean” and “tainted” chunks cannot possibly work in Ethereum, forcing us to confront this problem of commingled funds in a broader context.

[continued]

CP

Remote attestation: from security feature to anticompetitive lock-in

Lessons from the first instant-messaging war

In the late 1990s and early 2000s instant messaging was all the rage. A tiny Israeli startup, Mirabilis, set the stage with ICQ, but IM quickly became a battleground for tech giants, running counter to the usual dot-com era mythology of small startups disrupting incumbents on the way to heady IPO valuations. AOL Instant Messenger had taken a commanding lead out of the gate while MSFT was living up to its reputation as “fast-follower” (or, put less charitably, tail-light chaser) with MSN Messenger. Google had yet to throw its hat into the ring with GChat. These IM networks were completely isolated: an AOL user could only communicate with other AOL users. As a result most users had to maintain multiple accounts to participate in different networks, each with its own barrage of notifications and task-bar icons.

While these companies were locked in what they viewed as a zero-sum game for marketshare, the benefits of interoperability to consumers were clear. In fact one software vendor even made a multi-network client called Trillian that effectively aggregated the protocols for all the different IM services. Standards for interoperability such as SIP and XMPP were still a ways off from becoming relevant; everyone invented their own client/server protocol for instant messaging and expected to provide both sides of the implementation from scratch. But there was a more basic reason why some IM services were resistant to adopting an open standard: it was not necessarily good for the bottom line. Interop is asymmetric: it helps the smaller challenger compete against the incumbent behemoth. If you are MSN Messenger trying to win customers away from AOL, it is a selling point if you can build an IM client that can exchange messages with both MSN and AOL customers. Presto: AOL users can switch to your application, keep in touch with their existing contacts and still become part of the MSN ecosystem. Granted, the same dynamics operate in the other direction: in principle AOL could have built an IM client that connected its customers with MSN users. But this is where existing market shares matter: AOL had more to lose by allowing such interoperability and opening itself up to competition with MSN than by keeping its users locked up in the walled garden.

Not surprisingly, then, these companies went out of their way to keep each IM service an island unto itself. Interestingly for tech giants, this skirmish was fought in code instead of the more common practice of lawyers exchanging nastygrams. AOL tried to prevent any client other than the official AIM client from connecting to its service. You would think this is an easy problem: after all, AOL controlled the software on both sides. It could ship a new IM client that includes a subtle, specific quirk when communicating with the IM server. AOL servers would in turn look for that quirk and reject any “rogue” clients missing it.

White lies for compatibility

This idea runs into several problems. A practical engineering constraint in the early 2000s was the lack of automatic software updates. AOL could ship a new client but in those Dark Ages of software delivery, “ship” meant uploading the new version to a website— itself a much-heralded improvement over the “shrink-wrap” model of burning software onto CDs and selling them in a retail store. There was no easy way to force-upgrade the entire customer base. If the server insisted on enforcing the new client fingerprint, it would have to turn away a large percentage of customers running legacy versions or make them jump through hoops to download the latest version— and who knows, maybe some of those customers would decide to switch to MSN in frustration. That problem is tractable and was ultimately solved with better software engineering. Windows Update and later Google Chrome made automatic software updates into a feature customers take for granted today. But there is a more fundamental problem with attempting to fingerprint clients: competitors can reverse-engineer the fingerprint and incorporate it into their own software.

This may sound vaguely nefarious, but software impersonating other pieces of software is quite common for compatibility reasons. Web browsers practically invented that game. Take a look at the user-agent string early versions of Internet Explorer sent to every website:

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; SLCC2; .NET CLR 2.0.50727; Media Center PC 6.0; .NET CLR 3.5.30729; .NET CLR 3.0.30729)

There is a lot of verbiage here but one word jumps out: Mozilla. That is the codename used by Netscape Navigator in its own user-agent strings. The true identity of this browser is buried in a parenthetical comment—“MSIE 6.0” for Internet Explorer 6— but why is IE trying to pretend to be Netscape? Because of compatibility. Web pages were designed assuming a set of browser features that could not be taken for granted— such as support for images and JavaScript. Given the proliferation of different web browsers and versions at the time— see the earlier point about the lack of automatic updates— websites used a heuristic shortcut to determine if visitors were using an appropriate browser. Instead of trying to check for the availability of each feature, they began to check for a specific version of a known browser. “Mozilla/4.0” was a way to signal that the current web browser could be treated as if it were Netscape 4. Instead of turning away users with an unfriendly message along the lines of “Please install Netscape 4 to use this site,” the service could assume all the requisite features were present and proceed as usual.

These white lies are ubiquitous on the web because compatibility is in everyone’s interest. Site publishers just want things to work. Amazon wants to sell books. With the exception of a few websites closely affiliated with a browser vendor, they do not care whether customers use Netscape, IE or Lynx to place their orders. There is no reason for websites to be skeptical about user-agent claims or run additional fingerprinting code to determine if a given web browser was really Netscape 4 or simply pretending to be. (Even if they wanted to, such fingerprinting would have been difficult; for example IE often aimed for bug-for-bug compatibility with Netscape, even when that meant diverging from official W3C standards.)

Software discrimination and the bottom line

For reasons noted above, the competitive dynamics of IM were fundamentally different from those of web browsers. Most of the business models built around IM assumed full control over the client stack. For example, MSN Messenger floated ideas of making money by displaying ads in the client. This model runs into problems when customers run a different but interoperable client: Messenger could connect to the AOL network— effectively using resources and generating costs for AOL— while displaying ads chosen by MSN, earning revenue for MSFT.

Not surprisingly this resulted in an escalating arms-race. AOL included increasingly subtle features in AIM that its servers could use for fingerprinting. MSFT attempted to reverse-engineer that functionality out of the latest AIM client and incorporate identical behavior into MSN Messenger. It helped that the PC platform was, and to a large extent still is, very much open to tinkering. Owners can inspect binaries running on their machine, monitor network communications originating from a process, or attach a debugger to a running application to understand exactly what that app is doing under specific circumstances. (Intel SGX is an example of a recent hardware development on x86 that breaks that assumption. It allows code to run inside protected “enclaves” shielded from any debugging/inspection capabilities of an outside observer.)

In no small measure of irony, the Messenger team voluntarily threw in the towel on interoperability when AOL escalated the arms-race to a point MSFT was unwilling to follow: AOL deliberately included a remote code execution vulnerability in AIM, intended for its own servers to exploit. Whenever a client connected, the server would exploit the vulnerability to execute arbitrary code, look around the process and check on the identity of the application. Today such a bug would earn a critical severity rating and an associated CVE if it were discovered in an IM client. (Consider that in the 1990s most Internet traffic was not encrypted, so it would have been much easier to exploit that bug; the AIM client had very little assurance that it was communicating with the legitimate AOL servers.) If it were alleged that a software publisher deliberately inserted such a bug into an application used by millions of people, it would be all over the news and possibly result in the responsible executives being dragged in front of Congress for a ritual public flogging. In the 1990s it was business-as-usual.

Trusted computing and the dream of remote attestation

While the MSN Messenger team may have voluntarily hoisted the white flag in that particular battle with AOL, a far more powerful department within the company was working to make AOL’s wishes come true: a reliable solution for verifying the authenticity of software running on a remote peer, preferably without playing a game of chicken with deliberately introduced security vulnerabilities. This was the Trusted Computing initiative, later associated with the anodyne but awkward acronym NGSCB (“Next Generation Secure Computing Base”) though better remembered by its codename “Palladium.”

The linchpin of this initiative was a new hardware component called the “Trusted Platform Module,” meant to be included as an additional chip on the motherboard. The TPM was an early example of a system-on-a-chip or SoC: it had its own memory, processor and persistent storage, all independent of the PC. That independence meant the TPM could function as a separate root of trust. Even if malware compromises the primary operating system and gets to run arbitrary code in kernel mode— the highest privilege level possible—it still cannot tamper with the TPM or alter security logic embedded in that chip.

Measured boot

While the TPM specification defined a kitchen sink of functionality ranging from key management (generating and storing keys on the TPM in non-extractable fashion) to serving as a generic cryptographic co-processor, one feature stood out for securing the integrity of the operating system during the boot process: the notion of measured boot. At a high level, the TPM maintains a set of values in RAM dubbed “platform configuration registers” or PCRs. When the TPM is started, these all start out at zero. What distinguishes PCRs is the way they are updated. It is not possible to write an arbitrary value into a PCR. Instead the existing value is combined with the new input and run through a cryptographic hash function such as SHA1; this is called “extending” the PCR in TCG terminology. Similarly it is not possible to reset the values back to zero, short of restarting the TPM chip, which only happens when the machine itself is power-cycled. In this way the final PCR value becomes a concise record of all the inputs that were processed through that PCR. Any slight change to any of the inputs, or even a change in the order of inputs, results in a completely different value with no discernible relationship to the original.
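As a rough sketch of the extend operation, using SHA-256 and Node’s crypto module purely for illustration (real TPMs implement this in hardware, across multiple hash banks):

const crypto = require('crypto');

// Extending a PCR: the new value is the hash of (old value || new measurement).
// There is no way to write a value directly, and no way to "un-extend."
function extendPcr(currentPcr, measurement) {
  return crypto.createHash('sha256')
               .update(Buffer.concat([currentPcr, measurement]))
               .digest();
}

// PCRs start out as all zeros when the TPM is reset.
const zero = Buffer.alloc(32);
const a = crypto.createHash('sha256').update('component A').digest();
const b = crypto.createHash('sha256').update('component B').digest();

// The same inputs in a different order yield completely unrelated final values.
console.log(extendPcr(extendPcr(zero, a), b).toString('hex'));
console.log(extendPcr(extendPcr(zero, b), a).toString('hex'));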

This enables what TCG called the “measurement of trust” during the boot process, by updating the PCR with measurements of all code executed. For example, the initial BIOS code that takes control when a machine is first powered on updates PCR #0 with a hash of its own binary. Before passing control to the boot sector on disk, it records the hash of that sector in a different PCR. Similarly the early-stage boot loader first computes a cryptographic hash of the OS boot-loader and updates a PCR with that value, before executing the next stage. In this way, a chain of trust is created for the entire boot process with every link in the chain except the very first one recorded in some PCR before that link is allowed to execute. (Note the measurement must be performed by the predecessor. Otherwise a malicious boot-loader could update the PCR with a bogus hash instead of its own. Components are not allowed to self-certify their code; it must be an earlier piece of code that performs the PCR update before passing control.)

TCG specifications define the conventions for which components are measured into which PCRs. These differ between legacy BIOS and the newer UEFI specifications. Suffice it to say that by the time a modern operating system boots, close to a dozen PCRs will have been extended with a record of the different components involved in the boot.

So what can be done with this cryptographic record of the boot process? While these values look random, they are entirely deterministic. If the exact same system is powered on on two different occasions, identical PCR values will result. For that matter, if two different machines have the exact same installation— same firmware, same version of the operating system, same applications installed— it is expected that their PCRs will be identical. These examples hint at two immediate security applications:

  • Comparison over time: verify that a system is still in the same known-good state it was at a given point in the past. For example we can record the state of PCRs after a server is initially provisioned and before it is deployed into production. By comparing those measurements against the current state, it is possible to detect if critical software has been tampered with.
  • Comparison against a reference image: Instead of looking at the same machine over time, we can also compare different machines in a data-center. If we have PCR measurements for a known-good “reference image,” any server in healthy state is expected to have the same measurements in the running configuration.

Interestingly, neither scenario requires knowing what the PCR values should be ahead of time, or even the exact details of how PCRs are extended. We are only interested in deltas between two sets of measurements. But since PCRs are deterministic, for a given set of binaries involved in a boot process we can also predict ahead of time exactly what PCR values should result. That enables a different use-case where the exact values matter: ascertaining whether a remote system is running a particular configuration.

Getting better at discrimination

Consider the problem of distinguishing a machine running Windows from one running Linux. These operating systems use different boot-loaders, and the hash of the boot-loader gets captured into a specific PCR during measured boot. The value of that PCR will now act as a signal of which operating system was booted. Recall that each step in the boot-chain is responsible for measuring the next link; a Windows boot-loader will not pass control to a Linux kernel image.

This means PCR values can be used to prove to a remote system that you are running Windows, or even running it in a particular configuration. There is one more feature required for this: a way to authenticate those PCRs. If clients were allowed to self-certify their own PCR measurements, a Linux machine could masquerade as a Windows box by reporting the “correct” PCR values expected after a Windows boot. The missing piece is called “quoting” in TPM terminology. Each TPM can digitally sign its PCR measurements with a private key bound to that TPM. This is called the attestation key, and it is used only to sign structures generated inside the TPM, such as these quotes. (The other use case is certifying that some key-pair was generated on the TPM, by signing a structure containing the public key.) This prevents the owner from forging bogus quotes by asking the TPM to sign arbitrary messages.

This shifts the problem into a different plane: verifying the provenance of the “alleged” attestation, namely that it really belongs to a TPM. After all anyone can generate a key-pair and sign a bunch of PCR measurements with a worthless key. This is where the protocols get complicated and kludgy, partly because TCG tried hard to placate privacy advocates. If every TPM had a unique, global AK for signing quotes, that key could be used as a global identifier for the device. The TPM2 specification instead creates a level of indirection: there is an endorsement key (EK) and an associated X509 certificate baked into the TPM at manufacture time. But the EK is not used to sign quotes directly; instead users generate one or more attestation keys and prove that a specific AK lives on the same TPM as the EK, using a challenge-response protocol. That links the AK to a chain of trust anchored in the manufacturer via the X509 certificate.
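To make the verifier’s job concrete, here is a highly simplified sketch of the checks involved, with the TPM wire format deliberately ignored: real quotes are TPM-defined TPMS_ATTEST structures, the PCR selection encoding is elided, and the EK-to-AK binding is assumed to have been verified separately. Every name below is illustrative, not part of any TPM API.

const crypto = require('crypto');

// Model a quote as a plain object and check three things: the signature by the
// attestation key, the freshness nonce, and the expected PCR digest.
function verifyQuote(akPublicKeyPem, quote, signature, expectedPcrs, expectedNonce) {
  const signedBlob = Buffer.from(JSON.stringify(quote));        // stand-in for TPM marshaling
  const sigOk = crypto.verify('sha256', signedBlob, akPublicKeyPem, signature);
  const pcrDigest = crypto.createHash('sha256')
                          .update(Buffer.concat(expectedPcrs))  // digest over expected PCR values
                          .digest('hex');
  return sigOk
      && quote.nonce === expectedNonce      // prevents replay of an old quote
      && quote.pcrDigest === pcrDigest;     // machine booted the expected software stack
}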

The resulting end-to-end protocol provides a higher level of assurance than is possible with software-only approaches such as “health agents.” Health agents are typically pieces of software running inside the operating system that perform various checks (the latest software updates have been applied, the firewall is enabled, there are no unexpected listening ports and so on) and report the results. The problem is that those applications rely on the OS for their security. A privileged attacker with administrator rights can easily subvert the agent by feeding it bogus observations or forging a report. Boot measurements, on the other hand, are implemented by firmware and the TPM, outside the operating system and safe against interference by OS-level malware regardless of how far it has escalated its privileges.

On the Internet, no one knows you are running Linux?

The previous example underscores a troubling link between measured boot and platform lock-in. Internet applications are commonly defined in terms of a protocol. As long as both sides conform to the protocol, they can play. For example XMPP is an open instant-messaging standard that emerged after the IM wars of the 1990s. Any conformant XMPP client following this protocol can interface with an XMPP server written according to the same specifications. Of course there may be additional restrictions associated with each XMPP server—such as being able to authenticate as a valid user, or making payments out-of-band if the service requires them. Yet these conditions exist outside of the software implementation. There is no a priori reason an XMPP client running on Mac or Linux could not connect to the same service as long as the same conditions are fulfilled: the customer paid their bill and typed in the correct password.

With measured boot and remote attestation, it is possible for the service to unilaterally dictate new terms such as “you must be running Windows.” There is no provision in the XMPP spec today to convey PCR quotes, but nothing stops MSFT from building an extension to accommodate that. The kicker: that extension can be completely transparent and openly documented. There is no need to rely on security through obscurity and hope no one reverse-engineers the divergence from XMPP. Even with full knowledge of the change, authors of XMPP clients for other operating systems are prevented from creating interoperable clients.

No need to stop with the OS itself. While TCG specs reserve the first few PCRs for use during the boot process, there are many more available. In particular PCRs 8 through 15 are intended for the operating system itself to record other measurements it cares about. (Linux Integrity Measurement Architecture, or IMA, does exactly that.) For example the OS can reserve a PCR to measure all device drivers loaded, all installed applications or even the current choice of default web browser. Using Chrome instead of Internet Explorer? Access denied. Assuming attestation keys were set up in advance and the OS itself is in a trusted state, one can provide reliable proof of any of these criteria to a remote service and create a walled garden that only admits consumers running approved software.

The line between security feature and platform lock-in

Granted, none of the scenarios described above have come to pass yet— at least not in the context of general-purpose personal computers. Chromebooks come closest with their own notion of remote verification and attempts to create walled gardens that limit access to applications running on a Chromebook. Smart-phones are a different story: starting with the iPhone, they were pitched as closed, blackbox appliances where owners had little hope of tinkering. De facto platform lock-in due to “iOS only” availability of applications is very common for services designed with mobile use in mind. This is the default state of affairs even when the service provider is not making any deliberate attempt to exclude other platforms or use anything heavyweight along the lines of remote attestation.

This raises the question: is there anything wrong with a service provider restricting access based on implementation? The answer depends on the context.

Consider the following examples:

  1. Enterprise case. An IT department wants to enforce that employees only connect to the VPN from a company-issued device (not their own personal laptop)
  2. Historic instant messaging example. AOL wants to limit access to its IM service to users running the official AIM client (not a compatible open-source clone or the MSN Messenger client published by MSFT)
  3. Leveraging online services to achieve browser monopoly. Google launches a new service and wants to restrict access only to consumers running Google Chrome as their choice of web-browser

It is difficult to argue with the first one. The company has identified sensitive resources— it could be customer PII, health records, financial information etc.— and is trying to implement reasonable access controls around that system. Given that company-issued devices are often configured to higher security standards than personal devices, it seems entirely reasonable to mandate that access to these sensitive systems only take place from the more trustworthy devices. Remote attestation is a good solution here: it proves that the access is originating from a device in a known configuration. In fact PCR quotes are not the only way to get this effect; there are other ways to leverage the TPM to similar ends. For example, the TPM specification allows generating key-pairs with an attached policy saying the key is usable only when the PCRs are in a specific state. Using such a key as the credential for connecting to the VPN provides an indirect way to verify the state of the device. Suppose employees are expected to be running a particular Linux distribution on their laptop. If they boot that OS, the PCR measurements will be correct and the key will work. If they install Windows on their system and boot that, the PCR measurements will be different and their VPN key will not work. (Caveat: this is glossing over some additional risks. In a more realistic setting, we have to make sure VPN state cannot be exported to another device after authentication, or for that matter that a random Windows box cannot SSH into the legitimate Linux machine and use its TPM keys for impersonation.)

By comparison, the second case is motivated by strategic considerations. AOL deemed interoperability between IM clients a threat to its business interests. That is not an unreasonable view: interop gives challengers in the market a leg up against entrenched incumbents by lowering switching costs. At the time AOL was the clear leader, far outpacing MSN and similar competitors in number of subscribers. The point is that AOL was not acting to protect its customers’ privacy or save them from harm; AOL was only trying to protect the AOL bottom line. Since IM is offered as a free service, the only potential sources of revenue are:

  • Advertising
  • Selling data obtained by surveilling users
  • Other applications installed with the client

The first one requires absolute control over the client. If an MSN Messenger user connects to the AOL network, that client will be displaying ads selected by Microsoft, not AOL. In principle the second option still works as long as the customer is using AIM: every message sent is readable by AOL, along with metadata such as usage frequency and the IP addresses used to access the service. But a native client can collect far more information by tapping into the local system: hardware profile, other applications installed, even browsing history, depending on how unscrupulous the vendor is. (Given that AOL deliberately planted a critical vulnerability, there is no reason to expect it would stop shy of mining browsing history.) The last option also requires full control over the client. For example, if Adobe were to offer AOL 1¢ for distributing Flash with every install of AIM, AOL could only collect this revenue from users installing the official AIM client, not interoperable ones that do not come with Flash bundled. In all cases AOL stood to lose money if people could access the IM service without running the official AOL client.

The final hypothetical is a textbook example of leveraging monopoly in one business—online search for Google— to gain market share in another “adjacent” vertical by artificially bundling two products. That exact pattern of behavior was at the heart of the DOJ antitrust lawsuit against MSFT in the late 1990s, alleging that the company illegally used its Windows monopoly to handicap Netscape Navigator and gain an unfair advantage in market share for Internet Explorer. Except that by comparison the Google example is even more stark. While it was not a popular argument, some rallied to MSFT’s defense by pointing out that the contours of an “operating system” are not fixed and web browsers may one day be seen as an integral component, no different than TCP/IP networking. (In a delightful irony, Google itself proved this point later by grafting a lobotomized Linux distribution around the Chrome web-browser to create ChromeOS. This was an inversion of the usual hierarchy: instead of being yet another application included with the OS, the browser is now the main attraction that happens to include an operating system as a bonus.) There is no such case to be made for creating a dependency between search engines in the cloud and the web browsers used to access them. If Google resorted to using technologies such as measured boot to enforce that interdependency— and in fairness, it has not; this remains a hypothetical at the time of writing— the company would be adding to a long rap-sheet of anticompetitive behavior that has placed it in the crosshairs of regulators on both sides of the Atlantic.

CP

An exchange is a mixer, or why few people need Tornado Cash

The OFAC sanctions against the Ethereum mixer Tornado Cash have been widely panned by the cryptocurrency community as an attack on financial privacy. This line of argument claims that Tornado has legitimate uses (never mind that its actual usage appears to be largely laundering the proceeds of criminal activity) for consumers looking to hide their on-chain transactions from prying eyes. The problem with this argument is that the alleged target audience already has access to mixers that work just as well as Tornado Cash for most scenarios and happen to be a lot easier to use. Every major cryptocurrency exchange naturally functions as a mixer— and for the vast majority of consumers, that is a far more logical way to improve their privacy on-chain compared to interacting with a smart-contract.

Lifecycle of a cryptocurrency trade

To better illustrate why a garden-variety exchange functions—inadvertently—as a mixer, let’s look at the lifecycle of a typical trade. Suppose Alice wants to sell 1 bitcoin under her own self-custody wallet for dollars and conversely Bob wants to buy 1 bitcoin for USD. Looking at the on-chain events corresponding to this trade:

  1. Alice sends her 1 bitcoin to the exchange. This is one of the unusual aspects of trading cryptocurrency: there are no prime brokers involved and all trades must be prefunded by delivering the asset to the exchange ahead of time. This is an on-chain transaction, with the bitcoin moving from Alice’s wallet to a new address controlled by the exchange.
  2. Similarly Bob must deliver his funds in fiat, via ACH or wire transfers.
  3. Alice and Bob place orders on the exchange order book. The matching engine pairs those trades and executes the order. This takes place entirely off-chain, only updating the internal balances assigned to each customer.
  4. Bob withdraws the proceeds of the trade. This is an on-chain transaction with 1 bitcoin moving from an exchange-controlled address to one designated by Bob.
  5. Similarly Alice can withdraw her proceeds by requesting an ACH or wire transfer to her own bank account.

Omnibus wallet management

One important question is the relationship between the exchange addresses involved in steps #1 and #4. Alice must send her bitcoin to some address owned by the exchange. In theory an exchange could use the same address to receive funds from all customers. But this would make it very difficult to attribute incoming funds. Recall that an exchange may be receiving deposits from hundreds of customers originating from any number of bitcoin addresses at any given moment, and each of those transactions must be attributed to the right customer account. A standard bitcoin transaction does not have a “memo” field where Alice could indicate that a particular deposit was intended for her account. (Strictly speaking, it is possible to inject extra data into signature scripts. However that advanced capability is not widely supported by most wallet applications and in any case would require everyone to agree on conventions for conveying sender information, not just for Bitcoin but for every other blockchain.)

This is where the concept of dedicated deposit addresses comes into play. Typically exchanges assign one or more unique addresses to each customer for deposits. Having distinct deposit addresses provides a clean solution to the attribution problem: any incoming funds to one of Alice’s deposit addresses will always be attributed to her and result in crediting her balance on the internal exchange ledger. This holds true regardless of where the deposit originated. For example, she could share her deposit address with a friend and the friend could send bitcoin payments directly to Alice’s address. Alice does not even have to alert the exchange that she is expecting a payment: any blockchain transfer to that address is automatically credited to Alice.

(Aside: similar attribution problems arise for fiat deposits. ACH attribution is relatively straightforward since it is initiated by the customer through the exchange UI; in other words, it is a “pull” approach. But wire transfers pose a problem since there is no such thing as a per-customer bank account. All wires are delivered to a single bank account associated with the exchange. Commonly this is solved by having customers provide wire IDs to match incoming wires to the sender.)

Incoming and outgoing

Where things get interesting is when Bob withdraws his newly purchased 1 bitcoin balance. While it is tempting to assume that 1 bitcoin must come from Alice’s original deposit address where she sent her funds, that is not necessarily the case. Most exchanges implement a commingled “omnibus” wallet where funds are not segregated per customer on-chain. When Alice executes a trade to sell her bitcoin to Bob, that transaction takes place entirely off-chain. The exchange makes an update to its own internal ledger, crediting and debiting entries in a database recording how much of each asset every customer owns. That trade is not reflected on-chain. Funds are not moved from an “Alice address” into a “Bob address” each time trades execute.

This is motivated by efficiency concerns: blockchains have limited bandwidth and moving funds on-chain costs money in the form of miner fees. Settling every trade on-chain by redistributing funds between addresses would be prohibitively expensive. Instead, the exchange maintains a single logical wallet that holds funds for all its customers. The allocation of funds among all these customers is not visible on chain; it is tracked on an internal database.
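Here is a minimal sketch of that internal ledger, assuming nothing more than an in-memory map of per-customer balances; the class and method names are hypothetical, and a production system would of course add persistence, double-entry bookkeeping and audit trails:

// Sketch: an exchange's internal ledger. Trades only touch this data structure;
// nothing here results in an on-chain transaction.
class OmnibusLedger {
  constructor() {
    this.balances = new Map();               // customer -> { assetSymbol: amount }
  }
  credit(customer, asset, amount) {
    const account = this.balances.get(customer) || {};
    account[asset] = (account[asset] || 0) + amount;
    this.balances.set(customer, account);
  }
  debit(customer, asset, amount) {
    const account = this.balances.get(customer) || {};
    if ((account[asset] || 0) < amount) throw new Error('insufficient balance');
    account[asset] -= amount;
    this.balances.set(customer, account);
  }
  // A matched trade simply moves numbers between rows in the database.
  settleTrade(seller, buyer, baseAsset, baseAmount, quoteAsset, quoteAmount) {
    this.debit(seller, baseAsset, baseAmount);
    this.credit(buyer, baseAsset, baseAmount);
    this.debit(buyer, quoteAsset, quoteAmount);
    this.credit(seller, quoteAsset, quoteAmount);
  }
}

const ledger = new OmnibusLedger();
ledger.credit('alice', 'BTC', 1);            // step 1: on-chain deposit observed
ledger.credit('bob', 'USD', 30000);          // step 2: fiat deposit via ACH/wire
ledger.settleTrade('alice', 'bob', 'BTC', 1, 'USD', 30000);   // step 3: entirely off-chain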

A corollary of this is that when a customer requests to withdraw their cryptocurrency, that withdrawal can originate from any address in the omnibus wallet. Exchange addresses are completely fungible. In the example above, while Bob “bought” his bitcoin from Alice—in the sense that his buy order executed against a corresponding sell order from Alice—there is no guarantee that his withdrawal of proceeds will originate from Alice’s address. Depending on the blockchain involved, different strategies can be used to satisfy withdrawal requests in an economical manner. In the case of bitcoin, complex strategies are required to manage unspent transaction outputs (UTXOs) in an efficient manner. Among other reasons:

  • It is more efficient to supply a single 10BTC input to serve a 9BTC withdrawal, instead of assembling nine different inputs of one bitcoin each. (More inputs → larger transaction → higher fees)
  • Due to long confirmation times on bitcoin, exchanges will typically batch withdrawals. That is, if 9 customers each request a 1 bitcoin withdrawal, it is more economical to broadcast a single transaction with a 10 BTC input and 9 outputs each going to one customer, as opposed to nine distinct transactions with one input and output apiece. (A sketch of this batching follows the list.)
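A rough sketch of that batching, with illustrative data shapes: a single funding UTXO, a list of withdrawal requests and a change address, with the fee simplified to a flat amount deducted before change is returned to the exchange wallet.

// Sketch: serve many withdrawal requests from one large UTXO in a single transaction.
function buildBatchWithdrawal(fundingUtxo, requests, feeSats, changeAddress) {
  const totalOut = requests.reduce((sum, r) => sum + r.amountSats, 0);
  if (fundingUtxo.valueSats < totalOut + feeSats) {
    throw new Error('funding UTXO too small for this batch');
  }
  const outputs = requests.map(r => ({ address: r.address, valueSats: r.amountSats }));
  const changeSats = fundingUtxo.valueSats - totalOut - feeSats;
  if (changeSats > 0) {
    outputs.push({ address: changeAddress, valueSats: changeSats }); // back to the omnibus wallet
  }
  // One input, many outputs: far cheaper than nine separate transactions.
  return { inputs: [fundingUtxo], outputs };
}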

In short, there is no relationship between the original address where incoming funds arrive and the final address which appears as the sender of record when those funds are withdrawn after a trade.

Coin mixing by accident

This hypothetical example tracked the life cycle of a bitcoin going through a trade between Alice and Bob. But the same points about omnibus wallet management also apply to a single person. Consider this sequence of events:

  1. Alice deposits 1 bitcoin into the exchange
  2. At some future date she withdraws 1 bitcoin

While the first transaction goes into one of her unique deposit addresses, the second one could be coming out of any address in the exchange omnibus wallet. It looks indistinguishable from all other 1 bitcoin withdrawals occurring around the same time. As long as Alice uses a fresh destination address to withdraw, external observers cannot link the deposit and withdrawal actions. In effect the exchange “mixed” her coins by accepting bitcoin that was known to be associated with Alice and spitting out an identical amount of bitcoin that is not linked to the original source on-chain.

In other words, an exchange with an omnibus wallet also functions as a natural mixer.

Centralized vs decentralized mixers

How favorably that mixer compares to Tornado Cash depends on the threat model. The main selling points of Tornado Cash are trustless operation and open participation.

  • Tornado is implemented as a set of immutable smart-contracts on Ethereum. Those contracts are designed to perform one function and exactly one function: mix coins. There is no leeway in the logic. It cannot abscond with funds or even refuse to perform the designated function. There is no reliance on the honest behavior of a particular counterparty. This stands in stark contrast to using a centralized exchange— those venues have full custody over customer funds. There is no guarantee the exchange will return the funds after they have been deposited. It could experience a security breach resulting in theft of assets. Or it could deliberately choose to freeze customer assets in response to a court order. Those possibilities do not exist for a decentralized system such as Tornado.
  • Closely related is that privacy is provided by all other users taking advantage of the mixer around the same time. The more transactions going through Tornado, the better each transaction is shielded among the crowd. Crucially, there is no single trusted party able to deanonymize all users, regardless of how unpopular the usage. By contrast, a centralized exchange has full visibility into fund flows. It can “connect the dots” between incoming and outgoing transactions.
  • There are no restrictions on who can interact with the Tornado smart contract. Meanwhile centralized exchanges typically have an onboarding flow and may impose restrictions on sign-ups, such as only permitting customers from specific countries or requiring proof of identity to comply with Know-Your-Customer regulations.

Reconciling the threat model

Whether these theoretical advantages translate into a real difference for a given customer depends on the specific threat model. Here is a concrete example from CoinGecko defending legitimate uses of Tornado:

“For instance, a software employee paid in cryptocurrency and is unwilling to let their employer know much about their financial transactions can use Tornado Cash for payment. Also, an NFT artist who has recently made a killing and is not ready to draw online attention can use Tornado Cash to improve their on-chain privacy.”

CoinGecko article

The problem with these hypothetical examples is that they assume all financial transactions occur in the hermetically sealed ecosystem of cryptocurrency. In reality, very few commercial transactions can be conducted in cryptocurrency today—and those are primarily in Bitcoin using the Lightning Network, where Tornado is of exactly zero value since it operates on the unrelated Ethereum blockchain. The privacy-conscious software developer still needs an off-ramp from Ethereum to a fiat currency such as US dollars. That means an existing relationship with an exchange that allows trading digital assets for old-fashioned fiat. (While it is possible to trade ether for stablecoins such as Tether or USDC using permissionless decentralized exchanges, that still does not help. The landlord and the utility company expect to get paid in real fiat, not fiat equivalents.)

Looked at another way, the vast majority of cryptocurrency holders already have an existing relationship with an exchange because that is where they purchase and custody their cryptocurrency in the first place. For these investors, using one of those exchanges as a mixer to improve privacy is the path of least resistance. While there have been notable failures of exchanges resulting in loss of customer funds—FTX being a prominent example—it is worth noting that the counterparty exposure is much more limited for this usage pattern. Funds are routed through an exchange wallet temporarily, not custodied long term. There is a limited time window when the exchange holds the funds, until they are withdrawn in one or more transactions to new blockchain addresses that are disconnected from the original source. If anything, a major centralized exchange will afford more privacy from external observers due to its large customer base and ease of use, compared to the difficulty of interacting with Tornado contracts through web3 layers such as Metamask. While the customer has no privacy against the exchange, this is not the threat model under consideration: recall the above excerpt refers to a software developer trying to shield their transactions from the employer who pays their salary in cryptocurrency. That employer does not have any more visibility into what goes on inside the exchange than it has into, say, its employees’ personal ATM or credit-card transactions. (In an extra-paranoid threat model where we are concerned about, say, Coinbase ratting out its customers, one is always free to choose a different, more trustworthy exchange, or better yet, mix coins through a cascade of multiple exchanges, requiring collusion among all of them to link inputs and outputs.)

That leaves Tornado Cash as a preferred choice only for a niche group of users: those who are unable to onboard with any reputable exchange (because they are truly toxic customers, e.g. OFAC-sanctioned entities) or those operating under the combination of a truly tin-foil-hat threat model (“no centralized exchange can be trusted, they will all embezzle funds and disclose customer transactions willy-nilly…”) and an abiding belief that all necessary economic transactions can be conducted on a blockchain without ever requiring an off-ramp to fiat currencies.

CP

Immutable NFTs with plain HTTP

Ethereal content

One of the recurring problems with NFT digital art has been the volatility of storage. While the NFT recording ownership of the artwork lives on a blockchain such as Ethereum, the content itself—the actual image or video—is usually too large to keep on chain. Instead the NFT contains a URL reference pointing to the content. In the early days those were garden-variety web links. That made all kinds of shenanigans possible, some intended, others not:

  • Since websites can go away (because the domain is not renewed, for example) the NFT could disappear for good.
  • Alternatively the website could still be around but its contents can change. There is no rule that says some link such as https://example.com/MyNFT will always return the same content. The buyer of an NFT could find that the artwork they purchased has morphed. It could even be different based on the time of day or the person accessing the link. (This last example was demonstrated in a recent stunt arguing that Web3 is not decentralized at all, by returning a deliberately different image when the NFT is accessed through OpenSea.)

IPFS, Arweave and similar systems have been proposed as a solution to this problem. Instead of uploading NFTs to a website which may go out of business or start returning bogus content, they are instead stored on special distributed systems. In this blog post we will describe a proof-of-concept for approximating the same effect using vanilla HTTPS links.

Before diving into the implementation details, we need to distinguish between two different requirements behind the ambiguous goal of “persistence:”


  1. Immutability
  2. Censorship resistance

The first one states that the content does not change over time. If the image looked a certain way when you purchased the NFT, it will always look that way when you return to view it again. (Unless of course the NFT itself incorporates elements of randomness, such as an image rendered slightly differently each time. But even in that scenario, the algorithmic model for generating the image is constant.)

The second property states that the content is always accessible. If you were able to view the NFT once, you can do so again in the future. It will not disappear or become unavailable due to a system outage.

This distinction is important because each can be achieved independently of the other. Immutability alone may be sufficient for some use cases. In fact there is an argument to be made that #2 is not a desirable requirement in the absolute sense. Most would agree that beheading videos, CSAM or even copyrighted content should be taken down even if they were minted as an NFT.

To that end we focus on the first objective only: create an NFT that is immutable. There is no assurance that the NFT will be accessible at all times, or that it cannot be permanently taken down if enough people agree. But we can guarantee that if you can view the NFT, it will always be this particular image or that particular movie.

Subresource Integrity

At first it looks like there is already a web standard that solves this problem out of the box: subresource integrity, or SRI for short. With SRI one can link to content such as a Javascript library or a stylesheet hosted by an untrusted third-party. If that third-party attempts to tamper with the appearance and functionality of your website by serving an altered version of the content—for example a back-doored version of the Javascript library that logs keystrokes and steals passwords—it will be detected and blocked from loading. Note that SRI does not guarantee availability: that website may still have an outage or it may outright refuse to serve any content. Both of those events will still interfere with the functioning of the page; but at least the originating site can detect this condition and display an error. From a security perspective that is a major improvement over continuing to execute logic that has been corrupted (undetected) by a third-party.

Limitations & caveats

While the solution sketched here is based on SRI, there are two problems that preclude a straightforward application:

  • SRI only works inside HTML documents.
  • SRI only applies to link and script elements. Strictly speaking this is not a limitation of the specification, but the practical reality of the extent to which most web-browsers have implemented the spec.

To make the first limitation more concrete, this is how a website would include a snippet of JS hosted by a third-party:

<script src="https://example.com/third-party-library.js"
        integrity="sha256-xzKeRPLnOjN6inNfYWKfDt4RIa7mMhQhOlafengSDvU="
        crossorigin="anonymous">
</script>

That second attribute is SRI at work. By specifying the expected SHA256 hash of the Javascript code to be included in this page, we are preventing the third-party from serving any other code. Even the slightest alteration to the script returned will be flagged as an error and prevent the code from executing.
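For reference, the integrity value is simply the base64 encoding of the raw SHA-256 digest of the file, prefixed with the algorithm name. A small Node.js sketch computes it (the filename is a placeholder):

const crypto = require('crypto');
const fs = require('fs');

// Compute an SRI integrity attribute for a local copy of the third-party script.
const body = fs.readFileSync('third-party-library.js');           // placeholder filename
const digest = crypto.createHash('sha256').update(body).digest('base64');
console.log(`integrity="sha256-${digest}"`);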

It is tempting to conclude that this one trick is sufficient to create an immutable NFT (according to the modest definition above) but there are two problems.

1. There is no “short-hand” version of SRI that encodes this integrity check in the URL itself. In an ideal world one could craft a third-party link along the lines of:

https://example.com/code.js#/script[@integrity='sha256-xzKeRPLnOjN6inNfYWKfDt4RIa7mMhQhOlafengSDvU=']

This (entirely hypothetical) version is borrowing syntax from XPath, combining URIs with an XML-style query language to “search” for an element that meets a particular criteria, in this case having a given SHA256 hash. But as of this writing, there is no web standard for incorporating integrity checks into the URI this way. (The closest is an RFC for hash-links.) For now we have to content ourselves with specifying the integrity as an out-of-band HTML attribute of the element.

2. As a matter of browser implementations, SRI is only applied to specific types of content: notably, javascript and stylesheets. This is consistent across Chrome, Firefox and Edge. Neither images nor iframes are covered. That means even if we could somehow solve the first problem, we cannot link to an “immutable” image by using an ordinary HTML image tag.

Emulating SRI for images

Working around both of these limitations requires a more complicated solution, where the document is built up in stages. While it is not possible to make a plain HTTPS URL immutable due to limitation #1 in SRI, there is one scheme that supports immutability by default. In fact all URLs of this type are always immutable. This is the “data” scheme, where the content is inlined; it is in the URL itself. Since no content is retrieved from an external server, this is immutable by definition. Data URLs can encode an HTML document, which serves as our starting point or stage #1. The URL associated with the NFT on-chain will have this form.

In theory we could encode an entire HTML document, complete with embedded images, this way. But that runs into a more mundane problem: blockchain space is expensive and the NFT URL lives on chain. That calls for minimizing the amount of data stored within the smart-contract, using only the minimal amount of HTML to bootstrap the intended content. In our case, the specific HTML document will follow a simple template:

<!DOCTYPE html>
<html>
  <head>
    <script src="https://example.com/stage2.js"
            integrity="sha256-xzKeRPLnOjN6inNfYWKfDt4RIa7mMhQhOlafengSDvU="
            crossorigin="anonymous">
    </script>
  </head>
</html>

This is just a way of invoking stage #2, which is a chunk of bootstrap JavaScript hosted on an external service and made immutable using SRI. If that hosting service decides to go rogue and start returning different content, the load will fail, leaving the user staring at a blank page. But the hosting service cannot successfully cause altered javascript to execute, because of the integrity check enforced by SRI.

Stage #2 itself is also simple. It is a way of invoking stage #3, where the actual content rendering occurs.

var contents = '… contents of stage #3 HTML document …';
document.write(contents);

This replaces the current document with new HTML from the string. The heavy lifting takes place after the third stage has loaded:

  • It will fetch additional javascript libraries, using SRI to guarantee that they cannot be tampered with.
  • In particular, we pull in an existing open-source library from 2017 to emulate SRI for images, since the NFT is an image. This polyfill library supports an alternative syntax for loading images, with the URL and expected SHA256 hash specified as proprietary HTML attributes.
  • Stage #3 also contains a reference to the actual NFT image. But this image is not loaded using the standard <img src="…"> syntax in HTML; that would not be covered by SRI due to the problem of browser support discussed above.
  • Instead, we wait until the document has rendered and kick off a custom script that invokes the JS library to do a controlled image load, comparing the content retrieved by XmlHttpRequest against the integrity check to make sure the server returned our expected NFT. (A simplified sketch of this check follows the list.)
  • If the server returned the correct image, it will be rendered. Otherwise a brusque modal dialog appears to inform the viewer that something is wrong.
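The essence of that controlled load can be sketched with modern browser APIs (fetch and WebCrypto) rather than the polyfill’s XmlHttpRequest plumbing; this is a simplified illustration of the idea, not the library’s actual code, and the function name is hypothetical:

// Fetch the image ourselves, hash the bytes, and only render it if the digest matches.
async function loadVerifiedImage(url, expectedSha256Base64, imgElement) {
  const response = await fetch(url);
  const bytes = await response.arrayBuffer();
  const digest = await crypto.subtle.digest('SHA-256', bytes);
  // Base64-encode the raw digest for comparison against the expected value.
  let binary = '';
  for (const b of new Uint8Array(digest)) binary += String.fromCharCode(b);
  const actual = btoa(binary);
  if (actual !== expectedSha256Base64) {
    alert('Integrity check failed: the server returned a different image');
    return;
  }
  // Render from the verified bytes, not from the original URL.
  imgElement.src = URL.createObjectURL(new Blob([bytes]));
}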

Putting it all together, here is a data URL encoding an immutable NFT:

data:text/html;charset=utf-8;base64,PCFET0NUWVBFIGh0bWw+CjxodG1sPgogIDxoZWFkPgogICAgPHNjcmlwdCBzcmM9Imh0dHBzOi8vd3d3LmlkZWVzZml4ZXMuaW8vaW1tdXRhYmxlL3N0YWdlMi5qcyIKCSAgICBpbnRlZ3JpdHk9InNoYTI1Ni1YSlF3UkFvZWtUa083eE85Y3ozaExrZFBDSzRxckJINDF5dlNSaXg4MmhVPSIKCSAgICBjcm9zc29yaWdpbj0iYW5vbnltb3VzIj4KICAgIDwvc2NyaXB0PgogIDwvaGVhZD4KPC9odG1sPgo=
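That string is produced by base64-encoding the stage #1 template and prepending the data-URL preamble. A Node.js sketch, where stage1.html is a placeholder for a file containing the template above:

const fs = require('fs');

// Turn the stage #1 bootstrap document into a self-contained, immutable data URL.
const html = fs.readFileSync('stage1.html');                      // placeholder filename
const dataUrl = 'data:text/html;charset=utf-8;base64,' + html.toString('base64');
console.log(dataUrl);   // this string is what gets stored on-chain as the NFT URL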

We can also embed it on other webpages (such as NFT marketplaces and galleries) using an iframe, as in this example:

Embedded NFT viewer

Chrome does not allow navigating the top-level document to a data URL, which requires indirection through the iframe. In this case the viewer itself must be trusted, since it can cheat by pointing the iframe at a bogus URL instead of the correct data URL printed above. But such corruption is only “local”: other honest viewers will continue to enforce the integrity check.

What happens if the server hosting the image were to replace our hypothetical motorcycle NFT by a different picture?

Linking to the image with a plain HTTPS URL will display the corrupted NFT:

But going through the immutable URL above will detect the tampering attempt and not render the image:

CP