From TPM quotes to QR codes: surfacing boot measurements

The Trusted Platform Module (TPM) plays a critical role in measured boot: verifying the state of a machine after it has booted into the operating system. This is accomplished by a chain of trust rooted in the firmware, with each link in the chain recording measurements of the next link in the TPM before passing control to its successor. For example, the firmware measures option ROMs before executing each one, then the master-boot record (for legacy BIOS) or GPT configuration (for EFI) before handing control over to the initial loader such as the shim for Linux, which in turn measures the OS boot-loader grub. These measurements are recorded in a series of platform-configuration registers or PCRs of the TPM. These registers have the unusual property that consecutive measurements can only be accumulated; they can not be overwritten or cleared until a reboot. It is not possible for a malicious component coming later in the boot chain to “undo” a previous measurement or replace it by a bogus value. In TPM terminology one speaks of extending a PCR instead of writing to the PCR, since the result is a function of both the existing value present in the PCR— the history of all previous measurements accumulated so far— and the current value being recorded.
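The extend operation itself is easy to model. Here is a minimal Python sketch of the idea, assuming a SHA-256 PCR bank; the real computation happens inside the TPM, so this is purely illustrative:

```python
import hashlib

def extend(pcr_value: bytes, measurement: bytes) -> bytes:
    """Model of a TPM2 PCR extend on a SHA-256 bank:
    new PCR = H(old PCR || digest of the measured object)."""
    digest = hashlib.sha256(measurement).digest()
    return hashlib.sha256(pcr_value + digest).digest()

# Most PCRs reset to all zeroes; each boot component is folded in
# sequentially, so the final value depends on the entire history of
# measurements and the order in which they were recorded.
pcr = bytes(32)
for component in (b"firmware", b"option ROM", b"shim", b"grub"):
    pcr = extend(pcr, component)
print(pcr.hex())
```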

At the end of the boot process, we are left with a series of measurements across different PCRs. The question remains: what good are these measurements?

TPM specifications suggest a few ideas:

  • Remote attestation. Prove to a third-party that we booted our system in this state. TPMs have a notion of “quoting”— signing a statement about the current state of the PCRs, using an attestation key that is bound to that TPM. The statement also incorporates a challenge chosen by our remote peer, to prove freshness of the quote. (Otherwise we could have obtained the quote last week and then booted a different image today.) As a side note: remote attestation was one of the most controversial features of the TPM specification, because it allows remote peers to discriminate based on the software users are running.
  • Local binding of keys. The TPM specification has an extensive policy language for controlling when keys generated on a TPM can be used. In the basic case, it is possible to generate an RSA key that can only be used when a password is supplied. But more interestingly, key usage can be made conditional on the value of a persistent counter, the password associated with a different TPM object (this indirection allows changing the password on multiple keys all at once) or specific values of PCRs. Policies can also be combined using logical conjunctions and disjunctions.

PCR policies are the most promising feature for our purposes. They allow binding a cryptographic key to a specific state of the system. Unless the system is booted into this exact state— including the entire chain from firmware to kernel, depending on what is measured— that key is not usable. This is how disk-encryption schemes such as Bitlocker and equivalent DIY-implementations built on LUKS work: the TPM key encrypting the disk is bound to some set of PCRs. More precisely, the master key that is used to encrypt the disk is itself “wrapped” using a TPM key that is only accessible when PCRs are correct. The upshot of this design is that unless the boot process results in the exact same PCR measurements, disk contents can not be decrypted. (Strictly speaking, Bitlocker uses another way to achieve that binding. The TPM also allows defining persistent storage areas called “NVRAM indices.” In the same way usage policies can be set on PCRs, NVRAM indices can be associated with an access policy such that their contents are only readable if PCRs are in a given state.)

To see what threats are mitigated by this approach, imagine a hypothetical Bitlocker-like scheme where PCR bindings are not used and a TPM key exists that can decrypt the boot volume on a laptop without any policy restrictions. If that laptop is stolen and an adversary now has physical access to the machine, she can simply swap out the boot volume with a completely different physical disk that contains a Linux image. Since that image is fully controlled by the attacker, she can log in and run arbitrary code after boot. Of course that random image does not contain any data from the original victim disk, so there is nothing of value to be found immediately. But since the TPM is accessible from this second OS, the attacker can execute a series of commands asking the TPM to decrypt the wrapped key from the original volume. Absent PCR bindings, the TPM has no way to distinguish between the random Linux image that booted and the “correct” Windows image associated with that key.

The problem with PCR bindings

This security against unauthorized changes to the system comes at the cost of fragility: any change to PCR values will render TPM objects unusable, including changes that are “honest.” Firmware itself is typically measured into PCR0; if a TPM key is bound to that PCR, it will stop working after a firmware upgrade. In TPM2 parlance, we would say that it is no longer possible to satisfy the authorization policy associated with the object. (In fact, since firmware upgrades are often irreversible to prevent downgrading to vulnerable versions, that policy is likely never satisfiable again.) The extent of fragility depends on the selected PCRs and the frequency of expected changes. Firmware upgrades are infrequent, but they are increasingly integrated with OS software update mechanisms such as fwupd on Linux. On the other hand, the Linux Integrity Measurement Architecture or “IMA” feature measures key operating system binaries into PCR10. That measurement can change frequently with kernel and boot-loader upgrades. In fact, since IMA is configurable in what gets measured, it is possible to configure it to measure more components and register even minor OS configuration tweaks. There is an intrinsic tension between security and flexibility here: the more OS components are measured, the fewer opportunities are left for an attacker to backdoor the system unnoticed. But it also means fewer opportunities to modify that system, since any change to a measured component will brick keys bound to those measurements.

There are some work-arounds for dealing with this fragility. In some scenarios, one can deal with an unusable TPM key by using an out-of-band backup. For example, LUKS disk encryption supports multiple keys: in case the TPM key is unavailable, the user can still decrypt using a recovery key. Bitlocker also supports multiple keys but MSFT takes a more cautious approach, recommending that full-disk encryption be suspended prior to firmware updates. That strategy does not work when the TPM key is the only valid credential enabling a scenario. For example, when an SSH or VPN key is bound to PCRs and the PCRs change, those credentials need to be reissued.

Another work-around is using wildcard policies. TPM2 policy authorizations can express very complex statements. For example wildcard policies allow an object to be used as long as an external private-key signs a challenge from the TPM. Similarly policies can be combined using logical AND and OR operators, such that a key is usable either based on correct PCR values or a wildcard policy as fallback. In this model, decryption would normally use PCR bindings but in case the PCRs have changed, some other entity would inspect the new state and authorize use of the key if those PCRs look healthy.

Surfacing PCR measurements

In this proof-of-concept, we look at solving a slightly orthogonal problem: surfacing PCR measurements to the owner so that person can make a trust decision. That decision may involve providing the wildcard authorization or something more mundane, such as entering the backup passphrase to unlock their disk. More generally, PCR measurements on a system can act as a health check, concisely capturing critical state such as firmware version, secure-boot mode and boot-loader used. Users can then make a decision about whether they want to interact with this system based on these data points.

Of course simply displaying PCRs on screen will not work. A malicious system can simply report the expected healthy measurements while following a different boot sequence. Luckily TPMs already have a solution for this, called quoting. Closely related to remote attestation, a quote is a signed statement from the TPM that includes a selection of PCRs along with a challenge selected by the user. This data structure is signed using an attestation key, which in turn is related to the endorsement key provisioned on the TPM by its manufacturer. (The endorsement key comes with an endorsement certificate baked into the TPM to prove its provenance, but it can not be used to sign quotes directly. In an attempt to improve privacy, TCG specifications complicated life by requiring one level of indirection with attestation keys, along with an interactive challenge/response protocol to prove the relationship between EK and AK.) These signed quotes can be provided to users after boot or even during early boot stages as a data point for making trust decisions.

Proof-of-concept with QR codes

There are many ways to communicate TPM quotes to the owner of a machine: for example the system could display them as text on screen, write out the quote as a regular file on a removable volume such as a USB drive, or leverage any network interface such as Ethernet, Bluetooth or NFC to communicate them. QR codes have the advantage of simplicity in only requiring a display on the quoting side and a QR scanning app on the verification side. This makes it particularly suited to interactive scenarios where the machine owner is physically present to inspect its state. (For non-interactive scenarios such as servers sitting in a datacenter, transmitting quotes over the network would be a better option.)

As a first attempt, we can draw the QR code on the login background screen. This allows the device owner to check the state of the machine after it has booted but before the owner proceeds to enter their credentials. The same flow would apply for unlocking the screen after sleep/hibernation. The first step is to repoint the default login background image. Several tutorials describe customizing the login screen for Ubuntu by tweaking a specific stylesheet file. Alternatively there is the gdm-settings utility for a more point-and-click approach. The more elaborate part is configuring a task to redraw that image periodically. Specifically, we schedule a task to run on boot and every time the device comes out of sleep. This task will:

  1. Choose a suitable challenge to prove freshness. In this example, it retrieves the last block-hash from the Bitcoin blockchain. That value is updated every 10 minutes on average, can be independently confirmed by any verifier consulting the same blockchain and can not be predicted/controlled by the attacker. (For technical reasons, the block hash must be truncated down to 16 bytes, the maximum challenge size accepted by the TPM2 interface.)
  2. Generate a TPM quote using the previously generated attestation key. For simplicity, the PoC assumes that the AK has been made into a persistent TPM object to avoid having to load it into the TPM repeatedly.
  3. Combine the quote with additional metadata, the most important being the actual PCR measurements. The quote structure includes a hash of the included PCRs but not the raw measurements themselves. Recall that our objective is to surface measurements so the owner can make an informed decision, not to prove that they equal a previously known reference value. If the expected value were cast in stone and known ahead of time, one could instead use a PCR policy to permanently bind some TPM key to those measurements, at the cost of the fragility discussed earlier. (This PoC also includes the list of PCR indices involved in the measurement for easy parsing, but that is redundant as the signed quote structure already includes that.)
  4. Encode everything using base64 or another alphabet that most QR code applications can handle. In principle QR codes can encode binary data but not every scanner handles this case gracefully.
  5. Convert that text into a PNG file containing its QR representation and write this image out to the filesystem location where we previously configured Ubuntu to look for its background image. (A rough sketch of these steps follows.)
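Putting these steps together, here is a sketch of what the scheduled task might look like, driving the tpm2-tools CLI from Python and using the qrcode package for rendering. Flag names can vary across tpm2-tools versions, and the AK handle, PCR selection, block-explorer endpoint and output path are illustrative placeholders rather than values taken from the PoC:

```python
import base64, json, subprocess, urllib.request
import qrcode  # pip install qrcode[pil]

AK_HANDLE = "0x81010002"           # hypothetical persistent attestation key handle
PCRS = "sha256:0,2,4,7"            # PCR selection to quote
BACKGROUND = "/usr/share/backgrounds/tpm-quote.png"  # path the login screen points at

# 1. Freshness challenge: latest Bitcoin block hash, truncated to 16 bytes (32 hex chars).
tip = urllib.request.urlopen("https://blockstream.info/api/blocks/tip/hash").read().decode()
challenge = tip[:32]

# 2. Generate the quote with the persistent AK.
subprocess.run(["tpm2_quote", "-c", AK_HANDLE, "-l", PCRS, "-q", challenge,
                "-m", "/tmp/quote.msg", "-s", "/tmp/quote.sig", "-o", "/tmp/quote.pcrs"],
               check=True)

# 3 and 4. Bundle quote, signature and raw PCR values; base64-encode the binary fields.
def b64(path):
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode()

payload = json.dumps({"challenge": challenge, "pcr_list": PCRS,
                      "quote": b64("/tmp/quote.msg"), "sig": b64("/tmp/quote.sig"),
                      "pcrs": b64("/tmp/quote.pcrs")})

# 5. Render as a QR code at the location the login screen expects its background image.
qrcode.make(payload).save(BACKGROUND)
```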
TPM quote rendered as QR code on Ubuntu login screen

This QR code now contains sufficient information for the device owner to make a trust decision regarding the state of the system as observed by the TPM.

Corresponding verification steps would be:

  1. Decode QR image
  2. Parse the different fields and base64 decode to binary values
  3. Verify the quote structure using the public half of the TPM attestation key
  4. Concatenate the actual PCR measurements included in the QR code, hash the resulting sequence of bytes and verify that this hash is equal to the hash appearing inside the quote structure (see the sketch following this list). This step is necessary to establish that the “alleged” raw PCR measurements attached to the quote are, in fact, the values that went into that quote.
  5. Confirm that the validated PCR measurements represent a trustworthy state of the system.
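Here is a minimal sketch of steps 2 through 4, assuming the payload format from the generation sketch above. The parsing of the signed attestation structure and the digest comparison can be delegated to tpm2_checkquote from tpm2-tools (again, exact flag names may differ across versions), given the public half of the attestation key exported ahead of time:

```python
import base64, json, subprocess

# Step 1 output: QR payload as decoded by a scanner app, same JSON format as above.
with open("decoded_qr.json") as f:
    payload = json.load(f)

# Step 2: base64-decode fields back into the binary files produced by tpm2_quote.
for name in ("quote", "sig", "pcrs"):
    with open(f"/tmp/{name}.bin", "wb") as f:
        f.write(base64.b64decode(payload[name]))

# Steps 3 and 4: tpm2_checkquote verifies the signature against the AK public key,
# recomputes the digest over the attached raw PCR values and compares it with the
# pcrDigest field inside the signed quote, along with the expected challenge.
subprocess.run(["tpm2_checkquote",
                "-u", "ak.pub",            # public half of the attestation key
                "-m", "/tmp/quote.bin",
                "-s", "/tmp/sig.bin",
                "-f", "/tmp/pcrs.bin",
                "-q", payload["challenge"]],
               check=True)

# Step 5 -- deciding whether these PCR values represent a healthy system -- is left
# to the human owner or a policy engine holding reference measurements.
```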

Step #5 is easier said than done, since it is uncommon to find public sources of “correct” measurements published by OEMs. Most likely one would take a differential approach, comparing against previous measurements from the same system or measurements taken from other identical machines in the fleet. For example, if applying a certain OS upgrade to a laptop in a known healthy state in the lab produces a set of measurements, one can conclude that observing the same PCRs on a different unit from the same manufacturer is not an unexpected occurrence. On the other hand, if a device mysteriously starts reporting a new PCR2 value—associated with option ROMs from peripheral devices that are loaded by firmware during boot— it may warrant further investigation by the owner.

Moving earlier in the boot chain

One problem with using the lock-screen to render TPM quotes is that it is already too late for certain scenarios. Encrypted LUKS partitions will already have been unlocked at that point, possibly by the user entering a passphrase or their smart-card PIN during the boot sequence. That means a compromised operating system already has full access to decrypted data by the time any QR code appears. At that point the QR code still has some value as a detection mechanism, as the TPM will not sign a bogus quote. An attacker can prevent the quote from being displayed, feign a bug or attempt to replay a previous quote containing stale challenges, but these all produce detectable signals. More subtle attempts may wait until disk-encryption credentials have been collected from the owner, exfiltrate those credentials to an attacker-controlled endpoint, then fake a kernel panic to induce a reboot back into a healthy state where the next TPM quote will be correct.

With a little more effort, the quote rendering can be moved earlier in the boot sequence to give the device owner an opportunity to inspect system state before any disks are unlocked. The idea is to move the above logic into the initrd image, a customizable virtual disk image that contains instructions for early boot stages after EFI firmware has passed control to the kernel. Initrd images are customized using scripts that execute at various stages. By moving the TPM quote generation to occur around the same time as LUKS decryption, we can guarantee that information about PCR measurements is available before any trust decisions are made about the system. While the logic is similar to rendering QR quotes on the login screen, there are several implementation complexities to work around. Starting with the most obvious problem:

  • Displaying images without the benefit of a full GUI framework. Frame-buffer to the rescue. It turns out that this is already a solved problem for Linux: fbi and related utilities can render JPEGs or PNGs even while operating in console mode by writing directly to the frame-buffer device. (Incidentally it is also possible to take screenshots by reading from the same device; this is how the screenshot attached below was captured.)
  • Sequencing, or making sure that the quote is available before the disk is unlocked. One way to guarantee this is to force quote generation to occur as an integral part of the LUKS unlock operation. Systemd has removed support for LUKS unlock scripts, although crypttab remains an indirect way to execute them. In principle we could write a LUKS script that invokes the quote-rendering logic first, as a blocking element. But this would create a dependency between existing unlock logic and TPM measurement verification. (Case in point: unlocking with TPM-bound secrets used to require custom logic but is now supported out of the box with systemd-cryptenroll.)
  • Stepping back, we only need to make sure that quotes are available for the user to check before they supply any secrets such as a LUKS passphrase or smart-card PIN to the system. There is no point in forcing any additional user interaction, since a user may always elect to ignore the quote and proceed with boot. To that end this PoC handles quote generation asynchronously. It opens a new virtual terminal (VT) and brings it to the foreground. All necessary work for generating the QR code— including prompting the user for an optional challenge— takes place in that virtual terminal. Once the user exits the image viewer by pressing escape, they are switched back to the original VT where the LUKS prompt awaits. Note there is no forcing function in this sequence: nothing stops the user from ignoring the quote generation logic and invoking the standard Ctrl+Alt+Function key combination to switch back to the original VT immediately if they choose to.
  • Choosing challenges. Automatically retrieving fresh challenges from an external, verifiable source of randomness such as the Bitcoin blockchain assumes network connectivity. While basic networking can be available inside initrd images and even earlier during the execution of EFI boot-loaders, it is not the same stack that runs once the operating system is up. For example, if the device normally connects to the internet using a wifi network with the passphrase stored by the operating system, that connection will not be available until after the OS has fully booted. Even for wired connections, there can be edge-cases such as proxy configuration or 802.1X authentication that would be difficult to fully replicate inside the initrd image.
    This PoC takes a different tack to work around the networking requirement, using a combination of EFI variables and prompting the user. For the sunny-day path, an EFI variable is updated by an OS task scheduled to execute at shutdown, writing the latest challenge (e.g. the most recent block-hash from Bitcoin) into firmware flash. On boot this value will be available for the initrd scripts to retrieve for quote generation. If the challenge did not update correctly, e.g. during a kernel-panic induced reboot, the owner can opt for manually entering a random value from the keyboard. (A sketch of this retrieval logic follows below.)
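Here is a rough sketch of that challenge-retrieval fallback, shown in Python for readability even though the actual initrd hook would more likely be a shell script. The variable name and GUID are placeholders; note that efivarfs prepends a 4-byte attributes field to the variable contents:

```python
import os

# Hypothetical variable written by the shutdown task; the name-GUID pair is a placeholder.
EFIVAR = "/sys/firmware/efi/efivars/QuoteChallenge-12345678-1234-1234-1234-123456789abc"

def read_challenge() -> str:
    try:
        with open(EFIVAR, "rb") as f:
            raw = f.read()
        # efivarfs exposes a 4-byte attributes header followed by the variable data.
        challenge = raw[4:].decode("ascii", errors="ignore").strip()
        if challenge:
            return challenge
    except OSError:
        pass
    # Sunny-day path failed (e.g. a kernel panic skipped the shutdown task):
    # fall back to asking the user to type a nonce at the prompt.
    return input("Enter a fresh challenge value: ").strip()
```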
TPM quote rendered as QR code during boot, before LUKS unlock

Another minor implementation difference is getting by without a TPM resource manager. When executing in multi-user mode, TPM access is mediated by the tabrmd service, which stands for “TPM Access Broker and Resource Manager Daemon.” That service has exclusive access to the raw TPM device typically at /dev/tpm0 and all other processes seeking to interact with the TPM communicate with the resource manager over dbus. While it is possible to carry over the same model to initrd scripts, it is more efficient to simply have our TPM commands directly access the device node since they are executing as root and there is no risk of contention from other processes vying for TPM access.

CP

Evading Safe Links with S/MIME: when security features collide

From an attacker’s perspective, email remains one of the most reliable channels for reaching their targets. Many security breaches start out with an employee making a poor judgment call to open a dangerous attachment or click on a link from their inbox. Not surprisingly, cloud email providers invest significant time in building security features to protect against such risks. Safe Links is part of the defenses built into MSFT Defender for Office 365. As the second word implies, it is concerned with links: specifically protecting users from following harmful links embedded in email messages.

Working from first principles, there are two ways one could go about designing that functionality:

  1. Validate the link when the email is first received by the cloud service
  2. Validate it when the message is read by the recipient.

In both cases the verb “validate” assumes there is some blackbox that can look at a website and pass judgment on whether it is malicious, perhaps with a confidence score attached. In practice that would be a combination of crowd-sourced blacklists— for example, URLs previously reported as phishing by other users— and machine learning models trained to recognize specific signals of malicious activity.

There are trade-offs to either approach. Scanning as soon as email is received (but before it is delivered to the user inbox) allows for early detection of attacks. By not allowing users to ever see that email, we can short-circuit human factors and avoid the risk that someone may be tempted to click on the link. On the other hand, it runs into a common design flaw known as TOCTOU or time-of-check-time-of-use. Here “time of check” is when the webpage is scanned. “Time of use” is when the user clicks on the link. In between, the content that the link points at can change; what started out as a benign page can morph into phishing or serve up browser exploits.

Cloaking malicious content this way would be trivial for attackers, since they have full control over the content returned at all times. At the time they send their phishing email, the server could be configured to serve anodyne, harmless pages. After waiting a few hours— or perhaps waiting until after the initial scan, whose timing is quite predictable in the case of MSFT Defender— they can flip a switch to start the attack. (Bonus points for sending email outside business hours, improving the odds that the victim will not accidentally stumble on the link until after the real payload is activated.) There is also the more mundane possibility that the page never changes but the classifiers get it wrong, mistakenly labeling it as harmless until other users manually report the page as malicious. Validating links on every click avoids such hijinks, leveraging the most up-to-date information about the destination.

Wrapping links

While validating links on every click is the more sound design, it poses a problem for email service providers. They do not have visibility into every possible situation where users are following links from email. In the fantasy universe MSFT sales teams inhabit, all customers read their email in MSFT Outlook on PCs running Windows, with a copy of Defender for Endpoint installed for extra safety. In the real world, enterprises have heterogeneous environments where employees could be reading email on iPhones, Android, Macs or even on Linux machines without a trace of MSFT software in the picture.

Safe Links solves that problem by rewriting links in the original email before delivering it to the customer inbox. Instead of pointing to the original URL, the links now point to a MSFT website that can perform checks every time it is accessed and only redirect the user to the original site if considered safe. Once the original copy of the message has been altered, it no longer matters which email client or device the user prefers. They will all render messages with modified links pointing back to MSFT servers. (There is a certain irony to MSFT actively modifying customer communications in the name of security, after running a FUD campaign accusing Google of merely reading their customers’ messages. But this is an opt-in enterprise feature that customers actually pay extra for. As with most enterprise IT decisions, it is inflicted on a user population that has little say on the policy decisions affecting their computing environment.) 

Safe Links at work

To take an example, consider the fate of a direct link to Google when it appears in a message addressed to an Office 365 user with Defender policies enabled. There are different ways the link can appear: as plain text, as a hyperlink anchored on a text passage, or as an image with a hyperlink. Here is the message according to the sender:


Here is the same message from the vantage point of the Office 365 recipient:

Visually these look identical. But on closer inspection the links have been altered. This is easiest to observe from the MSFT webmail client. True URLs are displayed in the browser status bar at the bottom of the window when hovering over links:


The alterations are more blatant when links are sent as plaintext email:


In this case Safe Links alters the visual appearance of the message, because the URL appears as plain text instead of being encoded in the HTML mark-up, which is not directly rendered.

Structure of altered links

Modifications follow the same pattern:

  • Hostname points to “outlook.com,” a domain controlled by MSFT.
  • Original URL is included verbatim as one of the query-string parameters
  • Email address of the recipient also makes an appearance, in the next parameter
  • What is not obvious from the syntactic structure of the link but can be verified experimentally: this link does not require authentication. It is not personalized. Anyone— not just the original recipient— can request that URL from MSFT and will be served a redirect to Google. In other words these links function as semi-open redirectors.
  • There is an integrity check in the link. Tampering with any of the query-string parameters or removing them results in an error from the server. (The “sdata” field could indicate a Signature over the Data field. It is exactly 32 bytes of base64-encoded content, consistent with an HMAC-SHA256 or similar MAC intended for verification only by MSFT.) This is what happens if the target URL is modified from Google to Bing:
Safe Link integrity checks: even MSFT’s own redirector refuses to send customers to Bing 🤷‍♂️

Bad interactions: end-to-end encryption

Given this background, now we can discuss a trivial bypass. S/MIME is a standard for end-to-end encryption and authentication of email traffic. It is ancient in Internet years, dating back to the 1990s. Emerging around the same time as PGP, it is arguably the “enterprisey” buttoned-down response to PGP, compliant with other fashionably enterprise-centric standards of its era. While PGP defined its own format for everything from public-keys to message structure, S/MIME built on X509 for digital certificates and CMS/PKCS7 for ciphertext/signature formatting. (Incidentally both are based on the binary encoding format ASN1.) As with PGP, it has not taken off broadly except in the context of certain high-security enterprise and government/defense settings.

At the nuts and bolts level, S/MIME uses public-key cryptography. Each participant has their own key-pair. Their public-key is embedded in a digital certificate issued by a trusted CA that can vouch for the owner of that public key. If Alice and Bob have each other’s certificates, Alice can encrypt emails that only Bob can read. She can also digitally sign those messages such that Bob can be confident they could only have originated with Alice.

How does all this interact with Safe Links? There is an obvious case involving encrypted messages: if an incoming message is encrypted such that Bob can only read it after decrypting with his private key— which no one else possesses— then the email service provider can not do any inspection, let alone rewriting, of hyperlinks present. That applies broadly to any link scanning implemented by a middle-man, not just MSFT Safe Links.  (Tangent: that restriction only holds for true end-to-end encryption. Cloud providers such as Google have muddied the waters with lobotomized/watered-down variants where private-keys are escrowed to the cloud provider in order to sign/decrypt on behalf of the end user. That is S/MIME in name only and more accurately “encraption.”)

In practice, this bypass is not very useful for a typical adversary running a garden-variety phishing campaign:

  • Most targets do not use S/MIME
  • For those who do— while few in number, these will be high-value targets with national security implications— the attacker likely does not have access to the victim’s certificate to properly encrypt the message. (Granted, this is security through obscurity. It will not deter a resourceful attacker.)
  • Finally even if they could compose encrypted emails, such messages are likely to raise suspicion. The presence of encryption can be used as a signal in machine learning models as a contributing sign of malicious behavior, as in the case of encrypted zip files sent as attachments. Even the recipient may have heightened awareness of unusual behavior if opening the email requires taking unusual steps, such as entering the PIN for their smart-card to perform decryption.

Trivial bypass with signatures

But there is a more subtle interaction between Safe Links and S/MIME. Recall that digital signatures are extremely sensitive to any alteration of the message. Anything that modifies message content would invalidate signatures. It appears that the Safe Links design accounted for this and shows particular restraint: clear-text messages bearing an S/MIME signature are exempt from link-wrapping.

Interestingly, the exemption from Safe Links works regardless of whether the S/MIME certificate used for signing is trusted. In the above screenshot from Outlook on MacOS, there is an informational message about the presence of a digital signature, accompanied by the reassuring visual indication of security, the ubiquitous lock icon. But taking a closer look via “Details” shows the certificate was explicitly marked as untrusted in the MacOS keychain:

Similarly the web UI merely contains a disclaimer about signature status being unverifiable due to a missing S/MIME control. (Notwithstanding the legacy IE/ActiveX terminology of “control” that appears to be a reference to a Chrome extension for using S/MIME with webmail.) This limitation is natural: signature validation is done locally by the email client running on a user device. Safe Links operates in the cloud and must make a decision about rewriting links at the time the email is received, without knowing how the recipient will view it. Without full visibility into the trusted CAs associated with every possible user device, a cloud service can not make an accurate prediction about whether the signing certificate is valid for this purpose. MSFT makes a conservative assumption, namely that the signature may be valid for some device somewhere. It follows that signed messages must be exempt from tampering by Safe Links.

Exploiting cleartext signed messages to evade Safe Links is straightforward. Anyone can roll out their own CA and issue themselves certificates suitable for S/MIME. The main requirement is the presence of a particular OID in the extended key-usage (EKU) attribute indicating that the key is meant for email protection. While such certificates will not be trusted by anyone, the mere existence of a signature is enough to exempt messages from Safe Links and allow links to reach their target without tampering.
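As a sketch of just how little is required, the following Python snippet (using the pyca/cryptography package) mints a throwaway self-signed certificate carrying the emailProtection EKU and produces a clear-text S/MIME signature over a message body. The names and URL are placeholders, and the output would still need to be wrapped in ordinary From/To/Subject headers before sending:

```python
import datetime
from cryptography import x509
from cryptography.x509.oid import NameOID, ExtendedKeyUsageOID
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.hazmat.primitives.serialization import pkcs7

# Throwaway key and self-signed certificate; nobody will trust it, but Safe Links
# does not appear to care about trust, only about the presence of a signature.
key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
name = x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, u"sender@example.com")])
now = datetime.datetime.utcnow()
cert = (
    x509.CertificateBuilder()
    .subject_name(name).issuer_name(name)
    .public_key(key.public_key())
    .serial_number(x509.random_serial_number())
    .not_valid_before(now).not_valid_after(now + datetime.timedelta(days=365))
    # The emailProtection EKU marks the key as suitable for S/MIME.
    .add_extension(x509.ExtendedKeyUsage([ExtendedKeyUsageOID.EMAIL_PROTECTION]),
                   critical=False)
    .sign(key, hashes.SHA256())
)

body = b"Click here: https://example.com/will-not-be-rewritten\r\n"
# Clear-text (detached) signature: the body stays readable, so no decryption is
# needed on the recipient side, yet the message qualifies as S/MIME signed.
signed = (
    pkcs7.PKCS7SignatureBuilder()
    .set_data(body)
    .add_signer(cert, key, hashes.SHA256())
    .sign(serialization.Encoding.SMIME,
          [pkcs7.PKCS7Options.DetachedSignature, pkcs7.PKCS7Options.Text])
)
print(signed.decode())
```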

Crafting such messages does not require any special capability on the part of the target. Recall that they are signed by the sender— in other words, the attacker— but they are not encrypted. There is no need for the recipient to have S/MIME set up or even know what S/MIME is. Depending on the email client, there may be visual indications in the UI about the presence of a digital signature, as well as its trust status. (Worst case scenario, if there are obtrusive warnings concerning an untrusted signature, attackers can also get free S/MIME certificates from a publicly trusted CA such as Actalis. This is unlikely to be necessary. Given the lessons from Why Johnny Can’t Encrypt, subtle warnings are unlikely to influence trust decisions made by end users.)

Options for mitigation

While this post focused on cloud-hosted Exchange, the same dilemma applies to any so-called “email security” solution predicated on rewriting email contents: it can either break S/MIME signatures or fail open by allowing all links in signed messages through unchanged. Even the weak, key-escrow model espoused by companies such as Google is of little help. GMail can decrypt incoming messages on behalf of a customer of Google Apps, if the customer willingly relinquishes their end-to-end confidentiality and hands over their private key to Google. But GMail still can not re-sign an altered message that was originally signed by an unaffiliated party.

Given the rarity of S/MIME, a pragmatic approach is to allow enterprises to opt into the first option. If their employees are not set up for S/MIME and have no expectation of end-to-end authentication in the first place, the signature exemption is introducing pure risk with no benefit. In that scenario it makes sense for Exchange to not only rewrite the link, but remove signatures altogether to avoid confusion.

That will not fly in high-security environments where S/MIME is actually deployed and end-to-end encryption is important. In that case, more fine-grained controls can be applied to cleartext signed messages. For example, the administrator could require that cleartext signed messages are only exempt from link-rewriting if the sender certificate was issued by one of a handful of trusted CAs.

CP

Understanding Tornado Cash: code as speech vs code in action

Tornado Cash is a mixer on the Ethereum network. By design mixers obfuscate the movement of funds. They make it more difficult to trace how money is flowing among different blockchain addresses. In one view, mixers improve privacy for law-abiding ordinary citizens whose transactions are otherwise visible for everyone else in the world to track. A less charitable view contends that mixers help criminals launder the proceeds of criminal activity. Not surprisingly Tornado found a very happy customer in North Korea, a rogue nation-state with a penchant for stealing digital assets in order to evade sanctions. Regulators were not amused. Tornado Cash achieved the dubious distinction of becoming the first autonomous smart-contract to be sanctioned by the US Treasury. Its developers were arrested and put on trial for their role in operating the mixer. (One has already been convicted of money laundering by a Dutch court.)

Regardless of how Tornado ends up being used in the real world, lobbying groups have been quick to come to the defense of its developers. Some have gone so far as to cast the prosecution as an infringement on constitutionally protected speech, pointing to US precedents where source code was deemed in scope of first amendment rights. Underlying such hand-wringing is a slippery slope argument. If these handful of developers are held liable for North Koreans using their software to commit criminal activity, then what about the thousands of volunteers who are publicly releasing code under open-source licenses? It is almost certain that somebody somewhere in a sanctioned regime is using Linux to further the national interests of those rogue states. Does that mean every volunteer who contributed to Linux is at risk of getting rounded up next? 

This is a specious argument. It brushes aside decades of lessons learned from previous controversies around DRM and vulnerability disclosure in information security. To better understand where Tornado crosses the line, we need to look at the distinction between code as speech and code in action.

“It’s alright Ma— I’m only coding”

There is a crucial difference between Tornado Cash and the Linux operating system, or for that matter open-source applications such as the Firefox web browser. Tornado Cash is a hosted service. To better illustrate why that makes a difference, let’s move away from blockchains and money transmission, and into a simpler setting involving productivity applications. Imagine a not-entirely-hypothetical world where providing word processing software to Russia was declared illegal. Note this restriction is phrased very generically; it makes no assumptions about the distribution or business model.

For example the software could be a traditional, locally installed application. LibreOffice is an example of an open-source competitor to the better-known MSFT Office. If it turns out that somebody somewhere in Russia downloaded a copy of that code from one of the mirrors, are LibreOffice developers liable? The answer should be a resounding “no” for several reasons. First the volunteers behind LibreOffice never entered into an agreement with any Russian national/entity for supplying them with software intended for a particular purpose. Second, they had no awareness, much less control over, who can download their work product once it is released into the wild. Of course these points could also be marshaled in defense of Tornado Cash. Presumably they did not run a marketing campaign courting rogue regimes. Nor, for that matter, did the North Korean APT check in with the developers first before using the mixer— at least, based on publicly known information about the case.

But there is one objection that only holds true for the hypothetical case of stand-alone, locally installed software: that source-code downloaded from GitHub is inert content. It does not accomplish anything, until it is built and executed on some machine under control of the sanctioned entity. The same defense would not hold for a software-as-a-service (SaaS) offering such as Google Docs. If the Russian government switches to Google Docs because MSFT is no longer allowed to sell them copies of Word, Google can not disclaim knowledge of deliberately providing a service to a sanctioned entity. (That would hold true even if use was limited to the “free” version, with no money changing hands and no enterprise contract signed.) Google is not merely providing inert lines of code to customers. It has been animated into a fully functioning service, running on Google-owned hardware inside Google-owned data-centers. There is every expectation that Google can and should take steps to limit access to this service from sanctioned countries.

While the previous cases were cut and dried, gray areas emerge quickly. Suppose someone unaffiliated with the LibreOffice development team takes a copy of that software and runs it on AWS as a service. With a little work, it would be possible for anyone in the world with a web browser to remotely connect and use this hosted offering for authoring documents. If it turns out such a hosted service is frequented by sanctioned entities, is that a problem? Provided one accepts that Google bears responsibility in the previous example, the answer here should be identical. But it is less straightforward who that responsible party ought to be. There is full separation of roles between development, hosting and operations. For Google Docs, they are all one and the same. Here the code is written by one group (the open-source developers of LibreOffice) and runs on physical infrastructure provided by a different entity (Amazon). But ultimately it is the operator who crossed the Rubicon. It was their deliberate decision to execute the code in a manner that makes its functionality publicly accessible, including to participants who are not supposed to have access. Any responsibility for misuse then lies squarely with the operator. The original developers are not involved. Neither is AWS. Amazon is merely the underlying platform provider, a neutral infrastructure that can be used for hosting any type of service, legal or not.

Ethereum as the infrastructure provider

Returning to Tornado Cash, it is clear that running a mixer on ethereum is closer in spirit to hosting a service at AWS, than it is to publishing open-source software. Billed as the “world computer,” Ethereum is a neutral platform for hosting applications— specifically, extremely niche types of distributed application requiring full transparency and decentralization. As with AWS, individuals can pay this platform in ether to host services— even if those services are written in an unusual programming language and have very limited facilities compared to what can be accomplished with Amazon. Just like AWS, those services can be used by other participants with access to the platform. Anyone with access to the blockchain can leverage those services. (There is in some sense a higher bar. Paying transaction fees in ether is necessary to interact with a service on the Ethereum blockchain. Using a website hosted at AWS requires nothing more than a working internet connection.) Those services could be free or have a commercial angle— as in the case of the Tornado mixer, which had an associated TORN token that its operators planned to profit from.

The implication is clear: the Tornado team is responsible for their role as operators of a mixing service, not for their part as developers writing the code. That would have been protected speech if they had stopped short of deploying a publicly-accessible contract, leaving it in the realm of research. Instead they opted for “breathing life into the code” by launching a contract, complete with a token model they fully controlled.

One key difference is the immutable nature of blockchains: it may be impossible to stop or modify a service once it has been deployed. It is as if AWS allowed launching services without a kill switch. Once launched, the service becomes an autonomous entity that neither the original deployer nor Amazon itself can shut down. But that does not absolve the operator of responsibility for deploying the service in the first place. There is no engineering rule that prevents a smart-contract from having additional safeguards, such as the ability to upgrade its code to address defects or even to temporarily pause it when problems are discovered. Such administrative controls— or backdoors, depending on perspective— are now common practice for critical ethereum contracts, including stablecoins and decentralized exchanges. For that matter, contracts can incorporate additional rules to blacklist specific addresses or seize funds in response to law enforcement requests. Stablecoin operators do this all the time. Even Tether with its checkered history has demonstrated a surprising appetite for promptly seizing funds in response to court orders. The Tornado team may protest that they have no way to shut down or tweak the service in response to “shocking” revelations that it is being leveraged by North Korea. From an ethical perspective, the only response to that protest is: they should have known better than to launch a service based on code without adequate safeguards in the first place.

Parallels with vulnerability disclosure

Arguments over when developers cross a line into legal liability are not new. Information security has been on the frontlines of that debate for close to three decades owing to the question of proper vulnerability disclosure. Unsurprisingly software vendors have been wildly averse to public dissemination of any knowledge regarding defects in their precious products. Nothing offended those sensibilities more than the school of “full disclosure,” especially when accompanied by working exploits. But try as they might to criminalize such activity with colorful yet misguided metaphors (“handing out free guns to criminals!”), the consensus remains that a security researcher purely releasing research is not liable for the downstream actions of other individuals leveraging their work— even when the research includes fully weaponized exploit code ready-made for breaking into a system. (One exception has been the content industry, which managed to fare better thanks to a draconian anti-circumvention measure in the DMCA. While that legislation certainly had a chilling effect on pure security research into copyright protection measures, in practice most of the litigation has focused on going after those committing infringement rather than researchers who developed code enabling that infringement.)

Debates still continue around what constitutes “responsible” disclosure and where society can strike the optimal balance between incentivizing vendors to fix security vulnerabilities promptly without making it easier for threat actors to exploit those same vulnerabilities. Absent any pressure, negligent/incompetent vendors will delay patches arguing that risk is low because there is no evidence of public exploitation. (Of course absence of evidence is not evidence of absence and in any case, vendors have little control over when malicious actors will discover exploits independently.) But here we can step back from the question of optimal social outcomes and focus on the narrow question of liability. It is not the exploit developer writing code but the person executing that exploit against a vulnerable target who ought to be held legally responsible for the consequences. (To paraphrase a bumper-sticker version of second amendment advocacy: “Exploits don’t pop machines; people pop machines.”) In the same vein, the Tornado Cash team is fully culpable for their deliberate decision to turn code into a service. Once they launched the contract on chain, they were no longer mere developers. They became operators.

CP

Behavioral economics on Ethereum: stress-testing censorship resistance

Predicated on studying the behavior of real people (distinct from the mythical homo economicus of theory), behavioral economics faces the challenge of constructing realistic experiments in a laboratory setting. That calls for signing up a sizable group of volunteers and putting them into an artificial situation with monetary incentives to influence their decision-making process. Could blockchains in general and Ethereum in particular help by making it easier to either recruit those participants or set up the experimental framework? In this blog post we explore that possibility using a series of hypothetical experiments, building up from simple two-person games to an open-ended version of the tragedy of the commons.

1. Simple case: the ultimatum game

The ultimatum game is a simple experiment involving two participants that explores the notion of fairness. The participants are randomly assigned to either the “proposer” or “responder” role. A pool of funds is made available to the proposer, who has complete discretion in making an offer to allocate those funds between herself and the responder. If the responder accepts, the funds are distributed as agreed. If the responder vetoes the offer— presumably for being too skewed towards the proposer— no one receives anything.

This experiment and its variations are notable in showing an unexpected divergence from the theoretical “profit maximization” model of economics 101. Given that the responder has no leverage, one would expect they will begrudgingly settle for any amount, including wildly unfair splits where the proposer decides to keep 99% of the funds. Given that the proposer is also a rational actor aware of that dynamic, the standard model predicts such highly uneven offers being made… and accepted. Yet that is not what experiments show: most offers are close to 50/50 and most responders outright reject offers that are considered too unequal. (This only scratches the surface of the complex dynamics revealed in the experiment. Subtle changes to the framing— such as telling the responders a tall tale about the split being randomly decided by a computer program instead of a sentient being— change their willingness to accept unfair splits, possibly because one does not take “offense” at the outcome of a random process the same way they might react to the perceived greed of the fellow on the other side.)

Putting aside the surprising nature of the results, it is straightforward to implement this experiment in Ethereum. We assume both the proposer and responder have ethereum wallets with known addresses. In order to run the experiment on chain, researchers can deploy a smart-contract and fund it with the promised reward. This contract would have three functions:

  1. One only callable by the proposer, stating the intended allocation of funds.
  2. One only callable by the responder, accepting/rejecting that proposed allocation. Depending on the answer, the contract would either distribute funds or return the entire amount back to the experiment team. 
  3. For practical reasons, one would also include an escape-hatch in the form of a third function that can be called by anyone after some deadline has elapsed, to recover the funds in case one or both subjects fail to complete the expected task. Depending on which side reneged on their obligation, it would award the entire reward to the other participant. (A sketch of this logic follows the list.)
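The production version would be a Solidity contract; here is a plain Python sketch of the same state machine, just to make the three functions concrete. Names and the settlement rules for the escape hatch are illustrative assumptions rather than a reference implementation:

```python
class UltimatumGame:
    """Off-chain model of the three contract functions described above."""

    def __init__(self, proposer, responder, pot, deadline):
        self.proposer, self.responder = proposer, responder
        self.pot, self.deadline = pot, deadline
        self.offer = None       # amount offered to the responder
        self.settled = False

    def propose(self, caller, offer_to_responder):
        # Function 1: only the proposer may state the allocation, exactly once.
        assert caller == self.proposer and self.offer is None
        assert 0 <= offer_to_responder <= self.pot
        self.offer = offer_to_responder

    def respond(self, caller, accept):
        # Function 2: only the responder may accept or reject the standing offer.
        assert caller == self.responder and self.offer is not None and not self.settled
        self.settled = True
        if accept:
            return {self.responder: self.offer, self.proposer: self.pot - self.offer}
        return {"experimenters": self.pot}   # rejected: funds return to the researchers

    def reclaim(self, now):
        # Function 3: escape hatch after the deadline; the side that did act gets the pot.
        assert now > self.deadline and not self.settled
        self.settled = True
        if self.offer is None:               # proposer never made an offer
            return {self.responder: self.pot}
        return {self.proposer: self.pot}     # responder never answered
```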

There are some caveats that could influence the outcome: both participants must already hold some ether at their designated address in order to make smart-contract calls. Alternatively the experimenters can supply just enough ETH to both volunteers to pay for the expected cost of those calls. But that runs the risk of participants deciding to abscond with funds instead of proceeding with the experiment. (The responder in particular faces an interesting dilemma when confronted with an unfair split they are inclined to reject. On the one hand, registering their displeasure on-chain sends a message to the greedy proposer, at the cost of spending ETH they had been given for the experiment. On the other hand, simply transferring that ETH to a personal wallet allows the responder to walk away with something, but only at the cost of allowing the greedy proposer to keep 100% of the funds due to the assumed default.) This effect is diminished to the extent that the prize money up for grabs is much larger than the transaction fees the participants are required to part with. More generally, transaction fees determine whether running this experiment on-chain would be any more efficient from an experimental stance than enticing volunteers with free food.

More subtle is the effect of perceived privacy— or lack thereof— in influencing participant behavior. Would a proposer be less inclined to reveal their true colors and offer an unfair split when interacting on-chain versus in real life? On the one hand, blockchains are public: anyone can observe that a particular proposal was greedy. Having their actions permanently on the record for the whole world to observe may motivate participants to follow social mores. On the other hand, blockchain addresses are pseudonyms, without any identifying information. “Bravery of the keyboard” may result in fewer inhibitions about diverging from social expectations and making greedy offers when one is not directly interacting with other persons.

2. Open to all: the Ether auction

The ultimatum game is played between two players. Running that experiment still requires finding suitable volunteers, pairing them up and launching a smart-contract for each pair. (The contract will only accept inputs from the proposer and responder, and as such must have awareness of their addresses.) But there are other experiments which can be run without requiring any advance coordination, beyond that of publicizing the existence of a smart-contract that implements the experimental setup. In effect anyone with an ethereum wallet can make an independent decision on whether they want to participate.

Tullock auctions in general and the better-known “dollar auction” in particular are a case in point. As with all auctions, the highest bidder wins by paying their offer. But unlike most auctions, everyone else who loses to that winning bid is still required to part with the amount they offered. Given those unforgiving dynamics— everyone except the winner must pay and still end up with nothing in return— it seems illogical that anyone would play along. Now consider the “dollar auction,” a slight variant that is simple enough to be demonstrated in classroom settings. The professor holds up a $100 bill and offers to give it to the highest bidder, subject to Tullock auction rules with $1 increments for bids. (Side note: in the original version, only the second-highest bidder is forced to pay while all others are spared. That still does not alter the underlying competitive dynamic between the leading two bidders.) Once the students get over their sense of disbelief that their wise teacher— an economics professor by training, of all things— is willing to part with a $100 bill for a pittance, this looks like a great deal. So one student quickly comes up with the minimum $1 bid, spotting an easy $99 profit. Unfortunately the next student sees an equally easy way to score $98 by bidding $2. Slightly less profit than the first bidder imagined achieving if they remained uncontested, but still a decent amount that any rational participant would rightfully chase after. It follows that the same logic could motivate fellow students to bid $3, $4 and higher amounts in an ever increasing cycle of bids. But even before the third student jumps in, there is one person who has suddenly become more motivated to escalate: the initial bidder. Having lost the lead, they are faced with the prospect of losing $1— since everyone must pay their bid, win or lose. That fundamentally alters their expected-value calculus compared to other students currently sitting on the sidelines. A student who has not entered the auction must decide between zero gain (by remaining a spectator in this experiment) or jumping into the fray to chase after the $100 being dangled in front of the class. By contrast a student who has already been out-bid is looking at a choice between a guaranteed loss of their original bid or escalating the bid to convert that loss into a gain.

Informally these auctions distill the notion of “arms race” or “winner-take-all” situations, where multiple participants expend resources chasing after an objective but only one of them can walk away with the prize while everyone else sees their effort wasted. Economists have cited examples where such dynamics are inadvertently created, for example in the competition among American cities vying to win HUD grants. (Presumably the same goes for countries making elaborate bids to host the next Olympics or FIFA World Cup, considering that only one of them will be granted the opportunity.)

Shifting this classroom exercise onto Ethereum is straightforward: we create a smart-contract and seed it with the initial prize money of 1 ETH. The contract accepts bids from any address, provided it exceeds the current leading bid by some incremental amount. In fact the bid can be automatically deduced from the amount of ether sent. Participants do not need MetaMask or similar noncustodial wallets with contract invocation capabilities. They can simply send ETH to the contract from an online centralized exchange. Contracts can be designed with a default payment function that is triggered when any ether is sent, even without an explicit function call. That default fallback function can take care of the book-keeping associated with bids, including distinguishing between initial vs updated bids. If an address was encountered before, any subsequent calls with value attached are considered an increment on top of the original bid. (That is, if you bid 0.5 ETH and now want to raise that offer to 0.7 ETH in order to counter someone else bidding 0.6 ETH, your second call only needs to attach the delta of 0.2 ETH.) At some previously agreed upon block-height or time, the contract stops accepting any more bids. The same fallback function can be invoked by anyone to “finalize” the outcome and send the prize money to the winner, and presumably any leftover ETH from incoming bids to a GoFundMe campaign for funding behavioral economics education.
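Again the real thing would be a Solidity contract with a payable fallback; the following Python sketch models only the bookkeeping that fallback would perform, under the assumptions above (incremental bids, a deadline expressed as a block height, losers forfeit their bids):

```python
class EtherAuction:
    """Model of the Tullock-style auction contract's fallback logic."""

    def __init__(self, prize, min_increment, deadline_block):
        self.prize = prize
        self.min_increment = min_increment
        self.deadline_block = deadline_block
        self.bids = {}                      # address -> cumulative amount sent

    def on_payment(self, sender, value, current_block):
        """Triggered whenever ether arrives; repeat senders top up their earlier bid."""
        assert current_block <= self.deadline_block, "auction closed"
        new_total = self.bids.get(sender, 0) + value
        leading = max(self.bids.values(), default=0)
        assert new_total >= leading + self.min_increment, "bid too low"
        self.bids[sender] = new_total

    def finalize(self, current_block):
        """Callable by anyone after the deadline to settle the auction."""
        assert current_block > self.deadline_block and self.bids
        winner = max(self.bids, key=self.bids.get)
        # Winner receives the prize; every bid, winning or not, stays in the contract.
        leftover = sum(self.bids.values())
        return winner, self.prize, leftover  # leftover earmarked for that GoFundMe campaign
```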

While this seems straightforward, there are some subtle differences from the classroom setting due to the way Ethereum operates. These can distort the auction dynamics and create alternative ways to maximize profit. The most problematic one arises from the interaction of an auction deadline with the block-by-block manner in which ethereum transactions are added to the blockchain. One way to guarantee not being outbid is to make sure one is submitting the last bid. If the auction is set to conclude at a particular block height or even an instant in time (ethereum blocks contain a timestamp), all rational actors would engage in a game of chicken, waiting until that last block to submit their bids.

In fact, since transactions are visible in the mempool before they are mined, rational actors would continue to bide their time even during the interval for that block. The optimal strategy calls for waiting out other bidders, and only submitting a higher bid after all of those suckers have prematurely tipped their hand. Naturally this runs a different risk: the bid may arrive too late if the block has already been constructed and attested without your bid getting the last laugh. (That also raises the question of what the contract should do with bids arriving after the deadline. It can bounce the ETH as a favor to the slowpoke or, alternatively, keep the ETH without taking the bid into consideration as punishment for playing this game of chicken.)

Even this model is too simplistic, because it does not take MEV (maximal extractable value) into account. Ethereum blocks are not simply constructed out of some “fair” ordering of all public transactions sitting around in the mempool according to gas paid. Miners have been operating a complex ecosystem of revenue optimization by accepting out-of-band bids— that is, payment outside the standard model of fees— to prioritize specific transactions or reorder transactions within a block. (This system became institutionalized with the switch to proof-of-stake.) Why would participants compete over ordering within a block? Because transactions are executed in the order they appear in the block, and there may be a significant arbitrage opportunity for the first mover. Suppose a decentralized exchange has temporarily mispriced a particular asset because the liquidity pools got out of balance. The one lucky trader to execute a trade on that DEX can monopolize all of the available profit caused by the temporary dislocation. If the competition to get that transaction mined first were conducted out in the open— as the fee marketplace originally operated— the outcome would be a massively wasteful escalation in transaction fees. Those chasing the arbitrage opportunity would send transactions with increasing fees to convince miners to include their TX ahead of everyone else in the block. This is incredibly inefficient: while multiple such transactions can and will be included in the next block, only the earliest one gets to exploit the price discrepancy on the DEX and collect the reward, while everyone else ends up wasting transaction fees and block space for no good reason. If that dynamic sounds familiar, it is effectively a type of Tullock auction conducted in the mempool.
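
To see why the open fee race is a Tullock auction by another name, consider a toy calculation with made-up numbers: three arbitrage bots attach escalating priority fees to chase the same mispricing, all three transactions land in the block, but only the one ordered first captures the profit.

    # Toy model of the open fee-escalation race; all figures are invented.
    arb_profit = 1.00                                    # ETH available to whoever trades first
    fees = {"botA": 0.10, "botB": 0.15, "botC": 0.25}    # priority fee each bot attaches

    # The block producer orders transactions by fee; every one of them still gets mined.
    order = sorted(fees, key=fees.get, reverse=True)
    for position, bot in enumerate(order):
        payoff = (arb_profit if position == 0 else 0) - fees[bot]
        print(f"{bot}: fee {fees[bot]:.2f} ETH, net {payoff:+.2f} ETH")
    # botC nets +0.75 ETH; botA and botB pay their fees and get nothing,
    # exactly the everyone-pays dynamic of the Tullock auction.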

MEV solves that problem by having market participants submit individual transactions or even entire blocks to miners/validators through a semi-private interface. Instead of competing with each other out in the open and paying for TX that did not “win” the race to appear first, MEV converts the Tullock auction into a garden-variety sealed-bid, first-price auction where the losers are no longer stuck paying their bid.

By the same logic, MEV provides a way out of the bizarre dynamics created by the ether auction: instead of publicly duking it out with bids officially registered on-chain or even sitting in the mempool, rational participants can use a service such as Flashbots to privately compete for the prize. There is still no guarantee of winning— a block proposer could ignore Flashbots and just choose to claim the prize for themselves by putting their own bid TX first— but MEV removes the downside for the losers.

3. Tragedy of the commons: Luring Lottery

Here is a different experiment that MEV will not help with. For real-life inspiration, consider an experiment Douglas Hofstadter ran in Scientific American in 1983. (Metamagical Themas covers this episode in great detail.) SciAm announced a lottery with a whopping million-dollar bounty to be awarded to the lucky winner. The rules were simple enough: anyone can participate by sending a postcard with a positive integer written on the card. Their odds of winning are proportional to the submitted number. There is one catch: the prize money is divided by the total of all submissions received. In the “best case” scenario— or worst case, depending on the publisher’s perspective— only one person participates and sends in the number 1. At that point SciAm may be looking at Chapter 11, because that lucky reader is guaranteed to collect the promised million dollars. Alternatively in a “worst case” scenario, millions of readers participate or a handful of determined contestants submit astronomically large numbers to win at all costs. SciAm lives to publish another issue, as the prize money is diluted down to a few dollars.

Unlike the dollar auction, there is nothing subtle about the dynamics here. The contest is manifestly designed to discourage selfish behavior: submitting large numbers (or even making any submission at all, since a truly altruistic reader could deliberately opt not to participate) will increase individual chances of winning, but reduce the overall prize. While the magazine editors were rightfully concerned about having to pay out a very large sum if most readers did not take the bait, Hofstadter was not flustered.

No one familiar with Garrett Hardin’s fundamental insight in “The Tragedy of the Commons” will be surprised by what transpired: not only were there plenty of submissions, but some creative readers sent entries that were massive— not expressed as ordinary numbers but defined with complex mathematical formulas, to the point where the contest organizers could not even compare these entries to gauge probabilities. Not that it mattered, as the prize money was correspondingly reduced to an infinitesimal fraction of a penny. No payouts necessary.

Again this experiment can be run as an Ethereum smart-contract and this time MEV will not help participants game the system. As before, we launch a contract and seed it with the prize money ahead of time. It has two public functions:

  • One accepts submissions for a limited time. Unlike the SciAm lottery judged by sentient beings, we have to constrain submissions to actual integers (no mathematical formulas) and restrict their range to something the EVM can comfortably operate on without overflow, such as 2^200. Any address can submit a number. They can even submit multiple entries; this was permitted in the original SciAm lottery which serves as the inspiration for the experiment. The contract keeps track of totals submitted for each address, along with a global tally to serve as the denominator when calculating winning odds for every address.
  • The second function can be called by anyone once the lottery concludes. It selects a winner using a fair randomness source, with each address weighted according to the ratio of their total submissions to the overall sum. It also adjusts the payout according to the total submissions and sends the reward to the winner, returning the remainder back to the organizers. A sketch of both functions appears after this list. (Side note: getting fair randomness can be tricky on chain, requiring a trusted randomness beacon. Seemingly “random” properties such as the previous block hash can be influenced by participants trying to steer the lottery outcome to favor their entry.)
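
Here is a minimal Python sketch of the two functions above, modeling the contract's book-keeping rather than reproducing actual EVM code; the 2^200 cap, the prize amount and the use of Python's random module as a stand-in for a trusted randomness beacon are all illustrative assumptions.

    import random

    MAX_ENTRY = 2 ** 200   # illustrative cap keeping sums well within the 256-bit EVM word
    PRIZE = 1_000_000      # notional prize seeded into the contract

    entries = {}           # address -> cumulative submissions
    total = 0              # global tally serving as the denominator

    def submit(sender, number):
        """Function 1: record an entry while the lottery is open."""
        global total
        if not (1 <= number <= MAX_ENTRY):
            raise ValueError("entry out of range")
        entries[sender] = entries.get(sender, 0) + number
        total += number

    def draw():
        """Function 2: pick a winner weighted by share of the total,
        then dilute the payout by that same total."""
        if total == 0:
            return None, 0
        # Stand-in for the trusted randomness beacon the real contract would need.
        winner = random.choices(list(entries), weights=list(entries.values()))[0]
        payout = PRIZE / total
        return winner, payout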

Given this setup, consider the incentives facing every Ethereum user when the lottery is initially launched. There is a million dollars sitting in an immutable contract that is guaranteed to pay out the prize money according to predefined rules. (Unlike SciAm, the contract can not file for bankruptcy or renege on the promise.) One call to the contract is all it takes to participate in the lottery. Win or lose, there is no commitment of funds beyond the transaction fees required for that call. This is a crucial difference from the Tullock auction, where every bid represents a sunk cost for the bidder. The downside is capped regardless of what other network participants are doing.

Also unlike the Tullock auction, there is little disincentive to participate early. There is no advantage to be gained by waiting until the last minute and submitting the last entry. Certainly one can wait and then submit a number much higher than previous entries to stack the odds, but doing so also reduces the expected payout. Not to mention that since submissions are capped in a finite range, participants can simply submit the maximum number to begin with. Meanwhile, MEV can not help with the one problem every rational actor would like to solve: preventing anyone else from joining the lottery. While it would be great to be the only participant, or at least one of a handful of participants with an entry, existing MEV mechanisms can not indefinitely prevent others from participating in the lottery. At best bidders could delay submissions for a few blocks by paying to influence the content of those blocks. It would require a coordinated censorship effort to exclude all transactions destined for the lottery contract for an extended period of time.

If anything, MEV encourages early participation. No one can be assured they will have the “last say” in the construction of the final block before the lottery stops accepting submissions. Therefore the rational course of action is to submit an entry early to guarantee a chance at winning. In fact there may even be optimal strategies around “spoiling” the payout for everyone else immediately with a large entry, such that the expected reward for future entries is barely enough to offset transaction fees for any future participant. This is an accelerated tragedy of the commons— equivalent to the herdsmen from Hardin’s anecdote burning the grasslands to discourage other herdsmen from grazing their cattle on the same land.
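
A back-of-the-envelope calculation shows just how effective such spoiling would be, using assumed figures for the prize and the entry cap:

    # Spoiler-strategy arithmetic; the prize and the cap are assumed figures.
    PRIZE = 1_000_000        # dollars seeded into the contract
    MAX_ENTRY = 2 ** 200     # cap on a single entry

    # A later participant submitting n expects roughly (n / total) * (PRIZE / total).
    spoiler = MAX_ENTRY
    n = MAX_ENTRY            # even the largest possible follow-up entry
    total = spoiler + n
    expected_reward = (n / total) * (PRIZE / total)
    print(expected_reward)   # about 1.6e-55 dollars, far below any transaction fee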

Testing censorship resistance

Alternatively, the ether auction and the Luring Lottery serve as empirical tests of miner censorship on Ethereum. In both cases, rational actors have a strong incentive to participate themselves while preventing others from participating, because participation by others reduces the expected gain. For the ether auction, being outbid means going from a guaranteed profit to a guaranteed loss. For the Luring Lottery, competition from other participants is not quite as detrimental, but it undermines expected return in two ways: by reducing the probability of winning and by slashing the prize money on offer. It follows that rational actors have an incentive to censor transactions from other participants.

If 100% reliable censorship is possible, then both experiments have a trivial winning strategy for the censor. For the Tullock auction, submit the minimum acceptable bid and prevent anyone else from making higher offers. Likewise for the Luring Lottery, send the number “1” and block anyone else from submitting an entry that would dilute the prize money.

On Ethereum, block proposers are in the best position to engage in such censorship at minimal cost. While they would be forgoing fees associated with the censored TX, that loss is dwarfed by the outsized gain expected from monopolizing the full rewards available from the hypothetical contract. Could such censorship be sustained indefinitely? This seems unlikely, even if multiple validators were colluding under an agreement to split the gains. It only takes a defection from a single proposer to get a transaction included. Validators could choose to pursue a risky strategy of ignoring the undesirable block, on the pretense that the proposer failed to produce a block in time as expected. They could then wait for a block from the next “compliant” proposer who follows the censorship plan. This approach will fail and incur additional penalties if other attesters/validators accept the block and continue building on top of it. Short of near-universal agreement on the censorship plan— as with OFAC compliance— a coalition with sufficient validator share is unlikely to materialize.

On the other hand, 100% reliable censorship is not necessary for the Luring Lottery: block proposers can not stop other proposers from participating when it is their turn, but they can certainly exclude any competing TX from their own blocks. That effectively limits participation to proposers, or at best Ethereum users aligned with a friendly proposer willing to share the prize. But such tacit collusion would be highly unstable: even if every proposer correctly diagnosed the situation and limited themselves to a single submission of “1” to avoid unnecessary escalation, there would still be an incentive to rig the lottery with a last-minute submission that dwarfs all previous entries.

CP

[Edited: May 4th]

Browser in the middle: 25 years after the MSFT antitrust trial

In May 1998 the US Department of Justice and the Attorneys General of 20 states along with the District of Columbia sued Microsoft in federal court, alleging predatory strategies and anticompetitive business practices. At the heart of the lawsuit was the web browser Internet Explorer, and the strong-arm tactics MSFT adopted with business partners to increase the share of IE over the competing Netscape Navigator. Twenty-five years later, in a drastically altered technology landscape, the DOJ is now going after Google for its monopoly power in search and advertising. With the benefit of hindsight, there are many lessons in the MSFT experience that could offer useful parallels for the new era of antitrust enforcement, as both sides prepare for the trial in September.

The first browser wars

By all indications, MSFT was late to the Internet disruption. The company badly missed the swift rise of the open web, instead investing in once-promising ideas such as interactive TV or walled-garden online services in the style of early Prodigy. It was not until Gates’s 1995 “Internet Tidal Wave” memo that the company began to mobilize its resources. Some of the changes were laughably amateurish— teams hiring for dedicated “Internet program manager” roles. Others proved more strategic, including the decision to build a new browser. In the rush to get something out the door, the first version of Internet Explorer was based on Spyglass Mosaic, a commercial version of the first popular browser, NCSA Mosaic, developed at the University of Illinois. (The team behind Mosaic would go on to create Netscape Navigator.) Even the name itself betrayed the Windows-centric and incrementalist attitude prevalent in Redmond: “explorer” was the name of the Windows GUI or “shell” for browsing content on the local machine. Internet Explorer would be its networked cousin helping users explore the wild wild web.

By the time IE 1.0 shipped in August 1995, Netscape already had a commanding lead in market share, not to mention the better product measured in features and functionality. But by this time MSFT had mobilized its considerable resources, greatly expanding investment in the browser team and replacing Spyglass code with its own proprietary implementation. IE3 was the first credible version to have some semblance of feature parity with Navigator, having added support for frames, cascading stylesheets, and Javascript. It was also the first time MSFT went on the offensive, responding with its own proprietary alternatives to technologies introduced by Netscape. Navigator had the Netscape Plugin API (NPAPI) for writing browser extensions; IE introduced ActiveX— completely incompatible with NPAPI and entirely built on other MSFT-centric technologies including COM and OLE. Over the next two years this pattern would repeat as IE and Navigator duked it out for market share by introducing competing technologies. Netscape allowed web pages to run dynamic content with a new scripting language, Javascript; MSFT would support that in the name of compatibility but also subtly try to undermine JS by pushing VBScript, based on the Visual Basic language so familiar to existing Windows developers.

Bundle of trouble

While competition heated up over functionality— and over chasing fads, such as the “push” craze of the late 1990s that resulted in the Channel Definition Format— there was one weapon uniquely available to MSFT for grabbing market share: shipping IE with Windows. Netscape depended on users downloading the software from its website. Quaint as this sounds in 2024, it was a significant barrier to adoption in an age when most of the world had not made the transition to being online. How does one download Navigator from the official Netscape website without a web browser to begin with? MSFT had a well-established channel exempt from this bootstrapping problem: copies of Windows distributed “offline” using decidedly low-tech means such as shrink-wrapped boxes of CDs, or preinstalled on PCs. In principle Netscape could seek out similar arrangements with Dell or HP to include its browser instead. Unless of course MSFT made the OEMs an offer they could not refuse.

That became the core of the government’s accusation of anticompetitive practices: MSFT pushed for exclusive deals, pressuring partners such as PC manufacturers (OEMs, or “original equipment manufacturers” in industry lingo) not only to include a copy of Internet Explorer with prominent desktop placement but also to rule out shipping any alternative browsers. Redmond clearly had far more leverage than Mountain View over PC manufacturers: shipping any browser at all was icing on the cake, but a copy of the reigning OS was practically mandatory.

What started out as a sales/marketing strategy rapidly crossed over into the realm of software engineering when later releases of Windows began to integrate Internet Explorer in what MSFT claimed was an inextricable fashion. The government objected to this characterization: IE was an additional piece of software downloaded from the web or installed from CDs at the consumer’s discretion. Shipping a copy with Windows out-of-the-box may have been convenient to save users the effort of jumping through those installation hoops, but surely a version of Windows could also be distributed without this optional component.

When MSFT objected that these versions of Windows could not function properly without IE, the government sought out a parade of expert witnesses to disprove this. What followed was a comedy of errors on both sides. One expert declared the mission accomplished after removing the icon and primary executable, forgetting about all of the shared libraries (dynamic link libraries, or DLLs in Windows parlance) that provide the majority of browser functionality. IE was designed to be modular, to allow “embedding” the rendering engine or even subsets of functionality such as the HTTP stack into as many applications as possible. The actual “Internet Explorer” icon users clicked on was only the tip of the iceberg. Deleting that was the equivalent of arguing that the electrical system in a car can be safely removed by smashing the headlights and noting the car still drives fine without lights. Meanwhile MSFT botched its own demonstration of how a more comprehensive removal of all browser components results in broken OS functionality. A key piece of evidence entered by the defense was allegedly a screen recording from a PC showing everything that goes wrong with Windows when IE components are missing. Plaintiffs’ lawyers were quick to point out strange discontinuities and changes in the screenshots, eventually forcing MSFT into an embarrassing admission that the demonstration was spliced together from multiple sequences.

Holding back the tide

The next decade of developments would vindicate MSFT, proving that company leadership was fully justified in worrying about the impact of the web. MSFT mobilized to keep Windows relevant, playing the game on two fronts:

  1. Inject Windows dependencies into the web platform, ensuring that even if websites were accessible on any platform in theory, they worked best on Windows viewed in IE. Pushing ActiveX was a good example of this. Instead of pushing to standardize cross-platform APIs, IE added appealing features such as the initial incarnation of XMLHttpRequest as an ActiveX control. Another example was the addition of Windows-specific quirks into the MSFT version of Java. This provoked a lawsuit from Sun for violating the “Java” trademark with an incompatible implementation. MSFT responded by deciding to remove the JVM from every product that previously shipped it.
  2. Stop further investments in the browser once it became clear that IE had won the browser wars. The development of IE4 involved a massive spike of resources. That release also marked the turning of the tide, with IE starting to win out in comparisons against Navigator 4. IE5 was an incremental effort by comparison. By IE6, the team had been reduced to a shadow of its former self, where it would remain for the next ten years until Google Chrome came on the scene. (Even the “security push” in the early 2000s culminating in SP2 focused narrowly on cleaning up the cesspool of vulnerabilities in the IE codebase. It was never about adding features and enhancing functionality for a more capable web.)

This lack of investment from MSFT had repercussions far beyond the Redmond campus. It effectively put the web platform into deep freeze. HTML and JavaScript evolved very quickly in the 1990s. HTML2 was published as an RFC in 1995. Later the World Wide Web Consortium took up the mantle of standardizing HTML. HTML3 came out in 1997. It took less than a year for HTML4 to be published as an official W3C “recommendation”— what would be called a standard under any other organization. This was a time of rapid evolution for the web, with Netscape, MSFT and many other companies participating to drive the evolution forward. It would be another 17 years before HTML5 followed.

Granted MSFT had its own horse in the race with MSN, building out web properties and making key investments such as the acquisition of Hotmail. Some even achieved a modicum of success, such as the travel site Expedia which was spun out into a public company in 1999. But a clear consensus had emerged inside the company around the nature of software development. Applications accessed through a web browser were fine for “simple” tasks, characterized by limited functionality, with correspondingly low performance expectations: minimalist UI, laggy/unresponsive interface, only accessible with an Internet connection and even then constrained by the limits of broadband at the time. Anything more required native applications, installed locally and designed to target the Windows API. These were also called “rich clients” in a not-so-subtle dig at the implied inferiority of web applications.

Given that bifurcated mindset, it is no surprise the web browser became an afterthought in the early 2000s. IE had emerged triumphant from the first browser wars, while Netscape disappeared into the corporate bureaucracy of AOL following the acquisition. Mozilla Firefox was just starting to emerge phoenix-like from the open-sourced remains of the Navigator codebase, far from posing any threat to market share. The much-heralded Java applets in the browser that were going to restore parity with native applications failed to materialize. There were no web-based word processors or spreadsheets to compete against Office. In fact there seemed to be hardly any profitable applications on the web, with sites still trying to work out the economics of “free” services funded by increasingly annoying advertising.

Meanwhile MSFT itself had walked away from the antitrust trial mostly unscathed. After losing the initial round in federal court following a badly botched defense, the company handily won at the appellate court. In a scathing ruling the circuit court not only reversed the breakup order but found the trial judge to have engaged in unethical, biased conduct. Facing another trial under a new judge, the DOJ blinked and decided it was no longer seeking a structural remedy. The dramatic antitrust trial of the decade ended with a whimper: the parties agreed to a mild settlement that required MSFT to modify its licensing practices and better document its APIs for third-parties to develop interoperable software.

This outcome was widely panned by industry pundits as a minor slap on the wrist, raising concerns that it left the company free to continue engaging in the same pattern of anticompetitive behavior. In hindsight the trial did have an important consequence that was difficult to observe from the outside: it changed the rules of engagement within MSFT. Highly motivated to avoid another extended legal confrontation that would drag on share price and distract attention, leadership grew more cautious about pushing the envelope around business practices. It may have been too little too late for Netscape, but this shift in mindset meant that when the next credible challenger to IE materialized in the shape of Google Chrome, the browser was left to fend for itself, competing strictly on its own merits. There would be no help from the OS monopoly.

Second chances for the web

More than any other company, Google was responsible for revitalizing the web as a capable platform for rich applications. For much of the 2000s, it appeared that the battle for developer mindshare had settled into a stalemate: HTML and Javascript were good for basic applications (augmented by the ubiquitous Adobe Flash for extra pizzazz when necessary) but any heavy lifting— CPU-intensive computing, fancy graphics, interacting with peripheral devices— required a locally installed desktop application. Posting updates on social life and sharing hot-takes on recent events? Web browsers proved perfectly adequate for that. But if you planned to crunch numbers on a spreadsheet with complex formulas, touch up high-resolution pictures or hold a video conference, the consensus held that you needed “real” software written in a low-level language such as C/C++ and directly interfacing with the operating system API.

Google challenged that orthodoxy, seeking to move more applications to the cloud. It was Google continually pushing the limits of what existing browsers could do, often with surprising results. Gmail was an eye-opener for its responsive, fast UI as much as for the generous gigabyte of space every user received and the controversial revenue model driven by contextual advertising based on the content of emails. Google Maps— an acquisition, unlike the home-grown Gmail which had started out as one engineer’s side project— and later Street View proved that even high-resolution imagery overlaid with local search results could be delivered over existing browsers with a decent user experience. Google Docs and Spreadsheets (also acquisitions) were even more ambitious undertakings aimed at the enterprise segment cornered by MSFT Office until that point.

These were mere opening moves in the overall strategic plan: every application running in the cloud, accessed through a web browser. Standing in the way of that grand vision was the inadequacy of existing browsers. They were limited in principle by the modest capabilities of the standard HTML and Javascript APIs defined at the time, without venturing into proprietary, platform-dependent extensions such as Flash, Silverlight and ActiveX. They were hamstrung in practice even further thanks to the mediocre implementation of those capabilities by the dominant browser of the time, namely Internet Explorer. What good would innovative cloud applications do when users had to access them through a buggy, slow browser riddled with software vulnerabilities? (There is no small measure of irony that the 2009 “Aurora” breach of Google by a Chinese APT started out with an IE6 SP2 zero-day vulnerability.)

Google was quick to recognize the web browser as a vital component of its business strategy, in much the same way MSFT correctly perceived the danger Netscape posed. Initially Google put its weight behind Mozilla Firefox. The search deal to become the default engine for Firefox (realistically, did anyone want Bing?) provided much of the revenue for the fledgling browser early on. While swearing by the benefits of having an open-source alternative to the sclerotic IE, Google would soon realize that a development model driven by democratic consensus came with one undesirable downside: despite being a major source of revenue for Firefox, it could exert only so much influence over the product roadmap. For Google, controlling its own fate made it all but inevitable that it would embark on its own browser project.

Browser wars 2.0

Chrome was the ultimate Trojan horse for advancing the Google strategy: wrapped in the mantle of “open source” without any of the checks-and-balances of an outside developer community to decide which features are prioritized (a tactic that Android would soon come to perfect in the even more cut-throat setting of mobile platforms). That lack of constraints allowed Google to move quickly and decisively on the main objective: advancing the web platform. Simply shipping a faster and safer browser would not have been enough to achieve parity with desktop applications. HTML and Javascript themselves had to evolve.

More than anything else, Chrome gave Google a seat at the table for standardization of future web technologies. While work on HTML5 had started in 2004 at the instigation of Firefox and Opera representatives, it was not until Chrome reignited the browser wars that bits and pieces of the specification began to find their way into working code. Crucially, the presence of a viable alternative to IE meant standardization efforts were no longer an academic exercise. The finished output of W3C working groups is called a “recommendation.” There is no false modesty in that terminology, because at the end of the day the W3C has no authority or even indirect influence to compel browser publishers to implement anything. In a world where most users are running an outdated version of IE (with most desktops stuck on IE6 SP2 or IE7), the W3C can keep cranking out enhancements to HTML5 on paper without delivering any tangible benefit to users. It is difficult enough to incentivize websites to take advantage of new features. The path of least resistance already dictates coding for the lowest common denominator. Suppose some website crucially depends on a browser feature missing for the 10% of visitors running an ancient version of IE. Whether because they do not care enough to upgrade, or perhaps can not upgrade (as with enterprise users at the mercy of their IT department for software choices), these users will be shut out of the website, representing a lost revenue opportunity. By contrast a competitor with more modest requirements of their customers’ software, or alternatively a more ambitious development mindset dedicated to backwards compatibility, will have no problem monetizing that segment.

The credibility of a web browser backed by the might of Google shifted that calculus. Observing the clear trend of Chrome and Firefox capturing market share from IE (and crucially, the declining share of legacy IE versions) made it easier to justify building new applications for a modern web incorporating the latest and greatest from the W3C drawing board: canvas, web-sockets, RTC, offline mode, drag & drop, web storage… It no longer seemed like questionable business judgment to bet on that trend and build novel applications assuming a target audience with modern browsers. In 2009 YouTube engineers snuck in a banner threatening to cut off support for IE, careful to stay under the radar lest their new overlords at Google object to this protest. By 2012 the tide had turned to the point that an Australian retailer began imposing a surcharge on IE7 users to offset the cost of catering to their ancient browser.

While the second round of the browser wars is not quite over, some conclusions are obvious. Google Chrome has a decisive lead over all other browsers, especially in the desktop market. Firefox’s share is declining, creating doubts about the future of the only independent open-source web browser that can claim the mantle of representing users as stakeholders. As for MSFT, despite getting its act together and investing in auto-update functionality to avoid getting stuck with another case of the “IE6 installed-base legacy” problem, Internet Explorer versions steadily lost market share during the 2010s. Technology publications cheered on every milestone, such as the demise of IE6 and the “flipping” point when Google Chrome reached 50%. Eventually Redmond gave up and decided to start over with a new browser altogether dubbed “Edge,” premised on a full rewrite instead of incremental tweaks. That has not fared much better either. After triumphantly unveiling a new HTML rendering engine to replace IE’s “Trident,” MSFT quickly threw in the towel, announcing that it would adopt Blink— the engine from Chrome. (Inasmuch as the MSFT of the 1990s was irrationally combative in its rejection of technology not invented in Redmond, its current incarnation has no qualms admitting defeat and making pragmatic business decisions to leverage competing platforms.) Despite multiple legal skirmishes with EU regulators over its ads and browser monopoly, there are no credible challengers to Chrome on the desktop today. When it comes to market power, Google Chrome is the new IE.

The browser disruption in hindsight

Did MSFT overreact to the Netscape Navigator threat and kneecap itself by inviting a regulatory showdown through its aggressive business tactics? Subsequent history vindicates the company leadership in correctly judging the disruption potential, but not necessarily the response. It turned out the browser was indeed a critical piece of software— it literally became the window through which users experience the infinite variety of content and applications beyond the narrow confines of their local device. Platform-agnostic and outside the control of companies providing the hardware/software powering that local device, it was an escape hatch out of the “Wintel” duopoly. Winning the battle against Netscape defused that immediate threat for MSFT. Windows did not become “a set of poorly debugged device drivers” as Netscape’s Marc Andreessen had once quipped.

An expansive take on “operating system”

MSFT was ahead of its time in another respect: browsers are now considered an intrinsic component of the operating system, a building block for other applications to leverage. Today a consumer OS shipping without some rudimentary browser out-of-the-box would be an anomaly. To pick two examples comparable to Windows:

  • MacOS includes Safari starting with the Panther release in 2003.
  • Ubuntu desktop releases come with Firefox as the default browser.

On the mobile front, browser bundling is not only standard but pervasive in its reach:

  • iOS not only ships a mobile version of Safari, but the WebKit rendering engine is tightly integrated into the operating system as the mandatory embedded browser to be leveraged by all other apps that intend to display web content. In fact until recently Apple forbade shipping any alternative browser that is not built on WebKit. The version of “Chrome” for iOS is nothing more than a glossy paint-job over the same internals powering Safari. Crucially, Apple can enforce this policy. Unlike desktop platforms with their open ecosystem where users are free to source software from anywhere, mobile devices are closed appliances. Apple exerts 100% control over software distribution for iOS.
  • Android releases have included Google Chrome since 2012. Unlike iOS, Google places no restrictions on alternative browsers as independent applications. However, embedded web views in Android are still based on the Chrome rendering engine.

During the antitrust trial, some astute observers pointed out that only a few years earlier even the most rudimentary networking functionality— namely the all-important TCP/IP stack— was an optional component in Windows. Today it is not only the web browser that has become table stakes. Here are three examples of functionality once considered strictly distinct lines of business from providing an operating system:

  1. Productivity suites: MacOS comes with Pages for word processing, Numbers for spreadsheets and Keynote for crafting slide-decks. Similarly many Linux distributions include the LibreOffice suite, which provides open-source replacements for Word, Excel, PowerPoint etc. (This is a line even MSFT did not cross: to this day no version of Windows includes a copy of the “Office suite” understood as a set of native applications.)
  2. Video conferencing and real-time collaboration: Again, each vendor has been putting forward its preferred solution, with Google including Meet (previously Hangouts), Apple promoting FaceTime and MSFT pivoting to Teams after giving up on Skype.
  3. Cloud storage: To pick an example where the integration runs much deeper, Apple devices have seamless access to iCloud storage, while Android & ChromeOS are tightly coupled to Google Drive for backups. Once the raison d’être of unicorn startups Dropbox and Box, this functionality has been steadily incorporated into the operating system, casting doubt on the commercial prospects of those public companies. Even MSFT has not shied away from integrating its competing OneDrive service with Windows.

There are multiple reasons why these examples raise few eyebrows from the antitrust camp. In some cases the applications are copycats or also-rans: Apple’s productivity suite can interoperate with MSFT Office formats (owing in large part to the EU consent decree that forced MSFT to start documenting its proprietary formats) but still remains a pale imitation of the real thing. In other cases, the added functionality is not considered a strategic platform or has little impact on the competitive landscape. FaceTime is strictly a consumer-oriented product that has no bearing on the lucrative enterprise market. While Teams and Meet have commercial aspirations, they face strong headwinds competing against established players such as Zoom and WebEx specializing in this space. No one is arguing that Zoom is somehow disadvantaged on Android because it has to be installed as a separate application from the Play Store. But even when integration obviously favors an adjacent business unit— as in the case of mobile platforms creating entrenched dependencies on the cloud storage offering from the same company— there is a growing recognition that the definition of an “operating system” is subject to expansion. Actions that once may have been portrayed as leveraging a platform monopoly to take over another market— Apple & Google rendering Dropbox irrelevant— become the natural outcome of evolving customer expectations.

Safari on iOS may look like a separate application with its own icon, but it is also the underlying software that powers embedded “web views” for all other iOS apps when those apps are displaying web content inside their interface. Google Chrome provides a similar function for Android apps by default. No one in their right mind would resurrect the DOJ argument of the 1990s that a browser is an entirely separate piece of functionality and weaving it into the OS is an arbitrary marketing choice without engineering merits. (Of course that still leaves open the question of whether that built-in component should be swappable and/or extensible. Much like authentication or cryptography capabilities for modern platforms have an extensibility mechanism to replace default, out-of-the-box software with alternatives, it is fair to insist that the platform allow substituting a replacement browser designated by the consumer.) Google turned the whole model upside down with Chromebooks, building an entire operating system around a web browser.

All hail the new browser monopoly

Control over the browser temporarily handed MSFT significant leeway over the future direction of the web platform. If that platform remained surprisingly stagnant afterwards— compared to its frantic pace of innovation during the 1990s— that was mainly because MSFT had neither the urgency nor the vision to take it to the next level. (Witness the smart tags debacle.) Meanwhile the W3C ran around in circles, alternating between incremental tweaks— introducing XHTML, HTML repackaged as well-formed XML— and ambitious visions of a “semantic web.” The latter imagined a clean separation of content from style, two distinct layers that HTML munged together, making it possible for software to extract information, process it and combine it in novel ways for the benefit of users. Outside the W3C there were few takers. Critics derided it as angle-brackets-everywhere: XSLT, XPath, XQuery, XLink. The semantic web never got the large-scale demonstration it deserved to test its premise. For a user sitting in front of their browser and accessing websites, it would have been difficult to articulate the immediate benefits. Over time Google and ChatGPT would prove machines were more than adequate at grokking unstructured information on web pages even without the benefit of XML tagging.

Luckily for the web, plenty of startups did have more compelling visions of how the web should work and what future possibilities could be realized— given the right capabilities. This dovetailed nicely with the shift in emphasis from shipping software to operating services. (It certainly helped that the economics were favorable. Instead of selling a piece of software once for a lump sum and hoping the customer upgrades when the next version comes out, what if you could count on a recurring source of revenue from monthly subscriptions?) The common refrain for all of these entrepreneurs: the web browser had become the bottleneck. PCs kept getting faster and even operating systems became more capable over time, but websites could only access a tiny fraction of those resources through HTML and Javascript APIs, and only through a notoriously buggy, fragile implementation held together by duct-tape: Internet Explorer.

In hindsight it is clear something had to change; there was too much market pressure against a decrepit piece of software guarding an increasingly untenable OS monopoly. Surprisingly that change came in the form of not one but two major developments in the 2010s. One shift had nothing to do with browsers: smart-phones gave developers a compelling new way to reach users. It was a clean slate, with powerful new APIs unconstrained by the web platform. MSFT did not have a credible response to the rise of iOS and Android any more than it did to Chrome. Windows Mobile never made many inroads with device manufacturers, despite or perhaps because of the Nokia acquisition. It had even less success winning over developers, failing to complete the virtuous cycle between supply & demand that drives platforms. (At one point a desperate MSFT started outright offering money to publishers of popular apps to port their iOS & Android apps to Windows Mobile.)

Perhaps the strongest evidence that MSFT judged the risk accurately comes from Google Chrome itself. Where MSFT saw a one-sided threat to the Windows and Office revenue streams, Google perceived a balanced mix of opportunity and risk. The “right” browser could accelerate the shift to replace local software with web applications— such as the Google Apps suite— by closing the perceived functionality gap between them. The “wrong” browser would continue to frustrate that shift, or even push the web towards another dead-end proprietary model tightly coupled to one competitor. Continued investment in Chrome is how the odds get tilted towards the first outcome. Having watched MSFT squander its browser monopoly with years of neglect, Google knows better than to rest on its laurels.

CP

The elusive nature of ownership in Web3

A critical take on Read-Write-Own

In the recently published “Read Write Own,” Chris Dixon makes the case that blockchains allow consumers to capture more of the returns from the value generated in a network because of strongly enshrined rules of ownership. This is an argument about fairness: the value of networks is derived from the contributions of participants. Whether it is Facebook users sharing updates with their network or Twitter/X influencers opining on the latest trends, it is Metcalfe’s law that allows these systems to become so valuable. But as the history of social networks has demonstrated time and again, that value accrues to a handful of employees and investors who control the company. Not only do customers not capture any of those returns (hence the often-used analogy of “sharecroppers” working Facebook’s land), they are also stuck with the negative externalities, including degraded privacy, disinformation and, in the case of Facebook, repercussions that spill over into the real world, including outbreaks of violence.

The linchpin of this argument is that blockchains can guarantee ownership in ways that the two prevailing alternatives (“protocol networks” such as SMTP or HTTP and the better-known “corporate networks” such as Twitter) can not. Twitter can take away any handle, shadow-ban the account or modify its ranking algorithms to reduce its distribution. By comparison, if you own a virtual good such as an NFT issued on a blockchain, no one can interfere with your rightful ownership of that asset. This blog post delves into some counterarguments on why this sense of ownership may prove illusory in most cases. The arguments will run from the least likely and most theoretical to the most probable, in each case demonstrating ways these vaunted property rights can fail.

Immutability of blockchains

The first shibboleth that we can dispense with is the idea that blockchains operate according to immutable rules cast in stone. An early dramatic illustration of this came about in 2016, as a result of the DAO attack on Ethereum. The DAO was effectively a joint investment project operated by a smart-contract on the Ethereum chain. Unfortunately that contract had a serious bug, resulting in a critical security vulnerability. An attacker exploited that vulnerability to drain most of the funds, to the tune of $150MM USD notional at the time.

This left the Ethereum project with a difficult choice. They could double down on the doctrine that Code-Is-Law and let the theft stand: argue that the “attacker” did nothing wrong, since they used the contract in exactly the way it was implemented. (Incidentally, that is a mischaracterization of the way Larry Lessig intended the phrase. “Code and Other Laws of Cyberspace,” where the phrase originates, was prescient in warning about the dangers of allowing privately developed software, or “West Coast Code” as Lessig termed it, to usurp democratically created laws, or “East Coast Code,” in regulating behavior.) Or they could orchestrate a difficult, disruptive hard-fork to change the rules governing the blockchain and rewrite history to pretend the DAO breach never occurred. This option would return the stolen funds to investors.

Without reopening the charged debate around which option was “correct” from an ideological perspective, we note the Ethereum Foundation emphatically took the second route. From the attacker’s perspective, their “ownership” of stolen ether proved very short-lived.

While this episode demonstrated the limits of blockchain immutability, it is also the least relevant to the sense of property rights that most users are concerned about. Despite fears that the DAO rescue could set a precedent and force the Ethereum Foundation to repeatedly bail out vulnerable projects, no such hard-forks followed. Over the years much larger security failures occurred on Ethereum (measured in notional dollar value), with the majority attributed with high confidence to rogue states such as North Korea. None of them merited so much as a serious discussion of whether another hard-fork was justified to undo the theft and restore the funds to rightful owners. If hundreds of millions of dollars in tokens ending up in the coffers of a sanctioned state does not warrant breaking blockchain immutability, it is fair to say the average NFT holder has little reason to fear that some property dispute will result in a blockchain-scale reorganization that takes away their pixelated monkey images.

Smart-contract design: backdoors and compliance “features”

Much more relevant to the threat model of a typical participant is the way virtual assets are managed on-chain: using smart-contracts that are developed by private companies and often subject to private control. Limiting our focus to Ethereum for now, recall that the only “native” asset on chain is ether. All other assets, such as fungible ERC-20 tokens and collectible NFTs, must be defined by smart contracts, in other words software that someone authors. Those contracts govern the operation of the asset: conditions under which it can be “minted”— in other words, created out of thin air— transferred or destroyed. To take a concrete example: a stablecoin such as Circle’s USDC is designed to be pegged 1:1 to the US dollar. More USDC is issued on chain when Circle the company receives fiat deposits from a counterparty requesting virtual assets. Similarly USDC must be taken out of circulation or “burned” when a counterparty returns their virtual dollars and demands ordinary dollars back in a bank account.

None of this is surprising. As long as the contract properly enforces rules around who can invoke those actions on chain, this is exactly how one would expect a stablecoin to operate. (There is a separate question around whether the 1:1 backing is maintained, but that can only be resolved by off-chain audits. It is outside the scope of enforcement by blockchain rules.) Less appreciated is the fact that most stablecoin contracts also grant the operator the ability to freeze funds or even seize assets from any participant. This is not a hypothetical capability; issuers have not shied away from using it when necessary. To pick two examples:

  • Circle blacklisted USDC addresses associated with the Tornado Cash mixer after OFAC sanctioned the service in 2022, freezing the funds held at those addresses.
  • Tether has repeatedly frozen USDT at addresses linked to hacks, scams and sanctioned entities, typically at the request of law enforcement.

While the existence of such a “backdoor” or “God mode” may sound sinister in general, these specific interventions are hardly objectionable. But they serve to illustrate the general point: even if blockchains themselves are immutable and arbitrary hard-forks a relic of the past, virtual assets are governed not by “native” rules ordained by the blockchain, but by independent software authored by the entity originating that asset. That code can include arbitrary logic granting the issuer any right they wish to reserve.
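
A schematic model makes the point concrete. The sketch below is ordinary Python, not any issuer's actual contract; it simply shows how an administrative override can sit right next to the ordinary transfer logic in the same piece of code.

    # Schematic token ledger with issuer-only overrides; illustrative Python only.
    class Token:
        def __init__(self, issuer):
            self.issuer = issuer
            self.balances = {}
            self.frozen = set()

        def transfer(self, sender, recipient, amount):
            """Ordinary transfer path available to every holder."""
            if sender in self.frozen or recipient in self.frozen:
                raise PermissionError("address is frozen")
            if self.balances.get(sender, 0) < amount:
                raise ValueError("insufficient balance")
            self.balances[sender] -= amount
            self.balances[recipient] = self.balances.get(recipient, 0) + amount

        def freeze(self, caller, address):
            """Issuer-only override: the 'backdoor' discussed above."""
            if caller != self.issuer:
                raise PermissionError("only the issuer can freeze")
            self.frozen.add(address)

        def seize(self, caller, address):
            """Issuer-only override: move a holder's entire balance to the issuer."""
            if caller != self.issuer:
                raise PermissionError("only the issuer can seize")
            amount = self.balances.pop(address, 0)
            self.balances[self.issuer] = self.balances.get(self.issuer, 0) + amount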

To be clear, that logic is visible on-chain for anyone to view. Most prominent smart-contracts today have their source code published for inspection. (For example, here is the Circle USD contract.) Even if the contract did not disclose its source code, the logic could be reverse-engineered from the low-level EVM bytecode available on chain. In that sense there should be no “surprises” about whether an issuer can seize an NFT or refuse to honor a transfer privately agreed upon by two parties. One could argue that users will not purchase virtual assets from issuers who grant themselves such broad privileges to override property rights by virtue of their contract logic. But that is a question of market power and whether any meaningful alternative exists for consumers who want to vote with their wallet. It may well become the norm that all virtual assets are subject to permanent control by the issuer, something users accept without a second thought, much like the terms-of-use agreements one clicks through without hesitation when registering for advertising-supported services. The precedent with stablecoins is not encouraging: Tether and Circle are by far the two largest stablecoins by market capitalization. The existence of administrative overrides in their code was no secret. Even multiple invocations of that power have not resulted in a mass exodus of customers into alternative stablecoins.

When ownership rights can be ignored

Let’s posit that popular virtual assets will be managed by “fair” smart-contracts without designed-in backdoors that would enable infringement of ownership rights. This brings us to the most intractable problem: real-world systems are not bound by ownership rights expressed on the blockchain.

Consider the prototypical example of ownership that proponents argue can benefit from blockchains: in-game virtual goods. Suppose your game character has earned a magical sword after significant time spent completing challenges. In most games today, your ownership of that virtual sword is recorded as an entry in the internal database of the game studio, subject to their whims. You may be allowed to trade it, but only on a sanctioned platform most likely affiliated with the same studio. The studio could confiscate that item because you were overdue on payments or unwittingly violated some other rule in the virtual universe. They could even make the item “disappear” one day if they decide there are too many of these swords or they grant an unfair advantage. If that virtual sword were instead represented by an NFT on chain, the argument runs, the game studio would be constrained from these types of capricious actions. You could even take the same item to another gaming universe created by a different publisher.

On the face of it, this argument looks sound, subject to the caveats about the smart-contract not having backdoors. But it is a case of confusing the map with the territory. There is no need for the game publisher to tamper with on-chain state in order to manipulate property rights; nothing prevents the game software from ignoring on-chain state. On-chain state could very well reflect that you are the rightful owner of that sword while in-game logic refuses to render your character holding that object. The game software is not running on the blockchain or in any way constrained by the Ethereum network or even the smart-contract managing virtual goods. It is running on servers controlled by a single company— the game studio. That software may, at its discretion, consult the Ethereum blockchain to check on ownership assignments. That is not the same as being constrained by on-chain state. Just because the blockchain ledger indicates you are the rightful owner of a sword or avatar does not automatically force the game rendering software to depict your character with those attributes in the game universe. In fact the publisher may deliberately depart from on-chain state for good reasons. Suppose an investigation determines that Bob bought that virtual sword from someone who stole it from Alice. Or there have been multiple complaints about a user-designed avatar being offensive and violating community standards. Few would object to the game universe being rendered in a way that is inconsistent with on-chain ownership records under these circumstances. Yet the general principle stands: users are still subject to the judgment of one centralized entity on when it is “fair game” to ignore blockchain state and operate as if that virtual asset did not exist.
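
A hypothetical sketch of the game server's rendering decision illustrates the gap between consulting the chain and being bound by it; the ledger lookup and the studio denylist here are invented names used purely for illustration.

    # Hypothetical game-server logic; all names are invented for illustration.
    ONCHAIN_LEDGER = {"sword-1337": "0xalice"}   # pretend on-chain ownership records

    def fetch_onchain_owner(item_id):
        """Stand-in for querying the NFT contract over RPC."""
        return ONCHAIN_LEDGER.get(item_id)

    STUDIO_DENYLIST = {"sword-1337"}             # items the studio refuses to honor

    def should_render(player_address, item_id):
        owner = fetch_onchain_owner(item_id)
        if owner != player_address:
            return False     # the chain says the player does not own it
        if item_id in STUDIO_DENYLIST:
            return False     # the chain says yes, but the studio says no
        return True          # only now does the item appear in the game world

Whether the denylist is ever justified is a policy question; the technical point is simply that the final branch belongs to the studio, not the chain.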

Case of the disappearing NFT

An instructive case of “pretend-it-does-not-exist” took place in 2021 when Moxie Marlinspike created a proof-of-concept NFT that renders differently depending on which website it is viewed from. Moxie listed the NFT on OpenSea, at the time the leading marketplace for trading NFTs. While it was intended in good spirit as a humorous demonstration of the mutability and transience of NFTs, OpenSea was not amused. Not only did they take down the listing, but the NFT was removed from the results returned by the OpenSea API. As it turns out, a lot of websites rely on that API for NFT inventories. Once OpenSea ghosted the NFT from its API, it was as if the NFT did not exist. To be clear: OpenSea did not and could not make any changes to blockchain state. The NFT was still there on-chain and Moxie was its rightful owner as far as the Ethereum network is concerned. But once the OpenSea API started returning alternative facts, the NFT vanished from view for every other service relying on that API instead of directly inspecting the blockchain themselves. (It turns out there were a lot of them, further reinforcing Moxie’s critique of the extent of centralization.)

Suppose customers disagree with the policy of the game studio. What recourse do they have? Not much within that particular game universe, any more than the average user has leverage with Twitter or Facebook in reversing their trust & safety decisions. Users can certainly try to take the same item to another game, but there are limits to portability. While blockchain state is universal, game universes are not. The magic sword from a medieval setting will not do much good in a Call of Duty title set in WW2.

In that sense, owners of virtual game assets are in a more difficult situation than Moxie with his problematic NFT. OpenSea can disregard that NFT but can not preclude listing it on competing marketplaces or even arranging private sales to a willing buyer who values it on its collectible or artistic merits. It would be the exact same situation if OpenSea for some bizarre reason came to insist that you do not own a bitcoin that you rightfully own on the blockchain. OpenSea persisting in such a delusion would not detract in any way from the value of your bitcoin. Plenty of sensible buyers exist elsewhere who can form an independent judgment about blockchain state and accept that bitcoin in exchange for services. But when the value of a virtual asset is determined primarily by its function within a single ecosystem— namely that of the game universe controlled by a centralized publisher— what those independent observers think about ownership status carries little weight.

CP

We can bill you: antagonistic gadgets and dystopian visions of Philip K Dick

Dystopian visions

Acclaimed science-fiction author Isaac Asimov’s stories on robots involved a set of three rules that all robots were expected to obey:

  1. A robot may not injure a human being or, through inaction, allow a human being to come to harm.
  2. A robot must obey orders given it by human beings except where such orders would conflict with the First Law.
  3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.

The prioritization leaves no ambiguity in the relationship between robots and their creators. Regardless of their level of artificial intelligence and autonomy, robots were to avoid harm to human beings. Long before sentient robots running amok and turning against their creators became a staple of science-fiction (Mary Shelley’s 19th-century novel “Frankenstein” could be seen as their predecessor) Asimov was systematically formulating the intended relationship. Ethical implications of artificial intelligence are a recurring theme today. Can our own creations end up undermining humanity after they achieve sentience? But there are far more subtle and less imaginative ways that technology works against people in everyday settings, with no emergent AI to blame. This version too was predicted by science-fiction.

Three decades after Asimov, the dystopian imagination of Philip K Dick produced a more conflicted relationship between man and his creations. In the 1969 novel “Ubik”, the protagonist inhabits a world where advanced technology controls basic household functions from kitchen appliances to locks on the door. But there is a twist: all of these gadgets operate on what today would be called a subscription model. The coffee-maker refuses to brew the morning cup of joe until coins are inserted. (For all the richness and wild imagination of his alternate realities, PKD did not bother devising an alternative payment system for this future world.) When he runs out of money, he is held hostage at home; his front-door will not open without coins.

Compared to some of the more fanciful alternate universes brought to life in PKD fiction— Germany winning World War II in “The Man in the High Castle” or an omniscient police-state preemptively arresting criminals before they commit crimes as in “The Minority Report”— this level of dystopia is mild, almost benign. But it is also one that bears a striking resemblance to where the tech industry is steadily marching. Consumers are increasingly losing control over their devices— devices which they have fully and rightfully paid for. Not only is the question of ownership being challenged with increasing restrictions on what they can do to hardware they have every right to expect 100% control over, but those devices are actively working against the interests of the consumer, doing the bidding of third parties, be it the manufacturer, the service-provider or possibly the government.

License to tinker

There is a long tradition in American culture of hobbyists tinkering with their gadgets. This predates the Internet or the personal computer. Perhaps automobiles and motorcycles were the first technology that lent itself to mass tinkering. Mass production made cars accessible to everyone, and for those with a knack for spending long hours in the garage, they were relatively easy to modify, for different aesthetics or better performance under the hood. In one sense the hot-rodders of the 1950s and 1960s were cultural predecessors of today’s software hobbyists. Cars at the time were relatively low-tech; with carburetors and manual transmissions being the norm, spending a few months in high-school shop class would provide adequate background. But more importantly the platform was tinkering-friendly. Manufacturers did not go out of their way to prevent buyers from modifying their hardware. That is partly related to technical limitations. It is not as if cars could be equipped with tamper-detection sensors to immobilize the vehicle if the owner installed parts the manufacturer did not approve of. But more importantly, ease of customization was itself considered a competitive advantage. In fact some of the most cherished vehicles of the 20th century, including muscle-cars, V-twin motorcycles and air-cooled Volkswagens, owed part of their iconic status to a vibrant aftermarket for mods.

Natural limits existed on how far owners could modify their vehicle. To drive on public roads, it had to be road-legal after all. One could install a different exhaust system to improve engine sound, but not have flames shooting out the back. More subtly, an economic disincentive existed. Owners risked giving up warranty coverage for modified parts, a significant consideration given that Detroit was not exactly known for high-quality, low-defect manufacturing at the time. But even that setback was localized. Replace the stereo or rewire the speakers yourself, and you could no longer complain about electrical system malfunctions. But you would still expect the transmission to operate as advertised and the manufacturer to continue honoring any warranty coverage for the drivetrain. There was no warning sticker anywhere that loosening this or that bolt would void the entire warranty on every other part of the vehicle. Crucially, consumers were given a meaningful choice: you are free to modify the car for personal expression in exchange for giving up warranty claims against the manufacturer.

From honor code to software enforcement

Cars from the golden era of hot-rodding were relatively dumb gadgets. Part of the reason manufacturers did not have much of a say in how owners could modify their vehicle is that they had no feasible technology to enforce those restrictions once the proud new owner drove it off the lot. By contrast, software can enforce very specific restrictions on how a particular system operates. In fact it can impose entirely arbitrary limitations to disallow specific uses of the hardware, even when the hardware itself is perfectly capable of performing those functions.

Here is an example. In the early days, Windows NT 3.51 had two editions: workstation and server, differentiated by the type of scenario they were intended for. The high-end server SKU supported machines with up to 8 processors while the workstation maxed out at 2. If you happened to have more powerful hardware, even if you did not need any of the bells-and-whistles of the server edition, you had to spring for the more expensive product. (Note: there is a significant difference between uniprocessor and multiprocessor kernels; juggling multiple CPUs requires substantial changes, but going from 2 to 8 processors does not.) What was the major difference between those editions? From an economic perspective, $800 measured in 1996 dollars. From a technology perspective, a handful of bytes in a registry key describing which type of installation occurred. As noted in a 1996 article titled Differences Between NT Server and Workstation Are Minimal:

“We have found that NTS and NTW have identical kernels; in fact, NT is a single operating system with two modes. Only two registry settings are needed to switch between these two modes in NT 4.0, and only one setting in NT 3.51. This is extremely significant, and calls into question the related legal limitations and costly upgrades that currently face NTW users.”

There is no intrinsic technical reason why the lower-priced edition could not take advantage of more powerful hardware, or for that matter, allow more than 10 concurrent connections to function as a web server— as Microsoft later relented after customer backlash. These are arbitrary calls made by someone on the sales team who, in their infinite wisdom, concluded that customers with expensive hardware or web-sites ought to pay more for their operating system.
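
For the curious, this distinction is still visible today. A minimal sketch, assuming the well-known ProductType value under the ProductOptions key (which still exists on current Windows releases), using Python’s standard winreg module:

```python
# Sketch: read the registry value that historically distinguished the
# workstation and server editions of Windows NT. Windows only.
import winreg

KEY_PATH = r"SYSTEM\CurrentControlSet\Control\ProductOptions"

with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
    product_type, _ = winreg.QueryValueEx(key, "ProductType")

# "WinNT" identifies a workstation/client install; "ServerNT" and "LanmanNT"
# identify server editions.
print(f"This installation identifies itself as: {product_type}")
```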

Two tangents worth exploring about this case. First, the proprietary nature of the software and its licensing model is crucial for enforcing these types of policies. Arbitrary restrictions would not fly with open-source software. If a clueless vendor shipped a version of Linux with a random limit on the number of CPUs or memory that does not originate from technical limitations, customers could modify the source code to lift that restriction. Second, the ability to enforce draconian restrictions dreamt up by marketing is greatly constrained when the platform itself is open. The personal computer is such a platform: even with a proprietary operating system such as Windows, users get full control over their machine. You could edit the registry or tamper with OS logic to trigger an identity crisis between workstation and server. Granted, that would be an almost certain violation of the shrink-wrap license nobody read when installing the OS. MSFT would not look kindly upon this practice if carried out at scale. It’s one thing for hobbyists to demonstrate the possibility as a symbolic gesture; it is another level of malicious intent for an enterprise with thousands of Windows licenses to engage in systematic software piracy by giving themselves a free upgrade. So at the end of the day, enforcement still relied on messy social norms and imperfect contractual obligations. Software did not aspire to replace the conscience of the consumer, to stop them from perceived wrongdoing at all costs.

Insert quarters to continue

In fact, software licensing in the enterprise has a history of such arbitrary restrictions, enforced through a combination of business logic implemented in proprietary software and dubious reliance on overarching “terms of use” that discourage tampering with said logic. To this day copies of Windows Server are sold with client-access licenses, dictating the number of concurrent users that the server is willing to support. If the system is licensed for 10 clients, the eleventh user attempting to connect will be turned away regardless of how much spare CPU or memory capacity is left. You must purchase more licenses. In other words: insert quarters to continue.
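
The enforcement itself is almost trivially simple. A hypothetical sketch of such a gate (the names and the limit are invented for illustration) amounts to little more than a counter:

```python
# Hypothetical sketch of a client-license gate. The cap is a licensing
# construct; it bears no relation to the CPU or memory actually available.
LICENSED_CLIENTS = 10              # purchased client licenses (illustrative)
active_sessions: set[str] = set()  # currently connected clients

def accept_connection(client_id: str) -> bool:
    """Admit the client only if a license slot is free."""
    if len(active_sessions) >= LICENSED_CLIENTS:
        return False               # insert quarters to continue
    active_sessions.add(client_id)
    return True

def disconnect(client_id: str) -> None:
    active_sessions.discard(client_id)
```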

Yet this looks very different from Philip K Dick’s dystopian coffeemaker and does not elicit anywhere near the same level of indignation. There are several reasons for that. First, enterprise software has acclimatized to the notion of discriminatory pricing. Vendors extract higher prices from companies that are in a position to pay. The reasoning goes: if you can afford that fancy server with a dozen CPUs and boatloads of memory, surely you can also spring for the high-end edition of Windows Server that will agree to fully utilize the hardware? Second, the complex negotiations around software licensing are rarely surfaced to end-users. It is the responsibility of the IT department to work out how many licenses are required and determine the right mix of hardware and software to support the business. If an employee is unable to perform her job because she is turned away by a server that has reached its cap on simultaneous users—an arbitrary limit that exists only in the realm of software licensing, it must be noted, not in the resources available in the underlying hardware— she is not expected to solve that problem by taking out her credit-card and personally paying for the additional license. Finally, this scenario is removed from everyday considerations. Not everyone works in a large enterprise subject to stringent licensing rules, and even for those who are unlucky enough to run into this situation, the inconvenience created by an uncooperative server is relatively mild, a far cry from the front-door that refuses to open and locks its occupants inside.

From open platforms to appliances

One of the more disconcerting trends of the past decade is that what used to be the norm in the enterprise segment is now trickling down into the consumer space. We may not have coffeemakers operating on a subscription model yet. Doors that demand payment for performing their basic function would likely never pass fire-code regulations. But gradually consumer electronics have started imposing greater restrictions on their alleged owners, restrictions that are equally arbitrary and disconnected from the capabilities of the hardware, chosen unilaterally by their manufacturers. Consider some examples from consumer electronics:

  • Region coding in DVD players. DVD players are designed to play only content manufactured for a specific region, even though in principle there is nothing that prevents the hardware from playing discs purchased anywhere in the world. Why? Because of disparities in purchasing power, DVDs are priced much lower in developing regions than they are in Western countries. If DVD players sold to American consumers could play content from any region, it would suddenly become possible to “arbitrage” this price difference by purchasing cheap DVDs in, say, Taiwan and playing them in the US. Region coding protects the revenue model of content providers, which depends crucially on price discrimination: charging US consumers more for the same content because they can afford to pay higher prices for movies.
  • Generalizing from the state of DVD players, any Digital Rights Management (or, as it has been derisively called, “digital restrictions management”) technology is an attempt to hamper the capabilities of software/hardware platforms to further the interests of content owners. While the rest of the software industry is focused on doing more with existing resources— squeeze more performance out of the CPU, add more features to an application that users will enjoy— those working on DRM are trying to get devices to do less. Information is inherently copyable; DRM tries to stop users from copying bits. By default audio and video signals can be freely sent to any output device; HDCP (layered on top of HDMI) tries to restrict where they can be routed in the name of battling piracy. The restrictions do not even stop with anything involving content. Because the PC platform is inherently open, DRM enforcement inevitably takes an expansive view of its mission and begins to monitor the user for any signs of perfectly legitimate activity that could potentially undermine DRM, such as installing unsigned device drivers or enabling kernel-mode debugging on Windows.
  • Many cell phones sold in North America are “locked” to a specific carrier, typically the one the customer bought the phone from. It is not possible to switch to another wireless carrier while keeping the device. Again there is no technical reason for this. Much like the number of processors that an operating system will agree to run on, it is an arbitrary setting. (In fact it takes more work to implement such checks.) The standard excuse is that the cost of the device is heavily subsidized by the carrier, with the subsidy hidden in the service contract. But this argument fails basic sanity checks. Presumably the subsidy is paid off after some number of months, yet phones remain locked. Meanwhile customers who bring their own unlocked device are not rewarded with any special discounts, effectively distorting the market. Carriers also already charge an early-termination fee to customers who walk away from their contract prematurely; surely that fee could be adjusted to cover the lost subsidy?
  • Speaking of cell phones, they are increasingly becoming locked-down appliances, to use the terminology from Zittrain’s “The Future of the Internet,” instead of open computing platforms. Virtually all PCs allow users to replace the operating system. Not a fan of Windows 8? Feel free to wipe the slate clean and install Linux. Today consumers can even purchase PCs preloaded with Linux to escape the dreaded “Microsoft tax,” where the cost of Windows licenses is implicitly factored into hardware prices. And if the idea of Linux-on-the-desktop turns out to be wishful thinking yet again, you can repent and install Windows 10 on that PC which came with Ubuntu out of the box. By contrast phones ship with one operating system picked by the all-knowing manufacturer and it is very difficult to change that. On the surface, consumers have plenty of choice because they can pick from thousands of apps written for that operating system. Yet one level below that, they are stuck with the operating system as an immutable choice. In fact, some Android devices never receive software updates from the manufacturer or carrier, so they are “immutable” in a very real sense. Users must go out of their way to exploit a security vulnerability in order to jailbreak/root their devices to replace the OS wholesale or even extend its capabilities in ways the manufacturer did not envision. OEMs further exploit this situation to discourage users from tinkering with their devices, trying to equate such practices with weakening security— as if users are better off sticking to an abandoned “stock” OS with known vulnerabilities that will never get patched.

Unfree at any speed

Automotive technology too is evolving in this direction of locked-down appliances. Cars remained relatively dumb until the 1990s, when microprocessors slowly started making their way into every system, starting with engine management. On the way to becoming more software-driven, effectively computers-on-wheels, something funny happened: the vehicle gained greater capability to sense present conditions and, more importantly, it became capable of reacting to those inputs. Initially this looks like an unalloyed good. All of the initial applications were uncontroversial, improving occupant safety: antilock brakes, airbags and traction control. All depend on software monitoring input from sensors and promptly responding to signals indicating that a dangerous condition is imminent.

The next phase may be less clear-cut, as enterprising companies continue pushing the line between choice and coercion. Insurers such as Geico offer pay-per-mile plans that use gadgets attached to the OBD-II port to collect statistics on how far the vehicle is driven, and presumably on how aggressively the driver attacks corners. While some may consider this an invasion of privacy, at least there is a clear opt-out: do not sign up for that plan. In other cases, opt-out becomes ambiguous. GM found itself in a pickle over the Stingray Corvette recording occupants with a camera in the rearview mirror. This was a feature, not a bug, designed to create YouTube-worthy videos while the car was being put through its paces. But if occupants are not aware that they are being recorded, it is not clear they consented to appearing as extras in a Sebastian-Vettel role-playing game. At the extreme end of the informed-consent scale is the use of remote immobilizers for vehicles sold to consumers with subprime credit. In these cases the dealers literally get a remote kill-switch for disabling operation of the vehicle if the consumer fails to stay current on payments. (At least that is the idea; NYT reports allegations of mistaken or unwarranted remote shutdowns by unscrupulous lenders.) One imagines the next version of these gadgets will incorporate a credit-card reader to better approximate the PKD dystopia. Insert quarters to continue.

What is at stake here is a question of fairness and rights, but not in the legal sense. Very little has changed about the mechanics of consumer financing: purchasing a car on credit still obligates the borrower to make payments promptly until the balance is paid off. Failure to fulfill that obligation entitles the seller to repossess the vehicle. This is not some new-fangled notion of how to handle loans in default; the right to repossess or foreclose has always existed on the books. In practice, exercising that right often required some dramatic, made-for-TV adventures in tracking down the consumer or vehicle in question. Software has greatly amplified the ability of lenders to enforce their rights and collect on their entitlements under the law.

From outright ownership to permanent tenancy

Closely related is a shift from ownership to subscription models. Software has made it possible to recast what used to be one-time purchases into ongoing subscriptions or pay-per-use models. Powerful social norms exist around how goods are distributed according to one or the other model. No one expects that they can pay for electricity or cable with a lump-sum payment once and call it a day, receiving free service in perpetuity. If you stop paying for cable, the screen will eventually go blank. By contrast hardware gadgets such as television sets are expected to operate according to a different model: once you bring it home, it is yours. It may have been purchased with borrowed money, with credit extended by the store or your own credit-card issuer. But consumers would be outraged if their bank, BestBuy or TV manufacturer remotely reached out to brick their television in response to late payments. Even under most subscription models, there are strict limitations on how service providers can retaliate against consumers breaking the contract. If you stop paying for water, the utility can shut off future supply of water. They can not send vigilantes over to your house to drain the water tank or “take back” water you are presumably no longer entitled to.

Such absurd scenarios can and do happen in software. Perhaps missing the symbolism, Amazon remotely wiped copies of George Orwell’s 1984 from Kindles over copyright problems. (The irony could only be exceeded if Amazon threatened to remove copies of Philip K Dick’s “Ubik” unless customers pay up.) These were not die-hard Orwell fans or DMCA protestors deliberately pirating the novel; they had purchased their copies from the official Amazon store. Yet the company defended its decision, arguing that the publisher who had offered those novels on its marketplace lacked the proper rights. Kindle is a locked-down appliance where Amazon calls the shots and customers have no recourse, no matter how arbitrary those decisions appear.

What about computers? It used to be the case that if you bought a PC, it was yours for the keeping. It would continue running until its hardware failed. In 2006 Microsoft launched FlexGo, a pay-as-you-go model for PC ownership in emerging markets. Echoing the used-car salesman’s pitch about the benefits bestowed on consumers, while barely suppressing a sense of colonialist contempt, a spokesperson for a partner bank in Brazil enthused: “Our lower-income customers are excited to finally buy their first PC with minimal upfront investment, paying for time as they need it, and owning a computer with superior features and genuine software.” (Emphasis on genuine software, since consumers in China or Brazil never had any problem getting their hands on pirated versions of Windows.) MSFT takes a more measured approach in touting the benefits of this alternative: “Customers can get a full featured Windows-enabled PC with low entry costs that they can access using prepaid cards or through a monthly subscription.” Insert quarters to continue.

FlexGo did not crater like “Bob,” Vista or others in the pantheon of MSFT disasters. Instead it faded into obscurity, having bet on a vision of “making computing accessible” that was soon rendered irrelevant on both financial and technological grounds. Hardware prices continued to drop. Better access to banking services and consumer credit meant citizens in developing countries gained flexible payment options to buy a regular PC, without an OEM or software vendor in the loop to supervise the loan or tweak the operating system to enforce alternative licensing models. More dramatically, the emergence of smartphones cast into doubt whether everyone in Brazil actually needed that “full-featured Windows-enabled PC” in the first place to cross the digital divide.

FlexGo may have disappeared but the siren song of subscription models still exerts its pull on the technology industry. Economics favor such models on both sides. Compared to the infrequent purchase of big-ticket items, the steady stream from monthly subscribers smooths out seasonal fluctuations in revenue. From the consumer perspective, making “small” monthly payments over time instead of one big lump-sum payment may look more appealing due to cognitive biases.

If anything, the waning of the PC as the dominant platform paves the way for this transformation. Manufacturers can push locked-down “appliances” without the historical baggage associated with the notion of a personal computer. Ideas that would never fly on the PC platform, practices that would provoke widespread consumer outrage and derision—locked boot-loaders, mandatory data-collection, always-on microphones and cameras, remote kill capabilities— can become the new normal for a world of locked-down appliances. In this ecosystem users no longer “own” their devices in the traditional sense, even if the devices were paid for in full and no one can legally show up at the door to demand their return. These gadgets suffer from a serious case of split-personality disorder. On the one hand they are designed to provide some useful service to their presumed “owner;” this is the ostensible purpose they are advertised and purchased for. At the same time the gadget software contains business logic to serve the interests of the device manufacturer, service-provider or whoever happens to actually control the bits running there. These two goals are not always aligned. In a hypothetical universe with efficient markets, one would expect strong alignment. If the gadget deliberately sacrificed functionality to protect the manufacturer’s platform or artificially sustain an untenable revenue model, enlightened consumers would flock to an alternative from a competitor not saddled with such baggage. In reality such competitive dynamics operate imperfectly if at all, and the winner-takes-all nature of many market segments means that it is very difficult for a new entrant to make significant gains against entrenched leaders by touting openness or user-control as a distinguishing feature. (Case in point: the troubled history of open-source mobile phone projects and their failure to reach mass adoption.)

Going against the grain?

If there are forces counteracting the irresistible pull of locked-down appliances, they will face an uneven playing field. The share of PCs continues to decline among all consumer devices; Android has recently surpassed Windows as the most common operating system on the Internet. Meanwhile the highly fashionable Internet of Things (IoT) notion is predicated on black-box devices which are not programmable or extensible by their ostensible owners. It turns out that in some cases, they are not even managed by the manufacturer; just ask owners of IP cameras whose devices were unwittingly enrolled into the Mirai botnet.

Consumers looking for an alternative face a paradoxical situation. On the one hand, there is a dearth of off-the-shelf solutions designed with user rights in mind. The “market” favors polished solutions such as the Nest thermostat, where hardware, software and cloud services are inextricably bundled together. Suppose you are a fan of the hardware but skeptical about how much private information it is sending to a cloud service provider? Tough luck; there is no cherry-picking allowed. On the other hand, there has never been a better time to be tinkering with hardware: Arduino, Raspberry Pi and a host of other low-cost embedded platforms have made it easier than ever to put together your own custom solution. This is still a case of payment in exchange for preserving user rights, except the “payment” takes the form of additional time spent to engineer and operate home-brew solutions. More worrisome is that such capabilities are only available to a small number of people, distinguished by their ability to renegotiate the terms service providers attempt to impose on their customer base. While that capability is worth celebrating—and it is why every successful jailbreak of a locked-down appliance is celebrated in the security community— it is fundamentally undemocratic by virtue of being restricted to a new ruling class of technocrats.

CP

[Update: Edited Feb 27th to correct typo.]

The missing identity layer for DeFi

Bootstrapping permissioned applications

To paraphrase the famous 1993 New Yorker cartoon: “On the blockchain, nobody knows that you are a dog.” All participants are identified by opaque addresses with no connection to their real-world identity. Privacy by default is a virtue, but even those who voluntarily want to link addresses to their identity have few persuasive options. This blog post can lay claim to the address 0xa12Db34D434A073cBEE0162bB99c0A3121698879 on Ethereum but can readers be certain? (Maybe the Ethereum Name Service or ENS can help; a sketch after the list below shows another conventional approach, signing a challenge with the key controlling the address.) On the one hand, there is an undeniable egalitarian ethos here: if the only relevant facts about an address are those represented on-chain— its balance in cryptocurrency, holdings of NFTs, track record of participating in DAO governance votes— there is no way to discriminate between addresses based on such “irrelevant” factors as the citizenship or geographic location of the person/entity controlling that address. Yet such discrimination based on real-world identity is exactly what many scenarios call for. To cite a few examples:

  1. Combating illicit financing of sanctioned entities. This is particularly relevant given that rogue states including North Korea have increasingly pivoted to committing digital asset theft as their access to the mainstream financial system is cut off.
  2. Launching a regulated financial service where the target audience must be limited by law, for example to citizens of a particular country only.
  3. Flip-side of the coin: excluding participants from a particular country (for example, the United States) in order to avoid triggering additional regulatory requirements that would come into play when serving customers in that jurisdiction.
  4. Limiting participation in high-risk products to accredited investors only. While this may seem trivial to check by inspecting the balance on-chain, the relevant criterion is the person’s total holdings, which are unlikely to be concentrated in a single address.
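
As an aside on the ownership-claim problem above: the conventional low-tech way to back such a claim is to sign a verifier-chosen challenge with the key controlling the address. A minimal sketch using the eth-account library; the challenge string and private key are throwaway placeholders:

```python
# Sketch: proving control of an Ethereum address by signing a challenge.
# Requires the eth-account package; the private key here is a placeholder.
from eth_account import Account
from eth_account.messages import encode_defunct

challenge = "example challenge chosen by the verifier"
message = encode_defunct(text=challenge)

signed = Account.sign_message(message, private_key="0x" + "11" * 32)

# Anyone can recover the signing address from the signature and compare it
# against the address being claimed.
recovered = Account.recover_message(message, signature=signed.signature)
print(recovered)
```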

As things stand, there are at best some half-baked solutions to the first problem. Blockchain analytics companies such as Chainalysis, TRM Labs and Elliptic surveil public blockchains, tracing the movement of funds associated with known criminal activity as those actors hop from address to address. Customers of these services can in turn receive intelligence about the state of an address or even an application such as a lending pool. Chainalysis even makes this information conveniently accessible on-chain: the company maintains smart-contracts on Ethereum and other EVM-compatible chains containing a list of OFAC-sanctioned addresses. Any other contract can consult this registry to check on the status of an address it is interacting with.
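
Off-chain services can query the same registry. A minimal sketch using web3.py, with a placeholder RPC endpoint and oracle address, and a one-function ABI fragment modeled on the commonly documented isSanctioned(address) view call (treat these details as illustrative rather than as the registry’s exact interface):

```python
# Sketch: querying an on-chain sanctions registry from off-chain code.
# The RPC endpoint and oracle address are placeholders; the ABI fragment is
# illustrative and should be replaced with the registry's published interface.
from web3 import Web3

w3 = Web3(Web3.HTTPProvider("https://ethereum-rpc.example"))  # placeholder endpoint

ORACLE_ADDRESS = "0x0000000000000000000000000000000000000000"  # placeholder
ORACLE_ABI = [{
    "name": "isSanctioned",
    "type": "function",
    "stateMutability": "view",
    "inputs": [{"name": "addr", "type": "address"}],
    "outputs": [{"name": "", "type": "bool"}],
}]

oracle = w3.eth.contract(address=ORACLE_ADDRESS, abi=ORACLE_ABI)
subject = Web3.to_checksum_address("0xa12Db34D434A073cBEE0162bB99c0A3121698879")
flagged = oracle.functions.isSanctioned(subject).call()

print("sanctioned" if flagged else "not on the list (which is not the same as clean)")
```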

The problem with these services is three-fold:

  1. The classifications are reactive. New addresses are innocent until proven guilty; they are only flagged after they have been involved in illicit activity. At that point the damage has been done: other participants may have interacted with the address or allowed the address to participate in their decentralized applications. In some cases it may be possible to unwind specific transactions or isolate the impact. In other situations, such as a lending pool where funds from multiple participants are effectively blended together, it is difficult to identify which transactions are now “tainted” by association and which ones are clean.
  2. “Not a terrorist organization” is a low bar to meet. Even if this could be ascertained promptly and 100% accurately, most applications have additional requirements of their participants. Some of the examples alluded to above include location, country of citizenship or accredited-investor status. Excluding the tiny fraction of bad actors in the cross-hairs of FinCEN is useful but insufficient for building the types of regulated dapps that can take DeFi mainstream.
  3. All of these services follow a “blacklist” model: excluding bad actors. In information security, it is a well-known principle that this model is inferior to “whitelisting”— only accepting known good actors. In other words, a blacklist fails open: any address not on the list is assumed clean by default. The onus is on the maintainer of the list to keep up with the thousands of new addresses that crop up, not to mention any sign of nefarious activity by existing addresses previously considered safe. By contrast, whitelists require an affirmative step before addresses are considered trusted. If the maintainer is slow to react, the system fails safe: a good address is considered untrusted simply because the administrator has not gotten around to including it.
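
The asymmetry is easy to see in a toy sketch (the addresses and set names are invented for illustration):

```python
# Toy illustration of the two failure modes: a blacklist fails open,
# a whitelist fails closed. All addresses here are made up.
SANCTIONED_ADDRESSES = {"0xBAD0000000000000000000000000000000000000"}  # reactive, always lagging
VERIFIED_ADDRESSES   = {"0x600D000000000000000000000000000000000000"}  # requires affirmative onboarding

def blacklist_allows(addr: str) -> bool:
    # Any address the maintainer has not caught up with passes by default.
    return addr not in SANCTIONED_ADDRESSES

def whitelist_allows(addr: str) -> bool:
    # Any address that has not been affirmatively vetted is rejected by default.
    return addr in VERIFIED_ADDRESSES

unknown = "0x1234000000000000000000000000000000000000"
print(blacklist_allows(unknown))   # True  -- fails open
print(whitelist_allows(unknown))   # False -- fails safe
```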

What would an ideal identity verification layer for blockchains look like? Some high-level requirements are:

  • Flexible. Instead of expressing a binary distinction between sanctioned vs not-yet-sanctioned, it must be capable of expressing a range of different identity attributes as required by a wide range of decentralized apps.
  • Opt-in. The decision to go through identity verification for an address must reside strictly with the person or persons controlling that address. While we can not stop existing analytics companies from continuing to conduct surveillance of all blockchain activity and trying to deanonymize addresses, we must avoid creating additional incentives or pressure for participants to voluntarily surrender their privacy.
  • Universally accepted. The value of an authentication system increases with the number of applications and services accepting that identity. If each system is only useful for onboarding with a handful of dapps, it is difficult for participants to justify the time and cost of jumping through the hoops to get verified. Government identities such as driver’s licenses are valuable precisely because they are accepted everywhere. Imagine an alternative model where every bar had to operate its own age-verification system and issue its own permits— not recognized by any other establishment— in order to enforce laws around drinking age.
  • Privacy respecting. The protocols involved in proving identity must limit information disclosed to the minimum required to achieve the objective. Since onboarding requirements vary between dapps, there is a risk of disclosing too much information to prove compliance. For example, if a particular dapp is only open to US residents, that is the only piece of information that must be disclosed, and not, for example, the exact address where the owner resides. Similarly, proof of accredited-investor status does not require disclosing total holdings, and proving that a person is not a minor can be done without revealing the exact date of birth. (This requirement has implications for design. In particular, it rules out simplistic approaches around issuing publicly readable “identity papers” directly on-chain, for example as a “soul-bound” token attached to the address.)

Absent such an identity layer, deploying permissioned DeFi apps is challenging. Aave’s dedicated ARC pool is an instructive example. Restricted to KYCed entities vetted by the custodian Fireblocks, it failed to achieve even a fraction of the total-value locked (TVL) available in the main Aave lending pool. While there were many headwinds facing the product due to timing and the general implosion of cryptocurrency markets in 2022 (here is a good post-mortem thread), the difficulty of scaling the market when participants must be hand-picked is one of the challenges. While ARC may have been one of the first and more prominent examples, competing projects are likely to face the same odds in boot-strapping their own identity systems. In fact they do not even stand to benefit from the work done by the ARC team: while participants went through rigorous checks to gain access to that walled garden, there is no reusable, portable identity resulting from that process. Absent an open and universally recognized KYC standard, each project is required to engage in a wasteful effort to field its own identity system. In many ways, the situation is worse than the early days of web authentication. Before identity-federation standards such as SAML and OAuth emerged to allow interoperability, every website resorted to building its own login solution. Not surprisingly, many of these were poorly designed and riddled with security vulnerabilities. Even in the best case when each system functioned correctly in isolation, collectively they burdened customers with the challenge of managing dozens of independent usernames and passwords. Yet web authentication is a self-contained problem, much simpler than trying to link online identities to real-world ones.

What about participants’ incentives for jumping through the necessary hoops for on-boarding? Putting aside ARC, there is a chicken-and-egg problem to boot-strapping any identity system: without interesting applications that are gated on having that ID, participants have no compelling reason to sign up for one; the value proposition is not there. Meanwhile if few people have onboarded with that ID system, no developer wants to build an application limited to customers with one of those rare IDs— that would be tantamount to choking off your own customer-acquisition pipeline. Typically this vicious cycle is only broken in one of two ways:

  1. An existing application with a proprietary identity system, which is already compelling and self-sustaining, sees value in opening up that system such that verified identities can be used elsewhere. (Either because it can monetize those identity services or due to competitive pressure from competing applications offering the same flexibility to their customers for free.) If there are multiple such applications with comparable criteria for vetting customers, this can result in an efficient and competitive outcome. Users are free to take their verified identity anywhere and participate in any permissioned application, instead of being held hostage by the specific provider who happened to perform the initial verification. Meanwhile developers can focus on their core competency— building innovative applications— instead of reinventing the wheel to solve an ancillary problem around limiting access to the right audience.
  2. New regulations are introduced, forcing developers to enforce identity verification for their applications. This will often result in an inefficient scramble for each service provider to field something quickly to avoid the cost of noncompliance, leaving little room for industry-wide cooperation or standards to emerge. Alternatively it may result in a highly centralized outcome. One provider specializing in identity verification may be in the right place at the right time when rules go into effect, poised to become the de facto gatekeeper for all decentralized apps. 

In the case of DeFi, this second outcome is looking increasingly more likely.

CP

Blockchain thefts, retroactive bug-bounties and socially-responsible crime

Or, monetizing stolen cryptocurrency proves non-trivial.

It is not often one hears of bank robbers returning piles of cash after a score because they decided they could not find a way to spend the money. Yet this exact scenario has played out over and over again in the context of cryptocurrency in 2022. Multiple blockchain-based projects were breached, resulting in losses in millions of dollars. That part alone would not have been news, only business as usual. Where the stories take a turn for the bizarre is when the perpetrators strike a bargain with the project administrators to return most of the loot, typically in exchange for a token “bug bounty” to acknowledge the services of the thieves in uncovering a security vulnerability.

To name a handful:

  • August 2021, Poly Network. A generous attacker returns close to 600 million dollars in stolen funds back to the project.
  • Jan 2022, Multichain. Attacker returns 80% of the 1 million dollars stolen, deciding that he/she earned 20% for services rendered.
  • June 2022, Crema Finance. Attacker returns $8 million USD, keeping $1.6 million as “white-hat bounty.” (Narrator: That is not how legitimate white-hat rewards work.)
  • Oct 2022, Transit Swap. Perpetrator returns 16 million dollars (about two-thirds of the total haul).
  • December 2022, Defrost Finance on Avalanche. Again the attacker returned close to 100% of funds.

While bug bounty programs are very common in information security, they are often carefully structured with rules governing the conduct of both the security researchers and affected companies. There is a clear distinction between responsible disclosure of a vulnerability and an outright attack. Case in point: the disgraced former Uber CSO was convicted of lying to federal investigators over an incident in which the Uber security team retroactively tried to label an actual breach as a valid bug-bounty submission. It was a clear-cut case of an actual attack: the perpetrators had not merely identified a vulnerability but exploited it to the maximum extent to grab Uber customer data. They even tried to extort Uber for payment in exchange for keeping the incident under wraps—none of this is within the framework of what qualifies as responsible disclosure. To avoid negative PR, Uber took up the perpetrators on their offer, attempting to recharacterize a real breach after the fact as a legitimate report. That did not go over very well with the FTC or the Department of Justice, which prosecuted the former Uber executive and obtained a guilty verdict.

Given that this charade did not work out for Uber, it is strange to see multiple DeFi projects embrace the same deception. It reeks of desperation, of the unique flavor experienced by a company facing an existential crisis. Absent a miracle to reverse the theft (along the lines of the DAO hard-fork the Ethereum foundation orchestrated to bail out an early high-profile project) these projects would be out of business. The stakes are correspondingly much higher than they were for Uber circa 2017: given the number of ethics scandals and privacy debacles Uber experienced on a regular basis, the company could easily have weathered one more security incident. But for fledgling DeFi projects, the abrupt loss of all (or even a substantial part of) customer funds is the end of the road.

On the other hand, it is even more puzzling that the perpetrators—or “vulnerability researchers” if one goes along with the rhetoric—are playing along, giving up the lion’s share of their ill-gotten gains in exchange for… what exactly? While the terms of the negotiation between the perpetrators and project administrators are often kept confidential, there are a few plausible theories:

  • They are legitimate security researchers who discovered a serious vulnerability and decided to stage their own “rescue” operation. There are unique circumstances around vulnerability disclosure on blockchains. Bug collisions happen all the time, and at any point someone else— someone less scrupulous than our protagonist— may discover the same vulnerability and choose to exploit it for private gain. (This is quite different from, say, finding a critical Windows vulnerability. It would be as if you could exploit that bug on all Windows machines at the same time, regardless of where those targets are located in the world and how well they are defended otherwise.) Blockchains are unique in this regard: anyone in the world can exploit a smart-contract vulnerability. The flip side of the coin is that anyone can role-play at being a hero and protect all users of the vulnerable contract. Going back to our example, while one cannot “patch” Windows without help from MSFT and whoever owns the machine, on a blockchain it is possible to protect 100% of customers unilaterally. The catch is one must race to exploit the vulnerability and seize all the funds at risk, in the name of safekeeping, before the black-hats can do the same for less noble purposes.
    While it is possible that in at least some of these instances the perpetrators were indeed socially-responsible whitehat researchers motivated by nothing more than protecting customers, that seems an unlikely explanation for all of the cases. Among other clues, virtually every incident occurred without any advance notification. One would expect that a responsible researcher would at least make an effort to contact the project in advance of executing a “rescue,” notifying them of their intentions and offering contact information. Instead project administrators were reduced to putting out public-service announcements on Twitter to reach the anonymous attackers, offering to negotiate for the return of missing funds.
  • Immunity from prosecution. If the thieves agree to return the majority of the funds taken, the administrators could agree not to press charges or otherwise pursue legal remedies. While this may sound compelling, it is unlikely the perpetrators could get much comfort from such an assurance. Law enforcement could still treat the incident as a criminal matter even if everyone officially associated with the project claims they have made peace with the perpetrators.
  • The perpetrators came to the sad realization that stealing digital assets is the easy part. Converting those assets into dollars or otherwise usable currency without linking that activity to their real-world identity is far more difficult.

That last possibility would be a remarkable turn-around; conventional wisdom holds that blockchains are the lawless Wild West of finance where criminal activity runs rampant and crooks have an easy time getting rich by taking money from hapless users. The frequency of security breaches suggests the first part of that statement may still be true: thefts are still rampant. But it turns out that when it comes to digital currency, stealing money and being able to spend it are two very different problems.

For all the progress made on enabling payments in cryptocurrency—mainly via the Lightning Network—most transactions still take place in fiat. Executing a heist on a blockchain may be no more difficult than it was in 2017, when coding secure smart-contracts was more art than science. One thing that has certainly changed in the past five years is regulatory scrutiny of the on/off-ramps between cryptocurrency and the fiat world. Criminals still have to convert their stolen bitcoin, ether or more esoteric ERC20 assets into “usable” form. Typically, that means money in a bank account; stablecoins such as Tether or Circle’s USDC will not do the trick. By and large merchants demand US dollars, not dollar-equivalent digital assets requiring trust in the solvency of private issuers.

That necessity creates a convenient chokepoint for enforcement: cryptocurrency exchanges, which are the on-ramps and off-ramps between fiat money and digital assets. Decentralization makes it impossible to stop someone from exploiting a smart-contract—or what one recently arrested trader called a “highly profitable trading strategy”—by broadcasting a transaction into a distributed network. But there is nothing trustless or distributed about converting the proceeds of that exploit into dollars spendable in the real world. That must go through a centralized exchange. To have any hope of sending/receiving US dollars, that exchange must have some rudimentary compliance program and at least make a token effort at following regulatory obligations, including Know Your Customer (KYC) and anti-money laundering (AML) rules. (Otherwise, the exchange risks experiencing the same fate as Bitfinex, which was unceremoniously dropped by its correspondent bank Wells Fargo in 2017, much to the chagrin of Bitfinex executives.) Companies with aspirations of staying in business do not look kindly on having their platform used to launder proceeds from criminal activity. They frequently cooperate with law enforcement in seizing assets as well as providing information leading to the arrest of perpetrators. Binance is a great demonstration of this in action. Once singled out by Reuters as the platform preferred by criminals laundering cryptocurrency, the exchange has responded by ramping up its compliance efforts and participating in several high-profile asset seizures. Lest the irony be lost: a cryptocurrency business proudly declares its commitment to surveilling its own customer base to look for evidence of anyone receiving funds originating with criminal activity. (The company even publishes hagiographic profiles of its compliance team retrieving assets from crooks foolish enough to choose Binance as their off-ramp to fiat land.)

This is not to say that monetizing theft on blockchains has become impossible. Determined actors with resources—such as the rogue state of North Korea—no doubt still retain access to avenues for exiting into fiat. (Even in that case, increased focus on enforcement can help by increasing the “haircut,” or percentage of value lost, when criminals convert digital assets into fiat through increasingly inefficient schemes.) But those complex arrangements are not accessible to a casual vulnerability researcher who stumbles into a serious flaw in a smart-contract or compromises the private keys controlling a large wallet. Put another way: there are far more exploitable vulnerabilities than ways of converting proceeds from those exploits into usable money. Immature development practices and the gold-rush mentality around rushing poorly designed DeFi applications to market have created a target-rich environment. This is unlikely to change any time soon. On the flip side, increased focus on regulation and the availability of better tools for law enforcement—including dedicated services such as Chainalysis and TRM Labs for tracing funds on chain—make it far more difficult to monetize those attacks in any realistic way. It was a running joke in the information security community that blockchains come with a built-in bug bounty: find a serious security vulnerability and monetary rewards shall follow automatically—even if the owner of the system never bothered to create an official bounty program. But digital assets that are blacklisted by every reputable business and can never be exchanged for anything else of value are about as valuable as Monopoly money. Given that dilemma, it is no surprise that creative vulnerability researchers would embrace the post hoc “white-hat disclosure” charade, choosing a modest but legitimate payout over holding on to a much larger sum of tainted funny-money they have little hope of being able to spend.

CP

The myth of tainted blockchain addresses [part II]

[continued from part I]

Ethereum and account-based blockchains

The Ethereum network does not have a concept of discrete “spend candidates” or UTXOs. Instead, funds are assigned to unique blockchain addresses. While this is a more natural model for how consumers expect digital assets to behave (and bitcoin wallet software goes out of its way to create the same appearance while juggling UTXOs under the covers), it also complicates the problem of separating clean vs. dirty funds.

Consider this example:

  • Alice has a balance of 5 ETH at her Ethereum address.
  • She receives 1 ETH from a sanctioned address. (For simplicity assume 100% of these funds are tainted, for example because they represent stolen assets.)
  • She receives another 5 ETH from a clean address.
  • Alice sends 1 ETH to Bob.

If Alice and Bob are concerned about complying with AML rules, they may be asking themselves: are they in possession of tainted ETH that needs to be frozen or otherwise segregated for potential seizure by law enforcement? (Note that in this example their interests are somewhat opposed: Alice would much prefer that the 1 ETH she transferred to Bob “flushed” all the criminal proceeds out of her wallet, while Bob wants to operate under the assumption that he received only clean money and that all tainted funds still reside with Alice.)

Commodities parallel

If one were to draw a crude—no pun intended—comparison to commodities, tainted Bitcoin behaves like blood diamonds while tainted Ethereum behaves like contraband oil imported from a sanctioned petro-dictatorship. While a UTXO can be partially tainted, it does not “mix” with other UTXOs associated with the same address. Imagine a precious-stones vault containing diamonds. Some of these turn out to be conflict diamonds, others have a verifiable pedigree. While the vault may contain items of both types, there is no question about whether any given sale includes conflict diamonds. In fact, once the owner becomes aware of the situation, they can make a point of putting those stones aside and never selling them to any customer. This is the UTXO model in bitcoin: any given transaction either references a given UTXO (and consumes 100% of the available funds there) or does not reference that UTXO at all. If the wallet owner is careful to never use tainted inputs in constructing their transaction, they can be confident that the outputs are also clean.

Ethereum balances do not behave this way because they are all aggregated together in one address. Stretching the commodity example, instead of a vault with boxes of precious gems, imagine an oil storage facility. There is a tank with a thousand barrels of domestic oil, with a side-entry mixer running inside to stir up the contents and avoid sludge settling at the bottom. Some joker dumps a thousand barrels of contraband petrostate oil of identical density and physical characteristics into this tank. Given that the contents are being continuously stirred, it would be difficult to separate out the product into its constituent parts. If someone tapped one barrel from that tank and sold it, should that barrel be considered sanctioned, clean or something in between, such as “half sanctioned”?

There are logical arguments that could justify each of these decisions:

  1. One could take the extreme view that even the slightest amount of contraband oil mixed into the tank results in spoilage of the entire contents. This is the obsessive-compulsive school of blockchain hygiene, which holds that even de minimis amounts originating from a sanctioned address irreversibly poison an entire wallet. In this case all 2000 barrels coming out of that tank will be tainted. In fact, if any more oil were added to that tank, it too would get tainted. At this point, one might as well shutter the facility altogether.
  2. A more lenient interpretation holds that there are indeed one thousand sanctioned barrels, but those are the second thousand barrels coming out of the spout. Since the original thousand barrels were clean, we can tap up to that amount without a problem. This is known as FIFO or first-in-first-out ordering in computer science.
  3. Conversely, one could argue that the first thousand are contraband because those were the most recent additions to the tank, while the next thousand will be clean. That would be LIFO or last-in-first-out ordering.
  4. Finally, one could argue the state of being tainted exists on a continuum. Instead of a simple yes/no, each barrel is assigned a percentage. Given that the tank holds equal parts “righteous” and “nefarious” crude oil, every barrel coming out of it will be 50% tainted according to this logic.

Pre-Victorian legal precedents

While there may not be any physical principles for choosing between these hypotheses, it turns out this problem does come up in legal contexts and there is precedent for adopting a convention. In the paper Bitcoin Redux a group of researchers from the University of Cambridge expound on how an 1816 UK High Court ruling singles out a particular way of tracking stolen funds:

It was established in 1816, when a court had to tackle the problem of mixing after a bank went bust and its obligations relating to one customer account depended on what sums had been deposited and withdrawn in what order before the insolvency. Clayton’s case (as it’s known) sets a simple rule of first-in-first-out (FIFO): withdrawals from an account are deemed to be drawn against the deposits first made to it.

In fact, their work tackles a more complicated scenario where multiple types of taint are tracked, including stolen assets, funds from Iran (OFAC sanctioned) and funds coming out of a mixer. The authors compare the FIFO heuristic against the more radical “poison” approach, which corresponds to #1 in our list above, as well as the “haircut,” which corresponds to #4, highlighting the advantages of FIFO:

The poison diagram shows how all outputs are fully tainted by all inputs. In the haircut diagram, the percentages of taint on each output are shown by the extent of the coloured bars. The taint diffuses so widely that the effect of aggressive asset recovery via regulated exchanges might be more akin to a tax on all users.
[…]
With the FIFO algorithm, the taint does not go across in percentages, but to individual components (indeed, individual Satoshis) of each output. Thus the first output has an untainted component, then the stolen component – both from the first input – and then part of the Iranian component from the second input. As the taint does not spread or diffuse, the transaction processes it in a lossless way.

Ethereum revisited

While the Bitcoin Redux paper only considered the Bitcoin network, the FIFO heuristic translates naturally into the Ethereum context, as it corresponds to option #2 in the crude-oil tank example. Going back to the Alice & Bob hypothetical, it vindicates Bob; in fact it means Alice can send another 4 ETH from that address before reaching the tainted portion.

Incidentally, the FIFO model has another important operational advantage: it allows the wallet owner to quarantine tainted funds in a fully deterministic, controlled manner. Suppose Alice's compliance officer advises her to quarantine all tainted funds at a specific address for later disbursement to law enforcement. Recall that the tainted sum of 1 ETH is "sandwiched" chronologically between two chunks of clean ETH in arrival order. But Alice can create a short sequence of transactions to isolate it (a sketch of the sequence follows the list):

  1. First, she spends whatever remains of the 5 ETH that were present at the address before the tainted funds arrived. Alice could wait for this to happen naturally, as with her outbound transfer to Bob. Alternatively, any remaining amount can be immediately consumed in a loopback transaction sending funds back to the original address, or temporarily shifted to another wallet under her control.
  2. Next, she creates a 1 ETH transaction to move the tainted portion to the quarantine address.
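
A minimal Python sketch of this sequence under the FIFO convention. The amounts are hypothetical (5 ETH clean, then 1 ETH unsolicited and tainted, then 3 ETH clean, plus the earlier 1 ETH payment to Bob), and the class is purely illustrative rather than any real wallet's bookkeeping:

```python
from collections import deque

class FifoTaintLedger:
    """Tracks taint segments for a single address under the FIFO rule.

    Segments are (label, amount) pairs kept in arrival order; a withdrawal
    consumes the oldest segments first (the Clayton's case convention).
    """
    def __init__(self):
        self.segments = deque()

    def deposit(self, label, amount):
        self.segments.append((label, amount))

    def withdraw(self, amount):
        """Return the (label, amount) segments consumed by this withdrawal."""
        spent = []
        while amount:
            label, qty = self.segments.popleft()
            take = min(qty, amount)
            spent.append((label, take))
            if qty > take:                 # unspent remainder stays at the front
                self.segments.appendleft((label, qty - take))
            amount -= take
        return spent

alice = FifoTaintLedger()
alice.deposit("clean", 5)      # original balance
alice.deposit("tainted", 1)    # unsolicited transfer from a sanctioned source
alice.deposit("clean", 3)      # later, unrelated clean deposit

print(alice.withdraw(1))       # payment to Bob: [('clean', 1)] -- Bob receives clean funds

# Step 1: flush the rest of the pre-taint segment with a loopback to herself.
for label, qty in alice.withdraw(4):
    alice.deposit(label, qty)  # [('clean', 4)] re-enters at the back of the queue

# Step 2: the tainted 1 ETH is now at the front; send it to the quarantine address.
print(alice.withdraw(1))       # [('tainted', 1)] -- exactly the portion to quarantine
```

The loopback in step 1 re-enters the queue behind the tainted segment, which is what guarantees step 2 spends exactly the right 1 ETH.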

The important point here is that no one else can interfere with this sequence. If instead the LIFO heuristic had been adopted, Alice could receive a deposit between steps #1 and #2, resulting in her outbound transaction in the second step consuming a different 1 ETH segment that does not correspond exactly to the portion she wanted to get rid of. This need not even be a malicious donation. For example, charities accepting donations on chain receive deposits from contributors without any prior arrangement: knowing the donation address is sufficient, and there is no need to notify the charity in advance of an upcoming payment. Similarly, cryptocurrency exchanges hand out deposit addresses to customers with the understanding that the customer is free to send funds to that address at any time and they will be credited to her account. In these situations, the unexpected deposit would throw off the carefully orchestrated plan to isolate tainted funds, but only if LIFO is used: in that model the surprise deposit is the "last in" addition that goes "first out."

In conclusion: blockchain addresses are not hopelessly tainted because of one unsolicited transaction sent by someone looking to make a point. Only specific chunks of assets associated with that address carry taint. Using Tornado Cash to permanently poison vast sums of ether holdings remains nothing more than wishful thinking because the affected portion can be reliably separated by those seeking to comply with AML rules, at the cost of some additional complexity in wallet operations.

CP