Password management quirks at Fidelity

IVR Login

“Please enter your password.”

While this prompt commonly appears on webpages, it is not often heard during a phone call. This was not a social-engineering scam either: it was part of the automated response from the Fidelity customer support line. The message politely invites callers to enter their Fidelity password using the dial-pad, in order to authenticate the customer before connecting them to a live agent. There is one complication: dial-pads only have digits plus two symbols, the star and pound keys. To handle the full range of symbols present in passwords, the instructions call for using the standard convention for translating letters into digits, helpfully displayed on most dial-pads:

Standard dial-pad, with mapping of letters to numbers
(Image from Wikipedia)

The existence of this process indicates a quirk in the way Fidelity stores passwords for their online brokerage service.

Recap: storing passwords without storing them

Any website providing personalized services is likely to have an authentication option that involves entering username and password. That website then has to verify whether the correct password is entered, by comparing the submitted version against the one recorded when the customer last set/changed their password.

Surprisingly there are ways to do that without storing the original password itself. In fact storing passwords is an anti-pattern. Even storing them encrypted is wrong: “encryption” by definition is reversible. Given encrypted data, there is a way to recover the original, namely by using the secret key in a decryption operation. Instead passwords are stored using a cryptographic hash-function. Hash functions are strictly one-way: there is no going back from the output to the original input. Unlike encryption, there is no magic secret key known to anyone— not even the person doing the hashing— that would permit reversing the process.

For example, using the popular hash function bcrypt, the infamous password “hunter2” becomes:

$2b$12$K1.sb/KyOqj6BdrAmiXuGezSRO11U.jYaRd5GhQW/ruceA9Yt4rx6

This cryptographic technique is a good fit for password storage because it allows the service to check passwords during login without storing the original. When someone claiming to be that customer shows up and attempts to login with what is allegedly their password, the exact same hash function can be applied to that submission. The output can be compared against the stored hash and if they are identical, one can conclude with very high probability that the submitted password was identical to the original. (Strictly speaking, there is a small probability of false positives: since hash functions are not one-to-one, in theory there is an infinite number of “valid” submissions that yield the same hash output. Finding one of those false-positives is just as difficult as finding the correct password, owing to the design properties of hash functions.)
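
A minimal sketch of that check in C, using the POSIX crypt() interface (which handles bcrypt-style hashes on systems where libcrypt supports the $2b$ scheme; link with -lcrypt):

    #include <crypt.h>
    #include <string.h>

    /* Returns 1 if the submitted password matches the stored bcrypt hash.
       Passing the stored hash as the "setting" argument makes crypt() reuse
       the same algorithm and salt, so equal passwords yield equal outputs. */
    int check_password(const char *stored_hash, const char *submitted)
    {
        const char *computed = crypt(submitted, stored_hash);
        return computed != NULL && strcmp(computed, stored_hash) == 0;
    }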

From a threat model perspective, one benefit of storing one-way hashes is that even if they are disclosed, an attacker does not learn the user credential and can not use it to impersonate the user. Because hashes are irreversible, the only option available to an attacker is to mount a brute-force attack: try billions of password guesses to see if any of them produce the same hash. The design of good hash-functions for password storage is therefore an arms-race between defenders and attackers. It is a careful trade-off between making hashing efficient enough for the legitimate service to process a handful of passwords quickly during a normal login process, while making it expensive enough to raise costs for an attacker trying to crack a hash by trying billions of guesses.

A miss is as good as a mile

Much ink has been spilled over the subtleties of designing good functions for password hashing. An open competition organized in 2013 sought to create a standardized, royalty-free solution for the community. Without going too much into details of these constructions, the important property for our purposes is that good hash functions are highly sensitive to their input: change even one bit of the input password and the output changes drastically. For example, the passwords “hunter2” and “hunter3” literally differ in just one bit—the very last bit distinguishes the digit three from the digit two in ASCII code. Yet their bcrypt hashes look nothing alike:

Hash of hunter2 ➞ zSRO11U.jYaRd5GhQW/ruceA9Yt4rx6

Hash of hunter3 ➞ epdCpQLaQbcGTREZLZAFDKp5i/zQmp2

(One subtlety here: password hash functions including bcrypt incorporate a random “salt” value to make each hash output unique, even when they are invoked on the same password. To highlight changes caused by the password difference, we deliberately used the same salt when calculating these hashes above and omitted the salt from the displayed value. Normally it would appear as a prefix.)

Bottom line: there is no way to check if two passwords are “close enough” by comparing hash outputs. For example, there is no way to check if the customer got only a single letter of the password wrong or if they got all the letters correct except for the lower/upper-case distinction. One can only check for exact equality.

The mystery at Fidelity

Having covered the basics of password hashing we can now turn to the unusual behavior of the Fidelity IVR system. Fidelity is authenticating customers by comparing against a variant of their original password. The password entered on the phone is derived from, but not identical to the one used on the web. It is not possible to perform that check using only a strong cryptographic hash. A proper hash function suitable for password storage would erase any semblance of similarity between these two versions.

There are two possibilities:

  1. Fidelity is storing passwords in a reversible manner, either directly as clear-text or in an encrypted form where they can still be recovered for comparison.
  2. Fidelity is canonicalizing passwords before hashing, converting them to the equivalent “IVR format” consisting of digits before applying the hash function.

It is not possible to distinguish between these, short of having visibility into the internal design of the system.
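
If the second possibility is what is happening, the canonicalization step might look something like the sketch below. (The mapping shown is an assumption inferred from standard dial-pad conventions: letters follow the keypad grouping, everything else collapses to zero.)

    #include <ctype.h>

    /* Map a single password character to its dial-pad digit. */
    static char ivr_digit(char c)
    {
        if (isdigit((unsigned char)c))
            return c;
        switch (toupper((unsigned char)c)) {
            case 'A': case 'B': case 'C':           return '2';
            case 'D': case 'E': case 'F':           return '3';
            case 'G': case 'H': case 'I':           return '4';
            case 'J': case 'K': case 'L':           return '5';
            case 'M': case 'N': case 'O':           return '6';
            case 'P': case 'Q': case 'R': case 'S': return '7';
            case 'T': case 'U': case 'V':           return '8';
            case 'W': case 'X': case 'Y': case 'Z': return '9';
            default:                                return '0'; /* punctuation, space, etc. */
        }
    }

    /* Turns "hunter2" into "4868372"; the result is what gets hashed. */
    void canonicalize_ivr(const char *password, char *out)
    {
        while (*password)
            *out++ = ivr_digit(*password++);
        *out = '\0';
    }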

But there is another quirk that points towards the second possibility: the website and mobile apps also appear to validate passwords against the canonicalized version. A customer with the password “hunter2” can also login using the IVR-equivalent variant “HuNt3R2”.

Implications

IVR passwords are much weaker than expected against offline cracking attacks. While seemingly strong and difficult to guess, the password iI.I0esl>E`+:P9A is in reality equivalent to the less impressive 4404037503000792.

The notion of entropy can help quantify the problem. A sixteen character string composed of randomly selected printable ASCII characters has an entropy around 105 bits. That is a safety margin far outside the reach of any commercially motivated attacker. Now replace all of those 16 characters by digits and the entropy drops to 53 bits— or about nine quadrillion possibilities, which is surprisingly within range of hobbyists with home-brew password cracking systems.
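
The arithmetic behind those figures:

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        /* 16 characters drawn uniformly from the 95 printable ASCII symbols */
        printf("full password: %.1f bits\n", 16 * log2(95));  /* ~105 bits */
        /* the same 16 positions collapsed onto the ten dial-pad digits */
        printf("IVR version:   %.1f bits\n", 16 * log2(10));  /* ~53 bits */
        return 0;
    }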

In fact entropy reduction could be worse depending on exactly how users are generating passwords. The mapping from available password symbols to IVR digits is not uniform:

  • Only the character 1 maps to the digit 1 on the dial-pad.
  • Seven different characters (A, B, C, a, b, c, 2) are mapped to the digit 2
  • By contrast there are nine different symbols mapping to the digit 7, due to the presence of an extra letter for that key on the dial-pad
  • Zero is even more unusual in that all punctuation marks and special characters get mapped to it, for more than 30 different symbols in total

In other words, a customer who believes they are following best-practices, using a password manager application and having it generate a “random” 16-character password, will not end up with a uniformly distributed IVR password. Zeroes will be significantly over-represented due to the bias in the mapping, while the digit one will be under-represented.

The second design flaw here involves carrying over the weakness of IVR passwords to the web interface and mobile applications. While the design constraints of phone-based customer support require accepting simplified passwords, it does not follow that the web interface must also accept them. (This assumes the web interface grants higher privileges, in the sense that certain actions can only be carried out by logging into the website— such as adding bank accounts and initiating funds transfers— and are not possible through IVR.)

Consider an alternative design where Fidelity asks customers to pick a secondary PIN for use on the phone. In this model there are two independent credentials. There is the original password used when logging in through the website or mobile apps. It is case-sensitive and permits all symbols. An independent PIN consisting only of digits is selected by customers for use during phone authentication. Independent being the operative keyword here: it is crucial that the IVR PIN is not derived from the original password via some transformation. Otherwise an attacker who can recover the weak IVR PIN can use that to get a head-start on recovering the full password.

Responsible disclosure

This issue was reported to the Fidelity security team in 2021. This is their response:

“Thank you for reaching out about your concerns, the security of your personal information is a primary concern at Fidelity. As such, we make use of industry standard encryption techniques, authentication procedures, and other proven protection measures to secure your information. We regularly adapt these controls to respond to changing requirements and advances in technology. We recommend that you follow current best practices when it comes to the overall security of your accounts and authentication credentials. For further information, you can visit the Security Center on your Fidelity.com account profile to manage your settings and account credentials.”

Postscript: why storing two hashes does not help

To emphasize the independence requirement, we consider another design and explain why it does not achieve the intended security effect. Suppose Fidelity kept the original design for a single password shared between web and IVR, but hashed it two ways:

  • First is a hash of the password verbatim, used for web logins.
  • Second is a hash of the canonicalized password, after being converted to IVR form by mapping all symbols to digits 0 through 9. This hash is only checked when the customer is trying to login through the phone interface.

This greatly weakens the full password against offline cracking because the IVR hash provides an intermediate stepping stone to drive guessing attacks against the full credential. We have already pointed out that the entropy in all-numeric PINs is very low for any length customers could be expected to key into a dial-pad. The additional problem is that after recovering the IVR version, the attacker now has a much easier time cracking the full password. Here is a concrete example: suppose an attacker mounts a brute-force attack against the numeric PIN and determines that a stolen hash corresponds to “3695707711036959.” That data point is very useful when attempting to recover the full password, since many guesses are not compatible with that IVR version. For example, the attacker need not waste any CPU cycles on hashing the candidate password “uqyEzqGCyNnUCWaU.” That candidate starts with “u” which would map to the digit “8” on a dial-pad. It would be inconsistent with something the attacker already knows: the first character of the real password maps to 3, based on the IVR version. Working backwards from the same observation, the attacker knows that they only need to consider guesses where the first symbol is one of {3, d, e, f, D, E, F}.

This makes life much easier for an attacker trying to crack hashes. The effort to recover a full password is not even twice the amount required to go after the IVR version. Revisiting the example of 16 character random passwords: we noted that unrestricted entries have about 105 bits of entropy while the corresponding IVR version is capped to around 53 bits. That means once the simplified IVR PIN is recovered by trying 2**53 guesses at most, the attacker only needs another 2**(105-53) == 2**52 guesses to hit on the full password. This is only about 50% more incremental work— and on average the correct password will be discovered halfway through the search. In other words, having a weak “intermediate” target to attack has turned an intractable problem (2**105 guesses) into two eminently solvable sub-problems.

CP

Saved by third-party cookies: when phishing campaigns make mistakes

(Or, that rare instance when third-party cookies actually helped improve security.)

Third-party cookies have earned a bad reputation for enabling widespread tracking and advertising-driven surveillance online— one that is entirely justified. After multiple half-hearted attempts to deprecate them by leveraging its browser monopoly with Chrome, even Google eventually threw in the towel. Not to challenge that narrative, but this blog post describes an unusual incident where third-party cookies actually helped protect consumers from an ongoing phishing attack. While this phishing campaign occurred in real life, we will refer to the site in question as Acme, after the fictional company in the Wile E. Coyote cartoons.

Recap on phishing

Phishing involves creating a look-alike replica of a legitimate website in order to trick customers into disclosing sensitive information. Common targets are passwords, which can be used to login to the real website by impersonating the user. But phishing can also go after personally identifiable information such as credit-card or social-security numbers directly, since those are often monetizable on their own. The open nature of the web makes it trivial to clone the visual appearance of websites for this purpose. One can simply download every image, stylesheet and video from that site, stash the same content on a different server controlled by the attacker and point users into visiting this latter copy. (While this sounds sinister, there are even legitimate use cases for it, such as mirroring websites to reduce load or protect them from denial-of-service attacks.)

The important point to remember is that visuals can be deceptive: the bogus site may be rendered pixel-for-pixel identical inside the browser view. The address bar is one of the only reliable clues to the provenance of the content; that is because it is part of the user-interface controlled 100% by the browser. It can not be manipulated by the attacker. But good luck spotting the difference between login.acme.com, loginacme.com, acnne.com or any of the dozen other surprising ways to confuse the unwary. Names that look “close” at first glance can be controlled by completely unrelated entities, thanks to the way DNS works.

What makes phishing so challenging to combat is that the legitimate website is completely out of the picture at the crucial moment the attack is going on. The customer is unwittingly interacting with a website 100% controlled by an adversary out to get them, while under the false impression that they are dealing with a trusted service they have been using for years. Much as the real site may want to jump in with a scary warning dialog to stop their customer from making a crucial judgment error, there is no opportunity for such interventions. Recall that the customer is only interacting with the replica. All content the user sees is sourced from the malicious site, even if it happens to be content originally copied from the legitimate one.

Rookie mistake

Unless, that is, the attacker makes a mistake. That is what happened with the crooks targeting Acme: they failed to clone all of the content and create a self-contained replica. One of the Acme security team members noticed that the malicious site continued to reference content hosted on the real one. Every time a user visited the phishing site, their web browser was also fetching content from the authentic Acme website. That astute observation paved the way for an effective intervention.

In this particular phishing campaign, the errant reference back to the original site was for a single image used to display a logo. That is not much to work with. On the one hand, Acme could detect when the image was being embedded from a different website, thanks to the HTTP Referer [sic] header. (Incidentally referrer-policies today would interfere with that capability.) But this particular image consumed precious little screen real estate, only measuring a few pixels across. One could return an altered image— skull and crossbones, mushroom cloud or something garish— but it is very unlikely users would notice. Even loyal customers who have the Acme logo committed to memory might not think twice if a different image appears. Instead of questioning the authenticity of the site, they would attribute that quirk to a routine bug or misguided UI experiment.
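
For the record, the tell-tale image request arriving at Acme’s servers would look something like this (all names hypothetical):

    GET /static/logo.png HTTP/1.1
    Host: www.acme.example
    Referer: https://login-acme-support.example/signin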

Returning a corrupt image— or some other type of content— can not influence the rest of the page. For example, it is not possible to prevent the page from loading or redirect it back to the legitimate login page. Nor can one return javascript in place of the image to interrupt the phishing attempt or warn users. (Aside: if the crooks had made the same mistake by sourcing a javascript file from Acme, such radical interventions would have been possible.)

Remember me: cookies & authentication

To recap, this is the situation Acme faces:

  1. There is an active phishing campaign in the wild
  2. A game of whack-a-mole ensues: the Acme security team keeps reporting each site to browser vendors, hosting companies and Cloudflare. (Crooks are fond of using Cloudflare because it acts as a proxy sitting in front of the malicious site, disguising its true origin and making it difficult for defenders to block reliably.) The attacker responds by changing domain names and resurfacing the exact same phishing page under a different domain.
  3. Every time a customer visits a phishing page, Acme can observe the attack happening in real time because its servers receive a request
  4. Despite being in the loop, Acme can not meaningfully disrupt the phishing page. It has limited influence over the content displayed to the customer

Here is the saving grace: when the customer's browser reaches out to the legitimate Acme website in step #3 as part of rendering the phishing page, it sends along all Acme cookies. Some of those cookies contain information that uniquely identifies the specific customer. That means Acme learns not only that some customer visited a phishing site, but exactly which customer did. In fact there were at least two such cookies:

  • “Remember me” cookie used to store email address and expedite future logins by filling in the username field in login forms.
  • Authentication cookies set after login. Interestingly, even expired cookies are useful for this purpose. Suppose Acme only allowed login sessions to last for 24 hours. After the clock runs out, the customer must reauthenticate by providing their password or MFA again. Authentication cookies would have embedded timestamps reflecting those restrictions. In keeping with the policy of requiring “fresh” credentials, after 24 hours that cookie would no longer be sufficient for authenticating the user. But for the purpose of identifying which user is being phished, it works just fine. (Technical note: “expired” here refers to the application logic; the HTTP standard itself defines an expiration time for cookies, after which point the browser deletes the cookie. If a cookie expires in that sense, it would not be of much use— it becomes invisible to the server.)
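
On the wire, that identifying information rides along in the Cookie header of the very same image request (values hypothetical):

    Cookie: remember_me=alice%40example.com; auth_session=9f2c1a...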

This gives Acme a lucky break to protect customers from the ongoing phishing attack. Recall that Acme can detect when an incoming request for the image is associated with the phishing page. Whenever that happens, Acme can use the accompanying cookies to look up exactly which customer has stumbled onto the malicious site. To be clear, Acme can not determine conclusively whether the customer actually fell for phishing and disclosed their credentials. (There is a fighting chance the customer notices something off about the page after visiting it, and stops short of giving away their password. Unfortunately that can not be inferred remotely.) As such Acme must operate on the worst-case assumption that phishing will succeed and place preemptive restrictions on the account, such as temporarily suspending logins or restricting dangerous actions. That way, even if the customer does disclose their credentials and the crooks turn around to “cash in” those credentials by logging into the genuine Acme website, they will be thwarted from achieving their objective.

Third-party stigma for cookies

There is one caveat to the availability of cookies required to identify the affected customer: those cookies are now being replayed in a third-party context. Recall that the first vs third-party distinction is all about context: whether a resource (such as an image) being fetched for inclusion on a webpage is coming from the same site as the page embedding it, known as the “top-level document.” When the image is fetched as part of a routine visit to the Acme website, it is a first-party request because the image is hosted at the same origin as the top-level document. But when it is being retrieved by following a reference from the malicious replica, it becomes a third-party request.

Would existing Acme cookies get replayed in that situation and reveal the identity of the potential phishing victim? That answer depends on multiple factors:

  1. Choice of browser. At the time of this incident, most popular web browsers freely replayed cookies in third-party contexts, with two notable exceptions:
    • Safari suppressed cookies when making these requests.
    • Internet Explorer: as the lone browser implementing P3P, IE automatically “leashed” cookies set without an associated privacy policy: such cookies were accepted, but only replayed in first-party contexts.
  2. User overrides to browser settings. While the preceding item describes the default behavior of each browser, users can modify these defaults to make them more or less stringent.
  3. Use of “samesite” attribute. About 15 years after IE6 inflicted P3P and cookie management on websites, a proposed update to the HTTP cookie specification finally emerged to standardize and generalize its leashing concept. But the script was flipped: instead of browsers making unilateral decisions to protect user privacy, website owners would declare whether their cookies should be made available in third-party contexts. (One can imagine which way advertising networks— crucially dependent on third-party cookie usage for their ubiquitous surveillance model— leaned on that decision.)
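
Under the samesite rules, the difference comes down to a single attribute on the Set-Cookie header (names and values illustrative):

    Set-Cookie: auth_session=9f2c1a...; Secure; HttpOnly; SameSite=None   (replayed on cross-site requests)
    Set-Cookie: auth_session=9f2c1a...; Secure; HttpOnly; SameSite=Lax    (withheld from cross-site image loads)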

Luckily for Acme, the relevant cookies here were not restricted by the samesite attribute. As for browser distribution, IE was irrelevant with market share in the single digits, primarily restricted to enterprise users in managed IT environments, a far cry from the target audience for Acme. Safari on the other hand did have a non-negligible share, especially among mobile clients, since Apple did not allow independent browser engines such as Chrome’s on iOS at the time. (That restriction would only be lifted in 2024 when the EU reset Apple’s expectations around monopolistic behavior.)

In the end, it was possible to identify and take protective action for the vast majority of customers known to have visited the malicious website. More importantly, intelligence gathered from one interaction is frequently useful in protecting other customers, even when the latter are not directly identified as being targeted by an attack. For example, an adversary often has access to only a handful of IPv4 addresses when attempting to cash in stolen credentials. When an IP address is observed attempting to impersonate a known phishing victim, every other login from that IP can be treated with higher suspicion. Detection mechanisms can be invaluable even with less than 100% coverage.

Verdict on third-party cookies 

Does this prove third-party cookies have some redeeming virtue and the web will be less safe when— or at this rate, if— they are fully deprecated? No. This anecdote is more the exception proving the rule. The advertising industry has been rallying in defense of third-party cookies for over a decade, spinning increasingly desperate and far-fetched scenarios. From the alleged death of “free” content (more accurately, ad-supported content where the real product being peddled is audience eyeballs for ad networks) to allegedly reduced capability for detecting fraud and malicious activity online, predictions of doom have been a constant part of the narrative.

To be clear: this incident does not in any way provide more ammunition for such thinly-veiled attempts at defending a fundamentally broken business model. For starters, there is nothing intrinsic to phishing attacks that requires help from third-party cookies for detection. The crooks behind this particular campaign made an elementary mistake: they left a reference to the original site when cloning the content for their malicious replica. There is no rule that says other crooks are required to follow suit. While this is an optimistic assumption built into defense strategies such as canary tokens, new web standards have made it increasingly easier to avoid such mistakes. For example content security policy allows a website to precisely delineate which other websites can be contacted for fetching embedded resources. It would have been a trivial step for crooks to add CSP headers and prevent any accidental references back to the original domain, neutralizing any javascript logic lurking in there to alert defenders.
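
For example, a single response header on the phishing page along these lines would confine every fetch to the attacker’s own domain and suppress the stray request back to Acme:

    Content-Security-Policy: default-src 'self'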

Ultimately the only robust solution for phishing is using authentication schemes that are not vulnerable to phishing. Defense and enterprise sectors have always had the option of deploying PKI with smart-cards for their employees. More recently consumer-oriented services have convenient access to a (greatly watered-down) version of that capability with passkeys. The jury is out on whether it will gain any traction or remain consigned to a niche audience owing to the morass of confusing, incompatible implementations.

CP

QCC: Quining C Compiler

Automating quine construction for C

A quine is a program that can output its own source code. Named after the 20th century philosopher Willard Van Orman Quine and popularized in Gödel, Escher, Bach, writing quines has become a quintessential bit of recreational coding. Countless examples can be found among the International Obfuscated C Code Contest (IOCCC) entries. While quines are possible in any Turing complete programming language— more on that theoretical result below— writing them can be tricky due to the structural requirements placed on the code for a valid quine. The success criterion is very strict: the output must be exactly identical to the original source, down to spacing and newlines. A miss is as good as a mile. The starting point for this blog post is a way to simplify the creation of arbitrary quines. For concreteness, this proof of concept will focus on C. But the underlying ideas extend to other programming languages.

Since most interesting quines have some additional functionality besides being able to output their own code, it would be logical to divide development into two steps:

  1. Write the application as one normally would
  2. Convert it into a quine by feeding it through a compiler. (More accurately, a “transpiler” or “transcompiler” since the output itself is another valid program in the same language.)

Motivating example

Consider the canonical example of a quine. It takes no inputs and simply prints out its own source code to standard out. Here is a logical, if somewhat naive, starting point in C:
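
    /* Sketch of the pre-quine starting point: get_self() is deliberately
       neither declared nor defined anywhere in this file. */
    #include <stdio.h>

    int main(void)
    {
        printf("%s", get_self());
        return 0;
    }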

Minor problem: this is not a valid C program. Doing all the heavy lifting is a mystery function “get_self” that is not declared— much less defined— anywhere in the source file. (Historical aside: C used to permit calling such undeclared functions, with an implicit assumption of integer return type. Those liberties were revoked with the C99 standard. In any case, adding a declaration will merely postpone the inevitable: the compile-time error about undefined symbols becomes a link-time error about unresolved symbols.)

What if we could feed this snippet into another program that automatically transforms it into a valid C program working as intended? Before going down that path, it is important to clarify the success criteria for a “working” quine. Informally we are looking for a transformation with these properties:

  • Accepts as input an “almost valid” C program with one undefined function get_self(). This function takes no arguments and returns a null-terminated C string.
  • Outputs a valid C program with identical functionality
  • That new program includes a valid C implementation of  get_self() which returns the entire source code of the modified program, including the newly added function.

There are two subtleties here: First, the implementation must return the source code of the modified program, incorporating all additions and changes made by the transformation. Printing the original, pre-transformed source would not qualify as a true quine. Second, the implementation provided for the mystery function get_self() must operate at source level. There are trivial but OS-dependent ways to convert any program into a quine by mucking with the binary executable after the compilation is done. For example, the ELF format for Linux executables allows adding new data sections to an existing file. One could take the source code and drop that into the compiled binary as a symbol with a specific name such as “my_source_code.” This allows for a simple get_self() implementation that parses the binary of the current executable, searching for that symbol in data sections and converting the result into a C string. Expedient as that sounds, such approaches are considered cheating and do not qualify as true quines according to the generally accepted definition of a self-replicating program. Informally, the “quineness” property must be intrinsic to the source. In the hypothetical example where one is playing games with ELF sections, the capability originated with an after-the-fact modification of the binary and is not present anywhere in the source. Another example of cheating would be to commit the source code to an external location such as Github and download it at runtime. These are disqualified under our definition.

QCC: Quining C Compiler

qcc is a proof-of-concept that transforms arbitrary C programs to create valid quines according to the rules sketched above. For example, running the pre-quine sample from the last section through qcc yields a valid quine:

Looking at the transformed output, we observe the original program verbatim in the listing (lines 8-15) sandwiched between prepended and appended sections:

Changes preceding the original source are minimal:

  • Courtesy warning against manual edits, as any change is likely to break the quine property. This is optional and can be omitted.
  • Include directives for standard C libraries required by the get_self() implementation. This is also optional if the original source already includes them. (This PoC does not bother to check. Standard library headers commonly have include guards; second and subsequent inclusions become a noop.) Incidentally, includes can also be deferred until the section of source referencing them. There is no rule dictating that all headers must appear at the start; it is merely a style convention most C/C++ developers abide by.
  • Declaration of get_self(). Also can be omitted if the original program has one.

More substantial additions appear after the original code:

  • Definition of a helper function for hex-decoding.
  • Two string constants carrying hex-encoded payloads. These could have been merged into a single string constant, but keeping them separate helps clarify the quine mechanism.
  • Definition of the get_self() function. This is the core of the implementation.

Hex-decoding the first string shows that it is just the original program, with the minor header/declaration changes prepended and the helper function appended. There is nothing self-referential or unusual going on. But the second hex payload, when decoded, is identical to the function get_self(). That is the tell-tale sign of quines: inert “data” strings mirroring code that gets compiled into executable instructions. Note that while the first string constant is a function of the original input program, the second one is identical for all inputs. (This is why separating them helps clarify the mechanism.)
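
For reference, the hex-decoding helper needs only a handful of lines; a sketch of the general shape (the code qcc actually emits may differ in details, and error handling is omitted):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Decode a hex string such as "68656c6c6f" into a freshly allocated C string. */
    static char *hex_decode(const char *hex)
    {
        size_t n = strlen(hex) / 2;
        char *out = malloc(n + 1);
        for (size_t i = 0; i < n; i++) {
            unsigned int byte;
            sscanf(hex + 2 * i, "%2x", &byte);
            out[i] = (char)byte;
        }
        out[n] = '\0';
        return out;
    }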

Making a quine out of the quining compiler

Careful readers will note that qcc itself is a C program, but the initial version referenced above is not itself a quine. That will not stand. One natural option is to add a new command line flag that will induce qcc to output its own source code instead of quining another program.

Boot-strapping that has one tricky aspect: one needs a functioning qcc executable before it can quine its own source code. While this can be achieved by maintaining two different versions of the compiler, it is possible to get by with a single version with some judicious use of preprocessor tricks and conditional compilation. The trick is defining a preprocessor macro and surrounding any code paths that reference quine-like behavior with #ifdef guards referencing that macro:

  1. Add an option to qcc to define this macro explicitly in the transformation when outputting a quine. That capability will come in handy for any application that needs to be a valid C program in its pre-quine state.
  2. Compile with default settings, resulting in the preprocessor macro being undefined and leaving those sections out of the build. (That is a good thing: otherwise compiler errors will result from the nonexistent self-referential function referenced in those lines.)
  3. Run this original, pre-quine qcc binary on its own source code, and specify that the preprocessor macro is to be defined in the output.
  4. Save and recompile the transformed output as the new qcc. No need to explicitly define the preprocessor macro via compiler flags; it is already hardwired into the transformed source.
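
Concretely, the guarded code path might look something like the sketch below (the macro and flag names are illustrative, not the ones qcc actually uses):

    #include <stdio.h>
    #include <string.h>

    int main(int argc, char **argv)
    {
    #ifdef QCC_SELF   /* defined only in the transformed, quine-capable build */
        if (argc > 1 && strcmp(argv[1], "--print-self") == 0) {
            fputs(get_self(), stdout);  /* get_self() exists only after transformation */
            return 0;
        }
    #endif
        /* ... normal qcc behavior: read a C source file, emit its quined version ... */
        (void)argc; (void)argv;
        return 0;
    }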

This final version can now output its own source, in addition to quining other C programs:

Caveats and design options

As this is a proof of concept, there are caveats:

  • It only operates on single files. As such it can not be used to create multiquines, where a collection of more than one program has access to the source code of every other program in the collection. That case can be handled by first concatenating them into a single program and applying some preprocessor tricks. Straight concatenation alone would result in an invalid C program, due to multiple definitions for main, not to mention any other conflicting symbols. But conditional compilation with #ifdef allows activating only a subset of the code during compilation.
  • Choice of hex encoding is arbitrary. It is certainly not space-efficient, doubling the size of the original input. On the other hand, using hex avoids the visual clutter/confusion caused by having strings that look like valid C code. Raw C strings also have the drawback that special characters such as double quotes and backslashes must be escaped.  While decoding is free in the sense that the C compiler takes care of it, helper functions are still needed for outputting escaped strings. Hex is simple enough to encode/decode in a handful of lines compared to more elaborate options such as base64 or compression, both of which would require more code inlined into every quine or pull in new library dependencies.
  • Implementation of get_self() is not thread safe, due to use of static variables. If multiple threads race to execute the first-time initialization code, the string holding the source code representation will get computed multiple times. All threads will still receive the correct value, but extra memory will be wasted for the unused copies.
  • String constants are currently emitted as a single line, which may overflow the maximum string-literal length allowed, depending on the compiler. (The C standard requires support for at least 4K characters and most compilers in practice far exceed that limit.)
  • Finally: QCC transformations are easily recognizable. There is no attempt to hide the fact that a quine is being created. Functions and variable identifiers have meaningful, intuitive names. This is problematic in contexts such as IOCCC where obfuscation and brevity are virtues.

Post-script: excursion on Kleene’s recursion theorem

Stepping back, qcc is an implementation of a transform that is guaranteed to exist by virtue of a result first proven in 1938, namely Kleene’s second recursion theorem. While that theorem was originally stated in the context of recursive functions, here we state it in an equivalent setting with Turing machines.

Consider a Turing machine T that computes a function of two inputs x and y:

T(x, y) 

Kleene’s result states that there is a Turing machine S which computes a function of just one input, such that the behavior of S on all inputs is closely related to the behavior of T:

∀x: S(x) = T(x, #S)

where #S is the Turing number for the machine S— in other words, a reference to S itself. This statement says that for all inputs, S behaves just like the original machine T acting on the first input, with the second input hard-wired to its own (that is, of S) Turing number.

Since programming languages are effectively more convenient ways to construct Turing machines, we can translate this into even more familiar territory. Suppose we have a program P written in a high-level language which takes two inputs:

P(x, y) → output

Kleene’s theorem guarantees the existence of a program Q which takes a single input x and computes:

Q(x) = P(x, source(Q))

Here source(Q) is the analog of Turing numbers for high-level programming languages, namely a textual representation of the program Q in the appropriate alphabet, such as ASCII or Unicode.

For a concrete example, consider replicating a recent IOCCC entry: write a program that outputs its own SHA512 hash. At first this seems impossible, barring a catastrophic break in SHA512. As a cryptographic hash function, SHA512 is designed to be one-way: given a target hash, it is computationally infeasible to find a preimage input that would produce that target when fed into SHA512. Given the difficulty of that basic problem, finding a “self-consistent” program such that its own SHA512 hash is somehow contained in the source code seems daunting. But Kleene’s result makes it tractable. As a starting point, it is easy enough to write a program that receives some input (for example, from console) and outputs the SHA512 hash of that input. Squinting a little, we can view this as a degenerate function on two inputs. The first input is always ignored while the second input is the one processed through the hash function:

P(x, y) = SHA512(y)

Kleene’s recursion theorem then guarantees the existence of a corresponding program Q:

Q(x) = P(x, source(Q)) = SHA512(source(Q))

In other words, Q prints a cryptographic hash of its own source code.

Given this background, we have a theoretical view on QCC: For C programs that can be expressed in a single compilation unit, QCC constructs another C program that is the “fixed-point” guaranteed to exist by Kleene’s second recursion theorem, with the second input hard-wired to the source code of the new program.

CP

From TPM quotes to QR codes: surfacing boot measurements

The Trusted Platform Module (TPM) plays a critical role in measured boot: verifying the state of a machine after it has booted into the operating system. This is accomplished by a chain of trust rooted in the firmware, with each link in the chain recording measurements of the next link in the TPM before passing control to its successor. For example, the firmware measures option ROMs before executing each one, then the master-boot record (for legacy BIOS) or GPT configuration (for EFI) before handing control over to the initial loader such as the shim for Linux, which in turn measures the OS boot-loader grub. These measurements are recorded in a series of platform-configuration registers or PCRs of the TPM. These registers have the unusual property that consecutive measurements can only be accumulated; they can not be overwritten or cleared until a reboot. It is not possible for a malicious component coming later in the boot chain to “undo” a previous measurement or replace it by a bogus value. In TPM terminology one speaks of extending a PCR instead of writing to the PCR, since the result is a function of both the existing value present in the PCR— the history of all previous measurements accumulated so far— and the current value being recorded.
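
Concretely, an extend operation replaces the register contents with a hash over the old value concatenated with the new measurement, so the final value commits to the entire sequence:

    PCR_new = Hash(PCR_old || measurement)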

At the end of the boot process, we are left with a series of measurements across different PCRs. The question remains: what good are these measurements?

TPM specifications suggest a few ideas:

  • Remote attestation. Prove to a third-party that we booted our system in this state. TPMs have a notion of “quoting”— signing a statement about the current state of the PCRs, using an attestation key that is bound to that TPM. The statement also incorporates a challenge chosen by our remote peer, to prove freshness of the quote. (Otherwise we could have obtained the quote last week and then booted a different image today.) As a side-note: remote attestation was one of the most controversial features of the TPM specification, because it allows remote peers to discriminate based on the software users are running.
  • Local binding of keys. The TPM specification has an extensive policy language for controlling when keys generated on a TPM can be used. In the basic case, it is possible to generate an RSA key that can only be used when a password is supplied. But more interestingly, key usage can be made conditional on the value of a persistent counter, the password associated with a different TPM object (this indirection allows changing the password on multiple keys all at once) or specific values of PCRs. Policies can also be combined using logical conjunctions and disjunctions.

PCR policies are the most promising feature for our purposes. They allow binding a cryptographic key to a specific state of the system. Unless the system is booted into this exact state— including the entire chain from firmware to kernel, depending on what is measured— that key is not usable. This is how disk-encryption schemes such as Bitlocker and equivalent DIY implementations built on LUKS work: the TPM key encrypting the disk is bound to some set of PCRs. More precisely, the master key that is used to encrypt the disk is itself “wrapped” using a TPM key that is only accessible when PCRs are correct. The upshot of this design is that unless the boot process results in the exact same PCR measurements, disk contents can not be decrypted. (Strictly speaking, Bitlocker uses another way to achieve that binding. The TPM also allows defining persistent storage areas called “NVRAM indices.” In the same way usage policies can be set on PCRs, NVRAM indices can be associated with an access policy such that their contents are only readable if PCRs are in a given state.)
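
On Linux, this flavor of binding can be set up for a LUKS volume with systemd-cryptenroll, for example (device path and PCR selection are illustrative):

    systemd-cryptenroll --tpm2-device=auto --tpm2-pcrs=0+7 /dev/nvme0n1p3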

To see what threats are mitigated by this approach, imagine a hypothetical Bitlocker-like scheme where PCR bindings are not used and a TPM key exists that can decrypt the boot volume on a laptop without any policy restrictions. If that laptop is stolen and an adversary now has physical access to the machine, she can simply swap out the boot volume with a completely different physical disk that contains a Linux image. Since that image is fully controlled by the attacker, she can login and run arbitrary code after boot. Of course that random image does not contain any data from the original victim disk, so there is nothing of value to be found immediately. But since the TPM is accessible from this second OS, attacker can execute a series of commands to ask the TPM to decrypt the wrapped-key from the original volume. Absent PCR bindings, the TPM has no way to distinguish between the random Linux image that booted and the “correct” Windows image associated with that key.

The problem with PCR bindings

This security against unauthorized changes to the system comes at the cost of fragility: any change to PCR values will render TPM objects unusable, including changes that are entirely legitimate. Firmware itself is typically measured into PCR0; if a TPM key is bound to that PCR, it will stop working after a firmware upgrade. In TPM2 parlance, we would say that it is no longer possible to satisfy the authorization policy associated with the object. (In fact, since firmware upgrades are often irreversible to prevent downgrading to vulnerable versions, that policy is likely never satisfiable again.) The extent of fragility depends on the selected PCRs and the frequency of expected changes. Firmware upgrades are infrequent, and they are increasingly integrated with OS software-update mechanisms such as fwupd on Linux. On the other hand, the Linux Integrity Measurement Architecture or “IMA” feature measures key operating system binaries into PCR10. That measurement can change frequently with kernel and boot-loader upgrades. In fact since IMA is configurable in what gets measured, it is possible to configure it to measure more components and register even minor OS configuration tweaks. There is intrinsic tension between security and flexibility here: the more OS components are measured, the fewer opportunities are left for an attacker to backdoor the system unnoticed. But it also means fewer opportunities to modify that system, since any change to a measured component will brick keys bound to those measurements.

There are some work-arounds for dealing with this fragility. In some scenarios, one can deal with an unusable TPM key by using an out-of-band backup. For example, LUKS disk encryption supports multiple keys. In case the TPM key is unavailable, the user can still decrypt using a recovery key. Bitlocker also supports multiple keys but MSFT takes a more cautious approach, recommending that full-disk encryption be suspended prior to firmware updates. That strategy does not work when the TPM key is the only valid credential enabling a scenario. For example, when an SSH or VPN key is bound to PCRs and the PCRs change, those credentials need to be reissued.

Another work-around is using wildcard policies. TPM2 policy authorizations can express very complex statements. For example wildcard policies allow an object to be used as long as an external private-key signs a challenge from the TPM. Similarly policies can be combined using logical AND and OR operators, such that a key is usable either based on correct PCR values or a wildcard policy as fallback. In this model, decryption would normally use PCR bindings but in case the PCRs have changed, some other entity would inspect the new state and authorize use of the key if those PCRs look healthy.

Surfacing PCR measurements

In this proof-of-concept, we look at solving a slightly orthogonal problem: surfacing PCR measurements to the owner so that person can make a trust decision. That decision may involve providing the wildcard authorization or something more mundane, such as entering the backup passphrase to unlock their disk. More generally, PCR measurements on a system can act as a health check, concisely capturing critical state such as firmware version, secure-boot mode and boot-loader used. Users can then make a decision about whether they want to interact with this system based on these data points.

Of course simply displaying PCRs on screen will not work. A malicious system can simply report the expected healthy measurements while following a different boot sequence. Luckily TPMs already have a solution for this, called quoting. Closely related to remote attestation, a quote is a signed statement from the TPM that includes a selection of PCRs along with a challenge selected by the user. This data structure is signed using an attestation key, which in turn is related to the endorsement key that is provisioned on the TPM by its manufacturer. (The endorsement key comes with an endorsement certificate baked into the TPM to prove its provenance, but it can not be used to sign quotes directly. In an attempt to improve privacy, TCG specifications complicated life by requiring one level of indirection with attestation keys, along with an interactive challenge/response protocol to prove the relationship between EK and AK.) These signed quotes can be provided to users after boot or even during early boot stages as a datapoint for making trust decisions.

Proof-of-concept with QR codes

There are many ways to communicate TPM quotes to the owner of a machine: for example the system could display it as text on screen, write out the quote as regular file on a removable volume such as USB drive or leverage any network interface such as Ethernet, Bluetooth or NFC to communicate them. QR codes have the advantage of simplicity in only requiring a display on the quoting side and a QR scanning app on the verification side. This makes it particularly suited to interactive scenarios where the machine owner is physically present to inspect its state. (For non-interactive scenarios such as servers sitting in a datacenter, transmitting quotes over the network would be a better option.)

As a first attempt, we can draw the QR code on the login background screen. This allows the device owner to check the system state after it has booted but before the owner proceeds to enter their credentials. The same flow would apply for unlocking the screen after sleep/hibernation. The first step is to replace the default login background image. Several tutorials describe customizing the login screen for Ubuntu by tweaking a specific stylesheet file. Alternatively there is the gdm-settings utility for a more point-and-click approach. The more elaborate part is configuring a task to redraw that image periodically. Specifically, we schedule a task to run on boot and every time the device comes out of sleep. This task will:

  1. Choose a suitable challenge to prove freshness. In this example, it retrieves the last block-hash from the Bitcoin blockchain. That value is updated every 10 minutes on average, can be independently confirmed by any verifier consulting the same blockchain and can not be predicted/controlled by the attacker. (For technical reasons, the block hash must be truncated down to 16 bytes, the maximum challenge size accepted by TPM2 interface.)
  2. Generate a TPM quote using the previously generated attestation key. For simplicity, the PoC assumes that AK has been made into a persistent TPM object to avoid having to load it into the TPM repeatedly.
  3. Combine the quote with additional metadata, the most important being the actual PCR measurements. The quote structure includes a hash of the included PCRs but not the raw measurements themselves. Recall that our objective is to surface measurements so the owner can make an informed decision, not to prove that they equal a previously known reference value. If the expected value were cast in stone and known ahead of time, one could instead use a PCR policy to permanently bind some TPM key to those measurements, at the cost of the fragility discussed earlier. (This PoC also includes the list of PCR indices involved in the measurement for easy parsing, but that is redundant since the signed quote structure already includes it.)
  4. Encode everything using base64 or another alphabet that most QR code applications can handle. In principle QR codes can encode binary data but not every scanner handles this case gracefully.
  5. Convert that text into a PNG file containing its QR representation and write this image out on the filesystem where we previously configured Ubuntu to locate its background image.
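
Step 2 above boils down to a single tpm2-tools invocation along these lines (the persistent handle, PCR selection and file names are illustrative):

    tpm2_quote -c 0x81010002 -l sha256:0,2,4,7 -q "$CHALLENGE_HEX" \
               -m quote.msg -s quote.sig -o pcrs.bin -g sha256
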
TPM quote rendered as QR code on Ubuntu login screen

This QR code now contains sufficient information for the device owner to make a trust decision regarding the state of the system as observed by the TPM.

Corresponding verification steps would be:

  1. Decode QR image
  2. Parse the different fields and base64 decode to binary values
  3. Verify the quote structure using the public half of the TPM attestation key
  4. Concatenate the actual PCR measurements included in the QR code, hash the resulting sequence of bytes and verify that this hash is equal to the hash appearing inside the quote structure. This step is necessary to establish that the “alleged” raw PCR measurements attached to the quote are, in fact, the values that went into that quote.
  5. Confirm that the validated PCR measurements represent a trustworthy state of the system.
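
Steps 3 and 4 roughly correspond to a tpm2_checkquote invocation of this form, with ak.pub being the exported public half of the attestation key and the other file names carried over from the quoting step:

    tpm2_checkquote -u ak.pub -m quote.msg -s quote.sig -f pcrs.bin \
                    -q "$CHALLENGE_HEX" -g sha256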

Step #5 is easier said than done, since it is uncommon to find public sources of “correct” measurements published by OEMs. Most likely one would take a differential approach, comparing against previous measurements from the same system or measurements taken from other identical machines in the fleet. For example, if applying a certain OS upgrade to a laptop in known healthy state in the lab produces a set of measurements, one can conclude that observing the same PCRs on a different unit from the same manufacturer is not an unexpected occurrence. On the other hand, if a device mysteriously starts reporting a new PCR2 value—associated with option ROMs from peripheral devices that are loaded by firmware during the boot stage— it may warrant further investigation by the owner.

Moving earlier in the boot chain

One problem with using the lock-screen to render TPM quotes is that it is already too late for certain scenarios. Encrypted LUKS partitions will already have been unlocked at that point, possibly by the user entering a passphrase or their smart-card PIN during the boot sequence. That means a compromised operating system already has full access to decrypted data by the time any QR code appears. At that point the QR code still has some value as a detection mechanism, as the TPM will not sign a bogus quote. An attacker can prevent the quote from being displayed, feign a bug or attempt to replay a previous quote containing stale challenges, but these all produce detectable signals. More subtle attempts may wait until disk-encryption credentials have been collected from the owner, exfiltrate those credentials to an attacker-controlled endpoint, then fake a kernel panic to induce a reboot back into a healthy state where the next TPM quote will be correct.

With a little more effort, the quote rendering can be moved earlier in the boot sequence to give the device owner an opportunity to inspect system state before any disks are unlocked. The idea is to move the above logic into the initrd image, a customizable virtual disk image that contains instructions for early boot stages after EFI firmware has already passed control to the kernel. Initrd images are customized using scripts that execute at various stages. By moving the TPM quote generation to occur around the same time as LUKS decryption, we can guarantee that information about PCR measurements is available before any trust decisions are made about the system. While the logic is similar to rendering QR quotes on the login screen, there are several implementation complexities to work around. Starting with the most obvious problem:

  • Displaying images without the benefit of a full GUI framework. Frame-buffer to the rescue. It turns out that this is already a solved problem for Linux: fbi and related utilities can render JPEG or PNGs even while operating in console mode by writing directly to the frame-buffer device. (Incidentally it is also possible to take screenshots by reading from the same device; this is how the screenshot attached below was captured.)
  • Sequencing, or making sure that the quote is available before the disk is unlocked.  One way to guarantee this is to force quote generation to occur as integral part of LUKS unlock operation. Systemd has removed support for LUKS unlock scripts, although crypttab remains an indirect way to execute them. In principle we could write a LUKS script that invokes the quote-rendering logic first, as a blocking element. But this would create a dependency between existing unlock logic and TPM measurement verification. (Case in point: unlocking with TPM-bound secrets used to require custom logic but is now supported out of the box with systemd-cryptenroll.)
  • Stepping back, we only need to make sure that quotes are available for the user to check, before they supply any secrets such as LUKS passphrase or smart-card PIN to the system. There is no point in forcing any additional user interaction, since a user may always elect to ignore the quote and proceed with boot. To that end this PoC handles quote generation asynchronously. It opens a new virtual terminal (VT) and brings it to the foreground. All necessary work for generating the QR code— including prompting the user for optional challenge— will take place in that virtual terminal. Once the user exits the image viewer by pressing escape, they are switched back to the original VT where the LUKS prompt awaits. Note there is no forcing function in this sequence: nothing stops the user from ignoring the quote generation logic and invoking the standard Ctrl+Alt+Function key combination to switch back to the original VT immediately if they choose to.
  • Choosing challenges. Automatically retrieving fresh challenges from an external, verifiable source of randomness such as the bitcoin blockchain assumes network connectivity. While basic networking can be available inside initrd images and even earlier during the execution of EFI boot-loaders, it is not going to be the same stack the operating system itself runs. For example, if the device normally connects to the internet using a wifi network with the passphrase stored by the operating system, that connection will not be available until after the OS has fully booted. Even for wired connections, there can be edge-cases such as proxy configuration or 802.1X authentication that would be difficult to fully replicate inside the initrd image.
    This PoC takes a different tack to work around the networking requirement, using a combination of EFI variables and prompting the user. For the sunny-day path, an EFI variable is updated by an OS task scheduled to execute at shutdown, writing the latest challenge (eg the most recent block-hash from Bitcoin) into firmware flash. On boot this value will be available for the initrd scripts to retrieve for quote generation. If the challenge was not updated correctly— eg after a kernel-panic induced reboot— the owner can opt to manually enter a random value from the keyboard.
TPM quote rendered as QR code during boot, before LUKS unlock

Another minor implementation difference is getting by without a TPM resource manager. When executing in multi-user mode, TPM access is mediated by the tabrmd service, which stands for “TPM Access Broker and Resource Manager Daemon.” That service has exclusive access to the raw TPM device, typically at /dev/tpm0, and all other processes seeking to interact with the TPM communicate with the resource manager over dbus. While it is possible to carry over the same model to initrd scripts, it is more efficient to simply have our TPM commands access the device node directly, since they are executing as root and there is no risk of contention from other processes vying for TPM access.
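
Putting the pieces above together, the initrd hook might look something like the following Python sketch. It assumes tpm2-tools, qrencode and fbi have all been bundled into the initrd image, that an attestation key was previously made persistent at a known handle, and that the challenge was stashed in an EFI variable at shutdown; the handle, variable name and file paths are placeholders, not the actual PoC:

    import os
    import subprocess

    # Talk to the raw TPM device directly; there is no resource manager (tabrmd) in the initrd.
    os.environ["TPM2TOOLS_TCTI"] = "device:/dev/tpm0"

    AK_HANDLE = "0x81010002"                  # placeholder: persistent attestation key handle
    PCR_SELECTION = "sha256:0,1,2,3,4,5,7"    # PCRs covered by the quote
    EFI_VAR = "/sys/firmware/efi/efivars/QuoteChallenge-12345678-1234-1234-1234-123456789abc"

    def read_challenge():
        """Use the challenge written to an EFI variable at shutdown, else prompt the user."""
        if os.path.exists(EFI_VAR):
            with open(EFI_VAR, "rb") as f:
                return f.read()[4:].hex()     # first 4 bytes of efivarfs files are attribute flags
        return input("Enter challenge (hex): ").strip()

    def generate_and_show_quote():
        challenge = read_challenge()
        # Ask the TPM to sign the selected PCRs together with the challenge (qualification data).
        subprocess.run(["tpm2_quote", "-c", AK_HANDLE, "-l", PCR_SELECTION, "-q", challenge,
                        "-m", "/run/quote.msg", "-s", "/run/quote.sig", "-o", "/run/quote.pcrs"],
                       check=True)
        with open("/run/quote.sig", "rb") as f:
            payload = f.read().hex()
        # Encode the signature as a QR code and render it on the frame-buffer of a spare VT.
        subprocess.run(["qrencode", "-o", "/run/quote.png", payload], check=True)
        subprocess.run(["fbi", "-T", "8", "-d", "/dev/fb0", "-noverbose", "/run/quote.png"])
        # When the user exits fbi (escape), control returns and boot proceeds to the LUKS prompt.

    if __name__ == "__main__":
        generate_and_show_quote()

A real implementation would also expose the quote message and PCR values— not just the signature— for the verifying device, and would be wired into the initrd as a hook script; the sketch above only captures the sequencing.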

CP

Evading Safe Links with S/MIME: when security features collide

From an attacker’s perspective, email remains one of the most reliable channels for reaching targets. Many security breaches start out with an employee making a poor judgment call to open a dangerous attachment or click on a link from their inbox. Not surprisingly, providers of cloud email services invest significant time in building security features to protect against such risks. Safe Links is part of the defenses built into MSFT Defender for Office 365. As the second word implies, it is concerned with links: specifically protecting users from following harmful links embedded in email messages.

Working from first principles, there are two ways one could go about designing that functionality:

  1. Validate the link when the email is first received by the cloud service
  2. Validate it when the message is read by the recipient.

In both cases the verb “validate” assumes there is some blackbox that can look at a website and pass judgment on whether it is malicious, perhaps with a confidence score attached. In practice that would be a combination of crowd-sourced blacklists— for example, URLs previously reported as phishing by other users— and machine learning models trained to recognize specific signals of malicious activity.

There are trade-offs to either approach. Scanning as soon as email is received (but before it is delivered to the user inbox) allows for early detection of attacks. By not allowing users to ever see that email, we can short-circuit human factors and avoid the risk that someone may be tempted to click on the link. On the other hand, it runs into a common design flaw known as TOCTOU or time-of-check-time-of-use. Here the “time of check” is when the webpage is scanned. “Time of use” is when the user clicks on the link. In between, the content that the link points at can change; what started out as a benign page can morph into phishing or serve up browser exploits.

Cloaking malicious content this way would be trivial for attackers, since they have full control over the content returned at all times. At the time they send their phishing email, the server could be configured to serve anodyne, harmless pages. After waiting a few hours— or perhaps just until the initial scan, whose timing is quite predictable in the case of MSFT Defender— they can flip a switch to start the attack. (Bonus points for sending email outside business hours, improving the odds that the victim will not accidentally stumble on the link until after the real payload is activated.) There is also a more mundane possibility that the page never changes but the classifiers get it wrong, mistakenly labeling it as harmless until other users manually report the page as malicious. Validating links on every click avoids such hijinks, leveraging the most up-to-date information about the destination.

Wrapping links

While validating links every time is the more sound design, it poses a problem for email service providers. They do not have visibility into every possible situation where users are following links from email. In the fantasy universe MSFT sales teams inhabit, all customers read their email in MSFT Outlook on PCs running Windows with a copy of Defender for Endpoint installed for extra safety. In the real world, enterprises have heterogeneous environments where employees could be reading email on iPhones, Android, Macs or even on Linux machines without a trace of MSFT software in the picture.

Safe Links solves that problem by rewriting links in the original email before delivering it to the customer inbox. Instead of pointing to the original URL, the links now point to a MSFT website that can perform checks every time it is accessed and only redirect the user to the original site if considered safe. Once the original copy of the message has been altered, it no longer matters which email client or device the user prefers. They will all render messages with modified links pointing back to MSFT servers. (There is a certain irony to MSFT actively modifying customer communications in the name of security, after running a FUD campaign accusing Google of merely reading their customers’ messages. But this is an opt-in enterprise feature that customers actually pay extra for. As with most enterprise IT decisions, it is inflicted on a user population that has little say on the policy decisions affecting their computing environment.) 

Safe Links at work

To take an example, consider the fate of a direct link to Google when it appears in a message addressed to an Office 365 user with Defender policies enabled. The link can appear in different forms: as plain text, as a hyperlink anchored to a text section, or as an image with a hyperlink. Here is the message according to the sender:


Here is the same message from the vantage point of the Office 365 recipient:

Visually these look identical. But on closer inspection the links have been altered. This is easiest to observe from the MSFT webmail client. True URLs are displayed in the browser status bar at the bottom of the window when hovering over links:


The alterations are more blatant when links are sent as plaintext email:


In this case Safe Links alters the visual appearance of the message, because the URL appears as plain text instead of being encoded in HTML mark-up that is not directly rendered.

Structure of altered links

Modifications follow the same pattern:

  • Hostname points to “outlook.com”, a domain controlled by MSFT.
  • Original URL is included verbatim as one of the query-string parameters
  • Email address of the recipient also makes an appearance, in the next parameter
  • What is not obvious from the syntactic structure of the link but can be verified experimentally: this link does not require authentication. It is not personalized. Anyone— not just the original recipient— can request that URL from MSFT and will be served a redirect to Google. In other words these links function as semi-open redirectors.
  • There is an integrity check in the link. Tampering with any of the query-string parameters or removing them results in an error from the server. (The “sdata” field could indicate a signature over the data; it is the base64 encoding of exactly 32 bytes, consistent with an HMAC-SHA256 or similar MAC intended for verification only by MSFT. A hypothetical sketch of such a check appears after the screenshot below.) This is what happens if the target URL is modified from Google to Bing:
Safe Link integrity checks: even MSFT’s own redirector refuses to send customers to Bing 🤷‍♂️
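
MSFT does not document the integrity check, but a plausible model is easy to sketch: treat “sdata” as an HMAC-SHA256 computed by the redirector over the other query-string parameters, using a key only MSFT holds. Everything below— the parameter names, hostname and input encoding— is a guess for illustration, not the actual Safe Links implementation:

    import base64
    import hashlib
    import hmac
    from urllib.parse import parse_qs, urlencode, urlparse

    SERVER_KEY = b"secret-known-only-to-the-redirector"    # hypothetical server-side key

    def wrap(original_url: str, recipient: str) -> str:
        """Model of link wrapping: embed the URL and recipient, then append a MAC."""
        params = {"url": original_url, "data": recipient}
        mac = hmac.new(SERVER_KEY, urlencode(params).encode(), hashlib.sha256).digest()
        params["sdata"] = base64.urlsafe_b64encode(mac).decode()
        return "https://redirector.example.com/?" + urlencode(params)

    def check(wrapped_url: str) -> str:
        """Model of the redirector's check: recompute the MAC before issuing the redirect."""
        qs = parse_qs(urlparse(wrapped_url).query)
        params = {"url": qs["url"][0], "data": qs["data"][0]}
        expected = hmac.new(SERVER_KEY, urlencode(params).encode(), hashlib.sha256).digest()
        supplied = base64.urlsafe_b64decode(qs["sdata"][0])
        if not hmac.compare_digest(expected, supplied):
            raise ValueError("integrity check failed")     # tampered or truncated parameters
        return params["url"]                               # safe to redirect

    link = wrap("https://www.google.com/", "victim@example.com")
    assert check(link) == "https://www.google.com/"

Under this model, swapping Google for Bing in the url parameter changes the MAC input, the recomputed value no longer matches sdata, and the redirector returns an error— consistent with the observed behavior.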

Bad interactions: end-to-end encryption

Given this background, now we can discuss a trivial bypass. S/MIME is a standard for end-to-end encryption and authentication of email traffic. It is ancient in Internet years, dating back to the 1990s. Emerging around the same time as PGP, it is arguably the “enterprisey” buttoned-down response to PGP, compliant with other fashionably enterprise-centric standards of its era. While PGP defined its own format for everything from public-keys to message structure, S/MIME built on X509 for digital certificates and CMS/PKCS7 for ciphertext/signature formatting. (Incidentally both are based on the binary encoding format ASN1.) As with PGP, it has not taken off broadly except in the context of certain high-security enterprise and government/defense settings.

At the nuts and bolts level, S/MIME uses public-key cryptography. Each participant has their own key-pair. Their public-key is embedded in a digital certificate issued by a trusted CA that can vouch for the owner of that public key. If Alice and Bob have each other’s certificates, Alice can encrypt emails that only Bob can read. She can also digitally sign those messages such that Bob can be confident they could only have originated with Alice.

How does all this interact with Safe Links? There is an obvious case involving encrypted messages: if an incoming message is encrypted such that Bob can only read it after decrypting with his private key— which no one else possesses— then the email service provider can not do any inspection, let alone rewriting, of hyperlinks present. That applies broadly to any link scanning implemented by a middle-man, not just MSFT Safe Links.  (Tangent: that restriction only holds for true end-to-end encryption. Cloud providers such as Google have muddied the waters with lobotomized/watered-down variants where private-keys are escrowed to the cloud provider in order to sign/decrypt on behalf of the end user. That is S/MIME in name only and more accurately “encraption.”)

In practice, this bypass is not very useful for a typical adversary running a garden-variety phishing campaign:

  • Most targets do not use S/MIME
  • For those who do— while few in number, these will be high-value targets with national security implications— the attacker likely does not have access to the victim certificate to properly encrypt the message. (Granted, this is security through obscurity. It will not deter a resourceful attacker.)
  • Finally even if they could compose encrypted emails, such messages are likely to raise suspicion. The presence of encryption can be used as a contributing signal of malicious behavior in machine learning models, as in the case of encrypted zip files sent as attachments. Even the recipient may become suspicious if opening the email requires unusual steps, such as entering the PIN for their smart-card to perform decryption.

Trivial bypass with signatures

But there is a more subtle interaction between Safe Links and S/MIME. Recall that digital signatures are extremely sensitive to any alteration of the message. Anything that modifies message content would invalidate signatures. It appears that the Safe Links design accounted for this and shows particular restraint: clear-text messages bearing an S/MIME signature are exempt from link-wrapping.

Interestingly, the exemption from Safe Links works regardless of whether the S/MIME certificate used for signing is trusted. In the above screenshot from Outlook on MacOS, there is an informational message about the presence of a digital signature, accompanied by the reassuring visual indication of security, the ubiquitous lock icon. But taking a closer look via “Details” shows the certificate was explicitly marked as untrusted in the MacOS keychain:

Similarly the web UI merely contains a disclaimer about signature status being unverifiable due to a missing S/MIME control. (Notwithstanding the legacy IE/ActiveX terminology of “control” that appears to be a reference to a Chrome extension for using S/MIME with webmail.) This limitation is natural: signature validation is done locally by the email client running on a user device. Safe Links operates in the cloud and must make a decision about rewriting links at the time the email is received, without knowing how the recipient will view it. Without full visibility into the trusted CAs associated with every possible user device, a cloud service can not make an accurate prediction about whether the signing certificate is valid for this purpose. MSFT makes a conservative assumption, namely that the signature may be valid for some device somewhere. It follows that signed messages must be exempt from tampering by Safe Links.

Exploiting cleartext signed messages to evade Safe Links is straightforward. Anyone can roll out their own CA and issue themselves certificates suitable for S/MIME. The main requirement is the presence of a particular OID in the extended key-usage (EKU) attribute indicating that the key is meant for email protection. While such certificates will not be trusted by anyone, the mere existence of a signature is enough to exempt messages from Safe Links and allow links to reach their target without tampering.

Crafting such messages does not require any special capability on the part of the target. Recall that they are signed by the sender— in other words, the attacker— but they are not encrypted. There is no need for the recipient to have S/MIME set up or even know what S/MIME is. Depending on the email client, there may be visual indications in the UI about the presence of a digital signature, as well as its trust status. (In the worst case, if there are obtrusive warnings concerning an untrusted signature, attackers can also get free S/MIME certificates from a publicly trusted CA such as Actalis. This is unlikely to be necessary. Given the lessons from Why Johnny Can’t Encrypt, subtle warnings are unlikely to influence trust decisions made by end users.)
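
As a concrete illustration, here is a sketch using the Python cryptography package to mint a throwaway self-signed certificate carrying the emailProtection EKU and produce a clear-text (detached) S/MIME signature over a message body. The names and link are made up; the only point is that the result carries some signature:

    import datetime
    from cryptography import x509
    from cryptography.x509.oid import NameOID, ExtendedKeyUsageOID
    from cryptography.hazmat.primitives import hashes, serialization
    from cryptography.hazmat.primitives.asymmetric import rsa
    from cryptography.hazmat.primitives.serialization import pkcs7

    # Self-signed certificate from a "CA" nobody trusts, with the emailProtection EKU.
    key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    name = x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, "sender@example.com")])
    now = datetime.datetime.utcnow()
    cert = (
        x509.CertificateBuilder()
        .subject_name(name)
        .issuer_name(name)
        .public_key(key.public_key())
        .serial_number(x509.random_serial_number())
        .not_valid_before(now)
        .not_valid_after(now + datetime.timedelta(days=30))
        .add_extension(x509.ExtendedKeyUsage([ExtendedKeyUsageOID.EMAIL_PROTECTION]),
                       critical=False)
        .sign(key, hashes.SHA256())
    )

    # Clear-text signature: the body stays readable, a PKCS7 signature rides alongside it.
    body = b"Quarterly results: https://example.com/unwrapped-link\r\n"
    signed_mime = (
        pkcs7.PKCS7SignatureBuilder()
        .set_data(body)
        .add_signer(cert, key, hashes.SHA256())
        .sign(serialization.Encoding.SMIME,
              [pkcs7.PKCS7Options.DetachedSignature, pkcs7.PKCS7Options.Text])
    )
    print(signed_mime.decode())    # multipart/signed MIME body ready to hand to an MTA

The detached, clear-text form matters: the body remains readable to any client, while the signature simply rides along as a second MIME part.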

Options for mitigation

While this post focused on cloud-hosted Exchange, the same dilemma applies to any so-called “email security” solution predicated on rewriting email contents: it can either break S/MIME signatures or fail open by allowing all links in signed messages through unchanged. Even the weak, key-escrow model espoused by companies such as Google is of little help. GMail can decrypt incoming messages on behalf of a customer of Google Apps, if the customer willingly relinquishes their end-to-end confidentiality and hands over their private key to Google. But GMail still can not re-sign an altered message that was originally signed by an unaffiliated party.

Given the rarity of S/MIME, a pragmatic approach is to allow enterprises to opt into the first option: breaking signatures. If their employees are not set up for S/MIME and have no expectation of end-to-end authentication in the first place, this functionality is introducing pure risk with no benefit. In that scenario it makes sense for Exchange to not only rewrite the links but also remove signatures altogether to avoid confusion.

That will not fly in high-security environments where S/MIME is actually deployed and end-to-end encryption is important. In that case, more fine-grained controls can be applied to cleartext signed messages. For example, the administrator could require that cleartext signed messages are only allowed if the sender certificate was issued by one of a handful of trusted CAs.

CP

Understanding Tornado Cash: code as speech vs code in action

Tornado Cash is a mixer on the Ethereum network. By design mixers obfuscate the movement of funds. They make it more difficult to trace how money is flowing among different blockchain addresses. In one view, mixers improve privacy for law-abiding ordinary citizens whose transactions are otherwise visible for everyone else in the world to track. A less charitable view contends that mixers help criminals launder the proceeds of criminal activity. Not surprisingly Tornado found a very happy customer in North Korea, a rogue nation-state with a penchant for stealing digital assets in order to evade sanctions. Regulators were not amused. Tornado Cash achieved the dubious distinction of becoming the first autonomous smart-contract to be sanctioned by the US Treasury. Its developers were arrested and put on trial for their role in operating the mixer. (One has already been convicted of money laundering by a Dutch court.)

Regardless of how Tornado ends up being used in the real world, lobbying groups have been quick to come to the defense of its developers. Some have gone so far as to cast the prosecution as an infringement on constitutionally protected speech, pointing to US precedents where source code was deemed in scope of first amendment rights. Underlying such hand-wringing is a slippery slope argument. If these handful of developers are held liable for North Koreans using their software to commit criminal activity, then what about the thousands of volunteers who are publicly releasing code under open-source licenses? It is almost certain that somebody somewhere in a sanctioned regime is using Linux to further the national interests of those rogue states. Does that mean every volunteer who contributed to Linux is at risk of getting rounded up next? 

This is a specious argument. It brushes aside decades of lessons learned from previous controversies around DRM and vulnerability disclosure in information security. To better understand where Tornado crosses the line, we need to look at the distinction between code as speech and code in action.

“It’s alright Ma— I’m only coding”

There is a crucial difference between Tornado Cash and the Linux operating system, or for that matter open-source applications such as the Firefox web browser. Tornado Cash is a hosted service. To better illustrate why that makes a difference, let’s move away from blockchains and money transmission, and into a simpler setting involving productivity applications. Imagine a not-entirely-hypothetical world where providing word processing software to Russia was declared illegal. Note this restriction is phrased very generically; it makes no assumptions about the distribution or business model.

For example the software could be a traditional, locally installed application. LibreOffice is an example of an open-source competitor to the better-known MSFT Office. If it turns out that somebody somewhere in Russia downloaded a copy of that code from one of the mirrors, are LibreOffice developers liable? The answer should be a resounding “no” for several reasons. First, the volunteers behind LibreOffice never entered into an agreement with any Russian national/entity for supplying them with software intended for a particular purpose. Second, they had no awareness, much less control, over who could download their work product once it was released into the wild. Of course these points could also be marshaled in defense of Tornado Cash. Presumably they did not run a marketing campaign courting rogue regimes. Nor, for that matter, did the North Korean APT check in with the developers before using the mixer— at least, based on publicly known information about the case.

But there is one objection that only holds true for the hypothetical case of stand-alone, locally installed software: that source-code downloaded from GitHub is inert content. It does not accomplish anything, until it is built and executed on some machine under control of the sanctioned entity. The same defense would not hold for a software-as-a-service (SaaS) offering such as Google Docs. If the Russian government switches to Google Docs because MSFT is no longer allowed to sell them copies of Word, Google can not disclaim knowledge of deliberately providing a service to a sanctioned entity. (That would hold true even if use was limited to the “free” version, with no money changing hands and no enterprise contract signed.) Google is not merely providing inert lines of code to customers. It has been animated into a fully functioning service, running on Google-owned hardware inside Google-owned data-centers. There is every expectation that Google can and should take steps to limit access to this service from sanctioned countries.

While the previous cases were cut and dried, gray areas emerge quickly. Suppose someone unaffiliated with the LibreOffice development team takes a copy of that software and runs it on AWS as a service. With a little work, it would be possible for anyone in the world with a web browser to remotely connect and use this hosted offering for authoring documents. If it turns out such a hosted service is frequented by sanctioned entities, is that a problem? Provided one accepts that Google bears responsibility for the previous example, the answer here should be identical. But it is less straightforward who that responsible party ought to be. There is full separation of roles between development, hosting and operations. For Google Docs, they are all one and the same. Here code written by one group (the open-source developers of LibreOffice) runs on physical infrastructure provided by a different entity (Amazon). But ultimately it is the operator who crossed the Rubicon. It was their deliberate decision to execute the code in a manner that would make its functionality publicly accessible, including to participants who are not supposed to have access. Any responsibility for misuse then lies squarely with the operator. The original developers are not involved. Neither is AWS. Amazon is merely the underlying platform provider, a neutral infrastructure that can be used for hosting any type of service, legal or not.

Ethereum as the infrastructure provider

Returning to Tornado Cash, it is clear that running a mixer on ethereum is closer in spirit to hosting a service at AWS, than it is to publishing open-source software. Billed as the “world computer,” Ethereum is a neutral platform for hosting applications— specifically, extremely niche types of distributed application requiring full transparency and decentralization. As with AWS, individuals can pay this platform in ether to host services— even if those services are written in an unusual programming language and have very limited facilities compared to what can be accomplished with Amazon. Just like AWS, those services can be used by other participants with access to the platform. Anyone with access to the blockchain can leverage those services. (There is in some sense a higher bar. Paying transaction fees in ether is necessary to interact with a service on the Ethereum blockchain. Using a website hosted at AWS requires nothing more than a working internet connection.) Those services could be free or have a commercial angle— as in the case of the Tornado mixer, which had an associated TORN token that its operators planned to profit from.

The implication is clear: the Tornado team is responsible for their role as operators of a mixing service, not for their part as developers writing the code. Writing the code would have been protected speech if they had stopped short of deploying a publicly-accessible contract, leaving it in the realm of research. Instead they opted to “breathe life into the code” by launching a contract, complete with a token model they fully controlled.

One key difference is the immutable nature of blockchains: it may be impossible to stop or modify a service once it has been deployed. It is as if AWS allowed launching services without a kill switch. Once launched, the service becomes an autonomous entity that neither the original deployer nor Amazon itself can shut down. But that does not absolve the operator of responsibility for deploying the service in the first place. There is no engineering rule that prevents a smart-contract from having additional safeguards, such as the ability to upgrade its code to address defects or even to temporarily pause it when problems are discovered. Such administrative controls— or backdoors, depending on perspective— are now common practice for critical ethereum contracts, including stablecoins and decentralized exchanges. For that matter, contracts can incorporate additional rules to blacklist specific addresses or seize funds in response to law enforcement requests. Stablecoin operators do this all the time. Even Tether with its checkered history has demonstrated a surprising appetite for promptly seizing funds in response to court orders. The Tornado team may protest they have no way to shut down or tweak the service in response to “shocking” revelations that it is being leveraged by North Korea. From an ethical perspective, the only response to that protest is: they should have known better than to launch a service based on code without adequate safeguards in the first place.

Parallels with vulnerability disclosure

Arguments over when developers cross a line into legal liability are not new. Information security has been on the frontlines of that debate for close to three decades owing to the question of proper vulnerability disclosure. Unsurprisingly software vendors have been wildly averse to public dissemination of any knowledge regarding defects in their precious products. Nothing offended those sensibilities more than the school of “full disclosure,” especially when accompanied by working exploits. But try as they might to criminalize such activity with colorful yet misguided metaphors (“handing out free guns to criminals!”) the consensus remains that a security researcher purely releasing research is not liable for the downstream actions of other individuals leveraging their work— even when the research includes fully weaponized exploit code ready-made for breaking into a system. (One exception has been the content industry, which managed to fare better, thanks to a draconian anti-circumvention measure in the DMCA. While that legislation certainly had a chilling effect on pure security research on copyright protection measures, in practice most of the litigation has focused on going after those committing infringement rather than researchers who developed code enabling that infringement.)

Debates still continue around what constitutes “responsible” disclosure and where society can strike the optimal balance: incentivizing vendors to fix security vulnerabilities promptly without making it easier for threat actors to exploit those same vulnerabilities. Absent any pressure, negligent/incompetent vendors will delay patches arguing that risk is low because there is no evidence of public exploitation. (Of course absence of evidence is not evidence of absence and in any case, vendors have little control over when malicious actors will discover exploits independently.) But here we can step back from the question of optimal social outcomes, and focus on the narrow question of liability. It is not the exploit developer writing code but the person executing that exploit against a vulnerable target who ought to be held legally responsible for the consequences. (To paraphrase a bumper-sticker version of second amendment advocacy: “Exploits don’t pop machines; people pop machines.”) In the same vein, the Tornado Cash team is fully culpable for their deliberate decision to turn code into a service. Once they launched the contract on chain, they were no longer mere developers. They became operators.

CP

Behavioral economics on Ethereum: stress-testing censorship resistance

Predicated on studying the behavior of real people (distinct from the mythical homo economicus of theory), behavioral economics faces the challenge of constructing realistic experiments in a laboratory setting. That calls for signing up a sizable group of volunteers and putting them into an artificial situation with monetary incentives to influence their decision-making process. Could blockchains in general and Ethereum in particular help by making it easier to either recruit those participants or set up the experimental framework? In this blog post we explore that possibility using a series of hypothetical experiments, building up from simple two-person games to an open-ended version of the tragedy of the commons.

1. Simple case: the ultimatum game

The ultimatum game is a simple experiment involving two participants that explores the notion of fairness. The participants are randomly assigned to either “proposer” or “responder” role. A pool of funds is made available to the proposer, who has complete discretion in making an offer to allocate those funds between herself and the responder. If the responder accepts, the funds are distributed as agreed. If the responder vetoes the offer— presumably for being too skewed towards the proposer— no one receives anything.

This experiment and its variations are notable in showing an unexpected divergence from the theoretical “profit maximization” model of economics 101. Given that the responder has no leverage, one would expect they will begrudgingly settle for any amount, including wildly unfair splits where the proposer decides to keep 99% of the funds. Given that the proposer is also a rational actor aware of that dynamic, the standard model predicts such highly uneven offers being made… and accepted. Yet that is not what experiments show: most offers are close to 50/50 and most responders outright reject offers that are considered too unequal. (This only scratches the surface of the complex dynamics revealed in the experiment. Subtle changes to the framing— such as telling the responders a tall tale about the split being randomly decided by a computer program instead of a sentient being— change their willingness to accept unfair splits; possibly because one does not take “offense” at the outcome of a random process the same way they might react to the perceived greed of the fellow on the other side.)

Putting aside the surprising nature of the results, it is straightforward to implement this experiment in Ethereum. We assume both the proposer and responder have ethereum wallets with known addresses. In order to run the experiment on chain, researchers can deploy a smart-contract and fund it with the promised reward. This contract would have three functions:

  1. One only callable by the proposer, stating the intended allocation of funds.
  2. One only callable by the responder, accepting/rejecting that proposed allocation. Depending on the answer, the contract would either distribute funds or return the entire amount back to the experiment team. 
  3. For practical reasons, one would also include an escape-hatch in the form of a third function that can be called by anyone after some deadline has elapsed to recover the funds in case one or both subjects fail to complete the expected task. Depending on which side reneged on their obligation, it would award the entire reward to the other participant. (A sketch of this contract logic follows the list.)
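
The real contract would be written in Solidity, but the state machine is simple enough to model in a few lines of Python. Addresses are stand-ins for msg.sender checks and the returned dictionaries stand in for actual ether transfers; this is a sketch of the logic, not deployable code:

    class UltimatumGame:
        """Toy model of the three-function ultimatum contract described above."""

        def __init__(self, proposer, responder, researchers, pot, deadline):
            self.proposer, self.responder, self.researchers = proposer, responder, researchers
            self.pot, self.deadline = pot, deadline
            self.offer_to_responder = None        # set by the proposer
            self.settled = False

        def propose(self, caller, amount_to_responder):
            """Function 1: only the proposer may state the intended split."""
            assert caller == self.proposer and not self.settled
            assert 0 <= amount_to_responder <= self.pot
            self.offer_to_responder = amount_to_responder

        def respond(self, caller, accept):
            """Function 2: only the responder may accept or veto the split."""
            assert caller == self.responder and self.offer_to_responder is not None
            self.settled = True
            if accept:
                return {self.responder: self.offer_to_responder,
                        self.proposer: self.pot - self.offer_to_responder}
            return {self.researchers: self.pot}     # veto: nobody gets anything

        def timeout(self, now):
            """Function 3: escape hatch anyone can call once the deadline has passed."""
            assert now > self.deadline and not self.settled
            self.settled = True
            if self.offer_to_responder is None:
                return {self.responder: self.pot}   # proposer never made an offer
            return {self.proposer: self.pot}        # responder never answered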

There are some caveats that could influence the outcome: both participants must hold some ether already at their designated address, in order to make smart-contract calls. Alternatively the experimenters can supply just enough ETH to both volunteers to pay for the expected cost of those calls. But that runs the risk of participants deciding to abscond with funds instead of proceeding with the experiment. (The responder in particular faces an interesting dilemma when confronted with an unfair split they are inclined to reject. On the one hand, registering their displeasure on-chain sends a message to the greedy proposer, at the cost of spending ETH they had been given for the experiment. On the other hand, simply transferring that ETH to a personal wallet allows the responder to walk away with something, at the cost of allowing the greedy proposer to keep 100% of the funds due to the assumed default.) This effect is diminished to the extent that the prize money up for grabs is much larger than the transaction fees the participants are required to part with. More generally, transaction fees determine whether running this experiment on-chain would be any more efficient from an experimental stance than enticing volunteers with free food.

More subtle is the effect of perceived privacy— or lack thereof— in influencing participant behavior. Would a proposer be less inclined to reveal their true colors and offer an unfair split when interacting on-chain versus in real life? On the one hand, blockchains are public: anyone can observe that a particular proposal was greedy. Having their actions permanently on the record for the whole world to observe may motivate participants to follow social mores. On the other hand, blockchain addresses are pseudonyms, without any identifying information. “Bravery of the keyboard” may result in fewer inhibitions about diverging from social expectations and making greedy offers when one is not directly interacting with other persons.

2. Open to all: the Ether auction

The ultimatum game is played between two players. Running that experiment still requires finding suitable volunteers, pairing them up and launching a smart-contract for each pair. (The contract will only accept inputs from the proposer and responder, and as such must have awareness of their addresses.) But there are other experiments which can be run without requiring any advance coordination, beyond that of publicizing the existence of a smart-contract that implements the experimental setup. In effect anyone with an ethereum wallet can make an independent decision on whether they want to participate.

Tullock auctions in general and the better-known “dollar auction” in particular are a case in point. As with all auctions, the highest bidder wins by paying their offer. But unlike most auctions, everyone else who loses to that winning bid is still required to part with the amount they offered. Given those unforgiving dynamics— everyone except the winner must pay and still end up with nothing in return— it seems illogical that anyone would play along. Now consider the “dollar auction,” a slight variant that is simple enough to be demonstrated in classroom settings. The professor holds up a $100 bill and offers to give it to the highest bidder, subject to Tullock auction rules with $1 increments for bids. (Side note: in the original version, only the second-highest bidder is forced to pay while all others are spared. That still does not alter the underlying competitive dynamic between the leading two bidders.) Once the students get over their sense of disbelief that their wise teacher— economics professor by training, of all things— is willing to part with a $100 bill for a pittance, this looks like a great deal. So one student quickly comes up with the minimum $1 bid, spotting an easy $99 profit. Unfortunately the next student sees an equally easy way to score $98 by bidding $2. Slightly less profit than the first bidder imagined achieving if they remained uncontested, but still a decent amount that any rational participant would rightfully chase after. It follows that the same logic could motivate fellow students to bid $3, $4 and higher amounts in an ever increasing cycle of bids. But even before the third student jumps in, there is one person who has suddenly become more motivated to escalate: the initial bidder. Having lost the lead, they are faced with the prospect of losing $1— since everyone must pay their bid, win or lose. That fundamentally alters their expected value calculus, compared to other students currently sitting on the sidelines. A student who has not entered the auction must decide between zero gain (by remaining a spectator in this experiment) or jumping into the fray to chase after the $100 being dangled in front of the class. By contrast a student who has already been out-bid is looking at a choice between a guaranteed loss of their original bid or escalating to convert that loss into a gain.

Informally these auctions distill the notion of “arms race” or “winner-take-all” situations, where multiple participants expend resources chasing after an objective but only one of them can walk away with the prize while everyone else sees their effort wasted. Economists have cited examples where such dynamics are inadvertently created, for example American cities competing to win HUD grants. (Presumably the same goes for countries making elaborate bids to host the next Olympics or FIFA World Cup, considering that only one of them will be granted the opportunity.)

Shifting this classroom exercise into Ethereum is straightforward: we create a smart-contract and seed it with the initial prize money of 1 ETH. The contract accepts bids from any address, provided it exceeds the current leading bid by some incremental amount. In fact the bid can be automatically deduced from the amount of ether sent. Participants do not need MetaMask or similar noncustodial wallets with contract invocation capabilities. They can simply send ETH to the contract from an online centralized exchange. Contracts can be designed with a default payment function that is triggered when any ether is sent, even without an explicit function call. That default fallback function can take care of the book-keeping associated with bids, including distinguishing between initial vs updated bids. If an address was encountered before, any subsequent calls with value attached are considered an increment on top of the original bid. (That is, if you bid 0.5 ETH and now want to raise that offer to 0.7 ETH in order to counter someone else bidding 0.6 ETH, your second call only needs to attach the 0.2 ETH delta.) At some previously agreed upon block-height or time, the contract stops accepting any more bids. The same fallback function can be invoked by anyone to “finalize” the outcome and send the prize money to the winner, and presumably any leftover ETH from incoming bids to a GoFundMe campaign for funding behavioral economics education.
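
The bookkeeping inside that fallback function could be modeled as follows. In Solidity this logic would live in the receive() function keyed off msg.sender and msg.value; here addresses and the charity destination are placeholders and the returned dictionary stands in for actual transfers:

    class EtherAuction:
        """Toy model of the all-pay ether auction: every bidder forfeits whatever they send."""

        def __init__(self, prize, min_increment, end_block, charity):
            self.prize, self.min_increment = prize, min_increment
            self.end_block, self.charity = end_block, charity
            self.bids = {}                        # address -> cumulative amount sent
            self.leader, self.leading_bid = None, 0

        def receive(self, sender, value, block_number):
            """Fallback handler: any ether sent before the deadline raises the sender's bid."""
            assert block_number <= self.end_block, "auction closed"
            total = self.bids.get(sender, 0) + value          # increments stack on prior bids
            assert total >= self.leading_bid + self.min_increment, "bid too low"
            self.bids[sender] = total
            self.leader, self.leading_bid = sender, total

        def finalize(self, block_number):
            """Anyone can call this after the deadline to settle the auction."""
            assert block_number > self.end_block and self.leader is not None
            forfeited = sum(self.bids.values())               # all bids are kept, win or lose
            return {self.leader: self.prize,                  # winner collects the prize
                    self.charity: forfeited}                  # everything else is forfeited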

While this seems straightforward, there are some subtle differences from the class-room setting due to the way Ethereum operates. These can distort the auction dynamics and create alternative ways to maximize profit. The most problematic one arises from the interaction of an auction deadline with the block-by-block manner in which ethereum transactions are added to the blockchain. One way to guarantee not being outbid is to make sure one is submitting the last bid. If the auction is set to conclude at a particular block height or even an instant in time (ethereum blocks contain a timestamp), all rational actors would engage in a game of chicken, waiting until that last block to submit their bids.

In fact, since transactions are visible in the mempool before they are mined, rational actors would continue to bide their time even during the interval for that block. Optimal strategy calls for waiting out other bidders to send their transactions, and only submitting a higher bid after all of those suckers have prematurely tipped their hand. Naturally this runs a different risk: the bid may arrive too late, if the block has already been constructed and attested without your bid getting the last laugh. (That also raises a question of what the contract should do with bids arriving after the deadline. It can bounce the ETH as a favor to the slowpoke or alternatively, eat up the ETH without taking the bid into consideration as punishment for playing this game of chicken.)

Even this model is too simplistic because it does not take MEV into account. Ethereum blocks are not simply constructed out of some “fair” ordering of all public transactions sitting around in the mempool according to gas paid. Miners have been operating a complex ecosystem of revenue optimization by accepting out-of-band bids— that is, payment outside the standard model of fees— to prioritize specific transactions or reorder transactions within a block. (This system became institutionalized with the switch to proof-of-stake.) Why would participants compete over ordering within a block? Because transactions are executed in the order they appear in the block, and there may be a significant arbitrage opportunity for the first mover. Suppose a decentralized exchange has temporarily mispriced a particular asset because the liquidity pools got out of balance. The one lucky trader who gets to execute a trade on that DEX first can monopolize all of the available profit caused by the temporary dislocation. If the competition to get that transaction mined first were conducted out in the open— as the fee marketplace originally operated— the outcome would be a massively wasteful escalation in transaction fees. Those chasing the arbitrage opportunity would send transactions with increasing fees to convince miners to include their TX ahead of everyone else in the block. This is incredibly inefficient: while multiple such transactions can and will be included in the next block, only the earliest one gets to exploit the price discrepancy on the DEX and collect the reward, while everyone else ends up wasting transaction fees and block space for no good reason. If that dynamic sounds familiar, it is effectively a type of Tullock auction conducted in the mempool.

MEV solves that problem by having market participants submit individual transactions or even entire blocks to miners/validators through a semi-private interface. Instead of competing with each other out in the open and paying for TX that did not “win” the race to appear first, MEV converts the Tullock auction into a garden-variety sealed-bid, first-price auction where the losers are no longer stuck paying their bid.

By the same logic, MEV provides a way out of the bizarre dynamics created by the Ether auction: instead of publicly duking it out with bids officially registered on-chain or even sitting in mempool, rational participants can use a service such as Flashbots to privately compete for the prize. There is still no guarantee of winning— a block proposer could ignore Flashbots and just choose to claim the prize for themselves by putting their own bid TX first— but MEV removes the downside for the losers.

3. Tragedy of the commons: Luring Lottery

Here is a different experiment that MEV will not help with. For real-life inspiration, consider an experiment Douglas Hofstadter ran in Scientific American in 1983. (Metamagical Themas covers this episode in great detail.) SciAm announced a lottery with a whopping million dollar bounty to be awarded to the lucky winner. The rules were simple enough: anyone can participate by sending a postcard with a positive integer written on the card. Their odds of winning are proportional to the submitted number. There is one catch: the prize money is divided by the total of all submissions received. In the “best case” scenario— or worst case, depending on the publisher perspective— only one person participates and sends in the number 1. At that point SciAm may be looking at chapter 11, because that lucky reader is guaranteed to collect the promised million dollars. Alternatively in a “worst case” scenario, millions of readers participate or a handful of determined contestants submit astronomically large numbers to win at all costs. SciAm lives to publish another issue, as the prize money is diluted down to a few dollars.

Unlike the dollar auction, there is nothing subtle about the dynamics here. The contest is manifestly designed to discourage selfish behavior: submitting large numbers (or even making any submission at all, since a truly altruistic reader could deliberately opt not to participate) will increase individual chances of winning, but reduce the overall prize. While the magazine editors were rightfully concerned about having to pay out a very large sum if most readers did not take the bait, Hofstadter was not flustered.

No one familiar with Garrett Hardin’s fundamental insight in “The tragedy of the commons” will be surprised by what transpired: not only were there plenty of submissions, some creative readers sent entries that were massive— not expressed as ordinary numbers but defined with complex mathematical formulas, to the point where the contest organizers could not even compare these numbers to gauge probabilities. Not that it mattered, as the prize money was correspondingly reduced to an infinitesimal fraction of a penny. No payouts necessary.

Again this experiment can be run as an Ethereum smart-contract and this time MEV will not help participants game the system. As before, we launch a contract and seed it with the prize money ahead of time. It has two public functions, sketched after the list below:

  • One accepts submissions for a limited time. Unlike the SciAm lottery judged by sentient beings, we have to constrain submissions to actual integers (no mathematical formulas) and restrict their range to something the EVM can comfortably operate on without overflow, such as 2^200. Any address can submit a number. They can even submit multiple entries; this was permitted in the original SciAm lottery which serves as the inspiration for the experiment. The contract keeps track of totals submitted for each address, along with a global tally to serve as denominator when calculating winning odds for every address.
  • The second function can be called by anyone once the lottery concludes. It selects a winner using a fair randomness source, with each address weighted according to the ratio of their total submissions to the overall sum. It also adjusts the payout according to the total submissions and sends the reward to the winner, returning the remainder back to the organizers. (Side note: getting fair randomness can be tricky on chain, requiring a trusted randomness beacon. Seemingly “random” properties such as the previous block hash can be influenced by participants trying to steer the lottery outcome to favor their entry.)
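
Glossing over the hard part— sourcing the randomness, which in practice would come from a beacon or a commit-reveal scheme— the lottery logic can be modeled like this. As before, Python stands in for the Solidity contract and the returned dictionary for actual transfers:

    class LuringLottery:
        """Toy model of the on-chain Luring Lottery."""

        MAX_ENTRY = 2 ** 200    # keep running totals far below the 2**256 EVM word size

        def __init__(self, prize, organizers, end_block):
            self.prize, self.organizers, self.end_block = prize, organizers, end_block
            self.entries = {}   # address -> running total of numbers submitted
            self.total = 0

        def submit(self, sender, number, block_number):
            """Function 1: accept entries (multiple per address allowed) until the deadline."""
            assert block_number <= self.end_block, "lottery closed"
            assert 0 < number <= self.MAX_ENTRY
            self.entries[sender] = self.entries.get(sender, 0) + number
            self.total += number

        def finalize(self, block_number, random_value):
            """Function 2: anyone may call after the deadline; random_value comes from a beacon."""
            assert block_number > self.end_block and self.total > 0
            # Pick a winner with probability proportional to each address's share of the total.
            ticket = random_value % self.total
            winner = None
            for address, weight in self.entries.items():
                if ticket < weight:
                    winner = address
                    break
                ticket -= weight
            payout = self.prize // self.total     # prize is diluted by the sum of all entries
            return {winner: payout, self.organizers: self.prize - payout}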

Given this setup, consider the incentives facing every Ethereum user when the lottery is initially launched. There is a million dollars sitting at an immutable contract that is guaranteed to pay out the prize money according to predefined rules. (Unlike SciAm, the contract can not file for bankruptcy or renege on the promise.) One call to the contract is all it takes to participate in the lottery. Win or lose, there is no commitment of funds beyond the transaction fees required for that call. This is a crucial difference from the Tullock auction where every bid represents a sunk cost for the bidder. Downsides are capped regardless of what other network participants are doing.

Also unlike the Tullock auction, there is little disincentive to participate early. There is no advantage to be gained by waiting until the last minute and submitting the last entry. Certainly one can wait to submit a number much higher than previous entries to stack the odds, but doing so also reduces the expected payout. Not to mention that since submissions are capped in a finite range, participants can simply submit the maximum number to begin with. Meanwhile, MEV can not help with the one problem every rational actor would like to solve: prevent anyone else from joining the lottery. While it would be great to be the only participant or at least one of a handful of participants with an entry, existing MEV mechanisms can not indefinitely prevent others from participating in the lottery. At best bidders could delay submissions for a few blocks by paying to influence the content of those blocks. It would require a coordinated censorship effort to exclude all transactions destined to the lottery contract for an extended period of time.

If anything MEV encourages early participation. No one can be assured they will have the “last say” in the construction of the final block before the lottery stops accepting submissions. Therefore the rational course of action is to submit an entry earlier to guarantee a chance at winning. In fact there may even be optimal strategies around “spoiling” the payout for everyone else immediately with a large entry, such that the expected reward for future entries is barely enough to offset transaction fees for any future participant. This is an accelerated tragedy of the commons— equivalent to the herdsmen from Hardin’s anecdote burning the grasslands to discourage other herdsmen from grazing their cattle on the same land.

Testing censorship resistance

Alternatively the ether auction and Luring Lottery serve as empirical tests of miner censorship on Ethereum. In both cases, rational actors have a strong incentive to participate themselves while preventing others from participating. This is because participation by others reduces the expected gain. For the ether auction, being outbid results in going from guaranteed profit to guaranteed loss. For the Luring Lottery, competition from other participants is not quite as detrimental but it undermines expected return in two ways: by reducing the probability of winning and slashing the prize money on offer. It follows that rational actors have an incentive to censor transactions from other participants.

If 100% reliable censorship is possible, then both experiments have a trivial winning strategy for the censor. For the Tullock auction, submit the minimum acceptable bid and prevent anyone else from making higher offers. Likewise for the Luring Lottery, send the number “1” and block anyone else from submitting an entry that would dilute the prize money.

On Ethereum, block proposers are in the best position to engage in such censorship at minimal cost. While they would be foregoing fees associated with the censored TX, that loss is dwarfed by the outsized gain expected from monopolizing the full rewards available from the hypothetical contract. Could such censorship be sustained indefinitely? This seems unlikely, even if multiple validators were colluding under an agreement to split the gains. It only takes a defection from a single proposer to get a transaction included. Validators could choose to pursue a risky strategy of ignoring the undesirable block, on the pretense that the proposer failed to produce a block in time as expected. They could then wait for a block from the next “compliant” proposer who follows the censorship plan. This approach will fail and incur additional penalties if other attesters/validators accept the block and continue building on top of it. Short of near universal agreement on the censorship plan— as with OFAC compliance— a coalition with sufficient validator share is unlikely to materialize. On the other hand 100% reliable censorship is not necessary for the Luring lottery: block proposers can not stop other proposers from participating when it is their turn, but they can certainly exclude any competing TX from their own blocks. That effectively limits participation to proposers or at best ethereum users aligned with a friendly proposer willing to share the prize. But such tacit collusion would be highly unstable: even if every proposer correctly diagnosed the situation and limited themselves to a single submission of “1” to avoid unnecessary escalation, there would still be an incentive to rig the lottery with a last-minute submission that dwarfs all previous entries. 

CP

[Edited: May 4th]

Browser in the middle: 25 years after the MSFT antitrust trial

In May 1998 the US Department of Justice and the Attorneys General of 20 states along with the District of Columbia sued Microsoft in federal court, alleging predatory strategies and anticompetitive business practices. At the heart of the lawsuit was the web browser Internet Explorer, and strong-arm tactics MSFT adopted with business partners to increase the share of IE over the competing Netscape Navigator. 25 years later in a drastically altered technology landscape, DOJ is now going after Google for its monopoly power in search and advertising. With the benefit of hindsight, there are many lessons in the MSFT experience that could offer useful parallels for the new era of antitrust enforcement, as both sides prepare for the trial in September.

The first browser wars

By all indications, MSFT was late to the Internet disruption. The company badly missed the swift rise of the open web, instead investing in once-promising ideas such as interactive TV or walled gardens in the style of early Prodigy. It was not until Gates’s 1995 “Tidal Wave” memo that the company began to mobilize its resources. Some of the changes were laughably amateurish— teams hiring for dedicated “Internet program manager” roles. Others proved more strategic, including the decision to build a new browser. In the rush to get something out the door, the first version of Internet Explorer was based on Spyglass Mosaic, a commercial version of the first popular browser NCSA Mosaic developed at the University of Illinois. (The team behind Mosaic would go on to create Netscape Navigator.) Even the name itself betrayed the Windows-centric and incrementalist attitude prevalent in Redmond: “explorer” was the name of the Windows GUI or “shell” for browsing content on the local machine. Internet Explorer would be its networked cousin helping users explore the wild wild web.

By the time IE 1.0 shipped in August 1995, Netscape already had a commanding lead in market share, not to mention the better product measured in features and functionality. But by this time MSFT had mobilized its considerable resources, greatly expanding investment in the browser team and replacing Spyglass code with its own proprietary implementation. IE3 was the first credible version to have some semblance of feature parity with Navigator, having added support for frames, cascading stylesheets, and Javascript. It was also the first time MSFT started going on the offensive, responding with its own proprietary alternatives to technologies introduced by Netscape. Navigator had the Netscape Plugin API (NPAPI) for writing browser extensions; IE introduced ActiveX— completely incompatible with NPAPI but entirely built on other MSFT-centric technologies including COM and OLE. Over the next two years this pattern would repeat as IE and Navigator duked it out for market share by introducing competing technologies. Netscape allowed web pages to run dynamic content with a new scripting language, Javascript; MSFT would support that in the name of compatibility but also subtly try to undermine JS by pushing VBScript, based on the Visual Basic language so familiar to existing Windows developers.

Bundle of trouble

While competition heated up over functionality— and chasing fads, such as the “push” craze of the late 1990s that resulted in the Channel Definition Format— there was one weapon uniquely available to MSFT for grabbing market share: shipping IE with Windows. Netscape depended on users downloading the software from their website. Quaint as this sounds in 2024, it was a significant barrier to adoption in an age when most of the world had not made the transition to being online. How does one download Navigator from the official Netscape website if they do not have a web browser to begin with? MSFT had a well-established channel exempt from this bootstrapping problem: copies of Windows distributed “offline” using decidedly low-tech means such as shrink-wrapped boxes of CDs or preinstalled on PCs. In principle Netscape could seek out similar arrangements with Dell or HP to include its browser instead. Unless of course MSFT made the OEMs an offer they could not refuse.

That became the core of the government accusation of anticompetitive practices: MSFT pushed for exclusive deals, pressuring partners such as PC manufacturers (OEM or "original equipment manufacturer" in industry lingo) to not only include a copy of Internet Explorer with prominent desktop placement but also rule out shipping any alternative browsers. Redmond clearly had far more leverage than Mountain View over PC manufacturers: shipping any browser at all was icing on the cake, but a copy of the reigning OS was practically mandatory.

What started out as a sales/marketing strategy rapidly crossed over into the realm of software engineering when later releases of Windows began to integrate Internet Explorer in what MSFT claimed was an inextricable fashion. The government objected to this characterization: IE was an additional piece of software downloaded from the web or installed from CDs at the consumer's discretion. Shipping a copy with Windows out-of-the-box may have been convenient to save users the effort of jumping through those installation hoops, but surely a version of Windows could also be distributed without this optional component.

When MSFT objected that these versions of Windows could not function properly without IE, the government sought out a parade of expert witnesses to disprove this. What followed was a comedy of errors on both sides. One expert declared the mission accomplished after removing the icon and primary executable, forgetting about all of the shared libraries (dynamic link library or DLL in Windows parlance) that provide the majority of browser functionality. IE was designed to be modular, to allow "embedding" the rendering engine or even subsets of functionality such as the HTTP stack into as many applications as possible. The actual "Internet Explorer" icon users clicked on was only the tip of the iceberg. Deleting that was the equivalent of arguing that the electrical system in a car can be safely removed by smashing the headlights and noting the car still drives fine without lights. Meanwhile MSFT botched its own demonstration of how a more comprehensive removal of all browser components results in broken OS functionality. A key piece of evidence entered by the defense was allegedly a screen recording from a PC showing everything that goes wrong with Windows when IE components are missing. Plaintiffs' lawyers were quick to point out strange discontinuities and changes in the screenshots, eventually forcing MSFT into an embarrassing admission that the demonstration was spliced together from multiple sequences.

Holding back the tide

The next decade of developments would vindicate MSFT, proving that company leadership was fully justified in worrying about the impact of the web. MSFT mobilized to keep Windows relevant, playing the game on two fronts:

  1. Inject Windows dependencies into the web platform, ensuring that even if websites were accessible on any platform in theory, they worked best on Windows viewed in IE. Pushing ActiveX was a good example of this. Instead of pushing to standardize cross-platform APIs, IE added appealing features such as the initial incarnation of XML HTTP request as ActiveX controls. Another example was the addition of Windows-specific quirks into the MSFT version of Java. This provoked a lawsuit from Sun for violating the "Java" trademark with an incompatible implementation. MSFT responded by deciding to remove the JVM from every product that previously shipped it.
  2. Stop further investments into the browser once it became clear that IE had won the browser wars. The development of IE4 involved a massive spike of resources. That release also marked the turning of the tide, with IE starting to win out in comparisons against Navigator 4. IE5 was an incremental effort by comparison. By IE6, the team had been reduced to a shadow of its former self, where it would remain for the next ten years until Google Chrome came upon the scene. (Even the "security push" in the early 2000s culminating in Windows XP SP2 focused narrowly on cleaning up the cesspool of vulnerabilities in the IE codebase. It was never about adding features and enhancing functionality for a more capable web.)

This lack of investment from MSFT had repercussions far beyond the Redmond campus. It effectively put the web platform into deep freeze. HTML and JavaScript evolved very quickly in the 1990s. HTML 2.0 was published as an RFC in 1995. Later the World Wide Web Consortium took up the mantle of standardizing HTML, with HTML 3.2 arriving in early 1997. It took less than a year for HTML4 to be published as an official W3C "recommendation"— what would be called a standard under any other organization. This was a time of rapid evolution for the web, with Netscape, MSFT and many other companies participating to drive the platform forward. It would be another 17 years before HTML5 followed.

Granted MSFT had its own horse in the race with MSN, building out web properties and making key investments such as the acquisition of Hotmail. Some even achieved a modicum of success, such as the travel site Expedia, which was spun out into a public company in 1999. But a clear consensus had emerged inside the company around the nature of software development. Applications accessed through a web browser were fine for "simple" tasks, characterized by limited functionality, with correspondingly low performance expectations: minimalist UI, a laggy/unresponsive interface, only accessible with an Internet connection and even then constrained by the limited bandwidth of the era. Anything more required native applications, installed locally and designed to target the Windows API. These were also called "rich clients" in a not-so-subtle dig at the implied inferiority of web applications.

Given that bifurcated mindset, it is no surprise the web browser became an afterthought in the early 2000s. IE had emerged triumphant from the first browser wars, while Netscape disappeared into the corporate bureaucracy of AOL following the acquisition. Mozilla Firefox was just starting to emerge phoenix-like from the open-sourced remains of the Navigator codebase, far from posing any threat to market share. The much-heralded Java applets in the browser that were going to restore parity with native applications failed to materialize. There were no web-based word processors or spreadsheets to compete against Office. In fact there seemed to be hardly any profitable applications on the web, with sites still trying to work out the economics of "free" services funded by increasingly annoying advertising.

Meanwhile MSFT itself had walked away from the antitrust trial mostly unscathed. Having lost the initial round in federal court after a badly botched defense, the company handily won at the appellate court. In a scathing ruling the circuit court not only reversed the breakup order but found the trial judge to have engaged in unethical, biased conduct. Facing another trial under a new judge, the DOJ blinked and decided it was no longer seeking a structural remedy. The dramatic antitrust trial of the decade ended with a whimper: the parties agreed to a mild settlement that required MSFT to modify its licensing practices and better document its APIs for third parties to develop interoperable software.

This outcome was widely panned by industry pundits as a minor slap on the wrist, raising concerns that it left the company free to continue the same pattern of anticompetitive behavior. In hindsight the trial did have an important consequence that was difficult to observe from the outside: it changed the rules of engagement within MSFT. Highly motivated to avoid another extended legal confrontation that would drag on share price and distract attention, leadership grew more cautious about pushing the envelope around business practices. It may have been too little too late for Netscape, but this shift in mindset meant that when the next credible challenger to IE materialized in the shape of Google Chrome, IE was left to fend for itself, competing strictly on its own merits. There would be no help from the OS monopoly.

Second chances for the web

More than any other company, Google was responsible for revitalizing the web as a capable platform for rich applications. For much of the 2000s, it appeared that the battle for developer mindshare had settled into a stalemate: HTML and Javascript were good for basic applications (augmented by the ubiquitous Adobe Flash for extra pizzaz when necessary) but any heavy lifting— CPU-intensive computing, fancy graphics, interacting with peripheral devices— required a locally installed desktop application. Posting updates on social life and sharing hot-takes on recent events? Web browsers proved perfectly adequate for that. But if you planned to crunch numbers on a spreadsheet with complex formulas, touch up high-resolution pictures or hold a video conference, the consensus held that you needed "real" software written in a low-level language such as C/C++ and directly interfacing with the operating system API.

Google challenged that orthodoxy, seeking to move more applications to the cloud. It was Google that kept pushing the limits of what existing browsers could do, often with surprising results. Gmail was an eye opener for its responsive, fast UI as much as for the generous gigabyte of space every user received and the controversial revenue model driven by contextual advertising based on the content of emails. Google Maps— an acquisition, unlike the home-grown Gmail, which had started out as one engineer's side project— and later Street View proved that even high-resolution imagery overlaid with local search results could be delivered over existing browsers with a decent user experience. Google Docs and Spreadsheets (also acquisitions) were even more ambitious undertakings aimed at the enterprise segment cornered by MSFT Office until that point.

These were mere opening moves in the overall strategic plan: every application running in the cloud, accessed through a web browser. Standing in the way of that grand vision was the inadequacy of existing browsers. They were limited in principle by the modest capabilities of the standard HTML and Javascript APIs defined at the time, without venturing into proprietary, platform-dependent extensions such as Flash, Silverlight and ActiveX. They were hamstrung in practice even further thanks to the mediocre implementation of those capabilities by the dominant browser of the time, namely Internet Explorer. What good would innovative cloud applications do when users had to access them through a buggy, slow browser riddled with software vulnerabilities? (There is no small measure of irony that the 2009 "Aurora" breach of Google by a Chinese APT started out with an IE6 zero-day vulnerability.)

Google was quick to recognize the web browser as a vital component of its business strategy, in much the same way MSFT had correctly perceived the danger Netscape posed. Initially Google put its weight behind Mozilla Firefox. The search deal to become the default engine for Firefox (realistically, did anyone want Bing?) provided much of the revenue for the fledgling browser early on. While swearing by the benefits of having an open-source alternative to the sclerotic IE, Google would soon realize that a development model driven by democratic consensus came with one undesirable downside: despite being a major source of revenue for Firefox, it could exert only so much influence over the product roadmap. For Google, controlling its own fate all but made it inevitable that it would embark on its own browser project.

Browser wars 2.0

Chrome was the ultimate Trojan horse for advancing the Google strategy: wrapped in the mantle of "open-source" without any of the checks-and-balances of an outside developer community to decide which features are prioritized (a tactic that Android would soon come to perfect in the even more cut-throat setting of mobile platforms). That lack of constraints allowed Google to move quickly and decisively on the main objective: advance the web platform. Simply shipping a faster and safer browser alone would not have been enough to achieve parity with desktop applications. HTML and Javascript themselves had to evolve.

More than anything else Chrome gave Google a seat at the table for standardization of future web technologies. While work on HTML5 had started in 2004 at the instigation of Mozilla and Opera representatives, it was not until Chrome reignited the browser wars that bits and pieces of the specification began to find their way into working code. Crucially, the presence of a viable alternative to IE meant standardization efforts were no longer an academic exercise. The finished output of W3C working groups is called a "recommendation." That is no false modesty in terminology, because at the end of the day the W3C has no authority or even indirect influence to compel browser publishers to implement anything. In a world where most users are running an outdated version of IE (with most desktops stuck on IE6 or IE7) the W3C can keep cranking out enhancements to HTML5 on paper without delivering any tangible benefit to users. It is difficult enough to incentivize websites to take advantage of new features. The path of least resistance already dictates coding for the least common denominator. Suppose some website crucially depends on a browser feature missing for the 10% of visitors who are running an ancient version of IE. Whether they do not care enough to upgrade, or perhaps can not upgrade as with enterprise users at the mercy of their IT department for software choices, these users will be shut out of the website, representing a lost revenue opportunity. By contrast a competitor that demands less of its customers' software, or puts in the extra development effort for backwards compatibility, will have no problem monetizing that segment.

The credibility of a web browser backed by the might of Google shifted that calculus. Observing the clear trend with Chrome and Firefox capturing market share from IE (and crucially, declining share of legacy IE versions) made it easier to justify building new applications for a modern web incorporating the latest and greatest from the W3C drawing board: canvas, web-sockets, RTC, offline mode, drag & drop, web storage… It no longer seemed like questionable business judgment to bet on that trend and build novel applications assuming a target audience with modern browsers. In 2009 YouTube engineers snuck in a banner threatening to cut off support for IE, careful to stay under the radar lest their new overlords at Google object to this protest. By 2012 the tide had turned to the point that an Australian retailer began imposing a surcharge on IE7 users to offset the cost of catering to their ancient browser.

While the second round of the browser wars is not quite over, some conclusions are obvious. Google Chrome has a decisive lead over all other browsers, especially in the desktop market. Firefox share is declining, creating doubts about the future of the only independent open-source web browser that can claim the mantle of representing users as stakeholders. As for MSFT, despite getting its act together and investing in auto-update functionality to avoid getting stuck with another case of the "IE6 installed-base legacy" problem, Internet Explorer versions steadily lost market share during the 2010s. Technology publications cheered on every milestone, such as the demise of IE6 and the "flipping" point when Google Chrome reached 50%. Eventually Redmond gave up and decided to start over with a new browser altogether dubbed "Edge," premised on a fresh start instead of incremental tweaks. That has not fared much better either. After triumphantly unveiling a new HTML rendering engine to replace IE's "Trident," MSFT quickly threw in the towel, announcing that it would adopt Blink— the engine from Chrome. (Inasmuch as the MSFT of the 1990s was irrationally combative in its rejection of technology not invented in Redmond, its current incarnation had no qualms admitting defeat and making pragmatic business decisions to leverage competing platforms.) Despite multiple legal skirmishes with EU regulators over its ads and browser monopoly, there are no credible challengers to Chrome on the desktop today. When it comes to market power, Google Chrome is the new IE.

The browser disruption in hindsight

Did MSFT overreact to the Netscape Navigator threat and knee-cap itself by inviting a regulatory showdown through its aggressive business tactics? Subsequent history vindicates the company leadership in correctly judging the disruption potential, but not necessarily the response. It turned out the browser was indeed a critical piece of software— it literally became the window through which users experience the infinite variety of content and applications beyond the narrow confines of their local device. Platform-agnostic and outside the control of companies providing the hardware/software powering that local device, it was an escape hatch out of the "Wintel" duopoly. Winning the battle against Netscape defused that immediate threat for MSFT. Windows did not become "a set of poorly debugged device drivers," as Netscape's Marc Andreessen had once quipped.

An expansive take on “operating system”

MSFT was ahead of its time in another respect: browsers are considered an intrinsic component of the operating system, a building block for other applications to leverage. Today a consumer OS shipping without some rudimentary browser out-of-the-box would be an anomaly. To pick two examples comparable to Windows:

  • MacOS includes Safari starting with the Panther release in 2003.
  • Ubuntu desktop releases come with Firefox as the default browser.

On the mobile front, browser bundling is not only standard but pervasive in its reach:

  • iOS not only ships a mobile version of Safari but the WebKit rendering engine is tightly integrated into the operating system, as the mandatory embedded browser to be leveraged by all other apps that intend to display web content. In fact until recently Apple forbade shipping any alternative browser that is not built on WebKit. The version of "Chrome" for iOS is nothing more than a glossy paint-job over the same internals powering Safari. Crucially, Apple can enforce this policy. Unlike desktop platforms with their open ecosystem where users are free to source software from anywhere, mobile devices are closed appliances. Apple exerts 100% control over software distribution for iOS.
  • Android releases have included Google Chrome since 2012. Unlike Apple, Google places no restrictions on alternative browsers distributed as independent applications. However, embedded web views in Android are still based on the Chrome rendering engine.

During the antitrust trial, some astute observers pointed out that only a few years earlier even the most rudimentary networking functionality— namely the all-important TCP/IP stack— had been an optional component in Windows. Today it is not only a web browser that has become table stakes. Here are three examples of functionality once considered strictly distinct lines of business from providing an operating system:

  1. Productivity suites: MacOS comes with Pages for word processing, Numbers for spreadsheets and Keynote for crafting slide-decks. Similarly many Linux distributions include the LibreOffice suite, which provides open-source replacements for Word, Excel, PowerPoint etc. (This is a line even MSFT did not cross: to this day no version of Windows includes a copy of the "Office suite" understood as a set of native applications.)
  2. Video conferencing and real-time collaboration: Again each vendor has been putting forward their preferred solution, with Google including Meet (previously Hangouts), Apple promoting FaceTime and MSFT pivoting to Teams after giving up on Skype.
  3. Cloud storage: To pick an example where the integration runs much deeper, Apple devices have seamless access to iCloud storage, while Android & ChromeOS are tightly coupled to Google Drive for backups. Once the raison d'être of unicorn startups Dropbox and Box, this functionality has been steadily incorporated into the operating system, casting doubt on the commercial prospects of these public companies. Even MSFT has not shied away from integrating its competing OneDrive service with Windows.

There are multiple reasons why these examples raise few eyebrows from the antitrust camp. In some cases the applications are copycats or also-rans: Apple's productivity suite can interop with MSFT Office formats (owing in large part to the EU consent decree that forced MSFT to start documenting its proprietary formats) but still remains a pale imitation of the real thing. In other cases, the added functionality is not considered a strategic platform or has little impact on the competitive landscape. FaceTime is strictly a consumer-oriented product that has no bearing on the lucrative enterprise market. While Teams and Meet have commercial aspirations, they face strong headwinds competing against established players Zoom and WebEx specializing in this space. No one is arguing that Zoom is somehow disadvantaged on Android because it has to be installed as a separate application from the Play Store. But even when integration obviously favors an adjacent business unit— as in the case of mobile platforms creating entrenched dependencies on the cloud storage offering from the same company— there is a growing recognition that the definition of an "operating system" is subject to expansion. Actions that once may have been portrayed as leveraging a platform monopoly to take over another market— Apple & Google rendering Dropbox irrelevant— become the natural outcome of evolving customer expectations.

Safari on iOS may look like a separate application with its own icon, but it is also the underlying software that powers embedded “web views” for all other iOS apps when those apps are displaying web content inside their interface. Google Chrome provides a similar function for Android apps by default. No one in their right mind would resurrect the DOJ argument of the 1990s that a browser is an entirely separate piece of functionality and weaving it into the OS is an arbitrary marketing choice without engineering merits. (Of course that still leaves open the question of whether that built-in component should be swappable and/or extensible. Much like authentication or cryptography capabilities for modern platforms have an extensibility mechanism to replace default, out-of-the-box software with alternatives, it is fair to insist that the platform allow substituting a replacement browser designated by the consumer.) Google turned the whole model upside down with Chromebooks, building an entire operating system around a web browser.

All hail the new browser monopoly

Control over the browser temporarily handed MSFT significant leeway over the future direction of the web platform. If that platform remained surprisingly stagnant afterwards— compared to its frantic pace of innovation during the 1990s— that was mainly because MSFT had neither the urgency nor the vision to take it to the next level. (Witness the smart tags debacle.) Meanwhile the W3C ran around in circles, alternating between incremental tweaks— introducing XHTML, HTML repackaged as well-formed XML— and ambitious visions of a "semantic web." The latter imagined a clean separation of content from presentation, two distinct layers that HTML munged together, making it possible for software to extract information, process it and combine it in novel ways for the benefit of users. Outside the W3C there were few takers. Critics derided it as angle-brackets-everywhere: XSLT, XPath, XQuery, XLink. The semantic web never got the opportunity for the large-scale demonstration that would have tested its premise. For a user sitting in front of their browser and accessing websites, it would have been difficult to articulate the immediate benefits. Over time Google and ChatGPT would prove machines were more than adequate at grokking unstructured information on web pages even without the benefit of XML tagging.

Luckily for the web, plenty of startups did have more compelling visions of how the web should work and what future possibilities could be realized— given the right capabilities. This dovetailed nicely with the shift in emphasis from shipping software to operating services. (It certainly helped that the economics were favorable. Instead of selling a piece of software once for a lump sum and hoping the customer upgrades when the next version comes out, what if you could count on a recurring source of revenue from monthly subscriptions?) The common refrain among all of these entrepreneurs: the web browser had become the bottleneck. PCs kept getting faster and even operating systems became more capable over time, but websites could only access a tiny fraction of those resources through HTML and Javascript APIs, and only through a notoriously buggy, fragile implementation held together by duct-tape: Internet Explorer.

In hindsight it is clear something had to change; there was too much market pressure against a decrepit piece of software guarding an increasingly untenable OS monopoly. Surprisingly that change came in the form of not one but two major developments in the 2010s. One shift had nothing to do with browsers: smart-phones gave developers a compelling new way to reach users. It was a clean slate, with powerful new APIs unconstrained by the web platform. MSFT did not have a credible response to the rise of iOS and Android any more than it did to Chrome. Windows Mobile never made many inroads with device manufacturers, despite or perhaps because of the Nokia acquisition. It had even less success winning over developers, failing to complete the virtuous cycle between supply & demand that drives platforms. (At one point a desperate MSFT started outright offering money to publishers of popular apps to port their iOS & Android apps to Windows Mobile.)

Perhaps the strongest evidence that MSFT judged the risk accurately comes from Google Chrome itself. Where MSFT saw a one-sided threat to the Windows and Office revenue streams, Google perceived a balanced mix of opportunity and risk. The "right" browser could accelerate the shift to replace local software with web applications— such as the Google Apps suite— by closing the perceived functionality gap between them. The "wrong" browser would continue to frustrate that shift or even push the web towards another dead-end proprietary model tightly coupled to one competitor. Continued investment in Chrome is how the odds get tilted towards the first outcome. Having watched MSFT squander its browser monopoly with years of neglect, Google knows better than to rest on its laurels.

CP

The elusive nature of ownership in Web3

A critical take on Read-Write-Own

In the recently published "Read Write Own," Chris Dixon makes the case that blockchains allow consumers to capture more of the returns from the value generated in a network because of strongly enshrined rules of ownership. This is an argument about fairness: the value of networks is derived from the contributions of participants. Whether it is Facebook users sharing updates with their network or Twitter/X influencers opining on the latest trends, it is Metcalfe's law that allows these systems to become so valuable. But as the history of social networks has demonstrated time and again, that value accrues to a handful of employees and investors who control the company. Not only do customers not capture any of those returns (hence the oft-used analogy of "sharecroppers" operating on Facebook's land), they are stuck with the negative externalities, including degraded privacy, disinformation and, in the case of Facebook, repercussions that spill out into the real world including outbreaks of violence.

The linchpin of this argument is that blockchains can guarantee ownership in ways that the two prevailing alternatives ("protocol networks" such as SMTP or HTTP and the better-known "corporate networks" such as Twitter) can not. Twitter can take away any handle, shadow-ban the account or modify its ranking algorithms to reduce its distribution. By comparison, if you own a virtual good such as an NFT issued on a blockchain, no one can interfere with your rightful ownership of that asset. This blog post delves into some counterarguments on why this sense of ownership may prove illusory in most cases. The arguments run from the least likely and theoretical to the most probable, in each case demonstrating ways these vaunted property rights fail.

Immutability of blockchains

The first shibboleth that we can dispense with is the idea that blockchains operate according to immutable rules cast in stone. An early dramatic illustration of this came about in 2016, as a result of the DAO attack on Ethereum. The DAO was effectively a joint investment project operated by a smart-contract on the Ethereum chain. Unfortunately that contract had a serious bug, resulting in a critical security vulnerability. An attacker exploited that vulnerability to drain a large share of the funds— tens of millions of dollars in notional value at the time, out of the roughly $150MM the project had raised.

This left the Ethereum project with a difficult choice. They could double down on the doctrine that Code-Is-Law and let the theft stand: argue that the "attacker" did nothing wrong, since they used the contract in exactly the way it was implemented. (Incidentally, that is a mischaracterization of the way Larry Lessig intended that phrase. "Code and Other Laws of Cyberspace," where the phrase originates, was prescient in warning about the dangers of allowing privately developed software, or "West Coast Code" as Lessig termed it, to usurp democratically created laws or "East Coast Code" in regulating behavior.) Or they could orchestrate a difficult, disruptive hard-fork to change the rules governing the blockchain and rewrite history to pretend the DAO breach never occurred. This option would return stolen funds back to investors.

Without reopening the charged debate around which option was "correct" from an ideological perspective, we note the Ethereum foundation emphatically took the second route. From the attacker's perspective, their "ownership" of stolen ether proved very short-lived.

While this episode demonstrated the limits of blockchain immutability, it is also the least relevant to the sense of property rights that most users are concerned about. Despite fears that the DAO rescue could set a precedent and force the Ethereum foundation to repeatedly bail out vulnerable projects, no such hard-forks followed. Over the years much larger security failures occurred on Ethereum (measured in notional dollar value), with the majority attributed with high confidence to rogue states such as North Korea. None of them merited so much as a serious discussion of whether another hard-fork was justified to undo the theft and restore the funds to rightful owners. If hundreds of millions of dollars in tokens ending up in the coffers of a sanctioned state does not warrant breaking blockchain immutability, it is fair to say the average NFT holder has little reason to fear that some property dispute will result in a blockchain-scale reorganization that takes away their pixelated monkey images.

Smart-contract design: backdoors and compliance “features”

Much more relevant to the threat model of a typical participant is the way virtual assets are managed on-chain: using smart-contracts that are developed by private companies and often subject to private control. Limiting our focus to Ethereum for now, recall that the only "native" asset on chain is ether. All other assets such as fungible ERC-20 tokens and collectible NFTs must be defined by smart contracts, in other words software that someone authors. Those contracts govern the operation of the asset: conditions under which it can be "minted"— in other words, created out of thin air— transferred or destroyed. To take a concrete example: a stablecoin such as USDC, issued by Circle, is designed to be pegged 1:1 to the US dollar. More USDC is issued on chain when Circle the company receives fiat deposits from a counterparty requesting virtual assets. Similarly USDC must be taken out of circulation or "burned" when a counterparty returns their virtual dollars and demands ordinary dollars back in a bank account.

None of this is surprising. As long as the contract properly enforces rules around who can invoke those actions on chain, this is exactly how one would expect a stablecoin to operate. (There is a separate question around whether the 1:1 backing is maintained, but that can only be resolved by off-chain audits. It is outside the scope of enforcement by blockchain rules.) Less appreciated is the fact that most stablecoin contracts also grant the operator the ability to freeze funds or even seize assets from any participant. This is not a hypothetical capability; issuers have not shied away from using it when necessary. To pick two examples:

  • Tether has repeatedly frozen USDT sitting at addresses tied to hacks and thefts, often at the request of law enforcement.
  • Circle has blacklisted USDC held at addresses placed under US sanctions, rendering those funds unusable.

While the existence of such a "backdoor" or "God mode" may sound sinister in general, these specific interventions are hardly objectionable. But they serve to illustrate the general point: even if blockchains themselves are immutable and arbitrary hard-forks a relic of the past, virtual assets themselves are governed not by "native" rules ordained by the blockchain, but by independent software authored by the entity originating that asset. That code can include arbitrary logic granting the issuer any right they wish to reserve.
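
To make the point concrete, the following is a minimal Python sketch of an issuer-controlled token ledger. It is an illustration of the concept only, not the code of any real stablecoin; the class and method names are invented for the example.

    # Minimal sketch of an issuer-controlled token ledger. Conceptual
    # illustration only; not the code of any real stablecoin contract.
    class IssuerControlledToken:
        def __init__(self, issuer: str):
            self.issuer = issuer
            self.balances: dict[str, int] = {}
            self.frozen: set[str] = set()

        def _require_issuer(self, caller: str) -> None:
            if caller != self.issuer:
                raise PermissionError("only the issuer may perform this action")

        def mint(self, caller: str, to: str, amount: int) -> None:
            # Create tokens out of thin air, e.g. after receiving a fiat deposit.
            self._require_issuer(caller)
            self.balances[to] = self.balances.get(to, 0) + amount

        def burn(self, caller: str, holder: str, amount: int) -> None:
            # Take tokens out of circulation, e.g. after a redemption.
            self._require_issuer(caller)
            self.balances[holder] = self.balances.get(holder, 0) - amount

        def freeze(self, caller: str, holder: str) -> None:
            # The "God mode": block a holder from moving their funds.
            self._require_issuer(caller)
            self.frozen.add(holder)

        def seize(self, caller: str, holder: str) -> None:
            # Stronger still: move the holder's entire balance to the issuer.
            self._require_issuer(caller)
            seized = self.balances.pop(holder, 0)
            self.balances[self.issuer] = self.balances.get(self.issuer, 0) + seized

        def transfer(self, caller: str, to: str, amount: int) -> None:
            # Ordinary transfer, denied if the sender has been frozen.
            if caller in self.frozen:
                raise PermissionError("account frozen by issuer")
            if self.balances.get(caller, 0) < amount:
                raise ValueError("insufficient balance")
            self.balances[caller] -= amount
            self.balances[to] = self.balances.get(to, 0) + amount

Real stablecoin contracts express the same idea in Solidity, typically by restricting the privileged functions to an owner or compliance address; the larger point is that nothing in the underlying blockchain prevents an issuer from reserving such powers.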

To be clear, that logic will be visible on-chain for anyone to view. Most prominent smart-contracts today have their source code published for inspection. (For example, here is the Circle USD contract.) Even if the contract did not disclose its source code, the logic can be reverse engineered from the low-level EVM bytecode available on chain. In that sense there should be no "surprises" about whether an issuer can seize an NFT or refuse to honor a transfer privately agreed upon by two parties. One could argue that users will not purchase virtual assets from issuers who grant themselves such broad privileges to override property rights by virtue of their contract logic. But that is a question of market power and whether any meaningful alternative exists for consumers who want to vote with their wallet. It may well become the norm that all virtual assets are subject to permanent control by the issuer, something users accept without a second thought much like the terms-of-use agreements one clicks through without hesitation when registering for advertising-supported services. The precedent with stablecoins is not encouraging: Tether and Circle are by far the two largest stablecoin issuers by market capitalization. The existence of administrative overrides in their code was no secret. Even multiple invocations of that power have not resulted in a mass exodus of customers to alternative stablecoins.

When ownership rights can be ignored

Let’s posit that popular virtual assets will be managed by “fair” smart-contracts without designed-in backdoors that would enable infringement of ownership rights. This brings us to the most intractable problem: real-world systems are not bound by ownership rights expressed on the blockchain.

Consider the prototypical example of ownership that proponents argue can benefit from blockchains: in-game virtual goods. Suppose your game character has earned a magical sword after significant time spent completing challenges. In most games today, your ownership of that virtual sword is recorded as an entry in the internal database of the game studio, subject to their whims. You may be allowed to trade it, but only on a sanctioned platform most likely affiliated with the same studio. The studio could confiscate that item because you were overdue on payments or unwittingly violated some other rule in the virtual universe. They could even make the item “disappear” one day if they decide there are too many of these swords or they grant an unfair advantage. If that virtual sword was instead represented by an NFT on chain, the argument runs, the game studio would be constrained in these types of capricious actions. You could even take the same item to another gaming universe created by a different publisher.

On the face of it, this argument looks sound, subject to the caveats about the smart-contract not having backdoors. But it is a case of confusing the map with the territory. There is no need for the game publisher to tamper with on-chain state in order to manipulate property rights; nothing prevents the game software from ignoring on-chain state. On-chain state could very well reflect that you are the rightful owner of that sword while in-game logic refuses to render your character holding that object. The game software is not running on the blockchain or in any way constrained by the Ethereum network or even the smart-contract managing virtual goods. It is running on servers controlled by a single company— the game studio. That software may, at its discretion, consult the Ethereum blockchain to check on ownership assignments. That is not the same as being constrained by on-chain state. Just because the blockchain ledger indicates you are the rightful owner of a sword or avatar does not automatically force the game rendering software to depict your character with those attributes in the game universe. In fact the publisher may deliberately depart from on-chain state for good reasons. Suppose an investigation determines that Bob bought that virtual sword from someone who stole it from Alice. Or there have been multiple complaints about a user-designed avatar being offensive and violating community standards. Few would object to the game universe being rendered in a way that is inconsistent with on-chain ownership records under these circumstances. Yet the general principle stands: users are still subject to the judgment of one centralized entity on when it is “fair game” to ignore blockchain state and operate as if that virtual asset did not exist.
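
As a hypothetical sketch of that gap between map and territory, the Python snippet below models a game server that does consult on-chain ownership before rendering an item, yet still filters the answer through its own deny-lists. The class names and the chain-lookup interface are invented for illustration; no real game engine is being quoted here.

    # Hypothetical game-server logic: on-chain ownership is an input to the
    # rendering decision, not a constraint on it. All names are illustrative.
    from typing import Protocol

    class ChainIndex(Protocol):
        def owner_of(self, token_id: int) -> str: ...

    class GameServer:
        def __init__(self, chain: ChainIndex):
            self.chain = chain
            self.banned_items: set[int] = set()     # e.g. "too many of these swords"
            self.disputed_items: set[int] = set()   # e.g. stolen-goods complaints

        def items_to_render(self, wallet: str, candidates: list[int]) -> list[int]:
            visible = []
            for token_id in candidates:
                # Consult the blockchain for ownership...
                if self.chain.owner_of(token_id) != wallet:
                    continue
                # ...then apply studio policy anyway.
                if token_id in self.banned_items or token_id in self.disputed_items:
                    continue    # the chain says you own it; the game says no
                visible.append(token_id)
            return visible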

Case of the disappearing NFT

An instructive case of "pretend-it-does-not-exist" took place in 2021 when Moxie Marlinspike created a proof-of-concept NFT that renders differently depending on which website it is viewed from. Moxie listed the NFT on OpenSea, at the time the leading marketplace for trading NFTs. While it was intended in good spirit as a humorous demonstration of the mutability and transience of NFTs, OpenSea was not amused. Not only did they take down the listing, but the NFT was also removed from the results returned by the OpenSea API. As it turns out, a lot of websites rely on that API for NFT inventories. Once the OpenSea API stopped returning it, it was as if the NFT did not exist. To be clear: OpenSea did not and could not make any changes to blockchain state. The NFT was still there on-chain and Moxie was its rightful owner as far as the Ethereum network is concerned. But once the OpenSea API started returning alternative facts, the NFT vanished from view for every service relying on that API instead of directly inspecting the blockchain themselves. (It turns out there were a lot of them, further reinforcing Moxie's critique of the extent of centralization.)
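
For a sense of how such a shape-shifting NFT can be built, here is a minimal sketch under two assumptions: the token's metadata URI points at a web server the creator operates, and different marketplaces identify themselves through request headers. The URLs and the header heuristic are invented; the details of Moxie's actual demo may have differed.

    # Sketch of an NFT whose image depends on who is asking. Assumes the
    # token's metadata URI points at this server; URLs and the header
    # heuristic are invented, and the original demo may have worked differently.
    import json
    from http.server import BaseHTTPRequestHandler, HTTPServer

    IMAGES = {
        "marketplace-a": "https://example.org/innocuous-art.png",
        "marketplace-b": "https://example.org/different-art.png",
    }
    DEFAULT_IMAGE = "https://example.org/something-else-entirely.png"

    class MetadataHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Decide which image to advertise based on which site appears
            # to be fetching the metadata.
            referer = (self.headers.get("Referer") or "").lower()
            image = next((url for key, url in IMAGES.items() if key in referer),
                         DEFAULT_IMAGE)
            body = json.dumps({"name": "Shape-shifting token", "image": image}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        HTTPServer(("127.0.0.1", 8000), MetadataHandler).serve_forever()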

Suppose customers disagree with the policy of the game studio. What recourse do they have? Not much within that particular game universe, any more than the average user has leverage with Twitter or Facebook in reversing their trust & safety decisions. Users can certainly try to take the same item to another game, but there are limits to portability. While blockchain state is universal, game universes are not. The magic sword from a medieval setting will not do much good in a Call of Duty title set in WW2.

In that sense, owners of virtual game assets are in a more difficult situation than Moxie with his problematic NFT. OpenSea can disregard that NFT but can not preclude listing it on competing marketplaces or even arranging a private sale to a willing buyer who values it on collectible or artistic merits. It would be the exact same situation if OpenSea for some bizarre reason came to insist that you do not own a bitcoin that you rightfully own on the blockchain. OpenSea persisting in such a delusion would not detract in any way from the value of your bitcoin. Plenty of sensible buyers exist elsewhere who can form an independent judgment about blockchain state and accept that bitcoin in exchange for services. But when the value of a virtual asset is determined primarily by its function within a single ecosystem— namely that of the game universe controlled by a centralized publisher— what those independent observers think about ownership status carries little weight.

CP

We can bill you: antagonistic gadgets and dystopian visions of Philip K Dick

Dystopian visions

Acclaimed science-fiction author Isaac Asimov’s stories on robots involved a set of three rules that all robots were expected to obey:

  1. A robot may not injure a human being or, through inaction, allow a human being to come to harm.
  2. A robot must obey orders given it by human beings except where such orders would conflict with the First Law.
  3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.

The prioritization leaves no ambiguity in the relationship between robots and their creators. Regardless of their level of artificial intelligence and autonomy, robots were to avoid harm to human beings. Long before sentient robots running amok and turning against their creators became a staple of science-fiction (Mary Shelley's 19th century novel "Frankenstein" could be seen as their predecessor), Asimov was systematically formulating the intended relationship. Ethical implications of artificial intelligence are a recurring theme today. Can our own creations end up undermining humanity after they achieve sentience? But there are far more subtle and less imaginative ways that technology works against people in everyday settings, with no emergent AI to blame. This version too was predicted by science-fiction.

Three decades after Asimov, the dystopian imagination of Philip K Dick produced a more conflicted relationship between man and his creations. In the 1969 novel “Ubik”, the protagonist inhabits a world where advanced technology controls basic household functions from kitchen appliances to locks on the door. But there is a twist: all of these gadgets operate on what today would be called a subscription model. The coffee-maker refuses to brew the morning cup of joe until coins are inserted. (For all the richness and wild imagination of his alternate realities, PKD did not bother devising an alternative payment system for this future world.) When he runs out of money, he is held hostage at home; his front-door will not open without coins.

Compared to some of the more fanciful alternate universes brought to life in PKD fiction— Germany winning World War II in "The Man in the High Castle" or an omniscient police-state preemptively arresting criminals before they commit crimes as in "The Minority Report"— this level of dystopia is mild, completely benign. But it also bears a striking resemblance to the direction the tech industry is stridently marching in. Consumers are increasingly losing control over their devices— devices which they have fully and rightfully paid for. Not only is the question of ownership being challenged with increasing restrictions on what they can do to hardware they have every right to expect 100% control over, but those devices are actively working against the interests of the consumer, doing the bidding of third parties, be it the manufacturer, the service provider or possibly the government.

License to tinker

There is a long tradition in American culture of hobbyists tinkering with their gadgets. This predates the Internet or the personal computer. Perhaps automobiles and motorcycles were the first technologies that lent themselves to mass tinkering. Mass production made cars accessible to everyone, and for those with a knack for spending long hours in the garage, they were relatively easy to modify: for different aesthetics or better performance under the hood. In one sense the hot-rodders of the 1950s and 1960s were cultural predecessors of today's software hobbyists. Cars at the time were relatively low-tech; with carburetors and manual transmissions being the norm, a few months in high school shop-class provided adequate background. But more importantly the platform was tinkering-friendly. Manufacturers did not go out of their way to prevent buyers from modifying their hardware. That is partly related to technical limitations. It is not as if cars could be equipped with tamper-detection sensors to immobilize the vehicle if the owner installed parts the manufacturer did not approve of. But more importantly, ease of customization was itself considered a competitive advantage. In fact some of the most cherished vehicles of the 20th century, including muscle-cars, V-twin motorcycles and air-cooled Volkswagens, owed part of their iconic status to their vibrant aftermarket for mods.

Natural limits existed on how far owners could modify their vehicle. To drive on public roads, it had to be road-legal after all. One could install a different exhaust system to improve engine sound, but not have flames shooting out the back. More subtly, an economic disincentive existed. Owners risked giving up their warranty coverage for modified parts, a significant consideration given that Detroit was not exactly known for high-quality, low-defect manufacturing at the time. But even that setback was localized. Replace the stereo or rewire the speakers yourself, and you could no longer complain about electrical system malfunctions. But you would still expect the transmission to operate as advertised and the manufacturer to continue honoring any warranty coverage for the drivetrain. There was no warning sticker anywhere declaring that loosening this or that bolt would void the entire warranty on every other part of the vehicle. Crucially consumers were given a meaningful choice: you were free to modify the car for personal expression in exchange for giving up warranty claims against the manufacturer.

From honor code to software enforcement

Cars from the golden-era of hot-rodding were relatively dumb gadgets. Part of the reason manufacturers did not have much of a say in how owners could modify their vehicle is that they had no feasible technology to enforce those restrictions once the proud new owner drove it off the lot. By contrast, software can enforce very specific restrictions on how a particular system operates. In fact it can impose entirely arbitrary limitations to disallow specific uses of the hardware, even when the hardware itself is perfectly capable of performing those functions.

Here is an example. In the early days Windows NT 3.51 had two editions: workstation and server, differentiated by the type of scenario they were intended for. The high-end server SKU supported machines with up to 8 processors while the workstation edition maxed out at 2. If you happened to have more powerful hardware, even if you did not need any of the bells-and-whistles of server, you had to spring for the more expensive product. (Note: there is a significant difference between uniprocessor and multiprocessor kernels; juggling multiple CPUs requires substantial changes, but going from 2 to 8 processors does not.) What was the major difference between those editions? From an economic perspective, $800 measured in 1996 dollars. From a technology perspective, a handful of bytes in a registry key describing which type of installation occurred. As noted in a 1996 article titled Differences Between NT Server and Workstation Are Minimal:

“We have found that NTS and NTW have identical kernels; in fact, NT is a single operating system with two modes. Only two registry settings are needed to switch between these two modes in NT 4.0, and only one setting in NT 3.51. This is extremely significant, and calls into question the related legal limitations and costly upgrades that currently face NTW users.”

There is no intrinsic technical reason why the lower-priced edition could not take advantage of more powerful hardware, or for that matter, allow more than 10 concurrent connections to function as a web-server— a restriction Microsoft later relaxed after customer backlash. These are arbitrary calls made by someone on the sales team who, in their infinite wisdom, concluded that customers with expensive hardware or websites ought to pay more for their operating system.
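
The mode flag in question still exists on modern Windows. As a rough illustration, the Python sketch below reads the ProductType value under the ProductOptions registry key, which distinguishes client installs from servers on current versions; the exact values and mechanics of the NT 3.51/4.0 era may have differed.

    # Read the registry value that marks a Windows installation as a
    # workstation/client or a server. Windows-only (winreg is in the
    # standard library on Windows); values in the NT 3.51/4.0 era may
    # have differed from what current versions report.
    import winreg

    def product_type() -> str:
        path = r"SYSTEM\CurrentControlSet\Control\ProductOptions"
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path) as key:
            value, _ = winreg.QueryValueEx(key, "ProductType")
        return value  # "WinNT" = client, "ServerNT"/"LanmanNT" = server

    if __name__ == "__main__":
        print("This machine claims to be:", product_type())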

There are two tangents worth exploring about this case. First, the proprietary nature of the software and its licensing model is crucial for enforcing these types of policies. Arbitrary restrictions would not fly with open-source software. If a clueless vendor shipped a version of Linux with an arbitrary limit on the number of CPUs or amount of memory that did not originate from technical constraints, customers could modify the source code to lift that restriction. Second, the ability to enforce draconian restrictions dreamt up by marketing is greatly constrained by platform limitations. That is because the personal computer is an open platform. Even with a proprietary operating system such as Windows, users get full control over their machine. You could edit the registry or tamper with OS logic to trigger an identity crisis between workstation and server. Granted, that would be an almost certain violation of the shrink-wrap license nobody read when installing the OS. MSFT would not look kindly upon this practice if carried out at scale. It is one thing for hobbyists to demonstrate the possibility as a symbolic gesture; it is another level of malicious intent for an enterprise with thousands of Windows licenses to engage in systematic software piracy by giving themselves a free upgrade. So at the end of the day, enforcement still relied on messy social norms and imperfect contractual obligations. Software did not aspire to replace the conscience of the consumer, to stop them from perceived wrongdoing at all costs.

Insert quarters to continue

In fact software licensing in the enterprise has a history of such arbitrary restrictions, enforced through a combination of business logic implemented in proprietary software along with dubious reliance on over-arching "terms of use" that discourage tampering with said logic. To this day copies of Windows Server are sold with client access licenses, dictating the number of concurrent users the server is willing to support. If the system is licensed for 10 clients, the eleventh user attempting to connect will be turned away regardless of how much spare CPU or memory capacity is left. You must purchase more licenses. In other words: insert quarters to continue.
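
Enforcing such a cap takes little more than a counter in front of the connection handler. The sketch below is a hypothetical illustration of the logic, not MSFT's implementation; it shows how a purely license-driven limit differs from a limit imposed by hardware.

    # Hypothetical sketch of per-seat license enforcement: the cap is a
    # number in a configuration file, not a property of the hardware.
    import threading

    class LicenseGate:
        def __init__(self, licensed_seats: int):
            self.licensed_seats = licensed_seats
            self.active = 0
            self.lock = threading.Lock()

        def connect(self, user: str) -> bool:
            with self.lock:
                if self.active >= self.licensed_seats:
                    print(f"{user}: refused, please purchase additional licenses")
                    return False
                self.active += 1
                return True

        def disconnect(self) -> None:
            with self.lock:
                self.active = max(0, self.active - 1)

    gate = LicenseGate(licensed_seats=10)
    for i in range(11):
        gate.connect(f"user{i:02d}")   # the eleventh connection is turned away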

Yet this looks very different from Philip K Dick's dystopian coffeemaker and does not elicit anywhere near the same level of indignation. There are several reasons for that. First, the enterprise software market has acclimatized to the notion of discriminatory pricing. Vendors extract higher prices from companies who are in a position to pay. The reasoning goes: if you can afford that fancy server with a dozen CPUs and boatloads of memory, surely you can also spring for the high-end edition of Windows Server that will agree to fully utilize the hardware? Second, the complex negotiations around software licensing are rarely surfaced to end-users. It is the responsibility of the IT department to work out how many licenses are required and determine the right mix of hardware/software required to support the business. If an employee is unable to perform her job because she is turned down by a server having reached its cap on simultaneous users— an arbitrary limit that exists only in the realm of software licensing, it must be noted, not in the resources available in the underlying hardware— she is not expected to solve that problem by taking out her credit card and personally paying for the additional license. Finally this scenario is removed from everyday considerations. Not everyone works in a large enterprise subject to stringent licensing rules, and even for those who are unlucky enough to run into this situation, the inconvenience created by an uncooperative server is relatively mild— a far cry from the front-door that refuses to open and locks its occupants inside.

From open platforms to appliances

One of the more disconcerting trends of the past decade is that what used to be the norm in the enterprise segment is now trickling down into the consumer space. We may not have coffeemakers operating on a subscription model yet. Doors that demand payment for performing their basic function would likely never pass fire-code regulations. But gradually consumer electronics have started imposing greater restrictions on their alleged owners, restrictions that are equally arbitrary and disconnected from the capabilities of the hardware, chosen unilaterally by their manufacturers. Consider some examples from consumer electronics:

  • Region coding in DVD players. DVD players are designed to play only content manufactured for a specific region, even though in principle there is nothing that prevents the hardware from playing discs purchased anywhere in the world. Why? Because of disparities in purchasing power, DVDs are priced much lower in developing regions than they are in Western countries. If DVD players sold to American consumers could play content from any region, it would suddenly become possible to "arbitrage" this price difference by purchasing cheap DVDs in say Taiwan and playing them in the US. Region coding protects the revenue model of content providers, which depends crucially on price discrimination: charging US consumers more than Taiwanese consumers for the same title because they can afford to pay higher prices for movies. (A sketch of how little engineering such a check requires appears after this list.)
  • Generalizing from the state of DVD players, any Digital Rights Management (or as it has been derisively called, "digital restrictions management") technology is an attempt to hamper the capabilities of software/hardware platforms to further the interests of content owners. While the rest of the software industry is focused on doing more with existing resources— squeeze more performance out of the CPU, add more features to an application that users will enjoy— those working on DRM are trying to get devices to do less. Information is inherently copyable; DRM tries to stop users from copying bits. By default audio and video signals can be freely sent to any output device; HDCP tries to restrict where they can be routed in the name of battling piracy. The restrictions do not even stop at anything directly involving content. Because the PC platform is inherently open, DRM enforcement inevitably takes an expansive view of its mission and begins to monitor the user for signs of perfectly valid activity that could potentially undermine DRM, such as installing unsigned device drivers or enabling kernel-mode debugging on Windows.
  • Many cell phones sold in North America are "locked" to a specific carrier, typically the one the customer bought their phone from. It is not possible to switch to another wireless carrier while keeping the device. Again there is no technical reason for this. Much like the number of processors that an operating system will agree to run on, it is an arbitrary setting. (In fact it takes more work to implement such checks.) The standard excuse is that the cost of the device is greatly subsidized by the carrier, with the subsidy recouped through hidden charges buried in the service contract. But this argument fails basic sanity checks. Presumably the subsidy is paid off after some number of months, yet phones remain locked. Meanwhile customers who bring their own unlocked device are not rewarded with any special discounts, effectively distorting the market. Also carriers already charge an early termination fee to customers who walk away from their contract prematurely; surely that fee could also cover the lost subsidy?
  • Speaking of cell phones, they are increasingly becoming locked-down "appliances," to use the terminology from Zittrain's "The Future of the Internet," instead of open computing platforms. Virtually all PCs allow users to replace the operating system. Not a fan of Windows 8? Feel free to wipe the slate clean and install Linux. Today consumers can even purchase PCs preloaded with Linux to escape the dreaded "Microsoft tax" where the cost of Windows licenses is implicitly factored into hardware prices. And if the idea of Linux-on-the-desktop turns out to be wishful thinking yet again, you can repent and install Windows 10 on that PC which came with Ubuntu out of the box. By contrast phones ship with one operating system picked by the all-knowing manufacturer and it is very difficult to change that. On the surface, consumers have plenty of choice because they can pick from thousands of apps written for that operating system. Yet one level below that, they are stuck with the operating system as an immutable choice. In fact, some Android devices never receive software updates from the manufacturer or carrier, so they are "immutable" in a very real sense. Users must go out of their way to exploit a security vulnerability in order to jailbreak/root their devices to replace the OS wholesale or even extend its capabilities in ways the manufacturer did not envision. OEMs further discourage users from tinkering with their devices by equating such practices with weakening security— as if users are better off sticking to an abandoned "stock" OS with known vulnerabilities that will never get patched.
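
Returning to the first bullet above, here is a schematic sketch of region enforcement: the disc carries a set of regions it is authorized for, and the player's firmware refuses anything that does not include its own region code. Real players implement this check in firmware as part of the DVD licensing regime; the code below is a deliberate simplification.

    # Schematic sketch of a DVD-style region check: the hardware could play
    # any disc, but the firmware refuses discs not flagged for its own region.
    PLAYER_REGION = 1   # e.g. a player sold in North America

    def playback_allowed(disc_regions: set[int], player_region: int = PLAYER_REGION) -> bool:
        # The disc carries the set of regions it is authorized for;
        # an empty set here stands in for a "region free" disc.
        return not disc_regions or player_region in disc_regions

    print(playback_allowed({1, 2}))   # True: disc authorized for regions 1 and 2
    print(playback_allowed({3}))      # False: disc pressed for region 3 only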

Unfree at any speed

Automotive technology too is evolving in this direction of locked-down appliances. Cars remained relatively dumb until the 1990s, when microprocessors slowly started making their way into every system, starting with engine management. On the way to becoming more software-driven, effectively computers-on-wheels, something funny happened: the vehicle gained a greater capability to sense its present conditions and, more importantly, to react to those inputs. At first this looks like an unalloyed good. The early applications are uncontroversial improvements to occupant safety: antilock brakes, airbags and traction control. All depend on software monitoring input from sensors and promptly responding to signals indicating that a dangerous condition is imminent.

The next phase may be less clear-cut, as enterprising companies continue pushing the line between choice and coercion. Insurers such as Geico offer pay-per-mile plans that use gadgets attached to the OBD-II port to collect statistics on how far the vehicle is driven, and presumably on how aggressively the driver attacks corners. While some may consider this an invasion of privacy, at least there is a clear opt-out: do not sign up for that plan. In other cases opt-out becomes ambiguous. GM found itself in a pickle over the Corvette Stingray recording occupants with a camera in the rearview mirror. This was a feature, not a bug, designed to create YouTube-worthy videos while the car was being put through its paces. But if occupants are not aware that they are being recorded, it is not clear they consented to appearing as extras in a Sebastian-Vettel role-playing game. At the extreme end of the informed-consent scale is the use of remote immobilizers for vehicles sold to consumers with subprime credit. In these cases the dealer literally gets a remote kill-switch for disabling operation of the vehicle if the consumer fails to stay current on payments. (At least that is the idea; the NYT reports allegations of mistaken or unwarranted remote shutdowns by unscrupulous lenders.) One imagines the next version of these gadgets will incorporate a credit-card reader to better approximate the PKD dystopia. Insert quarters to continue.
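
To make the telemetry angle concrete, here is a minimal sketch of the kind of data such an insurance OBD-II dongle could collect, written against the open-source python-OBD library. The sampling interval, the hard-braking threshold and the speed-integration shortcut for estimating mileage are illustrative assumptions rather than a description of how any insurer’s device actually works.

    # Minimal sketch: sample vehicle speed over OBD-II, estimate mileage and
    # flag abrupt deceleration -- roughly the telemetry a pay-per-mile dongle
    # might collect. Requires the python-OBD library (pip install obd);
    # thresholds and intervals below are arbitrary illustrations.
    import time
    import obd

    INTERVAL_S = 1.0           # sampling period (illustrative)
    HARD_BRAKE_KPH = 15.0      # speed drop per sample that counts as "aggressive"

    connection = obd.OBD()     # auto-detects the serial/Bluetooth adapter
    miles_driven = 0.0
    previous_kph = None

    while connection.is_connected():
        response = connection.query(obd.commands.SPEED)
        if response.is_null():
            time.sleep(INTERVAL_S)
            continue

        kph = response.value.magnitude   # python-OBD reports SPEED in km/h
        # Integrate speed over time to approximate distance travelled.
        miles_driven += (kph / 3600.0) * INTERVAL_S * 0.621371

        # Flag abrupt deceleration between consecutive samples.
        if previous_kph is not None and previous_kph - kph > HARD_BRAKE_KPH:
            print(f"hard braking: {previous_kph:.0f} -> {kph:.0f} km/h")

        previous_kph = kph
        time.sleep(INTERVAL_S)

    print(f"estimated distance this trip: {miles_driven:.2f} miles")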

What is at stake here is a question of fairness and rights, but not in the legal sense. Very little has changed about the mechanics of consumer financing: purchasing a car on credit still obligates the borrower to make payments promptly until the balance is paid off. Failure to fulfill that obligation entitles the seller to repossess the vehicle. This is not some new-fangled notion of how to handle loans in default; the right to repossess or foreclose has always existed on the books. In practice, exercising that right often required some dramatic, made-for-TV adventures in tracking down the consumer or vehicle in question. Software has greatly amplified the ability of lenders to enforce their rights and collect on their entitlements under the law.

From outright ownership to permanent tenancy

Closely related is a shift from ownership to subscription models. Software has made it possible to recast what used to be one-time purchases into ongoing subscriptions or pay-per-use models. Powerful social norms exist around how goods are distributed according to one model or the other. No one expects to pay for electricity or cable with a single lump-sum payment and call it a day, receiving free service in perpetuity. If you stop paying for cable, the screen will eventually go blank. By contrast hardware gadgets such as television sets are expected to operate according to a different model: once you bring one home, it is yours. It may have been purchased with borrowed money, with credit extended by the store or your own credit-card issuer. But consumers would be outraged if their bank, BestBuy or the TV manufacturer remotely reached out to brick their television in response to late payments. Even under most subscription models, there are strict limitations on how service providers can retaliate against consumers who break the contract. If you stop paying for water, the utility can shut off the future supply. It cannot send vigilantes over to your house to drain the water tank and “take back” the water you are presumably no longer entitled to.

Such absurd scenarios can and do happen in software. Perhaps missing the symbolism, Amazon remotely wiped copies of George Orwell’s 1984 from Kindles over copyright problems. (The irony could only be exceeded if Amazon threatened to remove copies of Philip K. Dick’s “Ubik” unless customers pay up.) These were not die-hard Orwell fans or DMCA protestors deliberately pirating the novel; they had purchased their copies from the official Amazon store. Yet the company defended its decision, arguing that the publisher who had offered those novels on its marketplace lacked the proper rights. The Kindle is a locked-down appliance where Amazon calls the shots and customers have no recourse, no matter how arbitrary those decisions appear.

What about computers? It used to be the case that if you bought a PC, it was yours to keep: it would continue running until its hardware failed. In 2006 Microsoft launched FlexGo, a pay-as-you-go model for PC ownership in emerging markets. Echoing the words of a used-car salesman on the benefits bestowed on consumers, while barely suppressing a sense of colonialist contempt, a spokesperson for a partner bank in Brazil enthused: “Our lower-income customers are excited to finally buy their first PC with minimal upfront investment, paying for time as they need it, and owning a computer with superior features and genuine software.” (Emphasis on genuine software, since consumers in China or Brazil never had any problem getting their hands on pirated versions of Windows.) MSFT took a more measured approach in touting the benefits of this alternative: “Customers can get a full featured Windows-enabled PC with low entry costs that they can access using prepaid cards or through a monthly subscription.” Insert quarters to continue.

FlexGo did not crater like “Bob,” Vista or others in the pantheon of MSFT disasters. Instead it faded into obscurity, having bet on the wrong vision of “making computing accessible,” one soon rendered irrelevant on both financial and technological grounds. Hardware prices continued to drop. Better access to banking services and consumer credit meant citizens in developing countries gained flexible payment options for buying a regular PC, without an OEM or software vendor in the loop to supervise the loan or tweak the operating system to enforce alternative licensing models. More dramatically, the emergence of smartphones cast doubt on whether everyone in Brazil actually needed that “full-featured Windows-enabled PC” in the first place to cross the digital divide.

FlexGo may have disappeared, but the siren song of subscription models still exerts its pull on the technology industry. The economics favor such models on both sides. Compared to the infrequent purchase of big-ticket items, a steady stream of monthly subscriptions smooths out seasonal fluctuations in revenue. From the consumer perspective, making “small” monthly payments over time instead of one big lump-sum payment may look more appealing thanks to cognitive biases.

If anything the waning of the PC as the dominant platform paves the way for this transformation. Manufacturers can push locked-down “appliances” without the historical baggage associated with the notion of a personal computer. Ideas that would never fly on the PC platform, practices that would provoke widespread consumer outrage and derision—locked boot-loaders, mandatory data collection, always-on microphones and cameras, remote kill capabilities— can become the new normal for a world of locked-down appliances. In this ecosystem users no longer “own” their devices in the traditional sense, even though the devices are paid for in full and no one can legally show up at the door to demand their return.

These gadgets suffer from a serious case of split-personality disorder. On the one hand they are designed to provide some useful service to their presumed “owner;” this is the ostensible purpose they are advertised and purchased for. At the same time the gadget software contains business logic to serve the interests of the device manufacturer, service provider or whoever happens to actually control the bits running there. These two goals are not always aligned. In a hypothetical universe with efficient markets, one would expect a strong correlation: if a gadget deliberately sacrificed functionality to protect the manufacturer’s platform or artificially sustain an untenable revenue model, enlightened consumers would flock to an alternative from a competitor not saddled with such baggage. In reality such competitive dynamics operate imperfectly if at all, and the winner-takes-all nature of many market segments means it is very difficult for a new entrant to make significant gains against entrenched leaders by touting openness or user control as a distinguishing feature. (Case in point: the troubled history of open-source mobile phone projects and their failure to reach mass adoption.)

Going against the grain?

If there are forces counteracting the seemingly irresistible pull of locked-down appliances, they face an uneven playing field. The share of PCs continues to decline among all consumer devices; Android has recently surpassed Windows as the most common operating system on the Internet. Meanwhile the highly fashionable Internet of Things (IoT) notion is predicated on black-box devices that are not programmable or extensible by their ostensible owners. It turns out that in some cases they are not even managed by the manufacturer; just ask the owners of IP cameras whose devices were unwittingly enrolled in the Mirai botnet.

Consumers looking for an alternative face a paradoxical situation. On the one hand, there is a dearth of off-the-shelf solutions designed with user rights in mind. The “market” favors polished solutions such as the Nest thermostat, where hardware, software and cloud services are inextricably bundled together. Suppose you are a fan of the hardware but skeptical about how much private information it is sending to a cloud service provider? Tough luck; there is no cherry-picking allowed. On the other hand, there has never been a better time to be tinkering with hardware: Arduino, Raspberry Pi and a host of other low-cost embedded platforms have made it easier than ever to put together your own custom solution, as the sketch below suggests. This is still a case of payment in exchange for preserving user rights, except the “payment” takes the form of additional time spent engineering and operating home-brew solutions. More worrisome, such capabilities are only available to a small number of people, distinguished by their ability to renegotiate the terms service providers attempt to impose on their customer base. While that capability is worth celebrating— it is why every successful jailbreak of a locked-down appliance is cheered in the security community— it is fundamentally undemocratic by virtue of being restricted to a new ruling class of technocrats.
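
For the tinkerers, here is a minimal sketch of what such a home-brew alternative might look like: a do-it-yourself thermostat on a Raspberry Pi, assuming a DS18B20 temperature sensor on the 1-Wire bus and a relay driving the furnace from a GPIO pin. The pin number, setpoint and hysteresis are arbitrary placeholders; a real installation would need proper wiring, fail-safes and probably a nicer interface than a terminal.

    # Minimal home-brew thermostat sketch for a Raspberry Pi.
    # Assumes a DS18B20 sensor on the 1-Wire bus (enabled via the w1-gpio
    # overlay) and a relay on a GPIO pin; all values below are placeholders.
    import glob
    import time

    import RPi.GPIO as GPIO

    RELAY_PIN = 17        # BCM pin wired to the relay (assumption)
    SETPOINT_C = 20.0     # target temperature in Celsius
    HYSTERESIS_C = 0.5    # dead-band to avoid rapid relay cycling

    def read_temp_c():
        # DS18B20 readings appear under /sys/bus/w1/devices/28-*/w1_slave,
        # ending in "t=" followed by the temperature in milli-degrees Celsius.
        device = glob.glob("/sys/bus/w1/devices/28-*/w1_slave")[0]
        with open(device) as f:
            raw = f.read()
        return int(raw.split("t=")[-1]) / 1000.0

    GPIO.setmode(GPIO.BCM)
    GPIO.setup(RELAY_PIN, GPIO.OUT, initial=GPIO.LOW)

    try:
        while True:
            temp = read_temp_c()
            if temp < SETPOINT_C - HYSTERESIS_C:
                GPIO.output(RELAY_PIN, GPIO.HIGH)   # call for heat
            elif temp > SETPOINT_C + HYSTERESIS_C:
                GPIO.output(RELAY_PIN, GPIO.LOW)    # heat off
            time.sleep(30)
    finally:
        GPIO.cleanup()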

CP

[Update: Edited Feb 27th to correct typo.]