Bitcoin’s meta problem: governance (part I)

Layer 9: you are here

Bitcoin has room for improvement. Putting aside regulatory uncertainty, there is the unsustainable waste of electricity consumed by mining operations, unclear profitability for miners as block rewards decrease and, last but not least, difficulty scaling beyond its Lilliputian capacity of handling only a few transactions per second globally. (You want to pay for something using Bitcoin? Better hope not many other people have that same idea in the next 10 minutes or so.) In theory all of these problems can be solved. What stands in the way of a solution is not the hard reality of mathematics; this is not a case of trying to square the circle or solve the halting problem. Nor are these insurmountable engineering problems. Unlike calls for designing “secure” systems with built-in backdoors accessible only to the good guys, there is plenty of academic research and some real-world experience building trusted, distributed systems to show the way. Instead, the Bitcoin protocol is running into problems squarely at “layer 9”: politics and governance.

This last problem of scaling has occupied the public agenda recently and festered into a full-fledged PR crisis last year, with predictions of the end of Bitcoin. Much of the conflict has focused on the so-called “block size”: the maximum size of each virtual page added to the global ledger of all transactions maintained by the system. The more space in each page, the more transactions can be squeezed in. That matters for throughput because the protocol also fixes the rate at which pages can be added, to roughly one every 10 minutes. But TANSTAAFL still holds: there are side-effects to increasing this limit, which was first put in place by Satoshi himself/herself/themselves to mitigate denial-of-service attacks against the protocol.
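To make the relationship between block size, block interval and throughput concrete, here is a back-of-the-envelope sketch in Python. The 250-byte average transaction size is an assumption chosen for illustration, not a protocol constant; real transactions vary widely in size.

```python
# Back-of-the-envelope throughput for a chain of completely full blocks.
# ASSUMPTION: 250-byte average transaction, chosen only for illustration.

AVG_TX_BYTES = 250


def transactions_per_second(block_size_bytes: int, block_interval_s: float,
                            avg_tx_bytes: int = AVG_TX_BYTES) -> float:
    """Upper bound on throughput if every block is filled to the limit."""
    txs_per_block = block_size_bytes // avg_tx_bytes
    return txs_per_block / block_interval_s


def relay_bytes_per_second(block_size_bytes: int, block_interval_s: float) -> float:
    """Average rate at which every node must receive new block data."""
    return block_size_bytes / block_interval_s


for size, interval in [(1_000_000, 600), (2_000_000, 600), (1_000_000, 300)]:
    print(f"{size / 1e6:.0f} MB every {interval // 60} min: "
          f"{transactions_per_second(size, interval):.1f} tx/s, "
          f"{relay_bytes_per_second(size, interval) / 1000:.2f} kB/s")
```

Under that assumption, 1 MB every 10 minutes works out to under 7 transactions per second. Doubling either the block size or the block frequency doubles throughput, but it also doubles the average rate at which every node must receive block data.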

Game of chicken

Two former Bitcoin Core developers found this out the hard way last summer when they tried to force the issue. They created a fork of the popular open-source implementation of Bitcoin (Bitcoin Core), called Bitcoin XT, with support for an expanded block size. The backlash was swift and loud. XT did not go anywhere, its supporters were banned from Reddit forums and the main developer rage-quit Bitcoin entirely with a scathing farewell. But that was not the end of the scaling experiment. Take #2 followed shortly afterwards as a new fork dubbed Bitcoin Classic, with more modest and incremental changes to block size to address criticisms of XT. As of this writing, Classic has more traction than XT ever managed but remains far from reaching the 75% threshold required to trigger a permanent change in protocol dynamics.

Magic numbers and arbitrary decisions

This is a good time to step back and ask the obvious question: why is it so difficult to change the Bitcoin protocol? There are many arbitrary “magic numbers” and design choices hard-coded into the protocol:

Money supply is capped at 21 million bitcoins.
Each block originally rewarded the miner 50 bitcoins, but that reward halves every 210,000 blocks (roughly every four years), with the next decrease expected around June of this year.
Mining uses a proof-of-work algorithm based on the SHA-256 hash function.
The proof-of-work construction encourages the creation of special-purpose ASIC chips, because they have significant efficiency advantages over the ordinary CPUs or GPUs that ship with off-the-shelf PCs/servers.
That same design is “pool-friendly”: it permits the creation of mining pools, where a centralized pool operator coordinates work by thousands of independent contributors and distributes rewards based on each contributor’s share of the work.
The difficulty level for that proof-of-work is adjusted every 2016 blocks (roughly two weeks), with the goal of keeping the average interval between blocks at 10 minutes.
Transactions are signed using the ECDSA algorithm over one specific elliptic curve, secp256k1.
And of course, blocks are limited to 1MB in size.
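Two of these magic numbers are easy to reproduce. The Python sketch below sums the halving schedule in satoshis (the smallest unit, 10^-8 BTC) and applies a simplified version of the difficulty retarget rule. It is an illustration, not Bitcoin Core's code: it ignores details such as the compact encoding of the target and the maximum target value.

```python
# Sketch of two of Bitcoin's "magic numbers": the halving schedule behind the
# ~21 million cap, and the periodic difficulty retarget (simplified).

COIN = 100_000_000            # satoshis per bitcoin
HALVING_INTERVAL = 210_000    # blocks between reward halvings
INITIAL_REWARD = 50 * COIN    # reward for the first 210,000 blocks
RETARGET_INTERVAL = 2016      # blocks between difficulty adjustments
TARGET_SPACING_S = 600        # desired seconds between blocks


def total_supply_satoshis() -> int:
    """Sum block rewards across all halving eras until the reward hits zero."""
    total, reward = 0, INITIAL_REWARD
    while reward > 0:
        total += HALVING_INTERVAL * reward
        reward //= 2          # integer halving drops fractional satoshis
    return total


def retarget(old_target: int, actual_timespan_s: int) -> int:
    """Scale the PoW target by how long the last window of blocks took.

    A larger target means easier mining. The measured timespan is clamped to
    a factor of 4 in either direction before scaling.
    """
    expected = RETARGET_INTERVAL * TARGET_SPACING_S
    clamped = min(max(actual_timespan_s, expected // 4), expected * 4)
    return old_target * clamped // expected


print(total_supply_satoshis())
print(retarget(1000, 2 * RETARGET_INTERVAL * TARGET_SPACING_S))  # window took twice as long
```

The total comes out at 2,099,999,997,690,000 satoshis, a hair under 21 million BTC, because integer halving discards fractional satoshis in later eras. In the retarget rule, a window that took longer than two weeks raises the target, making the next window easier.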

Where did all of these decisions come from? To what extent are they fundamental aspects of Bitcoin (it would not be “Bitcoin” as we understand it without that property), as opposed to arbitrary decisions made by Satoshi that could have gone a different way? What is sacred about the number 21 million? (Is it because it is half of 42, the answer to the ultimate question of life, the universe and everything?) Each of these decisions can be questioned, and in fact many have been challenged. For example, proof-of-stake has been offered as an alternative to proof-of-work to halt the runaway costs and CO2 emissions of electricity wasted on mining. Meanwhile later designs such as Ethereum tailor their proof-of-work system explicitly to discourage ASIC mining, by reducing the advantage such custom hardware would have over vanilla hardware. Other researchers proposed discouraging mining pools by making it possible for the participant who solves the PoW puzzle to keep the reward, instead of having it automatically paid to the pool operator for distribution. One core developer even proposed (and later withdrew) a special-case adjustment to block difficulty for the upcoming halving of block rewards. It was motivated by the observation that many mining operations will become unprofitable when rewards are cut in half. Those miners will power off their rigs, causing a significant drop in total mining power that will remain uncorrected for a significant time, because the blocks that trigger the next difficulty adjustment are themselves mined at a slower rate.
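The arithmetic behind that concern is simple. This Python sketch assumes, purely for illustration, that a fraction of hash power switches off abruptly and the rest stays constant until the next retarget; real hash rate fluctuates continuously.

```python
# How long until difficulty catches up after an abrupt drop in hash power?
# ASSUMPTIONS (illustrative): the drop happens at a known point in the
# retarget window and the remaining hash power then stays constant.

RETARGET_INTERVAL = 2016     # blocks per difficulty window
TARGET_SPACING_S = 600       # intended seconds per block
SECONDS_PER_DAY = 86_400


def days_until_retarget(blocks_remaining: int, hashpower_remaining: float) -> float:
    """Expected days to mine the rest of the window at reduced hash power.

    Difficulty stays fixed until the retarget, so the expected interval
    between blocks stretches by a factor of 1 / hashpower_remaining.
    """
    if not 0 < hashpower_remaining <= 1:
        raise ValueError("hashpower_remaining must be in (0, 1]")
    spacing_s = TARGET_SPACING_S / hashpower_remaining
    return blocks_remaining * spacing_s / SECONDS_PER_DAY


# Worst case: half the hash power disappears right after a retarget.
print(days_until_retarget(RETARGET_INTERVAL, 1.0))  # 14.0 (the normal two weeks)
print(days_until_retarget(RETARGET_INTERVAL, 0.5))  # 28.0
```

Because difficulty adjusts after a fixed number of blocks rather than a fixed amount of time, losing half the hash power right after a retarget would leave blocks arriving every 20 minutes for about four weeks.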

Some of these numbers reflect limitations or trade-offs necessitated by current infrastructure. For example, one can imagine a version of Bitcoin that runs twice as fast, generating blocks every 5 minutes instead of 10. But that version would require each node running the software to exchange data twice as fast with its peers, because Bitcoin relies on a peer-to-peer network for distributing transactions and mined blocks. This goes back to the same objection leveled against large-block proposals such as XT and Classic. Many miners are based in countries with high-latency, low-bandwidth connections such as China, a situation not helped by economics that drive mining operations to the middle of nowhere: close to cheap sources of power such as dams, but far from fiber. There is a legitimate concern that if bandwidth requirements escalate (either because block sizes go up or because blocks are minted more frequently), these miners will not be able to keep up. But what happens when those limitations go away, when multi-gigabit pipes are available to even the most remote locations and the majority of mining power is no longer constrained by networking?

Planning for change

Once we acknowledge that change is necessary, the question becomes how such changes are made. This is as much a question of governance as it is of technology. Who gets to make the decision? Who gets veto power? Does everyone have to agree? What happens to participants who are not on board with the new plan?

Systems can be limited by a failure in either domain. Some protocols were designed with insufficient versioning and forward-compatibility, which makes it very difficult for them to operate in a heterogeneous environment where “old” and “new” versions exist side-by-side. That in turn makes upgrades hard to introduce, because everyone must coordinate on a “flag day” to upgrade everything at once. In other cases, the design is flexible enough to allow small, local improvements, but the incentives for upgrading are absent. Perhaps the benefits of upgrading are not compelling enough, or there is no single entity in charge of the system capable of forcing all participants to go along.

For example, credit-card networks have long been aware of the vulnerabilities associated with magnetic-stripe cards. Yet it has been a slow uphill battle to get issuing banks to replace existing cards, and especially to get merchants to upgrade their point-of-sale terminals to support EMV. Notably, this is a relatively centralized system: card networks such as Visa and MasterCard sit in the middle of every transaction, mediating the movement of funds from the bank that issued the credit card to the merchant. Visa/MC call the shots around who gets to participate in this network and under what conditions, within some limits defined by regulatory watchdogs worried about concentration in this space. In fact it was their considerable leverage over banks and merchants that allowed card networks to push for the EMV upgrade in the US, by dangling economic incentives and penalties in front of both sides. Capitalizing on the climate of panic in the aftermath of the Target data breach, these networks were able to move forward with their upgrade objectives.

[continued in part II]

Future-proofing software updates: Global Platform and lessons from FBiOS (part III)

[continued from part II]

(Full disclosure: this blogger worked on Google Wallet 2011-2013)

Caveats

Global Platform constrains the power of manufacturers and system operators to insert back-doors into a deployed system after the fact. But there are some caveats to cover before we jump to any conclusions about how that could have altered the dynamics of the FBI/Apple skirmish. There are still some avenues that a rogue card manager (or more precisely, in GP terminology, “someone who knows one of the issuer security-domain keys,” a role often outsourced to trusted third parties after deployment) can use to try to subvert security policy. These are not universal attacks. Instead they depend on implementation details outside the scope of GP.

Card OS vulnerabilities

Global Platform is agnostic about what operating system is running on the hardware, and for that matter about the isolation guarantees the OS provides for restricting each application to its own space. If that isolation boundary is flawed, so that one application can steal or modify data owned by another, there is room for the issuer to work around GP protections. While there is no way to directly tamper with the internal state of an existing application A, the issuer can install a brand-new application B that exploits the weak isolation to steal private data from A. Luckily most modern card OSes do in fact provide isolation between mutually distrustful applications, along with limited facilities for interaction, provided both sides opt in to exchanging messages. For example JavaCard-based systems apply standard JVM restrictions around access to memory, type-safety and automatic memory management.

Granted, implementation bugs in these mechanisms can be exploited to breach containment. For example, early JavaCard implementations did not even implement the full range of bytecode checks expected of a typical JVM. Instead they relied on a trusted off-card verifier to weed out malformed bytecode prior to installing the application. This is a departure from the security guarantees provided by a standard desktop implementation of the JVM. In theory the JVM can handle hostile bytecode by performing the necessary static and run-time checks to maintain integrity of the sandbox. (In reality JVM implementations have been far from perfect in living up to that expectation.) The standard excuse for the much weaker guarantees in JavaCard goes back to hardware limitations: performing these additional checks on-card adds to the complexity of a JVM implementation that must run within the limited CPU, memory and storage of the card. The problem is that off-card verification is useless against a malicious issuer seeking to install deliberately malformed Java bytecode with the explicit goal of breaking out of the VM.

It’s worth pointing out that this is not a generic problem with card operating systems, but a specific case of cutting corners in some versions of a common environment. Later generations of JavaCard OS have increasingly hardened their JVM and reduced dependence on off-card verification, to the point that at least some manufacturers claim installing applets with invalid bytecode will not permit breaking out of the JVM sandbox.

Global state shared between applications

Another pitfall is shared state across applications. For example, GP defines a card-global PIN object that any application on the card can use for authenticating users. This makes sense from a usability perspective. It would be confusing if every application on the card had its own PIN and users had to remember whether they were authenticating to the SSH app or the GPG app, for instance. But the downside of the global PIN is that applications installed with the right privilege can change it. That means a rogue issuer can install a malicious app designed to reset that PIN, undermining an existing application that relied on the PIN to distinguish authorized access.

There is a straightforward mitigation: each application can instead use its own private PIN object for authorization checks, at the expense of usability. (Factoring out PIN checks into an independent application accessed via inter-process communication is not trivial. A malicious issuer could replace that applet with a back-doored version that always returns “yes” in response to any submitted PIN, while keeping the same application identifier. Some type of authenticated channel is required.) In many scenarios a private PIN is already inevitable due to the limited semantics of the global PIN object. Mobile payment applications such as Apple Pay and Google Wallet, for example, must support multiple interfaces and retain PIN verification state across a reset of the card.
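A toy model in Python may help make the global PIN pitfall concrete. Nothing here is a real card OS API: the classes and the `caller_privileged` flag are hypothetical stand-ins for an application holding the privilege to change the card-global PIN.

```python
# Toy model (not a real card OS API): a card-global PIN that any suitably
# privileged application can reset, versus a PIN private to one application.
# All class and method names are hypothetical.


class GlobalPin:
    """One PIN object shared by every application on the card."""

    def __init__(self, pin: str):
        self._pin = pin

    def verify(self, candidate: str) -> bool:
        return candidate == self._pin

    def reset(self, new_pin: str, caller_privileged: bool) -> None:
        # Any application installed with the PIN-change privilege may do this,
        # including one the issuer installs later.
        if not caller_privileged:
            raise PermissionError("caller lacks PIN-change privilege")
        self._pin = new_pin


class SharedPinApp:
    """Relies on the card-global PIN for authorization."""

    def __init__(self, secret: str, global_pin: GlobalPin):
        self._secret = secret
        self._global_pin = global_pin

    def unlock(self, candidate: str) -> str:
        if not self._global_pin.verify(candidate):
            raise PermissionError("wrong PIN")
        return self._secret


class PrivatePinApp:
    """Keeps its own PIN; no other application holds a handle to change it.
    (Python cannot enforce this; on a card the OS isolation boundary does.)"""

    def __init__(self, secret: str, pin: str):
        self._secret = secret
        self._pin = pin

    def unlock(self, candidate: str) -> str:
        if candidate != self._pin:
            raise PermissionError("wrong PIN")
        return self._secret


card_pin = GlobalPin("4821")
shared = SharedPinApp("shared-secret", card_pin)
private = PrivatePinApp("private-secret", "4821")

# A rogue issuer installs a privileged app whose only job is to reset the PIN.
card_pin.reset("0000", caller_privileged=True)

print(shared.unlock("0000"))  # shared-secret: the attacker's PIN now works
try:
    private.unlock("0000")
except PermissionError:
    print("private PIN unaffected")
```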

Hardware vulnerabilities

There is another way OS isolation can be defeated: by exploiting the underlying hardware. Some of these attacks involve painstakingly going after the persistent storage, scraping data while the card is powered off and all software checks are out of the picture. Others are more subtle, relying on fault injection to trigger controlled errors in the implementation of security checks, for example by using a focused laser beam to induce bit flips. Interestingly enough, these exploits can be aided by installing new, colluding applications on the card designed to create a very specific condition (such as a specific memory layout) susceptible to that fault. For example, a 2003 paper describes an attack involving Java bytecode deliberately crafted to take advantage of random bit-flip errors in memory. In other words, while issuer privileges do not directly translate into 0wning the device outright, they can facilitate exploitation of other vulnerabilities in hardware.

Defending against Apple-gone-rogue

Speaking of Apple, there is a corollary here for the FBiOS skirmish. Manufacturers, software vendors and cloud-service operators all present a clear danger to the safety of their own customers. These organizations can be unknowingly compromised by attackers interested in going after customer data; this is what happened to Google in 2009 when attackers connected to China breached the company. Or they can be compelled by law enforcement to attack their own customers, as Apple was.

The “Secure Enclave,” despite the fancy name, is home-brew proprietary technology from Apple without a proven track record or anywhere near the level of adversarial security research aimed at smart-cards. While the actual details of the exploit used by the FBI to gain access to the phone are still unknown, one point remains beyond dispute: Apple could have complied with the order. Apple could have updated the software running in the secure enclave to weaken previously enforced security guarantees on any phone of that particular model. That was the whole reason this dispute went to court: Apple argued that the company was not required to deliver such an update, without ever challenging the FBI assertion that it was capable of doing so.

Global Platform mitigates that scenario by offering a different model for managing multiple applications on a trusted execution environment. If disk encryption and PIN verification were implemented in GP-compliant hardware, Apple would not face the dilemma of subverting that system after the fact. Nothing in Global Platform permits even the most privileged “issuer” to arbitrarily take control of an existing application already installed. Apple could even surrender the card-manager keys for that particular device to the FBI, and it would not help the FBI defeat PIN verification, absent some other exploit against the card OS or hardware.

SE versus eSE

The strange part: there is already a Global Platform-compliant chip included in newer-generation iPhones. It does not look like a “card.” That word evokes images of plastic ID cards with specific dimensions and rounded corners, the ID-1 format defined by ISO/IEC 7810. While that may have been the predominant form factor for secure execution environments when the GP specifications emerged, these days such hardware comes in different shapes and incarnations. On mobile devices, it goes by the name embedded secure element, another “SE” that has no relationship to the proprietary Apple secure enclave. For all intents and purposes, the eSE is the same type of hardware one would find on the chip & PIN enabled credit cards being issued by US banks today to improve the security of payments. In fact mobile payments over NFC were the original driver for shipping phones equipped with an eSE, starting with Google Wallet. While Google Wallet (now Android Pay) later ditched the eSE entirely, Apple picked up the same hardware infrastructure, even the same manufacturer (NXP Semiconductors), for its own payments product.

The device at the heart of the FBI/Apple confrontation was an iPhone 5C, which lacks an eSE; Apple Pay is only supported on the iPhone 6 and later iterations. But even on these newer models, the eSE hardware is not used for anything beyond payments. In other words, there is already hardware present to help deliver the result Apple is seeking: being in a position where the company cannot break into a device after the fact. But it sits on the sidelines. Will that change?

In fairness, Apple is not alone in under-utilizing the eSE. When this blogger worked on Google Wallet, general-purpose security applications of the eSE were an obvious next step after mobile payments. For example, the original implementation of disk encryption in Android was susceptible to brute-force attacks. It used a key derived directly from a user-chosen PIN/password to encrypt the disk. (It did not help that the same PIN/password was used for unlocking the screen all the time, all but guaranteeing that it had to be short.) Using the eSE to verify the PIN and output a random key would greatly improve security, in the same way that using a TPM with a PIN check improves the security of disk encryption compared to relying on a user-chosen password directly. But entrenched opposition from wireless carriers meant Android could not count on access to the eSE on any given device, much less a baseline of applications present on the eSE. (Applications can be pre-installed or burnt into the ROM mask at the factory, but that would have involved waiting for a new generation of hardware to reach market.) In the end Google abandoned the secure element, settling instead for the much weaker TrustZone-backed solution for general-purpose cryptography.
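The difference between the two designs shows up clearly in code. The following Python sketch is illustrative only and is not Android's actual implementation: the PBKDF2 iteration count, the four-digit PIN space, the ten-try limit and the `SecureElementKeyStore` class are all assumptions made for the demo.

```python
import hashlib
import secrets
from typing import Optional

# Illustrative contrast, not Android's actual implementation: a disk key derived
# directly from a short PIN versus a random key released by a secure element.
# The iteration count and retry limit are hypothetical values.

ITERATIONS = 1_000  # deliberately low so the demo runs quickly


def derive_key(pin: str, salt: bytes) -> bytes:
    """Key derived from the PIN alone: anyone holding the salt can recompute it."""
    return hashlib.pbkdf2_hmac("sha256", pin.encode(), salt, ITERATIONS)


def brute_force_pin(target_key: bytes, salt: bytes) -> Optional[str]:
    """Offline attack on a copy of the disk: try all 10,000 four-digit PINs."""
    for n in range(10_000):
        guess = f"{n:04d}"
        if derive_key(guess, salt) == target_key:
            return guess
    return None


class SecureElementKeyStore:
    """Hypothetical SE applet: the disk key is random, never derived from the
    PIN, and released only after a correct PIN. Too many wrong guesses erase
    it, so there is nothing to search offline."""

    MAX_TRIES = 10

    def __init__(self, pin: str):
        self._pin = pin
        self._key: Optional[bytes] = secrets.token_bytes(32)
        self._tries_left = self.MAX_TRIES

    def unlock(self, candidate: str) -> bytes:
        if self._key is None:
            raise RuntimeError("key erased after too many failed attempts")
        if secrets.compare_digest(candidate.encode(), self._pin.encode()):
            self._tries_left = self.MAX_TRIES
            return self._key
        self._tries_left -= 1
        if self._tries_left == 0:
            self._key = None  # wipe: the key is gone for good
        raise PermissionError("wrong PIN")
```

Because the derived key is a pure function of the PIN and a salt, the whole PIN space can be searched offline; raising the iteration count only slows that search down linearly. In the secure-element version the only oracle for checking guesses is the chip itself, which counts them.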

Future-proofing software updates: Global Platform and lessons from FBiOS (part II)

[continued from part I]

Locked-down environments

Smart-cards as application platforms differ from ordinary consumer devices in one crucial way: they are locked down against the “user.” Unlike a PC or smart-phone, which derives its value from its owner’s freedom to choose from a large ecosystem of applications, cards are optimized for security: the ability to run specific applications selected by the issuer with the highest assurance level. While the operating system of modern cards is powerful enough to support multiple applications at once (and even have them running at the same time, although not in traditional multi-tasking fashion), it is not up to the user to decide which applications are installed.

Global Platform in a nutshell

Global Platform formalizes that notion by defining an access-control model around card-management operations such as loading, running and uninstalling applications. It also defines a family of authentication protocols for appropriate entities to assert those privileges. At a high level, GP relies on the notion of security domains. The issuer security domain, or ISD, is the most privileged one. In earlier versions of Global Platform only the issuer could install applications. Later iterations generalized this to allow delegating such privileges to supplementary security domains, or SSDs.
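The delegation model can be sketched in a few lines of Python. This is a simplification under stated assumptions: `authenticated_as` stands in for having opened a secure channel with a domain's keys, the `may_install` flag is hypothetical rather than the actual GP privilege encoding, and here only the ISD creates supplementary domains.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Simplified, hypothetical model of Global Platform-style security domains.
# "authenticated_as" stands in for having opened a secure channel with that
# domain's keys; real privilege encoding and commands are more involved.


@dataclass
class SecurityDomain:
    name: str
    may_install: bool = False


@dataclass
class Card:
    isd: SecurityDomain = field(
        default_factory=lambda: SecurityDomain("ISD", may_install=True))
    ssds: List[SecurityDomain] = field(default_factory=list)
    apps: Dict[str, str] = field(default_factory=dict)  # app ID -> owning domain

    def create_ssd(self, authenticated_as: SecurityDomain, name: str,
                   delegate_install: bool) -> SecurityDomain:
        # Simplification: only the issuer creates supplementary domains here.
        if authenticated_as is not self.isd:
            raise PermissionError("only the ISD may create security domains")
        ssd = SecurityDomain(name, may_install=delegate_install)
        self.ssds.append(ssd)
        return ssd

    def install(self, authenticated_as: SecurityDomain, app_id: str) -> None:
        if not authenticated_as.may_install:
            raise PermissionError(f"{authenticated_as.name} may not install applications")
        if app_id in self.apps:
            raise ValueError(f"application ID {app_id} already in use")
        self.apps[app_id] = authenticated_as.name


card = Card()
transit = card.create_ssd(card.isd, "TransitSSD", delegate_install=True)
card.install(transit, "transit-applet")
print(card.apps)  # {'transit-applet': 'TransitSSD'}
```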

In practice that means installing new apps requires authenticating to the ISD, which in turn involves knowledge of a secret key unique to each card. (As a historical note, when Global Platform was being developed, the types of hardware found in smart-cards were too anemic for public-key cryptography. This is why the standard is primarily based on symmetric-key primitives, and often outdated ones at that, such as triple-DES. Recent updates to GP have introduced public-key based variants of core protocols.) ISD keys, colloquially “card-manager keys,” are tightly controlled by the organization distributing the hardware, or outsourced to a third party specializing in card management at scale. With the exception of development samples, it is difficult to find hardware shipped with “default” keys known to the end-user.
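The following Python sketch shows why per-card keys and challenge-response go together. It is not the actual GP secure channel protocol: SCP02 and SCP03 use triple-DES and AES constructions, while this sketch uses HMAC-SHA256 as a stand-in, and the `diversify()` key-derivation scheme is hypothetical. The method names only echo the GP commands INITIALIZE UPDATE and EXTERNAL AUTHENTICATE.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

# Stand-in for GP card authentication. NOT SCP02/SCP03: HMAC-SHA256 replaces
# their DES/AES constructions, and diversify() is a hypothetical scheme.


def diversify(master_key: bytes, card_id: bytes) -> bytes:
    """Derive a card-unique key from an issuer master key (hypothetical scheme)."""
    return hmac.new(master_key, b"card-key|" + card_id, hashlib.sha256).digest()


def cryptogram(key: bytes, label: bytes, host_challenge: bytes, card_challenge: bytes) -> bytes:
    """MAC over both challenges; the label keeps card and host cryptograms distinct."""
    return hmac.new(key, label + host_challenge + card_challenge, hashlib.sha256).digest()


class CardSide:
    def __init__(self, card_key: bytes):
        self._key = card_key
        self._host_challenge = b""
        self._card_challenge = b""

    def initialize_update(self, host_challenge: bytes) -> Tuple[bytes, bytes]:
        # Card proves knowledge of the key over a fresh pair of challenges.
        self._host_challenge = host_challenge
        self._card_challenge = secrets.token_bytes(8)
        return self._card_challenge, cryptogram(
            self._key, b"card", host_challenge, self._card_challenge)

    def external_authenticate(self, host_cryptogram: bytes) -> bool:
        # Card accepts management commands only if the host proves the same key.
        expected = cryptogram(self._key, b"host", self._host_challenge, self._card_challenge)
        return hmac.compare_digest(expected, host_cryptogram)


def host_authenticate(card: CardSide, card_key: bytes) -> bool:
    host_challenge = secrets.token_bytes(8)
    card_challenge, card_crypto = card.initialize_update(host_challenge)
    expected_card = cryptogram(card_key, b"card", host_challenge, card_challenge)
    if not hmac.compare_digest(expected_card, card_crypto):
        return False  # wrong key, or not the genuine card
    return card.external_authenticate(
        cryptogram(card_key, b"host", host_challenge, card_challenge))


master = secrets.token_bytes(32)
card = CardSide(diversify(master, b"card-0001"))
print(host_authenticate(card, diversify(master, b"card-0001")))  # True
print(host_authenticate(card, diversify(master, b"card-0002")))  # False
```

Diversifying keys per card means that extracting the key from one card does not unlock any other, which is one reason card-manager keys are managed per device rather than shared across a fleet.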

This state of affairs can help simplify the security story; it’s difficult to subvert isolation between applications when you can’t install a malicious app to attack others in the first place. But it also complicates and distorts market dynamics around deployment. Google Wallet is an instructive example. The so-called “embedded secure element” on Android devices happens to be GP-compliant smart-card hardware, permanently attached to the phone. It became the focus of an ugly skirmish between Google and wireless carriers over control of the issuer role. In fact GP has been struggling for years to expand this inflexible model and bootstrap a rich developer ecosystem where multiple vendors can coexist, offering applications on cards controlled by a different party.

But for our purposes, there is a more interesting security property of GP management model: even the all-powerful issuer role is greatly constrained in what it can do.

There is no “root” here

Here is a short list of what does not exist in Global Platform:

Reading out contents of card storage. All data associated with card applications is kept in persistent storage, traditionally EEPROM, or flash in more recent devices. GP defines a specific set of structured information that can be retrieved from the card: the list of installed applications, the card’s unique ID etc. But there is no command to retrieve a chunk of data from a specific EEPROM offset. There is not even a command to retrieve the executable code for an application after installation.
Making arbitrary modifications to card storage. Again, GP defines specific structured operations that modify card contents (install a new application, create a security domain, modify keys…), but there is no provision in the protocol for writing arbitrary data at a specific offset.
Similar provisions apply to RAM. There is no equivalent to UNIX /dev/mem or /dev/kmem for freely reading and writing memory.
There are no debugging facilities for getting information about the internal state of card applications, much less “attaching a debugger” in the conventional sense to a running application to control its execution.
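The practical effect of these omissions can be sketched as a card manager that only dispatches a fixed set of structured commands. The command names in this Python sketch echo real GP commands (GET STATUS, GET DATA, INSTALL, DELETE, PUT KEY), but their arguments and behavior here are heavily simplified and hypothetical.

```python
from typing import Any, Dict, List

# Toy card manager that, like Global Platform, accepts only a fixed set of
# structured commands. Names echo GP commands, but arguments and behavior
# are heavily simplified and hypothetical.


class CardManager:
    def __init__(self, card_id: str):
        self._card_id = card_id
        self._apps: Dict[str, Dict[str, Any]] = {}  # app ID -> that app's private data
        self._keys: Dict[str, bytes] = {}           # settable, never readable

    def handle(self, command: str, *args: Any) -> Any:
        handlers = {
            "GET_STATUS": self._get_status,
            "GET_DATA": self._get_data,
            "INSTALL": self._install,
            "DELETE": self._delete,
            "PUT_KEY": self._put_key,
        }
        # There is no READ_MEMORY, WRITE_MEMORY, DUMP_CODE or DEBUG command.
        if command not in handlers:
            raise ValueError(f"unsupported command: {command}")
        return handlers[command](*args)

    def _get_status(self) -> List[str]:
        return sorted(self._apps)        # which apps exist, not what they hold

    def _get_data(self) -> str:
        return self._card_id

    def _install(self, app_id: str) -> None:
        if app_id in self._apps:
            raise ValueError("application ID already in use")
        self._apps[app_id] = {}          # fresh, empty state

    def _delete(self, app_id: str) -> None:
        del self._apps[app_id]           # the app's data goes with it

    def _put_key(self, name: str, value: bytes) -> None:
        self._keys[name] = value         # write-only: no command returns keys


cm = CardManager("card-0001")
cm.handle("INSTALL", "A000000001")
print(cm.handle("GET_STATUS"))  # ['A000000001']
try:
    cm.handle("READ_MEMORY", 0x1000, 256)
except ValueError as err:
    print(err)  # unsupported command: READ_MEMORY
```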

Future-proofing against malicious updates

These may be viewed as limitations on platform functionality, but looked at another way they constitute an interesting defense against the “rogue issuer” threat. Suppose we have a card with a general-purpose encryption application already provisioned. This app is responsible for generating cryptographic keys within the secure card environment and leveraging them to unlock private user information (such as an encrypted disk) conditional on the user entering a correct PIN. Let’s posit that the card issuer started out “honest”: they installed a legitimate copy of the application on the card when they first handed it over to the customer.

Later this issuer has a change of heart. Perhaps compelled by law enforcement or due to a change of business model, they now seek to undermine the security provided by the application and extract those secret keys without knowing the user PIN. If the platform in question is Global Platform compliant, they would be out of luck going through the standard mechanisms. There is no mechanism in Global Platform to scrape data associated with an application. Nor can the issuer selectively tamper with application logic, for example skipping the PIN check or sneaking in a new code path that causes the application to cough up secret keys. Whether or not the issuer is cooperating, they would have to find another avenue, such as exploiting some weakness in the tamper-resistance of the hardware: an expensive and time-consuming attack that has to be repeated for each device, as opposed to software attacks that scale with minimal effort to any number of devices.

There is no “update” either

A common question that comes up when threat-modeling rogue issuers is software updates. That was the avenue the FBI pursued with Apple: to update the operating system with a subverted version that allows an unbounded number of incorrect PIN entries. Even if we grant the premise that GP does not allow the issuer to reach into the application state to read secrets or tamper with the code, why can’t the issuer simply replace the legitimate application with a back-doored version designed to leak secrets or otherwise misbehave? Because Global Platform has no concept of “updating” an application. One can delete an application instance and install a new one under the same ID, but that first step erases all data associated with the instance. For a chip & PIN payment application, that means all of the private information associated with that credit card is gone. It is not possible to replace only the code while retaining data.
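A minimal Python sketch of that delete-and-reinstall limitation follows, using hypothetical class names. The issuer can put a back-doored application under the same ID, but the new instance starts with freshly generated state and cannot reach the original key.

```python
import hashlib
import secrets
from typing import Dict

# Toy model with hypothetical class names: an issuer can delete an application
# and install a different one under the same ID, but the new instance starts
# from empty state.


class HonestApp:
    def __init__(self):
        self._key = secrets.token_bytes(16)   # generated on-card at install time

    def key_fingerprint(self) -> bytes:
        return hashlib.sha256(self._key).digest()


class BackdooredApp:
    def __init__(self):
        self._key = secrets.token_bytes(16)   # new instance, new (unrelated) state

    def export_key(self) -> bytes:
        return self._key                      # the "backdoor"


class Card:
    def __init__(self):
        self._instances: Dict[str, object] = {}

    def install(self, app_id: str, app_class: type) -> None:
        if app_id in self._instances:
            raise ValueError("ID in use; delete the existing instance first")
        self._instances[app_id] = app_class()  # always a fresh instance

    def delete(self, app_id: str) -> None:
        del self._instances[app_id]           # erases the instance's data too

    def select(self, app_id: str) -> object:
        return self._instances[app_id]


card = Card()
card.install("A0000001", HonestApp)
original = card.select("A0000001").key_fingerprint()

# The issuer changes its mind and swaps in a back-doored version under the same ID.
card.delete("A0000001")
card.install("A0000001", BackdooredApp)
leaked = card.select("A0000001").export_key()

print(hashlib.sha256(leaked).digest() == original)  # False: the original key is gone
```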

That is why updating card applications in the field is rare, aside from the logistical challenges of delivering updates customized to each card from the ISD. Systems that require “upgrading” in the conventional sense need to plan for it in advance with a split design. Typically the functionality is split into two pieces: a very small application holds secrets, cryptographic keys and internal state, while a much larger application contains the business logic for leveraging that state. These two pieces communicate using a suitable IPC mechanism exposed by the environment (for example, JavaCard defines shareable interface objects). The second half can be “updated” by removing and reinstalling a new version because it does not contain any state, while the first half is not affected. Still, any replacement is bound by the same interface agreement between them. If the interface allows the business-logic app to request a message to be decrypted, the replacement app can invoke the same functionality. But it will not magically gain a new capability, such as asking the other application to spit out encryption keys.
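The split design might look something like this Python sketch. The names are hypothetical, and because Python's standard library has no symmetric cipher, an HMAC operation stands in for “decrypt this message.” Python also cannot enforce the boundary between the two objects; on a card, the OS applet firewall would.

```python
import hashlib
import hmac
import secrets
from typing import Callable

# Split design sketch. A tiny "vault" applet holds the key and exposes one
# narrow operation; a larger "logic" applet is stateless and replaceable.
# Loosely analogous to JavaCard shareable interface objects. HMAC stands in
# for "decrypt this message". Python cannot enforce the boundary; a card
# firewall would.


class VaultInterface:
    """The only handle the logic applet ever receives."""

    def __init__(self, operation: Callable[[bytes], bytes]):
        self._operation = operation

    def authenticate(self, message: bytes) -> bytes:
        return self._operation(message)


class Vault:
    """Never updated: deleting it would erase the key."""

    def __init__(self):
        self._key = secrets.token_bytes(32)

    def interface(self) -> VaultInterface:
        return VaultInterface(lambda m: hmac.new(self._key, m, hashlib.sha256).digest())


class LogicV1:
    def __init__(self, vault: VaultInterface):
        self._vault = vault

    def handle(self, message: bytes) -> bytes:
        return self._vault.authenticate(message)


class LogicV2(LogicV1):
    """'Updated' business logic: adds input validation, gains no new capability."""

    def handle(self, message: bytes) -> bytes:
        if not message:
            raise ValueError("empty message")
        return super().handle(message)


vault = Vault()
iface = vault.interface()
tag_v1 = LogicV1(iface).handle(b"hello")
# Replace the stateless logic applet; the vault and its key are untouched.
tag_v2 = LogicV2(iface).handle(b"hello")
print(tag_v1 == tag_v2)  # True
```

The replacement logic applet produces the same results because it calls the same vault, yet nothing in `VaultInterface` offers a way to ask for the key itself.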

That said, there are limitations and edge cases where having issuer privileges does grant an advantage in attacking a preexisting application, although far from guaranteeing success. These will be the topic of the next post.

[Continued: caveats & conclusion]

Future-proofing software updates: Global Platform and lessons from FBiOS (part I)

It is no secret that software-update mechanisms controlled by vendors are also back-doors for delivering malicious payloads into a system. The recent FBiOS controversy raised awareness of this inconvenient fact beyond the security community, but it is far from the first example of a service being effectively coerced into delivering malware to one of its own users. By far the most extreme and cavalier example of “software updating” is of course the web: each time a user visits a website, the vendor has the option of delivering brand-new application code with JavaScript, Flash or any of the myriad other proprietary programming environments supported by web browsers. Hushmail was a “secure email” provider that learned this lesson the hard way.

This raises the question of whether it is possible to design systems such that even the manufacturer cannot deliver malicious updates after the fact. “After the fact” is the operative phrase; of course they always have the option of delivering a pre-compromised machine. So a more realistic line to hold is making sure that a vendor that started out honest cannot later change its mind, regardless of the reason. It could be that the vendor itself has gone rogue, the way SourceForge started injecting spyware into binaries after a “change of business model.” Perhaps the HR department hired a corrupt insider; more than a few developers have been caught sneaking Bitcoin mining software into their company’s applications. Or it could be the situation Apple faces: a legal order to provide access to specific user data. Varied as the motivations are, from a security perspective they are equivalent. The question is properly framed as one of capability (can the vendor do this?) rather than one of intention (“we promise we would never do this to our users!”), political inclinations or the creativity of the legal department in pushing back against subpoenas.

From a technology perspective, this problem is non-trivial. Most systems have a “God mode” as part of their security design, which gives full control over the system. This role is exempt from the usual security checks that the system diligently applies to all actions. The answer to the access-control question “is this person allowed to read/write/modify this piece of data?” is an unqualified yes for that role. Unix has root, Windows has administrator. Over time, changes to the operating system have tried to limit the capabilities of these roles. For example, 64-bit versions of Windows prevent even the administrator account from running arbitrary code in kernel mode by requiring signed drivers. But admins with physical access can still override such restrictions. Meanwhile the introduction of software-update mechanisms brought yet another cook into the kitchen: the operating system vendor. Taking the example of Windows, MSFT can remotely update operating system components, including the kernel itself, on modern versions of the OS. That was not always the case, and users can still opt out, but the history of Windows Update shows a very clear progression: what started out as a convenience feature for the minority of users who cared to pull updates morphed into a powerful, large-scale distribution channel for pushing updates to everyone by default. Since MSFT can now silently update Windows with arbitrary code of its choice, it effectively has administrator access to all machines running recent versions of that OS. (Note that code-signing has no effect on this capability, although it does have a deterrent effect. Updates have to be signed, but MSFT is just as capable of signing a malicious binary as it is of signing a legitimate OS update intended for public consumption. However, that signature provides compelling evidence of culpability if the system is later examined forensically.)

In short, while modern OS designs attempt to tame the old-school “root” account in the name of least privilege, they have introduced an even more powerful role with remote access. The situation is worse on mobile devices. Android does not give the user root access by default. You have to “root” your device, the equivalent of jail-breaking an iPhone, to gain that capability. Google, on the other hand, retains the ability to push updates to the operating system running with root privileges. Power dynamics have been inverted: an all-powerful remote entity and a highly constrained local user.

To be clear, this notion of an anything-goes account exempt from the usual access-control restrictions is very useful. Being able to tweak every knob and update every last component in the system is essential for improving functionality over time. Otherwise bugs could not be fixed, and one would have to purchase a brand-new PC each time they ran into a critical bug deep in the operating system code itself. A platform shipped in a permanently “fused” state, stuck with its initial software and no ability to deliver future enhancements, is a non-starter.* But when the OS itself is responsible for enforcing aspects of security policy (such as who gets to read data residing on an iPhone), unchecked update capability translates into an exemption from previously defined security restrictions.

So is there a middle-ground? The ideal design would allow delivering new functionality over time (so users are not stuck with the hardware as they purchased it on day one) minus the ability to use the update channel for subverting previously defined security policies. It’s easy to craft theoretical designs, but it is more instructive to look at deployed systems. It turns out that an architecture originally intended for managing smart-cards has exactly this characteristic. More surprisingly, the latest iPhone and some Android devices already include a separate piece of hardware called the embedded secure element which obeys that architecture. It’s called Global Platform.

[continued in part II]

* Interestingly, that describes the state of many Android devices, not for lack of an auto-update mechanism (which certainly exists in Android) but because of the inability or unwillingness of wireless carriers to leverage that channel for delivering updates.

Getting by without passwords: encrypted email

[Part of a series on getting by without passwords]

The final post in this series takes up the problem of securing email traffic without relying on the security of user-chosen passwords. There are two aspects to protecting email:

Confidentiality: guaranteeing that only the intended recipients can read the contents of a message, even when the message transits untrusted networks, as it routinely does because email delivery is based on a store-and-forward paradigm.
Authenticity: for the recipient to be certain that an incoming message indeed originated with the purported sender.

Public-key cryptography is well suited to serving both of these objectives. At a high level, authenticity is provided by the sender digitally signing the message with their private key, while confidentiality is assured by encrypting the message using the public key of each recipient. But the devil is in the details, and over the years many protocols, formats and standards have emerged around exactly how these operations are done. They differ both in cosmetic ways (how the bits are laid out) and in fundamental assumptions about key management. To keep the discussion tractable, we focus on two of the more popular formats in widespread use: S/MIME and PGP.
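The division of labor between the two key pairs can be illustrated with textbook RSA. This is a deliberately insecure toy: tiny primes, no padding, and helper names that are hypothetical rather than any PGP or S/MIME API. Real formats sign a padded hash and use hybrid encryption (a random session key encrypted to each recipient), but the roles are the same: the sender's private key signs, the recipient's public key encrypts.

```python
import hashlib
from typing import NamedTuple


class KeyPair(NamedTuple):
    n: int  # public modulus
    e: int  # public exponent
    d: int  # private exponent


def toy_keypair(p: int, q: int, e: int) -> KeyPair:
    # Textbook RSA key generation; p and q must be distinct primes.
    phi = (p - 1) * (q - 1)
    return KeyPair(n=p * q, e=e, d=pow(e, -1, phi))  # modular inverse: Python 3.8+


def digest(message: bytes, n: int) -> int:
    # Hash, then squeeze into the tiny modulus (real schemes use proper padding).
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n


def sign(message: bytes, sender: KeyPair) -> int:
    return pow(digest(message, sender.n), sender.d, sender.n)  # sender's PRIVATE key


def verify(message: bytes, signature: int, sender: KeyPair) -> bool:
    return pow(signature, sender.e, sender.n) == digest(message, sender.n)  # sender's PUBLIC key


def encrypt(m: int, recipient: KeyPair) -> int:
    return pow(m, recipient.e, recipient.n)  # recipient's PUBLIC key


def decrypt(c: int, recipient: KeyPair) -> int:
    return pow(c, recipient.d, recipient.n)  # recipient's PRIVATE key


alice = toy_keypair(61, 53, 17)  # sender
bob = toy_keypair(47, 71, 79)    # recipient

msg = b"Meet at noon"
sig = sign(msg, alice)
print(verify(msg, sig, alice))   # True: anyone with Alice's public key can check this

session_key = 42                 # stands in for the random key that encrypts the body
print(decrypt(encrypt(session_key, bob), bob) == session_key)  # True: only Bob can recover it
```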

S/MIME and PGP

“The nice thing about standards is that you have so many to choose from.” — Andrew Tanenbaum

Not surprisingly, there is more than one way to secure email, and each one has taken hold in a different niche market. PGP came first. S/MIME, the more enterprise-oriented format, arrived later; its third version was codified in RFC 2633, building on the Cryptographic Message Syntax. The differences between PGP and S/MIME can be grouped into two categories:

Superficial/cosmetic: PGP defines its own home-brew format for messages and keys. S/MIME uses the Cryptographic Message Syntax (CMS) for messages and X.509 certificates for carrying keys, both of which are in turn built on ASN.1.
Philosophical/trust model: This is a more significant difference in approach to establishing trust in keys. Recall that sending an encrypted email to Alice requires having her public-key ahead of time. That boils down to the basic questions: where do you look up Alice’s public-key? When someone presents a key that purportedly belongs to Alice, how do you go about verifying that?
S/MIME assumes a public-key infrastructure (PKI) mediated by trusted third parties. Everyone receives digital certificates from these parties and can verify the credentials of others by reference to the same authorities. PGP relies on a more grass-roots web of trust, with users exchanging keys in person or leveraging social networks by trusting existing contacts to vouch for each other.
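Where S/MIME asks “does a chain lead to a trusted authority?”, PGP's classic web of trust asks “have enough people I trust signed this key?”. The sketch below models that rule in simplified form. The thresholds mirror GnuPG's default `--completes-needed 1` and `--marginals-needed 3`, but the function, the data shapes and the names are hypothetical, and the real algorithm also checks that each signer's key is itself valid and follows signature chains up to a depth limit.

```python
from enum import Enum
from typing import Dict, Set


class OwnerTrust(Enum):
    NONE = 0
    MARGINAL = 1
    FULL = 2
    ULTIMATE = 3  # your own key


# Mirrors GnuPG's defaults for --completes-needed and --marginals-needed.
COMPLETES_NEEDED = 1
MARGINALS_NEEDED = 3


def is_valid(key: str, signatures: Dict[str, Set[str]],
             owner_trust: Dict[str, OwnerTrust]) -> bool:
    """Is `key` valid, judging only by who has signed it (simplified)?"""
    full = marginal = 0
    for signer in signatures.get(key, set()):
        trust = owner_trust.get(signer, OwnerTrust.NONE)
        if trust in (OwnerTrust.FULL, OwnerTrust.ULTIMATE):
            full += 1
        elif trust is OwnerTrust.MARGINAL:
            marginal += 1
    return full >= COMPLETES_NEEDED or marginal >= MARGINALS_NEEDED


owner_trust = {"me": OwnerTrust.ULTIMATE, "carol": OwnerTrust.MARGINAL,
               "dave": OwnerTrust.MARGINAL, "erin": OwnerTrust.MARGINAL}
signatures = {"alice": {"me"}, "bob": {"carol", "dave"}, "frank": {"carol", "dave", "erin"}}
for key in ("alice", "bob", "frank"):
    print(key, is_valid(key, signatures, owner_trust))
```

Alice is valid because you signed her key yourself; Bob is vouched for by only two marginally trusted contacts, one short of the threshold; Frank has three.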

Easy case: S/MIME

Because S/MIME is associated with enterprise/managed IT scenarios (think of a Windows shop running an Exchange server in-house, with users accessing their email via Outlook clients), there is already plenty of precedent for using smart-cards to protect email. Outlook itself has supported S/MIME for 15+ years, and the Windows cryptography architecture abstracts key management away from applications. As far as the application is concerned, the same code paths are executed to sign/decrypt whether the private key resides on local disk or lives on a smart-card that must be invoked to perform the operation itself. The operating system and underlying layers take care of the difference: for example, cryptographic hardware can appear and disappear as the user connects and disconnects it, some cards may require a PIN to authorize operations, and so on.
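The shape of that abstraction can be sketched as an interface with two interchangeable backends. This is not the Windows CryptoAPI/CNG interface; every class and function here is hypothetical, and HMAC stands in for a real private-key signature purely to keep the example self-contained. The point is that `sign_email` contains no branch for where the key lives, while the PIN prompt is handled below the application.

```python
import hashlib
import hmac
from typing import Callable, Protocol


class SigningKey(Protocol):
    """What the application sees: a handle that can sign, nothing more."""
    def sign(self, data: bytes) -> bytes: ...


class SoftwareKey:
    """Key material held in process memory (HMAC secret stands in for a private key)."""
    def __init__(self, secret: bytes):
        self._secret = secret

    def sign(self, data: bytes) -> bytes:
        return hmac.new(self._secret, data, hashlib.sha256).digest()


class SimulatedCard:
    """Toy token: the secret stays inside, and every operation requires the PIN."""
    def __init__(self, secret: bytes, pin: str):
        self._secret = secret
        self._pin = pin

    def sign(self, pin: str, data: bytes) -> bytes:
        if not hmac.compare_digest(pin, self._pin):
            raise PermissionError("wrong PIN")
        return hmac.new(self._secret, data, hashlib.sha256).digest()


class SmartCardKey:
    """Plays the role of the OS layer: prompts for a PIN and forwards the request."""
    def __init__(self, card: SimulatedCard, pin_prompt: Callable[[], str]):
        self._card = card
        self._pin_prompt = pin_prompt

    def sign(self, data: bytes) -> bytes:
        return self._card.sign(self._pin_prompt(), data)


def sign_email(body: bytes, key: SigningKey) -> bytes:
    # Application code: one code path, regardless of where the key lives.
    return key.sign(body)


on_disk = SoftwareKey(b"demo-secret")
on_card = SmartCardKey(SimulatedCard(b"demo-secret", pin="1234"), pin_prompt=lambda: "1234")
print(sign_email(b"hello", on_disk) == sign_email(b"hello", on_card))  # True
```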

Hard case: PGP

To this day PGP decidedly remains an “enthusiast technology” (to put it politely), voluntarily used by individuals rather than mandated in enterprise IT settings. Because use of cryptographic hardware by consumers is exceedingly rare, PGP keys are almost always managed in software and migrated to new devices via awkward import/export mechanisms. But there was an attempt, dating back to 2004, at standardizing support for hardware tokens: the OpenPGP card specification.

It sounds promising: while S/MIME is silent on the low-level key-management interface or the exact behavior of cryptographic hardware used to manage user keys, the OpenPGP Card specification prescribes in exacting detail how such devices shall operate and what features they must support.

In reality, that turns out to be a bad idea.

How to limit future options: pick a card-edge

With OpenPGP Card, the system has committed to the low-level details of a card. It is not just expressing baseline requirements (such as “must support RSA up to 2048-bit keys”) or describing a high-level interface for invoking functionality on the card. It prescribes low-level details of how the host and card communicate over the PC/SC interface, as well as the exact organization of data stored on the card. Why is that too restrictive? For starters, consider the concept of “card.” While smart-cards and most USB tokens indeed go through PC/SC (after all, the name stands for “Personal Computer/Smart Card”), there are other types of cryptographic hardware, such as HSMs or even the TPM, which are not “cards” as far as the operating system is concerned.**

Even restricting our attention to cards, standardizing on the low-level interface to the card creates unnecessary incompatibility over superficial implementation details. By focusing on the low-level wire protocol, GPG has limited itself to working with exactly one card application. Yet many standards have been introduced over the years for card-based applications: eID for electronic ID in the EU, CAC and later PIV for government employees in the US, and GIDS for enterprise authentication, to name a handful. Many of them contain a superset of the functionality required for PGP: generating key-pairs on board, performing private-key operations, storing public keys, using a PIN for authorization, etc. They exceed those requirements and improve on the baseline by supporting multiple key-sets and elliptic-curve algorithms not originally envisioned by OpenPGP. With the exception of GIDS, they are also far more popular than PGP cards, produced in higher volume by multiple suppliers benefiting from economies of scale. Yet none of these are usable for PGP directly, because of cosmetic differences in how their functionality is packaged into low-level commands.
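The incompatibility starts with the very first command a host sends. Before doing anything else, software talking to a card application must SELECT it by its application identifier (AID), using the ISO 7816-4 SELECT command. The sketch below builds those command bytes; the AID values are given as recalled from the OpenPGP Card specification and NIST SP 800-73 (PIV), so treat them as assumptions to verify against the specs. A client hard-coded to the first AID never even reaches the PIV application, despite both offering on-card keys and PIN protection.

```python
# Application identifiers as recalled from the OpenPGP Card specification and
# NIST SP 800-73 (PIV); double-check them against the specs before relying on them.
OPENPGP_AID = bytes.fromhex("D27600012401")
PIV_AID = bytes.fromhex("A000000308000010000100")


def select_by_aid(aid: bytes) -> bytes:
    """Build an ISO 7816-4 SELECT-by-name command APDU.

    CLA=00, INS=A4 (SELECT), P1=04 (select by DF name), P2=00, Lc=len(aid), data=aid.
    """
    if not 5 <= len(aid) <= 16:
        raise ValueError("an AID is 5 to 16 bytes long")
    return bytes([0x00, 0xA4, 0x04, 0x00, len(aid)]) + aid


for name, aid in (("OpenPGP", OPENPGP_AID), ("PIV", PIV_AID)):
    print(f"{name:8} {select_by_aid(aid).hex(' ').upper()}")  # bytes.hex(sep): Python 3.8+
```

Everything after selection (where keys live, how a public key is read back, how the PIN is verified) diverges in the same way, which is the cosmetic-but-fatal difference described above.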

Compare this to how SSH authentication integrates cryptographic hardware. There is no such thing as “OpenSSH Card.” Instead the implementation uses an existing abstraction layer for cryptographic hardware, namely PKCS #11. Any vendor that manufactures cryptographic hardware with the right features can write a PKCS #11 module to package those features into a standard interface and compete in the market for hardware-based SSH authentication. In practice they don’t even need to write that module, because volunteers have already solved that problem: OpenSC ships a module with support for an impressive variety of cards. Another open-source package, SimpleTPM-PK11, allows using the TPM for managing keys. On the proprietary side, HSM vendors such as SafeNet and Thales ship their own PKCS #11 modules, written in-house to work with their specific models. As a result, SSH authentication on both the client and server side can take advantage of a variety of hardware offerings. To pick two examples: this blog covered using PIV tokens for SSH, and others have explored leveraging the TPM.
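On the OpenSSH side, the only hardware-specific input is the path to a PKCS #11 module, passed via `ssh-keygen -D`, `ssh -I` or the `PKCS11Provider` option in `~/.ssh/config`. The sketch below just assembles those invocations; the helper names and the host name are hypothetical, and the OpenSC path is the 64-bit Ubuntu location used later in this post. Pointing at a TPM or HSM vendor module would change only that path string.

```python
import shlex
from typing import List

# OpenSC module path used later in this post (64-bit Ubuntu); other systems differ.
OPENSC_MODULE = "/usr/lib/x86_64-linux-gnu/opensc-pkcs11.so"


def list_token_keys(module: str = OPENSC_MODULE) -> List[str]:
    # `ssh-keygen -D` prints the public keys exposed through a PKCS #11 module.
    return ["ssh-keygen", "-D", module]


def connect_with_token(host: str, module: str = OPENSC_MODULE) -> List[str]:
    # `ssh -I` points the client at a PKCS #11 module for user authentication.
    return ["ssh", "-I", module, host]


def ssh_config_entry(host: str, module: str = OPENSC_MODULE) -> str:
    # Persistent equivalent using the PKCS11Provider option in ~/.ssh/config.
    return f"Host {host}\n    PKCS11Provider {module}\n"


print(shlex.join(list_token_keys()))  # shlex.join: Python 3.8+
print(shlex.join(connect_with_token("server.example.com")))
print(ssh_config_entry("server.example.com"), end="")
```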

Restoring the abstraction layer

Luckily there is a solution that brings that diversity of hardware to PGP: gnupg-pkcs11. This project is dated and by all appearances unmaintained, generally a red flag increasing the chances of incompatibility caused by software rot. (Hosting source code on SourceForge is another red flag, considering the company was found to tamper with binaries. Luckily, packages are available in the repositories of major distributions such as Ubuntu.)

gnupg-pkcs11 replaces the GPG smart-card daemon with a new implementation that makes no assumptions about the “card,” but instead calls into an existing PKCS #11 module for cryptographic functionality. Specifying this module, as well as which keys on the card to pick (since a “token” in the PKCS #11 model can have multiple keys), is done with configuration files. This part can be tricky, so here is a walk-through of the steps outlined in the man page.

Configuration

1. Replacing the smart-card daemon

The first step is instructing GnuPG to use a different smart-card daemon and specifying the application to use when prompting for a PIN. These lines appear in gpg-agent.conf, typically located in the “.gnupg” folder under the user’s home directory.

scdaemon-program /usr/bin/gnupg-pkcs11-scd
pinentry-program /usr/bin/pinentry-qt4

2. Configuring the new daemon

The next step is configuring gnupg-pkcs11-scd itself. This is best done in two stages, because the file contains key identifiers that are not yet known at this point. The easiest way to discover them is by asking gpg-agent, which in turn invokes the smart-card daemon. A minimal configuration file, gnupg-pkcs11-scd.conf, pointing at the PKCS #11 module is sufficient to bootstrap that process:

# List of providers
providers p1

# Provider attributes
provider-p1-library /usr/lib/x86_64-linux-gnu/opensc-pkcs11.so

With that in place we can query the card for key identifiers:

$ echo "SCD LEARN" |\
gpg-agent --server gpg-connect-agent |\
grep KEY-FRIEDNLY
S KEY-FRIEDNLY EB1391B66C49F44859D1CF81BB97882E32154A2B /C=US/ST=CA/L=\
San Francisco/O=Widgets Inc./OU=Randomization/CN=Cem Paya/emailA\
ddress=notcem@nothere.net on PIV_II (PIV Card Holder pin)

(Note the misspelling “friednly” [sic] for friendly identifier.)

The output contains the distinguished name from each certificate discovered on the card, along with the SHA1 hash of the public key (the 40-hex-digit value following KEY-FRIEDNLY). In this example there is just one certificate present, although a PIV card will typically contain up to four active certificates. These SHA1 hashes can now be used to indicate which keys secure email, by adding another section to gnupg-pkcs11-scd.conf:

emulate-openpgp
openpgp-sign EB1391B66C49F44859D1CF81BB97882E32154A2B
openpgp-encr EB1391B66C49F44859D1CF81BB97882E32154A2B
openpgp-auth EB1391B66C49F44859D1CF81BB97882E32154A2B

This example cuts corners by using the same key for signing, encryption and authentication, which is suboptimal. A more realistic mapping would pick the PIV non-repudiation slot for signatures and the PIV key-management slot for encryption.
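With several certificates on a card, copying hashes by hand gets error-prone. The sketch below parses `SCD LEARN` output in the format shown earlier and renders the `emulate-openpgp` section with a separate key per role. The helper names are hypothetical, the regex assumes the upper-case hex and one-line-per-certificate layout seen above (the post wraps that line for display), and the `"A" * 40`-style values are placeholders, not real key hashes.

```python
import re
from typing import Dict

# One line per certificate in the "SCD LEARN" response (note the upstream spelling).
LEARN_LINE = re.compile(r"^S KEY-FRIEDNLY ([0-9A-F]{40}) (.+)$")
KEY_HASH = re.compile(r"[0-9A-F]{40}")


def parse_learn(output: str) -> Dict[str, str]:
    """Map each 40-hex-digit key hash to the certificate description after it."""
    keys = {}
    for line in output.splitlines():
        match = LEARN_LINE.match(line.strip())
        if match:
            keys[match.group(1)] = match.group(2)
    return keys


def openpgp_section(sign: str, encr: str, auth: str) -> str:
    """Render the emulate-openpgp section of gnupg-pkcs11-scd.conf."""
    for key_hash in (sign, encr, auth):
        if not KEY_HASH.fullmatch(key_hash):
            raise ValueError(f"not a 40-hex-digit key hash: {key_hash!r}")
    return "\n".join([
        "emulate-openpgp",
        f"openpgp-sign {sign}",
        f"openpgp-encr {encr}",
        f"openpgp-auth {auth}",
    ]) + "\n"


sample = ("S KEY-FRIEDNLY EB1391B66C49F44859D1CF81BB97882E32154A2B "
          "/C=US/ST=CA/L=San Francisco/O=Widgets Inc./OU=Randomization/CN=Cem Paya/"
          "emailAddress=notcem@nothere.net on PIV_II (PIV Card Holder pin)")
print(parse_learn(sample))

# Hypothetical placeholders for keys in separate PIV slots (not real hashes):
SIGNATURE_SLOT = "A" * 40  # non-repudiation / digital signature key
KEY_MGMT_SLOT = "B" * 40   # key-management key, used for encryption
AUTH_SLOT = "C" * 40       # authentication key
print(openpgp_section(SIGNATURE_SLOT, KEY_MGMT_SLOT, AUTH_SLOT), end="")
```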

3. Registering the keys with GnuPG

The final step is making GPG aware of keys on the card. This involves going through the card-edit menu and invoking the “generate” option. The command name is misleading because new keys will not be generated on the card. (Similar to “ssh-keygen -D” retrieving existing keys from a hardware token, not generating new keys or otherwise altering state of the token.) Instead the replacement daemon returns existing public-keys on the card specified by SHA1 hashes above. GPG then combines those keys with additional information prompted from the user such as email address, expiration and proper name. In effect this step synthesizes “PGP keys” with their own meta-data out of existing key-pairs present on the hardware token.

Caveat: While the man page recommends disabling OpenPGP emulation after this step, in the experience of this blogger gnupg-pkcs11 only works successfully in emulation mode.

User-experience

Putting it all together, here is a sample of the UX for private-key operations.

Linux

Suppose we try signing a message from the command line:

echo "Hello world" | gpg2 -as

Assuming the token is connected, a system-modal dialog appears:

Qt4 PIN collection prompt

If we instead specified pinentry-gtk-2 in the configuration file, that UI would look like:

GTK-2 PIN collection prompt

After the PIN is entered, the signature operation is performed by the hardware token. The remaining steps are identical to the flow for software keys: the signature is formatted by gpg and output to the console.

OSX

Surprisingly, gnupg-pkcs11-scd also runs on OSX. GPGTools is a user-friendly implementation of PGP (as judged by the low standards of open-source software) and provides keyboard shortcuts and a context menu for decrypting text highlighted in any application. For example, decrypting in a web browser can be initiated by selecting MacPGP from the Services menu:

Decrypt highlighted text in web mail

That brings up a PIN collection UI slightly more polished than the Linux versions:

OSX MacPGP PIN collection UI

Handling multiple devices

Existing keys on the card need to be registered with PGP only once. To use the hardware token with another device, it is easier to “export the private key” to a file, copy the file over and import it into the key-ring on the new machine. “Export” is in quotes because, in the case of hardware keys, no secret information is being exported. Despite GPG dutifully prompting for a passphrase to encrypt sensitive data, the only relevant information written to the file is a reference to the card serial number and key identifier. Private keys never leave the hardware token itself. The resulting export file is not sensitive the way actual private keys would be, making the migration process straightforward.

** While a TPM can be made to look like a smart-card (for example, using Windows virtual smart-cards) that involves an emulation layer, not sending raw APDUs to the TPM as if it were a CCID device.

Relics from the P2P file-sharing wars

This blogger was recently disconnected from a New York hotel network with a cryptic error message:

Nothing adds up here. p2pnetworking.exe? The EXE extension indicates a Windows application, which is not the operating system running on the machine. Even more surprising was the reference to Kazaa (note the misspelling above), which inspired a trip down memory lane.

The 2000s called; they want their file-sharing wars back.

After Napster

Kazaa was a popular peer-to-peer file-sharing application which came to prominence as part of the second generation of P2P designs. It was Napster that kicked off the first generation, putting MP3 and file-sharing in the limelight and exceeding 20M users at its peak. That success was short-lived. Napster became a lightning rod for litigation by a floundering recording industry, which found a convenient scapegoat for its declining revenue. (It would not be the first or last time that the RIAA tilted at windmills.) Napster also had an intrinsic weakness: its design fell short of being truly 100% distributed. There was a point of centralization, a single point of failure, in Napster the company itself. While users downloaded files directly from each other, indexing and search of content was handled by servers operated by the corporate entity. That gave the RIAA all the leverage required to argue that Napster could and should be held accountable for any infringing uses of its platform.

Future developers of file-sharing applications would not make that mistake again. First came Gnutella from the developers of WinAmp, which had recently been acquired by AOL. Its release led to awkward moments for the parent company, which was courting a merger with the content-rich Time Warner at the time. AOL tried putting the genie back in the bottle, but it was too late. Many more followed: Kazaa, Morpheus, eDonkey. These optimized the network topology by taking into account that not every user has the same bandwidth available. Instead of treating all nodes equally in a democratic spirit, they deliberately capitalized on powerful machines with high-bandwidth connections, promoting them to “supernodes” responsible for coordination.

Users embraced the new generation of P2P applications, at least until the apps morphed into delivery vehicles for dubious adware and spyware bundled with the installers in some semblance of a business model. Network administrators, on the other hand, hated P2P with a vengeance because of its massive bandwidth consumption. At first it was colleges that started banning Napster. They claimed the prerogative of fairly distributing network capacity, although RIAA nastygrams no doubt also played a part in scaring typically tech-illiterate and clueless administrations into policing their own student bodies. Later the crusade was taken up by ISPs on a much larger scale, with Comcast getting in trouble with the FCC after it was caught manipulating BitTorrent traffic.

False positives

That brings us to this mysterious error message from the hotel network.

Kazaa folded a long time ago. Its official client only ran on Windows. P2P moved on to third-generation designs, best exemplified by BitTorrent. But somehow the network monitoring software used by this hotel retained vestigial traces of the great War On File Sharing from the early 2000s. Lying dormant all this time were heuristics on the watch for dreaded P2P traffic, ready to banish users from the network for their transgressions. (That ban lasted approximately 24 hours in this case.)

As with all unmaintained software, these rules can go haywire when the world changes. In this case the omniscient network monitor misidentified Google Hangouts video-conferencing as P2P file sharing. A quick search shows that the offending traffic pattern cited in the error message, UDP traffic to port 19305, is among the documented ports associated with the Hangouts application. Similarly, a reverse DNS lookup for the destination IP points to the “1e100.net” domain, which belongs to Google as expected.
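The triage above (protocol and port from the error message, then a reverse DNS lookup on the destination) can be sketched with the standard library. The helper names are hypothetical, the destination IP is not given in the post so none is hard-coded, and port 19305 is the only Hangouts port this sketch checks.

```python
import socket
from typing import Optional


def is_google_hostname(hostname: str) -> bool:
    """True for hosts under Google's 1e100.net domain (matched on a label boundary)."""
    hostname = hostname.rstrip(".").lower()
    return hostname == "1e100.net" or hostname.endswith(".1e100.net")


def reverse_lookup(ip: str) -> Optional[str]:
    """PTR lookup via the system resolver; None if there is no usable reverse record."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
        return hostname
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return None


def looks_like_hangouts(protocol: str, dst_port: int, dst_hostname: Optional[str]) -> bool:
    # The two clues from the flagged flow: UDP to port 19305, and a Google destination.
    return (protocol.lower() == "udp" and dst_port == 19305
            and dst_hostname is not None and is_google_hostname(dst_hostname))
```

Usage would look like `looks_like_hangouts("udp", 19305, reverse_lookup(flagged_ip))`, where `flagged_ip` is whatever destination the monitoring system reported. Matching on the `.1e100.net` label boundary avoids being fooled by look-alike names such as `not1e100.net`.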

Protecting users from themselves?

By itself, a false positive in a network monitoring application is not exactly news. Such heuristics are notoriously unreliable and subject to errors. For starters, a given protocol is often implemented by multiple pieces of software written independently of each other. Some may be open-source, others proprietary, yet others “adware” supported by bundled applications. If they implement the protocol faithfully, they will look very similar to an external observer watching bits on the network. Just because a machine is emitting network packets that look like those emitted by Kazaa does not mean it is Kazaa running on that machine.

Putting aside that conceptual problem, what makes this example truly egregious is the alarmist language in the message and the vindictive approach taken by the hotel against would-be file-sharers. What is being achieved by singling out P2P applications? “Online activity flagged as malicious.” Malicious towards whom? Certainly not the owner of the laptop. If the hotel has taken on the paternalistic role of protecting customers from dubious software on their own devices, where do we draw the line? Should it also start performing deep-packet inspection or blocking known malicious websites?

For that matter, the response after detecting P2P activity is disproportionate. If the network management system can identify file-sharing traffic (it obviously cannot, but let’s grant the premise), why not specifically block those connections? Why retaliate by keeping users completely off the network for an extended period of time? Even if we grant that hotels somehow owe it to their guests to protect them from their own malware (and, let’s be clear, file-sharing software is not malware), quarantining the user completely does not further that objective. Redirecting users to a web page that explains the risks involved or offers assistance with removing the offending malware is a more constructive approach.

Searching for a motive

Because the standard it’s-for-your-own-good excuse does not hold up to scrutiny, we need to look deeper for motives. There are two usual suspects.

“Network management”

This is a euphemism for allocating scarce bandwidth fairly between users with competing interests. This was the putative excuse colleges and universities initially offered for blocking Napster: with everyone downloading music, the application was consuming a significant chunk of campus bandwidth. It is also the original excuse ISPs resorted to when explaining their throttling of BitTorrent. P2P was particularly embarrassing for ISPs because it revealed one of the worst-kept secrets of Internet service in the US: while an ISP may advertise 10Mbps speeds to subscribers for a given price, it does not have anywhere near the capacity to provide that speed to all subscribers at the same time. As long as few are maxing out their connections, customers may indeed attain speeds near that advertised upper bound. But if everyone gets busy downloading music at once, no one sees anything close to the bandwidth they were misled into believing they were paying for.
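The arithmetic behind that worst-kept secret is simple oversubscription. The numbers below (500 subscribers sold 10Mbps each, sharing a 1Gbps uplink) are hypothetical, chosen only to show how the same link delivers the advertised speed to a few active users and a fraction of it when everyone is busy.

```python
def oversubscription_ratio(subscribers: int, advertised_mbps: float, uplink_mbps: float) -> float:
    """How many times the bandwidth sold exceeds what the shared link can carry."""
    return subscribers * advertised_mbps / uplink_mbps


def per_user_mbps(active_users: int, advertised_mbps: float, uplink_mbps: float) -> float:
    """Throughput each active user gets if the link is split evenly, capped at the plan."""
    return min(advertised_mbps, uplink_mbps / active_users)


# Hypothetical neighborhood: 500 subscribers on 10 Mbps plans behind a 1,000 Mbps uplink.
print(oversubscription_ratio(500, 10.0, 1000.0))  # 5.0
print(per_user_mbps(50, 10.0, 1000.0))            # 10.0: few users active, full advertised rate
print(per_user_mbps(500, 10.0, 1000.0))           # 2.0: everyone downloading at once
```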

A cafe or hotel providing wireless access to guests faces a similar problem of apportioning its outbound bandwidth, and here there is even less expectation of guaranteed service. Internet access is an incidental amenity, not unlike a swimming pool; unlike an ISP selling broadband Internet, it is not the main line of business for the establishment. There is something to be said for fair allocation of scarce bandwidth and for preventing one person from consuming a disproportionate share.

But such limits are by definition content-agnostic: a 5MB file occupies the exact same number of bits whether it is downloaded off a P2P network or streamed from iTunes. There is no justification for singling out a specific protocol, much less one specific application such as Kazaa, for exclusion.
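Max-min fairness (“water-filling”) is one standard way to share a link that never looks at what the traffic is. It is offered here as an illustration of a content-agnostic policy, not as what any hotel actually deploys; the guest names and demands are hypothetical. Light users get everything they ask for, and the heavy users split the remainder evenly, so the P2P download and the iTunes download end up treated identically.

```python
from typing import Dict


def max_min_fair(capacity: float, demands: Dict[str, float]) -> Dict[str, float]:
    """Water-filling (max-min fair) split of a link among users.

    Users asking for less than an equal share get exactly what they ask for; the
    leftover is split evenly among the rest. Only demand sizes are consulted,
    never the protocol or application generating the traffic.
    """
    allocation: Dict[str, float] = {}
    remaining = dict(demands)
    spare = capacity
    while remaining:
        equal_share = spare / len(remaining)
        satisfied = {user: d for user, d in remaining.items() if d <= equal_share}
        if not satisfied:
            # Everyone left wants more than an equal share: cap them all at it.
            for user in remaining:
                allocation[user] = equal_share
            break
        for user, d in satisfied.items():
            allocation[user] = d
            spare -= d
            del remaining[user]
    return allocation


# Hypothetical 10 Mbps hotel uplink shared by four guests (demands in Mbps).
print(max_min_fair(10.0, {"email": 1.0, "hangouts": 3.0, "p2p": 20.0, "itunes": 20.0}))
```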

Contributory infringement

Avoiding legal headaches related to copyright infringement may be another motivation to block file-sharing. This blogger is not a lawyer and will refrain from commenting on the creative theories of liability required to hold a hotel responsible for what guests download on the network.

Relics

There is another possibility, of course: this “feature” of the hotel network is an ancient relic of a different era. In that less-enlightened time, organizations and businesses were cowed by RIAA/MPAA lawsuits into policing the activity of their own users, paying for ineffective and blunt network-management technologies to demonstrate their commitment to the higher cause of copyright enforcement. Times may have changed, yet those “features” built into the network keep cropping up in unexpected ways.

Scaling Bitcoin vs keeping-up-with-Visa

Continuing on the theme of the previous post on Bitcoin scaling challenges, there is an implicit assumption driving the scaling controversy that requires a closer look. It is an article of faith that Bitcoin must scale to process orders of magnitude more transactions, and do so quickly. The benchmark for evaluating increased block-size proposals (or features that effectively result in the same outcome by shuffling bits around, such as segregated witness) remains one-dimensional: the throughput of the network measured in transactions processed per second, or TPS.

Given current limitations, that metric peaks at a theoretical maximum of 7 TPS, although the jury is out on whether that upper bound can be attained in practice. This is often contrasted against the corresponding numbers for a standard payment network such as Visa. Statements published by Visa, attributed to a decidedly ancient 2010 IBM test, put those figures at an average of 150M transactions per day with a maximum peak rate of 24,000 TPS.
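The gap can be made concrete with back-of-the-envelope arithmetic. The sketch assumes the 1MB block-size limit in force at the time and a 250-byte average transaction; the transaction size is an assumption (real averages vary), and it is that choice that produces the oft-quoted figure of roughly 7 TPS.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def bitcoin_max_tps(block_bytes: int = 1_000_000, avg_tx_bytes: int = 250,
                    block_interval_s: int = 600) -> float:
    """Throughput ceiling: transactions per block divided by seconds per block."""
    return block_bytes / avg_tx_bytes / block_interval_s


btc = bitcoin_max_tps()                   # roughly 6.7, the "7 TPS" figure
visa_avg = 150_000_000 / SECONDS_PER_DAY  # average rate implied by 150M per day
visa_peak = 24_000

print(f"Bitcoin ceiling: {btc:.1f} TPS")
print(f"Visa average:    {visa_avg:,.0f} TPS, {visa_avg / btc:,.0f}x Bitcoin")
print(f"Visa peak:       {visa_peak:,} TPS, {visa_peak / btc:,.0f}x Bitcoin")
```

Even Visa's average load, not its peak, is a couple of orders of magnitude beyond the ceiling, which is what fuels the keep-up narrative examined below.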

Before questioning this “keeping-up-with-the-card-networks” mantra, it’s worth noting the paucity of attention paid to other statistics such as confirmation time. Imagine trying to buy coffee with Bitcoin. To be on the safe side, the merchant must wait for the transaction to appear on the blockchain: not just in the latest, freshly minted block, but a few blocks deeper, to rule out the possibility of it being reversed by a rare reorganization event. The average 10-minute interval for mining blocks makes for an unpleasant retail experience. There are nascent proposals such as the Lightning Network, layered on top of the core Bitcoin protocol (as opposed to being modifications to the protocol), designed to facilitate near real-time settlement. They are far from seeing any traction. Meanwhile the 10-minute interval remains an inviolable constraint in the protocol.
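How deep is “a few blocks deeper”? Section 11 of Nakamoto's Bitcoin paper gives the probability that an attacker controlling a share `q` of the hash power ever rewrites a transaction buried `z` blocks deep. The sketch below implements that formula and pairs each depth with its average wait at one block per 10 minutes. The 10% attacker share is just an example value.

```python
import math

BLOCK_INTERVAL_MIN = 10  # protocol target: one block every ~10 minutes on average


def attacker_success_probability(q: float, z: int) -> float:
    """Chance an attacker with hash-power share q ever catches up from z blocks behind."""
    p = 1.0 - q
    if q >= p:
        return 1.0  # an attacker with a majority eventually catches up
    lam = z * q / p  # expected attacker progress while honest miners find z blocks
    never_catches_up = 0.0
    for k in range(z + 1):
        poisson = math.exp(-lam) * lam ** k / math.factorial(k)
        never_catches_up += poisson * (1.0 - (q / p) ** (z - k))
    return 1.0 - never_catches_up


for z in (0, 1, 2, 6):
    print(f"z={z}: ~{z * BLOCK_INTERVAL_MIN} min wait, "
          f"P(reversal) = {attacker_success_probability(0.1, z):.7f}")
```

Driving the reversal risk down by orders of magnitude costs tens of minutes of waiting on average, which is the retail problem in a nutshell.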

Granted, this may not matter when buying coffee. Merchants can manage the risk of operating with zero confirmations as long as the potential loss from the sale of that item is low enough and the proportion of honest customers in the population high enough. It is similar to the risk-management decision Starbucks already makes when it swipes a card at checkout without demanding a signature from the cardholder. That is a good idea. It keeps the line moving and customers caffeinated. Once in a while a stolen credit card is used or a customer disputes the charge, and Starbucks is left holding the bag because it has no proof. A signed receipt could provide insurance by shifting liability to the issuer, but those extra 20 seconds would translate into lost sales on a busy morning. Now multiply the dollar amounts involved by 100x, and waiting for the network to confirm the transaction starts to make more sense, just as merchants get more picky about receipts and checking ID for big-ticket purchases. (Not to mention that the replace-by-fee proposal will make it much easier to double-spend unconfirmed transactions.)
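The merchant's reasoning can be reduced to an expected-loss rule. This is a hypothetical model: the 1-in-1,000 dishonest share and the 5-cent tolerance are made-up numbers, and it optimistically assumes fraud does not rise with ticket size (in practice high-value items attract more of it). Even so, scaling the price by 100x flips the decision.

```python
def expected_fraud_loss(price: float, dishonest_share: float) -> float:
    """Expected loss per unconfirmed sale if every dishonest customer double-spends."""
    return price * dishonest_share


def accept_zero_conf(price: float, dishonest_share: float, max_expected_loss: float) -> bool:
    """Accept without waiting for confirmations when the expected loss is tolerable."""
    return expected_fraud_loss(price, dishonest_share) <= max_expected_loss


# Hypothetical policy: 1 in 1,000 customers is dishonest, and the merchant tolerates
# an expected loss of up to 5 cents per sale in exchange for a moving line.
for price in (4.0, 400.0):  # a coffee, and the same purchase scaled by 100x
    print(price, accept_zero_conf(price, dishonest_share=0.001, max_expected_loss=0.05))
```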

Keeping up with the Visas

The narrative animating Bitcoin XT envisions a world with ubiquitous Bitcoin usage. Not only do consumers worldwide purchase their coffee at the corner store in cryptocurrency instead of swiping/inserting/tapping their credit-card, but they get to pay the rent, stream music online, access journalism behind a paywall, settle that debt to a friend who paid for dinner and wire money overseas to the cousin traveling in Timbuktu. In this expansive vision, it’s not only Visa and PayPal who are on the way to extinction. Check clearing and its automated cousin ACH, Federal Reserve Wire Network, the international SWIFT system, peer-to-peer payments such as Venmo, proprietary alternatives such as PayPal, Western Union etc. all face competition from Bitcoin eating into their market share. Naturally that one-network-to-rule-them-all must boast the combined capacity of all of the “deprecated” systems it replaces.

In case such a formidable list of incumbents being obsoleted at once has not already cast some doubts on the feasibility of this vision, let’s dive deeper into one scenario: replacing card networks. We can ask what incentives would drive mass-adoption of Bitcoin as an alternative to using payment cards. This can quickly become a complicated comparison with multiple categories: PIN-debit vs credit vs charge-card vs prepaid, retail vs online payments etc. Here we pick one representative case that today accounts for the bulk of such transactions by volume.

Case study: how (not) to disrupt in-store payments

There are multiple participants involved in a transaction when a consumer buys a TV from a big-box retail store using their credit card: cardholder, merchant, payment network and issuing bank, not to mention all of the invisible middlemen behind the scenes facilitating that transaction, such as the payment processor or the vendors who manufactured the point-of-sale terminal. In order to switch over to using Bitcoin, one or more parties must see a compelling benefit.

It’s certainly not the card network or the issuer. They would be out of the picture, disintermediated by an open system where anyone can pay anyone else without going through gate-keepers for access to the network. Despite initial forays into exploring crypto-currency, they have little to gain from watching their consumer credit business evaporate overnight. Issuing banks are already earning money hand over fist from interchange-fees and interest on balances carried by consumers.

Could it be the consumer? The privacy afforded by Bitcoin is not much help when walking into a store dotted with surveillance cameras, or for that matter when using a smart-phone that requires an Internet connection to broadcast the Bitcoin transaction. It’s even worse when considering the economics. One can only spend existing funds with Bitcoin; a credit card allows deferring payments, smoothing out fluctuations in cash flow by carrying a balance. Worse, consumers receive an effective discount (albeit a small one) from credit-card transactions. In the face of increasing competition, issuers are increasingly splitting their profits with customers in the form of airline miles, fungible points exchangeable for gifts, or straight cash-back rewards. For a card averaging 1% rewards, paying for a $1000 TV in BTC amounts to leaving $10 on the table.

Merchant perspective

Merchants are the prime candidates to agitate for Bitcoin. Frustrated by high card-processing fees and perennially on the lookout for cheaper alternatives (so much so that several large US retailers banded together to create their own), they are seemingly the ideal cheerleaders for cryptocurrency. The economic pressure is particularly strong for segments with low profit margins: grocery chains and large department stores often report net profits in the single-digit percents. Forking over 2% of the gross sale for the privilege of accepting Visa/MC/Amex starts to sting. Irreversibility of Bitcoin transactions is another selling point: no more dreaded charge-backs caused by fraudulent transactions. (That’s a bug from the consumer perspective: it is a safety feature to be able to call someone and complain when a card is stolen, and not be on the hook for unauthorized charges, or at least maintain that illusion of zero liability. But merchants greatly appreciate not having to worry about a sale getting reversed.)
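Putting the merchant and consumer numbers side by side shows why their incentives point in opposite directions. The 2% processing fee and 1% reward rate are the figures used in this post; the 3% net margin is a hypothetical value within the “single-digit percents” range, and the function is purely illustrative.

```python
def card_economics(price: float, fee_rate: float = 0.02, reward_rate: float = 0.01,
                   net_margin: float = 0.03) -> dict:
    """Who gains and who loses when a card sale moves to a fee-free payment rail.

    fee_rate and reward_rate follow this post; net_margin is a hypothetical value.
    """
    fee = price * fee_rate          # what the merchant stops paying
    reward = price * reward_rate    # what the consumer stops receiving
    profit = price * net_margin     # merchant's net profit on the sale
    return {
        "merchant_saves": fee,
        "consumer_forgoes": reward,
        "fee_as_share_of_profit": fee / profit,
    }


print(card_economics(1000.0))
```

On the $1000 TV, the merchant would save $20, a fee worth two-thirds of its net profit under this assumed margin, while the consumer gives up $10 in rewards: a strong motive on one side of the counter and a mild deterrent on the other.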

Yet in-store payments are exactly the scenario where adoption is challenging, because of the long settlement time coupled with zero-confirmation acceptance being unsustainable in the long term. Let’s assume merchants will eat the cost of upgrading their point-of-sale equipment to accept Bitcoin. Realistically, they were far from enthusiastic about chip-and-PIN or NFC adoption until card networks threatened a liability shift pitting them against issuers; in this case, however, the lure of drastically lower processing fees could alter the equation. It would still require widespread adoption of Lightning or an equivalent solution for instant settlement before that investment pays off. This would be much less of an issue for online purchases: given the additional lag involved in digging the item out of a warehouse, packaging and shipping it, an extra half-hour wait is noise. (But even that may change with the rise of same-day delivery and other services compressing the timeline from placing the order to a package showing up.) To wit, success stories of using Bitcoin at Sears/CVS/Home Depot turn out to be enabled by an intermediary exchanging Bitcoins for gift cards, which are then processed through existing channels at the point of sale. Those merchants are not directly accepting Bitcoin; they do not have a Bitcoin address or operate a node on the network.**

Realistically, even highly motivated merchants cannot unilaterally force consumers to adopt a new payment system. They can certainly express strong preferences towards minimizing transaction processing costs. They can nudge consumers in that direction, for example by setting minimum amounts for using cards. They can try passing on card fees to consumers or, equivalently, offering discounts for cash, although that turns out to be so fraught with contractual issues that few venture there. They can lobby for structural changes behind the scenes: the Durbin amendment made it cheaper to process debit transactions. Some have even gotten into the open-loop card business by partnering with issuers, following the if-you-can't-beat-them-join-them mantra. (That Costco-branded American Express card, which recently blew up in a very public spat between the two companies, let Costco share in revenue generated from cardholders while also reducing its own processing costs.) But at the end of the day, merchants must adapt to the existing payments ecosystem. As long as they conclude that card acceptance increases revenue by driving sales, they have an incentive to offer that option. For Bitcoin to become a viable alternative, there must exist a market segment of customers ready to spend Bitcoin and unable or unwilling to transact any other way. That critical mass does not exist at the moment. It is also far from clear that a demographic can emerge that embraces Bitcoin while shunning mobile wallets such as Apple Pay or Android Pay, forcing merchants' hand in having to cater to that preference.

Capitalizing on the unusual fee structure

The preceding issues reflect less on inherent flaws in Bitcoin than on the current framing of the scaling debate. The push for rapidly expanding block size is predicated on the assumption that limited network capacity is holding Bitcoin back from achieving its manifest destiny of disrupting consumer payments. Build it and they will come, the argument goes. But the real question is not whether Bitcoin can compete with credit cards in raw capacity, but whether it can compete on economic terms. For that matter, does it need to compete at all in order to be considered successful in its own right? Observe that wire transfers move billions of dollars every day, but no one has penned an article announcing the demise of FedWire due to its inability to pay for coffee down the street. Payment systems have strengths and weaknesses. Even in its infancy, there are many applications where Bitcoin already outshines the alternatives.

One theme uniting those scenarios is fast, low-cost movement of funds. Writing a check does not incur any fees for either sender or recipient, but it is slow and requires physical shipment of pieces of paper. Wire transfers are real-time but come with hefty fees for individuals. Peer-to-peer systems such as Venmo and Google Wallet can move funds instantly at no cost within the system, but they suffer from the problem of getting money into and out of the system: hefty credit-card fees are passed on to the consumer.

A crucial special case of low-friction P2P transfers is cross-border remittances. If wiring money from one bank to another in the US sounds like an expensive proposition, imagine being a migrant worker trying to send money overseas to family. The World Bank tracks global remittances and recently estimated an average fee of ~7.5% globally. That average hides the true range of figures: trying to send $200 from Germany to China would incur a whopping 17% in fees. It's not even a function of geography. France and Algeria are close, yet the average fee is 15%. Bitcoin vastly undercuts these expensive channels, translating into billions of dollars saved.

A core developer recently raised the alarm about Bitcoin development increasingly focusing on settlement scenarios instead of direct payments. There is a good reason for that bias: it plays to existing strengths of the design. Bitcoin is one of the most efficient ways of moving large sums around. Fees are charged per byte of the transaction. That is a very unusual metric because it is independent of the value moved. Under this regime a transaction moving a million dollars may pay less in fees than one moving a few dollars. By contrast, most systems charge fees ad valorem. For credit cards those rates can hover around 2-3%. Even the Durbin amendment capping fees on PIN-debit set those limits as a fixed cost plus 0.05% of the amount, retaining the proportionality feature. Bitcoin is unique in decoupling fees from the utility of the transaction, as measured by the amount changing hands. It's as if a bank decided to charge for check-cashing based not on the face value of the check, but on the thickness of the paper used.
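To make the contrast between the two fee models concrete, here is a small comparison sketch. The 250-byte transaction size, the 50 satoshi/byte rate, the 2.5% card rate and the 21-cent fixed debit component are illustrative assumptions, not current market figures.

```python
# Compare a per-byte fee (Bitcoin-style) with ad valorem fees (card-style).
# All rates below are illustrative assumptions, not current market figures.

def per_byte_fee_sat(tx_size_bytes: int, sat_per_byte: int) -> int:
    """Bitcoin-style fee: depends only on the serialized size of the transaction."""
    return tx_size_bytes * sat_per_byte

def ad_valorem_fee(amount: float, rate: float, fixed: float = 0.0) -> float:
    """Card-style fee: an optional fixed component plus a percentage of the amount."""
    return fixed + amount * rate

# Two transactions of identical (assumed) size moving very different sums.
for amount_usd in (5, 1_000_000):
    btc_fee = per_byte_fee_sat(250, sat_per_byte=50)              # identical for both
    card_fee = ad_valorem_fee(amount_usd, rate=0.025)              # assumed 2.5% rate
    debit_fee = ad_valorem_fee(amount_usd, rate=0.0005, fixed=0.21)  # Durbin-style cap
    print(f"${amount_usd:>9,}: bitcoin {btc_fee} sat, "
          f"card ${card_fee:,.2f}, capped debit ${debit_fee:,.2f}")
```

The Bitcoin column stays constant while both ad valorem columns scale with the amount, which is the "thickness of the paper" point in code form.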

All of these applications still require Bitcoin to continue scaling up its throughput. The key difference is that the throughput targets are much lower than those required for playing catch-up to Visa/MasterCard/Amex. Nor do these changes need to happen on an aggressive schedule, playing a game of chicken with hard forks. These markets are emerging gradually, with adoption driven organically because Bitcoin is a compelling improvement over existing alternatives. No one will give up on Bitcoin because their transfer did not settle faster than Western Union, or because it cost a few more satoshis in network fees.

** Ironically, the intermediary eGiftCards itself uses yet another intermediary (Coinbase) to process the Bitcoin transaction, never directly taking on the risk of double-spending or of BTC/USD currency volatility. So much for direct peer-to-peer payments.

Observations on Bitcoin’s scaling challenge

Much ink has been spilled in recent months about Bitcoin's scaling problems, specifically around the impact of increasing block size to allow more transactions to be processed in each batch. Quick recap: Bitcoin processes transactions by including them in a "block" which gets tacked on to a public ledger, called the blockchain. Miners compete to find blocks by solving a computational puzzle, incentivized by a supply of new Bitcoins minted in that block as their reward. Net "throughput" of Bitcoin as a system for moving funds is determined by the frequency of block mining and the number of transactions, or "TX" for short, appearing in each block. The difficulty level is periodically adjusted such that blocks are found on average every 10 minutes. (That is a statistical average, not an iron-clad rule. A lucky miner could find a block after a few seconds. Alternatively, all miners could get collectively unlucky and require a lot more time.) In other words, the protocol adapts to find an equilibrium: as hardware speeds improve or miners ramp up their investments, finding a block becomes more difficult, bringing the average time back to 10 minutes. Similarly, if miners reduce their activity because of increased costs, difficulty adjusts downward and blocks become easier to find.
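The feedback loop described above can be sketched as a simplified model. It assumes a retarget every 2016 blocks toward a two-week timespan, with each step clamped to a factor of 4. Real nodes adjust a 256-bit target (a lower target means harder), whereas this sketch tracks an abstract difficulty number.

```python
# Simplified model of Bitcoin difficulty retargeting (not consensus code).
# Assumptions: adjust every 2016 blocks toward a 10-minute average, with the
# measured timespan clamped so that any single step is at most 4x.

RETARGET_INTERVAL = 2016                                # blocks between adjustments
TARGET_SPACING = 10 * 60                                # desired seconds per block
TARGET_TIMESPAN = RETARGET_INTERVAL * TARGET_SPACING    # two weeks, in seconds

def next_difficulty(current: float, actual_timespan: float) -> float:
    """Scale difficulty so the next 2016 blocks should take about two weeks."""
    clamped = min(max(actual_timespan, TARGET_TIMESPAN / 4), TARGET_TIMESPAN * 4)
    # Blocks arrived too fast -> timespan short -> difficulty goes up, and vice versa.
    return current * TARGET_TIMESPAN / clamped

# Hash power doubles: the last 2016 blocks took one week instead of two.
print(next_difficulty(1.0, TARGET_TIMESPAN / 2))
```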

“640K ought to be enough for everybody”

(In fairness, Bill Gates never said that about MS-DOS.)

Curiously, block size has been fixed for some time at 1 megabyte. There are no provisions in the protocol for increasing it dynamically. That stands in sharp contrast to many other attributes that either change on a fixed schedule (the amount of Bitcoin rewarded for mining a block decreases over time) or adjust automatically in response to current network conditions, such as block difficulty. There is no provision for growing blocks as the limit is approached, which is exactly the current situation.

What is the effect of that limitation in terms of funds movement? The good news is that space restrictions have no bearing on the amount of funds moved. A transaction moving a billion dollars need not consume any more space than one moving a few cents. But it does limit the number of independent transactions that can be cleared in each batch. Alice can still send Bob a million dollars, but if hundreds of people like her wanted to send a few dollars to hundreds of people like Bob, they would be competing against each other for inclusion in upcoming blocks. Theoretical calculations suggest a throughput of roughly 7 TX per second, although later arguments cast doubt on the feasibility of achieving even that. The notion of "throughput" is further complicated by the fact that a Bitcoin transaction does not move funds from just one point to another. Each TX can have multiple sources and destinations, moving the combined sum of funds in those sources to the destinations in any proportion. That is a double-edged sword. Paying multiple unrelated people in a single TX is more efficient than creating a separate TX for each destination. On the downside, scrounging for multiple inputs from past transactions to fund a payment introduces inefficiency. Still, adjusting for these factors does not appreciably alter the capacity estimate.
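The "roughly 7 TX per second" figure comes from simple arithmetic: block size divided by average transaction size, divided by the block interval. The sketch below assumes an average transaction of about 250 bytes. Real sizes vary with the number of inputs and outputs, which is one reason the estimate is contested.

```python
# Back-of-the-envelope Bitcoin throughput.
# Assumption: an "average" transaction of roughly 250 bytes.

def max_tps(block_size_bytes: int, avg_tx_bytes: int, block_interval_s: int = 600) -> float:
    """Upper bound on transactions per second if every block is completely full."""
    tx_per_block = block_size_bytes // avg_tx_bytes
    return tx_per_block / block_interval_s

for size_mb in (1, 2, 8):
    print(f"{size_mb} MB blocks: {max_tps(size_mb * 1_000_000, 250):.1f} TPS")
```

Throughput scales linearly with the block-size limit: 1 MB gives about 6.7 TPS, and each doubling of the limit doubles that ceiling.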

The decentralization argument

Historically, the 1MB limit was introduced as a defense against denial-of-service attacks. It guards against a malicious node flooding the network with very large blocks that other nodes cannot keep up with. Decentralized trust relies on each node in the network independently validating all incoming blocks and deciding for itself whether each block has been properly mined. "Independently" is the operative word. If nodes were taking some other node's word that a block is legitimate, that would not add any trust to the system. Instead it would effectively concentrate power, granting that other node extra influence over how others view the state of the Bitcoin ledger. Now, if some miner on a fast network connection creates giant blocks, other miners on slow connections may take a very long time to receive and validate them. As a result they fall behind and find themselves unable to mine new blocks. They are effectively working from an outdated version of the ledger that is missing the "latest" block. All of their effort to mine the next block on top of this obsolete state will be wasted.

Arguments against increasing block size start from the premise that larger blocks will render many nodes on the network incapable of keeping up, effectively increasing centralization. This holds true for miners, who end up with "orphaned blocks" when they mine a block on an outdated version of the ledger, having missed someone else's discovery of the latest block. To some extent it also holds for ordinary "full nodes" on the network. It's the independent verification performed by all those nodes that keeps Bitcoin honest in the absence of a centralized authority. When fewer and fewer nodes are paying attention to which blocks are mined, the argument goes, that distributed trust decreases.

Minimum system requirements: undefined

This logic may be sound, but the problem is that Bitcoin Core, the open-source software powering full nodes, has never come with any type of MSR, or minimum system requirements, describing what it takes to operate a node. It's very common for large software products to define a baseline of hardware required for successful installation and use. This holds true for commercial software such as Windows. In the old days, when shrink-wrap software actually came in shrink-wrapped packages, those requirements were prominently displayed on the packaging to alert potential buyers. But it also holds true for open-source distributions such as Ubuntu, and for specialized applications like Adobe Photoshop.

That brings us to the first ambiguity plaguing this debate: no clear requirements have been spelled out around what it takes to "operate a full Bitcoin node," or for that matter to be a miner, which presumably has even more stringent requirements. No reasonable person would expect to run ray-tracing on their 2010-vintage smartphone, so why would they be entitled to run a full Bitcoin node on a device with limited capabilities? Other critics have pointed this out:

“Without an MVP-specification and node characterization, there is nothing to stop us from torquing the protocol to support wristwatches on the Sahara.”

Perhaps in a nod to privacy, bitcoind does not have any remote instrumentation to collect statistics from nodes and upload it to a centralized place for aggregation. So there is little information on how much CPU power the “average” node can harness, or how many GBs of RAM or disk-space it has, much less any operational data on how close bitcoind is to exhausting those limits today. Nor has there been a serious attempt to quantify these in realistic settings:

There would also be a related BIP describing the basic requirements for a full node in terms of RAM, CPU processing, storage and network upload bandwidth, based on experiments — not simulations — […] This would help determine quantitatively how many nodes could propagate information rapidly enough to maintain Bitcoin’s decentralized global consensus at a given block size.

Known unknowns

In the absence of MSR criteria or telemetry data, anecdotal evidence and intuition rule the day when hypothesizing which resource may become a bottleneck when block size is increased. This is akin to trying to optimize code without a profiler, going by gut instinct on which sections might be the hot-spots that merit attention. But we can at least speculate on how resource requirements scale. That applies both to the "average" scenario under ordinary load and to "worst-case" scenarios induced by deliberately malicious behavior trying to exhaust the capacity of the Bitcoin network. This turns out to be instructive beyond the current skirmish over whether to go with 2/4/8 MB blocks. It also reveals some highly suboptimal design choices made by Satoshi that will remain problematic going forward.

Processing

Each full-node verifies incoming blocks, which involves checking off several criteria including:

Miner solved the computational puzzle correctly
Each transaction appearing in this block is syntactically valid- in other words, it conforms to the Bitcoin rules around how the TX structure is formatted
Transactions are authorized by the entity who owns those funds- typically this involves validating one or more cryptographic signatures
No transaction is trying to double-spend funds that have already been spent

The block-size debate brought renewed attention to #3, and the core team has done significant work on improving ECDSA performance over secp256k1. (An earlier post provides some reasons why ECDSA is not ideal from a cost perspective in Bitcoin.) Other costs such as hashing were considered so negligible that the scaling section of the wiki could boldly assert:

“…as long as Bitcoin nodes are allowed to max out at least 4 cores of the machines they run on, we will not run out of CPU capacity for signature checking unless Bitcoin is handling 100 times as much traffic as PayPal.”

It turns out there is a design quirk, or flaw, in Bitcoin signatures that this rosy picture ignores. The entire transaction must be hashed and verified independently for each of its inputs. A transaction with 100 inputs will be hashed 100 times and subjected to ECDSA signature verification the same number of times. Each time a few bytes differ, which precludes reusing previous results, although the initial prefixes are shared. In algorithmic terms, the cost of verifying a transaction has a quadratic O(N²) dependency on the input count. Sure enough, the pathological TX created during the flooding of the network last summer had exactly this pattern: one giant TX boasting over 5500 inputs and just 1 output. Such quadratic behavior is inherently not scalable: doubling the maximum block size leads to a 4x increase in the worst-case scenario.
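The quadratic growth can be illustrated with a rough cost model of legacy (pre-segwit) signature hashing. The byte counts are approximations labeled as assumptions. In the per-input copy that gets hashed, each input contributes a 36-byte outpoint, a 1-byte empty script length and a 4-byte sequence. Each output is assumed to be about 34 bytes, with about 10 bytes of fixed overhead. The per-input scriptCode and the sighash type are ignored.

```python
# Rough model of legacy signature-hashing cost (approximate byte counts).
# For each input, a modified copy of the whole transaction is serialized and
# hashed. That copy still grows linearly with the number of inputs.

HASHED_BYTES_PER_INPUT = 41   # assumed: 36-byte outpoint + 1-byte script length + 4-byte sequence
OUTPUT_BYTES = 34             # assumed size of a typical output
OVERHEAD_BYTES = 10           # assumed version, counts, locktime

def hashed_copy_size(n_inputs: int, n_outputs: int) -> int:
    return OVERHEAD_BYTES + n_inputs * HASHED_BYTES_PER_INPUT + n_outputs * OUTPUT_BYTES

def legacy_bytes_hashed(n_inputs: int, n_outputs: int = 1) -> int:
    """Total bytes hashed to verify every input's signature: O(N^2) in inputs."""
    return n_inputs * hashed_copy_size(n_inputs, n_outputs)

for n in (100, 1000, 2000, 5500):
    print(f"{n:>5} inputs -> {legacy_bytes_hashed(n):,} bytes hashed")
```

Under these assumptions, the 5500-input, 1-output transaction pattern works out to roughly 1.2 GB of hashing for a single transaction. Doubling the input count roughly quadruples the work.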

There are different ways to address this problem. Placing a hard limit on the number of inputs is one heavy-handed solution. It's unlikely to fly because it would require a hard fork. Segregated witness offers some hope by not requiring a different serialization of the transaction for each input. But one can still force the pathological behavior, as long as Bitcoin allows a signature mode where only the current input (and all outputs) are signed. That mode was intended for constructing a TX in a distributed fashion, where the destination of funds is fixed but the sources are not. Multiple people can chip in to add some of their own funds into the same single transaction, along the lines of a fundraising drive for charity. An alternative is to discourage such activity with economic incentives. Currently, transaction fees are based on simplistic measures such as size in bytes. Making the originator of a TX bear the actual cost of verifying it would introduce a market-based deterrent. (That said, defining a better metric is tricky. From another perspective, consuming a large number of inputs is desirable, because it consumes unspent outputs which otherwise have to be kept around in memory/disk.)

One final note: most transactions appearing in a block have already been verified. That's because Bitcoin uses a peer-to-peer communication system to distribute all transactions around the network. Long before a block containing the TX appears, that TX will have been broadcast, verified and placed into the mempool. (Under the covers, the implementation caches the result of signature validation to avoid doing it again.) In other words, CPU load is not a sudden spike occurring when blocks materialize out of thin air; it is spread out over time as TX arrive. As long as a block contains few "surprises" in the form of TX never encountered before, the bulk of the signature verification cost has already been paid. This is a useful property for scaling: it removes the pressure to operate in real time. The relevant metric isn't the cost of verifying a block from scratch, but how well the node keeps up with the sustained volume of TX broadcast over time. It might also improve parallelization. If new TX arrive evenly from different peers and are handled by different threads, the CPU-intensive work spreads across multiple cores.
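A toy model shows why block validation becomes cheap once its transactions were seen during relay. This is a sketch, not Bitcoin Core's design. The `verify` callback is a hypothetical stand-in for expensive ECDSA checks. Keying the cache on a bare txid is a simplification: Bitcoin Core's real caches are keyed on signature and script data.

```python
# Toy signature-result cache: expensive checks are paid once, at relay time.
# `verify` is a hypothetical stand-in for ECDSA verification.

from typing import Callable, Dict, Iterable

class SigCache:
    def __init__(self, verify: Callable[[str], bool]) -> None:
        self._verify = verify
        self._results: Dict[str, bool] = {}
        self.expensive_checks = 0          # counts calls to the costly verifier

    def check(self, txid: str) -> bool:
        if txid not in self._results:
            self.expensive_checks += 1
            self._results[txid] = self._verify(txid)
        return self._results[txid]

def accept_to_mempool(cache: SigCache, txid: str) -> bool:
    return cache.check(txid)               # cost is spread out as TX are broadcast

def validate_block(cache: SigCache, txids: Iterable[str]) -> bool:
    return all(cache.check(t) for t in txids)  # only "surprise" TX cost anything

cache = SigCache(lambda txid: True)
for i in range(1000):
    accept_to_mempool(cache, f"tx{i}")
validate_block(cache, [f"tx{i}" for i in range(1000)] + ["surprise-1", "surprise-2"])
print(cache.expensive_checks)              # only the two unseen TX were newly verified
```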

Storage

Nodes also have to store the blockchain and look up information about past transactions when trying to verify a new one. (Recall that each input to a transaction is a reference to an output from some previous TX.) As of this writing, the blockchain is around 55GB. Strictly speaking, only unspent outputs need to be retained; those already consumed by a later TX cannot appear again. That allows for some pruning. But individual nodes have little control over how much churn there is in the system. If most users decide to hoard Bitcoins and not use them for any payments, most outputs remain active. In practice one worries not just about raw bytes as measured by the Bitcoin protocol, but also about putting that data into a structured database for easy access. That database adds overhead beyond the raw size of the blockchain.
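The pruning argument rests on tracking only unspent outputs. Below is a minimal sketch of such a UTXO set, assuming a simplified transaction shape of (txid, list of spent outpoints, list of output amounts). The class and field names are illustrative, not Bitcoin Core's. The same lookup doubles as the double-spend check from the validation list.

```python
# Minimal UTXO-set sketch: spent outputs are deleted (pruned), unspent ones kept.
# Transaction shape and names are simplified assumptions for illustration.

from typing import Dict, List, Tuple

Outpoint = Tuple[str, int]                  # (txid, output index)

class UtxoSet:
    def __init__(self) -> None:
        self.unspent: Dict[Outpoint, int] = {}   # outpoint -> amount in satoshis

    def apply_tx(self, txid: str, inputs: List[Outpoint], outputs: List[int],
                 coinbase: bool = False) -> None:
        if len(set(inputs)) != len(inputs):
            raise ValueError("same output spent twice within one transaction")
        for op in inputs:
            # Referencing a missing or already-spent output is a double-spend.
            if op not in self.unspent:
                raise ValueError(f"missing or already spent: {op}")
        if not coinbase and sum(outputs) > sum(self.unspent[op] for op in inputs):
            raise ValueError("outputs exceed inputs")
        for op in inputs:
            del self.unspent[op]           # consumed outputs can never appear again
        for index, amount in enumerate(outputs):
            self.unspent[(txid, index)] = amount

utxos = UtxoSet()
utxos.apply_tx("cb", [], [50], coinbase=True)
utxos.apply_tx("t1", [("cb", 0)], [30, 19])   # 1 satoshi left over as fee
print(sorted(utxos.unspent))
```

If users hoard rather than spend, `apply_tx` keeps adding entries and rarely deletes any, which is why pruning alone cannot bound storage.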

Regardless, larger blocks only have a very slow effect on storage requirements. It's already a given that space devoted to the ledger must increase over time as new blocks are mined. Doubling block size only leads to a faster rate of increase over time, not a sudden doubling of existing usage. It could mean some users will have to add disk capacity sooner than they had planned. But disk space had to be added sooner or later. Of all the factors potentially affected by a block-size increase, this is the least likely to be the bottleneck that causes an otherwise viable full node to drop off the network.
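The "slope, not step" argument is easy to quantify with an upper bound. The sketch below assumes every block is full and blocks arrive exactly every 10 minutes, both simplifications. It ignores database overhead.

```python
# Upper bound on ledger growth from the block-size limit alone.
# Assumptions: every block is full, blocks arrive exactly every 10 minutes.

BLOCKS_PER_DAY = 24 * 60 // 10            # 144 blocks
BLOCKS_PER_YEAR = BLOCKS_PER_DAY * 365    # 52,560 blocks

def max_growth_gb_per_year(block_size_mb: float) -> float:
    return BLOCKS_PER_YEAR * block_size_mb / 1000     # decimal gigabytes

def projected_size_gb(current_gb: float, block_size_mb: float, years: float) -> float:
    # Linear in time: a larger limit steepens the slope but does not
    # multiply the data already stored.
    return current_gb + max_growth_gb_per_year(block_size_mb) * years

for mb in (1, 2, 8):
    print(f"{mb} MB blocks: +{max_growth_gb_per_year(mb):.1f} GB/year, "
          f"55 GB -> {projected_size_gb(55, mb, 1):.1f} GB after one year")
```

Even at 2 MB, the worst case adds about 105 GB per year to the existing 55GB. That is significant for phones but modest for desktop-class disks.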

Whether 55GB is already a significant burden or might become one under various proposals depends on the hardware in question. Even 100GB is peanuts for ordinary desktop/server-class hardware that typically feature multiple terabytes of storage. On the other hand it’s out of the question for embedded-devices and IoT scenarios. Likewise most smartphones and even low-end tablets with solid-state disks are probably out of the running. Does that matter? The answer goes back to the larger question of missing MSR, which in turn is a proxy for lack of clarity around target audience.

Network bandwidth & latency

At first, bandwidth does not appear all that different from storage, in that costs increase linearly. Blocks that are twice as large take twice as long to transmit, which increases the delay before a node can recognize that a new block has been successfully mined. That could add a few seconds of delay in processing. On the face of it, that does not sound too bad. Once again the scalability page paints a rosy picture, with an estimated 8MB/s of bandwidth needed to scale to Visa-like 2000 transactions per second (TPS):

This sort of bandwidth is already common for even residential connections today, and is certainly at the low end of what colocation providers would expect to provide you with.

If the prospect of going to 2000TPS from the status quo of 7TPS is no-sweat, why all this hand-wringing over a mere doubling?
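Both quantities in this discussion are simple arithmetic: the sustained bandwidth implied by a target transaction rate, and the time needed to push one block over a link. The 500-byte average transaction and the 2 Mbit/s link below are assumptions. The transfer time counts only serialization delay and ignores latency, multi-hop relaying and validation.

```python
# Bandwidth implied by a transaction rate, and serialization delay per block.
# Assumptions: ~500-byte average transaction; delay ignores latency and relay hops.

def required_mbit_per_s(tps: float, avg_tx_bytes: int = 500) -> float:
    """Sustained rate needed just to receive every transaction once."""
    return tps * avg_tx_bytes * 8 / 1_000_000

def transfer_seconds(block_bytes: int, link_mbit_per_s: float) -> float:
    """Time to transmit one block over a link of the given speed."""
    return block_bytes * 8 / (link_mbit_per_s * 1_000_000)

print(f"2000 TPS needs about {required_mbit_per_s(2000):.0f} Mbit/s")
for mb in (1, 2, 8):
    print(f"{mb} MB block over 2 Mbit/s: {transfer_seconds(mb * 1_000_000, 2):.0f} s")
```

Under the 500-byte assumption, 2000 TPS works out to 8 megabits (1 megabyte) per second. On a slow 2 Mbit/s link, a 1 MB block takes about 4 seconds just to transmit, and each doubling of block size doubles that delay.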

Miner exceptionalism

This is where miners as a group appear to get special dispensation. There is an assumption that many are stuck on relatively slow connections, which is almost paradoxical. These groups command millions of dollars in custom mining hardware and earn thousands of dollars from each block mined. Yet they are doomed to connect to the Internet with dial-up modems, unable to afford a better ISP. This strange state of affairs is sometimes justified by two excuses:

Countries where miners are located. Given heavy concentration of mining power in China, there is indeed both high-latency as measured from US/EU and relatively low bandwidth available overall for communicating with the outside world, not helped by the Great Firewall.
Economic considerations around specific locations within a country favorable to miners. Because Bitcoin mining is highly power-intensive, it is attractive to locate facilities close to sources of cheap power, such as dams. That ends up being the middle of nowhere, without the benefit of fast network connections. (Except that holds for data-centers in general. Google, Amazon and MSFT also place data-centers in the middle of nowhere but still manage to maintain fast network connections to the rest of the world.)

There is no denying that delays in receiving a block are very costly for miners. Suppose a new block is discovered but some miner operating in the desert with bad connectivity has not received it. That miner will be wasting cycles trying to mine on an outdated branch. Their objective is to reset their search as soon as possible and start mining on top of the latest block. Every extra second of delay in receiving or validating a block increases the probability of wasting time on a futile search. Worse, the miner may actually find a competing block, creating a temporary fork. That fork will be resolved with one side or the other losing all of its work when the longest chain wins out.
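A common first-order way to quantify "every extra second" is to model block discovery as a Poisson process with a 600-second mean. Under that assumption, the chance that the rest of the network finds a competing block during a delay of d seconds is 1 - exp(-d/600). This sketch ignores the miner's own share of hash power and network topology, so it is a rough estimate only.

```python
# Rough stale-block risk from propagation/validation delay.
# Assumption: block discovery is Poisson with a 600-second mean interval.

import math

MEAN_BLOCK_INTERVAL = 600.0

def stale_risk(delay_s: float) -> float:
    """Probability that a competing block is found within `delay_s` seconds."""
    return 1.0 - math.exp(-delay_s / MEAN_BLOCK_INTERVAL)

for delay in (2, 4, 8, 16, 30):
    print(f"{delay:>3} s delay -> {stale_risk(delay):.2%} chance of a competing block")
```

For small delays the risk is almost exactly proportional to the delay. A block-size increase that doubles transfer time on a slow link therefore roughly doubles the fraction of revenue that a poorly connected miner puts at risk.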

Network connections are also the least actionable of all these resources. One can take unilateral action to scale up (buy a better machine with more or faster CPUs), scale out (add servers to a data-center), add disk space or add memory. These actions do not need to be coordinated with anyone. But network pipes are part of the infrastructure of the region and are often controlled by telcos or governments, neither of which is responsive nor agile. An individual entity has few options for upgrading its connectivity. Satellite-based internet is one, but it is still high-latency and not competitive with fiber.

Miners’ dilemma

In fact, as many have pointed out, increased block size is doubly detrimental to miners' economic interests. Blocks that are filled to capacity lead to an increase in transaction fees, as users compete to get their transactions into that scarce space. Remove that scarcity and provide lots of spare room for growth, and the competitive pressure on fees goes away. That may not matter much at the moment. Transaction fees are noise compared to what miners earn from the so-called coinbase transaction, the prize in newly-minted Bitcoins for having found that block.

Those rewards diminish over time and eventually disappear entirely once all 21 million Bitcoins have been created. At that point fees become the only direct incentive to continue mining. (Participants who have significant Bitcoin assets may still want to subsidize mining at a loss, on the grounds that a competitive market in mining power provides decentralization and protects those assets.) Those worried about a collapse in mining economics argue that fees need to rise over time to compensate, and that such increases are healthy. Keeping block size artificially constrained helps that objective in the short run. But looked at another way, it may be counter-productive. Miners holding on to their Bitcoin stand to gain a lot more from an increase in the value of Bitcoin relative to fiat currencies such as the US dollar. If scaling Bitcoin helps enable new payment scenarios or brings increased demand for purchasing BTC with fiat, that will benefit miners.
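The shrinking reward follows a fixed schedule. The block subsidy starts at 50 BTC and halves every 210,000 blocks. It is computed in integer satoshis with a right shift, so fractional satoshis are truncated. The sketch below mirrors that rule. The final sum shows why the total supply comes out just under 21 million BTC rather than exactly 21 million.

```python
# Bitcoin block subsidy schedule in integer satoshis.

HALVING_INTERVAL = 210_000
INITIAL_SUBSIDY_SAT = 50 * 100_000_000    # 50 BTC

def block_subsidy(height: int) -> int:
    halvings = height // HALVING_INTERVAL
    if halvings >= 64:                    # mirrors the C++ guard against oversized shifts
        return 0
    return INITIAL_SUBSIDY_SAT >> halvings  # right shift truncates fractional satoshis

def total_supply_sat() -> int:
    total, era = 0, 0
    while (reward := INITIAL_SUBSIDY_SAT >> era) > 0:
        total += reward * HALVING_INTERVAL
        era += 1
    return total

print(block_subsidy(0), block_subsidy(210_000), block_subsidy(420_000))
print(total_supply_sat() / 100_000_000, "BTC")
```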

Stakeholder priorities

The preceding discussion may shed light on why miners are very sensitive to even small increases in latency and bandwidth. It still does not answer the question of whether that is a legitimate reason to hold off on block-size increases. Miners are one of the constituencies, certainly one of the more prominent, perhaps justifiably wielding outsized influence on the protocol. But they are not the only stakeholders. Lest we forget:

People running full nodes
Merchants accepting Bitcoin as a form of payment
Companies operating commercial services dependent on the Blockchain
Companies vying to extend Bitcoin or layer new features
Current and future end-users, relying on the network for payments and money transfer

That last group dwarfs all of the others in sheer numbers. They are not actively participating in keeping up the network by mining or even verifying blocks; they simply want to use Bitcoin or hold it as an investment vehicle, often under custody of a third-party service. These stakeholders have different incentives, requirements and expectations. Sometimes those are aligned, other times they are in conflict. Sometimes incentives may change over time or vary based on particular circumstances: miners with significant holdings of BTC on their balance sheet would happily forgo higher transaction fees today, if scaling Bitcoin could drive usage and cause their BTC assets to gain against fiat currencies.

That brings us to the governance question: which constituency should Bitcoin Core optimize for? Can feature development be held hostage by the requirements of one group or another? W3C has a clear design principle ranking its constituencies on the totem pole:

“Consider users over authors over implementors over specifiers over theoretical purity”

What the XT debacle and the more recent saga of a rage-quitting core developer suggest is that the Bitcoin project needs to articulate its own set of priorities and abide by them when making decisions.

Getting by without passwords: LUKS with smart-cards (part II)

[continued from part I]

Unlocking the encrypted partition

During the boot sequence, the system will prompt for the smart-card PIN as part of the execution of pkcs15-crypt command:

After providing the correct PIN, the card uses the private-key stored on board to decrypt the LUKS key. That key is returned to LUKS which unlocks the volume and permits the boot process to continue.

One tricky aspect handled by the key-script is managing terminal windows via openvt. The partition is unlocked during boot, and OpenSC has to prompt the user for the smart-card PIN to perform private-key operations, so that process must have access to the console. This can be difficult to guarantee when I/O is being redirected as part of the boot sequence. A cleaner approach is to invoke decryption on a different TTY and switch the display over. This also makes the PIN prompt slightly more legible, separating it from the clutter of boot-process messages and showing it in its own window as above.

Weak links and redundancy

Strictly speaking, the sequence of steps detailed earlier adds an option to unlock an encrypted disk using a smart-card. But much like enabling smart-card login via pam-pkcs11, it does not require using the card. As long as the original LUKS key-slot is still present, using a hardware token is at best a convenience option: connecting a token and typing a short PIN may be preferable to typing in a complex passphrase. But if the user had picked a "weak" passphrase for that original slot, disk encryption is still subject to offline grinding attacks. In that sense, user-chosen passwords remain the weakest link in this setup.

Realizing the security benefit requires going one step further and removing the original key-slot via cryptsetup luksRemoveKey. After that point, the only way to unlock the volume is by decrypting the random secret.

Doing so, however, introduces a new set of failure modes. What if the user forgets their PIN? Or loses their card? Even assuming perfect user behavior, what if the card itself malfunctions? There are even legitimate scenarios where the private key on the card may have to be replaced. For example, when the certificate expires, issuing a new one may call for generating a new key-pair as well. (In theory the PIV standard accommodates this by allowing the card to retain "archived" versions of the key-management key. Unlike authentication or signature keys, this is one case where simply replacing an expired credential is not enough: the old key is still required to decrypt previously encrypted ciphertext.)

There are several solutions, since LUKS permits multiple key-slots.

In the simplest case, the user could create a new key-slot with a random passphrase that is printed on hard-copy and tucked away offline.
For more high-tech solutions, one can repeat the process above using additional public-keys, creating multiple encryptions of the same random secret. (The corresponding private-keys may or may not reside on hardware tokens; that is an implementation detail.) Corresponding ciphertexts could be stored alongside the primary, to be used in case of a hardware problem with the original card.
A slight tweak involves creating additional key-slots containing random secrets, with each one encrypted to a different public-key. By contrast, option #2 involves a single key-slot, with the same secret encrypted to multiple public-keys. The latter approach is slightly less flexible for revocation. Suppose one wanted to remove one of the backups, for example because the corresponding private-key is assumed compromised. That requires deleting all copies of the ciphertext residing outside the encrypted volume, as opposed to removing the key-slot that is part of the volume.
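The revocation difference between the last two options can be modeled in a few lines. This is a hypothetical toy: nothing here calls cryptsetup or OpenSC. `wrap()` is a placeholder string, not encryption, standing in for encrypting a LUKS secret to a public-key. A ciphertext only opens the volume if the secret it wraps still occupies a key-slot.

```python
# Toy model of two backup strategies for LUKS + smart-cards (hypothetical).
# wrap() is a placeholder label, NOT encryption.

from dataclasses import dataclass, field
from typing import List, Set

def wrap(secret: str, pubkey: str) -> str:
    return f"enc[{pubkey}]({secret})"         # placeholder for public-key encryption

@dataclass
class Volume:
    key_slots: Set[str] = field(default_factory=set)               # secrets LUKS accepts
    external_ciphertexts: List[str] = field(default_factory=list)  # stored outside the volume

def single_slot(pubkeys: List[str]) -> Volume:
    """Option #2: one key-slot, same secret encrypted to every public-key."""
    return Volume({"S"}, [wrap("S", pk) for pk in pubkeys])

def slot_per_key(pubkeys: List[str]) -> Volume:
    """Option #3: one key-slot per public-key, each with its own secret."""
    vol = Volume()
    for i, pk in enumerate(pubkeys):
        vol.key_slots.add(f"S{i}")
        vol.external_ciphertexts.append(wrap(f"S{i}", pk))
    return vol

def usable_by(vol: Volume, pubkey: str) -> bool:
    """Can the holder of `pubkey`'s private key still unlock the volume?"""
    prefix = f"enc[{pubkey}]("
    return any(c.startswith(prefix) and c[len(prefix):-1] in vol.key_slots
               for c in vol.external_ciphertexts)

keys = ["primary", "backup"]
v3 = slot_per_key(keys)
v3.key_slots.discard("S1")                    # revoke the backup by removing its slot
print(usable_by(v3, "backup"), usable_by(v3, "primary"))
v2 = single_slot(keys)
print(usable_by(v2, "backup"))                # still works until every copy is deleted
```

With one slot per key, revocation is a local change to the volume. With a shared secret, any surviving copy of the backup ciphertext keeps working unless the secret itself is rotated.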

Getting by without PIV

One final note about the functionality required of the card here. As with the examples for local login and SSH authentication, we have used a PIV card, or more specifically a PIV token in USB form-factor such as GoldKey or PIVKey. But nothing in the above examples assumes a specific card-edge. As long as the middleware can recognize the card (and OpenSC boasts an extensive list of supported hardware), the exact same steps apply. Alternatively, one can switch to using pkcs11-tool for encryption/decryption, in conjunction with a PKCS#11 module for the hardware. That works with a broader range of hardware, including HSMs.

Second-guessing Satoshi: ECDSA vs RSA? (part III)

The previous post discussed the potential for blackbox kleptographic implementations of ECDSA to leak private-keys by manipulating random nonces. What about other cryptographic algorithms? Would offline Bitcoin wallets be less susceptible to similar attacks if Satoshi had made different design choices?

As a natural alternative we can look at RSA, the first widely deployed public-key cryptosystem. RSA signatures also involve applying a padding scheme to the raw message hash before feeding it into the trapdoor function. Some of these schemes are probabilistic. There is some arbitrariness in how the input is extended from the size of the hash output (say 32 bytes for SHA256) to the full size of the RSA modulus (2048 bits or higher by contemporary standards). Such freedom in choosing those additional bits creates the same problem as with ECDSA. But RSA can also be used in conjunction with deterministic padding: one message, exactly one valid signature. That eliminates a very specific threat involving covert channels from blackbox implementations, but it raises other questions. How would RSA stack up against ECDSA in other respects, such as security, performance and efficiency?
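The deterministic versus probabilistic distinction can be shown with a deliberately insecure toy. It uses textbook RSA with tiny primes. "SHA-256 reduced mod n" stands in for a real deterministic padding such as PKCS#1 v1.5, and a signer-chosen salt loosely imitates a probabilistic scheme like PSS. The point is only structural: with the salted variant, the signer picks bits the verifier cannot check, which is room for a covert channel.

```python
# Toy contrast: deterministic vs salted RSA-style signing. INSECURE, illustration only.
# Tiny textbook parameters; "hash mod n" stands in for real padding schemes.

import hashlib
import secrets
from typing import Optional, Tuple

P, Q, E = 1009, 1019, 65537
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))          # modular inverse (Python 3.8+)

def digest_int(msg: bytes) -> int:
    return int.from_bytes(hashlib.sha256(msg).digest(), "big") % N

def sign_deterministic(msg: bytes) -> int:
    # One message, exactly one signature: no free bits for the signer to abuse.
    return pow(digest_int(msg), D, N)

def verify_deterministic(msg: bytes, sig: int) -> bool:
    return pow(sig, E, N) == digest_int(msg)

def sign_salted(msg: bytes, salt: Optional[bytes] = None) -> Tuple[bytes, int]:
    # The signer chooses the salt freely. A malicious implementation could use
    # those bits to leak data, much like a manipulated ECDSA nonce.
    salt = secrets.token_bytes(8) if salt is None else salt
    return salt, pow(digest_int(salt + msg), D, N)

def verify_salted(msg: bytes, salt: bytes, sig: int) -> bool:
    return pow(sig, E, N) == digest_int(salt + msg)

msg = b"pay bob 1 BTC"
print(sign_deterministic(msg) == sign_deterministic(msg))   # always the same
print(sign_salted(msg)[1], sign_salted(msg)[1])             # typically different
```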

Theoretical security: a paucity of provable results

At a very fundamental level, there is no clear winner. Reflecting our general state of ignorance about the complexity of core cryptographic primitives, neither cryptosystem has a formal proof of security. It is widely believed that the difficulty of RSA rests on factoring while that of ECDSA/ECDH rests on the discrete logarithm problem. But even those seemingly obvious facts have eluded proof as of this writing, if we define “proof” to mean a specific type of reduction: showing that an effective attack capable of breaking RSA (ECDSA, respectively) leads to an equally effective algorithm for factoring (computing discrete logs, respectively). It’s clear that efficient factoring would break RSA and efficient algorithms for computing discrete logs would break elliptic-curve cryptosystems. But it is the converse that matters, and it remains an open problem whether there exists a way to decrypt messages or forge signatures without solving either of those problems. In fairness, few cryptosystems have been formally reduced to a well-studied hard problem. The Rabin cryptosystem is provably equivalent to factoring. A few others rarely used in practice, such as McEliece, are based on an NP-complete underlying problem, but that does not imply that attacking a random instance is NP-hard.
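The Rabin equivalence is constructive and short enough to sketch. Assuming a hypothetical black box that returns square roots modulo N = PQ (which is exactly what breaking Rabin requires), the Python below turns it into a factoring algorithm. The toy primes are the Mersenne primes 2^61-1 and 2^89-1, chosen only because both are congruent to 3 mod 4, which makes square roots easy to compute for the simulated oracle.

```python
import math
import random

# Toy primes, both congruent to 3 mod 4 (the Mersenne primes 2^61-1 and 2^89-1).
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q

def _crt(a: int, b: int) -> int:
    """Combine x = a (mod P) and x = b (mod Q) into x mod N."""
    return (a + P * ((b - a) * pow(P, -1, Q) % Q)) % N

def sqrt_oracle(c: int) -> int:
    """Hypothetical black box that 'breaks Rabin' by returning a square root of c mod N.
    Simulated here using P and Q; the factoring routine only ever sees its output."""
    rp = pow(c % P, (P + 1) // 4, P)    # valid square root because P = 3 (mod 4)
    rq = pow(c % Q, (Q + 1) // 4, Q)
    roots = [_crt(sp, sq) for sp in (rp, P - rp) for sq in (rq, Q - rq)]
    return min(roots)    # choice depends only on c, never on which x produced it

def factor_with_oracle(n: int, oracle, rng: random.Random) -> int:
    """Rabin's reduction: a square-root oracle yields a nontrivial factor of n."""
    while True:
        x = rng.randrange(2, n - 1)
        g = math.gcd(x, n)
        if g > 1:
            return g                     # x already shares a factor with n
        y = oracle(x * x % n)
        if y not in (x, n - x):
            # x^2 = y^2 (mod n) with x != +-y, so n divides (x - y)(x + y)
            # without dividing either term: gcd(x - y, n) is P or Q.
            return math.gcd(x - y, n)
```

Each attempt succeeds with probability 1/2, because the oracle cannot tell which of the four square roots of x^2 we started from. No comparable reduction is known for RSA or ECDSA.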

Both elliptic-curve and RSA signatures vary in security level based on parameters, and there are various guidelines for relating the difficulty of breaking one to the other. RSA is simpler because there are relatively few knobs to tweak when generating keys. Key-size is the main variable, with 2048 bits being the consensus recommendation today. In the past there have been varying opinions about choosing primes of special structure, on the grounds that some of the most efficient factoring algorithms known at the time fared better when the order of the multiplicative group was “smooth”, e.g. when P-1 or Q-1 has only small prime factors. But the most efficient algorithm known today, the number-field sieve, does not rely on such special structure and operates equally fast (or equally slowly, depending on your perspective) against all RSA moduli of a given size. The public exponent is generally chosen to be a small number such as 65537 to allow quick public-key operations, while attempts to optimize performance by choosing special private exponents run into security problems. That leaves modulus size as the main tunable parameter for determining the security of the RSA cryptosystem.
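Pollard’s p-1 method is the classic example of an algorithm that exploits a smooth group order, and the main reason “strong primes” were once recommended. A minimal Python sketch with toy numbers: 998244353 is a prime whose P-1 = 2^23 × 7 × 17 is very smooth, while 2039 and 1019 are primes whose P-1 each contain a large prime factor (1019 and 509, respectively).

```python
import math
from typing import Optional

def pollard_p_minus_1(n: int, bound: int) -> Optional[int]:
    """Pollard's p-1: finds a prime factor p of n when p-1 divides bound!,
    as happens when p-1 is built only from small primes. Computes
    a = 2^(bound!) mod n step by step; gcd(a - 1, n) then picks out every
    prime p with a = 1 (mod p)."""
    a = 2
    for j in range(2, bound + 1):
        a = pow(a, j, n)
    g = math.gcd(a - 1, n)
    return g if 1 < g < n else None

# Toy primes. 998244353 - 1 = 2^23 * 7 * 17 is very smooth;
# 2039 - 1 = 2 * 1019 and 1019 - 1 = 2 * 509 each contain a large prime.
SMOOTH_P = 998244353
SAFE_Q = 2039
OTHER_Q = 1019
```

With a bound of 100 the method pulls the smooth prime out of SMOOTH_P × SAFE_Q immediately, regardless of that prime’s size, but gets nowhere against SAFE_Q × OTHER_Q, whose primes lack that structure. Randomly chosen primes of cryptographic size are overwhelmingly unlikely to have smooth P-1 in the first place.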

For elliptic curves, the story is more complicated. For starters there are many subtypes of curves: they can be defined over prime fields or binary fields, and their parameters can be generated according to different criteria. Unlike choosing primes, picking a curve is both tricky and complex; so much so that end-users do not generate their own curves but instead pick from a menu of options supported by their software. (Run “openssl ecparam -list_curves” for a glimpse at the options on your system.) This is where competition among curves starts to look like a market for used cars, with each group pushing the virtues of its own curve. With many options comes the possibility of some spectacularly bad ones. Some have been inadvertent, with advances in the field revealing that certain classes of curves were weaker than expected. Others may even be deliberate, clouded by allegations that curve parameters have been “cooked” by a conspiracy (with NSA usually playing the role of the supervillain in these stories) to have a very specific weakness exploitable by the powers-that-be.

Further aggravating matters, Bitcoin lacks cryptographic agility: it has committed to a single curve with no opportunity to increase hardness along this dimension short of a hard-fork. In any case, there is no linear scale as with RSA, where modulus size can be increased gradually, one machine word at a time. Bumping up security requires a discontinuity: switching to a different curve with different parameters.
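A rough way to see the difference is to compare heuristic attack costs. The Python sketch below assumes the standard asymptotic number-field sieve running time L_N[1/3, (64/9)^(1/3)] with the o(1) term dropped (so the absolute numbers are only indicative), and the expected sqrt(πn/4) steps of Pollard’s rho for generic discrete logs in a curve group of order n.

```python
import math

def gnfs_log2_cost(modulus_bits: int) -> float:
    """Heuristic number-field sieve cost exp(c * (ln N)^(1/3) * (ln ln N)^(2/3)),
    c = (64/9)^(1/3), expressed in bits. The o(1) term is dropped."""
    ln_n = modulus_bits * math.log(2)
    c = (64 / 9) ** (1 / 3)
    return c * ln_n ** (1 / 3) * math.log(ln_n) ** (2 / 3) / math.log(2)

def rho_log2_cost(order_bits: int) -> float:
    """Expected Pollard rho steps, sqrt(pi * n / 4), for group order n ~ 2^order_bits."""
    return order_bits / 2 + math.log2(math.pi / 4) / 2

for bits in (2048, 2080, 2112, 3072):      # RSA: grow the modulus a word at a time
    print(f"RSA-{bits}: ~2^{gnfs_log2_cost(bits):.1f}")
for bits in (256, 384, 521):               # ECC: only as large as an available curve
    print(f"{bits}-bit curve: ~2^{rho_log2_cost(bits):.1f}")
```

The RSA estimate creeps upward a little with every 32-bit word added to the modulus, while on the elliptic-curve side the only way to move the rho estimate is to jump to a curve with a larger group order.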

Implementation matters

Cryptographic algorithms also fail because of incidental weaknesses in their implementation that have nothing to do with the underlying mathematics. How does each choice fare along this dimension? As noted earlier, ECDSA is critically dependent on a source of randomness for each signature operation, not just during key generation. Even a partial failure of the RNG during signing is fatal. RSA does not have this problem, particularly when used in conjunction with a deterministic padding mode.
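The simplest RNG failure to demonstrate is outright nonce reuse. Below is a minimal pure-Python sketch on secp256k1, assuming textbook ECDSA with a single SHA256 hash (Bitcoin actually hashes a specific transaction serialization with double SHA256), a hypothetical private key, and a nonce that a broken RNG emits twice. It is for illustration only: the arithmetic is not constant-time.

```python
import hashlib

# secp256k1 domain parameters from SEC 2: y^2 = x^3 + 7 over GF(P).
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def point_add(a, b):
    """Affine point addition; None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None                     # a + (-a) = infinity
    if a == b:                          # doubling uses the tangent slope
        lam = 3 * x1 * x1 * pow(2 * y1 % P, -1, P) % P
    else:                               # distinct points use the chord slope
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)

def scalar_mult(k, point):
    """Double-and-add; branches on key bits, so NOT side-channel safe."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, point)
        point = point_add(point, point)
        k >>= 1
    return result

def hash_to_int(message: bytes) -> int:
    """Single SHA-256 for simplicity; Bitcoin's actual signature hash differs."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def ecdsa_sign(d: int, z: int, k: int):
    """Textbook ECDSA with a caller-supplied nonce k (the dangerous part)."""
    r = scalar_mult(k, G)[0] % N
    s = pow(k, -1, N) * (z + r * d) % N
    return r, s

def recover_private_key(z1, sig1, z2, sig2):
    """Two signatures sharing a nonce share r. From s_i = (z_i + r*d) / k:
    k = (z1 - z2) / (s1 - s2), then d = (s1*k - z1) / r, all mod N."""
    (r1, s1), (r2, s2) = sig1, sig2
    if r1 != r2:
        raise ValueError("signatures do not share a nonce")
    k = (z1 - z2) * pow((s1 - s2) % N, -1, N) % N
    return (s1 * k - z1) * pow(r1, -1, N) % N
```

Two signatures over different hashes with the same nonce are all it takes: the shared r gives the reuse away, and two lines of modular arithmetic return the private key. Partial failures such as biased nonces need more signatures and lattice techniques, but end the same way.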

Both algorithms are subject to side-channel attacks: in a naive implementation, the sequence of steps taken during the computation has a direct relationship to private-key bits. For example the code may take one code path versus another depending on whether a key-bit is 0 or 1, and these code paths could have very different profiles in terms of time required for completion, memory references or even aggregate power consumption. Hardened implementations try to hide these differences from external observation. For elliptic curves this can be more difficult. To take one example, one source of side-channel leaks in RSA is that squaring is often distinguishable from a side-multiply (multiplication by a different operand) because squaring can be implemented faster, for example using the recursive Karatsuba technique or simply by noting that only about half as many word multiplications are required, since each cross-product of the operand appears twice. But this optimization is optional: an implementation can choose to forgo faster squaring and fall back to doing it the hard way.** For elliptic curves, there is no such option. The basic group operation, point addition, is inherently different when adding a point to itself as opposed to adding two different points. That difference can’t be swept aside by falling back to a less efficient version of addition. More subtle techniques such as the Montgomery ladder are required to avoid key-dependent behavior.
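The contrast is easy to see in code. This Python sketch uses modular exponentiation (the RSA setting) for brevity; the elliptic-curve ladder has the same shape, with multiplication replaced by point addition and squaring by point doubling. The optional trace records the sequence of operations an observer might distinguish. Python’s big integers are not constant-time, and the ladder below still branches on each bit to pick registers (hardened code uses a constant-time conditional swap instead), so this illustrates the operation pattern only, not a hardened implementation.

```python
from typing import List, Optional

def square_and_multiply(base: int, exponent: int, modulus: int,
                        trace: Optional[List[str]] = None) -> int:
    """Left-to-right binary exponentiation: the extra multiply only happens
    on 1-bits, so the sequence of operations mirrors the exponent."""
    result = 1
    for bit in bin(exponent)[2:]:
        result = result * result % modulus           # square on every bit
        if trace is not None:
            trace.append("S")
        if bit == "1":
            result = result * base % modulus         # side-multiply on 1-bits only
            if trace is not None:
                trace.append("M")
    return result

def montgomery_ladder(base: int, exponent: int, modulus: int,
                      trace: Optional[List[str]] = None) -> int:
    """Montgomery ladder: every bit costs one multiply and one square.
    Invariant: r1 == r0 * base (mod modulus) after each step."""
    r0, r1 = 1, base % modulus
    for bit in bin(exponent)[2:]:
        if bit == "0":
            r1 = r0 * r1 % modulus
            r0 = r0 * r0 % modulus
        else:
            r0 = r0 * r1 % modulus
            r1 = r1 * r1 % modulus
        if trace is not None:
            trace.extend(["M", "S"])                 # same pattern for 0 and 1
    return r0
```

Two exponents of the same length produce different traces under square-and-multiply but identical traces under the ladder; only which register receives which product depends on the key bit.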

Efficiency

At some level there is an irony to worrying about the performance of Bitcoin, an extremely inefficient system predicated on wasting massive amounts of CPU cycles, and consequently energy, to operate without any centralized trust. But suspending disbelief and assuming that speed matters, RSA appears to have a surprising advantage for transaction signing. Surprising, because speed is generally touted as one of the advantages of elliptic-curve cryptography. That part still holds true, and one can get ballpark estimates from (highly outdated) Crypto++ benchmarks or by running openssl speed tests locally:

                            sign        verify      sign/s    verify/s
256 bit ecdsa (nistp256)    0.0000s     0.0001s     26840.5   11228.1
rsa 2048 bits               0.000554s   0.000025s   1805.1    39505.7

That’s an order-of-magnitude difference in signing speed. The problem for ECDSA is that both signing and verification are relatively expensive. RSA, on the other hand, has a very asymmetric performance profile: private-key operations (decryption and signing) are over 20x slower than public-key operations (encryption and verification, respectively). Bitcoin transactions are signed once by the originator but verified thousands if not millions of times by full-nodes validating the ledger. In that usage mode, RSA gets the nod for lowering total cost across the network over the lifetime of the transaction.
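The lifetime argument reduces to simple arithmetic. Taking the throughput figures quoted above at face value (note they are for nistp256 rather than secp256k1, so treat them as ballpark only), this Python snippet computes the number of verifications after which RSA’s cheaper verification pays back its more expensive signing.

```python
# Throughput figures from the openssl speed output quoted above (operations per second).
ECDSA_SIGN_PER_S, ECDSA_VERIFY_PER_S = 26840.5, 11228.1
RSA_SIGN_PER_S, RSA_VERIFY_PER_S = 1805.1, 39505.7

def lifetime_cost(sign_per_s: float, verify_per_s: float, verifications: int) -> float:
    """Total CPU seconds for one signature that is verified `verifications` times."""
    return 1 / sign_per_s + verifications / verify_per_s

def crossover() -> float:
    """Verifications beyond which RSA's total cost drops below ECDSA's."""
    extra_signing = 1 / RSA_SIGN_PER_S - 1 / ECDSA_SIGN_PER_S
    saved_per_verify = 1 / ECDSA_VERIFY_PER_S - 1 / RSA_VERIFY_PER_S
    return extra_signing / saved_per_verify
```

With these numbers the break-even point is a little over eight verifications; a transaction checked by thousands of full-nodes lands far on RSA’s side of that line.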

Looking at key-generation on the other hand, ECDSA gains a slight edge. RSA keys have mathematical structure as the product of two large primes; not any old number will do. Key generation involves a relatively costly search for primes, testing candidates repeatedly until primes are discovered. For ECDSA any positive integer less than the order of the generator is a valid private key. Randomly choosing a value in this range is sufficient and very quick, so much so that the bulk of the time in ECDSA key-generation is spent computing the public key. That property also makes it easier to implement deterministic key-derivation schemes such as BIP32, where multiple keys are derived from a single root secret. Because any integer in that range is a valid private key, it is easy to map outputs from key-derivation functions to ECDSA private keys in a one-to-one manner. While RSA key generation can be made deterministic by deriving the starting point for the prime search from a seed, it does not have a similar uniqueness guarantee.
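A minimal Python sketch of the asymmetry, with all function names hypothetical: RSA prime generation has to loop over random candidates and run a Miller-Rabin test on each, while an ECDSA private key is a single draw from [1, n-1], where n is the secp256k1 group order. The last function shows why deterministic derivation maps so cleanly onto ECDSA; it is a deliberately simplified stand-in, not BIP32 itself, which also mixes in the parent key and chain code.

```python
import hashlib
import hmac
import secrets
from typing import Optional, Tuple

# Order of the secp256k1 generator (SEC 2).
SECP256K1_N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141

def is_probable_prime(n: int, rounds: int = 40) -> bool:
    """Trial division by small primes, then Miller-Rabin with random bases."""
    if n < 2:
        return False
    for small in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % small == 0:
            return n == small
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        x = pow(secrets.randbelow(n - 3) + 2, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False                 # witness found: n is composite
    return True

def random_rsa_prime(bits: int) -> Tuple[int, int]:
    """Test random odd candidates until one is prime; returns (prime, attempts)."""
    attempts = 0
    while True:
        attempts += 1
        candidate = secrets.randbits(bits) | (1 << (bits - 1)) | 1  # full size, odd
        if is_probable_prime(candidate):
            return candidate, attempts

def random_ecdsa_private_key() -> int:
    """Any integer in [1, n-1] is a valid private key: one draw suffices."""
    return secrets.randbelow(SECP256K1_N - 1) + 1

def derive_ecdsa_key(root_secret: bytes, index: int) -> Optional[int]:
    """Hypothetical simplified derivation (NOT BIP32): the first 256 bits of
    HMAC-SHA512(root, index) are used directly if they land in [1, n-1]."""
    digest = hmac.new(root_secret, index.to_bytes(4, "big"), hashlib.sha512).digest()
    candidate = int.from_bytes(digest[:32], "big")
    return candidate if 0 < candidate < SECP256K1_N else None
```

The attempt counter for a 1024-bit prime (half of a 2048-bit modulus) averages a few hundred, whereas the ECDSA path essentially never retries: the chance that a derived value falls outside [1, n-1] is below 2^-127.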

A different measure of efficiency is space: how many bytes are consumed by the signature. Shorter is better, and arguably this matters more than speed for Bitcoin. Signature size is a major determinant of transaction size. Given that each block is capped and the current 1MB limit is already becoming a constraint, any increase in TX size further limits how many transactions can be mined per block. On that dimension ECDSA looks a lot better. An ECDSA signature consists of two integers, each no larger than the order of the curve’s generator, so its size is twice that of the underlying curve: for secp256k1 that comes out to 64 bytes. (Somewhat puzzlingly, Bitcoin sticks with the traditional ASN.1 DER encoding of signatures, which introduces up to 7 bytes, or roughly 10%, of additional overhead compared to a plain layout.) At first glance, even that inefficient encoding looks like a major savings compared to the alternative of 2048-bit RSA signatures: at 256 bytes, those represent a whopping 300% increase.
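To see where the extra bytes come from, here is a small Python sketch of the DER encoding of (r, s) next to the fixed-width 64-byte layout. DER adds a six-byte envelope (a SEQUENCE header plus two INTEGER headers) and a zero byte in front of any integer whose top bit is set, so that it is not read as negative.

```python
def der_integer(value: int) -> bytes:
    """DER INTEGER: minimal big-endian bytes, plus a 0x00 pad if the high bit is set."""
    body = value.to_bytes((value.bit_length() + 7) // 8 or 1, "big")
    if body[0] & 0x80:
        body = b"\x00" + body
    return bytes([0x02, len(body)]) + body

def der_ecdsa_signature(r: int, s: int) -> bytes:
    """SEQUENCE { INTEGER r, INTEGER s }; short-form lengths suffice below 128 bytes."""
    content = der_integer(r) + der_integer(s)
    return bytes([0x30, len(content)]) + content

def raw_ecdsa_signature(r: int, s: int, size: int = 32) -> bytes:
    """Fixed-width r || s layout: always exactly 2 * size bytes."""
    return r.to_bytes(size, "big") + s.to_bytes(size, "big")
```

The worst case is 72 bytes. If s is kept below n/2 (the “low-S” rule adopted in Bitcoin to curb signature malleability), only r can need padding, which caps the overhead at 7 bytes; encodings come out shorter when r or s happens to have leading zero bytes.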

Some of that overhead for RSA can be optimized away by merging part of the message into the signature. RSA supports message recovery, where part or all of the message is squeezed into the signature block itself. For example with a 2048-bit modulus and SHA256 as the underlying hash function, over 220 bytes of plaintext message can be safely recovered using PSS-R padding. Surprisingly, the net overhead could then be lower than for ECDSA, depending on the amount of data merged into the signature. Using a data point from December 2014 putting the average Bitcoin transaction size at ~250 bytes, we can estimate ~190 bytes of preimage without the ECDSA signature. That is less than the maximum recoverable message, which means the entire TX would become a single RSA signature, without any appreciable bump in space used.
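A back-of-the-envelope capacity check, written in Python under an assumed layout: the encoded block mirrors EMSA-PSS from PKCS #1, with the recoverable part of the message placed between the 0x01 separator and the salt, so one signature carries the modulus length minus the hash, the salt, and two fixed bytes. Actual PSS-R variants may differ by a byte or so.

```python
def pss_r_capacity(modulus_bits: int, hash_len: int, salt_len: int) -> int:
    """Recoverable message bytes per signature under the assumed layout
    EM = maskedDB || H || 0xBC, with DB = PS || 0x01 || M1 || salt."""
    em_len = (modulus_bits - 1 + 7) // 8       # emBits = modBits - 1, rounded up to bytes
    return em_len - hash_len - salt_len - 2    # minus the 0x01 separator and 0xBC trailer

def fits_in_one_signature(preimage_len: int, modulus_bits: int = 2048,
                          hash_len: int = 32, salt_len: int = 0) -> bool:
    """Would the whole transaction preimage fit inside one recoverable signature?"""
    return preimage_len <= pss_r_capacity(modulus_bits, hash_len, salt_len)
```

With an empty salt the estimate is 222 bytes, consistent with the figure above; a 32-byte salt brings it down to 190, just enough for the estimated ~190-byte preimage. An empty salt also keeps the scheme deterministic.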

Unfortunately one detail in Bitcoin signing greatly limits the applicability of message recovery: every input into a transaction must be signed individually. This holds true even if the inputs use the same address and in theory a single signature could prove control over all of them. For transactions containing multiple inputs, the hypothetical RSA signature overhead would apply to each input. That means one may run out of “message” to squeeze into the signatures, unless we complicate matters further by squeezing signatures into each other. In that case part of the “message” recovered from an RSA signature could be a chunk of a different RSA signature. Such cascades break one of the assumptions about TX construction: inputs can be added independently of each other. (There is even a signature-hash flag, SIGHASH_ANYONECANPAY, that allows anyone to add inputs.) If input signatures incorporate fragments of other inputs, parsing becomes tricky.

One last point on the space efficiency of existing Bitcoin signatures: ECDSA is far from the state of the art in optimizing signature size. It has been known for some time that pairing-based schemes enable shorter signatures. For example the Boneh–Lynn–Shacham scheme from 2001 provides a security level comparable to 1024-bit RSA using as few as 20 bytes. But pairing-based cryptography remains relatively esoteric, with limited implementation support and a potential minefield of intellectual property claims, making it an unlikely candidate for Bitcoin.

Verdict on ECDSA for Bitcoin

In retrospect, then, ECDSA looks like a reasonable middle ground for the requirements Satoshi faced. It produces relatively compact signatures compared to RSA without requiring elaborate message-recovery tricks, allows rapid key-generation to promote better privacy for funds, and has plenty of implementation support. (Modulo the choice of secp256k1: most of the initial focus on elliptic-curve support has been in response to the NIST Suite-B curves, and secp256k1 is not among them. For example the native Windows cryptography API still does not support it.) On the downside, it has a suboptimal distribution of load between signing and verification, inflating the global cost of verifying blocks.

** To be clear, this only removes the timing leak. Other side-channels can still distinguish squaring from multiplication by a different operand. A side-multiply will typically reference different memory regions than multiplying a number by itself, and attackers observing local channels such as CPU caches can still infer such key-dependent memory references.