Use and misuse of code-signing (part I)

Or, there is no “evil bit” in X509

A recent paper from CCS2015 highlights the incidence of digitally signed PUP— potentially unwanted programs: malicious applications that harm users, spying on them, stealing private information or otherwise acting against the interests of the user. While malware is dime-a-dozen and occurence of malware digitally-signed with valid certificates is not new either, this is one of the first systemic studies of how malware authors operate when it comes to code signing. But before evaluating the premise of the paper, let’s step back and revisit the background on code-signing in general and MSFT Authenticode in particular.

ActiveX: actively courting trouble

Rewinding the calendar back to the mid-90s: the web is still in its infancy and browsers highly primitive in their capabilities compared to native applications. These are the “Dark Ages” before AJAX, HTML5 and similar modern standards which make web applications competitive with their native counterparts. Meanwhile JavaScript itself is still new and awfully slow. Sun Microsystems introduced Java applets as an alternative client-side programming model to augment web pages. Ever paranoid, MSFT responds in the standard MSFT way: by retreating to the familiar ground of Windows and trying to bridge the gap from good-old Win32 programming to this this scary, novel web platform. ActiveX controls were the solution the company seized on in hopes of continuing the hegemony of the Win32 API. Developers would not have to learn any new tricks. They would write native C/C++ applications using COM and invoking native Windows API as before—conveniently guaranteeing that it could only run on Windows— but they could now deliver that code over the web, embedded into web pages. And if the customers visiting those web pages were running a different operating system such as Linux or running on different hardware such as DEC Alpha? Tough luck.

Code identity as proxy for trust

Putting aside the sheer Redmond-centric nature of this vision, there is one minor problem: unlike the JavaScript interpreter, these ActiveX controls execute native code with full access to operating system APIs. They are not confined by an artificial sandbox. That creates plenty of room to wreak havoc with machine: read files, delete data, interfere with the functioning of other applications. Even for pre-Trustworthy-Computing MSFT with nary a care in the world for security, that was an untenable situation: if any webpage you visit could take over your PC, surfing the web becomes a dangerous game.

There are different ways to solve this problem, such as constraining the power of these applications delivered over the web. (That is exactly what JavaScript and Java aim for with a sandbox.) Code-signing was the solution MSFT pushed: code retains full privileges but it must carry some proof of its origin. That would allow consumers to make an informed decision about whether to trust the application, based on the reputation of the publisher. Clearly there is nothing unique to ActiveX controls about code-signing. The same idea applies to ordinary Windows applications sure enough was extended to cover them. Before there were centralized “App Stores” and “App Markets” for purchasing applications, it was common for software to be downloaded straight from the web-page of the publisher or even a third-party distributor website aggregating applications. The exact same problems of trust arises here: how can consumers decide whether some application is trustworthy? The MSFT approach translates that into a different question: is the author of this application trustworthy?

Returning to the paper, the researchers make a valuable contribution in demonstrating that revocation is not quite working as expected. But the argument is undermined by a flawed model. (Let’s chalk up the minor errors to lack of fact-checking or failure to read specs: for example asserting that Authenticode was introduced in Windows 2000 when it predates that, or stating that only SHA1 is supported when MSFT signtool has supported SHA256 for some time.) There are two major conceptual flaws in this argument:
First one is misunderstanding the meaning of revocation, at least as defined by PKIX standards. More fundamentally, there is a misunderstanding what code-signing and identity of the publisher represent, and the limits of what can be accomplished by revocation.

Revocation and time-stamping

The first case of confusion is misunderstanding how revocation dates are used: the authors have “discovered” that malware signed and timestamped continues to validate even after the certificate has been revoked. To which the proper response is: no kidding, that is the whole point of time-stamping; it allows signatures to survive expiration or revocation of the digital certificate associated with that signature. This behavior is 100% by design and makes sense for intended scenarios.

Consider expiration. Suppose Acme Inc obtains a digital certificate valid for exactly one year, say the calendar year 2015. Acme then uses this certificate so sign some applications published on various websites. Fast forward to 2016, and a consumer has downloaded this application and attempts to validate its pedigree. The certificate itself has expired. Without time-stamping, that would be a problem because there is no way to know whether the application was signed when the certificate was still valid. With time-stamping, there is a third-party asserting that the signature happened while the certificate was still valid. (Emphasis on third-party; it is not the publisher providing the timestamp because they have an incentive to backdate signatures.)

Likewise the semantics of revocation involve a point-in-time change in trust status. All usages of the key afterwards are considered void; usage before that time is still acceptable. That moment is intended to capture the transition point when assertions made in the certificate are no longer true. Recall that X509 digital certificates encode statements made by the CA about an entity, such as “public key 0x1234… belongs to the organization Acme Inc which is headquartered in New York, USA.” While competent CAs are responsible for verifying the validity of these facts prior to issuance, not even the most diligent CA can escape the fact that their validity can change afterwards. For example the private-key can be compromised and uploaded to Pastebin, implying that it is no longer under sole possession of Acme. Or the company could change its name and move its business registration to Timbuktu, a location different than the state and country specified in the original certificate. Going back to the above example of the Acme certificate valid in 2015: suppose that half-way through the calendar year Acme private-key is compromised. Clearly signatures produced after that date can not be reliably attributed to Acme: it could be Acme or it could be the miscreants that stole the private-key. On the other hand signatures made before, as determined by third-party trusted timestamp should not be affected by events that occurred later.**

In some scenarios this distinction between before/after is moot. If an email message was encrypted using the public-key found in an S/MIME certificate months ago, it is not possible for the sender to go back in time and recall the message now that the certificate is revoked. Likewise authentication happens in real-time and it is not possible to “undo” previous instances when a revoked certificate was accepted. Digital signatures on the other hand are unique: the trust status of a certificate is repeatedly evaluated at future dates when verifying a signature created in the past. Intuitively signatures created before revocation time should still be afforded full trust, while those created afterwards are considered bogus. Authenticode follows this intuition. Signatures time-stamped prior to the revocation instant continue to validate, while those produced afterwards (or lacking a time-stamp altogether) are considered invalid. The alternative does not scale: if all trust magically evaporates due to revocation, one would have to go back and re-create all signatures.

To the extent that there is a problem here, it is an operational error on the part of CAs in choosing the revocation time. When software publishers are caught red-handed signing malware and this behavior is reported to certificate authorities, it appears that CAs are setting revocation date to the time of the report, as opposed to all the way back to original issuance time of the certificate. That means signed malware still continues to validate successfully according to Authenticode policy, as long as the crooks remembered to timestamp their signatures. (Not exactly a high-bar for crooks, considering that Verisign and others also operate free, publicly accessible time-stamping services.) The paper recommends “hard-revocation” which is made-up terminology for setting revocation time all the way back to issuance time of the certificate, or more precisely the notBefore date. This is effectively saying some assertion made in the certificate was wrong to begin with and the CA should never have issued it in the first place. From a pragmatic stance, that will certainly have the intended effect of invalidating all signatures. No unsuspecting user will accidentally trust the application because of a valid signature. (Assuming of course that users are not overriding Authenticode warnings. Unlike the case of web-browser SSL indicators which have been studied extensively, there is comparatively little research on whether users pay attention to code-signing UI.) While that is an admirable goal, this ambitious project to combat malware by changing CAs behavior is predicated on misunderstanding of what code-signing and digital certificates stand for.

[continued]

** In practice this is complicated by the difficulty of determining precise time of key-compromise and typically involves conservatively estimating on the early side.

The problem with devops secret-management patterns

Popular configuration management systems such as Chef, Puppet and Ansible all have some variant of secret management solution included. These are the passwords, cryptographic keys and similar sensitive information that must be deployed to specific servers in a data-center environment while limiting access to other machines or even persons involved in operating the infrastructure.

Chef has encrypted data-bags
Ansible has vaults (Not to be confused with Hashicorp Vault, which is a service for distributing secrets)
Puppet has encrypted hiera

The first two share a fundamental design flaw. They store long-term secrets as files encrypted using a symmetric key. When it is time to add a new secret or modify an existing one, the engineer responsible for introducing the change will decrypt the file using that key, make changes and reencrypt using the key. Depending on design, decryption during an actual deployment can happen locally on the engineer machine (as in the case of Ansible) or remotely on the server where those secrets are intended to be used.

This model is broken for two closely related reasons.

Coupling read and write-access

Changing secrets also implies being able to view existing ones. In other words, in order to add a secret to an encrypted store or modify the existing one (in keeping with best-practices, secrets are rotated periodically right?) requires knowledge of the same passphrase that also allows viewing the current collection of those secrets.

Under normal circumstances, “read-only” access is considered less sensitive than “write” access. But when it comes to managing secrets, this wisdom is inverted: being able to steal a secret by reading a file is usually more dangerous than being able to clobber the contents of that file without learning what existed before.**

Scaling problems

Shared passphrases do not scale well in a team context. “Three can keep a secret, if two of them are dead” said Benjamin Franklin. Imagine a team of 10 site-reliability engineers in charge of managing secrets. The encrypted data-bag or vault can be checked into a version control system such as git. But the passphrase encrypting it must be managed out of band. (If the passphrase itself were available at the same location, it becomes a case of locking the door and putting the key under the doormat.) That means coordinating a secret to be shared among multiple people. That creates two problems:

The attack surface of secrets is increased. An attacker need only compromise 1 of those 10 individuals to unlock all the data.
It increases the complexity of revoking access. Consider what happens when an employee with access to decrypt this file leaves the company. It is not enough to generate a new passphrase to reencrypt the file. Under worst-case assumptions, the actual secrets contained in that file were visible to that employee and could have been copied. At least some of the most sensitive ones (such as authentication keys to third-party services) may have to be assumed compromised and require rotation.

Separating read and write access

There is a simple solution to these problems. Instead of using symmetric cryptography, secret files can be encrypted using public-key cryptography. Every engineer has a copy of the public-key and can edit the file (or fragments of the file depending on format, as long as secret payloads are encrypted) to add/remove secrets. But the corresponding private-key required to decrypt the secrets does not have to be distributed. In fact, since these secrets are intended for distribution to servers in a data-center, the private-key can reside fully in the operational environment. Anyone could add secrets by editing a file on their laptop; but the results are decrypted and made available to machines only in the data-center.

If it turns out that one of the employees editing the file had a compromised laptop, the attacker can only observe secrets specifically added by that person. Similarly if that person leaves the company, only specific secrets he/she added need to be considered for rotation. Because they never had access to decrypt the entire file, remaining secrets were not accessible.

Case-study: Puppet

An example of getting this model right is Puppet encrypted hiera. Its original PGP-based approach to storing encrypted data is now deprecated. But there is an alternative that works by encrypting individual fields in YAML files called hiera-eyaml. By default it uses public-key encryption in PKCS7 format as implemented by OpenSSL. Downside is that it suffers from low-level problems, such as lack of integrity check on ciphertexts and no binding between keys/values.

Improving Chef encrypted data-bags

An equivalent approach was implemented by a former colleague at Airbnb for Chef. Chef data-bags define a series of key/value pairs. Encryption is applied at the level of individual items, covering only the value payload. The original Chef design was a case of amateur hour: version 0 used the same initialization vector for all CBC ciphertexts. Later versions fixed that problem and added an integrity check. Switching to asymmetric encryption allows for a clean-slate. Since the entire toolchain for editing as well as decrypting data-bags have to be replaced, there is no requirement to follow existing choice of cryptographic primitives.

On the other hand, it is still useful to apply encryption independently for each value, as opposed to on the entire file. That allows for distributed edits: secrets can be added or modified by different people without being able to see other secrets. It’s worth pointing out that this can introduce additional problems because ciphertexts are not bound strongly to the key. For example, one can move values around, pasting an encryption key into a field meant to hold a password or API key, resulting in the secret being used for an unexpected scenario. (Puppet hiera-eyaml has the same problem.) These can be addressed by including the key-name and other meta-data in the construction of the ciphertext; a natural solution is to use those attributes as additional data in an authenticated-encryption mode such as AES-GCM.

Changes to the edit process

With symmetric encryption, edits were straightforward: all values are decrypted to create a plain data-bag written into a temporary file. This file can now be loaded in a favorite text editor, modified and saved. The updated contents are encrypted from scratch with the same key.

With asymmetric keys, this approach will not fly because existing values can not be decrypted. Instead the file is run through an intermediate parser to extract the structure, namely the sequence of keys defined, into a plain file which contains only blank values. Edits are made on this intermediate representation. New key/value pairs can be added, or new values can be specified for an existing key. These correspond to defining a new secret or updating an existing one, respectively. (Note that “update” in this context strictly means overwriting with a new secret; there is no mechanism to incrementally edit the previous secret in place.) After edits are complete, another utility merges the output with the original, encrypted data-bag. For keys that were not modified, existing ciphertexts are taken from the original data-bag. For new/updated keys, the plaintext values supplied during edit session are encrypted using the appropriate RSA public-key to create the new values.

Main benefit is that all of these steps can be executed by any engineer/SRE with commit access to the repository where these encrypted data-bags are maintained. Unlike the case of existing Chef data-bags, there is no secret-key to be shared with every team member who may someday need to update secrets.

A secondary benefit is that when files are updated, diff comparisons accurately reflect differences. In contrast, existing Chef symmetric encryption selects a random IV for each invocation. Simply decrypting and reencrypting a file with no changes will still result in every value being updated. (In principle the edit utility could have compensated for this by detecting changes and reusing ciphertexts when possible but Chef does not attempt that.) That means standards utilities for merging, resolving conflicts and cherry-picking work as before.

Changes to deployment

Since Chef does not natively grok the new format, deployment also requires an intermediate step to convert the asymmetrically-encrypted databags into plain databags. This step is performed on the final hop, on the server(s) where the encrypted data-bag is deployed and its contents are required. That is the only time when the RSA private key is used. It does not have to be available anywhere other than on the machines where the secrets themselves will be used.

** For completeness, it is possible under some circumstances to exploit write-access for disclosing secrets. For example, by surgically modifying parts of a cryptographic key and forcing the system perform operations with the corrupted key, one can learn information about the original (unaltered) secret. This class of attacks falls under the rubric differential fault analysis.

Multi-signature and correlated risks: the case of Bitfinex

BitFinex has not yet provided a detailed account of the recent security breach that resulted in loss of 120000 bitcoins from the exchange. Much of what has been written about the incident is incomplete, speculation or opportunistic marketing . Fingers have been pointed at seemingly everyone, including strangely the CFTC, for contributing to the problem. (While casting regulators as villains seems de rigueur in cryptocurrency land these days, that particular argument was debunked elsewhere.) Others have questioned whether there is a problem with the BitGo service or even intrinsic problem with the notion of multi-signature in Bitcoin.

Some of these questions can not be answered until more information about the incident is released, either by Bitfinex or perhaps by the perpetrators- it is not uncommon for boastful attackers to publish detailed information about their methods after a successful heist. But we can dispel some of the misconceptions about the effect of multi-signature on risk management.

Multi-signature background

In the Bitcoin protocol, control over funds is represented by knowledge of cryptographic secrets, also known as private keys. These secrets are used for digitally signing transactions that move funds from one address to another. For example, it could be moving funds from a consumer wallet to a merchant wallet, when paying fora cup of coffee in Bitcoin. Multi-signature is an extension to this model when more than 1 key is required to authorize the transaction. This is typically denoted by two numbers as “M-of-N.” That refers to a group of N total keys, of which any quorum of M is sufficient to sign the transaction.

Multisig achieves two seemingly incompatible objectives:

Better security. It is more difficult for an attacker to steal 3 secrets than it is to steal a single one. (More about that assumption below.)
Higher availability & resiliency. Theft is not the only risk associated with cryptographic secrets. There is also the far more mundane problem of plain losing the key due to hardware failures, forgotten credentials etc. As long as N > M, the system can tolerate complete destruction of some keys provided there are enough left around to constitute a quorum.

Multi-signature with third-party cosigners

Bitfinex did not have a cold-wallet in the traditional sense: storing Bitcoins on an offline system, one that is not accessible from the Internet even indirectly. Instead the exchange leveraged BitGo co-signing engine to hold customer funds in a multi-signature wallet. This scheme involves a 2-of-3 configuration with specific roles intended for each key:

One key is held by the “customer,” in this case Bitfinex (Not to be confused with end-users who are customers of Bitfinex)
Second key is held by BitGo
Third key is an offline “recovery” key held by Bitfinex. It is the escape-hatch to use for restoring unilateral control over funds, in case BitGo is unable/unwilling to participate

In order to withdraw funds out of Bitfinex, a transaction must be signed by at least two out of these three keys. Assuming the final key is truly kept offline under wraps and only invoked for emergencies, that means the standard process involves using the first two keys. Bitfinex has possession of the first the key** and can produce that signature locally. But getting the second one involves a call to the BitGo API. This call requires authentication. BitGo must only co-sign transactions originating from Bitfinex, which requires that it has a way to ascertain the identity of the caller. The parameters of this authentication protocol are defined by BitGo; it is outside the scope of blockchain or Bitcoin specifications.

Correlated vs independent risks

The assertion that multi-signature improves security was predicated on an assumption: it is more difficult for an attacker to gain control of multiple cryptographic keys compared to getting hold of one. But is that necessary?

Consider a gate with multiple padlocks on it. Did the addition of second or third padlock make it any harder to open this gate? The answer depends on many factors.

Are the locks keyed the same? If they are, there is still a “SPOF” or single-point-of-failure: getting hold of that key makes it as easy to open a gate with 10 locks as it is one protected by a single lock.
Are the locks the same model? Even if they have individual keys, if the locks are the same type, they may have a common systemic flow, such as being susceptible to the same lock-picking technique.

The general observation is that multiple locks improve security only against threats that are independent or uncorrelated. Less obvious, whether risks are uncorrelated is a function of threat model. To a casual thief armed with bolt-cutters, the second lock doubles the amount of effort required even if it were keyed identically: they have to work twice as hard to physically cut through two locks. But a more sophisticated attacker who plans on stealing that one key ahead of time, it makes no difference. Armed with the key, opening two locks is not substantially more difficult than opening one. Same holds true if the locks are different but both keys are kept under the doormat in front of the gate. Here again the probability of second lock being breached is highly correlated with the probability that first lock was breached.

When multi-signature does not help

Consider Bitcoin funds tied to a single private-key stored on a server. Would it help if we transferred those funds to a new Bitcoin address comprised of multisig configuration with 2 keys stored on the same server? Unlikely— virtually any attack that can compromise one key on that server is going to also get the second one with equal ease. That includes code execution vulnerabilities in the software running on the box, “evil-maid” attacks with physical access or malicious operators. The difficulty for some attacks might increase ever slightly: for example if there was a subtle side-channel leak, it may now require twice as much work to extract both keys. But in general, the cost of the attack does not double by virtue of having doubled the number of keys.

Now consider the same multi-signature arrangement, except the second private-key is stored in a different server, loaded with a different operating system and wallet software, located in a different data-center managed by a different group of engineers. In this scenario the cost of executing attacks has increased appreciably, because it is not possible for attacker to “reuse” resources between them. Breaking into a data-center operated by hosting provider X does not allow also breaking into one operated by company Y. Likewise finding a remote-code execution vulnerability in the first OS does not guarantee identical vulnerability in the second one. Granted the stars may align for the attacker and he/she may discover find a single weakness shared by both systems. But that is more difficult than breaking into a single server to recover 2 keys at once.

Multi-signature with co-signers

Assuming the above description of Bitfinex operation is correct, Bitfinex operational environment that runs the service must be in possession of two sets of credentials:

One of the three multi-signature keys; these are ECDSA private keys
Credentials for authenticating to the BitGo API

These need not reside on the same server. They may not even reside in the same data-center, as far as physical locations go. But logically there are machines within Bitfinex “live” system that hold these credentials. Why? Because users can ask to withdraw funds at any time and both pieces are required to make that happen. The fact that users can go to a web page, press a button and later receive funds indicates that there are servers somewhere within Bitfinex environment capable of wielding them.

Corollary: if an attack breaches that environment, the adversary will be in possession of both secrets. To the extent that correlated risks exist in this environment- for example, a centralized fleet management system such as Active Directory that grants access to all machines in a data-center- they reduce the value of having multiple keys.

Rate-limiting

BitGo API designers were aware of this limitation and attempted to compensate for it. Their API interface supports limits on funds movement. For example, it is possible to set a daily-limit in advance such that request to co-sign for amounts greater than this threshold will be rejected. Even if the customer systems were completely breached and both sets of credentials compromised, BitGo would cap losses in any given 24 hour period to that limit by refusing to sign additional Bitcoin transactions.

By all indications, such limits were in effect for Bitfinex. News reports indicate that the attack was able to work around them. Details are murky, but looking at BitGo API documentation offers some possible explanations. It is possible to remove policies by calling the same API, authenticating with the same credentials as one would use for ordinary transaction signing. So if the adversary breached Bitfinex systems and gained access to a valid authentication token for BitGo, that token would have been sufficient for lifting the spending limit.

This points to the tricky nature of API design and the counter-intuitive observation that somethings are best left not automated. Critical policies such as spending limits are modified very infrequently. It’s not clear that a programmatic API to remove policies is necessary. Requiring out-of-band authentication by calling/emailing BitGo to modify an existing policy would have introduced useful friction and sanity checks into the process. At a minimum, a different set of credentials could have been required for such privileged operations, compared to ordinary wallet actions. Now BitGo did have one mitigation available: if a wallet is “shared” among 2 or more administrators, lifting a spending limit requires approval by at least one other admin. The documentation (retrieved August 2016) recommends using that setup:

If a wallet carries a balance and there are more than two “admin” users associated with a Wallet, any policy change will require approval by another administrator before it will take effect (if there are no additional “admin” users, this will not be necessary). It is thus highly recommended to create wallets with at least 2 administrators by performing a wallet share. This way, policy can be effective even if a single user is compromised.

One possibility is Bitfinex only had a single administrator setup. Another possibility is a subtle problem in the wallet sharing API. For example, documentation notes that removing a share requires approval by at least one other admin- ruling out the possibility of going from 2 to 1 unilaterally. But if adding another admin was not subject to same restriction, one could execute a Sybil attack: create another share which appears to be another user but is in fact controlled by the attacker. This effectively grants adversary 2 shares, which is enough to subvert policy checks.

Multi-signature in perspective

Until more details are published about this incident, the source of the single point of failure remains unknown. BitGo has went on the record stating that its system has not been breached and its API has performed according to spec. Notwithstanding those assurances, Bitfinex has stopped relying on BitGo API for funds management and reverted to a traditional, offline cold-wallet system. Meanwhile pundits have jumped on the occasion to question the value proposition for multi-signature, in a complete about-face from 2014 when they were embracing multi-signature as the security silver bullet. This newfound skepticism may have a useful benefit: a closer scrutiny on key-management beyond counting shares and better understanding of correlated risks.

** For simplicity, we assume there is just one “unspent transaction output” or UTXO in the picture with a single set of keys. In general, funds will be stored across hundreds or thousands of UTXO, each with their own unique 2-of-3 key sets that are derived from a hierarchical deterministic (HD) key generation scheme such as BIP32.

Getting by without passwords: web-authentication (part II)

In the second part of this series on web authentication without passwords, we look at the Firefox approach for interfacing cryptographic hardware. Recall that Chrome follows the platform pattern (“when in Rome, do as the Romans”) and uses the “native” middleware for each OS: Crypto API for Windows, tokend on OSX and PKCS#11 on Linux. By contract Firefox opts for what Emerson would have called foolish consistency: it always uses PKCS#11, even on platforms such as Windows where that interface is not the norm. It also turns out to be much less user-friendly. By giving up on functionality already built into the OS for automatic detection and management of cryptographic hardware, it forces user to jump through new hoops to get their gadgets working with Firefox.

Specifically, before using a PIV card or token with Firefox, we first have to tell Firefox how to interface with those objects. That means actually pointing at a PKCS#11 module on disk by jumping through some hoops. First, open the hamburger-menu and choose preferences to bring up a new tab with various settings. Next navigate to where most users rarely venture, namely “Advanced” group, choose “Certificates” tab and click “Security Devices” button:

Firefox advanced settings, certificates tab

This brings up a terse list of recognized cryptographic hardware grouped by their associated module:

“Security Devices” view in Firefox

There are already a few modules loaded, but none of them are useful for the purpose of using a PIV card/token with Firefox. Time to bring in reinforcements from the veritable OpenSC project:

Loading a new PKCS#11 module

The module name is arbitrary (“OpenSC” is used here for simplicity) and the more important part is locating the file on disk. OSX file-picker dialog does not make it easy to search for shared libraries. Navigating directly to the directory containing the module, typically at /usr/local/lib or /usr/lib, and selecting opensc-pkcs11.so is the easiest option. (Already this is veering into user-unfriendly territory; getting this far requires significant knowledge on the part of users to locate shared libraries on disk.)

With OpenSC loaded, Firefox now displays another token slot present:

With OpenSC module loaded, PIV cards/token visible

Caveat emptor: This part of the codebase appears to be very unstable. “Log In” fails regardless of PIN presented, and simply removing a token can crash Firefox.

With tokens recognized, we can revisit the scenario from previous post. Start a simple TLS web-server emulated by openssl, which is configured to request optional client-authentication without any restrictions on acceptable CAs. (This example uses a self-signed certificate for the server and will require adding an exception to get past the initial error dialog.) Visiting the page in Firefox with a PIV card/token attached brings up this prompt:

Firefox certificate authentication prompt

When there are multiple certificates on the card, the drop-down allows switching between them. Compared to the clean UI in Chrome, the Firefox version is busy and dense with information drawn from the X509 certificate fields.

Choosing a certificate and clicking OK proceeds to the next step to collect the PIN for authenticating to the card:

Firefox smart-card PIN prompt

After entering the correct PIN we have our mutually authenticated connection set up to retrieve a web page from openssl running as server:

Firefox retrieving a page from OpenSSL web server, using client-authentication

Having walked through how web authentication without passwords works in two popular, cross-platform web browsers, the next post will look at arguments for/against deploying this approach.

[continued]

Bitcoin’s meta problem: governance (part II)

To explain why Bitcoin governance appears dysfunctional, it helps to explain why changing Bitcoin is very difficult to begin with- any group or organization tasked with this responsibility is facing very long odds. Comparisons to two others large-scale system are instructive.

Case study: the web

The world-wide web is a highly decentralized system with millions of servers providing content to billions of clients using HTML and HTTP. Software powering those clients and servers is highly diverse. MSFT, Mozilla and Google are duking out for web-browser market share (Apple, if one includes iPhone), while on the server side nginx and Apache are the top predators. Meanwhile there is a large collection of software for authoring and designing web-pages, with the expectation that they will be rendered by those web browsers. Mostly for historical reasons the ownership of standards is divided up between IETF for the transport protocol HTTP and W3C for the presentation layer of HTML. But neither organization has any influence over the market or power to compel participants in the system to implement its standards. IETF specifications even carry the tentative, modest designation of “RFC” or “request for comments.” There is no punishment for failing to render HTML properly (if there were, Internet Explorer would have been banished eons ago) or for that matter introducing proprietary extensions.

That seems like a recipe for getting stuck, with no way to nudge the entire ecosystem to adopt improved versions of the protocol, or prevent fragmentation where every vendor introduces their own slightly incompatible variant in search of competitive advantage. (As an afterthought, they might even ask the standards body to sanction these extensions in the next version of the “standard”) But in reality the web has shown a remarkable plasticity in evolving and adapting new functionality. From XmlHttpRequest which fueled the AJAX paradigm for designing responsive websites to security features like Content Security Policy or improved versions of TLS protocol, web browsers and servers continue to add new functionality. Two salient properties of the system help:

The system is highly tolerant of change and experimentation. Web browsers ignore unknown HTTP headers in the server response. Similarly they ignore unknown HTML tags and attempt to render the page as best as they can. If someone introduces a new tag or HTTP header that is recognized by only one browser, it will not break the rest of the web. Some UI element may be missing and some functionality may be reduced, but these are usually not critical failures. (Better yet, it is easy to target based on audience capability, serving one version of the page to new browsers and a different one with to legacy versions.) That means it is not necessary to get buy-in from every single user before rolling out a new browser feature. The feature may get traction as websites adopt it, putting competitive pressure on other browsers to support it and eventual standardization. That was the path for X-Frame-Options and similar security headers originally introduced by IE. Or it may crater and remain yet another cautionary tale about attempts to foist proprietary vendor crud on the rest of the web- as was the case with most other MSFT “extensions” to HTML including VBScript, ActiveX controls and behaviors.
There is competition among different implementations. (This was not always the case; MSFT Internet Explorer enjoyed a virtual monopoly in the early 2000s, which not coincidentally was a period of stagnation in the development of the web.)
There exists a standardization process for formalizing changes and this process has credible claim to impartiality. While software vendors participate in work carried out by these groups, no single vendor exercises unilateral control over the direction of standards. (At least that is the theory- hijacking of standards group to lend the imprimatur of W3C or IETF on what is effectively a finished product already implemented by one vendor is not uncommon.)

The challenge of changing Bitcoin

Bitcoin is the exact opposite of the web.

Intolerant of experimentation

Because money is at stake, all nodes on the network have to agree on what is a valid transaction. There needs to be consensus about the state of the blockchain ledger at all times. Consensus can drift temporarily when multiple miners come up with new blocks at the same time, and it is unclear at first which will emerge as the “final” word on the ledger. But the system is designed to eliminate such disagreements quickly and have everyone converge on a single winning chain. What it can not tolerate is a situation where nodes permanently disagree about which block is valid because there is some new feature only recognized by some fraction of nodes. That makes it tricky to introduce new functionality without playing a game of chicken with upgrade deadlines.

There is a notion of soft-forks for introducing new features which “only” requires a majority of nodes to upgrade as opposed to everyone. These are situations where the change happens to be backwards compatible in the sense that a nodes that does not upgrade will not reject valid transactions using the new feature. But it may incorrectly accept bogus transactions, because it is not aware of additional criteria implied by that feature. Counter-intuitive as that sounds, this approach works because individual nodes only accepts transactions when they are confirmed by getting included by miners in the blockchain. As long as the majority of miners have upgraded to enforce new rules, bogus transactions will not make it into the ledger. This soft-fork approach has been flexible enough to implement a surprising number of improvements, including segregated-witness most recently. But there are limits: expanding blocksize limit can not be done this way because nodes would outright reject blocks exceeding the hardcoded limit even if miners mint them. That would require a hard-fork, which is the disruptive model where everyone must upgrade by a particular deadline. Those who fail face the danger of splitting off into a parallel universe where transactions move funds in ways that are not reflected in the “real” ledger recognized by everyone else.

No diversity in implementations

Until recently, virtually all Bitcoin peers were running a single piece of software (Bitcoin Core) maintained by the aptly named core team. Even today that version retains over 80% market share while its closest competitors are forks that are identical in all but one feature, namely the contentious question of how to increase maximum blocksize.

No impartial standard group

The closest to an organization maintaining a “specification” and deciding which Bitcoin improvement-proposal or BIPS gets implemented is the core team itself. It’s as if W3C is not only laying down the rules of HTML, but also shipping the web-browser and the web-server used by everyone in the world. Yet for all that power, that group still has no mandate or authority to compel software upgrades. It can release new updates to the official client with new features, but it remains up to miners and individual nodes to incorporate that release.

Between a rock and a hard-fork

This leaves Bitcoin stuck in its current equilibrium. Without the flexibility of experimenting with new features in a local manner by anyone with a good idea, all protocol improvements must be coordinated by a centralized group- the ultimate irony for a decentralized system. That group is vested with significant power. In the absence of any articulated principles around priorities- keeping the system decentralized, keeping miners happy, enabling greater Bitcoin adoption etc.- all of its decisions will be subject to second-guessing, met with skepticism or accusations of bias. Without the mandate or authority to compel upgrades across the network, even a well-meaning centralized planning committee will find it difficult to make drastic improvements or radical changes, lest the system descend into chaos with a dreaded hard-fork. Without the appetite to risk hard-forks, every improvement must be painstakingly packaged as a soft-fork, stacking the deck against timely interventions when the network is reaching a breaking point.

Making USB even more dicey: encrypted drives

Imagine you were expecting to documents from a business associate. Being reasonably concerned with op-sec, they want to encrypt the information en route. Being old fashioned, they also opt for snail-mailing an actual physical drive in the mail instead of using PGP or S/MIME to deliver an electronic copy. Is it safe to connect that USB drive that came out of the envelope? If this is bringing back memories of BadUSB, let’s take the exercise one step further: suppose this drive uses hardware-encryption and requires installing custom applications before users can access the encrypted side. But conveniently there is a small unencrypted partition containing those applications ready to install. Do you feel lucky today?

This is not hypothetical- it is how typical encrypted USB drives work.

Lost/stolen USB drives have been implicated in data breaches and encryption-at-rest is a solid mitigation against such threats. But this well-meaning attempt to improve security against one class of risks ends up reducing operational security overall against a different one: users are trained to install random applications from a USB drive that shows up in their mailbox. (It’s not a stretch to extend that bad habit to drives found anywhere, based on recent research.) Meanwhile the vendors are not helping their own case with a broken software distribution model and unsigned applications. Here is a more detailed peek at the Kingston DataTraveler Vault.

On connecting the drive, it will appear as a CD-ROM. This is not unusual; USB devices can encapsulate multiple interfaces as a single composite device. In this case the encrypted contents are not visible yet because the host PC does not have the requisite software. Looking into the contents of that emulated “CD” we find different solutions intended to cover Windows, OSX and Linux.

Windows

Considering this is an enterprise product and most enterprises are still Windows shops, one would expect the most streamlined experience here. Indeed- Windows itself recognizes the presence of an installer as soon as the drive is connected and asks about what to do. (Actually asking for user input is a major improvement over past Windows versions cursed with the overzealous autorun feature, which happily executed random binaries from any connected removable drive.)

If we decline this offer and decide to take a closer look at the installer, we can see that it has been digitally signed by the manufacturer using Authenticode, a de facto Windows standard for verifying the origin of binaries:

Looking at the digital signature of the installer in Explorer

Using the signtool utility:

Using signtool to examine Authenticode signature details

The signature includes a trusted timestamp hinting at the age of the binary. (Note the certificate is expired but the signature is still valid. This is possible only because the third-party timestamp provides evidence that the signature was originally produced at a point in time while the certificate was still valid.)

This particular signature uses the deprecated SHA1 hash algorithm, but we will give Kingston a pass on that. The urgency of migrating to better options such as SHA256 did not become apparent until long after this software had shipped. Authenticode features also include checking for certificate revocation, so we can be reasonably confident that Kingston is still in control of the private-key used to sign their binaries and they did not experience a Stuxnet moment. (Alternative view: if they have been 0wned, they are not aware of it and have not asked Verisign to revoke their certificate.)

Overall there is reasonable evidence suggesting that this application is indeed the legitimate one published by Kingston, although it may be slightly outdated given timestamps.

OSX

OSX applications can be signed and the codesign utility can be used to verify those signatures. Did Kingston take advantage of it? No:

What- me worry about code signing?

Not that it matters; a support page suggests that installing that application would have been a lost cause anyway:

The changes Apple made in MacOS 10.11 disabled the functionality of our secure USB drives. It will cause the error ‘Unable to start the DTXX application…’ or ‘ERR_COULD_NOT_START’ when you attempt to log into the drive. We recommend that you update your Kingston secure USB drive by downloading and installing one of the updates provided in links below.

(As an aside, it is invariably Apple’s fault when OS updates break existing applications.)

Luckily it is much easier to verify the integrity of this latest version- the download link uses SSL (although not the support page linking to the download, allowing for MITM attacks to substitute a different one.) More importantly Kingston got around to signing this one:

Kingston_signed_OSX_update

Linux

It is a rare surprise when any vendor attempts to get an enterprise scenario working on Linux. At this point, asking for some proof of the authenticity of binaries might be too much and the screenshot above confirms that.

In fairness Linux does not have its own native standard for signing binaries. There are some exceptions- boot loaders are signed with Authenticode thanks to UEFI requirements and some systems such as Fedora also require kernel-modules to be signed when secure-boot is used. There are also higher-level application frameworks such as Java and Docker with their own primitive code-signing schemes. But state-of-the-art code authentication in open source is PGP signatures. Typically there is a file containing hashes of files, which is cleartext signed using one of the project developers’ keys. (As for establishing trust in that key? Typically it is published on key servers and also available for download over SSL on the project pages- thus paradoxically relying on hierarchical certificate-authority model of SSL to bootstrap trust in the distribute PGP web-of-trust.) Luckily Kingston did not bother with any of that; Linux directory just contains a handful of ELF binaries without the slightest hint of verifying their integrity.

On balance

Where does that leave this approach to distributing files?

Users on recent versions of Windows have a fighting chance to verify that they are running a signed binary (Even that small victory comes with a long list of caveats: “signed” is not a synonym for “trustworthy.” There are dangerous applications which are properly signed, including remote-access tools.)
Users on OSX have to go out of their way to download the application from less dubious sources, by hunting around for it on the vendor website
Linux users should abandon hope and follow standard operating procedure: launch a disposable, temporary virtual machine to run dubious binaries and hope they did not include a VM-escape exploit.

Even considering 100% Windows environments, there is collateral damage: training users that it is perfectly reasonable to run random binaries that arrive on a USB drive. In the absence of additional controls such as group policy, users are one click away from bypassing the unsigned binary warning to execute something truly dangerous. (In fact given that Windows has sophisticated plug-and-play framework with capability to download drivers from Windows Update on demand, it is puzzling that any users action to install an application is required.)

Bottom line: encrypting data on removable drives is a good idea. But there are better ways of doing that without setting bad opsec examples, such as encouraging users to install random applications of dubious provenance.

Ethereum and lessons on how to wreck a decentralized system

For what has been billed as a decentralized platform for smart-contracts, Ethereum is proving surprisingly amenable to central control. It turns out that when push comes to shove and a too-big-to-fail Ethereum application appropriately called The DAO is on the verge of losing all investor money, proponents of trustless systems start agitating for interventions and bailouts that would make Bernanke blush.

With damage toll from The DAO attack at 3.5M ether and counting, Ethereum team is gearing up to introduce a generic blacklist for Ethereum clients to stop the stolen DAO funds from being cashed out. Starting with the vulnerable DAO contract itself, this blacklist will contain the list of “bad actors” on the network who will be prevented from participating transactions and for good reasons- after all somebody said they are bad people.

But in the security field it is well-known blacklists are a fragile design. Trying to enumerate all known bad actors in a system without a strong notion of identity is playing a game of whack-a-mole. Banned miscreants disappear and reappear under a different pseudonym starting over with a clean reputation. It is much better to whitelist known good entities and only let those people into the network than trying to kick out bad apples after the fact.

So if the objective is to destroy the decentralized nature of Ethereum by exerting control on who gets access to the network, here is a proposal for doing it far more effectively and in a scalable way:

Today anyone can participate in the Ethereum network by generating a cryptographic key. Your network “address” is derived from the public key. Armed with an address, users can send/receive Ether from other participants, launch new contracts or interact with existing ones. That’s not good. How can we distinguish the good guys in white-hats from the bad guys in black-hats if the hoi polloi are allowed on the network without so much as a background check?
Instead let us require that all Ethereum public-keys be certified with an X509 certificate,issued by a trusted third-party CA after vetting the identity of that person. This the same system that underlies confidence in the web. It guarantees that when consumers visit a dubious website asking for their bank login, they will feel much better after being tricked into giving it away.
There is undeniably some “friction” involved in getting certificates (Compounded by the fact that the enrollment process will only work on Windows XP at the outset.) It is necessary to incentivize users to stick with the righteous path. To that end, all Ethereum wallets will display a shiny padlock icon when they receive payments from a certified contract.
- To be clear: certified Ethereum addresses do not receive any preferential treatment from miners, nor are any safer than plain uncertified addresses. It remains a core principle that all Ethereum addresses are equal. But some are more equal than others.
Certificate issuance is a very competitive low-margin business for CAs, with a race to the bottom in prices. In order to help boost CA bottom-lines, an enhanced type of Ethereum certificate called Extraneous Validation or EV will be introduced requiring consumers to submit DNA samples for highest assurance levels. (Privacy concerns will be allayed by discarding those samples without actually checking them, in keeping with the traditional standards of CA due-diligence.) EV rating will naturally include a premium user experience too: compliant wallet software shall display full-screen animation of coins raining down from the sky whenever EV addresses/contracts are printed.
- In addition, transactions involving EV addresses must be stored at lower regions of the process heap. Higher memory locations are akin to nose-bleed seats and unworthy of favored Ethereum contracts.

It is expected that the Ethereum Politburo will take up this proposal as part of their next 5-year plan.

Long live the decentralized revolution.

Getting by without passwords: web-authentication (part I)

There is a recurring theme in the previous series on replacing passwords by cryptographic hardware: all the scenarios were under unilateral control of the user. If you plan to use SSH private-keys for connecting to a remote service such as Github, it is transparent to that service whether those keys are stored locally on disc protected by a passphrase or stored on external tamper-resistant hardware. Same goes for disk-encryption and PGP email signing; the changes required to move from passwords to hardware tokens are localized to that device. In this post we look at a more difficult use-case: authentication on the web.

The fundamental problem is that one exercises very little control over the authentication choices offered by a website. That decision is made by the persons controlling the web server configuration. Until about 5 years ago in fact, there were no options at all: passwords were the only game in town. Only a handful of websites offered two-factor authentication of any type. It was not until Google first deployed the feature that going beyond passwords became an aspirational goal for other services hoping to compete on security. But the majority of 2FA designs did not in fact dispense with passwords; they were more accurately “password-plus” concepts, incremental in nature and trying to augment a user-selected password with an additional factor such as SMS message and one-time passcode generated by a mobile app.

Authentication integrated into SSL/TLS

Given the dearth of options, it may come across as a surprise that the fundamental security protocol that secures web traffic— SSL or TLS, Transport Layer Security in the modern terminology— supports a better alternative. Even more surprisingly, that feature has existed nearly since the inception of the protocol in the late 1990s. The official name for the feature is client-authentication. That terminology is derived from the roles defined in SSL/TLS protocols: a user with a web-browser (“client”) establishes a secure connection to a website (“server”.) 99.999% of the time, only one side of that connection is authenticated as far as TLS is concerned– the server. It is the server that must present a valid digital certificate, and manage corresponding cryptographic keys. Clients have no credentials at the TLS layer. User authentication, to the extent it happens, occurs at a higher application layer; for example with a password typed into a web page and submitted for verification. While that is very much of a legitimate type of authentication, it takes place outside the purview of TLS protocol, without leveraging any functionality built into TLS.

This functionality built into TLS for authenticating users is the exact mirror-image of how it authenticates servers: using public-key cryptography and X509 certificates. Just as websites must obtain server-certificates from a trusted certificate-authority, or CA for short, users obtain client-certificates from a trusted CA. (It could even be the same CA; while the commercial prospects for issuing client-certificates is dim compared to server certificates, a few intrepid CAs have ventured into personal certificate issuance.) But more importantly, users must now manage their own private-keys, the cryptographic secrets associated with those certificates.

Therein lies the first usability obstacle for TLS client-authentication: these credentials are difficult to manage. Consider that passwords are chosen by humans, with the full expectation that they will be typed by humans. They can be complex, random looking string of characters but more likely, they are simple, memorable phrases out of a dictionary. To “roam” a password from one computer to another, that person needs only her own memory and fingers. By comparison, cryptographic keys are not memorable. There is too much entropy, too many symbols to juggle for all but the few blessed with a photographic memory. (Meanwhile attempts to derive cryptographic keys from memorable phrases rarely have good outcomes- consider the BrainWallet debacle for Bitcoin.) It’s relatively easy to generate secret keys on one device and provision a certificate there. The real challenge lies in moving that secret to another device, or creating a backup such that one can recover from the loss/failure of the original. Imagine being locked out of every online account because your computer crashed.

Compounding that is a historical failure to converge on standards: while every piece of software can agree on using passwords the same way, there is great variety (read: inconsistency) in how they handle cryptographic secrets. If you can login to a website using Firefox with a given password, you can count on being able to type the exact same password into Chrome, Internet Explorer or for that matter, a mobile app to achieve same result. The same is not true of cryptographic keys. To pick a simple example: openssh, PGP and openssl all perform identical operations using private-keys (signing a message) in the abstract, from a mathematical perspective. But they differ in the format they expect to find those keys such that it is non-trivial to translate keys between different applications. Even different web browsers on the same machine can have their own notions of what certificates/keys are available. Firefox uses its own “key-store” regardless of operating system, while Chrome defers to the operating system (for example, leveraging CAPI on Windows and tokend on OSX) and Internet Explorer only recognizes keys defined through Windows cryptography API. It is entirely possible for a digital certificate to exist in one location while being completely invisible to the other applications that look for their credentials in another one.

It is no wonder then that TLS client authentication has been relegated to niche applications, typically associated with government/defense scenarios with UI that qualifies for the label “enterprise-quality” Browser vendors rightfully assume that only hapless employees of large organizations who are required to use that option by decree of their IT department will ever encounter that UI.

Server-side support

Assuming that one can control the configuration of a server, Apache, IIS and nginx can all be set up with client-authentication. At a high-level this involves:

Specifying a set of certificate authorities (CAs) trusted for issuing client certificates. This is the counterpart of clients maintaining their own collection of trust-anchors to validate server certificates presented by websites. Getting this right is important because it influences client behavior. During the TLS handshake, the server will actually send the client this list to serve as hint for picking the correct certificates. The user-agent can show different UI depending on whether there is zero, one or multiple options for the user to choose from.
- Unlike web-browsers who implicitly trust hundreds of public CAs in existence, trust-anchors for the server are bound to be very limited. Most public CAs do not issue client certificates to individuals, so there is no point to including them in the group. Typically TLS client authentication is deployed in closed-loop systems: members of an organization accessing internal resources, with exactly one CA also operated by the same entity to issue everyone their credentials.
- There is also the degenerate case of trusting any issuer, which includes self-signed certificates. That is not useful for authentication by itself: it only establish a pseudonym, by proving continuity of identity. “This is the same person that authenticated last time with same public-key.” But it can be coupled with other, one-time verification mechanisms to implement strong authentication.
Deciding whether client-authentication is mandatory vs optional. In the first case, the server will simply drop the connection when the client lacks valid credentials. There is no opportunity for the application layer to present a user-friendly error message, or even redirect them to an error page. That requires exchanging HTTP traffic, which is not possible when the lower-layer TLS channel has not been setup. For that reason it is wiser to use the optional setting, and have application code handle errors by checking if a certificate has been presented.

Fine-grained control & renegotiation

One crucial difference between server capabilities is that Apache and IIS can apply client-authentication to specific paths such as only pages under /foo directory. By comparison nginx can only enable the feature for all URLs fielded by a given server. It is not possible to configure on a per-path basis.

At first blush, it is somewhat puzzling that Apache and IIS can distinguish between different pages and know when to prompt the user. Recall that TLS handshake takes place before any application level data- in this case HTTP- has been exchanged. Meanwhile the URL path requested is conveyed as part of the HTTP request. So how is it possible for a web server to decide when to request a client certificate? Let’s rule out one trick: this behavior is not implemented by always prompting for a client certificate and then ignoring authentication failures based on path. Requesting a client certificate can modify the user experience: some user-agents simply panic and rop the connection, others such as early version of IE might display a dialog box with an empty list of certificates that confuses users.

The answer is a little-known and occasionally problematic feature of TLS: renegotiation. Most TLS connections involve a single handshake completed at the beginning, followed by exchanging application level data. But the protocol permits the server to request a new handshake from scratch, even after data has been exchanged. This is what allows Apache and IIS to provide fine-grained control: initially a TLS connection is negotiated with server authentication only. This allows the client to start sending HTTP traffic. Once the request is received, the server can inspect the URL and decide whether that particular page requires additional authentication for the client. In that case the exchange of application-level HTTP traffic is temporarily paused, and a lower-level TLS protocol message (namely ServerHello) is sent to trigger a second handshake. This time the server includes an additional message type (namely CertificateRequest) signaling the client to present its certificate.

Client-side support

Popular web browsers have also supported client-authentication since late 1990s, but the user-experience often leaves much to be desired. As an additional challenge, our objective is not just using public-key authentication but also using external cryptographic hardware to manage those keys. That introduces the added complexity of making sure the web browser can interface with such devices.

Chrome

Here is Chrome on OSX displaying a modal dialog that asks the user to choose from a list of three certificates:

TLS Client Authentication prompt in Google Chrome 51, expanded view

This dialog appears when accessing a simple demo web-server that requested client authentication. (It is an expanded version of the original abridged dialog, which starts out with only the list at the top, without detailed certificate information below.) All available certificates are displayed, including those that are stored locally on the machine, and those which are associated with a hardware token- in this case a PIV token- currently connected. Note that the UI does not visually distinguish; but hardware credentials only appear in the listing when the token is connected.

Choosing one of the certificates on the PIV token leads to a PIN prompt:

Chrome PIN prompt for hardware token

Chrome turns out to be something of a best-case scenario:

It is integrated with the OSX key-chain and tokend layer for interfacing with cryptographic hardware (In theory tokend has been deprecated since 10.7, but in practice remains fully functional and supported via third-party stacks.)
Credentials do not need to be explicitly “registered” with Chrome, as long as there is a tokend module which supports them. This example uses the same OpenSC tokend that makes possible previous scenarios such as smart-card login on OSX. Connecting the PIV token is enough.
The dialogs themselves are part of OSX, which creates uniform visuals. For example here is the same UI from Safari- note the icon overlaid on the padlock is now the Safari logo:

TLS client authentication prompt in Safari 9.1, compact view

For an example of a more tricky setup, we next turn to Firefox and getting the same scenario working there.

[continued]

Box enterprise key-management: in cloud we (still) trust

Notions of privacy in cloud-computing

Private computation in the cloud has been around—in concept, if not in actual implementation— for almost as long cloud computing or “grid computing” as its early predecessors were known by a distinctly industrial sounding moniker. From the outset, concerns about data security has been one of the primary obstacles to outsourcing computing infrastructure to third-parties. What happens to proprietary company information when it is sitting on servers owned by somebody else? Can this cloud-provider be trusted to not peek at customer data or tamper with the operation of services that tenants run inside the virtual environment? Can the IaaS provider guarantee that some rogue employee can not help themselves to the confidential data uploaded there? What protections exist if government agencies with creative interpretations of the fourth-amendment show up at the door? Initially cloud providers were quick to brush aside these concerns, or at best respond with appeals to brand authority and credentials (ISO 27001 certified, PCI-compliant etc.) Once customers proved skeptical and demanded actual evidence of operational security, special cases were crafted: Amazon provides a dedicated cloud for its government customers, presumably with improved controls and isolated from the unwashed masses with their own VMs running less sensitive applications.

Meanwhile the research community has welcomed the emerging questions as the starting point for a research agenda around computing on encrypted data. These schemes take a drastic step by placing no trust in cloud providers, positing that they will only receive encrypted data which they can not decrypt- not even temporarily, an important distinction that critically fails for many of the existing systems, as we will see. The question becomes whether cloud services can still perform meaningful operations on encrypted data, such as searching for keywords or number-crunching, producing results which can only be decrypted by the original owner. That approach holds a lot of promise, provided it can be implemented efficiently. It preserves the advantage of cloud computing (lease CPU cycles, RAM and disk space from someone else on demand) while maintaining confidentiality of the data processed.

Status quo

Between these ends of the spectrums, private-computation in the cloud appears to be caught in a chasm between a rock and a hard place:

Stuff that can not possibly work: impassioned self-declarations of honesty/competence by the vendor
Stuff that does not yet work: promising research in academic literature around special-cases without a feasible solution for the general case

Solutions in the first category effectively boil down to the premise: “trust us, we will not peek at your data.” Some of these are transparently non-technical in nature: for example warrant canaries are an attempt to work-around the gag-orders accompanying national security letters (NSLs) by using the absence of a statement to hint at some incursion by law enforcement. Others attempt to bury those critical trust assumptions in layers of complex technology.

Box & enterprise key-management

Take for instance Box’s enterprise key-management. On paper this is attempting to address a legitimate scenario discussed in earlier posts: storing data in the cloud encrypted such that the cloud-provider can not read the data even if it wanted to. This is a far-cry from how popular cloud storage providers operate today: by default Google Drive, Microsoft One Drive and Dropbox have full access to customer data. Sure, they may store that data encrypted within their own data-center– a capability hilariously touted as some ground-breaking privacy feature or competitive advantage. In reality such encryption is only there to protect against risks internal to the cloud service provider: rogue employees, theft of hardware from data-centers etc. At the end of the day that layer of encryption is fully reversible by the hosting service, without any cooperation required by the original custodian.

Far more robust are designs which guarantee that the storage provider can not recover user data, even if they were compelled or employees were hit with the evil stick. Until a few years ago, that level of security would have seemed a quixotic requirement out of paranoid minds. Along came Snowden and a sea-change happened: cloud-providers started trying to build actual cryptographic protections instead of vouching for their good intentions.

The solution Box has announced with much fanfare decidedly fails to achieve that objective. To see, why let’s review the outline of that design to the extent that can be gleamed from public sources:

There is a master-key for each “customer” (defined as an enterprise, rather than end-user; recall that Box distinguishes itself from Dropbox and similar services by focusing on managed IT environments.)
As before, individual files uploaded are encrypted with a key that Box generates.
The new twist is that those individual bulk-encryption keys are in turn encrypted by the master-key
So far, this is only adding a hierarchical aspect to key management. Where EKM is different is transferring custody of the master-key back to the customer, specifically to HSMs hosted at AWS (via CloudHSM) and backed up by additional HSMs hosted in the customer data-center carrying replicas of the same secrets. (It is unclear whether this is a symmetric key or asymmetric key-pair but the latter design would make far more sense. It would allow encryption to proceed locally without involving remote HSMs and only decryption to require interaction.)

Box implies that this last step is sufficient to provide “Exclusive key control – Box can’t see the customer’s key, can’t read it or copy it.”

Is that sufficient? Let’s ask what could go wrong.

Trust by any other name

First observe that the bulk-data encryption keys are generated by Box. These keys need to be generated “randomly” and discarded afterwards, keeping only the version wrapped by the master-key. A trivial way for Box to retain access to customer data— for example, if ordered by law enforcement— is to generate keys using a predictable scheme or simply stash aside the original key.

Second, note that Box can still decrypt data anytime, as long as the HSM interface is up. Consider what happens when employee Alice uploads a file and shares it with employee Bob. At some future instant, Bob will need to get a decrypted copy of this file on his machine. By virtue of the fact Box must be given access to HSMs, there must exist at least one path where that decryption takes place within Box environment, with Box making an authenticated call to the HSM. (Tangent: Box has a smart-client and mobile app so in theory decryption could also be taking place on the end-user PC. In that model HSM access is granted to customer devices instead of Box service itself, keeping the trust boundary internal to the organization. But that model faces practical difficulties in implementation. Among other things, HSM access involves some shared credentials- for example in the case of Safenet Luna SA7000s used by CloudHSM, there is a partition passphrase that would need to be distributed to all clients. There is also the problem that user Alice could decrypt any document, even those she did not have access to by permission. Working around such issues would require additional infrastructure, such as placing another service in front of HSMs to authenticates users based on their own enterprise identity, rather than Box account. Even then there is the scenario for files from a web-browser when no such intelligence exists to perform on the fly decryption client-side.)  That raises two problems. Most obvious one is that the call does not capture user intent. As Box notes, any requests to HSM will create an audit-trail but that is not sufficient to distinguish between the cases:

Employee Bob is really trying to download the file Alice uploaded
Some Box insider went rogue and wants to read that document

While there is an authentication step required to access HSMs, those protocols can not express whether Box is acting autonomously versus acting on behalf of a user at the other side of the transaction requesting a document.

The second problem applies even if Box refrains from making additional HSM calls in order to avoid arousing suspicion (Just to be on the safe side, in case the enterprise is checking HSM requests against records of what documents its own employees accessed, even though the latter is provided by Box and presumably subject to falsification.) During routine use of Box, in the very act of sharing content between users, plaintext of the document is exposed. If Box wanted to start logging documents— because it has gone rogue or is being compelled by law enforcement— it could simply wait until another user tries to download the same document, in which case decryption will happen. No spurious HSM calls are required. For that matter Box could wait until Alice makes some revisions to the document and uploads a new version in plaintext.

Point-in-time trust vs ongoing trust

Stepping back from these specific objections, there is an even more fundamental flaw in this concept: customers still have to trust that Box has in fact implemented a system that works as advertised. This is not one-time trust at the outset, but ongoing trust for the lifetime of the service. The first case would have been easy to accept. It is the type of optimistic assumption one makes all the time: when purchasing hardware from a manufacturer, one hopes the units were not back-doored. If the manufacturer decides to go rogue after the units are shipped, it is too late; they can not go back and compromise existing inventory in the field. (Barring auto-update or remote-access mechanisms, of course.) Here the problem is much worse: Box can go “rogue” anytime to start logging cleartext data, silently escrow keys to another party or simply use weak keys that can be recovered later. Now current Box employees will no doubt swear upon a stack of post-IPO shares that no such shenanigans are taking place. This is the same refrain: “trust us, we are honest.” They are almost certainly right. But to outsiders their cloud service is an opaque black-box: there is no way to verify that such claims are accurate. At best an independent audit may confirm the claims made by the service provider, corroborating one pledge of good-intentions with another ( “trust Ernst & Young, they are honest too”) without altering the core dynamic: this design critically relies on competent and faithful execution by cloud provider to guarantee privacy.

A miss is as good as a mile

Why single out Box when this is the modus operandi for most popular cloud-storage services? Because of all cloud scenarios, storage is most amenable to end-to-end privacy. Unlike searching encrypted documents or computing complex functions over a spreadsheet of encrypted columns, there is no cutting-edge research problem here. There are no breakthroughs required in efficient homomorphic encryption, no waiting for more iterations of Moore’s law to get hardware fast enough. It is already possible to implement remote storage in the cloud with nearly zero trust. In the same way that one does not have to worry about whether the disk-drive on their laptop is selling out their, one can design systems such that service providers are truly glorified disk-drives in the cloud with no visibility into the bits they are storing. (It is of course a different question whether such services can be as those that “add value” by actively processing stored data and trying to offer additional services around it.) Such providers compete on availability– providing service without having outages or losing data. They never have to compete on privacy by making heart-felt statements about their good intentions. Cryptography properly employed already solves that problem.

Looking back on the Google Wallet plastic card

[Full disclosure: This blogger worked on Google Wallet 2011-2013]

Google recently announced that its Wallet Card launched in 2013 will be discontinued this summer:

“After careful consideration, we’ve decided that we’ll no longer support the Wallet Card as of June 30. Moving forward, we want to focus on making it easier than ever to send and receive money with the Google Wallet app.”

This is the latest in a series of changes starting with the rebranding of original Google Wallet into Android Pay. It is also a good time to look back on the Wallet Card experiment in the context of the overall consumer-payments ecosystem.

Early iteration of the Wallet card

Boot-strapping NFC payments

The original Google Wallet launched in 2011 was a mobile application focused on contactless payments or colloquially tap-and-pay: making purchases at bricks-and-mortar stores using the emerging Near Field Communication wireless interface on Android phones. NFC already enjoyed support from payment industry, having been anointed as the next generation payment interface combining the security of chip & PIN protocols with a modern form factor suitable for smartphones. (Despite impressive advances in manufacturing every-thinner phones, it’s still not possible to squeeze one into a credit-card slot, although LoopPay had an interesting solution to that problem.) There were already pilot projects with cards supporting NFC, typically launched without much marketing fanfare. At one point Chase and American Express shipped NFC-enabled cards. That is remarkable considering that on the whole, banks have been slow to jump on the more established contact-based chip & PIN technology. NFC involves even more moving parts. Not only must the card contain a similar chip to execute more secure payment protocols, but it requires an antenna and additional circuitry to draw power wirelessly from the field generated by a point-of-sale terminal. In engineering terms, that translates into more opportunities for a transaction to fail and leave a card-holder frustrated.

Uphill battle for NFC adoption

Payment instruments have multiple moving pieces controlled by different entities: banks issue the cards, merchants accept them as a form of payment with help from payment-processors and ultimately consumers make payments. Boot-strapping a new technology can be either an accelerating virtuous-cycle or stuck in a chicken-and-egg circularity.

Issuers: It’s one thing for banks to be issuing NFC-enabled plastic cards on their own, quite another for those cards to be usable through Google Wallet. After all the whole point of having a card with chip is that one can not make a functioning “copy” of that card by typing in the number, expiration-date and CVC into a form. Instead the bank must cooperate with the mobile-wallet provider (in other words, Google) to provision cryptographic keys over the air into special hardware on the phone. Such integrations were far from standardized in 2011 when Wallet launched, leaving customers with only two choices: a Citibank MasterCard and a white-label prepaid card from Metabank. Not surprisingly, this was a significant limitation for consumers who were not existing Citibank customers or interested in the hassle of maintaining a prepaid card. It would have been a hard slog to scale up one issuer at a time but an even better option presented itself with the TxVia acquisition: virtual cards for relaying transactions transparently via the cloud to any existing major credit-card held by the customer. That model wasn’t without its own challenges, including unfavorable economics and fraud-risk concentration at Google. But it did solve the problem of issuer support for users.
Merchants: Upgrading point-of-sale equipment is an upfront expense for merchants, who are reluctant to spend that money without a value proposition. For some being on the cutting edge is sufficient. When mobile wallets were new (and Google enjoyed ~3 year lead before ApplePay arrived on the scene) it was an opportunity to attract a savvy audience of early-adopters. But PR benefits only extend so far. Card networks did not help the case either: NFC transactions still incurred same costs in credit-card processing fees, even though expected fraud rates are lower for when using NFC compared to magnetic stripes which are trivially cloned.
Users: For all the challenges of merchant adoption, there was still a decent cross-section of merchants accepting NFC payments in 2011: organic grocery-chain Whole Foods, Peet’s coffee, clothing retailer Gap, Walgreens pharmacies, even taxicabs in NYC. But merchants were far from being the only limiting factor for Google. In the US wireless carriers represented an even more formidable obstacle. With Verizon, AT&T and T-Mobile having thrown in their lot with a competing mobile payments consortium called ISIS (later renamed Softcard to avoid confusion with the terrorist group) they lobbied to block their own subscribers from installing Google Wallet on their phones.

Another pre-release design

From virtual to physical: evolution of the proxy-card

Shut out of its own customers’ devices and locked in an uneasy alliance with wireless carriers over the future of Android, Google turned to an alternative strategy to deliver a payment product with broader reach, accessible to customers who either did not have an NFC-enabled phone or could not run Google Wallet for any reason. This was going to be a regular plastic card, powered by the same virtual card technology used in NFC payments.

For all intents and purposes, it was an ordinary MasterCard that could be swiped anywhere MasterCard was accepted. It could also be used online for card-not-present purchases with CVC2 code. Under the covers, it was a prepaid-card. Consumers could only spend existing balances loaded ahead of time. There was no credit extended, no interests accruing on balances, no late fees. It did not show up on credit history or influence FICO scores.

There would still be a Google Wallet app for these users; it would show transactions and managing funding sources. But it could not be used for tap-and-pay. NFC payments— once the defining feature of this product— had been factored out from the mobile application, becoming an optional feature available to a minority of users when the stars aligned.

Prepaid vs “prepaid”

But there was one crucial difference from the NFC virtual-card: users had to fund their card ahead of time with a prepaid balance. That might seem obvious given the “prepaid” moniker, yet it was precisely a clever run-around that limitation which had made the Google Wallet NFC offering a compelling product. When users paid tapped their phone, the request to authorize that transaction was routed to Google. But before returning a thumbs up or down, Google in turn attempted to place a charge for the exact same amount on the credit-card the customer had setup in the cloud. The original payment was authorized only after this secondary transaction cleared. In effect, the consumer has just funded their virtual-card by transferring $43.98 from an existing debit/credit card, and immediately turned around to spend that balance to make a purchase which coincidentally was exactly $43.98.

Not for the plastic card: there was an explicit stored-value account to keep track of. This time around that account must “prepaid” for real, with an explicit step taken by the consumer to transfer funds from an existing bank-account or debit/credit card associated with the Google account. Not only that but using a credit card as funding source involves explicit fees to the tune of 2.9% to cover payment processing. (If the same logic applied to NFC scenario, $97 purchase at the cash-register would have been reflected as $100 charge against the original funding source.)

The economics of the plastic card necessitate this. Unlike its NFC incarnation, this product could be used at ATMs to withdraw money. If there were no fees for funding from a credit-card, it would have effectively created a loop-hole for free cash-advances: tapping into available credit on a card without generating any of the interchange fees associated with credit transactions. While having to fund in advance was a distinct disadvantage, in principle existing balance could be spent through alternative channels such as purchases from Google Store or peer-to-peer payments to other users. But none of those other use-cases involve swiping— which raises the question: what is the value proposition of a plastic card in the first place?

Final version

End of the road, or NFC reaches critical mass

In retrospect the plastic card was stuck in no man’s land. From the outset it was a temporary work-around, a bridge solution until mobile-wallets could run on every device and merchants accepted NFC consistently. That first problem was eventually solved by jettisoning the embedded secure-element chip at the heart of the controversy with wireless carriers, and falling back to a less-secure but more open alternative called host-card emulation. As for the second problem, time eventually took care of that with a helping hand from ApplePay which gave NFC a significant boost. In the end, the plastic proxy-card lived out its shelf-life, which is the eventual fate for all technologies predicated on squeezing out a few more years out of swipe transactions, including dynamic/programmable stripes and LoopPay.