Safenet HSM key-extraction vulnerability (part II)

Exploit conditions

One question we have not addressed until now is the threat model. Typically before deriving related-keys and HMACing our chosen message, we have authenticate to the HSM. In the case of our Luna G5, that takes place out-of-band with USB tokens and PIN entered on external PIN-entry-device, or PED, attached to the HSM. For CloudHSM it uses a more rudimentary approach involving passwords sent by the client. (Consequently CloudHSM setup can only achieve level-2 security assurance in FIPS 140-2 evaluation criteria while PED-authenticated versions can achieve level-3.) Regardless of the authentication mode, the client must have a logged in session with HSM to use existing keys.. It is enough then for an attacker to compromise the client machine in order to extract keys. That may sound like a high barrier or even tautological- “if your machine is compromised, then your keys are also compromised.” But protecting against that outcome is precisely the reason for using cryptographic hardware in the first place. We offload key management to special-purpose, tamper-resistant HSMs because we do not trust our off-the-shelf PC to sufficiently resist attacks. The assumption is that even if the plain PC were compromised, attackers only have a limited window for using HSM keys and only as long as they retain persistence on the box, where they risk detection. They can not exfiltrate keys to continue using them after their access has been cut off. That property both limits damage and gives defenders time to detect/respond. A key extraction vulnerability such as this breaks that model. With a vulnerable HSM, temporary control over client (or HSM credentials, for that matter) allows permanent access to key outside the HSM.

PKCS #11 object attributes

The vulnerability applies to all symmetric keys, along with elliptic curve private-keys. There is one additional criteria required for exploitation: the key we are trying to extract must permit key-derivation operations. PKCS#11 defines a set of boolean attributes associated with stored objects that describe usage restrictions. In particular CKA_DERIVE determines whether a key can be used for derivation. A meta-attribute CKA_MODIFIABLE determines whether other attributes (but not all of them) can be modified. Accordingly an object that has CKA_DERIVE true or CKA_MODIFIABLE true— which allows arbitrarily changing the former attribute— is vulnerable.

Surprisingly many applications create keys with all of these attributes enabled, even when the operation is not meaningful. For example the Java JSP provider for Safenet creates keys with modifiable attribute set to true, and all possible purposes enabled. If a Bitcoin key were generated using that interface, the result would support not only digital signature- which is the only meaningful operation for Bitcoin keys, as they are used to sign transactions- but also wrap/unwrap, decryption and key derivation. It requires using the low-level PKCS #11 API to correctly configure attributes according to the principle of least-privilege, with only intended operations enabled. In fairness, part of the problem is that the APIs can not express the concept of an “ECDSA key” at generation time. This is obvious for the generic Java cryptography API which uses a generic “EC” type for generating elliptic curve keys. The caller does not specify ahead of time the purpose that key is being generated. Similarly PKCS #11 does not differentiate based on object type but relies on attributes. A given elliptic-curve private key can be used in ECDSA for signing, ECDH key-agreement to derive keys or ECIES for public-key decryption depending on whether corresponding CKA_* attributes are set.

Mitigation

Latest firmware update from Safenet addresses the vulnerability by removing weak key-derivation schemes. This is the more cautious approach. It is preferable to incremental tweaks such as attempting to set a minimum key-length, which would not be effective. For example if the HSM still allowed extract-key-from-key but required a minimum of 16 bytes, one could trivially work around it: prepend (or append) known 15 bytes to an existing key, then extract the first (or last, respectively) 16 bytes. Nominally the derived key is 16 bytes long and satisfies the constraints. In reality all but one byte is known and brute-forcing this key is no more difficult than brute forcing a single byte.

Likewise it is tempting to blame the problem on extract-key-from-key but other bit-flipping and splicing mechanism are equally problematic. All of the weak KDF schemes permit “type-casting” keys between algorithms, allowing attacks against one algorithm to be applied to keys that were originally intended for a different one. For example an arbitrary 16-byte AES can not be brute-forced given state-of-the-art today. But suppose you append/prepend 8 known bytes to create a 3DES key, as Safenet HSMs permit with the concatenate mechanisms. (Side-note: Triple-DES keys are 21 bytes but they are traditionally represented using 24 bytes with least-significant bit reserved as parity check.) The result is a surprisingly weak key that can be recovered using a meet-in-the-middle attack with the same time complexity as recovering a single-DES key, albeit at the cost of using a significant amount of storage. Similarly XOR and truncation together can be used to recover keys by exploiting an unusual property of HMAC: appending a zero-byte to an HMAC key does not alter its outputs, up to the block size of the hash function. Even XOR alone without any truncation is problematic when applied to 3DES, where related-key attacks against the first and third subkey are feasible.

Workarounds using PKCS#11 attributes

Since the attack relies on using key-derivation mechanisms, the following work-around seems natural for protecting existing keys: set CKA_DERIVE to false which will prevent the key from being used in derivation mechanism and also set CKA_MODIFIABLE to false, making the object “read-only” going forward. This does not work; the CKA_MODIFIDABLE attribute is immutable and determined at time of key generation. If the key was not generated with proper set of attributes, it can not be protected after the fact. But there is a slightly more complicated work-around that uses object cloning. While a modifiable object can not be “fused” into a read-only object, it is possible to duplicate it and assign new attributes to the clone. This is the one opportunity for changing CKA_MODIFIABLE attribute to false. (Incidentally the transition in the opposite direction is disallowed: it is not possible to make a modifiable clone of an object that started out being immutable.) That creates a viable work-around: duplicate all objects and set modifiable/derive attributes to false in the new copy, delete the original. Applications may have to be reconfigured to use the new copy, which will have a different numeric handle, but could retain same label as original, if keys were being looked-up by name.

One limitation of this approach is that some secrets are intended for key-derivation. For example that secp256k1 private-key could have been used for ECDH key-agreement. That operation happens to be considered “key-derivation” according PKCS#11. That means CKA_DERIVE can not be set to false without rendering the key unusable. Per-object policy does not distinguish between derivation mechanisms at a granular level.

FIPS to the rescue?

Safenet HSMs have an option to be configured in “strict-FIPS” mode. This setting is defined by administrator at HSM-level and disables certain weak algorithms. At first we were hopeful this could be the one time where FIPS demonstrably improves security by outright mitigating a vulnerability. That turns out not to be the case. Even though the documentation states that weak algorithms are “disallowed” in FIPS mode, the restrictions only come into play when using keys. For example HSM will still generate a single DES key in strict-FIPS mode; but it will refuse to perform single-DES encryption. As for the problematic key-derivation mechanisms at the heart of this vulnerability: they are still permitted, as is HMAC using very short secrets.

Even if strict-mode FIPS worked as expected, it is not practical for existing users. Switching FIPS policy is a destructive operation; all existing keys are deleted. Instead a more indirect operation is required: backup all keys to a dedicated backup device, switch FIPS setting and restore from the backup or another HSM. After all that trouble any defensive gains would still be short-lived: nothing prevents switching the FIPS mode back and restoring from backups again.

Residual risks: cloning

The same problem with backup-and-restore also applies to cloning. Safenet defines a proprietary replication protocol to copy keys from one unit to another, as long as they share certain configurations:

Both HSMs must have same authentication mode: eg password-authenticated (FIPS 140-2 level 2) or PED-authenticated (FIPS 140-2 level 3)
Both HSMs must be configured with the same cloning domain. This is an independent password or set of PED keys, distinct from “crypto-officer” or “crypto-user” credential required to use existing keys.

Strangely cloning works even when source/target HSM have different FIPS settings- it is possible to clone from an HSM in strict FIPS mode to one that is not. More surprisingly, it also works across HSMs with different firmware versions. So there is still an attack here: clone all keys from a fully-patched HSM to a vulnerable unit controlled by the attacker. Weak key-derivation algorithms will be enabled (on purpose) in this latter unit, allowing the attack to be carried out.

How serious is this risk? Cloning requires exactly the same access as working with existing keys in the HSM: for the USB connected Luna G5, that is a USB connection. For the SA7000 as featured in AWS CloudHSM, it can be done remotely over the network. In other words an attacker who compromises a machine authorized to use the HSM, they get this access for free. The catch is that an additional credential is required, namely the cloning domain. Unlike standard “user” credentials necessary to operate the HSM, cloning-domain is not used under normal operation, only when initializing HSMs. Compromising a machine that is authorized to access the HSM guarantees compromise of the user role (or “partition owner” role in Safenet terminology.) But it does not guarantee that cloning-domain credentials can be obtained from the same box, unless the operators were being sloppy in reusing same passphrase.

On Safenet HSM key-extraction vulnerability CVE-2015-5464 (part I)

This series of posts is provides a more in-depth explanation of the key-extraction vulnerability we discovered and reported to Safenet, designated as CVE-2015-5464.

PKCS11

Safenet HSMs are closely based on the PKCS#11 specification. This is a de facto standard designed to promote interoperability between cryptographic hardware by providing a consistent software interface. Imagine how difficult it would be to write a cryptographic application such as Bitcoin wallet to work with external hardware if each device required a different API for signing a Bitcoin transaction. Certainly at low-level differences between devices are apparent: some connect over USB while others are addressed over TCP/IP, each device typically requires different device driver much like brands of printers do. Instead PKCS11 seeks to provide a higher-level point where these differences can be abstracted behind a unified API, with a vendor-provided PKCS#11 module translating each function into the appropriate commands native to that brand of hardware.

PKCS#11 is a very complex standard with dozens of APIs and wide-range of cryptographic operations, called “mechanisms” for everything from encryption to random number generation. Safenet vulnerability involves the key derivation mechanisms. These are used to create a cryptographic key as a function of another key. For example BIP-32 for Bitcoin proposes the notion of hierarchical-deterministic wallets where a family of Bitcoin addresses are derived from a single “seed” secret. Designed properly, key-derivation provides such an amplification effect while protecting the primary secret.Even if a derived key is compromised, the damage is limited. One can not work their way back to the seed. But when designed improperly, the derived key has a simple relationship to the original secret and leaks information about it.

Some options are better left unimplemented

That turns out to be the problem with several of the key-derivation mechanisms defined in PKCS#11 and implemented by Safenet. (To give a flavor of what is supported, here is the list of options presented by the demonstration utility ckdemo shipped as part of Safenet client.) Many of these are sound. A few are problematic, with varying consequences. For example the ability to toggle secret-key bits using XOR and perform operations with the result leads to exploitable conditions for certain algorithms.

Related-key cryptanalysis is the specific branch specializing in these attacks. It turns out that for Safenet HSMs, we do not need to dig very deep into cryptanalytic results. There are at least two mechanisms that are easy to exploit and work generically against a wide-class of algorithms: extract-key-from-key and XOR-base-and-data.

Slicing-and-dicing secrets

Extract-key-from-key is defined in section 6.27.7 of PKCS#11 standard version 2.30. It may as well have been renamed “extract-substring” as the analog of standard operation on strings. This derivation scheme creates a new key by taking a contiguous sequence of bits at desired offset and length from an existing key. Here is an example of this in action with ckdemo utility provided by Safenet.

We start out with an existing 256-bit AES key with handle #37. Here are its PKCS #11 attributes:

PKCS #11 attributes of original AES key

Note CKA_VALUE_LEN attribute is 0x20 in hex, corresponding to 32 bytes as expected for 256-bit AES. Because the object is sensitive, those bytes comprising the key can not be displayed. But we can use key-derivation mechanism to extract a two-byte subkey from the original. We pick extract-key-from-key mechanism, start at the most-significant bit (ckdemo starts indexing bit-positions at 1 instead of 0) and extract 2 bytes:

Using extract-key-from-key derivation

Now we look at attributes of the derived key. In particular note that its length is reported as 2 bytes:

PKCS #11 attributes of derived key

So what can we do with this resulting two-byte key, which is not going to be very difficult to brute-force? Safenet supports HMAC with arbitrary sized keys so we can HMAC a chosen message:

HMAC chosen message using derived key

Given this primitive, the attack is straightforward: brute-force the short key by trying all possibilities against known message/HMAC pairs. In this case we get 0x5CD3 since:

$ echo -n ChosenMessage | openssl dgst -sha256 -hmac `echo -en "\x5c\xd3"` 
(stdin)= 1db249f0e928b3aeff345aedaa3365ea690f06f3710433fc4a063b4cfffbe930

That corresponds to the two most-significant bytes of the original key. Now we can iterate: derive another short-key at different offset (say bits 17 through 32), brute-force that using a chosen message attack, repeat until all key bytes are recovered. Fully automated, this requires a couple of seconds with Luna G5, much less time with the more powerful SA7000 used in CloudHSM. Main trade-off is available computing power to brute-force key fragments offline. Given more resources, larger fragments of multiple contiguous bytes can be recovered at a time, necessitating fewer key derivation and HMAC operations. (Also since we have a chosen-plaintext attack with HMAC input that we control, there are time-space tradeoffs to speed up key recovery by building look-up tables ahead of time.)

Surprisingly this works not only against symmetric keys such as AES or generic HMAC secrets but also against elliptic-curve private keys (RSA, plain DSA and Diffie-Hellman were not affected.) This is an implementation quirk: these mechanisms are typically intended for symmetric-keys only. For elliptic-curve keys, the byte array being truncated is the secret scalar part of the key. For example the “secret” component for a Bitcoin ECDSA key is a discrete logarithm in secp256k1. Internally that discrete logarithms is just stored as 32-byte scalar value, and extract-key-from-key can be used to successively reveal chunks of that scalar value.

XOR-base-and-data suffers from a very similar problem. This operation derives a new key by XORing user-chosen data with original secret key. While there are cryptographic attacks exploiting that against specific algorithms such as 3DES, a design choice made by Safenet leads to simpler key recovery attack that works identically against any algorithm: when the size of data is less than size of the key, result is truncated to data size. XORing 256-bit AES key with one-byte data results in one-byte output. That provides another avenue for recovering a key incrementally: we derive new HMAC key by XORing with successively longer sequences of zero bytes, with only the last segment of new key left to brute-force at each step.

[continued in part II: exploit conditions, workarounds and mitigations]

NFC hotel keys: when tag-types matter

Sometimes low-tech is better for security, particularly when compared to choosing the wrong type of latest technology. Growing use of NFC in hotel keys provides some instructive examples.

The hospitality industry started out with old-fashioned analog keys, where guests were given actual physical objects. Not surprisingly, they promptly proceeded to lose or misplace those keys, creating several problems for the hotel. First one is getting replacement keys made— certainly there are additional copies lying around, but they are still finite in number compared to the onslaught of careless guests. But more costly is having to potentially change that lock itself. With a key missing (and very likely attached to a key-tag bearing the hotel logo, along with room number) there is now someone else out there who could get into the room at will to burglarize the place. Upgrading to programmable card keys conveniently solves both problems. With a vast supply of cheap plastic stock, the hotel need not even charge guests for failing to return their key at checkout. Lost cards are not a security problem either, as the locking mechanism itself can be programmed to reject previous keys.

Until recently most of these cards used magnetic-stripes, the same 40-year old technology found in traditional credit-cards. It is trivial to clone such cards by reading the data encoded on the card and writing it over to a new blank card. Encrypting or authenticating card contents will not help— there is no need to reverse-engineer what those bits on the stripe represent or modify them. We are guaranteed that an identical card with same exact bits will open the door as long as the original does. (There may well be time-limitations, such as check-out date encoded in the card beyond which the door will not permit entry but those apply to both the original and clone.)

Jury is out on whether cards are easier to copy than old-fashioned keys. In both cases, some type of proximity is required. Reading magnetic stripe involves swiping the card through a reader; this can not be done with any stealth while the guest remains in possession of the card. Physical keys can also be copied at any home-improvement store but more surprisingly a clone can be made from a photograph of the key alone. The bitting, that pattern of bumps and ridges, is the abstract representation of the key that is sufficient to create a replica. This can be extracted from an image, as famously demonstrated for Diebold voting-machines. One paper from 2008 even used a high-powered zoom lenses to demonstrate feasibility of automatically extracting bitting from photographs taken 200ft away from the victim.

Next cycle of upgrades in the industry is replacing magnetic stripes by NFC. This blogger recently used such a key and decided to scan it using the handy NXP TagInfo application on Android:

Ultralight hotel key

How does this fare in comparison to the previous two options? It is arguably easiest to clone.

First notice the tag type is Ultralight. As the name implies, this is among the simplest types with least amount of memory. More importantly, Ultralight tags have no security mechanism against protecting their contents against reading. They are effectively 48 bytes worth of open books, available for anyone to read without authentication. Compare that to Mifare DESFire or even Mifare Classic, where each sector on the card has associated secret keys for read/write access. Data on those tags can only be accessed after NFC reader authenticates with the appropriate key.

So far this is not that different from magnetic stripes. The problem is NFC tags can be scanned surreptitiously while they are still on the person. It requires proximity; “N” does stand for “near” and typical read-ranges are limited to ~10cm. The catch is tags can be read through common materials such as cotton, plastic or leather. Metal shielding is required to effectively protect RFID-enabled objects against reading. It takes considerable skill to grab a card out of someone’s pocket, swipe it through a magnetic-stripe reader and return it undetected. It is much easier to gently bump into the same person while holding an ordinary phone with NFC capabilities.

The business case for upgrading a magnetic-stripe system to Ultralight is unclear. It certainly did not improve the security of guest rooms. By all indications it is also more expensive. Even the cheapest NFC tag still costs more than the ubiquitous magnetic stripe which has been around for decades. Same goes for readers mounted at every door required to process these cards. The only potential cost savings lie in using off-the-shelf mobile devices such as Android tablets for encoding cards at the reception desk, a very small piece of overall installation costs. (Speaking of mobile, this system also can not implement the more fancy use case of allowing customers to use their own phone as room key. While NFC hardware in most devices is capable of card-emulation, it can only emulate specific tag types. Ultralight is not one of them.) For a small incremental cost, the company providing this system could have used Ultralight C tags, which have three times as much memory, but also crucially, support 3DES-based authentication with readers that makes cloning non-trivial.

Dual-EC, BitLocker disk encryption and conspiracy theories

[Full disclosure: this blogger worked at MSFT but has not been involved in BitLocker development]

Infosec community is still looking for a replacement since the cancellation of TrueCrypt. Last year the mysterious group behind the long standing disk-encryption system announced they were discontinuing work. In a final insult to users, they suggested current users migrate to BitLocker, the competing feature built into Windows. It could not have been worse timing, just when NCC Group announced to great fanfare their completion of an unsolicited security audit on the project. (Not to worry; there are plenty of audit opportunities left in OS/2 and DEC Ultrix for PDP11, to take other equally relevant systems as TrueCrypt.) What to do when your favorite disk encryption system has reached end-of-life? Look around for competing alternatives and weigh their strengths/weaknesses for starters. However a recent article on Intercept looking at Windows BitLocker spends more time spinning conspiracy theories than helping users migrate to BitLocker. There are four “claims” advanced:

Windows supports the dual-EC random number generator (RNG) which is widely believed to have been deliberately crafted by the NSA to be breakable
BitLocker is a proprietary implementation, and its source code is not available for review
MSFT will comply with law-enforcement requests to provide content
MSFT has removed the diffuser from BitLocker without a good explanation, demonstrably weakening the implementation

Let’s take these one by one.

“Windows has dual-EC random number generator”

It is true that Windows “next-generation” crypto API introduced in Vista supports dual-EC RNG, widely believed to have been designed by the NSA with a backdoor to allow predicting its output. In fact it was a pair of MSFT employees who first pointed out in a very restrained rump-session talk at 2007 Crypto conference that dual-EC design permits a backdoor without speculating on whether NSA itself had availed itself of the opportunity. Fast forward to Snowden revelations, and RSA Security finds itself mired in a PR debacle when it emerged that the company accepted $10M payment from the NSA for incorporating dual-EC.

Overlooked in the brouhaha is that while dual-EC has been available as an option in Windows crypto API, it was never set as the default random number generator. Unless some application went out of its way to request a different RNG— and none of the built-in Windows features including BitLocker ever did that— the backdoor would have sat idle. (That said it creates interesting opportunities for post-exploit payloads: imagine state-sponsored malware whose only effect on target is switching default system RNG, with no other persistence.)

From a product perspective, the addition of dual-EC RNG to Vista can be considered as a mere “checkbox” feature aimed at a vocal market segment. There was a published standard from NIST called SP800-90 laying down a list of officially-sanctioned RNG. Such specifications may not matter to end-users but carry a lot of weight in government/defense sector where deployments are typically required to operate in some NIST-approved configuration. That is why the phrase “FIPS-certified” makes frequent appearances in sales materials. From MSFT perspective, a lot of customers required those boxes to be checked as a prerequisite for buying Windows. Responding to market pressure, MSFT added the feature and did so in exactly the right way such “niche-appeal” features should be introduced: away from the mainline scenario, with zero impact on majority of users who do not care about it. That is the main difference between RSA and Windows: RSA made dual-EC the default RNG in their crypto library. Windows offered it as an option, but never set as the default the system RNG. (It would have made no sense; in addition to security concerns, it was plagued by dog-slow performance compared to alternatives based on symmetric ciphers such as AES counter-mode.)

Bottom line: Existence of a weak RNG as an additional option to satisfy some market niche— an option never used by default— has no bearing on the security of BitLocker.

“BitLocker is not open-source”

Windows itself is not open-source either but that has never stopped people from discovering hundreds of significant vulnerabilities by reverse engineering the binaries. Anyone is free to disassemble any particular component of interest or single-step through it in a debugger. Painstaking as that effort may be compared to reading original source, thousands of people have made a career out of this within the infosec community. In fact Microsoft even provides debug symbols drawn from source-code to make that task easier. As far as closed-source binaries go, Windows is probably the most carefully examined piece of commercial software with an entire cottage industry of researchers working to make that process in crafty ways. From comparing patched binaries against their earlier version to reveal silently fixed vulnerabilities to basic research on how security features such as EMET operate, being closed-source has never been a hurdle to understanding what is going on under the hood. The idea that security research community can collectively uncover hundreds of very subtle flaws in the Windows kernel, Internet Explorer or the graphics subsystem— massively complex code-bases compared to BitLocker— while being utterly helpless to notice a deliberate backdoor in disk encryption is laughable.

Second, many people past and present did get to look at Windows source code at their leisure. Employees, for starters. Thousands of current and past MSFT employees had the opportunity to freely browse Windows code, including this blogger during his tenure at MSFT. (That included the separate crypto codebase “Enigma” which involved signing additional paperwork related to export-controls.) To allege that all of these people, many of whom have since left the company and spoken out in scathing terms about their time, are complicit in hiding the existence of a backdoor or too oblivious/incompetent to notice its presence is preposterous.

And it is not only company insiders who had many chances to discover this hypothetical backdoor. Some government customers were historically given access to Windows code to perform their own audit. More recently the company has opened transparency centers in Europe inviting greater scrutiny. The idea that MSFT would deliberately include a backdoor with full knowledge that highly sophisticated and cautious customers— including China, not the most trusting of US companies— would get to pore over every line, or for that matter provide doctored source-code to hide the backdoor, is equally preposterous.

Bottom-line: Being open-source may well improve the odds for security community at large to identify vulnerabilities in a particular system. (But even that naive theory of “given enough eyeballs, all bugs are shallow” has been seriously questioned in the aftermath of Shellshock and never-ending saga of OpenSSL) But being closed-source in and of itself can not be a priori reason to disqualify a system on security grounds, much less serve as “evidence” that a hidden backdoor exists after having survived years of reverse-engineering in arguably the most closely scrutinized proprietary OS in the world. Linking source-code availability to security that way is a non-sequitur.

“MSFT will comply with law-enforcement requests”

This is a very real concern for content hosted in the cloud. For data stored on servers operated by MSFT such as email messages at Hotmail/Outlook.com, files shared via One Drive or Office365 documents saved to the cloud, the company can turn over content in response to an appropriate request from law enforcement. MSFT is not alone in that boat either; same rules apply to Google, Facebook, Twitter, Yahoo, DropBox and Box. Different cloud providers compete along the privacy dimension based on product design, business model, transparency and willingness to lobby for change. But they can not hope to compete in the long run on their willingness to comply with existing laws on the books or creative interpretations of these laws.

All that aside, BitLocker is disk encryption for local content. It applies to data stored on disk inside end-user machine and removable media such as USB thumb-drives. It is not used to protect content uploaded to the cloud. (Strictly speaking one could use it to encrypt cloud storage, by applying BitLocker-To-Go on virtual disk images. But that is at best a curiosity, far from mainstream usage.)

On the surface then it seems there is not much MSFT can do if asked to decrypt a seized laptop with BitLocker enabled. If disk encryption is implemented properly, only the authorized user possesses the necessary secret to unlock. And if there is some yet-to-be-publicized vulnerability affecting all BitLocker usage such as cold-boot attacks, weak randomness or hardware defects in TPMs, there is no need to enlist MSFT assistance in decryption. Law enforcement might just as well exploit that vulnerability on their own, using their own offensive capabilities. Such a weaknesses would have existed all along, before the laptop is seized pursuant to an investigation. There is nothing MSFT can do to introduce a new vulnerability after the seizure, any more than they can go back in time to back-door BitLocker before it was seized.

But there is a catch. Windows 8 made a highly questionable design decision to escrow BitLocker keys to the cloud by default. These keys are stored associated with the Microsoft Live account, presumably as a usability improvement against forgotten passphrases. If a user were to forget their disk encryption passphrase or the TPM used to protect keys malfunctions, they can still recover as long as they can log into their online account. That capability provides a trivial way for MSFT to assist in the decryption of BitLocker protected volumes: tap into the cloud system to dig up escrowed keys. Good news is that default behavior can be disabled; in fact, it is disabled by default in enterprise systems presumably because MSFT realized IT departments would not tolerate such a cavalier attitude around key management.

Bottom-line: There is a legitimate concern here, but not in the way the original article envisioned. Intercept made no mention of the disturbing key-escrow feature in Windows 8. Instead the piece ventures into purely speculative territory around Government Security Program from 2003 and other red-herrings around voluntary public/private-sector cooperation involving MSFT.

“MSFT removed the diffuser”

For a change, this is a valid argument. As earlier posts mentioned, full-disk encryption suffers from a fundamental limitation: there is no room for an integrity check. The encryption of one sector on disk must fit exactly on that one sector. This would not be a problem if our only concern was confidentiality, or preventing other people from reading the contents of data. But it is a problem for integrity, detecting whether unauthorized changes were made. In cryptography this is achieved by adding an integrity check to data. That process is frequently combined with encryption because both confidentiality and integrity are highly desirable properties.

But in FDE schemes without any extra room to stash an integrity check, designers are forced to take a different approach. They give up on preventing bad guys from making changes, but try to make sure those changes can not be controlled with any degree of precision. In other words you can flip bits in the encrypted ciphertext stored on disk, and it will decrypt to something (without an integrity check, there is no such thing as “decryption error”) but that something will be meaningless junk; or so the designers hope. The original BitLocker diffusers attempted to achieve that effect, by “mixing” the contents within a sector such that modifying even a single bit of encrypted data would result in randomly introducing errors all over the sector after decryption. That notion was later formalized in cryptographic literature, standardized into modes such as XTS that are now supported by self-encrypting disk products on the market.

Fast forward to Windows 8 and the diffuser mysteriously goes away, leaving behind vanilla AES encryption in CBC mode. With CBC mode it is possible to introduce partially controlled changes at the level of AES blocks. (“Partial” in the sense that one block can be modified freely but then the previous block is garbled.) How problematic is that? It is easy to imagine hypothetical scenarios based on what the contents of that specific location represent. What if it is a flag that controls whether firewall is on and you could disable it? Or registry setting that shuts off ASLR? Or enables kernel-debugging, which then allows controlling the target with physical access? It turns out a more generic attack is possible in practice involving executables. The vulnerability was already demonstrated with LUKS disk-encryption for Linux. Suppose that sector on disk happens to hold an executable file that will be run by the user. Controlled changes mean the attacker can modify the executable itself, controlling what instructions will be executed when that sector is decrypted to run the binary. In other words, you get arbitrary code execution. More recently, the same attack was demonstrated against the diffuser-less BitLocker.

So there is a very clear problem with this MSFT decision. It weakens BitLocker against active attacks where the adversary gets the system to decrypt the disk after having tampered with its contents. That could happen without user involvement if decryption is done by TPM alone. Or it may be an evil-maid attack where the laptop is surreptitiously modified but the legitimate owner, being oblivious, proceeds to unlock the disk by entering their PIN.

Bottom-line: Windows 8 did weaken BitLocker, either because the designers underestimated the possibility of active attacks or made a deliberate decision that performance was more important. It remains to be seen whether Windows 10 will repair this.

Private cloud-computing and the emperor’s new key management (part II)

[continued from part I]

So what are the problems with Box enterprise-key management?

1. Key generation

First observe that the bulk data encryption keys are generated by Box. These are the keys used to encrypt the actual contents of files in storage. These keys need to be generated “randomly” and discarded afterwards, keeping only the version wrapped by the master-key. But access to the customer key is not required if one can recover the data-encryption keys directly. A trivial way for Box to retain access to customer data- for example, if ordered by law enforcement- is to generate keys using a predictable scheme or simply stash aside the original key.

2. Possession of keys vs. control over keys

Note that Box can still decrypt data anytime, as long as the HSM interface is up. For example consider what happens when employee Alice uploads a file and shares it with employee Bob. At some future instant, Bob will need to get a decrypted copy of this file on his machine. By virtue of the fact Box must be given access to HSMs, there must exist at least one path where that decryption takes place within Box environment, with Box making an authenticated call to the HSM.**

That raises two problems. The first is that the call does not capture user intent. As Box notes, any requests to HSM will create an audit-trail but that is not sufficient to distinguish between the cases:

Employee Bob is really trying to download the file Alice uploaded
Some Box insider went rogue and wants to read that document

While there is an authentication step required to access HSMs, those protocols can not express whether Box is acting autonomously versus acting on behalf of a user at the other side of the transaction requesting a document. That problem applies even if Box refrains from making additional HSM calls in order to avoid arousing suspicion— just to be on the safe side, in case the enterprise is checking HSM requests against records of what documents its own employees accessed, even though the latter is provided by Box and presumably subject to falsification. During routine use of Box, in the very act of sharing content between users, plaintext of the document is exposed. If Box wanted to start logging documents- because it has gone rogue or is being compelled by an authorized warrant- it could simply wait until another user tries to download the same document, in which case decryption will happen naturally. No spurious HSM calls are required. For that matter Box could just wait until Alice makes some revisions to the document and uploads a new version in plaintext.

3. Blackbox server-side implementation

Stepping back from specific objections, there is a more fundamental flaw in this concept: customers still have to trust that Box has in fact implemented a system that works as advertised. This is ongoing trust for the life of the service, as distinct from one-time trust at the outset. The latter would have been an easier sell because such leaps of faith are common when purchasing IT. It is the type of optimistic assumption one makes when buying a laptop for example, hoping that the units were not Trojaned from the factory by the manufacturer. Assuming the manufacturer was honest at the outset, deciding to go rogue at later point in time would be too late- they can not compromise existing inventory already shipped out. (Barring auto-update or remote-access mechanisms, of course.)

With a cloud service that requires ongoing trust, the risks are higher: Box can change its mind and go “rogue” anytime. They can start stashing away unencrypted data, silently escrowing keys to another party or generating weak keys that can be recovered later. Current Box employees will no doubt swear upon a stack of post-IPO shares that no such shenanigans are taking place. This is the same refrain: “trust us, we are honest.” They are almost certainly right. But to outsiders a cloud service is an opaque black-box: there is no way to verify that such claims are accurate. At best an independent audit may confirm the claims made by the service provider, reframing the statement into “trust Ernst & Young, they are honest” without altering the core dynamic: this design critically relies on competent and honest operation of the service provider to guarantee privacy.

Bottom line

Why single out Box when this is the modus operandi for most cloud operations? Viewing the glass as half-full, one could argue that at least they tried to improve the situation. One counter-point is that putting this much effort for negligible privacy improvement makes for a poor cost/benefit tradeoff. After going through all the trouble of deploying HSMs, instituting key-management procedures and setting up elaborate access-controls between Box and corporate data center, the customer ends up not much better than they would have been using vanilla Google Drive.

That is unfortunate because this problem is eminently tractable. Of all the different private-computing scenarios, file storage is most amenable to end-to-end privacy- after all there is not much “computing” going on, when all you are doing is storing and retrieving chunks of opaque ciphertext without performing any manipulation on it. Unlike solving the problem of searching over encrypted text or calculating formulas over a spreadsheet with encrypted cells, no new cryptographic techniques are required to implement this. (With the possible exception of proxy re-encryption; but only if we insist that Box itself handle sharing. Otherwise there is a trivial client-side solution, by decrypting and reencrypting to another user public-key.) Instead of the current security theater, Box could have spent about the same amount of development effort to achieve true end-to-end privacy for cloud storage.

** Tangent: Box has a smart-client and mobile app so in theory decryption could also be taking place on the end-user PC. In that model HSM access is granted to enterprise devices instead of Box service itself, keeping the trust boundary internal to the organization. But that model faces practical difficulties in implementation. Among other things, HSM access involves some shared credentials- for example in the case of Safenet Luna SA7000s used by CloudHSM, there is a partition passphrase that would need to be distributed to all clients. There is also the problem that user Alice could decrypt any document, even those she did not have access to by permission. To work around such issues, would require adding a level of indirection by putting another service in front of HSMs that authenticates users via their standard enterprise identity, not their Box account. Even then there is the scenario for files from a web-browser when no such intelligence exists to perform on the fly decryption client-side.

Private cloud-computing and the emperor’s new key management (part I)

The notion of private computation in the cloud has been around at least in theory for almost as long cloud computing itself, even predating the times when infrastructure-as-a-service went by the distinctly industrial sounding moniker “grid-computing.” That precedence makes sense, because it addresses a significant deal-breaker for many faced with the decision to outsource computing infrastructure: data security. What happens to proprietary company information when it is now sitting on servers owned by somebody else? Can this cloud-provider be trusted to not “peek” at the data or tamper with the operation of the services that tenants are running inside the virtual environment? Can the IaaS provider guarantee that some rogue employee can not help themselves to confidential data in the environment? What protections exist if some government with creative interpretation of fourth-amendment right comes knocking?

Initially cloud providers were quick to brush aside these concerns with appeals to brand authority and brandishing certifications such as ISO 27001 audits and PCI-compliance. Some customers however remained skeptical, requiring special treatment beyond such assurances. For example Amazon has a dedicated cloud for its government customers, presumably with improved security controls and isolated from the other riff-raff always threatening to break out of their own VMs to attack other tenants.

Provable privacy

Meanwhile the academic community was inspired by these problems to build a new research agenda around computing on encrypted data. These schemes assume cloud providers are only given encrypted data which they can not decrypt- not even temporarily, an important distinction that critically fails for many of the existing systems as we will see. Using sophisticated cryptographic techniques, the service provider can perform meaningful manipulations on ciphertext such as searching for text or number-crunching, producing results that are are only decryptable by the original data owner. This is a powerful notion. It preserves the main advantage of cloud computing: lease CPU cycles, RAM and disk space from someone else on demand to complete a task while maintaining confidentiality of the data being processed, including crucially the outputs from the task.

Cloud privacy in practice

At least that is the vision. Today private-computation in the cloud is caught in a chasm between:

Ineffective window-dressing that provides no meaningful security- subject of this post
Promising ideas that are not quite feasible at-scale yet, such as fully homomorphic encryption

In the first category are solutions which boil down to the used-car salesmen pitch: “trust us, we are honest and/or competent.” Some of these are transparently non-technical in nature: for example warrant canaries are an attempt to work-around the gag-orders accompanying national security letters by using the absence of a statement to hint at some incursion by law enforcement. Others attempt to cloak or hide the critical trust assumption in layers of complex technology, hoping that an abundance of buzzwords (encrypted, HSM, “military-grade,” audit-trail, …) can pass for a sound design.

Box enterprise key management

As an example consider enterprise-key management feature pitched by Box. On paper this is attempting to solve a very real problem discussed in earlier posts: storing data in the cloud encrypted in such a way that the cloud-provider can not read the data. To qualify as “private-computation” in the full sense, that guarantee must hold even when the service provider is:

Incompetent- experiences a data-breach by external attackers out to steal any data available
Malicious- decides to peek into or tamper with hosted data, in violation of existing contractual obligations to the customer
Legally compelled- required to provide customer data to law-enforcement agency pursuant to an investigation

A system with these properties would be a far-cry from popular cloud storage solutions available today. By default Google Drive, Microsoft One Drive and Dropbox have full access to customer data. Muddying the waters somewhat, they often tout as “security feature” that customer data is encrypted inside their own data-centers. In reality of course such encryption is complete window-dressing: it can only protect against risks introduced by the cloud service provider, such as rogue employees and theft of hardware from data-centers. That encryption can be fully peeled away by the hosting service whenever it wants, without any cooperation required by the original data custodian.

Design outline

The solution Box has announced with much fanfare claims to do better. Here is an outline of that design to the extent that can be gleamed from published information:

There is a master-key for each customer, where “customer” is defined as an enterprise rather than individual end-users. (Recall that Box distinguishes itself from Dropbox and similar services by focusing on managed IT environments.)
As before, individual files uploaded to Box are encrypted with a key that Box generates.
The new twist is that those individual bulk-encryption keys are in turn encrypted by the customer specific master-key

So far, this is only adding a hierarchical aspect to key management. Where EKM is different is transferring custody of the master-key back to the customer, specifically to HSMs hosted at Amazon AWS and backed-up by units hosted in the customer data-center holding duplicates of the same secrets keys. (It is unclear whether these are symmetric or asymmetric keys. The latter design would make more sense by allowing encryption to proceed locally without involving remote HSMs and only decryption to require interaction.)

Box implies that this last step is sufficient to provide “Exclusive key control – Box can’t see the customer’s key, can’t read it or copy it.” Is that sufficient? Let’s consider what could go wrong.

[continued in part II]

Interactive services detection and crypto hardware: when security features collide

It is not uncommon for security features to have unexpected interactions, undercutting each other. For example Tor and Bitcoin do not mix. More subtle are situations when one feature designed to mitigate a specific threat blocks some other security feature from working. This blogger recently ran into an example with Windows Server.

Enterprises frequently have to operate public-key infrastructure (PKI) systems to issue credentials to their own employees—and arguably such closed PKI systems have been far more successful than the house-of-cards that is SSL certificate issuance for the web. There are stand-alone certificate authority products such as the open-source EJBCA package but for most MSFT environments, the requirement is typically addressed by CA functionality built into Windows Server. Certificate Services is a role that can be added to the server configuration, either integrated with Active Directory (not necessarily colocated with the domain controller) or as stand-alone CA.

Offloading key-management

Since the security of PKI is critically dependent on security of the cryptographic keys used by the certificate authority, one of the standard ways to harden such a system is to move key material into dedicated cryptographic hardware. In enterprise environments, this usually means a hardware security module (HSM) connected to the CA servers. Lately the meaning of “HSM” has been greatly watered-down by companies suggesting that a glorified smart-card qualifies- perhaps these designers envision an unorthodox data-center layout with card readers glued to the side of server racks. But if one is willing to live without the improved tamper-resistance and higher performance of a dedicated HSM product, there is a more attractive option built-in. Starting with Windows 8, a Trusted Platform Module can be used to emulate virtual smart cards based on the GIDS specification.

Regardless of hardware choice, from the tried-and-true FIPS140 certified massive box to the jankiest USB token from an over-enthusiastic DIY project, these solutions all have the same defining feature. Private key material used by the CA for signing certificates is stored on an external device. No matter how badly the Windows server itself has been compromised, that key can not be extracted. (Of course the device will happily oblige if the compromised server asks to sign any message. That can be almost as bad as having direct key access when the signature has high value, as Adobe found out in the code-signing case.)

Getting along with external cryptographic hardware

At the nuts and bolts level, getting this scenario to work requires that Windows have some awareness of external cryptographic hardware. Windows crypto stack includes an extensibility layer for vendors to integrate their own device by authoring smart card mini-drivers. Certificate Services role in turn has an option to pick a particular cryptographic service provider during setup:

Screenshot from ADCS configuration UI

As noted earlier, the smart-card provider is actually a “meta-provider” that can route operations to other hardware using a mini-driver for that model of hardware. So the most direct route would be:

Create a virtual-smart card on TPM and initialize it with PIN
Configure Certificate Server to use smart-card provider
Generate the signing key on the virtual smart card

By all appearances, this process appears to work during initial configuration of certificate services. When the CA is being initialized, a PIN prompt for the virtual smart-card will appear and after authenticating, a self-signed certificate will be created as expected. (Assuming we are going with the most common configuration of a root certificate. There are other options such as creating a CSR to source a certificate from a third-party; in that case the CSR will be created correctly.)

But when the service itself is started, something strange happens. The certificate server management console can not connect to the service as RCPs time out. It appears to be stuck; in fact it does not look like it started successfully. It can not be stopped or restarted either, short of killing the hosting process. So what is going on?

Shatter attacks

Explaining why the CA service got stuck involves a flash back to 2005. Prior to Vista, different applications showing UI on a Windows desktop were not isolated from each other. For example a privileged application running with administrator or system account could show a prompt in the same desktop session that unprivileged user applications operate. While this might seem obvious— where else would the UI appear if not on the user’s screen?— there is a serious security problem here. By design applications can send UI-related messages, called “window messages,” to each other. For example one application can send a message to simulate clicking on a button in another application or pasting text into a dialog box.

The original example of this vulnerability dubbed Shatter, involved more than just simulating button clicks or faking keystrokes. It relied on the existence of a specific message that includes a callback function- effectively instructing the application to invoke arbitrary piece of code. As envisioned, these callback functions were supposed to have been specified by the application itself and accordingly trusted. But nothing prevented a different app running under a different OS account from injecting the same type of message into the message queue of another application. When you can influence the flow of execution in a process to the point of making it jump to an arbitrary specified memory address, you have full control. (The original attack also relied on injecting shell-code into the memory space of the target via an earlier window message simulating pasting of text.)

But even if that dangerous message type was deprecated or applications modified to validate the incoming callback before invoking it, the broader architectural problem remains: applications running at different privilege levels can influence each other. For example if the user opened an elevated command line prompt running as administrator, even her unprivileged user applications could send keystrokes to that window, executing arbitrary commands as administrator.

Vista introduced proper UI isolation to address this problem. These changes also affected a special class of applications that normally would never be expected to show UI or interact with users: services. But it turns out many background services are not content to run invisible in the background, and occasionally feel compelled to converse with users. Session-0 isolation comes into play for that case. There are now multiple sessions in Windows, and services all operate in the special privileged session 0. Any UI displayed there would have no effect on the main desktop the user is staring at. This uses the same principle as terminal server: if multiple users were logged into a server, an application opened by one user would only render on his/her desktop, with no effect on others users with their own remote desktop session.

Interactive services detection

Hiding UI from background services under the rug may trades the security problem for an application-compatibility problem. Services are not supposed to display UI directly. Instead they are supposed to have an unprivileged counterpart in the user session they can communicate with via standard inter-process channels such as named-pipes or LPC. Knowing that developers do not necessarily know- much less care to follow- best practices, Windows team faced the problem of accommodating “legacy” services. As far as the service is concerned, there is nothing obviously wrong. It has rendered UI and is waiting for the user to make a decision, which appears to be taking forever.

Interactive services detection attempts to solve that problem by detecting such UI and alerting the user in their own session with a notification. By acting on the notification, user can switch to session 0 temporarily, which has only that one dialog from the interactive service visible, and deal with the prompt.

“Our code never makes that mistake”

That provides an explanation for why certificate services is stuck:

During initial configuration of certificate services role, the cryptographic hardware is being accessed from an MMC console process running in the user session. PIN collection dialog renders without a problem.
During sustained operation of certificate services, the same hardware is accessed from a background service.

So why did interactive service detection not kick-in and alert the user that there is some UI demanding attention in session 0?

The answer is an optimistic assumption made by MSFT that by “now” (defined as Windows Server 2012 time-frame) all legacy services will have been fixed, rendering interactive service detection redundant. In WS2012 the feature is disabled by default. It turns out even Windows Server 2008 had traces of that optimism: 64-bit services were exempt on the theory that developers porting their service from x86 to x64 might as well be forced to fix any interactivity. But in this case the “faulty” code is the built-in certificate service running in native 64-bit mode from MSFT: credential-collection prompts from the smart-card stack are showing up in session 0.

Luckily the feature can be enabled with a registry tweak. With interactive service detection enabled, when certificate services starts up, the expected notification does show up in user session. Switching to session 0 one finds the familiar PIN prompt for the virtual smart card. (Note that entering the PIN is not required each time the CA uses the external crypto device to sign. It is only collected once to create an authenticated session, allowing the system to operate as a true hands-off service after the operator has kick-started it.)

Taskbar notification for interactive service

Interactive services detection dialog on main desktop

The problem is not confined to use of virtual smart cards. Other vendors appear to have run into the same problem. Thales who manufactures the nSafe (formerly nCipher) line of HSMs has a white-paper noting that interactive service detection must be enabled for their product to operate correctly. Oddly enough, the configuration dialog for certificate server already has a checkbox to indicate that administrator interaction may be required for use of signing keys. That alone should have been a hint that this “background service” may in fact need to interact with the administrator, especially when the UI-related behavior of vendor-specific drivers can not be known in advance.

From stock options to predatory lending for tech employees

“We are an investment firm interested in structuring liquidity for current or former [company name redacted] employees. If you or anyone you know is interested in help exercising options […] or is making a major life purchase, we may be able to help.”

Not exactly your average spam message. Addressed by name and referring specifically to a particular start-up, this email contained an offer for an unusual type of financial transaction: loan to help offset the cost of exercising options. Welcome to the Silicon Valley version of predatory lending.

To make sense of the connection between high-risk loans and technology IPOs, we need to revisit the history of equity-based compensation. While employee ownership is by no means new phenomenon, starting with 1990s it became a significant if not primary component of employee compensation for technology startups. Leaving established companies to join startups often involved exchanging predictable salaries for higher-risk, higher-reward structure conferred by equities. In many ways, the original success story for equity in technology was a blue-chip company predating that first dot-com bubble: Microsoft. “Never touch your options” was the advice offered to new employees, urging them to hold out on exercising as long as possible— because the price of MSFT much like tulip bulbs (and later real-estate) could only go up, the argument went.

Equity rewards come in two main flavors: stock grants or stock options. (Strictly speaking there are also important distinctions within each category, such as incentive-stock options versus non-qualifying stock options which we will put aside here.) Stock grants are straightforward. Employees are rewarded a number of shares as part of employment offer, subject to a vesting schedule. Typically there is a “cliff” at one-year for starting to vest some fraction—employees who don’t last that long end up with nothing. This is followed by incremental additions monthly or quarterly until the end of the vesting schedule. It is also not uncommon to issue refresher grants yearly or based on performance milestones.

Options, or more precisely call-options, represent the right to buy shares of stock at a predetermined price. For example, employee Alice joins Acme Inc when company stock is trading at $100. That would be her “strike price.” She is given a grant, which represents some number of option, also subject to a vesting schedule. Fast forward a year later when vesting starts, let’s say Acme stock is trading at $120— representing a remarkable 20% yearly increase, which was not uncommon for technology companies in their early growth phase. These options allow Alice to still purchase her shares at the original $100 price and if she chooses to, turn around and sell at prevailing market price.

Due to vagaries of accounting, options were far more favorable to a company than outright grant of stock. Downside is they transfer significant risk downstream to employees. Notwithstanding the irrational exuberance of asset bubbles, it turns out there is no iron-clad rule guaranteeing a sustained increase in stock prices, not even for technology companies. Even if the company itself is relatively healthy, stock price can be dragged down by overall economic trends— such as recession— such that current market price may well be below the strike price at any given time, rendering the options worthless. There is also a great element of volatility. Employees with start dates only weeks apart may observe very different outcomes if there were wild price swings determining the initial price— not at all uncommon for small growth stocks. It is not even guaranteed that an earlier start is necessarily an advantage. If the strike price happens to be set right before a major correction, employees may find their options underwater for a long time. (After Microsoft experienced a steep decline following its ill-advised adventure with DOJ anti-trust trial and dot-com crash of 2000, some employees observed that quitting and re-applying for their current job would be the rational course of action.) Such volatility runs deep; small fluctuations in price can make a significant difference. Consider an employee whose strike price was $100, when it is trading at $110 currently. So far, so good. Another 10% increase in price would roughly doubles the value of those options. 10% decrease by contrast renders them completely worthless. Stock grants on the other hand are far more stable against such perturbations. A 10% movement in the underlying stock represents exactly 10% change in either direction in the value of the grant.

No wonder then that options have historically invited foul-play. There has been more than one scandal including at the mighty Apple for backdating. Executives have faced criminal charges for these shenanigans. Even Microsoft, the original pioneer of option-based compensation, eventually threw in the towel and switched to stock grants after years of stubborn persistence. Of course for MSFT there was a more compelling reason: the stock price plateaued, or even slightly declined in inflation adjusted terms rendering the options worthless for most employees. (Interestingly, Google also switched from options to RSU stock grants around 2010, even when the core business was still experiencing robust growth, unlike its mismanaged counterpart in Redmond. But Google also created a one-time opportunity for employees to exchange underwater options for new ones priced at post-2008 collapse valuation.)

So what does all this have to do with questionable loans for startup employees? Our scenario involving the hypothetical employee made one assumption: she can sell the stock after exercising the option. Recall that the option by itself is a right to buy the stock. In other words, using the option involves spending money and ending up in possession of actual stock. After that initial outlay, one can turn around immediately and sell those shares on the market to realize the difference as profit. This process is common enough to have its own designation: exercise-and-sell. Most employee stock-option plans allow employees to execute it without fronting the funds for initial purchase of stock. Similarly there is exercise-and-sell-to-cover, where some fraction is sold at market price to offset exercise costs while holding on to remaining shares. In neither case does the employee have to provide any funds upfront for the purchase of stock at the strike price.

Therein lies a critical assumption: that there exists a market for trading the shares. That holds for publicly traded companies but gets murky in pre-IPO situations. Facebook shares were traded on Second Market prior to the IPO. Next generation of startups took that lesson to heart and incorporated clauses into their employee agreements to remove that liquidity avenue. Typically the equity rewards have been structured such that shares are not transferrable and can not be used as collateral.

Suppose employee Alice is leaving her pre-IPO company to take another opportunity. If the equity plan involved outright grants, the picture is simple: Alice retains what she has vested until her final day of employment and leaves on the table the unvested portion. In the case of stock options, the picture gets more complicated. Option plans are typically contingent on employment. Not only does vesting end, but unexercised options are forfeited unless they are exercised before departure or within short time-window afterwards. That means if Alice wants to retain her investment in future success of the company, she must buy the stock outright at its current valuation (typically set based on private financing rounds, at artificially low 409A valuations) and pay tax on “gains” depending on the type of option involved. Of course such gains exist purely on paper at that point, since there is no way for Alice to capture the difference until there is a liquidity event to enable selling her equity. In other words, Alice is forced to take on additional risk to protect the value of options. Alice must come up with cash to cover both the purchase of an asset that has presently zero liquidity— and may well turn out to have zero value, if the company flames out before IPO— and income tax on the profits IRS believes have resulted from this transaction.

This is where the shady lenders enter the picture. Targeting employees at start-ups who have recently changed jobs on LinkedIn, these enterprising folks offer to lend money for covering the value of the options. While this blogger can not comment on the legality of such services, it is clear that both sides are taking on significant risk. Recall that agreements signed by the employee preclude transfer of unexercised options— more precisely, the company will not recognize the new “owner.” Such assets can’t be used as meaningful collateral for the loan, leaving the lender with no recourse if he/she were to default. They could report the default to credit bureaus or forgive the loan— in which case it becomes “income” as far as IRS is concerned, triggering additional tax obligations— but none of that reduces the lender exposure.

Meanwhile the employee is taking a gamble on the same outcome, namely that the executed future value of these options will be realized. If the company does not IPO or market valuation ends up significantly lower than the projections used for purposes of calculating exercise costs, they will end up in the red against the loan. That is materially worse than ending up with underwater options. Recall that an option is a right but not an obligation to purchase stock at previously agreed upon (and one hopes, lower than current market value) price. There is no requirement to exercise an option that is underwater; the rational strategy is to walk away from it. But once options are exercised and shares owned outright, the employee is fully exposed to risk of future price swings. They have effectively “bought” those options on margin, except that instead of a traditional brokerage house extending margin with full control over the account, it is a new breed of opportunistic predatory lenders.

What is wrong with Apple Pay? NFC and cross-channel fraud (2/2)

[continued from part #1]

(Full disclosure: this blogger worked on Google Wallet 2012-2013)

Mobile payment systems as implemented today muddy the clean lines between “card-present” and “card-not-present” interactions. Payments take place at a brick-and-mortars store with the consumer present and tapping an NFC terminal. But the card itself is delivered to the smart-phone remotely, via over-the-air provisioning. Consumers do not have to walk into a bank-branch or even prove ownership of a valid mailing address. They may be required to pass a know-your-customer or KYC test by answering a series of questions designed to deter money laundering. But that requires no physical presence, being carried out within a web-browser or mobile application.

Arguably this is not too different from traditional plastic-cards: they are also “provisioned” remotely, by sending the card in an envelope via snail mail. Meanwhile the application takes place online, with the customer completing a form to provide necessary information for a credit check; effectively card-not-present information. The main difference is that applying for a new credit card has a much higher bar than provisioning an existing card to a smart-phone. In one case, the bank is squarely facing default risk: the possibility that the consumer may run up a hefty bill on the new card and never pay it back. In the latter scenario, there is no new credit being extended— NFC purchases are charged against preexisting line of credit, with the same limits and interest rates as before. There is no reason to suspect that an otherwise prudent and restrained customer will become a spendthrift merely because they can also spend their money via tap-and-pay.

Consequently the burden of proof is much lower when proving that one owns an existing card vs proving that one is a good credit risk for an additional card based on past loan history. Applying for a new line of credit typically requires knowledge of social-security number (perhaps the most misused piece of quasi-secret information, having been repurposed from an identifier into an authenticator) billing address and personal information such as date of birth. Adding an existing card into an NFC wallet is much simpler, although variations exist between different implementations. For example Google Wallet required users to enter their card number, complete with CVV2 to prove ownership. Apple Pay goes one step further, borrowing an idea from Coin: users take a picture of the card with the phone camera. This is largely security theater. Any competent carder can create convincing replicas that will fool a store clerk inspecting the card in his/her hand; a blurry image captured with a phone camera is hardly an effective way to detect forgeries. It is more likely a concession from Apple to issuing banks. More important, no amount of taking pictures will reveal magnetically encoded information from the stripe. Neither Apple Pay or Google Wallet have any assurance of CVV1. (Interestingly Coin can verify that because it ships with an actual card-reader that users must swipe their cards through— a necessity because Coin effectively depends on cloning mag-stripes.)

Bottom line: all of the information required to provision an existing card to a mobile device for NFC payments can be obtained from a card-not-present transaction. For example, it can be obtained via phishing or compromising an online merchant to observe cards in flight through their system. For the first time, it becomes possible to “obtain” a new credit card using only card-not-present data lifted from an existing card. That payment instrument can be now used to go on a spending spree in the real-world, at any retailer accepting NFC payments. Online fraud has breached what used to be a Chinese-wall and crossed over into bricks-and-mortar space.

What is wrong with Apple Pay? NFC and cross-channel fraud (1/2)

Several news stories have “discovered” that Apple Pay has not, in fact, spelled the end of credit card fraud and may even have created new opportunities. That seems surprising considering that NFC payments were supposed to be an improvement over magnetic-stripe swipes in terms of security, using a cryptographic protocol that prevents reusing information stolen from one merchant to make additional fraudulent transactions at another one. Much of the problem turns out to be usual sad state of technology journalism. It is not that NFC or EMV have a new vulnerability that is being exploited against hapless iPhone users. (To be fair, EMV does have its fair share of security weaknesses and mobile payments have introduced some incremental risks, but those subtleties are not what has the press riled up.) Apple has not created a new way to steal credit-cards. But it has created a more effective avenue for monetizing already stolen cards. Apple Pay is not the vulnerability— it is just one particular technique for exploiting an ancient one.

Online vs in-store transactions

Going back to our summary of how credit-card payments operate, we differentiated between two types of transactions:

Card-present, or more colloquially “in-store.” The customer walks up to a cash registers and hands over their card to the merchant. That card can be “read” in different ways. At the low-tech end of the spectrum is old-school mechanical imprinting, creates an actual carbon-copy of the front of the card bearing embossed numbers. More common is the “swipe” where information encoded in the magnetic-stripe at the back of the card is scanned by moving the card through a magnetic field. Finally if the card has a smart chip, there is the EMV option of executing a complex cryptographic protocols between card/terminal. In these cases each interaction is unique and the data observed by the terminal different, unlike a magnetic-stripe which has the same information every time.
“Online,” or what used to be called phone-order/mail-order back when picking up a telephone or sending pieces of paper via USPS did not seem such an antiquated concepts. Generically this class is known as “card-not-present” transaction, because the merchant does not have the actual piece of plastic in hand when placing the charge. (We will avoid the term “online” because in payments it is also used to describe when a point-of-sale terminal is communicating in real-time with card network, as opposed to batching up transactions for later submission.)

Containing fraud

From a fraud perspective, the key observation is that each modality exchanges slightly different information with the terminal. All of them share the same basic data such as credit-card number and expiration. But each one also introduces a unique twist. Track-data on magnetic stripe has a 3-5 digit “card validation code” commonly called CVV1. Online transactions use a different value called CVV2, printed on the card itself but not encoded in the magnetic-stripe. Meanwhile the basic version of EMV simulates the action of swiping for “track-data” for backwards compatibility, but substitutes a dynamic CVV or CVV3 which changes each time in a manner unpredictable without knowing the cryptographic secret stored in the chip.

A corollary of this difference is that it acts as a natural “firewall” between channels. Fraud remains largely contained to its original channel. Consider the criminals who popped Target or HomeDepot point-of-sale terminals in the past. This attack allowed to them amass a cache of raw track-data from magnetic-stripes swiped at those cash registers. That information can be used to create convincing replicas of cards that will behave exactly like the original card when swiped through a reader. But there is no CVV2 encoded in the magnetic-stripe.** That is a limiting factor if our criminals wanted to monetize those cards online, instead of walking into a store. Most websites these days will collect and validate CVV2 for online orders. (As an aside, there trade-offs in both avenue for monetization. In-store fraud is harder to scale because it requires recruiting mules to run the risk of walking into a store with fake cards, with their faces captured on camera. Online fraud scales better; there is no limit to how many websites you can drive to or how many big-box items can fit into the trunk. Downside is delivery involves a shipping address that can be traced- notice how many ecommerce sites flat out refuse shipping to PO boxes.)

[continued]

** Some merchants have started asking for or keying in CVV2 by looking at the card during retail transactions. That is a dangerous pattern. It may help that particular merchant reduce fraud temporarily by doing additional verification on the card, but it weakens the overall ecosystem by putting card-not-present at greater risk against compromised terminals.

Random Oracle

Building and breaking systems