Downside to security standards: when vulnerable is better than uncertified

Are security standards effective? The answer depends to a great extent on the standards and certification process in question. On the one hand, objective third-party evaluations with clearly spelled-out criteria represent a big improvement over vendors’ own marketing fiction. In a post-Snowden world it has become common to throw around meaningless phrases such as “military grade” and “NSA proof” to capitalize on increased consumer awareness. At the same time there is a gradual dawning that “certified” does not equal secure. For all the profitable consulting business PCI generates, that particular compliance program could not save Target and Home Depot from massive data breaches. One could argue that is a symptom of PCI being improperly designed and implemented, rather than any intrinsic problem with information security standards per se. After all, it is the canonical example of fighting last year’s war: there is still verbiage in the standard around avoiding WEP and single-DES, as if the authors were addressing time-travelers from the 1990s. Narrowing our focus to more “reputable” standards such as FIPS or Common Criteria, which have long histories and fewer examples of spectacular failure, the question becomes: are consumers justified in getting a warm and fuzzy feeling on hearing the words “FIPS-certified” and “CC EAL5”?

When vendors are left to their own devices

On the one hand, it is easy to point out examples of comparable products where one model not subject to a certification process has demonstrable weaknesses that would have been easily caught during testing. BlackHat briefings recently featured a presentation on extracting 3G/4G cryptographic secrets out of a SIM card using elementary differential power-analysis. For anyone working on cryptographic hardware— and SIM cards are just smart-cards manufactured in a compact form factor designed to fit into a mobile device— this was puzzling. These attacks are far from novel; the earliest publications date back to the late 1990s. Even more remarkable, off-the-shelf equipment was sufficient to implement the attack against modern hardware. No fancy lab with exotic equipment required. The attack represents a complete break of the SIM security model. These cards have one job: safeguard the secret keys that authenticate the mobile subscriber to the wireless network. When those keys can be extracted and moved to another device, the SIM card has been effectively “cloned”— it has failed at its one and only security objective. How did the manufacturer miss such an obvious attack vector?

It turns out that vanilla SIM cards are not subject to much in the way of security testing. Unless, that is, they are also going to be used for something more critical: mobile payments over NFC. More commonly known as UICC, these units, which are also designed to hold credit-card data, must go through a series of rigorous evaluations defined by EMVCo. Examples include passive measurement of side-channel leaks as well as active attacks that deliberately induce faults in the hardware by manipulating the power supply or zapping the chip with laser pulses. Amateur mistakes such as the one covered in the BlackHat presentation are unlikely to survive that level of scrutiny.

Frozen in time

From an economic perspective this makes sense. In the absence of external standards, manufacturers have little incentive to implement a decent security program. They are still free to make any number of marketing assertions, which is cheaper than actually doing the work or paying a qualified, independent third-party to verify those assertions. It is tempting to conclude that more mandatory certification programs can only help bring transparency to this state of affairs: more standards applied to a broader range of products. But there is an often-ignored side-effect: inflexible application of certification programs can end up decreasing security. Mindless insistence on only using certified products can introduce friction to fixing known vulnerabilities in fielded systems. This is subtly distinct from the problem of delaying time to market or raising the cost of innovation. That part goes without saying: introducing a new checklist as a prerequisite to shipping certainly extends the development schedule, while empowering third-party certification labs as gatekeepers to be paid by the vendor is effectively a tax on new products. Meanwhile some customer with an unsolved problem waits while the wheels turn on the certification process. That outcome is certainly suboptimal, but at least the customer is not stuck with a defective product and a known exposure.

By contrast, consider what happens when an actual security flaw is discovered in a product that has already been through a certification process, bears the golden stamp of approval and has been deployed by thousands of customers. Certification systems are not magical; they can not be expected to catch all possible defects. (Remember the early optimism equating “PCI compliance” with a guarantee against credit-card breaches?) A responsible vendor can act on the discovery and make improvements to the next version of the product to address the weakness. Given that most products are designed to have some type of update mechanism, the vendor may even be able to ship a retroactive fix that existing customers can apply to units in the field to mitigate the vulnerability, perhaps a firmware update or a recommended configuration change.

There is a catch: the new version may not have the all-important certification, at least not right out of the gate. Whether changes reset certification and require another pass varies by standard. Some are more lenient and scale the process based on the extent of changes. They might exempt certain “cosmetic” improvements or offer fast-tracked “incremental” validation against an already vetted product. But in all cases there is now a question about what happens to the existing certification status, and that question must be resolved before the update can be delivered to customers.

Certifiably worse-off

That dilemma was impressed on this blogger about 10 years ago on the Windows security team. When research was published describing new cache-based side-channel attacks affecting RSA implementations, it prompted us to evaluate side-channel protections for the existing Windows stack. This was straightforward for the future version of the cryptography stack, dubbed CNG. Slated for Vista, that code was still years from shipping, and there was no certification status to worry about.

Fixing the existing stack (CAPI) proved to be an entirely different story. Based on the BSAFE library, the code running on all those XP and Windows Server 2003 boxes out there already boasted FIPS 140 certifications as a cryptographic module. The changes in question were not cosmetic; they were at the heart of the multi-precision integer arithmetic used in RSA. It became clear that changing that code anytime soon with a quick update such as the monthly “patch Tuesday” would be a non-starter. While we could get started on FIPS recertification with one of the qualified labs, there was no telling when that process would conclude. The most likely outcome would be a CAPI update in a state of FIPS limbo. That would be unacceptable for customers in regulated industries— government, military, critical infrastructure— who are required by policy to operate FIPS-compliant systems at all times. Other options were no better. We could exempt some Windows installations from the update, for example using conditional logic that reverts to existing code paths on systems configured in strict FIPS mode. But that would paradoxically disable a security improvement on precisely those systems where people most care about security.

All things considered, delaying the update in this example carried relatively low risk. The vulnerability in question was local information disclosure, only exploitable in specific scenarios where hostile code coexists on the same OS with trusted applications performing cryptography. While that threat model became more relevant later with sand-boxed web-browser designs pioneered by Chrome, in the mid-2000s the most likely scenario would have been terminal-server installations shared by multiple users.

But the episode serves as a great demonstration of what happens when blind faith in security certifications is taken to its logical conclusion: customers prefer running code with known vulnerabilities, as long as it has the right seal of approval, over a new version that lives under a cloud of “pending certification.”

Designing for updates

The root cause of these situations is not limited to a rigid interpretation of standards that requires running compliant software 100% of the time even when it has a demonstrable flaw. Security certifications themselves perpetuate a flawed model based on blessing static snapshots. Most certifications pass judgement on a very specific version of a product, evaluated in a very specific configuration and frozen in time. In a world where threats and defenses constantly evolve, one would expect well-designed products to change in lockstep. Even products traditionally considered “hardware,” such as firewall appliances, smart-cards and HSMs, have a significant amount of software that can be updated in the field. Arguably the capability to deliver improvements in response to new vulnerabilities is itself an important security consideration, one that is under-valued by criteria focused on point-in-time evaluations.

So the solutions are twofold:

  • More flexible application of existing standards, to accommodate the period of uncertainty between a new update becoming available and that update receiving official certification. There is already a race against the clock between defenders trying to patch their systems and attackers trying to exploit known vulnerabilities, and the existence of a new update from the vendor tips off even more would-be adversaries, enlarging the attacker pool. There is no reason to further handicap defenders by waiting on the vagaries of a third-party evaluation process.
  • Wider time-horizon for security standards. Instead of focusing on point-in-time evaluations, standards have to incorporate the possibility that a product may require updates in the field. That makes sense even for static evaluations: software update capabilities themselves often create vulnerabilities that can be exploited to gain control over a system. But more importantly, products and vendors need a story around how they can respond to new vulnerabilities and deliver updates (so they do not end up with another Android) without venturing into the territory of “non-certified” implementations.

CP

Software auto-updates: revisiting the trade-offs

Auto-updating software sounds like a great idea on paper: code gets better and repairs its own defects magically, without the user having to lift a finger. In reality it has a decidedly mixed track record. In case you missed the latest example, a buggy Windows 10 update caused machines to get stuck in a reboot loop. This is far from the first time Redmond has shipped faulty software updates. But the incident raises the stakes for automatic-update features, since MSFT has drawn a line in the sand with mandatory updates in Windows 10.

That makes it a good time to review the arguments for and against auto-updates:

  • Customers get a better product sooner, with fewer defects and enhanced functionality. Often users lack incentives to go out of their way to apply updates. Significant improvements may be hidden under the hood, without the benefit of shiny objects to lure users. Mitigations for security vulnerabilities are the prime example of invisible yet crucial improvements. In the absence of an automatic mechanism for applying updates, few people would go out of their way to install something with a non-descriptive name along the lines of “KB123456 Fix for CVE-2015-1234.” (But this may be changing, now that mainstream media routinely covers actively exploited vulnerabilities in Adobe Reader, Flash and IE. It’s as if journalists were enlisted into a coordinated public awareness campaign for security updates.)
  • Makes life easier for the vendor, with long-term benefits for customers. Having all users on the latest version of the product greatly reduces development costs compared to actively supporting multiple versions for customers who have decided not to update. All new feature development happens against the latest version. Security fixes only have to be developed once, not multiple times for slightly different code-bases, each with its own quirks. Quality assurance is also helped by having only one version to check against, reducing the probability of buggy updates.
  • In some cases the positive externalities extend beyond the software publisher. Keeping all users on the latest and greatest version of a platform can boost the entire ecosystem. When there are fewer versions of an application floating around in use, other people building on top of that application also have an easier time. Remember the never-ending saga of Internet Explorer on XP? For years, versions before IE9 were the bane of web developers: no support for modern HTML5, idiosyncratic security problems such as content-sniffing, and random departures from web standards implemented faithfully by every other browser. One site went so far as to institute a surcharge for users on IE7, to compensate for the extra work required to support them. But IE did not start out that way: when released in 2001, IE6 was arguably a perfectly satisfactory piece of code. With MSFT having no leverage to migrate those users to a newer version, the company created a massive legacy problem not only for itself but for anyone trying to design a modern website, who had to contend with the quirks and limitations of 10+ year old technology.

Downsides to auto-updating break down into several categories:

  • Collateral damage. This is probably the most common complaint about updates gone wrong. There seems to be a paucity of evidence around what percent of Windows updates need to be recalled due to bugs—and MSFT may be understandably reluctant to release that figure— but public instances of updates gone awry are ubiquitous.
  • Downtime. Often updates require restarting the application, if not rebooting the machine altogether. This represents downtime and some loss of productivity, although the impact varies greatly and can be managed with judicious scheduling. Individual machines are rarely used 24/7 and updates scheduled at off-hours can be transparent. On the other hand rebooting the lone server supporting a global organization incurs a heavy cost; there may be no good time for that. (It also points to a design flaw in the IT infrastructure with one machine constituting a single-point-of-failure without redundancy.)
  • Revenue model. If all updates are given away for free, as auto-updating typically requires, a significant monetization opportunity is lost. This is a problem for the vendor rather than customers, specific to business models that rely on selling discrete software bundles as opposed to a subscription service along the lines of Creative Cloud. But economics matter. This inconvenient fact lies at the heart of the Android security-update debacle: with no upside from delivering updates to a phone that has already been paid for, neither OEMs nor carriers have the slightest interest in shipping security fixes. Usually there is some line drawn between incremental improvements and significant changes that merit an independent purchase. For example MSFT always shipped service packs free of charge while requiring new licenses for OS releases— until Windows 10, which breaks that pattern by offering a free upgrade for existing Windows 7/8 users.
  • Vendor-controlled backdoor. Imagine you have an application running on your machine that calls out to a server in the cloud controlled by a third-party, receives arbitrary instructions and starts doing exactly what was prescribed in those instructions. One might rightly call that a backdoor or remote-access Trojan (RAT). Auto-update capabilities are effectively no different, only legitimized by virtue of that third-party being some “reputable” software publisher. But security dependencies are not transformed away by magical promises of good behavior: if the vendor experiences an intrusion into their systems, the auto-update channel can become a vector for targeting anyone running that application. It’s tempting to say that dependency (or leap-of-faith, depending on your perspective) already exists when installing an application written by the vendor. But there is an important difference between a single act of trusting the integrity of one application as a point-in-time decision, versus ongoing faith that the vendor will be vigilant 24/7 about safeguarding their update channel.

So what is a reasonable stance? There are at least two situations where disabling auto-updates makes sense. (Assuming the vendor actually provides controls for doing that; many companies, including Google in the early days, were a little too enthusiastic about forcing software on users.)

  1. Managed IT environments, or the so-called “enterprise” scenario. These organizations have in-house IT departments capable of performing additional quality-assurance on an update before rolling it out to thousands of users. More importantly, that QA can use a realistic configuration that mirrors their own deployment, as opposed to generic, stand-alone testing. For example an update may function just fine on its own, but have a bad interaction with antivirus from vendor X or VPN client Y. Such combinations can not be exhaustively checked by the original publisher.
  2. Data-centers and server environments. Regardless of the amount of redundancy present in a system, having servers update on their own without admin oversight is a recipe for unexpected downtime.

In these situations the risks of auto-updating outweigh the benefits. By contrast, that calculus gets inverted in the case of consumers, with the possible exception of power users. Most home users are neither equipped nor inclined to verify updates in an isolated environment, short of actually applying the update to one of their machines. The good news is that most end-user devices are not mission-critical either, in the sense that downtime does not inconvenience a large number of other people relying on the same machine. In these situations little is gained by delaying updates. It may buy a little extra insurance against the possibility that the vendor discovers some new defect (based on the experience of other early adopters) and recalls the update. But for critical security updates, that insurance comes at the cost of living with a known security exposure, just as the release of a patch starts the clock on reverse-engineering the vulnerability to develop an exploit.

CP

Safenet HSM key-extraction vulnerability (part II)

[continued from part I: Introduction]

Exploit conditions

One question we have not addressed until now is the threat model. Before we can derive related keys and HMAC our chosen message, we have to authenticate to the HSM. In the case of our Luna G5, that takes place out-of-band with USB tokens and a PIN entered on an external PIN-entry device, or PED, attached to the HSM. CloudHSM uses a more rudimentary approach involving passwords sent by the client. (Consequently the CloudHSM setup can only achieve level-2 security assurance against FIPS 140-2 evaluation criteria, while PED-authenticated versions can achieve level 3.) Regardless of the authentication mode, the client must have a logged-in session with the HSM to use existing keys. It is enough, then, for an attacker to compromise the client machine in order to extract keys. That may sound like a high barrier or even tautological: “if your machine is compromised, then your keys are also compromised.” But protecting against that outcome is precisely the reason for using cryptographic hardware in the first place. We offload key management to special-purpose, tamper-resistant HSMs because we do not trust our off-the-shelf PC to sufficiently resist attacks. The assumption is that even if the plain PC were compromised, attackers only have a limited window for using HSM keys, and only as long as they retain persistence on the box, where they risk detection. They can not exfiltrate keys to continue using them after their access has been cut off. That property both limits damage and gives defenders time to detect and respond. A key-extraction vulnerability such as this one breaks that model. With a vulnerable HSM, temporary control over the client (or the HSM credentials, for that matter) grants permanent access to keys outside the HSM.

PKCS #11 object attributes

The vulnerability applies to all symmetric keys, along with elliptic-curve private keys. There is one additional criterion required for exploitation: the key we are trying to extract must permit key-derivation operations. PKCS#11 defines a set of boolean attributes associated with stored objects that describe usage restrictions. In particular, CKA_DERIVE determines whether a key can be used for derivation. A meta-attribute, CKA_MODIFIABLE, determines whether other attributes (though not all of them) can be modified. Accordingly an object that has CKA_DERIVE true or CKA_MODIFIABLE true— which allows changing the former attribute at will— is vulnerable.

Surprisingly, many applications create keys with all of these attributes enabled, even when the corresponding operations are not meaningful. For example the Java JSP provider for Safenet creates keys with the modifiable attribute set to true and all possible purposes enabled. If a Bitcoin key were generated using that interface, the result would support not only digital signatures (the only meaningful operation for Bitcoin keys, since they are used to sign transactions) but also wrap/unwrap, decryption and key derivation. Correctly configuring attributes according to the principle of least privilege, with only the intended operations enabled, requires using the low-level PKCS #11 API. In fairness, part of the problem is that the APIs can not express the concept of an “ECDSA key” at generation time. This is evident in the Java cryptography API, which uses a generic “EC” type for generating elliptic curve keys; the caller does not specify ahead of time the purpose for which the key is being generated. Similarly PKCS #11 does not differentiate based on object type but relies on attributes. A given elliptic-curve private key can be used with ECDSA for signing, ECDH key-agreement to derive keys or ECIES for public-key decryption, depending on which of the corresponding CKA_* attributes are set.
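
To make the least-privilege idea concrete, here is a rough sketch of generating a signing-only secp256k1 key pair through the PyKCS11 Python wrapper. This is an illustration rather than vendor-specific guidance: the library path, slot number, credential and exact wrapper call signatures are assumptions that may differ across PyKCS11 versions and HSM clients.

# Sketch only: assumes the PyKCS11 wrapper and an already-configured PKCS#11
# module for the HSM. Paths, slot and password below are placeholders.
import PyKCS11

lib = PyKCS11.PyKCS11Lib()
lib.load("/usr/safenet/lunaclient/lib/libCryptoki2_64.so")   # placeholder path
session = lib.openSession(0, PyKCS11.CKF_SERIAL_SESSION | PyKCS11.CKF_RW_SESSION)
session.login("partition-password")                          # placeholder credential

# DER encoding of the secp256k1 OID (1.3.132.0.10) for CKA_EC_PARAMS
SECP256K1 = bytes([0x06, 0x05, 0x2B, 0x81, 0x04, 0x00, 0x0A])

public_template = [
    (PyKCS11.CKA_TOKEN, PyKCS11.CK_TRUE),
    (PyKCS11.CKA_VERIFY, PyKCS11.CK_TRUE),        # public half only verifies
    (PyKCS11.CKA_EC_PARAMS, SECP256K1),
    (PyKCS11.CKA_LABEL, "btc-signing-key"),
]
private_template = [
    (PyKCS11.CKA_TOKEN, PyKCS11.CK_TRUE),
    (PyKCS11.CKA_SENSITIVE, PyKCS11.CK_TRUE),
    (PyKCS11.CKA_EXTRACTABLE, PyKCS11.CK_FALSE),
    (PyKCS11.CKA_SIGN, PyKCS11.CK_TRUE),          # the only intended purpose
    (PyKCS11.CKA_DERIVE, PyKCS11.CK_FALSE),       # closes the derivation attack path
    (PyKCS11.CKA_DECRYPT, PyKCS11.CK_FALSE),
    (PyKCS11.CKA_MODIFIABLE, PyKCS11.CK_FALSE),   # attributes frozen at creation
    (PyKCS11.CKA_LABEL, "btc-signing-key"),
]

pub, priv = session.generateKeyPair(
    public_template, private_template,
    mecha=PyKCS11.Mechanism(PyKCS11.CKM_EC_KEY_PAIR_GEN, None))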

Mitigation

The latest firmware update from Safenet addresses the vulnerability by removing the weak key-derivation schemes. This is the more cautious approach, and preferable to incremental tweaks such as imposing a minimum key length, which would not be effective. For example if the HSM still allowed extract-key-from-key but required a minimum of 16 bytes, one could trivially work around it: prepend (or append) 15 known bytes to an existing key, then extract the first (or last, respectively) 16 bytes. Nominally the derived key is 16 bytes long and satisfies the constraint. In reality all but one byte is known, and brute-forcing this key is no more difficult than brute-forcing a single byte.
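
A self-contained simulation makes the point, with the hypothetical minimum-length derive step emulated in plain Python (no HSM required):

# Simulate the bypass of a hypothetical 16-byte minimum on derived keys.
import hmac, hashlib, os

secret = os.urandom(32)            # stands in for the key stored inside the HSM
known_pad = b"\x00" * 15           # attacker-chosen padding

# What the derive step would produce: prepend the padding, keep the first
# 16 bytes. Only one byte of the real secret ends up in the "16-byte" key.
derived = (known_pad + secret)[:16]

msg = b"ChosenMessage"
target = hmac.new(derived, msg, hashlib.sha256).digest()   # observable MAC

# Brute-force the single unknown byte.
for b in range(256):
    if hmac.new(known_pad + bytes([b]), msg, hashlib.sha256).digest() == target:
        print("recovered first secret byte:", hex(b))
        break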

Likewise it is tempting to blame the problem on extract-key-from-key, but other bit-flipping and splicing mechanisms are equally problematic. All of the weak KDF schemes permit “type-casting” keys between algorithms, allowing attacks against one algorithm to be applied to keys originally intended for a different one. For example an arbitrary 16-byte AES key can not be brute-forced given the state of the art today. But suppose you append/prepend 8 known bytes to create a 3DES key, as Safenet HSMs permit with the concatenate mechanisms. (Side note: triple-DES keys are 21 bytes, but they are traditionally represented using 24 bytes with the least-significant bit of each byte reserved for parity.) The result is a surprisingly weak key that can be recovered using a meet-in-the-middle attack with the same time complexity as recovering a single-DES key, albeit at the cost of a significant amount of storage. Similarly XOR and truncation together can be used to recover keys by exploiting an unusual property of HMAC: appending a zero byte to an HMAC key shorter than the block size of the hash function does not alter its outputs. Even XOR alone, without any truncation, is problematic when applied to 3DES, where related-key attacks against the first and third subkeys are feasible.
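
That HMAC property, and how XOR plus truncation exploit it, can be verified locally in a few lines of Python; the HSM's derive operations are emulated here, so this is a simulation of the arithmetic rather than an exploit script:

# HMAC pads keys shorter than the hash block size with zero bytes, so a
# truncated key and the same key with its last byte zeroed give the same MAC.
import hmac, hashlib, os

def H(key, msg=b"ChosenMessage"):
    return hmac.new(key, msg, hashlib.sha256).digest()

key = os.urandom(24)                 # stands in for a secret HMAC key in the HSM
truncated = key[:-1]                 # what a truncating derive would yield
assert H(truncated) == H(truncated + b"\x00")

# Recover the last key byte: XOR a guessed value into the final position
# (emulating XOR-base-and-data) and compare against the MAC of the truncated key.
target = H(truncated)
for g in range(256):
    mask = b"\x00" * (len(key) - 1) + bytes([g])
    candidate = bytes(a ^ b for a, b in zip(key, mask))
    if H(candidate) == target:
        print("recovered last key byte:", hex(g))
        break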

Workarounds using PKCS#11 attributes

Since the attack relies on key-derivation mechanisms, the following work-around seems natural for protecting existing keys: set CKA_DERIVE to false, which prevents the key from being used in derivation mechanisms, and also set CKA_MODIFIABLE to false, making the object “read-only” going forward. This does not work; the CKA_MODIFIABLE attribute is immutable and determined at the time of key generation. If the key was not generated with the proper set of attributes, it can not be protected after the fact. But there is a slightly more complicated work-around that uses object cloning. While a modifiable object can not be “fused” into a read-only object, it is possible to duplicate it and assign new attributes to the clone. That is the one opportunity for changing the CKA_MODIFIABLE attribute to false. (Incidentally the transition in the opposite direction is disallowed: it is not possible to make a modifiable clone of an object that started out immutable.) This yields a viable work-around: duplicate all objects with the modifiable/derive attributes set to false in the new copy, then delete the originals. Applications may have to be reconfigured to use the new copy, which will have a different numeric handle, but it can retain the same label as the original if keys are looked up by name.

One limitation of this approach is that some secrets are legitimately intended for key derivation. For example that secp256k1 private key could have been used for ECDH key-agreement, an operation that happens to be classified as “key-derivation” according to PKCS#11. That means CKA_DERIVE can not be set to false without rendering the key unusable; per-object policy does not distinguish between derivation mechanisms at a granular level.

FIPS to the rescue?

Safenet HSMs have an option to be configured in “strict-FIPS” mode. This setting is defined by the administrator at the HSM level and disables certain weak algorithms. At first we were hopeful this could be the one time where FIPS demonstrably improves security by outright mitigating a vulnerability. That turns out not to be the case. Even though the documentation states that weak algorithms are “disallowed” in FIPS mode, the restrictions only come into play when using keys. For example the HSM will still generate a single-DES key in strict-FIPS mode; it will merely refuse to perform single-DES encryption. As for the problematic key-derivation mechanisms at the heart of this vulnerability: they are still permitted, as is HMAC using very short secrets.

Even if strict-FIPS mode worked as expected, it is not practical for existing users. Switching the FIPS policy is a destructive operation; all existing keys are deleted. Instead a more roundabout procedure is required: back up all keys to a dedicated backup device, switch the FIPS setting and restore from the backup or from another HSM. After all that trouble any defensive gains would still be short-lived: nothing prevents switching the FIPS mode back and restoring from backups again.

Residual risks: cloning

The same problem with backup-and-restore also applies to cloning. Safenet defines a proprietary replication protocol to copy keys from one unit to another, as long as they share certain configurations:

  • Both HSMs must have the same authentication mode, e.g. password-authenticated (FIPS 140-2 level 2) or PED-authenticated (FIPS 140-2 level 3)
  • Both HSMs must be configured with the same cloning domain. This is an independent password or set of PED keys, distinct from the “crypto-officer” or “crypto-user” credentials required to use existing keys.

Strangely, cloning works even when the source and target HSM have different FIPS settings: it is possible to clone from an HSM in strict-FIPS mode to one that is not. More surprisingly, it also works across HSMs with different firmware versions. So there is still an attack here: clone all keys from a fully-patched HSM to a vulnerable unit controlled by the attacker. Weak key-derivation algorithms will be enabled (on purpose) in this latter unit, allowing the attack to be carried out.

How serious is this risk? Cloning requires exactly the same access as working with existing keys in the HSM: for the USB-connected Luna G5, that is a USB connection; for the SA7000 featured in AWS CloudHSM, it can be done remotely over the network. In other words, an attacker who compromises a machine authorized to use the HSM gets this access for free. The catch is that an additional credential is required, namely the cloning domain. Unlike the standard “user” credentials necessary to operate the HSM, the cloning domain is not used during normal operation, only when initializing HSMs. Compromising a machine that is authorized to access the HSM guarantees compromise of the user role (or “partition owner” role in Safenet terminology). But it does not guarantee that cloning-domain credentials can be obtained from the same box, unless the operators were sloppy enough to reuse the same passphrase.

CP

On Safenet HSM key-extraction vulnerability CVE-2015-5464 (part I)

This series of posts provides a more in-depth explanation of the key-extraction vulnerability we discovered and reported to Safenet, designated CVE-2015-5464.

PKCS11

Safenet HSMs are closely based on the PKCS#11 specification. This is a de facto standard designed to promote interoperability between cryptographic hardware by providing a consistent software interface. Imagine how difficult it would be to write a cryptographic application such as a Bitcoin wallet to work with external hardware if each device required a different API for signing a Bitcoin transaction. Certainly at a low level the differences between devices are apparent: some connect over USB while others are addressed over TCP/IP, and each device typically requires a different device driver, much like different brands of printers do. PKCS11 seeks to provide a higher-level interface where these differences can be abstracted behind a unified API, with a vendor-provided PKCS#11 module translating each function into the appropriate commands native to that brand of hardware.

PKCS#11 is a very complex standard, with dozens of APIs and a wide range of cryptographic operations, called “mechanisms,” for everything from encryption to random-number generation. The Safenet vulnerability involves the key-derivation mechanisms. These are used to create a cryptographic key as a function of another key. For example BIP-32 for Bitcoin proposes the notion of hierarchical-deterministic wallets, where a family of Bitcoin addresses is derived from a single “seed” secret. Designed properly, key derivation provides such an amplification effect while protecting the primary secret: even if a derived key is compromised, the damage is limited. One can not work their way back to the seed. But when designed improperly, the derived key has a simple relationship to the original secret and leaks information about it.
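
The difference between a sound and a broken derivation scheme is easy to illustrate in a few lines of Python. This is a simplified sketch, not the actual HSM mechanisms or the BIP-32 construction: a child key computed as an HMAC of the seed is one-way, while a child key taken as a slice of the seed hands part of the seed to anyone who learns the child.

# Simplified contrast between one-way and leaky key derivation.
import hmac, hashlib, os

seed = os.urandom(32)       # the primary secret, analogous to a wallet seed

# Sound (simplified): child is a one-way function of seed plus a label.
# Learning good_child reveals nothing useful about the seed.
good_child = hmac.new(seed, b"child-1", hashlib.sha256).digest()

# Broken: child is simply a substring of the seed, so anyone who recovers
# it has learned 16 bytes of the seed directly.
bad_child = seed[:16]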

Some options are better left unimplemented

That turns out to be the problem with several of the key-derivation mechanisms defined in PKCS#11 and implemented by Safenet. (To give a flavor of what is supported, here is the list of options presented by the demonstration utility ckdemo shipped as part of the Safenet client.) Many of these are sound. A few are problematic, with varying consequences. For example the ability to toggle secret-key bits using XOR and perform operations with the result leads to exploitable conditions for certain algorithms.

Related-key cryptanalysis is the branch of cryptanalysis specializing in such attacks. It turns out that for Safenet HSMs we do not need to dig very deep into cryptanalytic results: there are at least two mechanisms that are easy to exploit and work generically against a wide class of algorithms, extract-key-from-key and XOR-base-and-data.

Slicing-and-dicing secrets

Extract-key-from-key is defined in section 6.27.7 of the PKCS#11 standard, version 2.30. It may as well have been named “extract-substring,” as it is the analog of the standard operation on strings. This derivation scheme creates a new key by taking a contiguous sequence of bits at a desired offset and length from an existing key. Here is an example of it in action with the ckdemo utility provided by Safenet.

We start out with an existing 256-bit AES key with handle #37. Here are its PKCS #11 attributes:

[Screenshot: PKCS #11 attributes of original AES key]

Note that the CKA_VALUE_LEN attribute is 0x20, corresponding to 32 bytes as expected for 256-bit AES. Because the object is sensitive, the bytes comprising the key can not be displayed. But we can use a key-derivation mechanism to extract a two-byte subkey from the original. We pick the extract-key-from-key mechanism, start at the most-significant bit (ckdemo starts indexing bit positions at 1 instead of 0) and extract 2 bytes:

[Screenshot: using extract-key-from-key derivation in ckdemo]

Now we look at attributes of the derived key. In particular note that its length is reported as 2 bytes:

[Screenshot: PKCS #11 attributes of the derived key]

So what can we do with this resulting two-byte key, which is not going to be very difficult to brute-force? Safenet supports HMAC with arbitrary-sized keys, so we can HMAC a chosen message:

[Screenshot: HMAC over a chosen message using the derived key]

Given this primitive, the attack is straightforward: brute-force the short key by trying all possibilities against known message/HMAC pairs. In this case we get 0x5CD3 since:

$ echo -n ChosenMessage | openssl dgst -sha256 -hmac `echo -en "\x5c\xd3"` 
(stdin)= 1db249f0e928b3aeff345aedaa3365ea690f06f3710433fc4a063b4cfffbe930

That corresponds to the two most-significant bytes of the original key. Now we can iterate: derive another short key at a different offset (say bits 17 through 32), brute-force it using a chosen-message attack, and repeat until all key bytes are recovered. Fully automated, this takes a couple of seconds with the Luna G5, and much less time with the more powerful SA7000 used in CloudHSM. The main trade-off is the computing power available to brute-force key fragments offline: given more resources, larger fragments of multiple contiguous bytes can be recovered at a time, requiring fewer key-derivation and HMAC operations. (Also, since we have a chosen-plaintext attack with HMAC input that we control, there are time-space trade-offs to speed up key recovery by building look-up tables ahead of time.)
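
The offline brute-force step is trivial to script. Here is a short Python sketch that searches the 2^16 possible two-byte keys against the message/MAC pair from the ckdemo session above; per the openssl output, it lands on 0x5cd3.

# Brute-force the 2-byte derived key against a known message/HMAC-SHA256 pair.
import hmac, hashlib

msg = b"ChosenMessage"
target = bytes.fromhex(
    "1db249f0e928b3aeff345aedaa3365ea690f06f3710433fc4a063b4cfffbe930")

for candidate in range(2 ** 16):
    key = candidate.to_bytes(2, "big")
    if hmac.new(key, msg, hashlib.sha256).digest() == target:
        print("recovered key fragment:", key.hex())    # 5cd3
        break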

Surprisingly this works not only against symmetric keys such as AES or generic HMAC secrets, but also against elliptic-curve private keys (RSA, plain DSA and Diffie-Hellman were not affected). This is an implementation quirk: these mechanisms are typically intended for symmetric keys only. For elliptic-curve keys, the byte array being truncated is the secret scalar part of the key. For example the “secret” component of a Bitcoin ECDSA key is a discrete logarithm in secp256k1. Internally that discrete logarithm is just stored as a 32-byte scalar value, and extract-key-from-key can be used to successively reveal chunks of it.

XOR-base-and-data suffers from a very similar problem. This operation derives a new key by XORing user-chosen data with the original secret key. While there are cryptographic attacks exploiting that against specific algorithms such as 3DES, a design choice made by Safenet leads to a simpler key-recovery attack that works identically against any algorithm: when the size of the data is less than the size of the key, the result is truncated to the data size. XORing a 256-bit AES key with one byte of data results in a one-byte output. That provides another avenue for recovering a key incrementally: derive new HMAC keys by XORing with successively longer runs of zero bytes, leaving only the newly exposed segment of the key to brute-force at each step.
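
The incremental recovery loop looks like the following Python sketch, with the HSM's XOR-base-and-data derive plus HMAC step emulated locally (so this simulates the arithmetic rather than driving a real device), advancing one byte at a time:

# Simulate incremental key recovery via XOR-base-and-data's truncation behavior.
import hmac, hashlib, os

secret = os.urandom(32)             # stands in for the 256-bit key inside the HSM
msg = b"ChosenMessage"

def derive_and_hmac(data):
    # Emulates the HSM: derived key = (secret XOR data), truncated to len(data),
    # then used to HMAC a chosen message.
    derived = bytes(s ^ d for s, d in zip(secret, data))
    return hmac.new(derived, msg, hashlib.sha256).digest()

recovered = b""
for i in range(len(secret)):
    # XORing with i+1 zero bytes exposes the first i+1 bytes of the secret.
    target = derive_and_hmac(b"\x00" * (i + 1))
    for guess in range(256):
        if hmac.new(recovered + bytes([guess]), msg,
                    hashlib.sha256).digest() == target:
            recovered += bytes([guess])
            break

assert recovered == secret
print("recovered full key:", recovered.hex())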

[continued in part II: exploit conditions, workarounds and mitigations]

CP