When defense-in-depth failed: Heartbleed

A good benchmark of foresight in security is not avoiding vulnerabilities but having defense-in-depth measures in place to minimize the impact of unforeseen flaws. (Or failing that, one can quibble over the consolation prize of who was fastest to respond and deploy patches.) In the recent case of the openssl Heartbleed vulnerability, such success stories were mostly absent. Few seemed to speak up and shrug off the vulnerability by saying, in effect, that they were immune because of some brilliant feature implemented years ago as a precaution, without an inkling of the specific vulnerability it would one day insure against. Given how pervasive openssl is, not just on web servers but even in traditional client-side software including mobile operating systems, most of the chuckling seemed to come from the Microsoft camp, continuing their “hat trick” of having steered clear of three epic SSL bugs in 2014: “goto fail” courtesy of Apple, the GnuTLS certificate-validation flaw and now Heartbleed.

Everyone else appeared to be in the same boat: catastrophic failure, caused by a straightforward implementation bug. Nothing fancy: no subtle ROP/ASLR-bypass techniques required to exploit it, completely platform independent, no massive fuzzing runs required to spot the bug in the first place. In fact the bug was so “shallow” at the source-code level that two different groups found it almost simultaneously.

“Show me the exploit”

It did not take long before the armchair philosophizing started around the exact nature of what could be done. There was no question that the vulnerability was trivially exploitable to extract chunks of memory from the affected system. (Thankfully we avoided a throwback to the 1990s-style “that vulnerability is purely theoretical” argument.) Naturally speculation moved to the second-order effects: what exactly could be in those regions of memory graciously returned by openssl to anyone who asked. That sensitive information could be spilled was proved relatively easily. As with most popular security mistakes, Yahoo! reprised its usual role with a dramatic demonstration of how usernames and clear-text passwords belonging to other people could be recovered from its servers. While damaging and no doubt a big deal for the users involved, these disclosures were relatively contained in scope. A lot more uncertainty lingered around whether the private keys for an SSL server could be extracted using Heartbleed.

CloudFlare

This is where some initially put their hopes in “accidental mitigations”: fortunate properties of the heap manager that just might protect private keys because of their position in memory relative to the leaked regions. This is what CloudFlare initially argued, complete with pretty pictures of heap layouts visualizing where keys are loaded in memory and how far they sit from the regions allocated for buffers. To its credit, CloudFlare was also willing to put its money where its idle speculation was. The company created a challenge website [note: certificate revoked] running the vulnerable nginx/openssl version, inviting anyone to attempt extraction of the SSL keys. That did not take very long. In less than a day multiple users had extracted the supposedly protected private key. (Only much later did it become clear why reality did not agree with CloudFlare’s theories. At least one code path in openssl creates a temporary copy of a prime factor of the modulus– sufficient to factor it and recover the private key– and later frees it, returning that copy to the heap without zeroing it out. This is also a poignant reminder of why it is important to properly clear sensitive data from memory when it is no longer needed: openssl had a mechanism in place for doing precisely that, which happens to be skipped in this case.)
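To appreciate why a stray copy of a single prime factor is equivalent to losing the whole key, here is a minimal Python sketch. It generates a throwaway key with the cryptography package purely as a stand-in for a real server key; everything else about the private key follows from the public key plus that one leaked prime.

    # Minimal sketch: a leaked prime factor p, plus the public key (n, e),
    # yields the full private key. The generated key below merely stands in
    # for a real server key; "leaked_p" plays the role of the stale heap copy.
    from cryptography.hazmat.primitives.asymmetric import rsa

    key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    pub = key.public_key().public_numbers()
    n, e = pub.n, pub.e

    leaked_p = key.private_numbers().p      # what the heap over-read handed over

    q = n // leaked_p                       # the other factor is immediate
    phi = (leaked_p - 1) * (q - 1)
    d = pow(e, -1, phi)                     # private exponent (Python 3.8+)

    # Sanity check: the recovered exponent undoes the public operation.
    m = 0x4865617274626c656564              # arbitrary test message
    assert pow(pow(m, e, n), d, n) == m

The moral: “only part of the key leaked” is rarely a meaningful mitigation for RSA.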

Akamai

A more interesting argument was advanced by the content-distribution network Akamai. [Full disclosure: this blogger’s current employer is an Akamai customer] Akamai claimed that thanks to a custom memory-allocation scheme employed in their modified version of openssl, private keys would be protected from Heartbleed attacks. Unlike the CloudFlare argument based on heap layouts, Akamai was not claiming this was a matter of luck. Instead, we were told, it was a deliberate defense-in-depth feature intended to protect cryptographic keys:

We replace the OPENSSL_malloc call with our own secure_malloc. Our edge server software sets up two heaps. Whenever we’re allocating memory to hold an SSL private key, we use the special “secure heap.” When we’re allocating memory to process any other data, we use the “normal heap.”

The story might have ended there as that rare success story of security-by-design in the field… until Akamai decided to contribute this patch to the open-source project. Once other people started looking at the code, it became obvious that the modification did not work. Worse, the culprit was an amateurish mistake, demonstrating a complete failure to understand how RSA private-key operations are implemented using the Chinese remainder theorem (CRT) representation of keys. The Akamai patch protected only some of the CRT components; the unprotected ones are still sufficient to recover the key. It did not help that, aside from failing to achieve the stated security objective, the patch contained implementation errors and required multiple iterations to get working.
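To make the CRT point concrete, here is a short Python sketch (again using a throwaway generated key) showing that leaking just one of the “secondary” CRT components, dP = d mod (p−1), alongside the already-public values n and e is enough to factor the modulus. Exactly which components the Akamai patch left exposed is beside the point here; the sketch simply illustrates that guarding a subset of them guards nothing.

    # Sketch: recovering a factor of n from the CRT exponent dP = d mod (p-1)
    # and the public key (n, e), using the identity e*dP = 1 + k*(p-1) for
    # some 1 <= k < e. A short search over k reveals p.
    from cryptography.hazmat.primitives.asymmetric import rsa

    key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    pub = key.public_key().public_numbers()
    n, e = pub.n, pub.e
    dP = key.private_numbers().dmp1         # the "leaked" CRT component

    def factor_from_dp(n, e, dP):
        t = e * dP - 1                      # equals k*(p-1) for some k < e
        for k in range(1, e):
            if t % k == 0:
                candidate = t // k + 1
                if 1 < candidate < n and n % candidate == 0:
                    return candidate        # a nontrivial factor of n
        return None

    p = factor_from_dp(n, e, dP)
    assert p is not None and n % p == 0

From that factor, the rest of the private key follows exactly as in the earlier sketch.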

Key-isolation: old is new again

After the dust settled and the benefit of hindsight kicked in, much better ideas for improving the situation were put forward. One example is emulating the privilege separation in SSH: create an out-of-process key agent that holds the private keys and exposes a “decryption oracle,” without processing hostile input directly from incoming connections.
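As a rough illustration of that idea, the sketch below runs a stand-alone “key agent” in Python that answers RSA decryption requests over a local Unix socket. The socket path and key file are hypothetical, and a real deployment would add peer authentication, rate limiting and proper message framing; the point is only that the process parsing hostile TLS traffic never maps the private key into its own address space.

    # Toy key agent (not OpenSSH's actual privilege-separation code): holds the
    # private key and exposes a narrow "decrypt this block" oracle. A Heartbleed
    # style over-read in the front-end can spill session data passing through
    # it, but not a key living in a different process.
    import os
    import socket
    from cryptography.hazmat.primitives import serialization
    from cryptography.hazmat.primitives.asymmetric import padding

    SOCKET_PATH = "/run/key-agent.sock"          # hypothetical path
    KEY_PATH = "/etc/keys/server-key.pem"        # hypothetical key file

    def serve():
        with open(KEY_PATH, "rb") as f:
            key = serialization.load_pem_private_key(f.read(), password=None)

        if os.path.exists(SOCKET_PATH):
            os.unlink(SOCKET_PATH)
        server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        server.bind(SOCKET_PATH)
        server.listen(1)

        while True:
            conn, _ = server.accept()
            with conn:
                ciphertext = conn.recv(4096)     # one RSA-sized block per request
                # Only the result of the private-key operation crosses the
                # socket; the key material itself never leaves this process.
                conn.sendall(key.decrypt(ciphertext, padding.PKCS1v15()))

    if __name__ == "__main__":
        serve()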

Strangely missing in action were the vendors of hardware security modules, or HSMs. In one sense there is no need to invent any novel technology to better protect cryptographic keys. There is a tried-and-true approach commercially available since at least as far back as 1990: store keys in dedicated cryptographic hardware, never exposing them directly to the main application. SSL has been one of the markets targeted by HSM vendors, often rebranding their products as “SSL accelerators.” These early attempts emphasized performance instead of security, trying to capitalize on the perceived costs of SSL at a time when server CPUs struggled to keep up with RSA private-key operations. But they never made much inroads into that market, instead remaining confined to special-purpose applications such as banking or certificate authorities. Heartbleed would have been the perfect backdrop for a customer case study, highlighting a website that deployed HSMs and can now assert confidently that its private keys were not compromised– and unlike Akamai, this mythical website would have been correct.

CP

HCE vs embedded secure element: taking Android out of the TCB (part IV)

[continued from part III]

Continuing the security comparison between HCE and secure-element based NFC applications, this post expands on an earlier theme around attack surface, focusing on one massive piece of vulnerable code: the Android operating system itself.

When root means game-over

Consider the following hypothetical: is it possible to build an EMV payment application that runs on Android and yet can resist attacks from a remote adversary who is running arbitrary code with root privileges? At first glance this seems impossible. Payments are authorized based on possession of a cryptographic key known only to the legitimate card holder. The assumption is that whoever possesses the key– demonstrated by executing a protocol specified by EMV that calls for proving possession of that key without revealing it– is authorized to spend the funds associated with that account. If our hypothetical adversary has achieved root privileges on the device, they have full access to the memory and storage of every application running on that device. That includes the payment application wielding this secret. Even if that secret were protected by a PIN/passphrase (let’s suspend disbelief and assume users pick high-entropy secrets immune to guessing), that PIN/passphrase must be entered via the phone UI, such as an on-screen keyboard, at some point. Since we posit that our adversary is running with root privileges, she can watch this interaction and learn the passphrase. Alternatively she can simply wait until the secret has been fully decrypted locally– necessary for the device to perform the type of computations envisioned by EMV payment protocols– and capture it at that point. No secret managed entirely by an Android application can be immune from an attacker with root privileges on that Android device. It is only a matter of knowing when and where to look for that secret in the target application’s address space; this is security through obscurity at best.

SE difference

Unless, of course, the secret is not directly available to the mobile application either. Enter hardware secure elements. If secrets such as cryptographic keys are stored in the secure element and never available in the clear to the mobile application, even a root-level compromise of Android does not help our adversary. The embedded secure element is its own mini-computer, with separate RAM, storage and an embedded operating system with its own concept of access control. Being root on Android is meaningless to the SE: it grants no special privileges. In much the same way that an HSM can isolate cryptographic keys by offering an interface to perform operations with a key without revealing its raw bits, the SE keeps Android at arm’s length from secret material.
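The contract the SE offers can be sketched as an interface that performs operations with a key but never exports it. The Python class below is only a conceptual illustration, not a Java Card applet: an object in the same process obviously provides no real isolation, whereas on a phone the boundary is a separate chip reached over a narrow APDU channel.

    # Conceptual sketch of a key-isolation boundary: callers can obtain
    # signatures (proof of possession) and the public key, but there is no
    # operation that returns the private-key bits.
    from cryptography.hazmat.primitives import hashes, serialization
    from cryptography.hazmat.primitives.asymmetric import ec

    class SecureElementSlot:
        def __init__(self):
            # Generated and kept inside the boundary; never serialized out.
            self._key = ec.generate_private_key(ec.SECP256R1())

        def sign(self, challenge: bytes) -> bytes:
            # Proves possession of the key without revealing it. (Real EMV
            # cryptograms use symmetric keys; the isolation principle is the same.)
            return self._key.sign(challenge, ec.ECDSA(hashes.SHA256()))

        def public_key_pem(self) -> bytes:
            return self._key.public_key().public_bytes(
                serialization.Encoding.PEM,
                serialization.PublicFormat.SubjectPublicKeyInfo)

    # Root on the host gains nothing from this interface beyond what any
    # caller already has: the ability to request signatures.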

Safe provisioning

The astute reader might ask how these cryptographic secrets that enable payments reached the secure element in the first place. It is common for provisioning to take place over the air, as opposed to the much more impractical alternative of requiring physical presence, where the user must walk into a bank branch and present their phone. That implies that some bits are pushed over the network to the mobile application, which relays them to the secure element. (Recall that the only direct network interface the SE has access to is NFC, which is only good over short distances. All other communication, such as reaching the card issuer’s API endpoint over the Internet, must pass through the host device.)

Does that create an opportunity for malware running with root privileges to capture the secret, however briefly, while it is passing through the Android application? Not if the provisioning system is properly designed. Recall that secure elements compliant with the Global Platform standard are “personalized” with unique keys (colloquially known as card-manager keys) required to perform administrative operations. These keys are not known to the owner of the phone or even to the operating system on the mobile device; they are managed by a third party called the “trusted services manager” or TSM. When the TSM is installing a payment application and configuring that application with its own set of secrets, the commands sent to the SE to perform those steps are encrypted. The Global Platform specification defines a “secure messaging” protocol describing the exact steps for establishing an authenticated and encrypted link between TSM and secure element. This protocol– or more accurately series of protocols, since there are multiple variants– is designed to ensure that even when the TSM and SE are not in physical proximity, as in the case of provisioning a payment application over the Internet, the issuer can be assured that sensitive information such as payment keys is delivered only to the designated SE and is not recoverable by anyone else. While it is true that the commands sent by the TSM are visible to the host operating system, where they can be intercepted or even modified by our hypothetical adversary who has attained root privileges, the secure messaging protocol ensures that no useful information is learned by mounting such an attack.
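The sketch below captures why the relaying app is harmless, using AES-GCM as a simplified stand-in for Global Platform secure messaging (it is not SCP02/SCP03, which have their own session-key derivation and MAC chaining). The payment key travels under a key shared only between the TSM and the SE, so the Android code in the middle sees, and at worst corrupts, opaque ciphertext.

    # Simplified stand-in for GP secure messaging, NOT the real SCP02/SCP03.
    # The host merely relays an authenticated, encrypted blob it cannot open.
    import os
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    # Personalized into the SE at manufacture and held by the TSM; never known
    # to the phone's owner or its operating system.
    card_manager_key = AESGCM.generate_key(bit_length=128)

    def tsm_wrap(payment_key: bytes) -> bytes:
        nonce = os.urandom(12)
        return nonce + AESGCM(card_manager_key).encrypt(nonce, payment_key, b"install")

    def android_relay(blob: bytes) -> bytes:
        # Runs on the (possibly rooted) phone: it can observe or mangle the
        # bytes, but learns nothing about the payment key inside.
        return blob

    def se_unwrap(blob: bytes) -> bytes:
        nonce, ciphertext = blob[:12], blob[12:]
        # Raises InvalidTag if the relay tampered with the blob.
        return AESGCM(card_manager_key).decrypt(nonce, ciphertext, b"install")

    assert se_unwrap(android_relay(tsm_wrap(b"EMV-key-16bytes!"))) == b"EMV-key-16bytes!"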

Threat models: ideal vs as-implemented

So this is another fundamental difference between applications implemented in the SE and those relying on host-card emulation. For HCE applications, the Android operating system is part of the trusted computing base by necessity. Any vulnerability that allows defeating the isolation between different apps– such as a local privilege escalation to root– is game over for scenarios using host-card emulation without a hardware secure element. By contrast, SE-based applications can in principle survive attacks in this very stringent threat model, where we posit that a remote adversary (eg one who does not have physical possession of the phone but is running her choice of code on the device) has attained root privileges.

“In principle” is the operative phrase. While Global Platform protects secrets en route to the secure element, there is a preceding step not covered by GP: authenticating the card holder. Suppose the consumer is asked to enter the username and password for their online bank account into the Android application, to verify their identity and request provisioning from that bank. That design is blatantly vulnerable to local malware on the device. Those bank account details could be captured and relayed to a remote attacker, who proceeds to repeat the exact same process on a device she controls, with the payment credentials being delivered to her secure element. She could even preempt the legitimate user by provisioning first and instructing the malware to prevent the original card holder from completing the task, lest the bank notice the anomaly of two different devices requesting cards in quick succession. Some form of out-of-band authentication is required to thwart this attack, such as logging into the bank website from a different machine to initiate the request. (Ironically, calling the bank and answering questions will not work, since the audio stream from the phone is also available to the attacker.) But even in the basic scenario with weak provisioning, the SE provides a useful guarantee: if the device is compromised by malware after payment credentials are provisioned, those credentials remain safe.

There is one more subtle attack that does not rely on trying to extract secrets from the local device. Suppose the remote attacker with root privileges abandons the effort to extract secrets from the SE and instead attempts to use the SE, exactly as it would be invoked for a payment transaction. The next and final post in this series will look at how the integration of the SE with the NFC controller can frustrate such relay attacks, in stark contrast to HCE applications, which are necessarily vulnerable.

[continued]

CP

Lessons from LinkedIn Intro for product security teams

That was quick: the controversial LinkedIn Intro feature– which effectively performed a man-in-the-middle attack against user email on iOS devices– has been scrapped unceremoniously less than three months after its introduction. It was a quiet farewell to a feature announced with great fanfare in a blog post trumpeting the impressive engineering feats involved: install LinkedIn as an email proxy to intercept and capture user email, route all incoming messages through LinkedIn servers in the cloud, annotate them with LinkedIn profile information about correspondents and deliver them back into the inbox. Amidst the predictable chorus of I-told-you-so and schadenfreude is a question that is rarely raised: why did this feature ship at all in the first place? More importantly, from the perspective of security professionals: what was the involvement of the LinkedIn security team in its conception, design and implementation? What lessons can we draw from this debacle about avoiding the reputational risk of ill-conceived features that damage user trust?

Much of what follows is qualified by the caveat that this blogger is not privy to the internal deliberations that may have taken place at LinkedIn around Intro. For that reason we can not distinguish between a failure in the process for security assurance (and as such, entirely avoidable) and a well thought-out business decision that simply proved incorrect in hindsight.

Moving security upstream

In companies with an immature approach to information security, the bulk of the activity around managing risks appears towards the end of the product cycle. At that stage the product requirements have already been cast in stone, the technical architecture well-defined, specifications written– to the extent that anyone writes specs in this day and age of agile methodologies– and even the bulk of the implementation completed. Into this scene steps our security engineer at the eleventh hour to conduct some type of sanity check. It may range from a cursory high-level threat model to low-level code review or black-box testing. That work can be extremely valuable depending on the amount of time and skill invested in uncovering vulnerabilities. Yet the bigger problem remains: much of the security battle has already been won or lost based on decisions made long before a single line of code is written. Such decisions are not expressed in localized code chunks that can be spotted by the security reviewer scrutinizing the final implementation of a finished product. Their implications cut across feature lines.

Beyond whack-a-mole

Consider the way web browsers were designed before IE7 and Chrome: massively complicated functionality– HTML rendering and JavaScript execution– subject to attacks by any website the user cared to visit, contained in a single process running with the full privileges of the user. If there was a memory corruption vulnerability anywhere in those millions of lines of code (statistically speaking, guaranteed to be present), an attacker exploiting it got full control of the user’s account– and often the entire machine, since everyone was running with administrator privileges anyway. There are two approaches to solving this problem. The first is what MSFT historically tried: throwing more security review time at the problem, trying to find one more buffer overrun. While this is not entirely a game of whack-a-mole– a downward trend in bugs introduced vs. found can be achieved with enough resources– it does not solve the fundamental problem: memory corruption bugs are very likely to exist in complex software implemented in low-level languages. At some point diminishing returns kick in.

The major security innovation in these newer browsers was the introduction of sandboxing: fundamentally changing the design of the web browser to run the dangerous code in low-privileged processes, adding defense-in-depth to contain the fallout from successful exploitation of a vulnerability that somehow eluded all attempts to uncover it before shipping. This is a fundamental change to the design and architecture of the web browser. Unlike fixing the yet-another-buffer-overrun problem, it was not “discovered” by a security researcher staring at the code or running an automated fuzzer. Nor was it implemented by making a small, local change in the code base. It calls for significant modifications across the board– pieces of functionality that used to reside locally in memory are now managed by another process, subject to security restrictions around access. Because of these far-reaching consequences, security features such as sandboxing must be accounted for in the product plans from the beginning, as a fundamental design criterion (or an even more difficult re-design exercise, in the case of Internet Explorer, saddled with a decade-old legacy architecture).

Security and development lifecycle

The introduction of sandboxing in web browsers is one example of a more general pattern: security assurance has greater impact as it moves upstream in the development life-cycle. Instead of coming in to evaluate and find flaws in an almost-finished product, security professionals participate in the architecture and design phases. This also reduces the cost of fixing issues: not only are some flaws easier to spot at the design level, they are also much cheaper to fix before any time has been spent writing code. There is no reason to stop at product design either: even before a project is formally kicked off, the security team can provide ad hoc consulting, give presentations on common security defects or develop reference implementations of core functionality, such as authentication libraries, to be incorporated into upcoming products. But the ultimate sign of maturity in risk management is when security professionals have a voice at the table when deciding what to ship. Incidentally, having a vote is not the same as having veto power; risk management for information security is one of many perspectives for the business. A healthy organization provides an avenue for the security team to flag dubious product ideas early in the process, before they gain momentum and acquire the psychological attachment of sunk costs.

Whither LinkedIn?

Returning to the problem of LinkedIn Intro, the two questions are:

  • Does LinkedIn culture give the security team an opportunity for voicing objections to reckless product ideas before they are fully baked?
  • Assuming the answer is yes, did the team try– even if unsuccessfully– to kill Intro in an effort to save the company from much public ridicule and embarrassment down the line? If yes, the security assurance process worked: the team did its job and the failure rests with the decision-makers who overruled their objections and green-lighted this boondoggle.

We do not know the answer. The public record only reflects that the LinkedIn security team attempted to defend Intro, even going so far as to point out that it was reviewed by iSEC Partners. (Curiously, iSEC itself did not step forward to defend their client and their brilliant product idea.) But that does not necessarily imply unequivocal support. This would not be the first time that a security team is put in the awkward position of publicly defending a questionable design it had unsuccessfully lobbied to change.

CP

HCE vs embedded secure element: attack surface (part III)

In this post we continue the security comparison of mobile NFC scenarios using dedicated hardware (such as UICC or embedded SE on Android) against the same use-case implemented as a vanilla Android application with host card emulation. In both cases there is sensitive information entrusted to the application, such as the credentials required to complete an EMV transaction. This time we will focus on attack surface: the amount of code exposure these sensitive secrets have against software attacks. Borrowing the definition from MSDN:

The attack surface of an app is the union of code, interfaces, services, protocols, and practices available to all users, with a strong focus on what is accessible to unauthenticated users.

Here again it is clear that a mobile application is at a distinct disadvantage, because the application and the platform it is built on top of are feature-rich with connectivity and sharing functionality. Looking at the typical inputs an Android application may process:

  • Inputs from the UI– relevant when the phone lands in the hands of an attacker, who can now press any button or otherwise try to bypass some access control
  • Network connectivity. At a minimum the application will need Internet access to provision payment instruments and interface with a back-end system for retrieving activity data.
  • Inter-process communication mechanisms from other applications installed on the same device. In Android this means intents, listeners etc. that the application responds to.
  • Filesystem. If the application relies on any data stored on disk, it must treat that content as yet another input, one that other software on the device may have tampered with.
  • NFC, inevitably for an application relying on NFC as transport.

By contrast the typical secure element has only 2 interfaces:

  • Wired or contact. For an ordinary smart card or SIM, this would be the physical connection attached to the brass plate on the surface, which receives power and signals from the reader. In the case of secure elements on mobile devices, these are typically hard-wired to the host device. When the SE is removable– for example with UICC or micro-SD based designs– it is possible to use an external card reader to access this interface.
  • Contactless or NFC, same as before.

While these are the only direct input paths, in many cases there is indirect exposure to other channels. For example there is often a companion user-mode application running on the host Android OS, helping load new applications on the SE and configure them. Such provisioning is often done over the air, with the Android app simply shuttling data back and forth between its network connection and the contact interface of the SE. That path may run straight through the baseband processor, bypassing the main Android OS, or over a standard TCP/IP connection. Either way an indirect channel exists where inputs from a potentially hostile network reach the secure element.

This is where a different property of the SE comes in: it is a locked-down environment, where routine operations– such as installing applications or even listing the existing ones– require authentication with keys that are commonly not available to the owner of the phone. (They are instead held by the trusted services manager, or TSM, the entity responsible for card management.) This stands in contrast with an ordinary iPhone or Android, where the user is free to install applications. Just in case the user makes a bad decision, each app has to worry about other malicious apps trying to subvert its security model and access its private data. By contrast, code running on the SE is tightly controlled, and much of the complex functionality for managing the introduction of new code is not even accessible to ordinary attackers without possession of the diversified keys unique to that particular SE.

Similarly, in common secure element designs based on Global Platform there is no concept of an all-powerful root/administrator with full access to the system. For example, even the TSM armed with the necessary keys can not read out the contents of EEPROM at will. This means that once an application is installed and secret material provisioned to that application (such as cryptographic keys for EMV chip & PIN payments), even the TSM has no way to read back those secrets from the card if the application does not allow it. One could imagine the TSM trying a different tactic: replace the legitimate application with a back-doored version designed to extract the secrets previously provisioned. But there is no concept of “upgrade” in Global Platform: one can only delete an existing application instance– which also removes all of its associated data, including the secrets– and install a new one. Barring vulnerabilities in the platform responsible for isolating multiple apps from each other (eg the Java Card applet firewall), provisioned secrets are forward-secure. By contrast, the authors of ordinary mobile apps always have the option of upgrading their code in place without losing data, creating another attack vector when the code-signing process is subverted.

A similar risk applies on a larger scale to the mobile operating system itself. Google effectively has root on Android devices, as do Apple on iOS and MSFT on Windows Phone. Oftentimes the handset manufacturer, and even the wireless carrier subsidizing the device, add their own apps running with system privileges, including remote update capabilities. These parties are the closest thing to a TSM incarnation (minus the “trusted” part), except that they can snoop on any secret managed by any application on the device. Again the threat here is not that Google, Samsung or Verizon might go rogue, shipping malicious updates designed to steal information. The risk is that the platform includes this capability by design, creating additional attack surface that others can attempt to exploit.

Finally there is the matter of sheer code size: SE resources are highly limited, with EEPROM/flash storage on the order of 100KB– and that includes code and data for all applications. That does not leave a lot of room for frivolous functionality, rarely used options or speculative features. This is a case where resource constraints help security by reducing the total amount of code and, indirectly, the opportunities for introducing bugs.

[continued]

CP