HCE vs embedded secure element: tamper resistance (part II)

Tamper-resistance

Tamper resistance refers to the ability of a system to resist attacks against its physical implementation when it is in the hands of an attacker. From the perspective of the threat model, the key point is that we assume an adversary has gained physical access to the device. The duration of that access and the final condition of the device vary with the attack scenario:

  • Temporary vs. permanent: In the first case, the user may have temporarily left their device unattended in a hotel room, giving the attacker an opportunity to extract cryptographic keys or implant a backdoor. But the time allotted for accomplishing that task is limited: the adversary must ultimately return the device– or its functional clone– to avoid raising suspicion. In the second case, the device may have been lost or otherwise captured with no intention of being returned to its rightful owner, granting the attacker unlimited time to work.
  • Destructive vs. stealthy: This is mainly a concern for attackers who want to avoid detection, when returning the device in a completely mangled or damaged state will not do. This may either limit the range of attacks possible or require an elaborate substitution scheme following a successful attack. For example, if the goal is extracting secret keys from a smart card, it may be acceptable to destroy the card in the process, as long as an identical-looking card can be created that behaves the same way once provisioned with the secret keys extracted from the original. The user will be none the wiser.

Lasers and spectrometers

Attacks are commonly grouped into three categories, using the terminology from Tunstall et al.:

  1. Non-invasive. These approaches attempt to extract information without modifying the card. For example, simply measuring the time taken to complete various operations with a high-precision timer, or introducing faults by varying the power supply (without frying the chip), causes no permanent damage to the card but may still disclose information that was supposed to remain confined to it. (A toy example of the timing variety appears after this list.)
  2. Semi-invasive. Moving one step closer to the hardware, attacks in this category require the surface of the chip to be exposed. This includes passively monitoring electromagnetic emanations from the chip as well as introducing temporary glitches, for example using precisely aimed laser pulses.
  3. Invasive. In the final group are attacks directly targeting the internal architecture of the chip, using sophisticated lab equipment. For example, they may involve placing probes to monitor bus lines or even creating new wiring that alters the logic.
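
To make the timing example in the first category concrete, here is a toy sketch in C– illustrative only, not code from any actual card– of a naive PIN check that compares digits one at a time and bails out at the first mismatch. Its running time grows with the number of correct leading digits, handing an attacker with a high-precision timer a way to recover the PIN one position at a time; the constant-time variant removes that signal by always touching every byte.

    #include <stdint.h>
    #include <stdio.h>

    /* Naive check: returns at the first mismatching byte, so execution time
       depends on how many leading digits of the guess are correct, a classic
       timing side channel. */
    static int check_pin_naive(const uint8_t *stored, const uint8_t *guess, size_t len)
    {
        for (size_t i = 0; i < len; i++) {
            if (stored[i] != guess[i])
                return 0;   /* early exit leaks the position of the first mismatch */
        }
        return 1;
    }

    /* Constant-time check: accumulates differences across every byte, so the
       running time no longer depends on where (or whether) a mismatch occurs. */
    static int check_pin_constant_time(const uint8_t *stored, const uint8_t *guess, size_t len)
    {
        uint8_t diff = 0;
        for (size_t i = 0; i < len; i++)
            diff |= stored[i] ^ guess[i];
        return diff == 0;
    }

    int main(void)
    {
        const uint8_t stored[4] = {4, 8, 2, 1};
        const uint8_t guess[4]  = {4, 8, 0, 0};   /* two leading digits correct */
        printf("naive: %d  constant-time: %d\n",
               check_pin_naive(stored, guess, 4),
               check_pin_constant_time(stored, guess, 4));
        return 0;
    }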

Comparison to off-the-shelf mobile hardware

Standard mobile device architecture does not even attempt to resist physical attacks. For example, reading any secrets in storage is as easy as removing the flash and using a standard connector to mount the same disk from another device. (Incidentally, disk encryption does little to hinder data recovery: Android disk encryption keys are derived from a user-chosen secret that is subject to brute-forcing.**) By contrast, extracting data stored in the EEPROM or flash storage of a modern secure element– while not impossible by any means– requires significantly more work and dedicated laboratory equipment.

Similarly, a standard ARM CPU makes no attempt to reduce side-channel emissions, such as shielding to limit electromagnetic emanations or masking power-consumption patterns. As simple experiments demonstrate, there is no need to zap anything with lasers or carry out elaborate attacks to reverse engineer the circuitry: an ARM processor radiates so much EM energy that meaningful signals revealing information about cryptographic operations can be picked up without even opening the case. By contrast, resistance to physical attacks is a core part of the Common Criteria (CC) and FIPS 140-2 certification standards for cryptographic hardware. For example, the SmartMX embedded secure element present in most NFC-enabled Android devices is part of a family of processors with EAL5+ level assurance according to CC.

Bottom line: for the purposes of resisting hardware attacks when the device lands in the hands of an attacker, there is no contest between an application that stores secrets on the main Android OS– such as a payment application using HCE– and one implemented on dedicated cryptographic hardware such as the embedded secure element.

[continued]

CP

**  Android made matters worse with a design blunder: it forces the disk-encryption secret to be identical to the screen-unlock one. In other words, the pattern/PIN/passphrase used to unlock the screen is the only secret input to a “slow” key-derivation scheme that outputs the disk encryption key. Because unlocking the screen is a very common operation, this all but guarantees that a low-entropy, easily brute-forced secret will be used. This may have been a usability trade-off based on the assumption that asking users to juggle two different secrets– one entered only during boot and one used frequently for screen unlock– is too much.
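
To see why key stretching alone does not rescue a low-entropy secret, consider the toy sketch below. It is not Android’s actual key-derivation code– the stand-in KDF is just an iterated mixing function– but the arithmetic is the same: whatever per-guess cost the real scheme imposes is multiplied by only 10,000 candidates when the secret is a 4-digit PIN, so an attacker holding the disk image can enumerate the entire space offline.

    #include <stdint.h>
    #include <stdio.h>

    /* Stand-in for a "slow" password-based KDF. The exact construction does not
       matter for the argument; only the per-guess cost does. */
    static uint64_t toy_slow_kdf(const char *pin, uint64_t salt, unsigned iterations)
    {
        uint64_t h = salt ^ 0x9e3779b97f4a7c15ULL;
        for (unsigned i = 0; i < iterations; i++) {
            for (const char *p = pin; *p; p++)
                h = (h ^ (uint64_t)(unsigned char)*p) * 0x100000001b3ULL;  /* FNV-style mixing */
            h ^= h >> 29;
        }
        return h;
    }

    int main(void)
    {
        const uint64_t salt = 0x1234abcdULL;   /* stored in the clear next to the ciphertext */
        const unsigned iterations = 100000;    /* the "slow" part of the derivation */

        /* Disk-encryption key recovered from a captured image, derived from PIN 4821. */
        const uint64_t target_key = toy_slow_kdf("4821", salt, iterations);

        /* Offline brute force: a 4-digit PIN means at most 10,000 derivations. */
        for (int candidate = 0; candidate <= 9999; candidate++) {
            char pin[5];
            snprintf(pin, sizeof pin, "%04d", candidate);
            if (toy_slow_kdf(pin, salt, iterations) == target_key) {
                printf("recovered PIN: %s\n", pin);
                break;
            }
        }
        return 0;
    }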

Goto fail and more subtle ways to mismanage vulnerability response

As security professionals we are often guilty of focusing single-mindedly on one aspect of risk management, namely preventing vulnerabilities, to the exclusion of others: detection and response. This bias seems to have dominated discussion of the recent “goto fail” debacle in iOS/OS X and its wildly improbable close cousin in GnuTLS. Apple has been roundly criticized and mocked for this self-explanatory flaw in SecureTransport, its homebrew SSL/TLS implementation. The bug voided all security guarantees the SSL/TLS protocol provides, rendering supposedly “protected” communications vulnerable to eavesdropping.

But much of the conversation and unofficial attempts at post-mortems (true to its secretive nature, Apple never published an official explanation, but conveniently created a well-timed distraction in the form of a whitepaper touting iOS security) focused on the low-level implementation details as root cause. Why is anyone using goto statements in this day and age, when the venerable Edsger Dijkstra declared way back in 1968 that they ought to be considered harmful? Why did they not adopt a coding convention requiring braces around all if/else conditionals? How could any intelligent compiler not flag the remainder of the function as unreachable, given that the spurious goto statement made it so?** Why was the duplicate line missed in code reviews when it stands out blatantly in the delta? Did Apple not have a good change-control system for introducing code changes? Speaking of sane software engineering practices, how is it possible that code flow jumps to a point labelled “fail” and yet still returns success, misleading callers into believing that the function completed successfully? To step back one more level, why did Apple decide to maintain its own SSL/TLS implementation instead of leveraging open-source libraries such as NSS or OpenSSL, which have benefited from years of collective improvement and cryptographic expertise that Apple does not have in-house?
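
For readers who have not stepped through it, the sketch below condenses the offending pattern; stand-in hash and verification routines replace the real SecureTransport calls, so treat it as an illustration of the shape of the bug rather than the verbatim Apple source. The duplicated, unconditional goto fail is taken while err still holds 0, the final hash computation and the actual signature check are skipped, and the cleanup path at the fail label dutifully returns success.

    #include <stdio.h>

    /* Condensed illustration of the duplicated "goto fail"; stand-in hash and
       verification routines replace the real SecureTransport functions. */

    typedef int OSStatus;                      /* 0 means success */

    static OSStatus hash_update(int *ctx)      { (void)ctx; return 0; }
    static OSStatus hash_final(int *ctx)       { (void)ctx; return 0; }
    static OSStatus verify_signature(void)     { return -1; }  /* would reject a forgery */

    static OSStatus verify_signed_params(void)
    {
        OSStatus err;
        int ctx = 0;

        if ((err = hash_update(&ctx)) != 0)
            goto fail;
            goto fail;                         /* duplicated line: always taken, err is still 0 */
        if ((err = hash_final(&ctx)) != 0)     /* never reached */
            goto fail;
        err = verify_signature();              /* never reached: the real check is skipped */

    fail:
        /* cleanup of hash buffers would go here */
        return err;                            /* returns 0 == success despite skipping the check */
    }

    int main(void)
    {
        printf("verify_signed_params() returned %d\n", verify_signed_params());   /* prints 0 */
        return 0;
    }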

All good questions, partly motivated by a righteous indignation that such a catastrophic bug could be hiding in plain sight. But what about the aftermath? Once we accept the premise that a critical vulnerability exists, the focus shifts to response. Putting aside questions around why the flaw existed in the first place, let’s ask how well Apple handled its resolution.

  • There was no prior announcement that an important update was about to be released. Compare this to the advance warning MSFT provides for upcoming bulletins.
  • A passing mention of the vulnerability in the release notes, with an ominous statement to the effect that “an attacker with a privileged network position may capture or modify data in sessions protected by SSL/TLS.” Not a word about the critical nature of the flaw or a plea for users to upgrade urgently. One would imagine that an implementation error that defeats SSL– the most widely deployed protocol for protecting communications on the Internet– and allows eavesdropping on millions of users’ traffic would hit a raw nerve in this post-Snowden world of global surveillance. Compare Apple’s nonchalance and brevity to the level of detail in a past critical security update from Debian or even the routine MSFT bulletins released every month.
  • The update was released on a Friday afternoon Pacific time. That is the end of the work week in North America, and well into the weekend in Europe. Due to the lack of upfront disclosure by Apple, the exact nature of the vulnerability was not publicly reverse-engineered until several hours later. That is suboptimal timing to say the least for dropping a critical fix, especially in a managed enterprise IT environment with a large Mac fleet and a security team tasked with trying to ensure that all employees upgrade their devices. (Granted, Apple never seems to have cared much for the enterprise market, as evidenced by weak support for centralized management compared to Windows or even Linux with third-party solutions.)
  • The update addressed the vulnerability only for iOS, leaving Mavericks, the latest and greatest desktop operating system, vulnerable. In other words, Apple 0-dayed its own desktop/laptop users with an incomplete update aimed at mobile users. Why? At least three possibilities come to mind.
    1. Internal disconnect: Apple may not have realized the exact same bug existed in the OS X code base– but this is a stretch, given the extent of code sharing between them.
    2. Optimism/naiveté: Perhaps they were aware of the cross-platform nature of the vulnerability but assumed nobody would figure out exactly what had been fixed, giving Apple a leisurely time-frame to prepare an OS X update before the issue posed a risk to users. To anyone familiar with shrinking time-windows between patch release and exploit development, this is delusional thinking. There is a decade’s worth of research on reverse-engineering vulnerabilities from patches, even when the vendor remains silent on the details of the vulnerability or even the existence of any vulnerability in the first place.
    3. Deliberate risk-taking / cost-minimization: The final possibility is that Apple did not care, or prioritized mobile platforms over traditional laptops and desktops. Some speculated that Apple was already planning to release an update to Mavericks incorporating this fix and saw no reason to rush an out-of-band patch. (Compare this to the approach MSFT has taken toward critical vulnerabilities. When there is evidence of ongoing or imminent exploitation in the wild, the company has departed from the monthly cycle to deliver updates immediately, as with MS13-008.)
  • No explanation after the fact about the root cause of the vulnerability or the steps taken to reduce the chances of similar mistakes in the future. This is perhaps the most damning part. The improbable nature of the bug– one line of code mysteriously duplicated, looking so obviously incorrect on even the most cursory review– fueled much speculation and conspiracy theorizing around whether it had been a deliberate attempt to introduce a backdoor into Apple products. Companies are understandably reluctant to release internal postmortems out of fear that they may reveal proprietary information or portray individual employees in an unflattering light. But in this case even an official blog post summarizing the results of an investigation could have sufficed to quell the wild theories.

Coincidentally, the same Friday this bug was exposed, this blogger gave a presentation at Airbnb arguing that OS X is a mediocre platform for enterprise security, citing the lack of a TPM, compatibility issues with smart cards and a dubious track record in delivering security updates. Over the next four days of the goto-fail fiasco, Apple piled on the evidence supporting that last point. In some ways the continuing silence out of Cupertino represents an even bigger failure to comprehend what it takes to maintain trust when vulnerabilities, even critical ones, are inevitable.

CP

** It turns out in this case the blame goes to gcc. By contrast MSVC does correctly flag the code as unreachable.

HCE vs embedded secure element: comparing risks (part I)

As described in earlier posts, Android 4.4 “KitKat” introduced host-based card emulation or HCE for NFC as a platform feature, opening this functionality up to third-party developers in ways that were not quite possible with the embedded secure element. In tandem with the platform API change, the Nexus 5 launched without an embedded secure element, ending a run going back to the Nexus S where the hardware spec included that chip coupled to the NFC controller. Google Wallet was one of the first applications to migrate from the eSE to HCE for its NFC use case, namely contactless payments.

An earlier four-part series compared HCE and hardware secure elements from a functional perspective, concluding that the current Android implementation is close to (but not at 100%) feature parity with the previous architecture, where the card-emulation route points to the eSE. The next set of posts will focus on security, looking at what additional risks are introduced by using HCE instead of dedicated hardware coupled to the NFC controller.

Another way to phrase this question: what did the embedded SE buy in terms of security, and what was lost when Android gave up on the SE due to opposition from wireless carriers? Can HCE achieve a similar level of security assurance, or are there scenarios that inherently depend on special hardware incorporated into the device, regardless of its form factor as eSE, UICC or micro-SD?

Broadly speaking, there are four significant benefits, ranging from the obvious to the more subtle:

  1. Physical tamper resistance
  2. Reduced attack surface
  3. Taking Android out of the trusted computing base (TCB)
  4. Interface separation

Each of the following posts will tackle one of these aspects.

[continued]

CP