Reading the US passport using Android — overview

Seems like it was only yesterday that the privacy community was up in arms about the perceived evils of the new US passport, equipped with RFID chip. There was that confrontation at the 2005 Computers, Freedom & Privacy conference in Seattle between a State Dept representative and Barry Steinhardt of the ACLU on the maximum reading range for RFID passports. From the tin-foil wrapper suggestions (having inspired a cottage industry at this point, also thanks to contactless payments) to Hollywood-esque scenarios of explosives rigged to go off when a particular individual walks by with their RFID passport, conspiracy theories carried the day.

A couple of years later, with the new passports having become standard, it is possible to experiment directly with the technology. No special gadgets are required: any Android phone with an NFC controller will do. That includes a wide range of models, from the purist “Google experience device” Nexus S to Samsung’s flagship Galaxy S3.

While there are many applications for scanning NFC tags, NFC TagInfo from NFC Research Lab stands out for having built-in logic for recognizing the data layout of several common types of cards, including passports.

Tapping the passport against the phone will not automatically bring up the application, as it does not contain any NDEF tags that Android applications typically use to configure auto-launch. Instead we have to start NFC TagInfo and then scan the passport. This will bring up an overview of the tag structure:

This screen already tells us a bunch of things:

1. There is an NFC chip in the passport. Near Field Communication is a type of RFID operating at 13.56 MHz. This is the only type of RFID that Android devices support. The more common RFID transponders, such as garage door openers and key-fobs, operate at a different frequency and cannot be detected by the phone, because its radio does not operate at that frequency.

2. More specifically, the NFC tag is an ISO 14443 smart card, which Android calls IsoDep technology. This is also how identification cards such as the US government PIV card, or contactless credit cards, appear to the system.

3. “MRTD” stands for Machine Readable Travel Document, a reference to the international standard (ICAO Doc 9303) for encoding information about individuals on a smart card for use in cross-border travel.

Clicking on that gray button is when things get interesting, because the application will try (and most likely fail) to access the contents of that MRTD. It will fail because the cryptographic keys required to access the data are initially missing:

This is where one of the properties of the MRTD protocol comes into play: decrypting the contents of the passport requires cryptographic keys, which are derived from information printed on the passport itself. By supplying this information to the Android app, it is possible to get past this error. This is exposed via the “set up access keys” option in the menu.
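Since this access scheme, Basic Access Control (BAC), is a published standard, the key derivation can be sketched in a few lines. The following is a minimal Python sketch following ICAO Doc 9303: the document number, date of birth and date of expiry from the machine-readable zone, each followed by its check digit, are hashed with SHA-1 into a seed, from which the 3DES encryption and MAC keys are derived. It assumes a document number of at most nine characters and stops at key derivation; the challenge-response exchange and secure messaging that follow on the NFC link are not shown. Function names are illustrative, and anyone relying on this should check it against the worked example in the ICAO specification.

```python
import hashlib

def char_value(c: str) -> int:
    """ICAO 9303 character value: '0'-'9' -> 0-9, 'A'-'Z' -> 10-35, '<' -> 0."""
    if c.isdigit():
        return int(c)
    if c == "<":
        return 0
    return ord(c) - ord("A") + 10

def check_digit(field: str) -> str:
    """Weighted sum with repeating weights 7, 3, 1, taken modulo 10."""
    weights = (7, 3, 1)
    total = sum(char_value(c) * weights[i % 3] for i, c in enumerate(field))
    return str(total % 10)

def mrz_information(doc_number: str, birth_yymmdd: str, expiry_yymmdd: str) -> str:
    """Document number, birth date and expiry date, each followed by its check digit."""
    doc = doc_number.ljust(9, "<")  # assumes a document number of at most 9 characters
    return (doc + check_digit(doc)
            + birth_yymmdd + check_digit(birth_yymmdd)
            + expiry_yymmdd + check_digit(expiry_yymmdd))

def adjust_parity(key: bytes) -> bytes:
    """Set the low bit of each byte so it has an odd number of 1 bits (DES key parity)."""
    out = bytearray()
    for b in key:
        high = b & 0xFE
        out.append(high | (0 if bin(high).count("1") % 2 else 1))
    return bytes(out)

def derive_bac_keys(doc_number: str, birth_yymmdd: str, expiry_yymmdd: str):
    """Return (K_enc, K_mac), each a 16-byte two-key 3DES key."""
    info = mrz_information(doc_number, birth_yymmdd, expiry_yymmdd)
    k_seed = hashlib.sha1(info.encode("ascii")).digest()[:16]

    def kdf(counter: int) -> bytes:
        # SHA-1 over the seed plus a 32-bit big-endian counter: 1 for encryption, 2 for MAC.
        digest = hashlib.sha1(k_seed + counter.to_bytes(4, "big")).digest()
        return adjust_parity(digest[:16])

    return kdf(1), kdf(2)
```

With the ICAO specimen passport (document number L898902C, born 690806, expiring 940623), `mrz_information` yields `L898902C<369080619406236`. This is why tapping the passport is not enough: without those three printed fields, a reader cannot compute the keys.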

[continued in part II]

CP

CVV1, CVV2, CVV3: Demystifying credit card data (1/2)

[This is a series of posts dedicated to describing the card-validation code (CVC) or card-validation value (CVV) for credit cards.]

Swiping a credit card through a magnetic stripe reader is perhaps the most common way of using a plastic card for payments. At the implementation level, this involves reading the data encoded in the magnetic stripe on the back. In a pinch, when there are no point-of-sale terminals present, taking an imprint of the card by pressing carbon paper over it will do. When the merchant and card-holder are not in the same place, the purchase is instead conducted by relaying the card number, expiration date, perhaps the billing address, and an additional number printed on the card dubbed CVV2. More fashionable recently are contactless payments, where the card is tapped against a reader, as in MasterCard PayPass, Visa payWave or Discover Zip. Each of these involves a slightly different protocol, relying on different characteristics of the card data to authenticate the card.

Swipe transactions are perhaps the easiest to describe. The data encoded on the magnetic stripe is static, formatted according to ISO/IEC 7813 in three tracks, with the third one typically unused. One of the fields in this track layout is the Card Validation Code (CVC), or CVC1, which serves as a cryptographic integrity check on the track contents. Much like a message authentication code, the CVC simplifies the process of authenticating track data when it is received by the issuing bank. It also prevents easy fabrication of credit cards: while track data is relatively predictable given the card number, expiration date and other fields, CVC1 does not follow any pattern that would allow deriving it from the other pieces.
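To make the “relatively predictable” track layout concrete, here is a minimal Python sketch that splits a Track 2 string into its fields: start sentinel, primary account number, separator, YYMM expiration date, three-digit service code, issuer discretionary data, end sentinel. It is simplified under stated assumptions: it ignores the trailing LRC character and the variant where expiry and service code are omitted, and it does not try to locate CVC1, because its position inside the discretionary data is issuer-defined. The sample string below uses the common test card number and made-up discretionary digits.

```python
import re
from dataclasses import dataclass

@dataclass
class Track2:
    pan: str             # primary account number, up to 19 digits
    expiry_yymm: str     # expiration date as YYMM
    service_code: str    # three-digit service code
    discretionary: str   # issuer-defined data; CVC1 lives somewhere in here

# ';' start sentinel, PAN, '=' separator, YYMM, service code,
# discretionary digits, '?' end sentinel (LRC deliberately not handled).
TRACK2_RE = re.compile(r"^;(\d{1,19})=(\d{4})(\d{3})(\d*)\?$")

def parse_track2(raw: str) -> Track2:
    match = TRACK2_RE.match(raw)
    if match is None:
        raise ValueError("not a well-formed Track 2 string")
    return Track2(*match.groups())

card = parse_track2(";4111111111111111=25121010000012300000?")
```

Everything except the discretionary data follows directly from what is embossed on the card, which is why an integrity value that cannot be derived from the other fields is needed at all.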

CVC2 serves a similar purpose but is used in card-not-present or “CNP” transactions, such as ecommerce, where the user types card information into a web browser. While CVC1 is encoded in the magnetic stripe, CVC2 is only printed on the card itself: three digits on the back under the magnetic stripe for Visa, Mastercard and Discover, and four digits on the front for American Express. (The extra digit can be viewed as balancing out the fact that AmEx cards have 15 digits, one less than other major brands.) PCI standards impose stringent constraints on handling of CVC2. For example, while the card number, expiration date and billing address can be saved for future use to simplify later transactions, CVC2 cannot be stored by the merchant. It is only intended for authenticating the card owner during the purchase.

CVC2 and CVC1 are by design incompatible. It is not possible to use CVC1 for making a purchase online, or to encode CVC2 into a magnetic stripe for a successful swipe transaction. This has important ramifications for managing risks due to theft of payment information: it effectively creates a “firewall” between online and in-store fraud. Suppose a waiter has taken to swiping all customer credit cards through his very own mag-stripe reader to save a copy of the track data. The resulting cache of contraband information can be used to forge cards and make in-store payments, compliments of unsuspecting diners. But unless our enterprising waiter also remembered to write down or photograph the CVC2 from those cards, they cannot be used for any online purchase where the merchant validates CVC2. (Surprisingly, some leading retailers including Amazon do not require CVC2, so this turns out not to be a major impediment for the aspiring criminal.) Going in the other direction, when yet another website processing credit cards experiences a data breach, the spoils can be used for additional online, mail-order or phone-order transactions. But they are not useful for minting actual plastic cards with a valid magnetic stripe to use at an old-fashioned bricks-and-mortar store, due to the absence of CVC1.
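One way to see why the two values are incompatible: the real networks compute CVC1 with a DES-based algorithm keyed by issuer secrets, and a commonly described detail is that CVC2 runs the same computation over the same card number and expiry date but with service code 000 in place of the one on the stripe. The Python sketch below keeps only that structure. It swaps in HMAC-SHA256 for the DES construction and uses a made-up issuer key, so it illustrates the idea rather than reproducing any network's actual algorithm; the decimalization step (decimal digits first, then hex letters a-f mapped to 0-5) mirrors the way CVV derivation is usually described.

```python
import hashlib
import hmac

def decimalize(hex_digest: str, length: int = 3) -> str:
    # First take the decimal digits in order, then map hex letters a-f to 0-5.
    digits = [c for c in hex_digest if c.isdigit()]
    digits += [str(int(c, 16) - 10) for c in hex_digest if not c.isdigit()]
    return "".join(digits[:length])

def card_check_value(issuer_key: bytes, pan: str, expiry_yymm: str, service_code: str) -> str:
    # Stand-in keyed MAC over the card fields (NOT the real DES-based CVV algorithm).
    message = (pan + expiry_yymm + service_code).encode("ascii")
    mac = hmac.new(issuer_key, message, hashlib.sha256).hexdigest()
    return decimalize(mac)

ISSUER_KEY = b"hypothetical-issuer-key"  # placeholder for the issuer's secret keys
pan, expiry = "4111111111111111", "2512"
cvc1 = card_check_value(ISSUER_KEY, pan, expiry, "101")  # service code from the stripe
cvc2 = card_check_value(ISSUER_KEY, pan, expiry, "000")  # value printed on the card
```

Without the issuer key, knowing the card number, expiry and service code is not enough to compute either value, and because the inputs differ, learning `cvc1` from a skimmed stripe says nothing useful about `cvc2`.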

Updated: 12.18.13 to correct CVC1 / CVC2 mix-up in last paragraph

[continued]

CP

Unanswered questions about the Flame certificate forgery (2/2)

Second question: why bother with MD5 collision in the first place?

As explained in the SummerCon presentation, this particular forgery depended on exquisite timing. First, the expiration date of the certificate is exactly one year from the moment it is issued, measured in seconds. Second, the serial number of the certificate issued by the TS licensing server is a function of two semi-predictable variables:

  • Number of other certificates issued before
  • Current time, in millisecond resolution

This poses quite a challenge for an attacker seeking to exploit a collision against MD5. Recall that the attack depends on crafting two certificates with identical hashes: one is the certificate that the attacker predicts the licensing server will issue, the second is the certificate that the attacker actually wants to obtain. The ability to find collisions in MD5 ensures that a signature on the first one is as good as a signature on the second one. But “predict” is the operative word here: after all, it is up to the issuing authority to decide on the serial number and expiration date of the certificate it issues. Randomly chosen serial numbers would have trivially defeated the attack. (In fact this is such a good idea that randomization is required for the so-called extended validation class of certificates, which breathed new life into the defunct CA business model by allowing CAs to charge companies a lot more money for the same basic due diligence they should have performed for every SSL certificate.) Instead, the licensing server used the current time and an incrementing counter to generate the serial number.
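The claim that a signature on one colliding certificate is as good as a signature on the other follows from hash-then-sign: the CA’s signature covers only the digest. The toy Python sketch below shows this with deliberately weak ingredients: a 16-bit truncated SHA-256 stands in for MD5 so that a collision turns up in a fraction of a second, and HMAC with a made-up key stands in for the CA’s RSA signature. Real attacks used chosen-prefix MD5 collisions, which this does not attempt to reproduce.

```python
import hashlib
import hmac
from itertools import count
from typing import Tuple

def weak_hash(message: bytes) -> bytes:
    # Toy hash: SHA-256 truncated to 16 bits, a stand-in for a broken hash like MD5.
    return hashlib.sha256(message).digest()[:2]

CA_KEY = b"hypothetical-ca-key"  # placeholder for the CA's private key

def ca_sign(message: bytes) -> bytes:
    # Hash-then-sign: the "signature" depends only on weak_hash(message).
    return hmac.new(CA_KEY, weak_hash(message), hashlib.sha256).digest()

def verify(message: bytes, signature: bytes) -> bool:
    return hmac.compare_digest(ca_sign(message), signature)

def find_collision(benign_prefix: bytes, evil_prefix: bytes) -> Tuple[bytes, bytes]:
    # Vary a counter in both messages until some evil message hashes to
    # the same value as a previously generated benign one.
    seen = {}
    for i in count():
        benign = benign_prefix + str(i).encode()
        seen.setdefault(weak_hash(benign), benign)
        evil = evil_prefix + str(i).encode()
        if weak_hash(evil) in seen:
            return seen[weak_hash(evil)], evil

benign, evil = find_collision(b"serial=1234;usage=license;", b"serial=1234;usage=code-signing;")
signature = ca_sign(benign)  # the CA only ever sees and signs the benign request
```

`verify(evil, signature)` succeeds even though the CA never saw `evil`. Note that both messages embed the serial number the attacker guessed: had the CA inserted its own unpredictable serial before signing, the message it signed would differ from `benign` and the precomputed collision would be worthless. A predictable serial keeps the attacker’s guess aligned with what the CA actually signs.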

That is a lucky break for the attacker: guess the values correctly when crafting the collision, and the licensing server has unwittingly issued a code-signing certificate made out in the name of MSFT. Making life easier is the ability to do dry runs of the attack, acquiring licenses freely to observe the counter and current “time” according to the server: the TS licensing server will happily issue “licenses” to any enterprise user in a certain Active Directory group, e.g. every employee of the company who has a valid business reason to access a Windows server. But even with this advantage, it is a fragile process because it requires making projections about the variables above, namely:

  • How many other users will have obtained a license between now and when the collision is going to be submitted.
  • Exactly what time, down to the millisecond, the collision will be processed. Note this is not when the attacker submits the request to the licensing server; it is when the server gets around to issuing the certificate. All kinds of latency, from simple network jitter to OS scheduling delays, could throw this off.

The first one is easy to work around: plan the attack over a weekend or official holiday, perhaps at a time of day when few legitimate users will be requesting licenses and interfering with the attack. Assuming long-term persistence on the enterprise network, one can observe the fluctuation in demand over time to spot such opportunities. A different strategy is to target an organization where the TS licensing server is idle by design: perhaps it has just been set up but is not yet activated, or it is being decommissioned in favor of a different licensing model. (In this case the assumption is that the creators of Flame have the resources to compromise many different organizations, so they can pick one with the right TS licensing setup.) In all cases, timing the attack to take place against an idle server can partially help with the second problem as well: if the server is only responding to the attacker, there is no concern about high load leading to variable processing times. But the jury is out on whether millisecond accuracy can be obtained this way, so the attacker may still have to try multiple times. As a comparison point, even with a perfectly sequential serial number and one-second time resolution, the forgery attack from 2008 required several tries.

The other side of the equation is the cost of each MD5 collision: the computational resources “wasted” as a result of guessing incorrectly. Latest estimates put this on the order of $10K-$100K per collision using publicly known techniques. It is likely that the Flame authors had access to novel cryptanalytic attacks lowering that cost, in addition to large amounts of computing power that might wash it away altogether. (As economists like to point out, CPU cycles are a perishable resource: cycles left idle today cannot be saved for tomorrow. Once the upfront investment in a couple million servers is paid for, one might as well put them to use spinning on a problem.)
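The trade-off between timing uncertainty and per-collision cost can be put into a back-of-the-envelope formula. Under simplifying assumptions (each attempt bets on a single counter value and a single millisecond, the true values are uniformly distributed over the attacker’s uncertainty, and attempts are independent), the number of collisions needed follows a geometric distribution with mean 1/p. The Python sketch below uses hypothetical parameters; nothing here reflects actual measurements from the Flame incident.

```python
def expected_attempts(success_probability: float) -> float:
    # Mean of a geometric distribution: average number of independent tries
    # up to and including the first success.
    if not 0 < success_probability <= 1:
        raise ValueError("probability must be in (0, 1]")
    return 1 / success_probability

def expected_cost(jitter_window_ms: int, counter_choices: int,
                  cost_per_collision: float) -> float:
    # Each collision is a bet on one (counter, millisecond) pair out of
    # counter_choices * jitter_window_ms equally likely possibilities.
    p = 1 / (jitter_window_ms * counter_choices)
    return expected_attempts(p) * cost_per_collision
```

For example, an idle server (`counter_choices=1`) with a 10 ms window of timing uncertainty and $10K per collision gives an expected $100K; every additional unit of uncertainty in either variable multiplies the bill, which is why an idle server and cheap collisions matter so much to the attacker.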

Another option is to mess with the licensing server’s notion of “time.” In most modern systems, including Windows, the clock is synchronized over the network using a protocol such as NTP. If the Flame authors had access to exploits against the time synchronization scheme, they could “roll back” the clock to try again with the same timestamp after each failed attempt. But this will not necessarily reduce the number of collisions required, since the count of issued certificates still increments regardless of success or failure, requiring a different collision to pair up with the next serial number the CA will choose.

So far this discussion has only considered attacks that treat the licensing server as a black box: attacker interactions are restricted to submitting seemingly valid license requests to the server and perhaps attacking its surrounding environment, such as disrupting clock synchronization. What they are not doing is breaking into that machine outright to exercise the signing capability directly on any chosen message, or to export the private key for future use. Why not? Licensing servers are not a particularly sensitive or critical part of the infrastructure. They are regular Windows servers configured in a specific role. They are not specially hardened for better security or closely monitored for any sign of trouble. (After all, the raison d’être of the system is to guard an MSFT revenue source; it has no intrinsic value to the enterprise.) Few enterprises use hardware security modules or other key-management techniques to protect these keys.

In addition to maintaining long-term presence in the enterprise network, it is also a safe bet that the Flame creators have access to a significant collection of vulnerabilities, both public and zero-day. Why would they not go after the target directly by compromising the licensing server? Recall that due to the flawed setup of the MSFT trust chain, any organization anywhere in the world operating a TS licensing server will do. In the highly unlikely event that the first enterprise they tried has a clue about security and runs bullet-proof Windows servers, the attackers need only move on to a different one. Surely some enterprise somewhere in the world is running a vulnerable licensing server ripe for the extraction of private keys? Armed with the key, the attackers need not waste any time trying to find MD5 collisions; they can issue the certificate directly. (The conspiratorially minded could argue this is exactly what happened, with the additional twist that a single MD5 collision was chosen *after the fact* to create the appearance that it was an interactive attack. This has the nice property of providing misdirection: the actual timestamp on the certificate with the colliding hash is no longer meaningful. It could have been back- or forward-dated so that mining the CA logs around the alleged incident turns up only noise.)

Finally there is the question of disclosing offensive capabilities. By attacking the licensing server directly, attackers risk burning a 0-day if they are found out. (That is the worst-case scenario; more likely the server has known unpatched vulnerabilities with readily available weaponized exploits.) This is not exactly the end of the world. Stuxnet contained multiple 0-days, and its authors presumably factored in the possibility that one day the malware samples would be reverse engineered. Not to mention that thanks to the likes of VUPEN, everyone and their uncle has access to Windows zero-days these days. Finding one being used in the wild might prompt an out-of-band patch from MSFT and some temporary indignation, but then everyone moves on.

Using an MD5 collision and embedding such a certificate in Flame, on the other hand, reveals an entirely different capability: access to novel cryptographic techniques that are unknown in the civilian world. For an organization interested in hiding its capabilities, this is more revealing than the loss of a couple of Windows zero-days that could have blended into the background noise of vulnerability research.

CP

Unanswered questions about the Flame certificate forgery (1/2)

1. Which enterprise was it?

The authors of Flame exploited a series of design flaws in the way MSFT operated terminal services licensing to obtain a code-signing certificate impersonating MSFT. This step involved interacting with some TS licensing server that was already set up to issue these licenses, which also double as code-signing certificates due to a blatant violation of the least-privilege principle. Typically such licensing servers are operated by large enterprises to simplify the problem of granting their users licenses to connect to Windows servers.

That raises an obvious question: which organization was it? While each licensing server receives a certificate with the same nondescript name, Microsoft LSRA PA (which does not in fact identify the organization it belongs to, in yet another example of bad design), they each have unique signing keys. As long as MSFT kept logs of the subordinate CA certificates it issued, it is possible to identify conclusively which enterprise the forged certificate chains up through. So far MSFT has not publicly named the organization, nor have the implicated parties come forward of their own accord. It is entirely plausible that the organization did not realize its TS licensing infrastructure was used to facilitate the Flame attack. This is similar to the pair of semiconductor firms that had to be alerted that their signatures were found on Stuxnet. How many organizations proactively checked their own CA against the forged Flame certificate chain? But it is equally likely that MSFT or perhaps a law-enforcement agency would have reached out by now (keeping in mind this could be an organization anywhere in the world) and let these folks know they were the unlucky ones. So far this appears to be handled quietly, perhaps to protect the “guilty”: for most enterprises, having experienced a security breach is something to be swept under the rug. This is unfortunate, because it would have been possible to infer something about the modus operandi of the Flame creators based on why they picked that organization. That brings us to the second question.

[continued]

CP

Taking security seriously, while failing spectacularly at it

It’s become nearly impossible to state “we take security [of our users] seriously” with a straight face.

This week witnessed three different companies suffer three unrelated incidents (two of them sharing the same underlying root cause) and all of them resorting to the same damage control cliches.

Here is the MSFT mea culpa on the MD5 collision debacle covered earlier on this blog:

“Microsoft takes the security of its customers seriously”

Here is LinkedIn, responding to 6.5M unsalted password hashes floating around (as an aside, there is nothing like starting the day by discovering your own password hash in a contraband file shared by colleagues, before even receiving a data breach notification from the clueless web site in question):

“We take the security of our members very seriously.”

Here is the online dating site eHarmony spinning the leak of 1.5 million passwords:

“The security of our customers’ information is extremely important to us, and we do not take this situation lightly.”

With companies taking security so seriously, attackers hardly need anyone to take a more light-hearted approach. One can imagine the Joker asking: “Why so serious?”

It is difficult to know from the outside how these vulnerabilities came about. (Full disclosure: the author is a former MSFT employee but was not involved in terminal services and possesses no information about the incident beyond what is available from public sources.) Were they unknown to the organization? Is that because that aspect of the system was never reviewed by qualified personnel? Or was it missed because the reviewers assumed this was an acceptable solution? Or perhaps the issue was properly flagged by the security team but postponed or neglected by an engineering organization single-mindedly focused on schedule? It is safe to say there will be a post-mortem at MSFT. If LinkedIn and eHarmony have a culture of accountability and learning from mistakes, perhaps they will also conduct their own internal investigations. But the useful information and frank conclusions reached in such exercises rarely leave the safe confines of the company, and that is assuming the leadership can resist the temptation to turn the post-mortem into an exercise in whitewashing.

So we can only draw conclusions based on public information, without the benefit of mitigating circumstances to absolve the guilty. And these conclusions are not flattering for any of the companies involved.

LinkedIn and eHarmony committed a major design flaw in storing passwords as unsalted hashes (SHA-1 in LinkedIn’s case). This is such a clear-cut case of bad judgment that even the CISSP-friendly excuse “we follow industry best practices” does not apply: the importance of salting to frustrate dictionary attacks is well known. For comparison, the crypt scheme used for storing UNIX passwords dates back to the late 1970s and has salting. (Both sites could also have chosen intentionally slow hashes, for example by iterating the underlying cryptographic primitive, to further reduce the rate of password recovery, but this is a relatively small omission in comparison.) The fact that both sites experienced a breach exposing passwords is almost less of a PR black eye, given the frequency of such mishaps across the industry, than the skeletons in the closet revealed as a result of the breach.
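The difference between the two approaches fits in a few lines. This is a minimal Python sketch using only the standard library: `unsalted_sha1` reproduces the criticized scheme, while `hash_password` adds a random per-user salt and uses PBKDF2, one standard way of “iterating the underlying cryptographic primitive.” The iteration count is an illustrative default to be tuned to the hardware, not a recommendation from either site, and the function names are hypothetical.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

def unsalted_sha1(password: str) -> str:
    # The criticized scheme: identical passwords always produce identical hashes,
    # so a single precomputed dictionary applies to every account at once.
    return hashlib.sha1(password.encode("utf-8")).hexdigest()

def hash_password(password: str, salt: Optional[bytes] = None,
                  iterations: int = 600_000) -> Tuple[bytes, int, bytes]:
    # A random per-user salt makes precomputed dictionaries useless, and the
    # PBKDF2 iteration count makes every individual guess more expensive.
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, iterations, digest

def verify_password(password: str, salt: bytes, iterations: int, expected: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(candidate, expected)  # constant-time comparison
```

Two users who both pick the same password get the same `unsalted_sha1` value, which is exactly what makes a leaked file of such hashes so easy to mine; with `hash_password` their stored digests differ because their salts differ.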

By comparison, the MSFT incident is far more nuanced and complex than the basic failure of LinkedIn and eHarmony to understand cryptography. It is not a single mistake, but a pattern of persistently poor judgment that resulted in the ability of Flame authors to create a forged signing certificate:

  • Using a subordinate CA chaining up to the MSFT product root for terminal server licensing. The product root is extremely valuable because it is trusted for code signing. As explained by Ryan Hurst, there was already another MSFT root that would have been perfectly suitable for the TS purpose.
  • Leaving the code-signing EKU in the certificate, even though terminal server licensing appears to have no conceivable scenario that requires this feature.
  • Attempting to mitigate the excessive privilege from the aforementioned mistake by using critical extensions, such as the Hydra OID. (As an aside: “Hydra” was the ancient codename for terminal services in NT4 days.) This would have worked if Windows XP correctly implemented the concept of a critical extension: if the application does not recognize it, it must reject the certificate as invalid. On XP, Authenticode certificates still pass muster even in the presence of unknown critical extensions, and XP accounts for almost one third of Windows installations, given the relatively “recent” age of Windows 7 and the abysmal uptake of its predecessor Vista.
  • Issuing certificates using the MD5 algorithm in 2010 (!!), nearly two years after a dramatic demonstration that certificate forgery is feasible against CAs using MD5 and six years after the publication of the first MD5 collision results.
  • Using a sequential serial number for each certificate issued, which is critical to enabling the MD5 collision attack by allowing attackers to predict exactly what the CA will be signing.
Individually, any of these five blunders could be attributed to routine oversight. By itself, any one mistake would have been survivable, as long as the remaining defenses were correctly implemented. But collectively botching all of them undermines the credibility of the “taking security seriously” PR spin.
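The critical-extension blunder is easiest to see side by side. The Python sketch below models a code-signing check that follows the RFC 5280 rule (reject any certificate carrying a critical extension the validator does not recognize) and a lenient mode that ignores unknown critical extensions, as the post describes XP doing for Authenticode. The certificate model is deliberately simplified and hypothetical: `HYDRA_OID` is a placeholder string, not the real TS licensing OID; the code-signing EKU OID is the standard one.

```python
from dataclasses import dataclass
from typing import Iterable, Set

CODE_SIGNING_EKU = "1.3.6.1.5.5.7.3.3"   # id-kp-codeSigning from RFC 5280
HYDRA_OID = "hypothetical-hydra-oid"     # placeholder, not the real TS licensing OID

@dataclass(frozen=True)
class Extension:
    oid: str
    critical: bool

def accept_for_code_signing(ekus: Set[str], extensions: Iterable[Extension],
                            understood_oids: Set[str], strict: bool = True) -> bool:
    if CODE_SIGNING_EKU not in ekus:
        return False  # not authorized for code signing at all
    for ext in extensions:
        # RFC 5280 rule: reject on any critical extension the validator does not recognize.
        if strict and ext.critical and ext.oid not in understood_oids:
            return False
    return True  # with strict=False, unknown critical extensions are silently ignored

ts_license_extensions = [Extension(HYDRA_OID, critical=True)]
```

A licensing certificate that kept the code-signing EKU is rejected by a strict validator that does not understand the Hydra extension, but accepted by a lenient one. The critical extension was the only thing standing between the leftover EKU and a working code-signing certificate, which is why the two mistakes compound.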
CP

Economics and incentives: terminal services licensing vulnerability

Or: “Don’t confuse the interests of the user with those of the software vendor.”

Researchers recently identified one of the zero-days used by the ultra-sophisticated malware Flame, likely created by nation states for the express purpose of industrial espionage. It turns out to be yet another PKI vulnerability that allows attackers to create malicious code seemingly signed by MSFT. Carrying the MSFT signature is the ultimate seal of approval for software on Windows. Such applications typically get unique privileges or are given a free pass on security checks due to the implicit trust placed in their pedigree. In response MSFT swiftly moved to revoke the issuing certificate authorities associated with these forged signatures. (The cruel irony, as pointed out by many commentators, is that Windows maintains those trust roots via Windows Update, the very mechanism Flame impersonated to spread.)

While PKI vulnerabilities in Windows have devastating consequences, they are not new. Moxie’s find in 2002 regarding the failure to check key usage properly during chain validation was equally damaging, although it was reported the old-fashioned way by a security researcher instead of being reverse-engineered out of malware in the wild. What is unusual about this case is that a large chunk of the problem can be attributed to operational errors in the way MSFT handled licensing. My former colleague Ryan Hurst gives a great breakdown of the root causes behind security advisory #2718704. Consistent with other epic security failures of this magnitude, it was not a single isolated mistake but a combination of engineering incompetence, poor design judgment and operational laziness (in choosing the path of least resistance for managing a new certificate authority) that culminated in the current crisis.

There is a different way to look at this vulnerability in terms of incentives. It’s axiomatic that software is not perfect, neither in functionality nor in security. Once this is acknowledged, the problem becomes one of managing risks rather than trying to eliminate them completely. Given that every piece of code we deploy might harbor some unknown but catastrophic vulnerability, the question comes down to whether the benefits derived from that application outweigh the risks.

Case in point: web browsers are among the most closely scrutinized and heavily targeted pieces of software. In the most recent Pwn2Own competition at CanSecWest in Vancouver BC, every single major web browser was successfully exploited. Yet few security professionals would realistically suggest that users stop visiting websites altogether. (The more cautious among us may advise disabling a few bells and whistles such as Flash, a notoriously buggy component whose main “value proposition” is the promise of animated hamsters on every web page.) The benefits of web applications are so compelling and obvious that we have invested massively in building more resilient browsers, so that users can continue to accrue those benefits with lower risk.

That brings up the question of what great value users derive from the software implicated in the latest debacle: Terminal Services Licensing Server. That calls for a slight detour into the arcane world of Windows pricing for enterprises. Most home users’ experience with Windows software licensing is thankfully limited to occasionally entering the 25-character product key when installing a new copy of the operating system. (There is also the occasional false-positive nag when the OS decides that its environment has changed, for example because the user installed new hardware, and subtly accuses the user of pirating software.) Each such key corresponds to one license, either included with the purchase of the machine or bought with a retail copy of the OS.

Enterprises have far more complex models when it comes to paying for software, especially “server class” software where a set of centralized, shared machines is offering applications– such as printing, file sharing or remote desktop– to a population of users. In these scenarios the enterprise pays not just for installing the server, but also for each user connecting to that server to access its functionality.

It’s instructive to compare this with an open-source model: licensing schemes appear to be fundamentally incompatible with “free” software, where the adjective is used in the sense of rights accorded the user rather than in monetary terms. A vendor could not charge users more to run that software on a machine with 16 processors compared to 4, or to serve a thousand users instead of a hundred. Any such artificial restrictions would be “edited” out of the source code in a matter of minutes. Open-source software is only limited by the capabilities of the hardware it runs on and fully “saturates” them. Proprietary software, on the other hand, can be artificially throttled to implement “price discrimination,” where two customers can end up paying a different amount for the exact same service. (Note the software distributed is identical whether it is set to serve 10 or 100 users. One could argue that scaling the application to handle the higher load is itself a cost for the developer. In that case this extra cost is being recouped by over-charging the heaviest users and under-charging the lower-volume ones.)

The catch is that doing this requires additional machinery. Left to its own devices, software will run at full speed and service all requests. It takes extra effort to slow it down or limit it to only doing its job after the check has cleared. This is where the likes of Terminal Services Licensing come in. Its mission in life is to make sure that a Windows server product only does its “serving” for those clients that have paid for it. This is undoubtedly valuable to MSFT, in terms of enforcing the desired pricing model and collecting additional revenue. From the customer perspective, it could go either way. Economically it may lead to a more efficient outcome for the enterprise if paying based on actual server demand measured in real time is cheaper than trying to estimate peak load and buy licenses upfront. Of course it could also be construed as nickel-and-diming these users: pay several thousand dollars for the server and then pay another $10-$50 for each client.
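What that “additional machinery” amounts to can be sketched as a toy gate sitting in front of an otherwise unrestricted server, plus the pricing arithmetic from the paragraph above. Everything here is hypothetical and greatly simplified: `LicenseGate` is not the Terminal Services Licensing protocol, and the prices in the example are illustrative figures within the ranges quoted, not actual MSFT pricing.

```python
class LicenseGate:
    """Toy per-client licensing gate (hypothetical; not the TS Licensing protocol)."""

    def __init__(self, purchased_licenses: int):
        self.available = purchased_licenses
        self.licensed_clients = set()

    def admit(self, client_id: str) -> bool:
        # Clients that already hold a license are always served.
        if client_id in self.licensed_clients:
            return True
        # Otherwise a license must come out of the purchased pool; once it is
        # empty, service is refused even though the server has spare capacity.
        if self.available == 0:
            return False
        self.available -= 1
        self.licensed_clients.add(client_id)
        return True

def total_cost(server_price: float, per_client_price: float, clients: int) -> float:
    # Server license plus one client license per user.
    return server_price + per_client_price * clients
```

With two purchased licenses, a third distinct client is turned away regardless of load; and under these illustrative numbers, `total_cost(3000, 30, 100)` comes to $6,000, half of it in per-client fees.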

The Flame zero-day incident shows that economics is not the only dimension to this problem. Confronted with the risk of getting the enterprise 0wned, the prudent CSO would opt to pay more for software upfront instead of worrying about one more useless component that creates additional opportunities for attack without any redeeming value, if they had the choice. All but the largest enterprises lack bargaining power when negotiating such deals. In fact lack of meaningful choice is a recurring theme: licensing software is proprietary (although the protocol has been disclosed, compliments of the consent decree) and there are no “better” or “more secure” alternatives that can be deployed instead.

More importantly, the flaws in Terminal Server Licensing create negative externalities for everyone, due to the way MSFT implemented licensing by granting subordinate CA status to enterprises. If some enterprise were to mismanage its signing privileges, it is no surprise that its own users could be compromised. That part is expected, and even creates the proper incentive for each enterprise to secure its own signing infrastructure. But due to the stunning design flaw, malware signed by one misbehaving enterprise certificate authority (or “licensing server,” to use the preferred terminology of software toll-extraction) can be used against every other enterprise, and even against home users who have no business accessing any server software.

That is a suboptimal risk tradeoff.

CP

Bringing cloud identity to the PC (1/2)

The recent announcement from an official MSFT blog that Windows 8 will allow sign-in to the machine with a Windows Live ID marks the final ascendancy of cloud identity systems. Until recently, there were two principal notions of identity recognized by Windows.

  1. Local accounts, maintained by that PC and recognized by no other machine. This is what a consumer might have on their machine at home: if there are 5 people in the household, each one gets a different account. Note that passwords are not required; the accounts can be set up such that merely clicking on the user tile is enough. In this model accounts are used purely for customization: each person can pick a different background and shortcuts, have different applications start up when they log in, have their web browser remember their unique history, etc. The identity cannot be asserted anywhere else: in order to read email, for example, the user has to type a different password to sign in to their mail provider. (There is a subtlety here in that this password for Hotmail/GMail/Yahoo etc. can be cached, making the local Windows account act as a “key ring” or “credential store” for other cloud identities, but the Windows identity itself is not recognized in the cloud.)
  2. Domain accounts: This is the enterprise scenario, where an IT department sets up an Active Directory system for central management of all computers in the company. Identities are then issued by the centralized system and recognized at by all machines that are members of the AD domain.  The user types in a username/password to logon to their laptop; this part of the experience is not changed. But the protocol used under the hood ensures that the same identity authenticated by those credentials is also good for accessing resources on a file share, sending a print job to the expensive color printer or downloading email from the local Exchange server the IT department runs.

Enterprise identity systems were the first to confront the challenge of integrating with the cloud. As basic functionality such as email, CRM and content management was increasingly outsourced to web providers in the name of efficiency, some solution became mandatory to allow use of existing enterprise identities to authenticate to the cloud. Asking users to manage different passwords for each outsourced service quickly makes IT departments very unpopular– not that they have any political capital to burn, for the most part. Typically integration is achieved by using a “bridging” solution that converts the brand of identity expression used in the enterprise, e.g. crusty old Kerberos, into a different format more in tune with the fashionable standards on the web, which typically means angle-brackets and XML, e.g. SAML.

While several companies sell such bridging solutions to enterprises, until now the most sophisticated form of integration available to home users remained the old-school password manager: the local device will store all your passwords for websites, much like a keyring, the story goes, and once you log in to that device you can use any key to unlock other doors out there. Variations on this theme, such as cloud-based password managers that allow roaming the keyring between different machines, remained the state of the art for the PC and Mac platforms.

In a sense this was understandable: being an open ecosystem, the PC had to cope with a myriad of different authentication systems, proprietary schemes vying to become standards (mostly dead-ends, it turned out) and incompatible usage models across the board: from the security-conscious paranoid user to the convenience-driven casual web surfer. Instead most of the innovation in simplifying the identity experience happened on relatively closed platforms, what Jonathan Zittrain might have termed “appliances”, with less stringent requirements for integration beyond the services provided by the platform owner: iPhone, Android and ChromeOS proved how easy it is to use cloud identity when there is exactly one identity provider.

[continued]

CP

All Flash, no substance: returning to a purist web

The announcement by MSFT that Internet Explorer 10 will have one browsing mode without plug-ins has naturally raised eyebrows. The move can be interpreted from two different angles:

1. Strategic strike aimed squarely against Adobe. By far the most dominant extension during the past decade has been Flash. Whatever limitations HTML and Javascript had– perceived or real– Flash was there to provide developers the escape hatch to add the all-important dancing squirrels to their website. MSFT made an ill-fated attempt to displace Flash with Silverlight. This crusade ended like many other homebrew technologies emerging out of Redmond in the past decade, in yet another confirmation that MSFT is too constrained by regulatory attention and no longer freely wields the immense market power it once held for single-handedly introducing new de facto standards. Silverlight tanked. Its one prominent customer, Major League Baseball– which may have motivated porting the technology to OS X, lest Mac-using fans be left out– dropped Silverlight after the 2008 season to go with Flash for online broadcasts.

It is not surprising to see MSFT embrace HTML5 in response. This is standard operating procedure in platform wars. If a company can not force its own technology (e.g. Silverlight) on consumers, the next best thing is for an open standard (HTML5) to win, versus ceding the ground to a different proprietary offering (Flash) from a major competitor.

2. A less cynical reading of the move is that it represents a return to a simpler, “purist” interpretation of the web. After going through several iterations in the span of a few short years, HTML became stagnant at version 4.01; that recommendation was published in 1999. Meanwhile the demands of web applications continued to grow, particularly after the dotcom bubble burst and the remains gave rise to web 2.0. Into this void stepped Flash. There had been earlier attempts to “enhance” the web experience: ActiveX to bring the full power and perils of Windows native programming, and before that Java with the promise of write-once-run-anywhere. With Flash, for the first time, developer demand and technology had a happy meeting in the middle.

The downside is that Flash fabricated whole new “conventions” out of thin air, and resurrected privacy and security problems that had already been solved in the context of web standards. Web browsers greatly limit interaction between websites to prevent security risks, via the so-called same origin policy that underlies the web security model. Flash invented its own cross-domain access rules, creating vulnerabilities on websites, even on sites that did not use Flash– a cardinal sin for a technology.
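For readers who have not looked at the same origin policy closely, the core comparison is small enough to sketch. The snippet below is a simplified illustration, assuming only http/https URLs with their usual default ports; the helper names are made up for this example, and real browsers handle many more edge cases.

```python
from urllib.parse import urlsplit

# Default ports assumed for the two schemes considered in this sketch.
DEFAULT_PORTS = {"http": 80, "https": 443}

def origin(url: str) -> tuple:
    """Reduce a URL to its (scheme, host, port) origin tuple."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    # urlsplit lowercases the hostname; fill in the port when it is omitted.
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return (scheme, parts.hostname or "", port)

def same_origin(a: str, b: str) -> bool:
    """Two URLs share an origin only if scheme, host and port all match."""
    return origin(a) == origin(b)

print(same_origin("https://mail.example.com/inbox", "https://mail.example.com:443/x"))  # True
print(same_origin("https://mail.example.com/", "https://evil.example.net/"))           # False
```

A plugin that defines its own cross-domain rules effectively replaces this one comparison with a separate policy, which is exactly where the surprises crept in.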

Privacy also suffered: tracking cookies were the big scare in 2000, a time that seems positively innocent by contemporary standards. Eventually better cookie management functionality in web browsers and a half-hearted attempt at a new policy language tamed the problem– for “cookies” as they were defined at the time. Flash introduced its own notion of client-side storage which could be used to achieve the same tracking capability as regular cookies, yet remained outside the purview of privacy enhancements implemented over the years to manage regular HTTP cookies. This was a clear boon to web services with dubious intentions for tracking users. Sure enough, a study in 2009 found that Flash cookies are commonly used by many popular websites as a backup measure, to recreate regular cookies that users have deleted. (Granted, HTML5 also introduced its own notion of local storage, but at least web browsers provided users control over this functionality. For the longest time the only way to delete Flash cookies was to visit a web page hosted by Adobe, in sharp contrast to centralized browser settings.)

From this perspective, disabling plugins and imposing strict HTML5 semantics on the chaos happening inside a web browser is a good development. HTML5 has come a long way– much of the functionality (cross-domain communication with access control, multiple threads, better graphics, video support etc.) is being incorporated into the standards. The need for an escape hatch, whether in the form of Flash, Java or Silverlight, to enable some hitherto impossible scenario weakens by the day. For security professionals and privacy-conscious users, the good news is that there is one major place to focus efforts, instead of multiple surprises hidden in homebrew designs implemented without the benefit of the public scrutiny a standard receives.

CP

DigiNotar: surveying the damage with OCSP

Having earned the dubious distinction of being the first certificate authority ever booted from root CA programs due to a security breach, DigiNotar has unsurprisingly gone into damage control mode. The parent company Vasco has solicited a third party (FoxIt) to investigate the security incident and published the report. It is not clear if they were planning to score brownie points for transparency, as the report still withholds crucial information in the name of protecting confidential details about DigiNotar’s internal setup. The authors are also careful to dance around issues of culpability:

“Since the investigation has been more of a fact finding mission thus far, we will not draw any conclusions with regards to the network-setup and the security management system.”

Luckily readers are free to draw their own conclusions. In particular, there appears to be a clear-cut case of negligence: the company noticed and investigated a security breach as early as July, but their tepid response ended at revoking a few certificates along the way. Notably, they did not bother notifying Mozilla, Microsoft, Apple or any other major company maintaining a root CA program.

To their credit, FoxIt tried to investigate the extent of the damage by monitoring OCSP logs for users checking on the status of the forged Google certificate. There is a neat YouTube video showing the geographic distribution of those requests around the world over time. Unfortunately, while this half-baked attempt at forensics makes for a great visualization, it presents a very limited picture of impacted users.

First, not all clients check for revocation. The settings often depend on the browser version. For example, starting with Internet Explorer 7, IE enables revocation checking by default, but the user can always override this setting.

Second, even those web browsers configured to check may not be capable of using OCSP. In particular Windows XP does not support that feature, which means that clients relying on the platform crypto functionality, such as IE and Chrome, will fall back on certificate revocation lists instead. The forged certificate listed both an OCSP responder and a CRL distribution point (CDP); in principle there are web server logs from the CDP, assuming DigiNotar was logging these. While the FoxIt report makes no reference to these additional records, there is an additional problem in that a CRL download is not specific to a particular certificate. It is known that the attackers successfully obtained bogus certificates capable of impersonating several popular websites including Tor and Microsoft. Nothing in the protocol reveals which bogus certificate a client is trying to ascertain the status of. In fact the download may even come from a “legitimate” user looking up the status of a valid certificate.

Diving into the gritty details of certificate revocation in Windows, we discover even more twists: it turns out that even the platforms capable of OCSP checking (namely, Vista and Windows 7) may opt for a CRL in certain situations. While OCSP is more efficient for finding out about one certificate, downloading the CRL may in the aggregate become worth the upfront cost when dozens of certificates by the same issuer are being validated. In the case of Windows, “dozens” is configured by a registry key set to 50 by default, but this can be changed. In fact even the preference for OCSP over CRL can be inverted owing to the enterprise-friendly management capabilities of Windows, but it is unlikely that home users were fine-tuning such settings.
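To make the forensic consequence concrete, here is a rough sketch of such a switch-over heuristic. The threshold of 50 mirrors the default described above; everything else (the function, its parameters, the simple per-issuer counter) is hypothetical and is not the actual Windows implementation.

```python
def choose_revocation_method(certs_validated_for_issuer: int,
                             switch_to_crl_threshold: int = 50,
                             prefer_crl: bool = False) -> str:
    """Return "CRL" or "OCSP" for the next revocation check against one issuer.

    certs_validated_for_issuer: how many certificates from this issuer the
    client has already validated (a hypothetical per-issuer counter).
    switch_to_crl_threshold: illustrative stand-in for the registry setting.
    prefer_crl: models an administrator inverting the default preference.
    """
    if prefer_crl:
        return "CRL"
    if certs_validated_for_issuer >= switch_to_crl_threshold:
        # Enough per-certificate queries that one bulk download wins.
        return "CRL"
    return "OCSP"

# Validating 60 certificates from one issuer: only the first 50 checks would
# ever show up as OCSP hits; the rest turn into a CRL download.
methods = [choose_revocation_method(i) for i in range(60)]
print(methods.count("OCSP"), methods.count("CRL"))  # 50 10
```

Under these assumptions, an OCSP responder log would undercount any client that had crossed the threshold for that issuer.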

Next, the inherent cacheability of both OCSP responses and CRLs creates more false negatives: during its validity period the response can be cached either by the user (suppressing multiple lookups) or even by an intermediate caching proxy on the network, hiding additional hits to the OCSP responder even when users are being subjected to MITM attacks repeatedly.
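A small simulation shows how much caching can hide. It assumes a single client (or one shared caching proxy) that keeps each response until its validity period runs out; the class and the validity periods used below are made up for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class CachingOcspClient:
    validity_seconds: int      # how long a cached response stays fresh (assumed)
    responder_hits: int = 0    # lookups that actually reached the responder
    _cache: dict = field(default_factory=dict)  # serial -> expiry timestamp

    def check(self, serial: int, now: int) -> None:
        expiry = self._cache.get(serial)
        if expiry is not None and now < expiry:
            return  # answered from cache: the responder's logs never see it
        self.responder_hits += 1  # cache miss: query goes out over the network
        self._cache[serial] = now + self.validity_seconds

# A victim intercepted once an hour for a day, with week-long responses:
client = CachingOcspClient(validity_seconds=7 * 24 * 3600)
for hour in range(24):
    client.check(serial=0x1234, now=hour * 3600)
print(client.responder_hits)  # 1
```

Twenty-four interceptions show up as a single blip on the map; with a shared proxy, one blip could stand in for many victims.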

There is an even more bizarre possibility that some of the hits against the CRL distribution point or OCSP responder may in fact be false positives due to a feature known as pre-fetching. Windows keeps track of certificates validated in the past and can download CRLs or cache OCSP responses preemptively, before the certificate is encountered again. In the case of OCSP this is not a true false positive; the user would have to have encountered the forged certificate several times in the past before the heuristics kick in, but it does suggest that not every blip on the map corresponds to an attack taking place at that instant.

Finally there is a conceptual problem with the FoxIt forensics: the TLS protocol supports an optimization known as OCSP stapling. In this model, the web server itself obtains an OCSP response from the certificate authority attesting to the freshness of the credential. This response is sent down to the client during the TLS handshake, freeing the client from having to do its own OCSP lookup; since these responses are signed, the client can be assured that the answer is authentic. (To be precise, a stapled response can not carry a nonce chosen by the client, so authenticity does not fully guarantee freshness.) In this case, until DigiNotar noticed the bogus Google certificate, the OCSP responder would have returned status “good” on the certificate, even though it was never issued in the first place. Depending on one’s perspective, this is either bad design on the part of the OCSP protocol or yet another instance of DigiNotar incompetence. As such, even if we assumed that 100% of clients are inclined to consult the OCSP responder, an attacker with man-in-the-middle capabilities can render that unnecessary by providing the answer as part of the same connection they intercepted. Without knowing the capabilities of the attacker and whether they took this extra step, there is no way to know if the OCSP logs only represent hits from users whose web browser does not grok the stapling extension. (There is also the more crude attack that involves dropping or degrading communications sent to the OCSP responder, again resulting in a very large number of false negatives.) The forensics implicitly assume limited resourcefulness on the part of the attacker. This assumption is not warranted. On the contrary, everything in this picture suggests that whatever mistakes the attackers made, such as getting caught by Chrome’s root pinning feature, are dwarfed by the sheer incompetence and gross negligence of DigiNotar itself.
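The logic that makes stapling invisible to the responder can be sketched in a few lines. This is a simplified client decision, assuming a stapled status arrives as a plain string and omitting signature verification entirely; the function and parameter names are hypothetical.

```python
from typing import Callable, Optional

def revocation_status(stapled_response: Optional[str],
                      query_responder: Callable[[], str]) -> str:
    """Return the revocation status a client would act on.

    stapled_response: status delivered inside the TLS handshake, or None.
    query_responder: performs a live OCSP lookup; only this path would show
    up in the CA's OCSP logs.
    """
    if stapled_response is not None:
        # Signature checks omitted here; a "good" response signed before the
        # forgery was noticed would pass them anyway.
        return stapled_response
    return query_responder()

log = []

def responder() -> str:
    log.append("hit")  # what the forensic analysis gets to see
    return "good"

print(revocation_status("good", responder), log)  # good []
print(revocation_status(None, responder), log)    # good ['hit']
```

Under this model, the attacker needs only one responder query of their own to obtain the staple, and that single request comes from the attacker's location rather than the victims'.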

CP

The limits of certificate revocation

In the wake of the DigiNotar debacle, it is time to revisit a question that inevitably comes up each time another certificate authority makes a mistake: does certificate revocation help? The short answer, it turns out, is probably not.

Briefly, revocation checks refer to additional steps used to verify the validity of a digital certificate that involve communicating with the issuer over the network. These are in addition to the local checks, such as verifying the signature, checking that the certificate is not expired and comparing the name on the certificate to the expected ID. Local checks are cheap from a computational perspective, but they are also static: if a certificate passes these checks once, it will continue to pass them until the expiration date. If the assertions made in the certificate are invalidated at a later time, for example because the owner loses their private key, we can not find out about such changes merely by inspecting the certificate one more time. Instead we have to go back to the issuer and ask for the latest status.
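The static nature of local checks is easy to see in a toy model. The `Certificate` class below is a deliberately simplified stand-in for a real X.509 structure, and `signature_ok` is a placeholder for actual signature verification; both are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Certificate:
    subject: str
    not_after: datetime
    signature_ok: bool  # placeholder for a real signature verification result

def passes_local_checks(cert: Certificate, expected_name: str,
                        now: datetime) -> bool:
    # Every input comes from the certificate itself plus the clock, so the
    # answer can not change before not_after, whatever happens to the key.
    return (cert.signature_ok
            and now <= cert.not_after
            and cert.subject == expected_name)

cert = Certificate("www.example.com", datetime(2012, 12, 31), signature_ok=True)
print(passes_local_checks(cert, "www.example.com", datetime(2011, 9, 1)))  # True
# Months later, even after a hypothetical key compromise, nothing has changed:
print(passes_local_checks(cert, "www.example.com", datetime(2012, 6, 1)))  # True
print(passes_local_checks(cert, "www.example.com", datetime(2013, 1, 1)))  # False
```

Nothing in these inputs can learn about a compromise; that information has to come from outside the certificate.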

There are two ways revocation status can be checked. Simplifying somewhat:

  • Certificate revocation lists or CRLs. CRLs are giant lists of all revoked certificates, periodically published by the issuer. Anyone can download these and check if a particular certificate is on the list.
  • Online Certificate Status Protocol, OCSP.  In this model one queries the issuer directly about one particular certificate, asking in effect “what is the latest news on this certificate?”

OCSP addresses one of the main challenges of CRLs. Verifying the validity of a single certificate using a CRL involves downloading a massive registry containing thousands of other, completely unrelated revoked certificates. This is problematic because certificate validation is often a latency bottleneck. For example, when connecting to a website using SSL, the certificate of the server must be validated. If the user is being extra cautious and includes revocation checking, then any communication with that website is now blocked on the completion of that step. While CRLs can be cached for future use and do not have to be downloaded each time, the cost to “bootstrap” from an empty cache can be prohibitive. In these situations a single OCSP query can be more efficient than an extended CRL download. On the other hand, if hundreds of certificates from the same issuer need to be verified, we reach a crossover point where economies of scale confer an advantage on the bulk-mode operation with CRLs.
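The crossover point is simple arithmetic if we look only at bytes transferred. The sketch below ignores round trips, caching and signature-verification cost, and the sizes in the example are illustrative assumptions rather than measurements of any real CA.

```python
import math

def crl_break_even(crl_bytes: int, ocsp_bytes_per_cert: int) -> int:
    """Smallest number of certificates from one issuer for which a single CRL
    download moves no more bytes than one OCSP exchange per certificate.

    Solves crl_bytes <= n * ocsp_bytes_per_cert for the smallest integer n.
    """
    return math.ceil(crl_bytes / ocsp_bytes_per_cert)

# Assumed sizes: a 500 KB CRL versus roughly 2 KB per OCSP exchange.
print(crl_break_even(500_000, 2_000))  # 250
```

With these made-up numbers the crossover lands in the "hundreds of certificates" range mentioned above; a larger CRL pushes it further out, which is why the bootstrap cost matters so much for a single connection.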

Armed with these options, one might conclude at first sight that incidents along the lines of DigiNotar can be contained by promptly revoking the improperly issued certificates, such as the bogus GMail certificate discovered in the wild intercepting traffic to Google. The problem is that the revocation model itself assumes a certain pattern of limited, isolated “mistakes” on the part of the certificate authority. Failure modes beyond that are outside the scope of the threat model, and can not be mitigated using either CRLs or OCSP.

The standard example of a certificate authority “mistake” is issuing a certificate to the wrong person. To sketch a hypothetical scenario: someone calls up Acme CA, introducing himself as a Microsoft employee. He requests a certificate for login.live.com, the authentication service used by virtually all MSFT online services. Acme CA does not vet the identity of this requestor properly (and why should they? they are getting paid to issue certificates, not to say “no”, as pointed out in an earlier post on misaligned economic incentives) and issues the requested certificate to this unauthorized person, who turns out to be working for a repressive government trying to eavesdrop on citizens’ communications.

Both CRLs and OCSP are tailor-made for this scenario. Once the clueless CA realizes their mistake (“what do you mean MSFT has their own cross-signed CA and has no reason to get one-off certificates from us?”) they can blacklist this certificate. It will appear in the next CRL published, and for those who can not wait that long, the OCSP responder will immediately start reporting a revoked status to anyone that asks. Admittedly this best-case scenario still glosses over the subtleties of which pieces of commonly used software in fact check for revocation by default, or for that matter what happens if revocation checks fail for unrelated reasons such as network flakiness, or even active attacks, as pointed out in 2009 by Moxie Marlinspike in a Black Hat talk. One can argue that suboptimal decisions by client implementations can not be blamed on the protocol itself.
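The question of what happens when a revocation check fails boils down to a single policy branch in the client. The sketch below contrasts a "soft-fail" and a "hard-fail" policy; the exception type, function name and status strings are hypothetical and do not describe any particular browser's code.

```python
from typing import Callable

class ResponderUnreachable(Exception):
    """Hypothetical error for a revocation check that could not complete."""

def accept_certificate(check_status: Callable[[], str], hard_fail: bool) -> bool:
    """Decide whether to trust a certificate after attempting a revocation check.

    check_status: performs the CRL/OCSP lookup and returns "good", "revoked"
    or "unknown"; raises ResponderUnreachable on timeouts or blocked traffic.
    hard_fail: if True, an incomplete check means rejection.
    """
    try:
        return check_status() == "good"
    except ResponderUnreachable:
        # Soft-fail treats "no answer" like a clean bill of health, so an
        # attacker who can drop traffic to the responder avoids any block.
        return not hard_fail

def blocked() -> str:
    raise ResponderUnreachable()

print(accept_certificate(blocked, hard_fail=False))  # True
print(accept_certificate(blocked, hard_fail=True))   # False
```

Soft-fail keeps users connected when a responder is flaky, but it also means a revoked certificate is only as dead as the network path to the responder allows.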

But there is a different failure mode for CAs that does not fit the convenient pattern described above. In the earlier example we assumed that:
1. The CA remains in control of its private key: the key itself has not been shipped off to China.
2. The CA remains in control of the certificate fields being signed; for example the serial number, key usage, expiration dates etc. are all set according to the usual procedures. The only wrong field is the so-called “distinguished name” identifying the purported owner.
Clearly #2 implies #1, as attackers could fill in the certificate fields to their heart’s content if they had direct possession of the signing key. In that sense #2 is the stronger requirement, and it turns out that if it is violated, both CRL and OCSP are toast.

Some of the issues lie in the protocol details: as pointed out, OCSP uses serial numbers to identify certificates, and so does the CRL format, as explained in this MSDN article. Serial numbers are just a field in the certificate. A sufficiently misguided CA could end up reusing the same serial number for two certificates: a healthy one issued to its rightful owner, and a completely unrelated one issued to unauthorized persons. This means that the very nature of an OCSP query is ambiguous, and in the best case scenario revoking a forged certificate will cause “collateral damage” to other benign certificates.
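A toy revocation list shows the ambiguity, and why hashing the whole certificate (as suggested below) would avoid it. The certificate dictionaries and byte strings are placeholders rather than real DER encodings. Real OCSP identifiers also hash the issuer name and key, which this example replaces with a plain issuer string.

```python
import hashlib

# Two certificates a misbehaving CA issued with the same serial number.
LEGIT = {"issuer": "Acme CA", "serial": 1001, "der": b"legit certificate bytes"}
FORGED = {"issuer": "Acme CA", "serial": 1001, "der": b"forged certificate bytes"}

def serial_key(cert: dict) -> tuple:
    # How CRLs and OCSP identify a certificate: issuer plus serial number.
    return (cert["issuer"], cert["serial"])

def hash_key(cert: dict) -> str:
    # Alternative identifier: a digest over the entire certificate.
    return hashlib.sha256(cert["der"]).hexdigest()

# The CA revokes only the forged certificate under each scheme.
revoked_by_serial = {serial_key(FORGED)}
revoked_by_hash = {hash_key(FORGED)}

print(serial_key(LEGIT) in revoked_by_serial)  # True: collateral damage
print(hash_key(LEGIT) in revoked_by_hash)      # False: only the forgery is hit
```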

But even if we had better identifiers to uniquely identify certificates (the hash would have been an obvious choice), there are structural problems in the design.

The first problem is that both CRL and OCSP responses are digitally signed, either directly by the CA or, in delegation scenarios, by another certificate issued by the CA. If the attackers had free rein to obtain arbitrary certificates from the issuer, they were in a position to obtain the credentials required to also forge CRL and OCSP responses on behalf of the CA. When the end user decides to check on the status of that bogus GMail certificate being used to intercept their private communications, our attackers would substitute an equally bogus response that appears to originate from the OCSP responder, in effect saying, “move along, these are not the revoked certificates you are looking for.” (Incidentally the same logical circularity applies to the “CA compromise” reason code defined for CRLs: if the certificate authority itself has been compromised, the client has no reason to trust the information in a CRL, and the attacker can make up arbitrary CRLs.)

The second problem is that the locations of the OCSP responder and the CRL distribution point are themselves fields in the certificate. If the miscreants had freedom to craft their own certificate and get it signed (instead of conforming to a template defined by the CA), they could simply omit the Authority Information Access field containing the OCSP responder’s location, or point the CDP at a random location controlled by the attacker.
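The circularity is visible in any client that takes the revocation endpoints from the certificate being checked. The sketch below assumes a simplified certificate dictionary with optional `ocsp_url` (from Authority Information Access) and `crl_url` (from the CRL Distribution Points) entries. These keys, the function and the URLs are all hypothetical.

```python
from typing import Callable, Optional

def check_revocation(cert: dict, fetch: Callable[[str], str]) -> Optional[str]:
    """Consult whatever revocation endpoints the certificate itself advertises.

    fetch: queries a URL and returns "good" or "revoked".
    Returns None when the certificate names no endpoint at all.
    """
    for url in (cert.get("ocsp_url"), cert.get("crl_url")):
        if url:
            return fetch(url)
    return None  # nothing to ask: the check silently never happens

calls = []

def fetch(url: str) -> str:
    calls.append(url)
    return "good"  # whoever controls the URL decides the answer

# Forged certificate with both fields omitted: no query is ever sent.
print(check_revocation({"subject": "mail.example.com"}, fetch), calls)  # None []
# Forged certificate pointing the CDP at an attacker-controlled host.
evil = {"subject": "mail.example.com", "crl_url": "http://attacker.example/crl"}
print(check_revocation(evil, fetch))  # good
```

Either way, the certificate under suspicion decides where, and whether, its own status gets checked.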

Third, the CA often has no idea what certificates have been issued if the attack circumvented the usual enrollment process. The serial number, distinguished name or other details required to blacklist the certificate via revocation may not have been logged. In this case the CA only knows that some fraudulent certificates were issued, but has no idea which sites can be targeted. Until the certificate is observed in the wild, there is nothing to revoke.

Finally– and this plagues the recovery efforts– often it is not possible to determine conclusively whether the mishap experienced by the CA indeed amounts to a few isolated cases conforming to the pattern anticipated by revocation designs, or if attackers managed to breach the process itself at a deeper level beyond repair. Both ineptitude and economic incentives are at work in this uncertainty. On the one hand, logs may be incomplete, or the forensics too inconclusive to say either way. On the other hand, PR pressures motivate organizations to minimize the perceived damage and hope for the best, until hard evidence proves otherwise. One need look no further than RSA’s persistent denial that the SecurID breach would have any impact on customers, until the Lockheed Martin incident forced them to admit otherwise. DigiNotar set a shining example of transparency here by remaining quiet about the breach until the MITM attack in Iran surfaced. Days after the incident they still lacked a coherent story on what exactly went on. Faced with such negligence, it is better to assume the worst and not depend on revocation for piecemeal mitigation.

CP