Incompetent by design: why certificate authorities fail

(And more importantly, why we cannot do anything to fix it.)

It sounds like a great business model, one that ought to be advertised in late-night TV infomercials. It might work this way– imagine a loud, overenthusiastic pitchman reciting:

“Become a certificate authority! For a few pennies a day, you can sign largely meaningless statements that users around the world depend on to protect their Internet communications. Forget about hard work and all those complicated problems of identity verification. Let your servers figure out the trustworthiness of the customers. And if they get it wrong? Hey, who cares– your certification policy statement indemnifies you from any damages resulting from gross negligence.

And if you call in the next 10 minutes, we will even allow you to issue EV certificates!”

There is an abundance of evidence that certificate authorities are by and large incompetent, clueless and outright bone-headed:

  • Bogus Microsoft code-signing certificates, valid until 2017. (Yes, they were revoked, but look in your Windows certificate store: under “Untrusted Certificates” you will see these two shipping in every version of Windows out of the box, marked with the equivalent of a danger-Will-Robinson sign, just in case the revocation check fails for some reason.)
  • The real Mozilla: a bogus Mozilla.com certificate issued for the distribution website of Firefox.
  • Circular dependency on DNS. Dan Kaminsky’s work on DNS vulnerabilities revealed that some certificate authorities verified ownership of a domain simply by sending email to that domain– a security solution designed to be resilient in the face of a completely 0wned network rests on the assumption that email routing can be trusted?
  • Having an email address with the word “admin” in the title does not entitle a user to SSL certificates for that domain. This is a lesson email-hosting providers found out the hard way, when free email accounts with authoritative-sounding names were enough to convince CAs to issue certificates to whoever squatted on the account name.
  • The crowning achievement: RapidSSL continued to use MD5 for issuing new certificates in 2008– four years after the pioneering work of Xiaoyun Wang made it clear that MD5 cannot be used for new signatures due to the possibility of hash collisions. The resulting attack, despite being an entirely predictable train wreck in slow motion, was spectacular enough to earn best paper at Crypto 2009.

It’s not that every certificate authority is clueless: but the abundance of these examples makes it clear that somewhere, someone is bound to do something remarkably dumb. That brings us to problem #1:

It takes only one CA to bring down the system. To simplify the picture a bit, operating systems ship with a set of “trust anchors”– a list of the certificate authorities they are willing to trust for important assertions, such as the identity of a website the user is about to share personal information with. These anchors are interchangeable: a certificate issued by one is trusted just like any other, and any one of them is good enough to show that padlock icon in IE or the yellow address bar in Chrome or … (In an ingenious marketing scheme, a new category called extended validation, or EV, was created to separate the “competent” CAs from the hoi polloi. We will address in the second part of this post why EV is a very profitable type of delusion.)

This is a classic case of the weakest link in the chain. Successfully compromising any one CA is enough to defeat the security guarantees offered by the public-key infrastructure (PKI) system they constitute collectively.
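To make the weakest-link point concrete, here is a deliberately simplified sketch in Python (hypothetical names, not a real validation API): the browser accepts a chain if it terminates at any of its trust anchors, so no root is “more trusted” than another.

```python
# Toy model of browser chain validation. The point: trust anchors are
# interchangeable -- a chain is accepted if it terminates at ANY of the
# 100+ roots, so compromising a single CA defeats all of them.

TRUSTED_ROOTS = {"Verisign", "Microsoft", "Starfield Class 2",
                 "UserTrust", "Skaitmeninio sertifikavimo centras"}

def chain_is_trusted(chain, roots=TRUSTED_ROOTS):
    """Accept a chain (leaf-first list of names) if its last element
    is any trusted root. No notion of 'better' or 'worse' roots."""
    return bool(chain) and chain[-1] in roots

# A certificate for bank.example chained to an obscure CA earns the same
# padlock as one chained to a household name:
print(chain_is_trusted(["bank.example", "UserTrust"]))   # True
print(chain_is_trusted(["bank.example", "Verisign"]))    # True
```

A design where relying parties could express per-site root preferences would change this calculus, but as discussed below, deployed browsers offer no such mechanism.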

A quick peek at the Windows certificate store in XP reveals somewhere north of 100 trusted certificates. Among these, Verisign and Microsoft are probably the only recognized brands. How many users have heard of Starfield Class 2? UserTrust, based out of Salt Lake City, Utah? For some international flavor, try Skaitmeninio sertifikavimo centras, based in Lithuania. There is also “NO LIABILITY ACCEPTED (c) 97 Verisign Inc”– yes, that is the name of the issuer on one of the root certificates. (Complete with all-capital words; Verisign could use some help from Danah Boyd on capitalization, it appears.)

That list is actually a conservative estimate. Windows has a feature to auto-install new root certificates from Windows Update on demand during chain building. That’s why the sparse appearance of roots in Vista and Windows 7 out of the box is misleading: the absent roots are still implicitly trusted, waiting to be downloaded during a search for a chain. MSFT has a KB article pointing to the full list of current roots. 0wn one of these 100+ entities and you can mint certificates for any website. 0wn one of the couple dozen trusted for code signing and you can start shipping malware “signed” by Microsoft or Google.
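For readers who want to inspect their own machine: on Windows, later versions of Python expose the system store via `ssl.enum_certificates` (a real, Windows-only API). A small sketch– the counting helper is platform-independent; the enumeration only runs where the API exists, and as noted above it still understates the trusted set, since on-demand roots do not appear until first use.

```python
import ssl

def count_roots(entries):
    """Count certificates in an iterable of (cert_bytes, encoding, trust)
    tuples -- the shape returned by ssl.enum_certificates()."""
    return sum(1 for _ in entries)

if hasattr(ssl, "enum_certificates"):  # Windows only
    roots = ssl.enum_certificates("ROOT")
    print(count_roots(roots), "roots in the local store")
    # Understates the real attack surface: roots delivered on demand
    # from Windows Update are absent until something triggers them.
```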

What does it take to join this privileged club? It’s not yet being advertised in infomercials, but both Microsoft and Mozilla publish their criteria. MSFT, being under the scrutiny of regulators, is under particular pressure to keep a level playing field and keep the ranks of membership relatively open. Mozilla has its own requirements designed to allow low-cost issuers, since the open source community generally views PKI as nothing less than a cabal of rent-seeking oligopolists. (Google Chrome simply picks up the trust anchors in the platform, namely the Windows roots from CAPI on Windows and the NSS/Mozilla roots on Linux.) In reality, since an SSL certificate that only works for some fraction of users is largely worthless, the criteria CAs actually live up to amount to satisfying both programs at once.

The interchangeable nature of CAs for end-user security brings us to the fundamental economic problem: there is no incentive for better security in a commodity product. CAs are competing on a commodity, and market dynamics will drive prices toward zero– and with them, the quality of the product. If one CA decides to do a better job of verifying customer identity and charge extra for it, customers will simply move on to the next CA, who rubber-stamps everything sent its way. Competence is self-defeating.

There is a more subtle economic problem in the model: the paying “customer” is the website or company that needs a digital identity (or, as the examples above demonstrate, the miscreants trying to impersonate the legitimate owner– the CA gets paid either way). But the end-users who depend on that certificate authority doing its job correctly are not party to the transaction. This externality is not factored into the price.

One could argue the direct customer is on the hook: if a company suffers an attack because of a forged certificate in its name, its own reputation is on the line, and this provides an economic incentive to do the right thing. But this is deceptive. Even if Alice cares enough about her reputation to shell out extra $$$ for a high-quality certificate from the most diligent CA, she cannot fend off an attack from Mallory, who tricks a dirt-cheap and careless certificate authority into issuing a bogus “Alice” identity to Mallory. The existing installed base of web browsers does not provide a way for Alice to instruct her current customers to trust only certificates from the competent CA and ignore any other identities. (Even assuming Alice wanted to do this– it would mean creating lock-in for that CA and voluntarily relinquishing the competitive pressure on pricing.)

[continued]

CP

Unlinkable identifiers on the web: rearranging deck chairs (2/3)

CardSpace boasts a limited degree of unlinkability, based on a weak attack model: for self-signed cards, the user can generate two assertions for two different websites that appear independent. (My colleague Ben from the Google security team disputes even that weak guarantee, arguing that only assertions that cannot be linked even with help from the identity provider qualify as “unlinkable.”)

OpenID gets a bad reputation for allowing linkability, but in fact there is no requirement of a universal identifier in the specification. An OpenID provider could choose to assert two different “names” for the same user to two different websites. (Of course they are still linkable in the sense that the ID provider knows what is going on, even if the sites cannot put the picture together on their own– sort of; see the next two points.)

The problem is that even this weak guarantee of “unlinkable” identities at multiple websites breaks down in the real world, for two reasons.

The first problem is that websites insist on an email address or other unique identifier– and they want it at authentication time. When inherently identifying information (PII) such as an email address is shared, the unlinkability of the underlying protocol becomes largely irrelevant, since there is another, even more universal identifier to go by. The same email address would appear even when the user authenticates via two different identity providers– this is linkage across independent providers.
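A short sketch (with made-up records) of why a shared email address trumps any pairwise-ID scheme: two sites hold completely unrelated local IDs for the same user, yet colluding on the email column re-links the dossiers instantly.

```python
# Hypothetical data: two sites, each knowing the user under a different
# opaque local ID, but both holding the same email address.
site_a = [{"local_id": "pq2t45x", "email": "alice@example.com", "likes": "jazz"}]
site_b = [{"local_id": "u-991",   "email": "alice@example.com", "rents": "noir films"}]

def collude(a_records, b_records):
    """Join two sites' records on email -- the universal identifier
    that survives any unlinkable authentication protocol."""
    b_by_email = {r["email"]: r for r in b_records}
    return [(a, b_by_email[a["email"]]) for a in a_records
            if a["email"] in b_by_email]

for a, b in collude(site_a, site_b):
    print(a["email"], "->", a["likes"], "+", b["rents"])
```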

Federated ID providers are not in a position to say no: they are trying to convince relying sites to interoperate. Everyone already has a proprietary identity management system requiring users to sign up. That registration process collects some basic information, and the availability of that information is firmly embedded in the business logic. Going from a model where the site has an email address to one where it knows the user as “pq2t45x” is not an appealing proposition. Similarly, any time the user shares a global identifier such as an address, real name or credit card number, they void any privacy guarantees from the identity model.

As a matter of architecture, authentication systems should strive for minimum disclosure– more identifying information can always be added after the fact, but it is impossible to go back in the direction of greater privacy. Even if the majority of transactions end with the user sharing PII at some point (making them very linkable regardless of authentication), it is fair to argue that underlying protocols should optimize for the best case of no disclosure and casual browsing. But the reliance on email addresses in existing scenarios means that redesigning basic protocols to disclose less will be an exercise in rearranging deck chairs.

In many ways the email address is the easiest attribute to fix, especially when the ID provider is also the email provider– true for the three largest ID providers, Windows Live/Passport, GMail/Google and Yahoo– since they could simply fabricate email aliases that forward to the original. Unfortunately that still breaks support scenarios: when alice@gmail.com calls asking for help, the system has her records filed under the very private dx4r2p6@gmail.com. Other identifiers have their own private versions depending on the provider: some credit card companies support issuing one-time card numbers that bill to the original. Mail services allow signing up for a PO box to hide the original physical address (although good luck getting many ecommerce merchants to deliver to one, given their high incidence of fraud), and conceivably they could start algorithmically generating those PO box numbers to break linkage.

Even if every instance of linkable PII could be replaced by a pairwise unique variant, there is a second problem: linkage between identifiers is possible when the user is authenticated to multiple sites at the same time.

[continued]

CP

MD2: reply to comments

Glad to see a comment from Dan Kaminsky on the last post about the severity of the MD2 issues.

Follow-ups:

  • There is no denying the problem with MD2. Discontinuing its use (e.g. rejecting certificates signed with MD2, as OpenSSL already does and the upcoming MSFT patch will implement) is the right response.
  • The point argued in the post is that the severity and urgency of the problem are low. Compared to the other X509 problems disclosed by Dan Kaminsky and Moxie Marlinspike– including the null handling, OID confusion and the even more deadly remote code execution in NSS– the MD2 issue is a distant second. The sky is not (yet) falling.
  • It’s not clear the MD5 parallel holds: when Wang and her colleagues found actual collisions, people were still widely using MD5 for new signatures. In fact the forgery of an intermediate CA certificate in December 2008 proved some certificate authorities are so clueless that they continued using MD5 for new signatures after four years and several improved attacks. (The fact that SSL CAs are bound to be incompetent and clueless as the expected competitive outcome deserves its own blog post.) MD2, by contrast, has long been retired for new signatures, leaving only past signatures to exploit.
  • Basic birthday attacks are enough to exploit new signatures, and advances in the types of collisions possible– such as controlling the prefix– only improve the odds. But leveraging past signatures in a hash function that is no longer used requires a second-preimage attack. Nobody has managed to produce even a single one for MD2.
  • As of this writing, the best second-preimage attacks have time complexity comparable to 2**73 MD2 invocations and storage complexity of 2**73 bytes. That second number alone makes the attack impractical: eight billion terabytes is an awful lot of spare disk drives. (As an aside– Daniel Bleichenbacher looked into this and did not see any low-hanging improvements to the storage requirement either.)
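The numbers above are easy to sanity-check. For a 128-bit digest like MD2, a birthday collision costs roughly 2**64 work– feasible only against new signatures– while the best known second-preimage attack costs 2**73 time and, crucially, 2**73 bytes of storage:

```python
# Back-of-the-envelope check of the complexity figures quoted above.
md2_digest_bits = 128

birthday_work  = 2 ** (md2_digest_bits // 2)  # ~2**64: collisions vs NEW signatures
preimage_work  = 2 ** 73                      # best known 2nd-preimage time
preimage_bytes = 2 ** 73                      # ...and its storage requirement

terabyte = 2 ** 40
print(f"storage needed: {preimage_bytes // terabyte:,} TB")
# -> storage needed: 8,589,934,592 TB (about eight and a half billion TB)
```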

Bottom line: yes, there is a problem with MD2, but it never presented an immediate danger. Cryptographic attacks are fascinating, but the more mundane X509 parsing bugs disclosed around the same time– and the continuing tradition of CA incompetence– are far more fatal to PKI.

CP

MD2: hash-collision scare of the day

Overshadowed by the far more serious X509 parsing vulnerabilities disclosed at BlackHat, one of the problems noted by Dan Kaminsky et al. was the existence of an MD2-signed root certificate.

On the surface it looks bad. If an MD2 preimage attack is feasible, an enterprising attacker could forge another certificate chaining up to this one, “transferring” the signature from the root to the bogus certificate, compliments of the MD2 collision. Root certificates are notoriously difficult to update– Verisign cannot afford (for business reasons, even if it is the “right thing” for the Internet) to risk revoking all certificates chaining up to the root. Re-publishing the root signed with a better hash function is a no-op: the existing signature will not be invalidated. The only option is to distrust any certificate signed with MD2 except for the roots themselves.
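Why a hash collision lets a signature “transfer” is worth spelling out: a signature covers only the hash of the certificate, not its bytes. A toy illustration with a deliberately weak stand-in hash (this is not real crypto; the hash and “signing” arithmetic are fabricated for the example):

```python
def weak_hash(data: bytes) -> int:
    """Trivially collidable stand-in for MD2: sum of bytes mod 256."""
    return sum(data) % 256

def sign(data: bytes, key: int) -> int:
    """Toy signature over the HASH only -- the property that matters."""
    return (weak_hash(data) * key) % 257

def verify(data: bytes, sig: int, key: int) -> bool:
    return sign(data, key) == sig

KEY = 123
legit = b"CN=Root CA, serial=01"
sig = sign(legit, KEY)

# Attacker crafts a different "certificate" with the same weak hash
# (same bytes reordered, so the byte-sum is identical):
forged = b"CN=Root CA, serial=10"
print(weak_hash(legit) == weak_hash(forged))  # True: a collision
print(verify(forged, sig, KEY))               # True: signature "transfers"
```

With a real hash function the attacker cannot simply reorder bytes, which is exactly why the feasibility of an MD2 second preimage is the crux of the matter.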

But looked at from another perspective, the MD2 problem is a tempest in a teapot. Luckily no CA is using MD2 to issue new certificates. (At least as far as anyone can determine– CA incompetence is generally unbounded.) This matters because the MD5 forgery from last December depended on a bone-headed CA continuing to use MD5 to sign new certificate requests. With no new MD2 signatures being made, a second-preimage attack is necessary; simple birthday attacks will not work. Finding a second message that hashes to a given digest is a much harder problem than finding two meaningful but partially unconstrained messages that collide.

Eager to join the fray against PKI, the researchers point to a recent result, An improved preimage attack on MD2, to argue that such a possibility is indeed around the corner. It turns out the feasibility of this attack, and the 0wnership of MD2, was slightly exaggerated, to paraphrase Mark Twain. The paper does indeed quote 2**73 applications of the MD2 hash function as the time required to find a second preimage. That is well above what any previous brute-force attack has succeeded in breaking, but Moore’s law can fix that. What seems to have been neglected is a far more severe resource constraint, stated bluntly in the original paper and mysteriously absent from the Kaminsky et al. summary: the attack also requires 2**73 bytes of space. Outside the NSA, nobody likely has that kind of storage lying around. None of the existing distributed cryptographic attacks have come anywhere near this limit– in fact most of them made virtually no demands on space from participants. To put this in context: if one hundred million people were participating, each would have to dedicate close to a hundred terabytes of disk space. Not happening. And this does not even take into account the communication overhead now required between the different users, each holding one fragment of this massive table, as they query other fragments.

CP

Pairwise identifiers and linkability online (1/3)

There has been a lot of talk about “unlinkability” in the context of web authentication systems. For example, CardSpace is touted as being more privacy-friendly than OpenID because it can support different identifiers for each site the user interacts with. This post is a first attempt to summarize some points around that– complete with very rusty blogging skills.

To start with, the type of unlinkability envisioned here is a very weak form compared to the more elaborate cryptographic protocols involving anonymous credentials. It comes down to a simple question: when a user participates in a federated authentication system (which is to say, they have an account with an identity provider that allows them to authenticate to websites controlled by different entities), does the user appear to have a single, consistent identifier everywhere he/she goes?

It is not a stretch to see that when such a universal identifier is handed out to all of the sites, it enables tracking. More precisely, it allows correlating information known by different sites. Netflix might know a user’s movie rental history and iTunes might know their music preferences– if the user is known to both sides by the same consistent identifier, the argument goes, the two can collude and build an even more comprehensive dossier about the user. This is a slightly contrived example, because movie rentals are uniquely identifying already (as the recent deanonymization paper showed) and chances are so is the music collection, but it is easy to imagine scenarios where neither site has enough information to uniquely identify a user on its own, yet a single candidate emerges when they pool their data. Consider Latanya Sweeney’s discovery in 2000 that 87% of the US population can be identified by birthdate, five-digit zipcode and gender. It does not take very many pieces of information– if they can be assembled together with the help of a unique ID each is associated with– to pick out individuals from the online crowd.

The obvious solution is to project a different identity to each website. Alice might appear as user #123 to Netflix, while iTunes remembers her as user #987. With a little help from cryptography it is easy to design schemes where such distinct IDs can be generated easily by the authentication system, yet are virtually impossible to correlate by the websites receiving them, even when they try to collude.
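One standard construction (a sketch, not any particular product’s scheme– the key and site names are hypothetical): the identity provider derives each pairwise ID with a keyed hash over (user, site). The IDs are stable per site, but without the provider’s secret key no coalition of sites can recompute or correlate them.

```python
import hashlib
import hmac

IDP_SECRET = b"provider-master-key"  # known only to the identity provider

def pairwise_id(user: str, site: str) -> str:
    """Derive a stable, site-specific pseudonym via HMAC-SHA256.
    Different sites see unrelated-looking identifiers for the same user."""
    mac = hmac.new(IDP_SECRET, f"{user}|{site}".encode(), hashlib.sha256)
    return mac.hexdigest()[:16]

netflix_id = pairwise_id("alice", "netflix.example")
itunes_id  = pairwise_id("alice", "itunes.example")

print(netflix_id == pairwise_id("alice", "netflix.example"))  # True: stable
print(netflix_id == itunes_id)                                # False: distinct
```

Note the asymmetry, echoing Ben’s objection above: the sites cannot link these IDs, but the provider holding IDP_SECRET trivially can.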

[continued]

CP