DigiNotar: surveying the damage with OCSP

Having earned the dubious distinction of being the first certificate authority ever to get booted from the browser root CA programs due to a security breach, it is no surprise that DigiNotar has gone into damage-control mode. The parent company Vasco commissioned a third party (Fox-IT) to investigate the security incident and published the report. It is not clear if they were hoping to score brownie points for transparency, as the report still withholds crucial information in the name of protecting confidential details about DigiNotar's internal setup. The authors are also careful to dance around questions of culpability:

“Since the investigation has been more of a fact finding mission thus far, we will not draw any conclusions with regards to the network-setup and the security management system.”

Luckily readers are free to draw their own conclusions. In particular, there appears to be a clear-cut case of negligence: the company noticed and investigated a security breach as early as July, but its tepid response ended at revoking a few certificates along the way. Notably, they did not bother notifying Mozilla, Microsoft, Apple or any other major company maintaining a root CA program.

To their credit, Fox-IT tried to gauge the extent of the damage by monitoring OCSP logs for users checking the status of the forged Google certificate. There is a neat YouTube video showing the geographic distribution of these lookups around the world over time. Unfortunately, while this half-baked attempt at forensics makes for great visualization, it presents a very limited picture of the impacted users.

First, not all clients check for revocation; the default settings depend on the browser and version. For example, starting with Internet Explorer 7, IE enables revocation checking by default, but the user can always override this setting.
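To make the opt-in nature of revocation checking concrete, here is how it looks from a client's perspective, sketched with Python's standard ssl module. The flag names are the real ones; like most clients, Python leaves revocation checking off unless the application explicitly enables it.

```python
import ssl

# A default TLS client context, as most applications create it.
ctx = ssl.create_default_context()

# Out of the box, no revocation checking is performed:
assert ctx.verify_flags & ssl.VERIFY_CRL_CHECK_LEAF == 0

# A cautious client can require a CRL check on the leaf certificate.
# (A CRL must then also be loaded via ctx.load_verify_locations(),
# otherwise every handshake fails closed.)
ctx.verify_flags |= ssl.VERIFY_CRL_CHECK_LEAF
assert ctx.verify_flags & ssl.VERIFY_CRL_CHECK_LEAF
```

The point is that the check is a client-side policy decision, which is exactly why counting OCSP hits undercounts victims.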

Second, even those web browsers configured to check may not be capable of using OCSP. In particular, Windows XP does not support that feature, which means clients relying on the platform crypto functionality, such as IE and Chrome, fall back on certificate revocation lists instead. The forged certificate listed both an OCSP responder and a CRL distribution point; in principle there are web server logs from the CDP, assuming DigiNotar was logging these. The Fox-IT report makes no reference to these additional records, and there is a further problem: a CRL download is not specific to any particular certificate. The attackers are known to have successfully obtained bogus certificates capable of impersonating several popular websites, including Tor and Microsoft. Nothing in the protocol reveals which certificate a client downloading the CRL is trying to ascertain the status of; it may even be a "legitimate" user looking up the status of a valid certificate.
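The asymmetry between the two kinds of logs can be sketched in a few lines. This is a toy model; the serial numbers and client addresses are made up.

```python
ocsp_log = []   # what the OCSP responder sees
crl_log = []    # what the CRL distribution point sees

def ocsp_query(serial):
    ocsp_log.append(serial)        # the responder learns WHICH certificate
    return "good"

def crl_download(client_ip):
    crl_log.append(client_ip)      # the CDP learns only that SOMEONE asked
    return {0x0E44, 0x0F19}        # the full list of revoked serials

# An XP client validating the forged Google certificate...
crl_download("10.0.0.1")
# ...and another client validating a perfectly legitimate certificate:
crl_download("10.0.0.2")

# The two downloads are indistinguishable in the CDP logs:
assert crl_log == ["10.0.0.1", "10.0.0.2"]

# Whereas an OCSP hit names the certificate being checked:
ocsp_query(0x0E44)
assert ocsp_log == [0x0E44]
```

This is why CDP logs, even if DigiNotar kept them, could never have produced the kind of per-certificate map Fox-IT built from the OCSP traffic.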

Diving into the gritty details of certificate revocation in Windows, we discover even more twists: it turns out that even the platforms capable of OCSP checking (namely, Vista and Windows 7) may opt for a CRL in certain situations. While OCSP is more efficient for finding out about one certificate, downloading the CRL may in the aggregate be worth the upfront cost when dozens of certificates from the same issuer are being validated. In the case of Windows, "dozens" is governed by a registry key set to 50 by default, but this can be changed. In fact even the preference for OCSP over CRL can be inverted, owing to the enterprise-friendly management capabilities of Windows, though it is unlikely that home users were fine-tuning such settings.
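A minimal sketch of that heuristic, assuming the default threshold of 50 mentioned above; the function and counters are invented for illustration.

```python
# Prefer OCSP for one-off checks, but switch to a bulk CRL download once
# many certificates from the same issuer have been validated.
CRL_SWITCH_THRESHOLD = 50   # mirrors the Windows registry default

validations = {}  # issuer -> number of certificates validated so far

def choose_mechanism(issuer):
    validations[issuer] = validations.get(issuer, 0) + 1
    if validations[issuer] >= CRL_SWITCH_THRESHOLD:
        return "CRL"
    return "OCSP"

# The first 49 checks against the same issuer go to the OCSP responder...
results = [choose_mechanism("DigiNotar") for _ in range(49)]
assert set(results) == {"OCSP"}
# ...the 50th tips the scale toward downloading the issuer's CRL instead.
assert choose_mechanism("DigiNotar") == "CRL"
```

Any client that crossed this threshold would vanish from the OCSP logs entirely, another source of undercounting.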

Next, the inherent cacheability of both OCSP responses and CRLs creates more false negatives: during its validity period a response can be cached either by the user (suppressing repeat lookups) or by an intermediate caching proxy on the network, hiding additional hits to the OCSP responder even when users are being subjected to MITM attacks repeatedly.
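A toy model of the caching effect; the serial number and validity period are arbitrary.

```python
responder_hits = 0
cache = {}  # serial -> expiry time of the cached response (seconds)

def check_status(serial, now, validity=30 * 24 * 3600):
    """Return certificate status, consulting the responder only on a cache miss."""
    global responder_hits
    if serial in cache and now < cache[serial]:
        return "good (cached)"     # no network traffic, no log entry
    responder_hits += 1            # only this path shows up in the OCSP logs
    cache[serial] = now + validity
    return "good"

# Ten MITM interceptions on ten consecutive days, one OCSP log entry:
for day in range(10):
    check_status(0x0E44, now=day * 24 * 3600)
assert responder_hits == 1
```

Each blip on the map may therefore stand in for many repeated interceptions of the same user.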

There is an even more bizarre possibility: some of the hits against the CRL distribution point or OCSP responder may in fact be false positives, due to a feature known as pre-fetching. Windows keeps track of certificates validated in the past and can download CRLs or cache OCSP responses preemptively, before the certificate is encountered again. In the case of OCSP this is not quite a false positive, since the user would have to have encountered the forged certificate several times in the past before the heuristics kick in, but it does suggest that not every blip on the map corresponds to an attack taking place at that instant.

Finally, there is a conceptual problem with the Fox-IT forensics: the TLS protocol supports an optimization known as OCSP stapling. In this model, the web server itself obtains an OCSP response from the certificate authority attesting to the freshness of the credential. This response is sent down to the client during the TLS handshake, freeing the client from having to do its own OCSP lookup; since these responses are signed, the client can be assured that the answer is authentic. (Modulo the inclusion of a nonce, to be precise.) In this case, until DigiNotar noticed the bogus Google certificate, the OCSP responder would have returned status "good" for it, even though the certificate was never legitimately issued in the first place. Depending on one's perspective, this is either bad design on the part of the OCSP protocol or yet another instance of DigiNotar incompetence.

As such, even if we assumed that 100% of clients are inclined to consult the OCSP responder, an attacker with man-in-the-middle capabilities can render that unnecessary by providing the answer as part of the same connection they intercepted. Without knowing the capabilities of the attackers and whether they took this extra step, there is no way to know whether the OCSP logs represent anything more than hits from users whose web browsers do not grok the stapling extension. (There is also the cruder attack of dropping or degrading communications sent to the OCSP responder, again resulting in a very large number of false negatives.)

In short, the forensics implicitly assume limited resourcefulness on the part of the attackers, and that assumption is not warranted. On the contrary, everything in this picture suggests that whatever mistakes the attackers made, such as getting caught by Chrome's root-pinning feature, are dwarfed by the sheer incompetence and gross negligence of DigiNotar itself.
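The stapling scenario above can be sketched as follows. Everything here is a toy model, with strings standing in for signed structures; the point is that a single pre-revocation "good" response can be replayed to any number of victims.

```python
responder_hits = 0

def ocsp_responder(serial):
    """Signed status from the CA; it said 'good' until revocation."""
    global responder_hits
    responder_hits += 1
    return ("good", "signed-by-ca")

# One-time fetch by the attacker, before DigiNotar noticed anything:
stapled = ocsp_responder(0x0E44)

def mitm_handshake():
    # The forged certificate and the stapled response travel together
    # in the same intercepted TLS handshake.
    return {"certificate": "forged-google-cert", "ocsp_staple": stapled}

def client_validates(handshake):
    status, sig = handshake["ocsp_staple"]
    assert sig == "signed-by-ca"   # authentic signature, just stale
    return status == "good"        # no direct query to the responder

# A thousand intercepted victims, a single entry in the OCSP logs:
assert all(client_validates(mitm_handshake()) for _ in range(1000))
assert responder_hits == 1
```

Any victim served a stapled response is invisible to the kind of log analysis Fox-IT performed.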

CP

The limits of certificate revocation

In the wake of the DigiNotar debacle, it is time to revisit a question that inevitably comes up each time another certificate authority makes a mistake: does certificate revocation help? The short answer, it turns out, is: probably not.

Briefly, revocation checks refer to additional steps for verifying the validity of a digital certificate that involve communicating with the issuer over the network. These are in addition to the local checks, such as verifying the signature, checking that the certificate is not expired and comparing the name on the certificate to the expected identity. Local checks are cheap from a computational perspective, but they are also static: if a certificate passes these checks once, it will continue to pass them until the expiration date. If the assertions made in the certificate are invalidated at a later time (for example, the owner loses control of their private key) we cannot find out about such changes merely by inspecting the certificate one more time. Instead we have to go back to the issuer and ask for the latest status.
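The static nature of local checks can be sketched as follows. The certificate record and its fields are invented for illustration; real X.509 validation is considerably messier.

```python
from datetime import datetime, timezone

def local_checks(cert, expected_host, now=None):
    """The checks that need nothing but the certificate itself and a clock."""
    now = now or datetime.now(timezone.utc)
    return (
        cert["signature_valid"]                  # chains to a trusted root
        and cert["not_before"] <= now <= cert["not_after"]
        and cert["subject"] == expected_host     # name matches expectation
    )

cert = {
    "signature_valid": True,
    "not_before": datetime(2011, 7, 10, tzinfo=timezone.utc),
    "not_after": datetime(2013, 7, 10, tzinfo=timezone.utc),
    "subject": "*.google.com",
}

# These checks are static: the same certificate passes them today...
assert local_checks(cert, "*.google.com",
                    now=datetime(2011, 9, 1, tzinfo=timezone.utc))
# ...and keeps passing until expiration, revoked or not.
assert local_checks(cert, "*.google.com",
                    now=datetime(2012, 9, 1, tzinfo=timezone.utc))
```

Nothing in this function can ever learn that the certificate was revoked; that information only exists back at the issuer.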

There are two ways revocation status can be checked. Simplifying somewhat:

  • Certificate revocation lists or CRLs. CRLs are giant lists of all revoked certificates, periodically published by the issuer. Anyone can download these and check if a particular certificate is on the list.
  • Online Certificate Status Protocol, OCSP. In this model one queries the issuer directly about one particular certificate, asking in effect "what is the latest news on this certificate?"

OCSP addresses one of the main challenges of CRLs. Verifying the validity of a single certificate using a CRL involves downloading a massive registry containing thousands of other, completely unrelated revoked certificates. This is problematic because certificate validation is often a latency bottleneck. For example, when connecting to a website using SSL, the certificate of the server must be validated. If the user is being extra cautious and includes revocation checking, then any communication with that website is now blocked on the completion of that step. While CRLs can be cached for future use and do not have to be downloaded each time, the cost to "bootstrap" from an empty cache can be prohibitive. In these situations a single OCSP query can be more efficient than an extended CRL download. On the other hand, if hundreds of certificates need to be verified from the same issuer, we reach a cross-over point where economies of scale confer an advantage on the bulk-mode operation with CRLs.
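A back-of-the-envelope sketch of that trade-off. The byte counts are purely hypothetical: a small fixed cost per OCSP exchange versus one large CRL download that then answers every query for that issuer for free.

```python
OCSP_COST = 1_500      # bytes per certificate checked (assumed)
CRL_COST = 500_000     # one-time download of the full list (assumed)

def bytes_on_wire(n_certs, mechanism):
    if mechanism == "OCSP":
        return n_certs * OCSP_COST
    return CRL_COST    # flat cost, however many certificates are checked

# For a single certificate, OCSP wins by a wide margin:
assert bytes_on_wire(1, "OCSP") < bytes_on_wire(1, "CRL")

# The crossover: past a few hundred certificates from the same issuer,
# the bulk CRL download becomes the cheaper option.
crossover = CRL_COST // OCSP_COST + 1
assert bytes_on_wire(crossover, "OCSP") > bytes_on_wire(crossover, "CRL")
```

The real numbers vary wildly by issuer, which is why implementations make this a tunable policy rather than a fixed rule.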

Armed with these options, it looks at first sight as if incidents along the lines of DigiNotar can be contained by promptly revoking the improperly issued certificates, such as the bogus GMail certificate discovered in the wild intercepting traffic to Google. The problem is that the revocation model itself assumes a certain pattern of limited, isolated "mistakes" on the part of the certificate authority. Failure modes beyond that are outside the scope of the threat model, and cannot be mitigated using either CRLs or OCSP.

The standard example of a certificate authority "mistake" is issuing a certificate to the wrong person. To sketch a hypothetical scenario: someone calls up Acme CA, introducing themselves as a Microsoft employee. They request a certificate for login.live.com, the authentication service used by virtually all MSFT online services. Acme CA does not vet the identity of this requestor properly (and why should they? they are getting paid for issuing certificates, not for saying "no," as pointed out in an earlier post on misaligned economic incentives) and issues the requested certificate to this unauthorized person, who turns out to be working for a repressive government trying to eavesdrop on its citizens' communications.

Both CRLs and OCSP are tailor-made for this scenario. Once the clueless CA realizes their mistake ("what do you mean MSFT has their own cross-signed CA and has no reason to get one-off certificates from us?") they can blacklist the certificate. It will appear in the next CRL published, and for those who cannot wait that long, the OCSP responder will immediately start reporting a revoked status to anyone who asks. Admittedly this best-case scenario still glosses over the subtleties of which pieces of commonly used software actually check for revocation by default, or for that matter what happens when revocation checks fail for unrelated reasons such as network flakiness, or even active attacks, as pointed out by Moxie Marlinspike in a 2009 Black Hat talk. One can argue that suboptimal decisions by client implementations cannot be blamed on the protocol itself.

But there is a different failure mode for CAs that does not fit the convenient pattern described above. In the earlier example we assumed that:
1. The CA remains in control of its private key; the key itself has not been shipped off to China.
2. The CA remains in control of the certificate fields being signed; for example the serial number, key usage, expiration dates etc. are all set according to the usual procedures. The only wrong field is the so-called "distinguished name" identifying the purported owner.
Clearly #2 implies #1, as attackers could fill in the blanks in the certificate fields to their heart's content if they had direct possession of the signing key. In that sense #2 is the stronger requirement, and it turns out that if it is violated, both CRLs and OCSP are toast.

Some of the issues lie in the protocol details: as pointed out, OCSP uses serial numbers to identify certificates, and so does the CRL format, as explained in this MSDN article. Serial numbers are just a field in the certificate. A sufficiently misguided CA could end up reusing the same serial number: once for a healthy certificate issued to its rightful owner, and again for a completely unrelated certificate issued to unauthorized persons. This means that the very nature of an OCSP query is ambiguous, and in the best-case scenario revoking a forged certificate will have "collateral damage" on other benign certificates.
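The collateral damage can be sketched in a few lines: the responder's database is keyed by serial number, so two certificates sharing a serial are indistinguishable to it. Serials and names are made up.

```python
revocation_db = {}   # serial -> status, as the responder sees the world

# A legitimate certificate and a forged one issued with the same serial:
legit = {"serial": 0x0A07, "subject": "honest-customer.example"}
forged = {"serial": 0x0A07, "subject": "login.live.com"}

def revoke(cert):
    revocation_db[cert["serial"]] = "revoked"

def ocsp_status(cert):
    return revocation_db.get(cert["serial"], "good")

# Revoking the forged certificate...
revoke(forged)
# ...takes the honest customer's certificate down with it:
assert ocsp_status(forged) == "revoked"
assert ocsp_status(legit) == "revoked"
```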

But even if we had a better identifier to uniquely identify certificates (a hash of the certificate would have been an obvious choice) there are structural problems in the design.
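For comparison, a hash-based identifier distinguishes two certificates even when their serial numbers collide. The DER byte strings below are placeholders, not real certificates.

```python
import hashlib

# Stand-ins for the encoded bytes of two certificates sharing serial 0x0A07:
legit_der = b"placeholder DER bytes of the honest certificate"
forged_der = b"placeholder DER bytes of the forged certificate"

def fingerprint(der: bytes) -> str:
    """Identify a certificate by the hash of its full encoding."""
    return hashlib.sha256(der).hexdigest()

# Same serial, different fingerprints: no collateral damage possible.
assert fingerprint(legit_der) != fingerprint(forged_der)
```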

The first problem is that both CRLs and OCSP responses are digitally signed, either directly by the CA or, in delegation scenarios, by another certificate issued by the CA. If the attackers had free rein to obtain arbitrary certificates from the issuer, they were in a position to obtain the credentials required to also forge CRLs and OCSP responses on behalf of the CA. When the end user decides to check on the status of that bogus GMail certificate being used to intercept their private communications, our attackers would substitute an equally bogus response that appears to originate from the OCSP responder, in effect saying: "move along, these are not the revoked certificates you are looking for." (Incidentally, the same logical circularity applies to the "CA compromise" status code defined for CRLs: if the certificate authority itself has been compromised, the client has no expectation of being able to trust the information in a CRL, and the attacker can make up arbitrary CRLs.)

The second problem is that the locations of the OCSP responder and the CRL distribution point are themselves fields in the certificate. If the miscreants had freedom to craft their own certificate and get it signed (instead of conforming to a template defined by the CA) they can simply omit the Authority Information Access field containing the OCSP responder's location, or point the CDP at a random location controlled by the attacker.
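A sketch of the resulting soft-fail behavior: the client discovers the revocation endpoints from fields inside the certificate itself. The field names follow X.509 terminology, but the client logic and URLs are invented for illustration.

```python
def revocation_endpoints(cert):
    """Collect whatever revocation-checking URLs the certificate advertises."""
    endpoints = []
    if "authority_info_access" in cert:      # OCSP responder URL
        endpoints.append(cert["authority_info_access"])
    if "crl_distribution_point" in cert:     # CRL download URL
        endpoints.append(cert["crl_distribution_point"])
    return endpoints

normal = {
    "subject": "www.google.com",
    "authority_info_access": "http://ocsp.example-ca.nl",
    "crl_distribution_point": "http://crl.example-ca.nl/latest.crl",
}
forged = {"subject": "www.google.com"}       # attacker left both fields out

assert len(revocation_endpoints(normal)) == 2
# A soft-fail client treats "nowhere to check" as "nothing to worry about":
assert revocation_endpoints(forged) == []
```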

Third, the CA often has no idea what certificates have been issued, if the attack has circumvented the usual enrollment process. The serial number, distinguished name or other details required to blacklist the certificate via revocation may not have been logged. In this case the CA only knows that some fraudulent certificates were issued, but has no idea which sites can be targeted. Until a certificate is observed in the wild, there is nothing to revoke.

Finally, and this plagues the recovery efforts, it is often not possible to determine conclusively whether the mishap experienced by the CA amounts to a few isolated cases conforming to the pattern anticipated by revocation designs, or whether attackers managed to breach the process itself at a deeper level, beyond repair. Both ineptitude and economic incentives are at work in this uncertainty. On the one hand, logs may be incomplete, or the forensics too inconclusive to say either way. On the other hand, PR pressures motivate organizations to minimize the perceived damage and hope for the best, until hard evidence proves otherwise. One need look no further than the persistent denial by RSA that the SecurID breach would have any impact on customers, until the Lockheed Martin incident forced them to admit otherwise. DigiNotar set a shining example of transparency here by remaining quiet about the breach until the MITM attack in Iran surfaced; days after the incident they still lacked a coherent story on what exactly went on. Faced with such negligence, it is better to assume the worst and not depend on revocation for piecemeal mitigation.

CP

DigiNotar fail: “this time is different”

Incompetent certificate authorities endangering users with fraudulent certificates is nothing new. There have even been allegations of willful corruption, including accusations that at least one CA is a front for the Chinese government, a nation not exactly known for following due process when it comes to intercepting communications. From that perspective, the DigiNotar debacle would have been yet another anecdote in the growing repertoire of inept CA stories. Yet there were a few notable aspects to the incident that truly made this one different, and perhaps a welcome sign that the situation is improving.

First, it was caught by end users because of a feature in Chrome that "pins" root certificates for Google properties. This is surprising: a man-in-the-middle attack against SSL armed with a valid certificate from a trusted issuer used to be, for all intents and purposes, undetectable. After all, CAs are interchangeable: it does not matter whether Verisign or Honest Achmed issued the certificate for GMail; to the web browser they are equally trusted. This is great news for attackers: even if your website obtains its certificates from a competent CA, they can simply go after any one of the remaining 100+ issuers in order to successfully impersonate your company. Root-pinning in Chrome is an experimental feature to mitigate this risk. It is not based on any standard, although it can be viewed as an extension to HTTP Strict Transport Security (HSTS): in addition to asserting that a website is available over SSL only, the website can assert that it only sources certificates from a small number of root CAs, working around the weakest-link-in-the-chain problem alluded to above. At least that is the idea; as implemented currently, the list of pinned websites is hard-coded into the Chrome web browser source code. Sites can only take advantage of this by submitting a request to Google to get added to that whitelist. This is clearly a model that will not scale long term.
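The pinning check itself is conceptually simple. Here is a sketch with an invented pin list; Chrome's real list is compiled into the browser source, and the root names below are placeholders rather than the actual pins.

```python
# host -> set of acceptable roots for that host's certificate chain
PINNED_ROOTS = {
    "google.com": {"Example Root CA 1", "Example Root CA 2"},
}

def pin_check(host, chain_root):
    """In addition to normal chain validation, require an expected root."""
    pins = PINNED_ROOTS.get(host)
    if pins is None:
        return True              # unpinned site: any trusted root will do
    return chain_root in pins    # pinned site: the root must match

# A DigiNotar-signed chain for google.com fails even though the root sits
# in the trust store, which is how the Iranian MITM was caught:
assert pin_check("google.com", "DigiNotar Root CA") is False
assert pin_check("example.com", "DigiNotar Root CA") is True
```

Note how the check sidesteps the interchangeable-CA problem: trust in the store is no longer sufficient for pinned hosts.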

But we digress. The other surprise in this debacle was the swift response from Mozilla, which maintains the Firefox web browser, and Microsoft. In the past, cases of "mistaken identity" were dealt with by individually blacklisting the offending certificates. In the case of Windows, such certificates are usually shipped as part of a monthly security update, because revocation checking can be unreliable. For example, a look at the certificate store on a Windows 7 installation shows about a dozen certificates marked "untrusted," including those impersonating Microsoft, Google, Skype, Yahoo and Mozilla.

What about the CAs responsible for these mistakes? They are still in business, hopefully more enlightened from the experience and following more stringent identity-checking procedures. But they are still listed as trusted authorities by Windows and Mozilla. Revoking that privilege must have seemed unthinkable: all certificates issued by that particular CA would be instantly invalidated. Any user trying to connect to a website using such a certificate would get an ominous error message displayed by their web browser, designed to be difficult to work around. Imagine the confusion and user-support costs if this were done for Verisign, one of the largest issuers on the planet. Even for a CA with relatively small market share ("UTN-USERFirst-Hardware", responsible for a batch of fraudulent certificates; note the delicious irony in the name of putting users first) there is the risk that the offending CA may get upset (read: litigious) and go after MSFT for this affront to their self-esteem. The inherent asymmetry in resources makes this a difficult position for MSFT: they have deep pockets and very large market share, and it would be easy to paint this as a case of big bad Microsoft effectively putting a struggling CA out of business. With tortious-interference claims and hefty damages on one side, and an unappreciative press on the other, it hardly seemed worth it to begin this fight.

Except that in the case of DigiNotar the unthinkable happened: first Mozilla and then Microsoft removed DigiNotar from their trusted roots. Mozilla prefaced their argument with: "… because the extent of the mis-issuance is not clear…" For the first time this reflects a conservative approach lacking in past incidents. Before, it was the norm to assume that fraudulent certificates were isolated mistakes and unlucky days for an otherwise sound CA; this time neither Mozilla nor MSFT was willing to take that on faith, and instead they removed the CA altogether. (Incidentally, Chrome uses the system roots on Windows and NSS roots on other operating systems, effectively inheriting both of these decisions.)

Apparently once the ineptitude reaches an egregious level, even certificate authorities face consequences. Except that the "consequences" in this case are not of a financial nature: VASCO, the parent company, helpfully notes in its press release about the security incident that it "expects that the cost of [helping existing customers] will be minimal" and that "… the first six months of 2011, revenue from the SSL and EVSSL business was less than Euro 100,000." In other words, complete loss of the CA business due to gross negligence presents no serious risk to the company, even as its actions endangered the privacy of users around the world.

CP