All Flash, no substance: returning to a purist web

The announcement by MSFT that Internet Explorer 10 will have one browsing mode without plug-ins has naturally raised eyebrows. The move can be interpreted from two different angles:

1. Strategic strike aimed squarely against Adobe. By far the most dominant plug-in of the past decade has been Flash. Whatever limitations HTML and Javascript had– perceived or real– Flash was there to provide developers an escape hatch to add the all-important dancing squirrels to their websites. MSFT made an ill-fated attempt to displace Flash with Silverlight. This crusade ended like many other homebrew technologies emerging out of Redmond in the past decade: in yet another confirmation that MSFT is too constrained by regulatory attention and no longer freely wields the immense market power it once held for single-handedly introducing new de facto standards. Silverlight tanked. Its one prominent customer, Major League Baseball– which may have motivated porting the technology to OS X, lest Mac-using fans were left out– dropped Silverlight after the 2008 season and went with Flash for online broadcasts.

It is not surprising to see MSFT embrace HTML5 in response. This is standard operating procedure in platform wars. If a company can not force its own technology (e.g. Silverlight) on consumers, the next best thing is for an open standard (HTML5) to win– versus ceding the ground to a different proprietary offering (Flash) from a major competitor.

2. A less cynical reading of the move is that it represents a return to a simpler, “purist” interpretation of the web. After going through several iterations in the span of a few short years, HTML stagnated at version 4.01– a recommendation published in 1999. Meanwhile the demands of web applications continued to grow, particularly after the dotcom bubble burst and the remains gave rise to web 2.0. Into this void stepped Flash. There had been earlier attempts to “enhance” the web experience: ActiveX, bringing the full power and perils of Windows native programming, and Java before that, with the promise of write-once-run-anywhere. With Flash, for the first time, developer demand and technology met happily in the middle.

The downside is that Flash fabricated whole new “conventions” out of thin air, and resurrected privacy and security problems that had already been solved in the context of web standards. Web browsers greatly limit interaction between websites to contain security risks– the so-called same origin policy that underlies the web security model. Flash invented its own cross-domain access rules, creating vulnerabilities on websites– even on sites that did not use Flash, a cardinal sin for a technology.
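To make those homebrew conventions concrete: Flash decides whether cross-domain requests are allowed by fetching a policy file from the target site. A site serving the overly permissive wildcard policy below– a hypothetical but well-formed example of the real crossdomain.xml format– opens itself to cross-domain reads by any Flash content on the web, whether or not the site itself uses Flash:

```xml
<?xml version="1.0"?>
<!-- /crossdomain.xml at the root of a website. The wildcard entry
     below permits Flash content from ANY origin to make requests to
     this site with the user's credentials attached, bypassing the
     browser's same-origin policy. -->
<cross-domain-policy>
  <allow-access-from domain="*" />
</cross-domain-policy>
```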

Privacy also suffered: tracking cookies were the big scare in 2000, a time that seems positively innocent by contemporary standards. Eventually better cookie management functionality in the web browser and a half-hearted attempt at a new policy language tamed the problem– for “cookies” as they were defined at the time. Flash introduced its own notion of client-side storage which could be used to achieve the same tracking capability as regular cookies, yet remained outside the purview of privacy enhancements implemented over the years to manage regular HTTP cookies. This was a clear boon to web services with dubious intentions for tracking users. Sure enough, a study in 2009 found that Flash cookies were commonly used as a back-up measure by many popular websites, to recreate regular cookies deleted by users. (Granted HTML5 also introduced its own notion of local storage, but at least web browsers provide users control over this functionality. For the longest time the only way to delete Flash cookies was to visit a web page hosted by Adobe, in sharp contrast to centralized browser settings.)
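The respawning trick found by that 2009 study can be sketched with a toy model– plain Python dictionaries standing in for the browser's cookie jar and a Flash Local Shared Object; none of the names below correspond to a real browser or Flash API:

```python
# Toy model of cookie "respawning"; plain dictionaries stand in for
# the browser cookie jar and a Flash Local Shared Object. None of
# these names correspond to a real browser or Flash API.

def set_tracking_id(browser_cookies, flash_storage, user_id):
    # The site stores the same identifier in both locations.
    browser_cookies["uid"] = user_id
    flash_storage["uid"] = user_id

def on_page_visit(browser_cookies, flash_storage):
    # If the user cleared the HTTP cookie, quietly recreate it from
    # the copy that survived in Flash storage.
    if "uid" not in browser_cookies and "uid" in flash_storage:
        browser_cookies["uid"] = flash_storage["uid"]
    return browser_cookies.get("uid")

cookies, lso = {}, {}
set_tracking_id(cookies, lso, "user-1234")
cookies.clear()                     # user deletes cookies via the browser UI
print(on_page_visit(cookies, lso))  # user-1234 -- the identifier is back
```

The key point: the browser's “delete cookies” button only empties the first dictionary, so the identifier quietly survives.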

From this perspective, disabling plugins and imposing strict HTML5 semantics on the chaos happening inside a web browser is a good development. HTML5 has come a long way– much of the needed functionality (cross-domain communication with access control, multiple threads, better graphics, video support etc.) is being incorporated into the standards. The need for an escape hatch, whether in the form of Flash, Java or Silverlight, to enable some hitherto impossible scenario weakens by the day. For security professionals and privacy-conscious users, the good news is there is one major place to focus efforts, instead of multiple surprises hidden in homebrew designs implemented without the benefit of the public scrutiny a standard receives.

CP

DigiNotar: surveying the damage with OCSP

Having earned the dubious distinction of being the first certificate authority ever to get booted from a root CA program due to a security breach, it is no surprise that DigiNotar has gone into damage control mode. The parent company Vasco solicited a third party (Fox-IT) to investigate the security incident and published the report. It is not clear if they were hoping to score brownie points for transparency– the report still withholds crucial information in the name of protecting confidential details about DigiNotar’s internal setup. The authors are also careful to dance around issues of culpability:

“Since the investigation has been more of a fact finding mission thus far, we will not draw any conclusions with regards to the network-setup and the security management system.”

Luckily readers are free to draw their own conclusions. In particular, there appears to be a clear-cut case of negligence: the company noticed and investigated a security breach as early as July, but its tepid response went no further than revoking a few certificates along the way. Notably, they did not bother notifying Mozilla, Microsoft, Apple or any other major company maintaining a root CA program.

To their credit, Fox-IT tried to gauge the extent of the damage by monitoring OCSP logs for users checking on the status of the forged Google certificate. There is a neat YouTube video showing the geographic distribution of those lookups around the world over time. Unfortunately, while this half-baked attempt at forensics makes for great visualization, it presents a very limited picture of impacted users.

First, not all clients check for revocation; the settings often depend on the browser version. For example, starting with Internet Explorer 7, IE enables revocation checking by default– but the user can always override this setting.

Second, even those web browsers configured to check may not be capable of using OCSP. In particular Windows XP does not support the feature, which means that clients relying on the platform crypto functionality, such as IE and Chrome, will fall back on certificate revocation lists instead. The forged certificate listed both an OCSP responder and a CRL distribution point; in principle there are web server logs from the CDP– assuming DigiNotar was logging these. While the Fox-IT report makes no reference to these additional records, there is a further problem in that a CRL download is not specific to a particular certificate. It is known that the attackers successfully obtained bogus certificates capable of impersonating several popular websites including Tor and Microsoft. Nothing in the protocol reveals which bogus certificate a client is trying to ascertain the status of. In fact it may even be a “legitimate” user looking up the status of a valid certificate.

Diving into the gritty details of certificate revocation in Windows, we discover even more twists: it turns out that even the platforms capable of OCSP checking (namely Vista and Windows 7) may opt for a CRL in certain situations. While OCSP is more efficient for finding out about one certificate, downloading the CRL may in the aggregate be worth the upfront cost when dozens of certificates from the same issuer are being validated. In the case of Windows, “dozens” is defined by a registry key set to 50 by default, though this can be changed. In fact even the preference for OCSP over CRL can be inverted, owing to the enterprise-friendly management capabilities of Windows– but it is unlikely that home users were tweaking such settings.
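The heuristic can be sketched as follows– the function and variable names are illustrative, not the actual Windows CryptoAPI implementation, though the default threshold of 50 mirrors the registry setting mentioned above:

```python
# Sketch of the CRL-vs-OCSP heuristic described above. The threshold
# plays the role of the Windows registry setting (50 by default);
# names here are illustrative, not the actual CryptoAPI implementation.

DEFAULT_THRESHOLD = 50

def pick_revocation_source(certs_validated_for_issuer, threshold=DEFAULT_THRESHOLD):
    """Return 'OCSP' for one-off lookups, but 'CRL' once enough
    certificates from the same issuer have been validated that the
    one-time CRL download pays for itself."""
    if certs_validated_for_issuer >= threshold:
        return "CRL"
    return "OCSP"

print(pick_revocation_source(3))    # OCSP
print(pick_revocation_source(75))   # CRL
```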

Next, the inherent cacheability of both OCSP responses and CRLs creates more false negatives: during its validity period a response can be cached either by the user (suppressing repeat lookups) or by an intermediate caching proxy on the network, hiding additional hits to the OCSP responder even when users are subjected to MITM attacks repeatedly.
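A toy model of that caching behavior shows why responder logs undercount victims– the class and field names here are invented for illustration:

```python
# Toy client-side OCSP cache. During a response's validity period,
# repeat lookups for the same certificate never reach the responder,
# so the responder's logs undercount affected users.

class CachingOcspClient:
    def __init__(self, validity_seconds):
        self.validity = validity_seconds
        self.cache = {}            # serial -> (status, fetched_at)
        self.responder_hits = 0    # what the responder's logs would record

    def check(self, serial, now):
        cached = self.cache.get(serial)
        if cached is not None and now - cached[1] < self.validity:
            return cached[0]       # served from cache: no log entry anywhere
        self.responder_hits += 1   # an actual hit on the responder
        status = "good"            # the responder kept vouching for the cert
        self.cache[serial] = (status, now)
        return status

client = CachingOcspClient(validity_seconds=3600)
for minute in range(10):           # ten MITM'ed connections in ten minutes
    client.check(serial=0x0BADC0DE, now=minute * 60)
print(client.responder_hits)       # 1 -- the other nine checks left no trace
```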

There is an even more bizarre possibility that some of the hits against the CRL or OCSP responder may in fact be false positives, due to a feature known as pre-fetching. Windows keeps track of certificates validated in the past and can download CRLs or cache OCSP responses preemptively, before the certificate is encountered again. In the case of OCSP this is not a true false positive– the user would have to have encountered the forged certificate several times in the past before the heuristics kick in– but it does suggest that not every blip on the map corresponds to an attack taking place at that instant.

Finally there is a conceptual problem with the Fox-IT forensics: the TLS protocol supports an optimization known as OCSP stapling. In this model the web server itself obtains an OCSP response from the certificate authority attesting to the freshness of the credential. This response is sent down to the client during the TLS handshake, freeing the client from having to do its own OCSP lookup– since these responses are signed, the client can be assured that the answer is authentic. (Modulo the inclusion of a nonce, to be precise.) In this case, until DigiNotar noticed the bogus Google certificate, the OCSP responder would have returned status “good” for the certificate– even though it was never issued in the first place. Depending on one’s perspective, this is either bad design on the part of the OCSP protocol or yet another instance of DigiNotar incompetence. As such, even if we assumed that 100% of clients are inclined to consult the OCSP responder, an attacker with man-in-the-middle capabilities can render that unnecessary by providing the answer as part of the same connection they intercepted. Without knowing the capabilities of the attacker and whether they took this extra step, there is no way to know if the OCSP logs only represent hits from users whose web browsers do not grok the stapling extension. (There is also the cruder attack of dropping or degrading communications sent to the OCSP responder, again resulting in a very large number of false negatives.) The forensics implicitly assume limited resourcefulness on the part of the attacker. This assumption is not warranted. On the contrary, everything in this picture suggests that whatever mistakes the attackers made– such as getting caught by the Chrome root pinning feature– are dwarfed by the sheer incompetence and gross negligence of DigiNotar itself.
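The stapling blind spot can be illustrated with a toy sketch– all names and the fake “signature” below are invented; real OCSP responses are signed DER structures:

```python
# Toy sketch of why stapling blinds OCSP-based forensics: a client
# that trusts a stapled response never contacts the responder itself.
# All names and the fake "signature" here are invented for illustration.

def ocsp_responder(serial, log):
    log.append(serial)               # the only forensic trail that exists
    return ("good", "signed-by-ca")  # responder kept vouching for the cert

def tls_client(cert_serial, stapled_response, responder_log):
    if stapled_response is not None:
        status, signature = stapled_response
        if signature == "signed-by-ca":  # signature verifies: answer accepted
            return status                # no hit ever reaches the responder
    # Only clients without a stapled response perform their own lookup:
    status, _ = ocsp_responder(cert_serial, responder_log)
    return status

log = []
# Attacker fetches one genuine "good" response, then staples it for everyone:
stapled = ocsp_responder("forged-google-serial", log)
for _ in range(1000):                # a thousand intercepted victims
    tls_client("forged-google-serial", stapled, log)
print(len(log))                      # 1 -- one log entry for a thousand MITMs
```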

CP

The limits of certificate revocation

In the wake of the DigiNotar debacle, it is time to revisit a question that inevitably comes up each time another certificate authority makes a mistake: does certificate revocation help? The short answer, it turns out, is probably not.

Briefly, revocation checks refer to additional steps for verifying the validity of a digital certificate that involve communicating with the issuer over the network. These are in addition to local checks, such as verifying the signature, checking that the certificate has not expired and comparing the name on the certificate to the expected ID. Local checks are cheap from a computational perspective, but they are also static: if a certificate passes these checks once, it will continue to pass them until the expiration date. If the assertions made on the certificate are invalidated at a later time– for example, the user loses their private key– we can not find out about such changes merely by inspecting the certificate one more time. Instead we have to go back and ask the issuer.

There are two ways revocation status can be checked. Simplifying somewhat:

  • Certificate revocation lists or CRLs. CRLs are giant lists of all revoked certificates, periodically published by the issuer. Anyone can download these and check if a particular certificate is on the list.
  • Online Certificate Status Protocol, or OCSP. In this model one queries the issuer directly about one particular certificate, asking in effect “what is the latest news on this certificate?”
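Simplifying even further, the two mechanisms can be modeled with toy data structures in place of the real signed formats:

```python
# Toy stand-ins for the real signed X.509 structures: a CRL modeled
# as a set of revoked serial numbers, and OCSP as a per-certificate
# query against the issuer's records. Serial numbers are invented.

revoked = {1001, 1002, 1007}       # issuer's master record of revocations

def check_via_crl(serial, downloaded_crl):
    # Client downloads the entire list once, then checks membership locally.
    return "revoked" if serial in downloaded_crl else "good"

def check_via_ocsp(serial):
    # Client asks the issuer about this one certificate; the issuer
    # consults the same records it would publish in the CRL.
    return "revoked" if serial in revoked else "good"

crl_copy = set(revoked)                # the bulk download
print(check_via_crl(1002, crl_copy))   # revoked
print(check_via_ocsp(1005))            # good
```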

OCSP addresses one of the main challenges of CRLs. Verifying the validity of a single certificate using a CRL involves downloading a massive registry containing thousands of other, completely unrelated revoked certificates. This is problematic because certificate validation is often a latency bottleneck. For example, when connecting to a website using SSL, the certificate of the server must be validated. If the user is being extra cautious and enables revocation checking, then any communication with that website is blocked on the completion of that step. While CRLs can be cached for future use and do not have to be downloaded each time, the cost to “bootstrap” from an empty cache can be prohibitive. In these situations a single OCSP query can be more efficient than an extended CRL download. On the other hand, if hundreds of certificates need to be verified from the same issuer, we reach a cross-over point where economies of scale confer an advantage on the bulk-mode operation with CRLs.
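That cross-over point is easy to estimate with back-of-the-envelope numbers– the sizes below are plausible guesses, not measurements:

```python
# Back-of-the-envelope crossover between the two mechanisms. The sizes
# are assumptions for illustration: ~2 KB per OCSP exchange versus a
# one-time 250 KB CRL download covering every cert from that issuer.

OCSP_BYTES_PER_CHECK = 2_000
CRL_BYTES_TOTAL = 250_000

def cheaper_option(certs_to_check):
    ocsp_total = certs_to_check * OCSP_BYTES_PER_CHECK
    return "OCSP" if ocsp_total < CRL_BYTES_TOTAL else "CRL"

print(cheaper_option(1))     # OCSP: 2 KB beats a 250 KB download
print(cheaper_option(200))   # CRL: 200 one-off queries would cost 400 KB
```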

Armed with these options, it looks at first sight that incidents along the lines of DigiNotar can be contained by promptly revoking the improperly issued certificates, such as the bogus GMail certificate discovered in the wild for intercepting traffic to Google. The problem is the revocation model itself assumes a certain pattern of limited, isolated “mistakes” on the part of the certificate authority. Failure modes beyond that are outside the scope of the threat model, and can not be mitigated using either CRLs or OCSP.

The standard example of a certificate authority “mistake” is issuing a certificate to the wrong person. To sketch a hypothetical scenario: someone calls up Acme CA, introducing himself as a Microsoft employee. They request a certificate for login.live.com, the authentication service used by virtually all MSFT online services. Acme CA does not vet the identity of this requestor properly (and why should they? they are getting paid to issue certificates, not to say “no,” as pointed out in an earlier post on misaligned economic incentives) and issues the requested certificate to the unauthorized person, who turns out to be working for a repressive government trying to eavesdrop on citizens’ communications.

Both CRLs and OCSP are tailor-made for this scenario. Once the clueless CA realizes their mistake (“what do you mean MSFT has their own cross-signed CA and has no reason to get one-off certificates from us?”) they can blacklist this certificate. It will appear in the next CRL published, and for those who can not wait that long, the OCSP responder will immediately start reporting a revoked status to anyone who asks. Admittedly this best-case scenario still glosses over the subtleties of which pieces of commonly used software actually check for revocation by default, or what happens if revocation checks fail for unrelated reasons such as network flakiness– or even active attacks, as pointed out in 2009 by Moxie Marlinspike in a Blackhat talk. One can argue that suboptimal decisions by client implementations can not be blamed on the protocol itself.

But there is a different failure mode for CAs that does not fit the convenient pattern described above. In the earlier example we assumed that:
1. The CA remains in control of its private key– the key itself has not been shipped off to China.
2. The CA remains in control of the certificate fields being signed: the serial number, key usage, expiration dates etc. are all set according to the usual procedures. The only wrong field is the so-called “distinguished name” identifying the purported owner.
Clearly #2 implies #1, as attackers could fill in the certificate fields to their heart’s content if they had direct possession of the signing key. In that sense #2 is the stronger requirement, and it turns out that if it is violated, both CRLs and OCSP are toast.

Some of the issues lie in protocol details: as pointed out, OCSP uses serial numbers to identify certificates, and so does the CRL format, as explained in this MSDN article. Serial numbers are just a field in the certificate. A sufficiently misguided CA could end up reusing the same serial number– once in a healthy certificate issued to its rightful owner, and again in a completely unrelated certificate issued to unauthorized persons. This means that the very nature of an OCSP query is ambiguous, and in the best case revoking a forged certificate will have “collateral damage” on other benign certificates.
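A toy illustration of that ambiguity– serial numbers and hostnames invented:

```python
# Serial numbers and hostnames invented for illustration. Revocation
# identifies certificates by serial number only, so two certificates
# sharing one serial are indistinguishable to a CRL or OCSP responder.

legitimate = {"serial": 4242, "subject": "mail.example.com"}  # rightful owner
forged     = {"serial": 4242, "subject": "login.live.com"}    # attacker's copy

revoked_serials = {forged["serial"]}   # CA blacklists the forged certificate

def status(cert):
    return "revoked" if cert["serial"] in revoked_serials else "good"

print(status(forged))       # revoked -- as intended
print(status(legitimate))   # revoked -- collateral damage on the benign cert
```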

But even if we had better identifiers to uniquely identify certificates (the hash would have been an obvious choice) there are structural problems in the design.

The first problem is that both CRLs and OCSP responses are digitally signed, either directly by the CA or, in delegation scenarios, by another certificate issued by the CA. If the attackers had free rein to obtain arbitrary certificates from the issuer, they were also in a position to obtain the credentials required to forge CRLs and OCSP responses on behalf of the CA. When the end user decides to check on the status of that bogus GMail certificate being used to intercept their private communications, our attackers would substitute an equally bogus response appearing to originate from the OCSP responder, in effect saying: “move along, these are not the revoked certificates you are looking for.” (Incidentally the same logical circularity applies to the “CA compromise” status code defined for CRLs: if the certificate authority itself has been compromised, the client has no expectation of being able to trust the information in a CRL, and the attacker can make up arbitrary CRLs.)

The second problem is that the location of the OCSP responder and the CRL distribution point are themselves fields in the certificate. If the miscreants had the freedom to craft their own certificate and get it signed (instead of conforming to a template defined by the CA), they can simply omit the Authority Information Access field containing the OCSP responder’s location, or point the CDP at a random location controlled by the attacker.

Third, the CA often has no idea what certificates have been issued if the attack circumvented the usual enrollment process. The serial number, distinguished name or other details required to blacklist a certificate via revocation may never have been logged. In this case the CA only knows that some fraudulent certificates were issued, but has no idea which sites can be targeted. Until a certificate is observed in the wild, there is nothing to revoke.

Finally– and this plagues the recovery efforts– it is often not possible to determine conclusively whether the mishap experienced by the CA amounts to a few isolated cases conforming to the pattern anticipated by revocation designs, or whether attackers managed to breach the process itself at a deeper level, beyond repair. Both ineptitude and economic incentives are at work in this uncertainty. On the one hand, logs may be incomplete, or the forensics too inconclusive to say either way. On the other hand, PR pressures motivate organizations to minimize the perceived damage and hope for the best, until hard evidence proves otherwise. One need look no further than the persistent insistence by RSA that the SecurID breach would have no impact on customers– until the Lockheed Martin incident forced them to admit otherwise. DigiNotar set a shining example of transparency here by remaining quiet about the breach until the MITM attack in Iran surfaced. Days after the incident they still lacked a coherent story on what exactly went on. Faced with such negligence, it is better to assume the worst and not depend on revocation for piecemeal mitigation.

CP

DigiNotar fail: “this time is different”

Incompetent certificate authorities endangering users with fraudulent certificates is nothing new. There have even been allegations of willful corruption, including accusations that at least one CA is a front for the Chinese government, a nation not exactly known for following due process when it comes to intercepting communications. From that perspective, the DigiNotar debacle would have been yet another anecdote in the growing repertoire of inept CA stories. Yet there were a few notable aspects to the incident that truly made this one different, and perhaps a welcome sign that the situation is improving.

First, it was caught by end users because of a feature in Chrome that “pins” root certificates for Google properties. This is surprising: a man-in-the-middle attack against SSL, armed with a valid certificate from a trusted issuer, used to be transparent for all intents and purposes. After all, CAs are interchangeable: it does not matter whether Verisign or Honest Achmed issued the certificate for GMail– to the web browser they are equally trusted. This is great news for attackers: even if your website obtains its certificates from a semi-competent CA, they can simply go after any one of the remaining 100+ issuers in order to successfully impersonate your company. Root pinning in Chrome is an experimental feature to mitigate this risk. It is not based on any standard, although it can be viewed as an extension to HTTP Strict Transport Security (HSTS): in addition to asserting that a website is available over SSL only, the website can assert that it only sources certificates from a small number of root CAs, working around the weakest-link-in-the-chain problem alluded to above. At least that is the idea; as currently implemented, the list of pinned websites is hard-coded into the Chrome web browser source code. Sites can only take advantage of this by submitting a request to Google to get added to that whitelist. This is clearly a model that will not scale long term.
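The idea behind pinning can be sketched in a few lines– the hostnames, root names and whitelist below are invented for illustration, not Chrome's actual data structures:

```python
# Sketch of root pinning: a hard-coded whitelist mapping hostnames to
# the only roots allowed to vouch for them. Hostnames, root names and
# the whitelist itself are invented, not Chrome's actual data.

PINS = {
    "mail.google.com": {"GoogleInternetAuthority-root", "Backup-root"},
}

def chain_acceptable(hostname, chain_root, trusted_roots):
    if chain_root not in trusted_roots:
        return False        # fails the ordinary trust check outright
    pinned = PINS.get(hostname)
    if pinned is not None and chain_root not in pinned:
        return False        # root is trusted in general, but not for this site
    return True

trusted = {"GoogleInternetAuthority-root", "Backup-root", "DigiNotar-root"}
# The DigiNotar root is trusted for arbitrary sites...
print(chain_acceptable("shop.example.com", "DigiNotar-root", trusted))  # True
# ...but not pinned for Google properties, so the MITM certificate fails:
print(chain_acceptable("mail.google.com", "DigiNotar-root", trusted))   # False
```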

But we digress– the other surprise in this debacle was the swift response from Mozilla– which maintains the Firefox web browser– and Microsoft. In the past, cases of “mistaken identity” were dealt with by individually blacklisting the issued certificates. In the case of Windows, such certificates are usually shipped as part of a monthly security update, because revocation checking can be unreliable. For example, a look at the certificate store on a Windows 7 installation shows about a dozen certificates marked “untrusted,” including ones impersonating Microsoft, Google, Skype, Yahoo and Mozilla.

What about the CAs responsible for these mistakes? They are still in business, hopefully more enlightened by the experience and following more stringent identity-checking procedures. But they are still listed as trusted authorities by Windows and Mozilla. Revoking that privilege must have seemed unthinkable: all certificates issued by the particular CA would be instantly invalidated. Any user trying to connect to a website using such a certificate would get an ominous error message displayed by their web browser, designed to be difficult to work around. Imagine the confusion and user-support costs if this were done to Verisign, one of the largest issuers on the planet. Even for a CA with relatively small market share (“UTN-USERFirst-Hardware”– responsible for a batch of fraudulent certificates; note the delicious irony in the name of putting users first) there is the risk that the offending CA may get upset (read: litigious) and go after MSFT for this affront to their self-esteem. The inherent asymmetry in resources makes this a difficult position for MSFT: with deep pockets and very large market share, it would be easy to paint this as a case of big bad Microsoft effectively putting a struggling CA out of business. With tortious interference claims and hefty damages on one side, and an unappreciative press on the other, it hardly seemed worth it to pick this fight.

Except that in the case of DigiNotar the unthinkable happened: first Mozilla and then Microsoft removed DigiNotar from their trusted roots. Mozilla prefaced its argument with: “… because the extent of the mis-issuance is not clear…” For the first time this reflects a conservative approach lacking in past incidents. Before, it was the norm to assume that fraudulent certificates were isolated mistakes and unlucky days for an otherwise sound CA– this time neither Mozilla nor MSFT was willing to take that on faith, and instead removed the CA altogether. (Incidentally Chrome uses the system roots on Windows and NSS roots on other operating systems, effectively inheriting both of these decisions.)

Apparently once the ineptitude reaches an egregious level, even certificate authorities face consequences. Except that the “consequences” in this case are not of a financial nature: VASCO, the parent company, helpfully notes in its press release for the security incident that it “expects that the cost of [helping existing customers] will be minimal” and that “… the first six months of 2011, revenue from the SSL and EVSSL business was less than Euro 100,000.” In other words, complete loss of the CA business due to gross negligence presents no serious risk to the company– even as its actions endangered the privacy of users around the world.

CP

RSA breach redux: a well-deserved pwnie

The Pwnie award winners were announced yesterday at the Blackhat Briefings in Las Vegas. In some categories there was little suspense, with the winners almost a complete lock. Not surprisingly RSA got the nod for Lamest Vendor Response. This is a good time to revisit what made the RSA incident particularly significant.

The issue is not the breach itself: security incidents are a part of life. Even organizations with a strong culture of risk management are not perfect. It is instructive to note that RSA did not win for Most Epic Fail; surely they had formidable competition in Sony, almost untouchable for its string of remarkably bone-headed moves in 2011, culminating with the PlayStation Network breach. What made the SecurID case stand out was the deliberate, pernicious and persistent attempts by RSA to conflate the company’s bottom line with customers’ best interests, in the face of clear-cut evidence to the contrary.

For background: RSA has a presence in several product lines, ranging from licensing its cryptographic library BSAFE (which used to power the Windows crypto API in the ancient days, before it was retired in favor of Peter Montgomery’s native implementation) to providing a data-loss prevention suite. Yet the 2-factor authentication product SecurID stands out as one of the main revenue generators. SecurID is a small, tamper-resistant gadget which stores a secret key (the “seed”) and uses that seed to generate so-called one-time passwords, or OTPs. Typically an OTP is a 6-digit number that changes over time or with the press of a button on the gadget. These numbers are used to authenticate users, usually in combination with a password, providing an additional degree of assurance that the user is who they claim to be. Unlike passwords, OTPs change constantly. This is good news for the user and bad news for prospective attackers: tricking the legitimate user into revealing one OTP does not help predict the next one required to masquerade as that user.
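Since SecurID's algorithm is proprietary, the open OATH standard makes a better illustration of the seed-plus-moving-factor structure. Here is HOTP (RFC 4226) in a few lines of stdlib Python:

```python
import hashlib
import hmac
import struct

def hotp(seed: bytes, counter: int, digits: int = 6) -> str:
    """OATH HOTP (RFC 4226): HMAC-SHA1 over a moving counter, followed
    by "dynamic truncation" down to a short decimal code. SecurID's own
    algorithm is proprietary; this open standard illustrates the same
    seed-plus-moving-factor structure."""
    mac = hmac.new(seed, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                            # dynamic truncation
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

# Test vector published in RFC 4226, Appendix D:
print(hotp(b"12345678901234567890", 0))   # 755224
```

Replacing the counter with the current time divided into fixed steps yields the time-based variant (TOTP), closer in spirit to a token whose display changes every minute.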

In reality the SecurID model involves two very different and completely orthogonal businesses bundled into one:

  • Selling hardware that generates OTPs according to a sound cryptographic algorithm. Here we have a traditional market in physical goods. One unit is sold for each user and it is strictly a transactional business. Short of replacements for lost/malfunctioning tokens, there is no ongoing relationship with the customer.
  • Managing the life-cycle of identities associated with the dongles. This is the catch with OTP generation: the same secret sauce used to generate an OTP is also required to verify one when a user submits it. That means whoever is charged with verifying OTPs must have access to the same secrets. This is an ongoing service, not a one-time sale. Each time a user shows up to authenticate and submits what they claim to be their OTP, there must be code running on some machine which accepts the user’s claim as input, compares it against the correct OTP generated from a copy of the secret key and responds with yea or nay.
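The verification side of the second bullet can be sketched as follows– the otp() function is a simplified HMAC-based stand-in (not SecurID's proprietary algorithm), and the resynchronization window is likewise an invented detail:

```python
import hashlib
import hmac

# The otp() function is a simplified HMAC-based stand-in, NOT SecurID's
# proprietary algorithm; the resynchronization window is likewise an
# invented detail. The point: the verifier needs its own copy of the seed.

def otp(seed: bytes, counter: int) -> str:
    mac = hmac.new(seed, counter.to_bytes(8, "big"), hashlib.sha1).digest()
    return str(int.from_bytes(mac[:4], "big") % 10 ** 6).zfill(6)

def verify(seed: bytes, expected_counter: int, submitted: str, window: int = 2):
    """Accept the OTP if it matches any counter within a small window,
    tolerating a token that drifted slightly ahead of the server."""
    for c in range(expected_counter, expected_counter + window + 1):
        if hmac.compare_digest(otp(seed, c), submitted):
            return True, c + 1       # resynchronize past the matched counter
    return False, expected_counter

seed = b"per-token-secret-known-to-both-sides"
code = otp(seed, 7)                  # token is one step ahead of the server
print(verify(seed, 6, code))         # (True, 8) -- matched at counter 7
```

Whoever runs verify() must hold the seed– which is exactly why a breach of the verification service is a breach of every token.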

In the case of SecurID, it is always RSA programming the tokens with seeds that are generated by RSA, and it is always RSA running the servers that can authoritatively declare whether an OTP submission is correct. Customers have no choice in the matter. Yet this bundling is far from a foregone conclusion. It is perfectly possible to produce tamper-resistant OTP hardware which can be programmed in the field by customers, and provisioned with new keys that are not known to the manufacturer. Case in point: the Yubikey, a USB-based token that “injects” an OTP directly into an application as a sequence of keystrokes. In this model the hardware provider simply ships the gadgets to an organization. The IT department is responsible for programming them with keys, assigning the gadgets to users, and keeping track of which user got which gadget and the associated seeds.

Why did RSA insist on doing both? Simply because selling hardware is a commodity business where vendors can only compete on price. Granted, not all hardware is identical and there are indeed varying qualities of tamper resistance. Extracting the seed from the OTP generator by cracking open the case and inspecting the circuitry inside would be considered a fatal attack on the system, and over the years different designs have exhibited different levels of difficulty in resisting such attacks. But few customers worry about such nuances: given an agreed-upon scheme for generating OTPs (for example the open standard OATH– incidentally not used by SecurID, which has a proprietary algorithm) most enterprises would be interested in the cheapest hardware that can do the job. That would be a very competitive market where prices decline and no vendor has lock-in power. If vendor Bob decided to overcharge for his tokens, vendor Alice could undercut him with cheaper units that generated the same OTPs and functioned identically. Even worse, selling hardware is a one-time transaction. Short of replacing the gadget, there are few other revenue opportunities, and certainly no lock-in effects. That leaves the vendor with a choice: either ship low-quality hardware that breaks down often– preferably just outside the warranty period– and requires frequent replacements, leading to unhappy customers (keeping in mind this is critical authentication infrastructure: if the OTP generator breaks down, the user can not get their work done), or ship a bullet-proof product that all but guarantees they will never hear from that customer for another 5 years.

The RSA solution to this dilemma is to bundle the service with the hardware. The nice thing about a service is that the customer must continue paying for it every year. By inserting itself into the middle of every authentication event between an organization and its employees, RSA ensures steady revenue, artificially rendering itself “indispensable” to the organization in question. But at what cost? The downsides to this arrangement are obvious, and in retrospect did not require a spectacular security breach with a suspected nation-state sponsor to expose:

  • Availability: there is a massive single point of failure in the service maintained by RSA for verifying every OTP transaction from every SecurID token in existence. If for some reason that service is not accessible or experiences an outage, every single customer of SecurID is impacted. In effect work grinds to a halt because nobody can authenticate. (Note there is an entirely different deployment problem in trying to use SecurID inside an air-gapped network disconnected from the Internet: in that case RSA services are not even reachable.) This is not purely a question of whether RSA can build services with higher uptime than a particular customer. For some customers it may well be the case that they are better off outsourcing this to RSA. In other cases it is not clear which side can maintain higher availability and lower latency: when this blogger was working on Windows Live ID, one of the primary concerns leading to the rejection of SecurID centered on taking a dependency on a third party for user authentication. If employees in an organization can not get their work done because authentication is failing, the organization bears the cost directly. RSA may refund some of the SecurID service fees, but its liability remains limited compared to the unbounded cost for the customer.
  • Security: More importantly, RSA becomes a critical security dependency on an ongoing basis. A vendor delivering hardware must be trusted to have built decent tamper-resistance, not have any secret backdoors installed into the unit etc. But this is a one-time event. Once the hardware has been etched, potted, molded and put in a crate for delivery, any news that the manufacturer got 0wned can be met with a shrug. By contrast if the verification service with a copy of all OTP seeds ever gets breached– demonstrated in dramatic fashion by RSA– it is an emergency for all customers.
It is difficult to fault RSA for acting in the interests of its shareholders and pushing the bundled-service model as long as customers were willing to go along. The major miscue came when the incident threatened to make it clear that RSA revenue and customer interests are far from aligned. Instead of acting contrite, issuing a mea culpa and trying to appease customers for cheap PR points, RSA pursued a remarkably clueless strategy of vague press statements and unconvincing denials. By the time Lockheed Martin and Level3 provided the smoking gun, it was too late for the company to retain a shred of credibility.
Kudos to the Pwnie judges for crowning this over-qualified candidate for Lamest Vendor Response.

CP

Windows Phone 7: dropping a generation of developers

One of the less discussed aspects of the Nokia-MSFT deal is its impact on developers. After all, platforms stand or fall on the strength of their applications. (Steve Ballmer wanted every MSFT employee to take this message to heart.) Windows was able to leverage this virtuous cycle to deliver a stunning upset to Apple in the 1990s, by creating a very attractive environment for developers to enrich the platform one application at a time. The Windows API remained stable through two decades while everything changed about the basic PC architecture. CPUs went from 16-bit to x64, multiple cores and SMP became common, GPUs gained a prominent role, the network became a critical component of writing code. Windows programming still looked the same. In fact “app compat” became one of the major costs in operating system development– through heroic engineering effort, buggy applications relying on undocumented APIs in some archaic version of the operating system were coaxed into working properly on the latest and greatest kernel, lest the incompatibility deter some customer from upgrading.

The same approach applied to mobile programming. Even before smartphones, MSFT pursued handhelds with PocketPC. A subset of the venerable Windows API was still present, and later expanded into Windows Mobile. Any developer familiar with programming desktop applications could, with a little effort, write code for mobile devices. To a large extent Apple took the same approach to allow its developers to transition from writing Mac OS-X applications to targeting the iPhone/iPad.

So it came as a major surprise that WP7 dropped backwards compatibility. Native code is now verboten; only .NET applications can be written. In one bold stroke, MSFT may have lost a generation of developers who grew up scrutinizing the MSDN documentation for the subtleties of the classic Windows API. An even bigger question is whether MSFT will be able to court new developers and gain mindshare among those contemplating a career in development. From the perspective of a newly minted computer science graduate trying to decide which programming language/environment to excel in, the options are:

  • Learn C/C++ and take up any systems programming task. (This includes traditional Windows applications, an admittedly endangered species.)
  • Learn Java and program either server applications, or dive into mobile development with Blackberry or Android– the new hotness.
  • Learn Objective-C and write OS X or iPhone applications. Also known as “objectionable-C,” this was an attempt (emphasis on attempt) to add object-oriented features to C before Stroustrup did it the right way with C++. Outside of Cupertino few people care about it.
  • Learn C# and .NET to program… what exactly? For all the work on promoting the idea that .NET is cross-platform and beats Java at its own game of write-once-run-everywhere, the technology remains very much tied to MSFT platforms. It used to be seen only in enterprise line-of-business applications and web applications running on Windows server variants– until Windows Phone 7 came along and mandated managed code. The problem is that these are highly specialized segments of the market. Much like Objective-C and Cocoa development, it is not a portable skill useful in any other context. But unlike OS-X/iPhone, WP7 does not have a commanding presence in the market or a proven revenue model with an app store.

It is difficult to justify taking that last option, except as a way to capitalize on lack of competition. In other words, since everyone else is writing for iPhone and Android, one viable strategy for ISVs may be churning out copy-cat WP7 applications styled after the popular ones on the leading platforms. But this is hardly a sustainable model, or an appealing proposition for a new developer seeking challenging work.

CP

HP: over-engineering and under-delivering support

Vignettes from setting up a new PC.

Starting point: new HP machine, reimaged from scratch with Windows 7. Naturally most of the devices are not recognized, because the drivers do not ship out of the box with the OS. The biggest problem is the network card: once the machine can reach out to the Internets, the remaining drivers will become easy to download. Only problem: there is no indication anywhere about the type of network card used. It is not on the HP website– “integrated network card” is about as specific as the documentation gets (ashamed of the brand?). It is not listed anywhere on the paperwork that arrived with the machine. Windows itself provides no clues after the OS attempts to load the generic Ethernet card driver, which predictably fails to start the device correctly or get any information, such as a manufacturer ID, out of it.

First plan: look for drivers on the HP website, under the support section. No drivers for the specific model. Luckily this is part of a series of models, virtually identical except for different processor/memory options. One of the other models has 11 drivers posted. Minor problem: they are all packaged as executables named SP<random number>.exe. There is no reason for device drivers to be delivered this way: presumably it simplifies installation. Except it does not work. Every single one fails with a complaint that the package can not be installed because the machine fails to meet minimum requirements. (This is W7 Ultimate edition– the machine shipped from the factory with the Home SKU, presumably deemed worthy of the driver packages. What is it about the higher version that HP considers insufficient?) This is an example of making support brittle by trying to get too fancy: if HP had made the raw drivers available instead of holding them hostage inside buggy installers, that would have been the end of the story.

Second plan: time to contact support. The idea of a customer flattening the box and reinstalling a new OS from scratch is clearly an unusual scenario that stymies the eager support representative. (Not to mention that the IM conversation is taking place on a different computer because there is no networking on the machine under investigation.) He continues sending a series of links to the driver packages, none of which install correctly because HP tried being too smart about detecting when a version of Windows was worthy of the update. Eventually this blogger manages to get across the message that raw drivers are necessary. At this point the Macbook Pro freezes, but the support rep dutifully sends an email with the links, along with the disclaimer that the links will be taking this blogger to a non-HP website where they are not responsible for content– that can only be good news, considering the HP site added exactly zero value to this endeavour. This is enough to reveal the brand of the mysterious network card: Realtek (which does indeed have some detractors that could have given HP pause) and locate the correct drivers.

CP

 

 

Friendly spam: account hijacking and unintended consequences

In the past week, this blogger received links from two friends hawking shady pharmaceutical products: one was sent from a GMail account, and the other directly scrawled on the Facebook wall. This was odd, to say the least. Both friends remain gainfully employed, and are unlikely to dabble in direct marketing on the side: one is at MSFT, and the other works in financial services in Manhattan. Instead they had become victims of an account takeover, perhaps falling for a phishing scam, maybe logging into their accounts from a public computer infected by malware, or perhaps in the worst case scenario one of their personal machines had been 0wned.

So far, nothing new: in modern society, phishing attacks and large-scale machine compromise, compliments of Adobe, Sun/Oracle and MSFT, are par for the course. What is unusual is the way the attackers are trying to leverage access: sending spam to other email addresses on the contact list. All things considered, this is a very mild outcome. A couple of factors may be at work:

  • Spam is economically viable– so much so that attackers do not bother trying to extract more value from compromised accounts. The revenue opportunity in spam has been well-studied in the security literature. The novel twist here is that the message is coming from a friend, and may have even higher click-through rates. (Keeping in mind that spamming is a very noisy activity: eventually one of the friends on that contact list is bound to reply and inform the victim that their account has been 0wned.)
  • There is a surplus of compromised accounts out there, so much so that attackers do not have the time to manually sort through each one and identify the interesting ones. Presumably the personal email account of a financial analyst is worth more than that of an average Hotmail user. Even though it is not their work email, there may still be connections, interesting messages or stepping stones to other accounts. Using that account for indiscriminate spam seems inefficient, a waste of opportunity.
  • Attackers have not been able to automate the classification of each account as a high- or low-value target. If so, this is only a temporary roadblock. Given the profile information from an account (very likely including the real name) it would be relatively easy for an individual to run a Google search for that person. Facebook accounts make this easier by identifying networks/groups/past employers. Even running simple keyword searches in mail– e.g. for names of banks, or phrases appearing in legal briefings– could serve as the basis of heuristics to locate accounts with useful information.
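The keyword-search heuristic in the last bullet could be sketched in a few lines. This is purely illustrative: the keyword list, weights and threshold are invented for the example, not taken from any observed attack tooling.

```python
# Hypothetical weighted keyword list: phrases suggesting an account
# belongs to a high-value target (banking, legal, corporate access).
HIGH_VALUE_KEYWORDS = {
    "wire transfer": 3,
    "attorney-client": 3,
    "account number": 2,
    "password": 2,
    "invoice": 1,
}

def score_account(messages: list) -> int:
    """Crude value score: sum of weighted keyword hits across a mailbox."""
    score = 0
    for body in messages:
        lowered = body.lower()
        for phrase, weight in HIGH_VALUE_KEYWORDS.items():
            if phrase in lowered:
                score += weight
    return score

def classify(messages: list, threshold: int = 5) -> str:
    """Triage a compromised mailbox for follow-up vs. indiscriminate spam."""
    return "high-value" if score_account(messages) >= threshold else "bulk-spam"

inbox = ["Please confirm the wire transfer and account number before Friday.",
         "Lunch tomorrow?"]
print(classify(inbox))  # → high-value
```

Even a blunt filter like this would let an attacker reserve indiscriminate spamming for the low-scoring accounts and set aside the rest for manual inspection– which is the point of the bullet: the obstacle is automation effort, not feasibility.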

Finally, the proliferation of spam through friendly channels could be an encouraging sign that spam filters have gotten very good– to the point that attackers find it necessary to take over legitimate accounts and exploit existing trust relationships with their contacts as a more reliable delivery mechanism. In that case the war on spam would have the highly ironic side-effect of increasing the pressure on existing user accounts.

CP

Imperfect censorship: making sense of web blocking statistics

Compare the availability of YouTube in two different countries over the same time frame during the past year:

(These graphs are from the government transparency website, which also contains information about legal subpoenas for information.)

Both countries had been blocking YouTube, among other Google services, but the charts appear to indicate that the censorship in Turkey has been less than watertight. There are noticeable dips at the beginning of April and June, with occasional spikes that may represent either a temporary failure in the blocking or an organic spike in volume. The blocks appear to be lifted in November, and traffic once again recovers. What is unusual is that the normalized volume never flatlines: it does not go down to zero or even hover around single-digit percentages.

Contrast that with Iran, where the percentages never climb above a few percent and hover below 1% after what appears to be a tweak to the censorship implementation around August.

CP

Case study on the perils of identity federation

This forceful critique (to put it mildly) of OpenID from a website/business owner perspective highlights one of the main leaps of faith involved in federation: taking a dependency on a third party for the well-being of your own business.

There is a lot going on in the debacle described in the original post. Some of it could be attributed to “implementation issues,” the vague catch-all category– the equivalent of “pilot error”– we fall back on to explain away incidents without attributing a systematic cause: JanRain randomly changing APIs without proper communication, Google changing the identifier returned, inconsistency between user profiles returned by different OpenID providers, etc. These are not supposed to happen– better change tracking could have prevented some of the bone-headed mistakes involved. Instability of the OpenID standard and general lack of interoperability among implementations is an unfortunate outcome of a highly politicized standards process that results from reluctantly bringing avowed enemies to the negotiating table. (Inexplicably the US government has decided to throw its weight behind this already hobbled standard, by empowering the National Institutes of Health to work on a pilot program for federal adoption.) But again, this is business as usual in trying to forge consensus for Internet standards, and not intrinsic to the problem of OpenID in particular or interoperability in general.

At the same time there are deeper issues at play, and these are inimical to any identity federation scheme. To quote the metaphor used by the original author:

[…] of all the failure points in your business – you really don’t want the door to be locked while you stand behind the counter waiting for business. No, let me rephrase that: you don’t want the door jammed shut, completely unopenable while your customers wait outside – irate that you won’t let them in.

Put simply, when users log in to your website using a third-party identity provider (“IDP”), your business is at the mercy of that provider. If they experience a service outage, users can not log in to your website either. If they decide to experiment with a brand new user interface that confuses half the users, your website loses traffic.

Some of the risks can be mitigated contractually. For example the IDP could commit to a particular service-level agreement, promising an expected uptime of 99.99%. But no IDP in existence is willing to shoulder the burden of full liability for losses incurred at relying-party sites. Your website can make a compelling case that the inability to authenticate new users for an hour has resulted in the loss of a thousand dollars, going by historic traffic patterns. The most you are likely to get out of the IDP are profuse, heartfelt apologies and at best a refund for that month. The incentives are highly asymmetric.
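To put that 99.99% figure in context, a back-of-the-envelope calculation shows how much downtime such an SLA still permits, and what it could cost a relying party. The signup rate and revenue-per-user figures below are hypothetical, chosen only to echo the thousand-dollar example above:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def allowed_downtime_minutes(uptime_pct: float) -> float:
    """Minutes per year an SLA at the given uptime percentage still allows."""
    return MINUTES_PER_YEAR * (1 - uptime_pct / 100)

# A 99.99% SLA still permits roughly 52.6 minutes of outage per year.
print(round(allowed_downtime_minutes(99.99), 1))

# Hypothetical relying party: 1000 new signups per hour, $1 revenue each.
# The permitted downtime alone wipes out several hundred dollars of signups,
# while the IDP's liability is capped at a refund of that month's fee.
signups_per_minute = 1000 / 60
lost_revenue = allowed_downtime_minutes(99.99) * signups_per_minute * 1.0
print(round(lost_revenue))
```

The asymmetry is visible even in toy numbers: the customer's loss scales with its own traffic, while the refund scales only with the (much smaller) service fee.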

One could argue that specialization and economies of scale will compensate for this: JanRain is presumably handling authentication for thousands of web sites. So they are in a position to invest in very high-reliability infrastructure and maintain a strong security posture. In principle, then, they are less likely to experience an outage (compared to what each relying party is capable of), less likely to get breached in an embarrassing manner as Gawker recently managed to, and more likely to respond to security incidents quickly in the worst case. On the other hand, as the probability of catastrophic failure decreases, the damage potential from such a failure goes way up. An outage or breach at JanRain impacts not just the author of that blog post, but every other business using the OpenID interop service. More importantly, this is not a linear function of the number of users: scale attracts scrutiny, both from white-hat researchers and black-hats looking to capitalize on a lucrative target.

The above scenario only considered unintentional outages. What about cases where service is withheld on purpose? Presumably the IDP is getting paid by the site for their service. What happens when it is time to renew the contract? What if negotiations with the IDP go south and they decide to hold your users “hostage,” refusing to authenticate them to your site until you agree to the higher price? If users are only known by their external identity, it is going to be very difficult to reestablish the link. The article quoted above describes the escape hatch required: collecting email addresses from users, so they can be authenticated independently, presumably by verifying their email. Of course that obviates one of the arguments for OpenID, namely that individual websites no longer have to worry about the complexity and cost of operating their own authentication system. It turns out this is what the original post concluded, changing the site to nudge new users to their in-house authentication system instead of promoting OpenID.
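The escape hatch described above amounts to keeping an independent recovery path in the relying party's own account records. A minimal sketch of what that record might look like; the field names and the example identifier are hypothetical, not drawn from any particular OpenID implementation:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Account:
    """Relying-party account that keeps a verified email alongside
    the federated identifier, so users remain reachable even if the
    IDP disappears or withholds service."""
    local_id: int
    openid_identifier: Optional[str]  # identifier returned by the IDP
    email: str                        # collected and verified at signup
    email_verified: bool = False

def can_recover_without_idp(acct: Account) -> bool:
    # With a verified email on file, the site can fall back to its own
    # authentication (e.g. an emailed login link) and re-link the account.
    return acct.email_verified

user = Account(1, "https://example-idp.invalid/user/alice",
               "alice@example.com", email_verified=True)
print(can_recover_without_idp(user))  # → True
```

Of course maintaining this record means operating exactly the in-house email-verification and authentication machinery that OpenID was supposed to make unnecessary– which is the irony the original post arrives at.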

CP