DigiNotar fail: “this time is different”

Incompetent certificate authorities endangering users with fraudulent certificates is nothing new. There have even been allegations of willful corruption, including accusations that at least one CA is a front for the Chinese government, a nation not exactly known for following due process when it comes to intercepting communications. From that perspective, the DigiNotar debacle would have been yet another anecdote in the growing repertoire of inept CA stories. Yet there were a few notable aspects to the incident that truly made this one different, and perhaps a welcome sign that the situation is improving.

First, it was caught by end users because of a feature in Chrome that “pins” root certificates for Google properties. This is surprising: a man-in-the-middle attack against SSL, armed with a valid certificate from a trusted issuer, used to be transparent for all intents and purposes. After all, CAs are interchangeable: it does not matter whether Verisign or Honest Achmed issued the certificate for GMail– to the web browser they are equally trusted. This is great news for attackers; even if your website obtains its certificates from a semi-competent CA, they can simply go after any one of the remaining 100+ issuers in order to successfully impersonate your company. Root-pinning in Chrome is an experimental feature to mitigate this risk. It is not based on any standard, although it can be viewed as an extension to HTTP Strict Transport Security (HSTS): in addition to asserting that a website is available over SSL only, the website can assert that it only sources certificates from a small number of root CAs, working around the weakest-link-in-the-chain problem alluded to above. At least that is the idea; as implemented currently, this list of pinned websites is hard-coded into the Chrome web-browser source code. Sites can only take advantage of this by submitting a request to Google to get added to that whitelist. This is clearly a model that will not scale long term.
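
To make the pinning idea concrete, here is a minimal sketch in Python. It is not how Chrome implements the feature: the pin table, the host names and the placeholder certificate bytes are all made up for illustration, and the chain is assumed to have already passed ordinary validation. The only point is that a pinned host rejects a chain ending in any root outside its short list, even though the browser trusts that root for everyone else.

```python
import hashlib


def fingerprint(cert_der: bytes) -> str:
    """SHA-256 fingerprint of a certificate, given its encoded bytes."""
    return hashlib.sha256(cert_der).hexdigest()


# Placeholder byte strings standing in for real root certificates (assumption).
GOOD_ROOT = b"root certificate the site actually buys from"
ROGUE_ROOT = b"some other root the browser also trusts by default"

# Hypothetical hard-coded pin table: host name -> acceptable root fingerprints.
PINNED_ROOTS = {"mail.example.com": {fingerprint(GOOD_ROOT)}}


def chain_is_acceptable(host, chain, pins=PINNED_ROOTS):
    """chain is ordered leaf first, root last, and already validated normally."""
    allowed = pins.get(host)
    if allowed is None:
        # Host is not pinned: any trusted root is accepted, as before.
        return True
    return fingerprint(chain[-1]) in allowed
```

With this table, a chain for mail.example.com that terminates in ROGUE_ROOT is refused, while the same rogue root still works against any host that is not in the list, which is exactly the scaling problem with a hard-coded whitelist.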

But we digress– the other surprise in this debacle was the swift response from Mozilla– who maintains the Firefox web browser– and Microsoft. In the past, cases of “mistaken identity” were dealt with by individually blacklisting the issued certificates. In the case of Windows, these blacklist entries are usually shipped as part of a monthly security update, because revocation checking can be unreliable. For example, a look at the certificate store on a Windows 7 installation shows about a dozen certificates marked “untrusted,” including those impersonating Microsoft, Google, Skype, Yahoo and Mozilla.

What about the CAs responsible for these mistakes? They are still in business, hopefully more enlightened from the experience and following more stringent identity-check procedures. But they are still listed as a trusted authority by Windows and Mozilla. Revoking that privilege must have seemed unthinkable: all certificates issued by that particular CA would be instantly invalidated. Any user trying to connect to a website using such a certificate would get an ominous error message displayed by their web browser, designed to be difficult to work around. Imagine the confusion and user-support costs if this were done for Verisign, one of the largest issuers on the planet. Even for a CA with relatively small market share (“UTN-USERFirst-Hardware”– responsible for a batch of fraudulent certificates, note the delicious irony in the name of putting users first) there is the risk that the offending CA may get upset (read: litigious) and go after MSFT for this affront to their self-esteem. The inherent asymmetry in resources makes this a difficult position for MSFT: they have deep pockets and very large market share, so it would be easy to paint this as a case of big-bad Microsoft effectively putting a struggling CA out of business. With tortious interference claims and hefty damages on one side, and an unappreciative press on the other side, it hardly seemed worth it to begin this fight.

Except that in the case of DigiNotar the unthinkable happened: first Mozilla and then Microsoft removed DigiNotar from their trusted roots. Mozilla prefaced their argument with: “… because the extent of the mis-issuance is not clear…” For the first time this reflects a conservative approach lacking in past incidents. Before, it was the norm to assume that fraudulent certificates were isolated mistakes and unlucky days for an otherwise sound CA– this time neither Mozilla nor MSFT was willing to take that on faith, and instead removed the CA altogether. (Incidentally Chrome uses the system roots on Windows and NSS roots on other operating systems, effectively inheriting both of these.)

Apparently once the ineptitude reaches an egregious level, even certificate authorities face the consequences. Except that the “consequences” in this case are not of a financial nature: VASCO, the parent company, helpfully notes in their press release for the security incident that they “expects that the cost of [helping existing customers] will be minimal” and “… the first six months of 2011, revenue from the SSL and EVSSL business was less than Euro 100,000.” In other words, complete loss of the CA business due to gross negligence presents no serious risk to the company– even as their actions endangered the privacy of users around the world.

RSA breach redux: a well-deserved pwnie

The Pwnie award winners were announced yesterday at the Black Hat Briefings in Las Vegas. In some categories, there was little suspense, with the winners almost a complete lock-in. Not surprisingly RSA got the nod for Lamest Vendor Response. This is a good time to revisit what made the RSA incident particularly significant.

The issue is not the breach itself: security incidents are a part of life. Even organizations with a strong culture of risk management are not perfect. It’s instructive to note that RSA did not win for Most Epic Fail; surely they had formidable competition in Sony, almost untouchable for its string of remarkably bone-headed moves in 2011 culminating with the PlayStation network breach. What made the SecurID case stand out was the deliberate, pernicious and persistent attempts by RSA to conflate company bottom-line with customer best-interest, in the face of clear-cut evidence to the contrary.

For background: RSA has a presence in several product lines, ranging from licensing their cryptographic library BSAFE (which used to power Windows crypto API in the ancient days before it was retired in favor of Peter Montgomery’s native implementation) to providing a data-loss prevention suite. Yet the 2-factor authentication product SecurID stands out as one of the main revenue generators. SecurID is a small, tamper-resistant gadget which stores a secret key (the “seed”) and uses that seed to generate so-called one-time passwords or OTPs. Typically an OTP is a sequence of 6 digits that changes over time or with the press of a button on the gadget. These numbers are used to authenticate users, usually in combination with a password, providing an additional degree of assurance that the user is who they claim to be. Unlike passwords, OTPs change constantly. This is good news for the user and bad news for prospective attackers: tricking the legitimate user into revealing one OTP does not help to predict the next one that would be required to masquerade as that user.

In reality the SecurID model involves two very different and completely orthogonal businesses bundled into one:

Selling hardware that generates OTPs according to a sound cryptographic algorithm. Here we have a traditional market in physical goods. One unit is sold for each user and it is strictly a transactional business. Short of replacements for lost/malfunctioning tokens, there is no ongoing relationship with the customer.
Managing the life-cycle of identities associated with the dongles. This is the catch associated with OTP generation: the same secret sauce used to generate an OTP is also required to verify when a user submits one. That means whoever is charged with verifying OTPs must have access to the same secrets. This is an ongoing service, not a one-time sale. Each time a user shows up to authenticate and submits what they claim to be their OTP, there must be code running on some machine which accepts the user claim as input, compares it against the correct OTP generated from a copy of the secret key and responds with yay/nay.
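
The symmetry between generating and verifying an OTP is easy to see in code. The sketch below uses the counter-based OATH algorithm (HOTP, RFC 4226) purely as a stand-in, since the SecurID algorithm itself is proprietary; the function names are made up for illustration. Note that the verifier needs nothing more and nothing less than its own copy of the seed.

```python
import hashlib
import hmac
import struct


def hotp(seed: bytes, counter: int, digits: int = 6) -> str:
    """What the token computes: an OTP derived from the seed and a counter."""
    mac = hmac.new(seed, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation, per RFC 4226
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify(seed_copy: bytes, counter: int, submitted: str) -> bool:
    """What the server computes: the same OTP, from its copy of the seed."""
    return hmac.compare_digest(hotp(seed_copy, counter), submitted)
```

A production verifier would also track the expected counter per token and tolerate a small look-ahead window; both are left out here to keep the shared-secret dependency in plain view.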

In the case of SecurID, it is always RSA programming the SecurID tokens with seeds that are generated by RSA and it is always RSA running the servers that can authoritatively declare whether an OTP submission is correct or not. Customers have no choice in the matter. Yet this bundling is far from a foregone conclusion. It is perfectly possible to produce tamper-resistant OTP hardware which can be programmed in the field by customers, and provisioned with new keys that are not known to the manufacturer. Case in point is the Yubikey, a USB-based token that “injects” OTP directly into an application as a sequence of keystrokes. In this model, the hardware provider simply ships the gadgets to an organization. The IT department in the organization is responsible for programming them with keys, assigning the gadgets to their users, keeping track of which user got which gadget and the associated seeds.

Why did RSA insist on doing both? Simply because selling hardware is a commodity business where vendors can only compete on price. Granted, not all hardware is identical and there are indeed varying qualities of tamper resistance. Extracting the seed from the OTP generator by cracking open the case and inspecting the circuitry inside would be considered a fatal attack on the system. Over the years different designs have exhibited different levels of difficulty in resisting such attacks. But few customers worry about such nuances: given an agreed-upon scheme for generating these OTPs (for example the open-standard OATH, incidentally not used by SecurID, which has a proprietary algorithm) most enterprises would be interested in the cheapest hardware that can do the job. That would be a very competitive market where prices decline and no vendor has lock-in power. If vendor Bob decided to over-charge for their tokens, vendor Alice could undercut him with cheaper units that generated the same OTPs and functioned identically. Even worse, selling hardware is a one-time transaction. Short of replacing the gadget, there are few other revenue opportunities, certainly no lock-in effects. That means the vendor is faced with the choice of either shipping low-quality hardware that breaks down often– preferably just outside the warranty period– and requires frequent replacements, leading to unhappy customers (keeping in mind, this is critical authentication infrastructure: if the OTP generator breaks down, the user cannot get their work done) or shipping a bullet-proof product that all but guarantees they will never hear from that customer for another 5 years.

The RSA solution to this dilemma is to bundle the service with the hardware. The nice thing about a service is that the customer must continue paying for it every year. By inserting themselves into the middle of every authentication event between an organization and one of their employees, RSA ensures a steady revenue, artificially rendering itself “indispensable” to the organization in question. But at what cost? The downsides to this arrangement are obvious, and in retrospect it did not take a spectacular security breach with a suspected nation-state sponsor to reveal them:

Availability: there is a massive single point-of-failure in the service maintained by RSA for verifying every OTP transaction from every SecurID token in existence. If for some reason that service is not accessible or experiences an outage, every single customer of SecurID is impacted. In effect work grinds to a halt because nobody can authenticate. (Note there is an entirely different deployment problem in trying to use SecurID inside an air-gapped network disconnected from the Internet: in this case RSA services are not even reachable.) This is not purely a question of whether RSA can build services with higher uptime than a particular customer. For some customers it may well be the case that they are better off outsourcing this to RSA. In other cases it is not clear which side can maintain higher availability and lower latency: when this blogger was working on Windows Live ID, one of the primary concerns leading to the rejection of SecurID centered around taking a dependency on a third party for user authentication. If employees in an organization cannot get their work done because authentication is failing, the organization bears the cost directly. RSA may refund some of their SecurID service fees, but the liability remains limited compared to the unbounded cost for the customer.
Security: More importantly, RSA becomes a critical security dependency on an ongoing basis. A vendor delivering hardware must be trusted to have built decent tamper-resistance, not have any secret backdoors installed into the unit etc. But this is a one-time event. Once the hardware has been etched, potted, molded and put in a crate for delivery, any news that the manufacturer got 0wned can be met with a shrug. By contrast if the verification service with a copy of all OTP seeds ever gets breached– demonstrated in dramatic fashion by RSA– it is an emergency for all customers.

It is difficult to fault RSA for acting in the interests of their shareholders, and pushing the bundled service model as long as customers were willing to go along. The major miscue came when the incident threatened to make it clear that RSA revenue and customer interests are far from being aligned. Instead of acting contrite, issuing a mea culpa and trying to appease customers for cheap PR points, RSA pursued a remarkably clueless strategy of vague press statements and unconvincing denial. By the time Lockheed Martin and L-3 Communications provided the smoking gun, it was too late for the company to have a shred of credibility.

Kudos to the Pwnie judges for crowning this over-qualified candidate for Lamest Vendor Response.

Windows Phone 7: dropping a generation of developers

One of the less discussed aspects of the Nokia-MSFT deal is the impact on developers. After all, platforms stand or fall on the strength of their applications. (Steve Ballmer wanted every MSFT employee to take this message to heart.) Windows was able to leverage this virtuous cycle to deliver a stunning upset to Apple in the 1990s, by creating a very attractive environment for developers to enrich the platform one application at a time. The Windows API was stable through two decades, even as everything changed about the basic PC architecture. CPUs went from 16-bit to x64, multiple cores and SMP became common, GPUs gained a prominent role, the network became a critical component of writing code. Windows programming still looked the same. In fact “app compat” became one of the major costs in operating system development– through heroic engineering effort, buggy applications relying on undocumented APIs in some archaic version of the operating system were coaxed into working properly on the latest and greatest kernel, lest the incompatibility deter some customer from upgrading.

The same approach applied to mobile programming. Even before smartphones, MSFT pursued handhelds with PocketPC. A subset of the venerable Windows API was still present, and later expanded into Windows Mobile. Any developer familiar with programming desktop applications could, with a little effort, write code for mobile devices. To a large extent Apple took the same approach to allow its developers to transition from writing Mac OS X applications to targeting the iPhone/iPad.

So it came as a major surprise that WP7 dropped backwards compatibility. Native code is now verboten, only .NET applications can be written. In one bold stroke, MSFT may have lost a generation of developers who grew up scrutinizing the MSDN documentation for the subtleties of the classic Windows API. An even bigger question is whether they will be able to court new developers and gain mindshare among those contemplating a career in development. From the perspective of a newly minted computer science graduate trying to decide which programming language/environment to excel in, the options are:

Learn C/C++ and take up any systems programming task. (This includes traditional Windows applications, an admittedly endangered species.)
Learn Java and program either server applications, or dive into mobile development with Blackberry or Android– the new hotness.
Learn Objective-C and write OS X or iPhone applications. Also known as “objectionable-C,” this was an attempt (emphasis on attempt) to add object-oriented features to C before Stroustrup did it the right way with C++. Outside of Cupertino few people care about it.
Learn C# and .NET to program… what exactly? For all the work on promoting the idea that .NET is cross-platform and beats Java at its own game of write-once-run-everywhere, the technology remains very much tied to MSFT platforms. Before Windows Phone 7 came along and mandated managed code, it was used mainly for enterprise line-of-business applications and web applications running on Windows server variants. The problem is that these are highly specialized segments of the market. Much like learning Objective-C and Cocoa development, it is not a portable skill useful in any other context. Unlike OS X/iPhone, WP7 does not have a commanding presence in the market or a proven revenue model with an app store.

It is difficult to justify taking that last option, except as a way to capitalize on lack of competition. In other words, since everyone else is writing for iPhone and Android, one viable strategy for ISVs may be churning out copy-cat WP7 applications styled after the popular ones on the leading platforms. But this is hardly a sustainable model, or an appealing proposition for a new developer seeking challenging work.

HP: over-engineering and under-delivering support

Vignettes from setting up a new PC.

Starting point: new HP machine, reimaged from scratch with Windows 7. Naturally most of the devices are not recognized, because the drivers do not ship out of the box with the OS. The biggest problem is the network card: once the machine can reach out to the Internets, remaining drivers will become easy to download. Only problem: there is no indication anywhere about the type of network card used. It is not on the HP website: “integrated network card” is about as specific as the documentation gets– ashamed of the brand? It is not listed anywhere on the paperwork that arrived with the machine. Windows itself provides no clues, after the OS attempts loading the generic Ethernet card driver, which predictably fails to start the device correctly or get any information such as manufacturer ID out of it.

First plan: look for drivers on the HP website, under the support section. No drivers for the specific model. Luckily this is a part of a series of models, virtually identical except for different processor/memory options. One of the other models has 11 drivers posted. Minor problem: they are all packaged as executables named SP<random number>.exe. There is no reason for device drivers to be delivered this way: presumably it simplifies installation. Except it does not work. Every single one fails with a complaint that the package can not be installed because the machine fails to meet minimum requirements. (This is W7 ultimate edition– the machine shipped from the factory with home SKU, presumably deemed worthy of the driver packages. What is it about the higher version that HP considers insufficient?) This is an example of making the support brittle by trying to get too fancy: if HP had made the raw drivers available instead of holding them hostage inside buggy installers, it would have been the end of the story.

Second plan: time to contact support. The idea of a customer flattening the box and reinstalling a new OS from scratch is clearly an unusual scenario that stymies the eager support representative. (Not to mention that the IM conversation is taking place on a different computer because there is no networking on the machine under investigation.) He continues sending a series of links to the driver-packages, none of which install correctly because HP tried being too smart about detecting when a version of Windows was worthy of the update. Eventually this blogger manages to get across the message that raw drivers are necessary. At this point the MacBook Pro freezes, but the support rep dutifully sends an email with the links, along with the disclaimer that the links will be taking this blogger to a non-HP website where they are not responsible for content– that can only be good news considering the HP site added exactly zero value to this endeavour. This is enough to reveal the brand of the mysterious network card: Realtek (which does indeed have some detractors that could have given HP pause) and locate the correct drivers.

Friendly spam: account hijacking and unintended consequences

In the past week, this blogger received links from two friends hawking shady pharmaceutical products: one was sent from a GMail account, and the other directly scrawled on the Facebook wall. This was odd, to say the least. Both friends remain gainfully employed, and unlikely to dabble in direct marketing on the side: one is at MSFT, and the other works in financial services in Manhattan. Instead they had become victims of an account takeover, perhaps falling for a phishing scam, maybe logging into their accounts from a public computer infected by malware, or perhaps in the worst case scenario one of their personal machines had been 0wned.

So far, nothing new: in modern society, phishing attacks and large-scale machine compromise, compliments of Adobe, Sun/Oracle and MSFT, are par for the course. What is unusual is the way the attackers are trying to leverage access: sending spam to other email addresses on the contact list. All things considered, this is a very mild outcome. A couple of factors may be at work:

Spam is economically viable. So much that attackers do not bother with trying to extract more value from compromised accounts. The revenue opportunity in spam has been well-studied in the security literature. The novel twist here is that the message is coming from a friend, and may have even higher click-through rates. (Keeping in mind that spamming is a very noisy activity. Eventually one of the friends on that contact list is bound to reply and inform the victim that their account has been 0wned.)
There is a surplus of compromised accounts out there, so much that attackers do not have the time to manually sort through each one and identify interesting ones. Presumably the personal email account of a financial analyst is worth more than that of an average Hotmail user. Even though it is not their work email, there may still be connections, interesting messages or stepping stones to other accounts. Using that account for indiscriminate spam seems inefficient, a waste of opportunity.
Attackers have not been able to automate the classification of each account as high/low value target. If so this is only a temporary roadblock. Given the profile information from an account (very likely includes the real name) it would be relatively easy for an individual to run a Google search for that person. Facebook accounts make this easier by identifying networks/groups/past employers. Even running simple keyword searches in mail, e.g. for names of banks or phrases appearing in legal briefings, could be used as the basis of heuristics to locate accounts with useful information.

Finally the proliferation of spam from friendly channels could be an encouraging sign that spam filters have gotten very good– to the point that attackers find it necessary to take over legitimate accounts and exploit existing trust relationships to their contacts as more reliable delivery mechanism. In that case the war on spam would have the highly ironic side-effect of increasing the pressure on existing user accounts.

Imperfect censorship: making sense of web blocking statistics

Compare the availability of YouTube in two different countries around the same time frame, during the past year:

(These graphs are from the government transparency website, which also contains information about legal subpoenas for information.)

Both countries had been blocking YouTube, among other Google services, but the charts appear to indicate that the censorship in Turkey has been less than watertight. There are noticeable dips at the beginning of April and June, with occasional spikes that may either be a temporary failure in the blocking or perhaps an organic spike in volume. The blocks appear to be lifted in November, and traffic once again recovers. What is unusual is that the normalized volume never flatlines: it does not go down to zero or even hover around the single-digit percentages.

Contrast that with Iran, where the percentages never climb above a few percent and hover below 1% after what appears to be a tweak to the censorship implementation around August.

Case study on the perils of identity federation

This forceful critique (to put it mildly) of OpenID from a website/business owner perspective highlights one of the main leaps of faith involved in federation: taking a dependency on a third party for the well-being of your own business.

There is a lot going on in the debacle described in the original post. Some of it could be attributed to “implementation issues,” the vague catch-all category that is the equivalent of “pilot error” we fall back on to explain away incidents without attributing a systematic cause: JanRain randomly changing APIs without proper communication, Google changing the identifier returned, inconsistency between user profiles returned by different OpenID providers etc. These are not supposed to happen– better change tracking could have prevented some of the bone-headed mistakes involved. Instability of the OpenID standard and general lack of interoperability among implementations is an unfortunate outcome of the highly politicized standards process that results from reluctantly bringing avowed enemies together at the negotiating table. (Inexplicably the US government has decided to throw its weight behind this already hobbled standard, by empowering the National Institutes of Health to work on a pilot program for federal adoption.) But again this is business as usual in trying to forge consensus for Internet standards, and not intrinsic to the problem of OpenID in particular or interoperability in general.

At the same time there are deeper issues at play, and these are inherent in any identity federation scheme. To quote the metaphor used by the original author:

[…] of all the failure points in your business – you really don’t want the door to be locked while you stand behind the counter waiting for business. No, let me rephrase that: you don’t want the door jammed shut, completely unopenable while your customers wait outside – irate that you won’t let them in.

Put simply, when users log in to your website using a third-party identity provider (“IDP”) your business is at the mercy of that provider. If they experience a service outage, users cannot log in to your website either. If they decide to experiment with a brand new user interface that confuses half the users, your website loses traffic.

Some of the risks can be mitigated contractually. For example the IDP could commit to a particular service level agreement, saying they have an expected uptime of 99.99%. But no IDP in existence is willing to shoulder the burden of full liability for losses incurred at relying party sites. Your website can make a compelling case that the inability to authenticate new users for an hour has resulted in loss of a thousand dollars, going by historic traffic patterns. The most you are likely to get out of the IDP are profuse, heartfelt apologies and at best a refund for that month. The incentives are highly asymmetric.
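
A quick back-of-the-envelope calculation shows what such an SLA does and does not promise. The sketch below takes the 99.99% uptime figure and the thousand-dollars-per-hour loss from the paragraph above; the function names are made up for illustration, and the year is assumed to be 365 days.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def allowed_downtime_minutes(uptime: float) -> float:
    """Downtime per year that still satisfies the stated uptime fraction."""
    return (1.0 - uptime) * MINUTES_PER_YEAR


def relying_party_loss(downtime_minutes: float, loss_per_hour: float) -> float:
    """Loss at the relying party if it loses loss_per_hour while the IDP is down."""
    return downtime_minutes / 60.0 * loss_per_hour


downtime = allowed_downtime_minutes(0.9999)  # roughly 52.6 minutes per year
loss = relying_party_loss(downtime, 1000.0)  # roughly $876 per year
```

In other words, the IDP can be down for the better part of an hour every year without breaching the agreement, and the relying party absorbs the corresponding loss while the IDP owes nothing at all.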

One could argue that specialization and economies of scale will compensate for this: JanRain is presumably handling authentication for thousands of web sites. So they are in a position to invest in very high-reliability infrastructure and maintain strong security posture. In principle then they are less likely to experience an outage (compared to what each relying party is capable of) less likely to get breached in an embarrassing manner as Gawker recently managed to, and more likely to respond to security incidents quickly in the worst case scenario. On the other hand, as probabilities for catastrophic failure decrease, the damage potential from that failure is going way up. An outage or breach at JanRain impacts not just the author of that blog post, but every other business using the OpenID interop service. More importantly, this is not a linear function of number of users: scale attracts scrutiny, both from white hat researchers and black-hats looking to capitalize on a lucrative target.

The above scenario only considered unintentional outages. What about cases where service is withheld on purpose? Presumably the IDP is getting paid by the site for their service. What happens when it is time to renew the contract? What if negotiations with the IDP go south and they decide to hold your users “hostage,” by refusing to authenticate them to your site until you agree to the higher price? If users are only known by their external identity, it is going to be very difficult to reestablish the link. The article quoted above describes the escape hatch required: collecting an email address from users, so they can be authenticated independently, presumably by verifying their email. Of course that obviates one of the arguments for OpenID, namely that individual websites no longer have to worry about the complexity and cost of operating their own authentication system. It turns out this is what the original post concluded, changing the site to nudge new users to their in-house authentication system instead of promoting OpenID.
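
The escape hatch amounts to keeping a second, independent key for every account. The sketch below is one hypothetical way to structure that (the class and method names are invented here, not taken from the original post): accounts are indexed both by the identifier the IDP returns and by a verified email address, so an account can be found and re-linked even when the external identifier changes or the IDP stops cooperating.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    account_id: int
    verified_email: str
    external_ids: set = field(default_factory=set)


class AccountStore:
    """In-memory sketch; a real site would back this with its user database."""

    def __init__(self):
        self._by_external_id = {}
        self._by_email = {}
        self._next_id = 1

    def register(self, external_id: str, verified_email: str) -> Account:
        account = Account(self._next_id, verified_email.lower(), {external_id})
        self._next_id += 1
        self._by_external_id[external_id] = account
        self._by_email[account.verified_email] = account
        return account

    def find_by_external_id(self, external_id: str):
        # Normal login path: look up by whatever identifier the IDP returned.
        return self._by_external_id.get(external_id)

    def recover_by_email(self, verified_email: str, new_external_id=None):
        # Fallback path: caller must already have verified control of the email.
        account = self._by_email.get(verified_email.lower())
        if account is not None and new_external_id is not None:
            account.external_ids.add(new_external_id)
            self._by_external_id[new_external_id] = account
        return account
```

The catch, as noted above, is that verifying the email address is itself an in-house authentication mechanism, so the site ends up operating one anyway.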

Temporarily using a Nexus S in Istanbul

Some pitfalls for the unwary, before popping in a new SIM:

Switching SIMs will remove passwords from saved accounts and break existing sync. This is a general property of Android and perhaps someone can explain the reason for this “feature.” Conspiracy-minded critics are likely to cry “carrier-humping surrender monkeys!” again. SIM is the instrument of customer lock-in for carriers; why create one more hurdle for switching providers, even when the switch is temporary? Replacing the original SIM does not recreate the lost credentials. Granted this is not irreversible: account names are still persisted and one can retype passwords– although it can be quite frustrating to enter symbols and punctuation marks on the inane virtual keyboard. Let’s not even get started on the difficulty of obtaining access-codes for accounts set up with the new 2-step verification feature. It is not clear what threat this is defending against; merely removing the SIM without replacing it does not have this effect. Only inserting a new SIM appears to trigger the behavior, so it is useless in theft scenarios where the adversary removes the SIM to prevent remote wipe instructions. Incidentally it would be a real security feature if credentials were stored on the SIM card and never exported, with an applet on the SIM responsible for authentication. After all the SIM presents the only ubiquitous secure element found in every GSM phone. Carrier lock-in effects persist but at least there is a redeeming virtue in improved protection for credentials. Unfortunately contents of the SIM are tightly controlled by carriers and uploading your own Java Card applet there for other useful functionality has been a non-starter as far as business plans go. This is a major squandered opportunity for improving authentication across the board.
Configure the OS to not lock the SIM card. In the US most SIM cards do not require a PIN. At least in Turkey they appear to; all the prepaid Turkcell cards I have seen had both the regular PIN and PIN2 for restricting call numbers. This adds one more step to the phone unlock process, on top of the pattern or existing passcode. A better design would have been for the operating system to realize that there is already an existing lock mechanism for the device, and cache the PIN automatically. (That said the screen locking is easier to bypass, as it is implemented in software; even the smudge patterns left on the screen have been shown vulnerable recently. By comparison the tamper-resistant SIM enforces its own lock out mechanism against guessing attempts.)
Mysteriously navigation does not work. Google Maps itself works like a charm– at least for now, Turkey does have a track record of blocking/unblocking Google services at seemingly random intervals. Also not surprisingly, GPS is very accurate and turn-by-turn directions are correct. But the device does not switch into navigation mode, hanging on “checking if navigation is available.” Fail.

Stuxnet and collateral damage

To update von Clausewitz’s maxim for contemporary times: “Malware is the continuation of politics by other means.” This is one of the lessons from the ongoing Stuxnet debate: targeted computer attacks have become part and parcel of nation states’ arsenal in carrying out foreign policy objectives.

There have been solid technical analyses of Stuxnet’s complex inner workings, but the debate on policy implications is starting in earnest now. One question that has been overlooked is the extent of collateral damage tolerable from carrying out this type of attack.

Stuxnet was the odd combination of both being targeted very precisely and casting an extremely wide net. The malicious payload that infected industrial controllers only kicked into gear when it detected a very specific environment, believed to represent the uranium enrichment plant operating in Iran. On the other hand, because the software development for such critical facilities typically takes place behind air-gapped networks, the worm had to be released into the wild. Its humble beginnings were no different than the self-propagating malware that wreaked havoc in the past: Code Red, Nimda, Blaster, Slammer, … Except Stuxnet was light-years ahead of its predecessors in terms of sophistication and sheer number of different vectors used to infect new targets.

Because it was after a very specific target that would not be reachable directly from the Internet, the designers threw the kitchen sink at the problem, including an exploit that allowed the malware to propagate by USB drives between machines. This meant Stuxnet would eventually reach places that vanilla malware does not, including compartmentalized networks that had been assumed to be isolated from the warzone that is the Internets. Stuxnet was designed to explore every nook and cranny in that space, in pursuit of its ultimate target, the programmable logic controllers destined to spin enrichment centrifuges. Given its non-discriminatory approach to spreading, it is surprising that most of the infections remained contained in Iran, with a smaller number in Indonesia and India– countries starting with “I” apparently did not fare well. By comparison the number of infections in the US was not significant.

The first question then is what other systems are “fair game” on the way to reaching an objective. The Stuxnet case is complicated by the fact that the presumed target is not directly reachable. Intermediate stepping stones are required to get there, which may end up being personal computers, Internet cafes, anything that is ultimately connected to the persons of interest in some unexpected six-degrees-of-separation logic. (This brings to mind the quote from Robert H. Morris Sr: “To a first approximation, every computer in the world is connected with every other computer.”) Worse, the connections are not known in advance: it is a massively parallel search, exploring every possible path along the way in hopes that one may cross paths with the actual target. Such expansive views on scope risk turning every machine in the world into collateral damage in the name of reaching the destination.

The second dimension concerns damage. On most machines it infected, Stuxnet did nothing but propagate to other targets. Again there is a similarity to the massive worm outbreaks of the good old days– with the exception of Witty, most contained no malicious payload. Even if it happened to land on a computer where some unlucky engineer had been tasked with developing software for industrial controllers for an unrelated industry, the tampered product would likely have worked flawlessly for its intended environment. This is not to say that there was no cost to Stuxnet for those in its path: there is still time and productivity wasted on removing the malware from the system, both for individuals and companies. On the other hand, the economic impact for software vendors is murky. Antivirus vendors benefit from trumping up scare stories. This one fits the bill perfectly, complete with cloak-and-dagger nation state implications. Similarly it is difficult to argue that MSFT suffered great expense in addressing the vulnerabilities implicated in Stuxnet, considering their leisurely patch schedule in the presence of known 0-days.

In any case, it is misleading to focus on the designers’ intent in not harming systems– far from being a magnanimous gesture on their part, it was simply following best practices in malware design. Noisy/buggy malware is the one that gets noticed and removed. Stealth is a survival strategy: even run-of-the-mill keystroke recorders designed to steal credit cards in the name of petty theft strive to be very stable. Vandalizing user data, blue-screening the system or displaying in-your-face popup advertisements is the surefire way to get your malware noticed by an AV vendor. (Interestingly enough, Stuxnet was noticed by Kaspersky and filed away as vanilla malware a full year before its inner workings were properly understood.) The problem is that modern operating systems are incredibly complex, and it is not possible to ensure that malware lives up to its promise of zero collateral damage. When Robert Morris Jr. released the Internet worm, he intended it to propagate only, with no malicious payload and barely noticeable load on infected systems. But a slight miscalculation/bug in the logic caused it to overwhelm networks and machines. Even MSFT cannot ship software updates without breaking users in some unexpected, obscure configuration– and they have much greater QA expertise and a much larger test matrix than organizations developing malware.

The network infrastructure has long been a battleground, with participants of every scale from hobbyist vandals to organized crime groups and nation states, duking it out with packets. The question raised by Stuxnet is whether these frontlines will expand to include the machines owned/used by ordinary citizens, turning them into dispensable pawns in pursuit of an elusive objective.

Choices and security: when designers can not decide

(Reflections on Joel Spolsky’s talk at the Google NYC office the previous week.)

Joel Spolsky has previously harped on the problem of arrogant UI design interrupting users with self-important questions on trivial settings– how many items to display under recently opened files, whether to upgrade to release R8 etc. This is one of the main themes in his 2001 book “User interface design for programmers”– the options/preferences menu, to paraphrase Spolsky, is a record of all the design controversies the developers ever faced and failed to resolve decisively, punting them to the user. Given the mediocre quality of most UI design, it is difficult to argue with this. In fact finding hilariously awful examples of lame dialogs popping up at inopportune moments is about as difficult as shooting fish in a barrel. But two of the points cited in the talk deserve closer scrutiny.

One example came from the Options dialog in Visual Studio. There are literally hundreds of possible settings to tweak in that particular application and bringing up that dialog must be like opening Pandora’s box. But there is a big difference between an element of the interface that the user intentionally seeks out versus one that interrupts the primary activity with a question that the user is likely not interested in at that point. This is similar to the “about:config” option in Firefox– no one would fault the Firefox developers for burying ultra-advanced options such as whether to enable the ecdhe_ecdsa_des_ede3_sha cipher suite in TLS. It would rightly justify ridicule if Firefox asked this question in the middle of connecting to a website or even displayed a checkbox for it under the security-options tab; but they did not. Clicking past the semi-humorous warning about voiding your warranty implies an assumption of risk that complex beasts lie ahead.

The second example is the standard Authenticode dialog from Windows, the dreaded “do you want to install software published by Acme Inc?” question. A former colleague at MSFT who also worked on IE once joked that the text be replaced with “Do you feel lucky today?” (Being polite, our software would drop the modifier from the original Dirty Harry version.) The joke works because the user often has exactly zero context to make a decision more informed than flipping a coin. Let’s suspend disbelief for a moment and pretend that certificate authorities were competent, and that the company name displayed in the dialog accurately represented the identity of the software publisher with no misleading, sound-alike names. There are thousands of companies publishing software for Windows. A handful may have brand recognition: if the dialog claims an ActiveX control is signed by Microsoft, chances are it is not intentionally malicious. (Of course this does not mean that it is free of bugs or of an unintended security vulnerability that will still lead to grief– only that the developers started out with “good intentions,” assuming their interests are aligned with those of the user.) The vast majority of developers are not household names. Worse, the bundling of spyware means that even publishers with the benefit of name recognition, such as Kazaa and Morpheus in the heyday of P2P file sharing, had a dubious record of shipping adware.

In other words, Joel Spolsky is right: the user is not in a great position to make this security decision because they have very little information to go by. Unfortunately the designers of the software are in an even worse position: they are just as ignorant of the facts, and worse they do not share the user’s value judgments.

Going back to that Authenticode prompt: its designers are no more prescient than the user in divining the quality of software development practices or for that matter the integrity of the business model from the vendor name. MSFT provides the platform for independent software vendors; grading the efforts of those vendors has traditionally been a matter for customers voting with their dollars.

Most of the obvious security decisions are already settled by reasonable defaults. IE no longer prompts users to decide what to do about an expired certificate issued from a trusted authority with a mismatched name. It practically dead-ends the user in a semi-threatening error page that is very difficult to get past. This is the easy case: designers can make the right call with high confidence. In this case they made the call that SSL depends on certificates validating correctly and if you cannot configure your website correctly, you deserve to lose traffic. The first one is a fact, the second a value judgment, and a relatively new one at that: it certainly did not use to be the case in the early days of the web, when “making it work” took priority over security. Yet it is a sentiment most people will agree with today, except for the clueless website owners still struggling with their certificate setup. For most of the interesting trust decisions, there are no such clear-cut answers.
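The same “settled by default” stance shows up outside browsers. As a loose illustration (not a description of how IE works), Python’s standard ssl module makes the equivalent call in its recommended client context: chain validation and hostname matching are both on unless the programmer deliberately turns them off. The helper name describe_tls_defaults is made up for this sketch; create_default_context, verify_mode and check_hostname are the real standard-library names.

```python
import ssl


def describe_tls_defaults() -> dict:
    """Report what the stdlib's recommended client context enforces.

    The function name is hypothetical; the attributes it reads are part of
    the standard ssl module.
    """
    ctx = ssl.create_default_context()
    return {
        # The certificate chain must validate against a trusted root.
        "verifies_chain": ctx.verify_mode == ssl.CERT_REQUIRED,
        # The certificate must match the name the client asked for.
        "checks_hostname": ctx.check_hostname,
    }


if __name__ == "__main__":
    for setting, enabled in describe_tls_defaults().items():
        print(f"{setting}: {enabled}")
```

No question is ever put to the end user here: a connection that fails either check raises an error, and loosening the policy takes an explicit decision in code.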

Second, designers may face significant legal concerns: if they favor installing software from Acme but not from its competitor, legal sparks will fly. This is why efforts to classify malware need air cover from watertight definitions of spyware, applied consistently to leave no room for allegations of playing favorites.

Finally, designers and users differ in their values. This is a case where deciding on behalf of the user is the arrogant and presumptuous option. For a moment replace “Acme Inc” with “Government of China.” Do we want the publisher deciding that it is OK to trust software authored by the Chinese government for automatic install? One can decry the sad state of compartmentalization in modern operating systems, but the current reality is that installing an application has significant consequences. This is not a cosmetic change to the appearance of a seldom-used menu or the color of the background: confidentiality and integrity of everything the user has on that computer is at stake. Fundamentally this user is facing a trust decision. Designers cannot make that decision for him/her because everyone has different values predisposing them to embrace certain institutions wholeheartedly while being inherently skeptical of others. They have different levels of risk tolerance– the Internet cafe user looking for the proverbial dancing squirrels clip versus the attorney with confidential documents to protect. This is one case where the decision belongs to the user.

Random Oracle

Building and breaking systems