Choices and security: when designers cannot decide

(Reflections on Joel Spolsky’s talk at Google’s NYC office last week.)

Joel Spolsky has previously harped on the problem of arrogant UI design interrupting users with self-important questions about trivial settings– how many items to display under recently opened files, whether to upgrade to release R8, etc. This is one of the main themes of his 2001 book “User Interface Design for Programmers.” The options/preferences menu, to paraphrase Spolsky, is a record of all the design controversies the developers ever faced and failed to resolve decisively, punting the decision to the user. Given the mediocre quality of most UI design, it is difficult to argue with this. In fact, finding hilariously awful examples of lame dialogs popping up at inopportune moments is about as difficult as shooting fish in a barrel. But two of the points cited in the talk deserve closer scrutiny.

One example came from the Options dialog in Visual Studio. There are literally hundreds of possible settings to tweak in that particular application, and bringing up that dialog must be like opening Pandora’s box. But there is a big difference between an element of the interface that the user intentionally seeks out versus one that interrupts the primary activity with a question the user is likely not interested in at that point. This is similar to the “about:config” page in Firefox– no one would fault the Firefox developers for burying ultra-advanced options such as whether to enable the ecdhe_ecdsa_des_ede3_sha cipher suite in TLS. Firefox would rightly invite ridicule if it asked this question in the middle of connecting to a website, or even displayed a checkbox for it under the security options tab; it does neither. Clicking past the semi-humorous warning about voiding your warranty implies an assumption of risk: complex beasts lie ahead.

The second example is the standard Authenticode dialog from Windows, the dreaded “do you want to install software published by Acme Inc?” question. A former colleague at MSFT who also worked on IE once joked that the text should be replaced with “Do you feel lucky today?” (Being polite, our software would drop the modifier from the original Dirty Harry version.) The joke lands because the user often has exactly zero context to make a decision more informed than flipping a coin. Let’s suspend disbelief for a moment and pretend that certificate authorities were competent, and that the company name displayed in the dialog accurately represented the identity of the software publisher, with no misleading, sound-alike names. There are thousands of companies publishing software for Windows. A handful may have brand recognition: if the dialog claims an ActiveX control is signed by Microsoft, chances are it is not intentionally malicious. (Of course this does not mean it is free of bugs or unintended security vulnerabilities that will still lead to grief– only that the developers started out with “good intentions,” assuming their interests are aligned with those of the user.) The vast majority of developers are not household names. Worse, the bundling of spyware means that even publishers with the benefit of name recognition– Kazaa and Morpheus in the heyday of P2P file sharing– had a dubious record of shipping adware.

In other words, Joel Spolsky is right: the user is not in a great position to make this security decision because they have very little information to go by. Unfortunately the designers of the software are in an even worse position: they are just as ignorant of the facts, and worse they do not share the user’s value judgments.

Going back to that Authenticode prompt: its designers are no more prescient than the user in divining the quality of software development practices or for that matter the integrity of the business model from the vendor name. MSFT provides the platform for independent software vendors; grading the efforts of those vendors has traditionally been a matter for customers voting with their dollars.

Most of the obvious security decisions are already settled by reasonable defaults. IE no longer prompts users to decide what to do about an expired certificate issued by a trusted authority with a mismatched name. It practically dead-ends the user in a semi-threatening error page that is very difficult to get past. This is the easy case: designers can make the right call with high confidence. Here they made the call that SSL depends on certificates validating correctly, and that if you cannot configure your website correctly, you deserve to lose traffic. The first is a fact, the second a value judgment, and a relatively new one at that: it certainly was not the consensus in the early days of the web, when “making it work” took priority over security. Yet it is a sentiment most people will agree with today, except for the clueless website owners still struggling with their certificate setup. For most of the interesting trust decisions, there are no such clear-cut answers.

Second, designers may face significant legal concerns: if they favor installing software from Acme but not from its competitor, legal sparks will fly. This is why efforts to classify malware need air cover from watertight definitions of spyware, applied consistently to leave no room for allegations of playing favorites.

Finally, designers and users differ in their values. This is a case where deciding on behalf of the user is the arrogant and presumptuous option. For a moment, replace “Acme Inc” with “Government of China.” Do we want the designer deciding that it is OK to trust software authored by the Chinese government for automatic install? One can decry the sad state of compartmentalization in modern operating systems, but the current reality is that installing an application has significant consequences. This is not a cosmetic change to the appearance of a seldom-used menu or the background color: the confidentiality and integrity of everything the user has on that computer is at stake. Fundamentally this user is facing a trust decision. Designers cannot make that decision for him/her, because everyone has different values predisposing them to embrace certain institutions wholeheartedly while remaining inherently skeptical of others. They have different levels of risk tolerance– the Internet cafe user looking for the proverbial dancing-squirrels clip versus the attorney with confidential documents to protect. This is one case where the decision belongs to the user.

CP

Unlinkable identifiers and web architecture: connecting the dots (3/3)

[Final piece of the series, see first and second posts.]

The greater challenge with trying to create unlinkable user identifiers on the web is the ease of linking them online. Standard models of “linking” assume that the sites the user visited get together offline, long after the user has visited both of them, and try to ascertain whether two activity sequences they observed belong to the same user. It is relatively easy in this model to come up with ways of assigning identifiers to users that are deterministic, unique to each site and computationally difficult to link even when multiple sites collude.

Problem is, websites are not constrained to this simplistic attack model. Even today, user tracking on the web exploits a type of collusion enabled by one of the elementary assumptions in HTML: any website is free to include content from any other website. That is by design. Any website can include an image, a frame, a video or a song from another website, which means the web browser will automatically follow hypertext links crafted by one site and pointing to another. That is a problem for privacy– a link can encode arbitrary information.

Consider an authentication system that assigns cryptographically unlinkable identifiers to users. The movie rental website knows this user as #123 while the bookstore knows her as #456. Enterprising marketing teams at these websites decide they want to collude and link user information. The end goal is for the bookstore to learn the user’s movie preferences and for the video store to get an idea of her library, in the hopes of creating personalized offers. This is a tall order when users are offline, because there is no unique identifier to key off of. (Assuming we suspend disbelief– in reality credit card numbers and shipping addresses are the fly in the ointment, as explained in the second part.) Instead they must capitalize on the window of opportunity when the user is online and logged into both sites.

Every page on the bookstore website has an image or other embedded content pointing to the video download site, and vice versa. Using transparent 1×1 images is customary for this purpose, but such attempts at stealth are not required. The link for the embedded content contains the pairwise-unique identifier for the user as observed by one site. When the user follows that link, they are going to be communicating two identifiers:

  1. The first is implicitly encoded in the link crafted by the sender, say #123. This is the identifier observed by the originating site, and it is encoded in the URL or some other piece of the request such as form fields.
  2. The second is explicitly asserted in the authentication protocol used by the destination. This is the identifier associated with the destination site, say #456.

At this point the destination site has enough information to link the two: user #123 at the bookstore is the same person as user #456 over here. Once that association is made, the databases can be joined offline: everything about her book purchases can be joined against everything known about her tastes in bad 1980s cinema.
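The exchange above can be sketched in a few lines of Python. This is purely illustrative– the host name, URL shape and function names are invented, and a real tracker would hide the identifier far less conspicuously:

```python
from urllib.parse import urlencode, urlparse, parse_qs

def embed_tracking_pixel(partner_host, my_id_for_user):
    """Bookstore embeds a 1x1 image whose URL carries its own pairwise ID."""
    return f"https://{partner_host}/pixel.gif?" + urlencode({"ref_id": my_id_for_user})

def link_identifiers(request_url, local_id, linkage_db):
    """Video site handles the image request; the auth cookie supplies local_id."""
    ref_id = parse_qs(urlparse(request_url).query)["ref_id"][0]
    linkage_db[ref_id] = local_id  # bookstore's #123 == our #456
    return ref_id

linkage_db = {}
url = embed_tracking_pixel("videos.example", "123")  # crafted by the bookstore
link_identifiers(url, "456", linkage_db)             # executed by the video site
print(linkage_db)  # the pairwise association, ready for an offline join
```

Note that neither site needs the user's cooperation beyond rendering the page: the browser fetches the pixel, and therefore transmits the first identifier, automatically.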

Granted, this attack has some significant limitations: the user must be authenticated at both sites simultaneously, or at least have some persistent identifier (such as a cookie) stored on both sides that encodes their identity. This turns out not to be a significant limitation, since users do authenticate to multiple sites in a single browser session, and in any case they need to fall into this trap just once for the permanent linkage to be created. A bigger problem is that each linkage covers only a pair of websites. If 10 websites want to collude, there are 45 pairs of identifiers to sort out, so this approach implemented naively would not scale.
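The quadratic growth in pairwise linkages is easy to check:

```python
from math import comb

# Number of distinct identifier pairs among n colluding sites: n choose 2
for n in (2, 10, 100):
    print(n, comb(n, 2))
# 10 colluding sites already require 45 pairwise linkages; 100 sites, 4950
```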

Fortunately or unfortunately, depending on perspective, having each pair of websites in the conspiracy link identifiers directly is not necessary. A much simpler solution is to designate a single tracking agent against which everyone else’s identifiers are linked. Every website embeds content from this one site, which observes and stores all of the identifier pairs that appear together.

In the real world, of course, such tracking agents go by a more mundane name: advertising networks. Display advertising networks have the unique benefit that they appear, by design, as embedded content on any number of websites. Any time a user is authenticated to multiple sites and these sites contain third-party content hosted by the network, there is an opportunity to link the identities together. In fact explicit authentication to the advertising network is not required: even a weak, temporary identity such as a session cookie works. Any time more than one external ID is observed, that creates a permanent record. If the network observes #123 and #456 appearing in one session today, and sees #456 and #987 in an independent session tomorrow, the conclusion is that all three identities belong to the same user.
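The transitive closure the network computes over these observations can be modeled with a simple union-find structure (the identifiers below are the ones from the example; everything else is invented for illustration):

```python
# Minimal union-find: identifiers seen in the same session get merged
# into one cluster, which the tracker treats as a single user.
parent = {}

def find(x):
    """Return the canonical representative of x's cluster."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving keeps lookups cheap
        x = parent[x]
    return x

def observe_session(ids):
    """All identifiers observed together in one session belong to one user."""
    for other in ids[1:]:
        parent[find(other)] = find(ids[0])

observe_session(["#123", "#456"])  # today's session
observe_session(["#456", "#987"])  # tomorrow's independent session
assert find("#123") == find("#987")  # all three identities now linked
```

Each new session either merges clusters or adds to an existing one, so the number of linkages the tracker maintains grows linearly with the identifiers it sees, not quadratically with the number of colluding sites.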

What this suggests is that until automatic loading of embedded content on pages is controlled better, unlinkable identities will be facing an uphill battle against one of the basic design principles behind the web.

CP