MD2: hash-collision scare of the day

Overshadowed by the far more serious X.509 parsing vulnerabilities disclosed at BlackHat, one of the problems noted by Dan Kaminsky et al. was the existence of an MD2-signed root certificate.

On the surface it looks bad. If a preimage attack on MD2 were feasible, an enterprising attacker could forge other certificates chaining up to this one, "transferring" the signature from the root to the bogus certificate, compliments of the MD2 break. Root certificates are notoriously difficult to update: Verisign cannot afford (for business reasons, even if it were the "right thing" for the Internet) to risk revoking all certificates chaining up to the root. Re-publishing the root signed with a better hash function is a no-op, since the existing MD2 signature remains valid. The only option is to stop trusting any certificate signed with MD2, except for the roots themselves.

But looked at from another perspective, the MD2 problem is a tempest in a teapot. Luckily no CA is using MD2 to issue new certificates. (At least as far as anyone can determine: CA incompetence is generally unbounded.) This matters because the MD5 forgery from last December depended on a bone-headed CA continuing to use MD5 to sign new certificate requests. Since no new MD2 certificates are being issued, an attacker needs a second-preimage attack; simple birthday attacks will not work. Finding a second message that hashes to the same value as a given one is a much harder problem than finding two meaningful, but partially unconstrained, messages that collide.

Eager to join the fray against PKI, the researchers point to a recent result, An improved preimage attack on MD2, to argue that such a possibility is indeed around the corner. It turns out the feasibility of this attack, and the 0wnership of MD2, was slightly exaggerated, to paraphrase Mark Twain. The paper does in fact quote 2**73 applications of the MD2 hash function as the amount of time required to find a preimage. That is orders of magnitude beyond what any previous brute-force attack has succeeded in breaking, but Moore's law can fix that. What the paraphrase seems to have neglected is a far more severe resource constraint, stated bluntly in the original paper and mysteriously absent from the Kaminsky et al. summary: the attack also requires 2**73 bytes of space. Outside the NSA, nobody is likely to have this type of storage lying around. None of the existing distributed cryptographic attacks have come anywhere near this limit; in fact most of them made virtually no space demands on participants. To put this in context, if one hundred million people were participating, each would have to dedicate close to a hundred terabytes of disk space. Not happening. This does not even take into account the communication overhead now required between the different users, each holding one fragment of this massive table and needing to query the fragments held by everyone else.
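The storage arithmetic is easy to check. A quick back-of-the-envelope sketch (taking the paper's 2**73 figure as bytes, the most charitable unit; if each table entry is larger than a byte, the per-person burden grows proportionally):

```python
# Space requirement of the MD2 preimage attack, split across
# a hypothetical distributed effort of 100 million participants.
SPACE_BYTES = 2**73          # memory quoted in the preimage paper
PARTICIPANTS = 100_000_000

total_zb = SPACE_BYTES / 10**21               # zettabytes in total
per_person_tb = SPACE_BYTES / PARTICIPANTS / 10**12  # terabytes each

print(f"total:          {total_zb:.1f} zettabytes")
print(f"per participant: {per_person_tb:.0f} terabytes")
# → roughly 9.4 zettabytes in total, ~94 terabytes per participant
```

Even under this optimistic accounting, each volunteer needs tens of terabytes of disk, before counting the network traffic of querying fragments held by other participants.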

CP

Pairwise identifiers and linkability online (1/3)

There has been a lot of talk about "unlinkability" in the context of web authentication systems. For example, CardSpace is touted as being somehow more privacy-friendly than OpenID because it can support a different identifier for each site the user interacts with. This post is a first attempt to summarize some points around that (complete with very rusty blogging skills).

To start with, the type of unlinkability envisioned here is a very weak form compared to the more elaborate cryptographic protocols involving anonymous credentials. It comes down to a simple question: when a user participates in a federated authentication system (which is to say, they have an account with an identity provider that allows them to authenticate to websites controlled by different entities), does the user appear with a single, consistent identifier everywhere he/she goes?
It is not a stretch to see that when such a universal identifier is handed out to all of the sites, it enables tracking. More precisely, it allows correlating information known by different sites. Netflix might know the user's movie rental history and iTunes might know their music preferences; if the user is known to both sites by the same consistent identifier, the argument goes, the two can collude and build an even more comprehensive dossier about the user. This is a slightly contrived example, because movie rentals are uniquely identifying already (as the recent deanonymization paper showed) and chances are so is the music collection, but it is easy to imagine scenarios where neither site has enough information to uniquely identify a user, yet when they collude and put all of the data together, a single candidate emerges. Consider Latanya Sweeney's discovery in 2000 that 87% of the US population can be identified by birthdate, five-digit zipcode and gender. It does not require very many pieces of information, if they can be assembled together with the help of a unique ID each one is associated with, to pick out individuals from the online crowd.
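The collusion step is trivial when both sites key their records by the same identifier; the "dossier" is just a database join. A toy sketch (all names and data invented for illustration):

```python
# Toy illustration: two sites that share one global user ID can
# merge their records with a simple join on that ID.
netflix = {"user123": {"rentals": ["Brazil", "Gattaca"]}}
itunes = {"user123": {"music": ["Kraftwerk", "Autechre"]}}

# Union of known user IDs, merging each site's record per user.
dossier = {
    uid: {**netflix.get(uid, {}), **itunes.get(uid, {})}
    for uid in netflix.keys() | itunes.keys()
}
print(dossier["user123"])
# → {'rentals': ['Brazil', 'Gattaca'], 'music': ['Kraftwerk', 'Autechre']}
```

With pairwise identifiers, the same join has no common key to operate on, which is the entire point of the next section.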

The obvious solution is to project a different identity to each website. Alice might appear as user #123 to Netflix, while iTunes remembers her as user #987. With a little help from cryptography it is easy to design schemes where such distinct IDs can be generated cheaply by the authentication system, yet are virtually impossible to correlate by the websites receiving them, even when they are trying to collude.
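One common construction for this (a sketch of the general idea, not any particular product's scheme) derives the per-site identifier with a keyed hash over the site's name: the identity provider holds one master secret per user, and without that secret, colluding sites cannot tell whether two identifiers belong to the same person.

```python
import hashlib
import hmac

def pairwise_id(master_secret: bytes, site: str) -> str:
    """Derive a per-site identifier from a user's master secret.

    HMAC keyed with the secret makes the IDs deterministic for the
    identity provider but uncorrelatable for the receiving sites.
    """
    mac = hmac.new(master_secret, site.encode("utf-8"), hashlib.sha256)
    return mac.hexdigest()[:16]  # truncated for readability

alice_secret = b"per-user secret held by the identity provider"
print(pairwise_id(alice_secret, "netflix.example"))
print(pairwise_id(alice_secret, "itunes.example"))
# Two different, stable identifiers for the same user.
```

The identity provider can always re-derive the same ID for a returning user, while Netflix and iTunes see values that are, absent the master secret, computationally unlinkable.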

[continued]

CP