Spot the Fed: CAC/PIV card edition

Privacy leaks in TLS client authentication

Spot the Fed is a long-running tradition at DefCon. Attendees try to identify suspected members of law enforcement or intelligence agencies blending in with the crowd. In the unofficial breaks between scheduled content, suspects “denounced” by fellow attendees as potential Feds are invited on stage— assuming they are good sports about it, which a surprising number prove to be— and interrogated by conference staff to determine if the accusations have merit. Spot-the-Fed is entertaining precisely because it is based on crude stereotypes, on a shaky theory that the responsible adults holding button-down government jobs will stand out in a sea of young, irreverent hacking enthusiasts. While DefCon badges feature a cutting-edge mix of engineering and art, one thing they never have is identifying information about the attendee. No names, affiliation or location. (That is in stark contrast to DefCon’s more button-down, corporate cousin, Black Hat Briefings, which takes place shortly before DefCon. Black Hat introductions often start with attendees staring at each other’s badges.)

One can imagine playing a version of Spot the Fed online: determine which visitors to a website are Feds. While there are no visuals to work with, there are plenty of other signals, ranging from the visitor’s IP address to the specific configuration of their browser and OS environment. (EFF has a sobering demonstration of just how uniquely identifying some of these characteristics can be.) This blog post looks at a different type of signal that can be gleaned from a subset of US government employees: their possession of a PIV card.

Origin stories: HSPD 12

The story begins in 2004 during the Bush era with an obscure government edict called HSPD12: Homeland Security Presidential Directive #12:

There are wide variations in the quality and security of identification used to gain access to secure facilities where there is potential for terrorist attacks. In order to eliminate these variations, U.S. policy is to enhance security, increase Government efficiency, reduce identity fraud, and protect personal privacy by establishing a mandatory, Government-wide standard for secure and reliable forms of identification issued by the Federal Government to its employees and contractors […]

That “secure and reliable form of identification” became the basis for one of the largest PKI and smart-card deployments. Initially called CAC for “Common Access Card” and later superseded by PIV or “Personal Identity Verification,” these two programs combined have issued millions of smart-cards bearing X509 digital certificates, issued by a complex hierarchy of certificate authorities operated by the US government.

PIV cards were envisioned to function as “converged” credentials, combining physical access and logical access. They can be swiped or inserted into a badge-reader to open doors and gain access to restricted facilities. (In more low-tech scenarios reminiscent of how Hollywood depicts access checks, the automated badge reader is replaced by an armed sentry who casually inspects the card before deciding to let the intruders in.) But they can also open doors in a more virtual sense: PIV cards can be inserted into a smart-card reader or tapped against a mobile device over NFC to leverage the credential online. Examples of supported scenarios:

  • Login to a PC, typically using Active Directory and the public-key authentication extension to Kerberos
  • Sign/encrypt email messages via S/MIME
  • Access restricted websites in a web browser, using TLS client authentication.

This last capability creates an opening for remotely detecting whether someone has a PIV card— and by extension, whether they are affiliated with the US government or one of its contractors.

Background on TLS client authentication

Most websites in 2024 use TLS to protect the traffic from their customers against eavesdropping or tampering. This involves the site obtaining a digital certificate from a trusted certificate authority and presenting that credential to bootstrap every connection. Notably, the customers visiting that website do not need any certificates of their own. Of course they must be able to validate the certificate presented by that website, but that validation step does not require any private, unique credential accessible only to that customer. As far as the TLS layer is concerned, the customer, or “client” in TLS terminology, is not authenticated. There may be additional authentication steps at a higher layer in the protocol stack, such as a web page where the customer inputs their email address and password. But those actions take place outside the TLS protocol.

While the majority of TLS interactions today are one-sided for authentication, the protocol also makes provisions for a mode where both sides authenticate each other, commonly called “mutual authentication.” This is typically done with the client also presenting an X509 certificate. (Being a complex protocol, TLS has other options, including a “preshared key” model, but those are rarely deployed.) At a high level, client authentication adds a few more steps to the TLS handshake, enumerated below with a brief sketch after the list:

  • Server signals to the client that certificate authentication is required
  • Server sends a list of CAs that are trusted for issuing client certificates. Interestingly this list can be empty, which is interpreted as anything-goes
  • Client sends a certificate issued by one of the trust anchors in that list, along with a signature on a challenge to prove that it is in control of the associated private key
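
To make these steps concrete, here is a minimal sketch of a server requesting, but not requiring, a client certificate, using Node’s built-in tls module. The file names are placeholders; note that the ca option doubles as the list of trusted issuers advertised to the client during the handshake:

const tls = require("tls");
const fs = require("fs");

const server = tls.createServer({
  key: fs.readFileSync("server-key.pem"),   // placeholder paths
  cert: fs.readFileSync("server-cert.pem"),
  ca: [fs.readFileSync("client-ca.pem")],   // issuers advertised in the CertificateRequest
  requestCert: true,                        // ask every client for a certificate...
  rejectUnauthorized: false,                // ...but let the handshake proceed without one
}, (socket) => {
  const peer = socket.getPeerCertificate();
  const greeting = peer && peer.subject
    ? `hello ${peer.subject.CN}\n`          // client authenticated with a certificate
    : "hello anonymous\n";                  // handshake completed without one
  socket.end(greeting);
});

server.listen(8443);

The rejectUnauthorized flag corresponds to the “optional” mode discussed below: the connection succeeds either way, and the application decides what to do with unauthenticated clients.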

Handling privacy risks

Since client certificates typically contain uniquely identifying information about a person, there is an obvious privacy risk from authenticating with one willy-nilly to every website that demands a certificate. These risks have been long recognized and largely addressed by the design of modern web browsers.

A core privacy principle is that TLS client authentication can only take place with user consent. That comes down to addressing three different cases when a server requests a certificate from the browser:

  1. User has no certificate issued by any of the trust anchors listed by the server. In this case there is no reason to interrupt the user with UI; there is nothing actionable. The handshake continues without any client authentication. The server can reject such connections by terminating the TLS handshake or proceed in an unauthenticated state. (The latter is referred to as optional mode, supported by popular web servers including nginx.)
  2. There is exactly one certificate meeting the criteria. Early web browsers would automatically use that certificate, thinking they were doing the user a favor by optimizing away an unnecessary prompt. Instead they were introducing a privacy risk: websites could silently collect personally identifiable information by triggering TLS client authentication and signaling that they would accept any certificate. (Realistically this “vulnerability” only affected a small percentage of users, because client-side PKI deployments were largely confined to enterprises and government/defense sectors. That said, those also happen to be among the most stringent scenarios, where the customer cares a lot about operational security and privacy.)
    Browser designers have since seen the error of their ways. Contemporary implementations are consistent in presenting some UI before using the certificate— an important privacy control for users who may not want to send identifying information. There is still one common UX optimization to streamline this: users can indicate that they trust a website and are always willing to authenticate with a specific certificate there. Firefox, for example, presents a checkbox for making that decision stick.
  3. There are multiple matching certificates. This case is treated identically to case #2, with the dialog showing all available certificates for the user to choose from, or to decline authentication altogether.

Detecting the existence of a certificate

Interposing a pop-up dialog appears to address privacy risks from websites attempting to profile users through client certificates. While any website visited can request a certificate, users remain in control of deciding whether their browser will go along with it. (And if the person complies and sends along their certificate to a website that had no right to ask? Historically browser vendors react to such cases with a time-honored strategy: blame the user— “PEBCAC! It’s their fault for clicking OK.”)

But even with the modal dialog, there is an information leak sufficient to enable shenanigans in the spirit of spot-the-Fed. There is a difference between case #1—no matching certificates— and the remaining cases where there is at least one matching certificate. In the latter cases some UI is displayed, disrupting the TLS handshake until the user interacts with that UI to express their decision either way. In the former case, the TLS connection proceeds without interruption. That difference can be detected: embed a resource that requires TLS client authentication and measure its load time.

While the browser is waiting for the user to make a decision, the network connection for retrieving the resource is stalled. Even if the user correctly decides to reject the authentication request, the page load time has been altered. (If they agree, timing differences are redundant: the website gets far more information than it bargained for, with the full certificate.) The resulting delay is on the order of human reaction times— the time taken to process the dialog and click “cancel”— well within the resolution limits of the web Performance API.
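
A bare-bones version of the measurement might look like the following javascript. The probe URL is a placeholder for the image hosted behind optional TLS client authentication, and the threshold is an assumption: network jitter is measured in tens of milliseconds, while human interaction with a dialog takes far longer.

// Hypothetical probe URL, served with optional TLS client authentication
const PROBE_URL = "https://probe.example.com/pixel.png";
const THRESHOLD_MS = 500; // generous allowance for network latency

function timeImageLoad(url) {
  return new Promise((resolve) => {
    const img = new Image();
    const start = performance.now();
    const finish = () => resolve(performance.now() - start);
    img.onload = finish;
    img.onerror = finish; // canceling the certificate dialog also ends the load
    img.src = url + "?bust=" + Math.random(); // defeat caching
  });
}

timeImageLoad(PROBE_URL).then((elapsed) => {
  // A load stalled far beyond network latency implies a certificate
  // selection dialog was interposed: the visitor has a matching certificate.
  console.log(elapsed > THRESHOLD_MS ? "PIV card likely" : "no matching certificate");
});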

Proof of concept: Spot-the-Fed

This timing check suffices to determine whether a visitor has a certificate from any one of a group of CAs chosen by the server. While the server will not find out the exact identity of the visitor— we assume he/she will cancel authentication when presented with the certificate selection dialog— the existence of a certificate alone is enough to establish affiliation. In the case of the US government PKI program, the presence of a certificate signals that the visitor has a PIV card.

Putting together a proof-of-concept:

  1. Collect issuer certificates for the US federal PKI. There are at least two ways to source this: (A) harvest the issuer names advertised by a government site that requests client certificates, such as login.gov, or (B) download the publicly available federal PKI certificate bundles.
  2. Host a page on GitHub for the top-level document. It will include basic javascript to measure the time taken to load an embedded image that requires client authentication.
  3. Because GitHub Pages do not support TLS client authentication, that image must be hosted somewhere else. For example one can use nginx running on an EC2 instance to serve a one-pixel PNG image.
  4. Configure nginx for optional TLS client authentication, with trust anchors set to the list of CAs retrieved in step #1.

There is one subtlety with step #4: nginx expects the full issuer certificates in PEM format. But if using option (A) above, only the issuer names are available. This turns out not to be a problem: since the TLS handshake only deals in issuer names, one can simply create a dummy self-signed CA certificate with the same issuer name but a brand new RSA key. For example, from login.gov we learn there is a trusted CA with the distinguished name “C=US, O=U.S. Government, OU=DoD, OU=PKI, CN=DOD EMAIL CA-72.” It is not necessary to have the actual certificate for this CA (although it is present in the publicly available bundles mentioned above); we can create a new self-signed certificate with the same DN to appease nginx. That dummy certificate will not work for successful TLS client authentication against a valid PIV card— the server can not validate a real PIV certificate without the genuine public key of the issuing CA. But that is moot; we expect users to refuse to go through with TLS client authentication. We are only interested in measuring the delay caused by asking them to entertain the possibility.
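
For illustration, here is one way to manufacture such a dummy certificate with the node-forge package (an assumption of convenience; openssl can do the same). Only the DN is copied from the real issuer; the throwaway RSA key is freshly generated:

const forge = require("node-forge");

// Throwaway key pair; the real CA's public key is not needed
const keys = forge.pki.rsa.generateKeyPair(2048);
const cert = forge.pki.createCertificate();
cert.publicKey = keys.publicKey;
cert.serialNumber = "01";
cert.validity.notBefore = new Date();
cert.validity.notAfter = new Date(Date.now() + 365 * 24 * 3600 * 1000);

// Distinguished name copied verbatim from the real issuer
const dn = [
  { shortName: "C", value: "US" },
  { shortName: "O", value: "U.S. Government" },
  { shortName: "OU", value: "DoD" },
  { shortName: "OU", value: "PKI" },
  { shortName: "CN", value: "DOD EMAIL CA-72" },
];
cert.setSubject(dn);
cert.setIssuer(dn); // self-signed: issuer and subject are identical
cert.sign(keys.privateKey, forge.md.sha256.create());

console.log(forge.pki.certificateToPem(cert)); // feed this PEM to nginx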

Limitations and variants

Stepping back to survey what this PoC accomplishes: we can remotely determine if a visitor to the website has a certificate issued by one of the known US government certificate authorities. This check does not require any user interaction, but it also comes with some limitations:

  • Successful detection requires that the visitor has a smart-card reader connected to their machine, that their PIV card is present in the reader and that any middleware required to use the card is installed. In practice no middleware is needed for the common case of Windows: PIV support has been built into the OS cryptography stack since Vista. Browsers including Chrome and Edge can automatically pick up any cards without requiring additional configuration. On other platforms such as macOS and Linux additional configuration may be required. (That said: if the user already has scenarios requiring use of their PIV card on that machine, chances are it is already configured to allow the card to work in the browser without futzing with any settings.)
  • It is not stealthy in the case of successful identification. Visitors will have seen a certificate selection dialog come up, which is not a common occurrence when surfing random websites. (Those without a client certificate however will not observe anything unusual.) There are however a few websites (mis?)configured to demand client certificates from all visitors, such as this IPv6 detection page.
    • It may be possible to close the dialog without user interaction. One could start loading a resource that requires client authentication and later use javascript timers to cancel that navigation. In theory this will dismiss the pending UI. (In practice it does not appear to work in Chrome or Firefox for embedded resources, but works for top-level navigation.) To be clear, this does not prevent the dialog from appearing in the first place. It only reduces the time the dialog remains visible, at the expense of increased false-positives because the detection threshold must be correspondingly lower.
    • A less reliable but more stealthy approach can be built if there is some website the target audience frequently logs into using their PIV card. In that case the attacker can attempt to source embedded content from that site— such as an image— and check if that content loaded successfully. This has the advantage that it will completely avoid UI in some scenarios. If the user has already authenticated to the well-known site within the same browser session, there will be no additional certificate selection dialogs. That signals the user has a PIV card, because they are able to load resources from a site ostensibly requiring a certificate from one of the trusted federal PKI issuers. In some cases UI will be skipped even if the user has not authenticated in the current session, but has previously configured their web browser to automatically use the certificate at that site, as is possible with Firefox. (Note there will also be a PIN prompt for the smart-card— unless it has been recently used in the same browser session.)
  • While the PoC checks whether the user has a certificate from any one of a sizable collection of CAs, it can be modified to pinpoint the CA. Instead of loading a single image, one can load dozens of images in series from different servers, each configured to accept only one CA among the collection. This can be used to better profile the visitor, for example to distinguish contractors at Northrop Grumman (“CN=Northrop Grumman Corporate Root CA-384”) from employees of the Department of Transportation (“CN=U.S. Department of Transportation Agency CA G4.”)
  • There are some tricky edge-cases involving TLS session resumption. This is a core performance improvement built into TLS to avoid time-consuming handshakes for every connection. Once a TLS session is negotiated with a particular server—with or without client authentication— that session will be reused for multiple requests going forward. Here that means loading the embedded image a second time will always take the “quick” route by using the existing session. Certificate selection UI will never be shown even if there is a PIV card present. Without compensating logic, that would result in false-negatives whenever the page is refreshed or revisited within the same session. This demonstration attempts to counteract that by setting a session cookie when PIV cards are detected and checking for that cookie on subsequent runs. In case the PoC is misbehaving, try using a new incognito/private window.

Work-arounds

The root cause of this information disclosure lies with the lack of adequate controls around TLS client authentication in modern browsers. While certificates will not be used without affirmative consent from the consumer, nothing stops random websites from initiating an authentication attempt.

Separate browser profiles are not necessarily effective as a work-around. At first it may seem promising to create two different Chrome or Edge profiles, with only one profile used for “trusted” sites set up for authenticating with the PIV card. But unlike cookie jars, digital certificates are typically shared across all profiles. Chrome is not managing smart-cards; the Windows cryptography API is responsible for that. That system has no concept of “profiles” or other boundaries invented by the browser. If there is a smart-card reader attached with a PIV card present, the magic of OS middleware will make it available to every application, including all browser profiles.

Interestingly, using Firefox can be a somewhat clunky work-around, because Firefox uses the NSS library instead of the native OS API for managing certificates. While this is more a “bug” than a feature in most cases, due to the additional complexity of configuring NSS with the right PKCS#11 provider to use PIV cards, in this case it has a happy side-effect: it becomes possible to decouple the availability of smart-cards in Firefox from Chrome/Edge. By leaving NSS unconfigured and only visiting “untrusted” sites with Firefox, one can avoid these detection tricks. (This applies specifically to Windows/macOS where Chrome follows the platform API. It does not apply to Linux, where Chrome also relies on NSS. Since there is a single NSS configuration in a shared location, both browsers remain in lock-step.) But it is questionable whether users can maintain the strict discipline to use the correct browser in every case. It would also cause problems for other applications using NSS, including Thunderbird for email encryption/signing.

Until there are better controls in popular browsers for certificate authentication, the only reliable work-around is relatively low-tech: avoid leaving a smart-card connected when the card is not being actively used. However this is impossible in some scenarios, notably when the system is configured to require smart-card logon and automatically lock the screen on card removal.

CP

Password management quirks at Fidelity

IVR Login

“Please enter your password.”

While this prompt commonly appears on webpages, it is not often heard during a phone call. This was not a social-engineering scam either: it was part of the automated response from the Fidelity customer support line. The message politely invites callers to enter their Fidelity password using the dial-pad, in order to authenticate the customer before connecting them to a live agent. There is one complication: dial-pads only have digits and two symbols, the asterisk and pound keys. To handle the full range of symbols present in passwords, the instructions call for using the standard convention for translating letters into digits, helpfully displayed on most dial-pads:

Standard dial-pad, with mapping of letters to numbers
(Image from Wikipedia)

The existence of this process indicates a quirk in the way Fidelity stores passwords for their online brokerage service.

Recap: storing passwords without storing them

Any website providing personalized services is likely to have an authentication option that involves entering username and password. That website then has to verify whether the correct password is entered, by comparing the submitted version against the one recorded when the customer last set/changed their password.

Surprisingly there are ways to do that without storing the original password itself. In fact storing passwords is an anti-pattern. Even storing them encrypted is wrong: “encryption” by definition is reversible. Given encrypted data, there is a way to recover the original, namely by using the secret key in a decryption operation. Instead passwords are stored using a cryptographic hash-function. Hash functions are strictly one-way: there is no going back from the output to the original input. Unlike encryption, there is no magic secret key known to anyone— not even the person doing the hashing— that would permit reversing the process.

For example, using the popular hash function bcrypt, the infamous password “hunter2” becomes:

$2b$12$K1.sb/KyOqj6BdrAmiXuGezSRO11U.jYaRd5GhQW/ruceA9Yt4rx6

This cryptographic technique is a good fit for password storage because it allows the service to check passwords during login without storing the original. When someone claiming to be that customer shows up and attempts to login with what is allegedly their password, the exact same hash function can be applied to that submission. The output can be compared against the stored hash and if they are identical, one can conclude with very high probability that the submitted password was identical to the original. (Strictly speaking, there is a small probability of false positives: since hash functions are not one-to-one, in theory there is an infinite number of “valid” submissions that yield the same hash output. Finding one of those false-positives is just as difficult as finding the correct password, owing to the design properties of hash functions.)

From a threat model perspective, one benefit of storing one-way hashes is that even if they are disclosed, an attacker does not learn the user credential and can not use it to impersonate the user. Because hashes are irreversible, the only option available to an attacker is to mount a brute-force attack: try billions of password guesses to see if any of them produce the same hash. The design of good hash-functions for password storage is therefore an arms-race between defenders and attackers. It is a careful trade-off between making hashing efficient enough for the legitimate service to process a handful of passwords quickly during a normal login process, while making it expensive enough to raise costs for an attacker trying to crack a hash by trying billions of guesses.

A miss is as good as a mile

Much ink has been spilled over the subtleties of designing good functions for password hashing. An open competition organized in 2013 sought to create a standardized, royalty-free solution for the community. Without going too much into details of these constructions, the important property for our purposes is that good hash functions are highly sensitive to their input: change even one bit of the input password and the output changes drastically. For example, the passwords “hunter2” and “hunter3” literally differ in just one bit—the very last bit distinguishes the digit three from the digit two in ASCII code. Yet their bcrypt hashes look nothing alike:

Hash of hunter2 ➞ zSRO11U.jYaRd5GhQW/ruceA9Yt4rx6

Hash of hunter3 ➞ epdCpQLaQbcGTREZLZAFDKp5i/zQmp2

(One subtlety here: password hash functions including bcrypt incorporate a random “salt” value to make each hash output unique, even when they are invoked on the same password. To highlight changes caused by the password difference, we deliberately used the same salt when calculating these hashes above and omitted the salt from the displayed value. Normally it would appear as a prefix.)
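Readers can reproduce the effect with the bcryptjs package (one of several bcrypt bindings; assumed here for illustration). Sharing one salt between the two calls isolates the effect of the one-bit password change:

const bcrypt = require("bcryptjs");

// One shared salt guarantees the only difference between the two
// computations is the final bit of the password.
const salt = bcrypt.genSaltSync(12);

console.log(bcrypt.hashSync("hunter2", salt));
console.log(bcrypt.hashSync("hunter3", salt));
// The outputs agree only on the algorithm/cost/salt prefix; the trailing
// 31-character digest portions look completely unrelated.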

Bottom line: there is no way to check if two passwords are “close enough” by comparing hash outputs. For example, there is no way to check if the customer got only a single letter of the password wrong or if they got all the letters correct except for lower/upper-case distinction. One can only check for exact equality.

The mystery at Fidelity

Having covered the basics of password hashing we can now turn to the unusual behavior of the Fidelity IVR system. Fidelity is authenticating customers by comparing against a variant of their original password. The password entered on the phone is derived from, but not identical to the one used on the web. It is not possible to perform that check using only a strong cryptographic hash. A proper hash function suitable for password storage would erase any semblance of similarity between these two versions.

There are two possibilities:

  1. Fidelity is storing passwords in a reversible manner, either directly as clear-text or in an encrypted form where they can still be recovered for comparison.
  2. Fidelity is canonicalizing passwords before hashing, converting them to the equivalent “IVR format” consisting of digits before applying the hash function.

It is not possible to distinguish between these, short of having visibility into the internal design of the system.

But there is another quirk that points towards the second possibility: the website and mobile apps also appear to validate passwords against the canonicalized version. A customer with the password “hunter2” can also login using the IVR-equivalent variant “HuNt3R2.”

Implications

IVR passwords are much weaker than expected against offline cracking attacks. While seemingly strong and difficult to guess, the password iI.I0esl>E`+:P9A is in reality equivalent to the less impressive 4404037503000792.
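
Here is a sketch of the canonicalization implied by the IVR prompt, reconstructed from the dial-pad mapping and the examples in this post (the exact treatment of edge cases inside Fidelity’s systems is unknown):

// Reconstruction of the dial-pad canonicalization (assumed): letters map
// per the keypad, digits pass through, everything else collapses to zero.
const KEYPAD = {
  "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
  "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
};

function toIVR(password) {
  return [...password.toLowerCase()].map((ch) => {
    if (ch >= "0" && ch <= "9") return ch;
    for (const [digit, letters] of Object.entries(KEYPAD)) {
      if (letters.includes(ch)) return digit;
    }
    return "0"; // punctuation and special characters
  }).join("");
}

console.log(toIVR("iI.I0esl>E`+:P9A"));            // "4404037503000792"
console.log(toIVR("hunter2") === toIVR("HuNt3R2")); // true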

The notion of entropy can help quantify the problem. A sixteen character string composed of randomly selected printable ASCII characters has an entropy around 105 bits. That is a safety margin far outside the reach of any commercially motivated attacker. Now replace all of those 16 characters by digits and the entropy drops to 53 bits— or about nine quadrillion possibilities, which is surprisingly within range of hobbyists with home-brew password cracking systems.
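
The arithmetic, for the skeptical:

// Entropy of a random string: length * log2(alphabet size)
const entropyBits = (alphabetSize, length) => length * Math.log2(alphabetSize);

console.log(entropyBits(95, 16).toFixed(1)); // ~105.1 bits: printable ASCII
console.log(entropyBits(10, 16).toFixed(1)); // ~53.2 bits: digits only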

In fact entropy reduction could be worse depending on exactly how users are generating passwords. The mapping from available password symbols to IVR digits is not uniform:

  • Only the character 1 maps to the digit 1 on the dial-pad.
  • Seven different characters (A, B, C, a, b, c, 2) are mapped to the digit 2
  • By contrast there are nine different symbols mapping to the digit 7, due to the presence of an extra letter for that key on the dial-pad
  • Zero is even more unusual in that all punctuation marks and special characters get mapped to it, for more than 30 different symbols in total

In other words, a customer who believes they are following best practices— using a password manager application and having it generate a “random” 16-character password— will not end up with a uniformly distributed IVR password. Zeroes will be significantly over-represented due to the bias in the mapping while the digit one will be under-represented.

The second design flaw here involves carrying over the weakness of IVR passwords to the web interface and mobile applications. While the design constraints of phone-based customer support require accepting simplified passwords, it does not follow that the web interface should also accept them. (This assumes the web interface grants higher privileges, in the sense that certain actions— such as adding bank accounts and initiating funds transfers— can only be carried out by logging into the website and are not possible through IVR.)

Consider an alternative design where Fidelity asks customers to pick a secondary PIN for use on the phone. In this model there are two independent credentials. There is the original password used when logging in through the website or mobile apps. It is case-sensitive and permits all symbols. An independent PIN consisting only of digits is selected by customers for use during phone authentication. Independent being the operative keyword here: it is crucial that the IVR PIN is not derived from the original password via some transformation. Otherwise an attacker who can recover the weak IVR PIN can use that to get a head-start on recovering the full password.

Responsible disclosure

This issue was reported to the Fidelity security team in 2021. This is their response:

“Thank you for reaching out about your concerns, the security of your personal information is a primary concern at Fidelity. As such, we make use of industry standard encryption techniques, authentication procedures, and other proven protection measures to secure your information. We regularly adapt these controls to respond to changing requirements and advances in technology. We recommend that you follow current best practices when it comes to the overall security of your accounts and authentication credentials. For further information, you can visit the Security Center on your Fidelity.com account profile to manage your settings and account credentials.”

Postscript: why storing two hashes does not help

To emphasize the independence requirement, we consider another design and explain why it does not achieve the intended security effect. Suppose Fidelity kept the original design for a single password shared between web and IVR, but hashed it two ways:

  • First is a hash of the password verbatim, used for web logins.
  • Second is a hash of the canonicalized password, after being converted to IVR form by mapping all symbols to digits 0 through 9. This hash is only checked when the customer is trying to login through the phone interface.

This greatly weakens the full password against offline cracking, because the IVR hash provides an intermediate stepping stone to drive guessing attacks against the full credential. We have already pointed out that the entropy in all-numeric PINs is very low for any length customers could be expected to key into a dial-pad. The additional problem is that after recovering the IVR version, the attacker has a much easier time cracking the full password. Here is a concrete example: suppose an attacker mounts a brute-force attack against the numeric PIN and determines that a stolen hash corresponds to “3695707711036959.” That data point is very useful when attempting to recover the full password, since many guesses are not compatible with that IVR version. For example, the attacker need not waste any CPU cycles on hashing the candidate password “uqyEzqGCyNnUCWaU.” That candidate starts with “u” which would map to the digit “8” on a dial-pad. It would be inconsistent with something the attacker already knows: the first character of the real password maps to 3, based on the IVR version. Working backwards from the same observation, the attacker knows that they only need to consider guesses where the first symbol is one of {3, d, e, f, D, E, F}.
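
A sketch of that pruning logic, reusing the toIVR helper from the earlier sketch:

// Recovered first by brute-forcing the (cheap) all-numeric IVR hash
const crackedPIN = "3695707711036959";

// Only guesses consistent with the cracked PIN merit an expensive bcrypt call
function worthHashing(guess) {
  return toIVR(guess) === crackedPIN; // toIVR: dial-pad canonicalization above
}

console.log(worthHashing("uqyEzqGCyNnUCWaU")); // false: 'u' maps to 8, PIN starts with 3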

This makes life much easier for an attacker trying to crack hashes. The effort to recover a full password is not even twice the amount required to go after the IVR version. Revisiting the example of 16 character random passwords: we noted that unrestricted entries have about 105 bits of entropy while the corresponding IVR version is capped to around 53 bits. That means once the simplified IVR PIN is recovered by trying 2**53 guesses at most, the attacker only needs another 2**(105-53) == 2**52 guesses to hit on the full password. This is only about 50% more incremental work— and on average the correct password will be discovered halfway through the search. In other words, having a weak “intermediate” target to attack has turned an intractable problem (2**105 guesses) into two eminently solvable sub-problems.

CP

Saved by third-party cookies: when phishing campaigns make mistakes

(Or, that rare instance when third-party cookies actually helped improve security.)

Third-party cookies have earned a bad reputation for enabling widespread tracking and advertising-driven surveillance online— one that is entirely justified. After multiple half-hearted attempts to deprecate them by leveraging its browser monopoly with Chrome, even Google eventually threw in the towel. Not to challenge that narrative, but this blog post describes an unusual incident where third-party cookies actually helped protect consumers from an ongoing phishing attack. While this phishing campaign occurred in real life, we will refer to the site in question as Acme, after the hypothetical company in the Wile E. Coyote cartoons.

Recap on phishing

Phishing involves creating a look-alike replica of a legitimate website in order to trick customers into disclosing sensitive information. Common targets are passwords, which can be used to login to the real website by impersonating the user. But phishing can also go after personally identifiable information such as credit-card or social-security numbers directly, since those are often monetizable on their own. The open nature of the web makes it trivial to clone the visual appearance of websites for this purpose. One can simply download every image, stylesheet and video from that site, stash the same content on a different server controlled by the attacker and steer users into visiting this latter copy. (While this sounds sinister, there are even legitimate use cases for it, such as mirroring websites to reduce load or protect them from denial-of-service attacks.)

The important point to remember is that visuals can be deceptive: the bogus site may be rendered pixel-for-pixel identical inside the browser view. The address bar is one of the only reliable clues to the provenance of the content; that is because it is part of the user interface controlled 100% by the browser. It can not be manipulated by the attacker. But good luck spotting the difference between login.acme.com, loginacme.com, acnne.com or any of the dozen other surprising ways to confuse the unwary. Names that look “close” at the outset can be controlled by completely unrelated entities, thanks to the way DNS works.

What makes phishing so challenging to combat is that the legitimate website is completely out of the picture at the crucial moment the attack is going on. The customer is unwittingly interacting with a website 100% controlled by an adversary out to get them, while under the false impression that they are dealing with a trusted service they have been using for years. Much as the real site may want to jump in with a scary warning dialog to stop their customer from making a crucial judgment error, there is no opportunity for such interventions. Recall that the customer is only interacting with the replica. All content the user sees is sourced from the malicious site, even if it happens to be content originally copied from the legitimate one.

Rookie mistake

Unless, that is, the attacker makes a mistake. That is what happened with the crooks targeting Acme: they failed to clone all of the content and create a self-contained replica. One of the Acme security team members noticed that the malicious site continued to reference content hosted on the real one. Every time a user visited the phishing site, their web browser was also fetching content from the authentic Acme website. That astute observation paved the way for an effective intervention.

In this particular phishing campaign, the errant reference back to the original site was for a single image used to display a logo. That is not much to work with. On the one hand, Acme could detect when the image was being embedded from a different website, thanks to the HTTP Referer [sic] header. (Incidentally referrer-policies today would interfere with that capability.) But this particular image consumed precious little screen real estate, only measuring a few pixels across. One could return an altered image— skull and crossbones, mushroom cloud or something garish— but it is very unlikely users would notice. Even loyal customers who have the Acme logo committed to memory might not think twice if a different image appears. Instead of questioning the authenticity of the site, they would attribute that quirk to a routine bug or misguided UI experiment.
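
Detection on the server side can be as simple as inspecting that header when serving the logo. A minimal Node sketch, with hypothetical hostnames and paths (in production this logic would live inside the existing web server):

const http = require("http");

http.createServer((req, res) => {
  if (req.url.split("?")[0] === "/images/logo.png") {
    const referer = req.headers["referer"] || "";
    // A referrer outside our own domain means the logo is being embedded
    // by a foreign page: very likely the phishing replica.
    if (referer && !referer.startsWith("https://www.acme.example/")) {
      console.log("possible phish:", referer, "cookies:", req.headers["cookie"]);
    }
  }
  res.end(); // serve the image as usual (omitted)
}).listen(8080);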

It is not possible to influence the rest of the page by returning a corrupted image. For example it is not possible to prevent the page from loading or redirect it back to the legitimate login page. Similarly one can not return a different type of content, such as javascript, to interrupt the phishing attempt or warn users. (Aside: if the crooks had made the same mistake by sourcing a javascript file from Acme, such radical interventions would have been possible.)

Remember me: cookies & authentication

To recap, this is the situation Acme faces:

  1. There is an active phishing campaign in the wild
  2. A game of whack-a-mole ensues: the Acme security team reports each site to browser vendors, hosting companies and Cloudflare. (Crooks are fond of using Cloudflare because it acts as a proxy sitting in front of the malicious site, disguising its true origin and making it difficult for defenders to block it reliably.) The attacker responds by changing domain names and resurfacing the exact same phishing page under a different domain name.
  3. Every time a customer visits a phishing page, Acme can observe the attack happening in real time because its servers receive a request
  4. Despite being in the loop, Acme can not meaningfully disrupt the phishing page. It has limited influence over the content displayed to the customer

Here is the saving grace: when customers reach out to the legitimate Acme website in step #3 as part of rendering the phishing page, their web browser sends along all Acme cookies. Some of those cookies contain information that uniquely identifies the specific customer. That means Acme finds out not only that some customer visited a phishing site, but exactly which customer did. In fact there were at least two such cookies:

  • “Remember me” cookie used to store email address and expedite future logins by filling in the username field in login forms.
  • Authentication cookies set after login. Interestingly, even expired cookies are useful for this purpose. Suppose Acme only allowed login sessions to last for 24 hours. After the clock runs out, the customer must reauthenticate by providing their password or MFA again. Authentication cookies would have embedded timestamps reflecting that restriction. In keeping with the policy of requiring “fresh” credentials, after 24 hours the cookie would no longer be sufficient for authenticating the user. But for the purpose of identifying which user is being phished, it works just fine. (Technical note: “expired” here refers to the application logic; the HTTP standard itself defines an expiration time for cookies, after which point the browser deletes the cookie. If a cookie expires in that sense, it would not be of much use— it becomes invisible to the server.)

This gives Acme a lucky break to protect customers from the ongoing phishing attack. Recall that Acme can detect when incoming request for the image is associated with the phishing page. Whenever that happens, Acme can use the accompanying cookies to look up exactly which customer has stumbled onto the malicious site. To be clear, Acme can not determine conclusively whether the customer actually fell for phishing and disclosed their credentials. (There is a fighting chance the customer notices something off about the page after visiting it, and stops short of giving away their password. Unfortunately that can not be inferred remotely.) As such Acme must operate on the worst-case assumption that phishing will succeed and place preemptive restrictions on the account, such as temporarily suspending logins or restricting dangerous actions. That way, even if the customer does disclose their credentials and the crooks turn around to “cash in” those credentials by logging into the genuine Acme website, they will be thwarted from achieving their objective.

Third-party stigma for cookies

There is one caveat to the availability of cookies required to identify the affected customer: those cookies are now being replayed in a third-party context. Recall that the first vs third-party distinction is all about context: whether a resource (such as an image) being fetched for inclusion on a webpage is coming from the same site as the overall owner of the page, called the “top-level document.” When an image is fetched as part of a routine visit to the Acme website, it is a first-party request because the image is hosted at the same origin as the top-level document. But when it is being retrieved by following a reference from the malicious replica, it becomes a third-party request.

Would existing Acme cookies get replayed in that situation and reveal the identity of the potential phishing victim? That answer depends on multiple factors:

  1. Choice of browser. At the time of this incident, most popular web browsers freely replayed cookies in third-party contexts, with two notable exceptions:
    • Safari suppressed cookies when making these requests.
    • Internet Explorer: As the lone browser implementing P3P, IE would automatically “leash” cookies if they were set without an associated privacy policy: cookies would be accepted, but only replayed in first-party contexts.
  2. User overrides to browser settings. While the preceding item describes the default behavior of each browser, users can modify these settings to make them more or less stringent.
  3. Use of the “samesite” attribute. About 15 years after IE6 inflicted P3P and cookie management on websites, a proposed update to the HTTP cookie specification finally emerged to standardize and generalize its leashing concept. But the script was flipped: instead of browsers making unilateral decisions to protect user privacy, website owners would declare whether their cookies should be made available in third-party contexts, as in the sketch after this list. (One can imagine which way advertising networks— crucially dependent on third-party cookie usage for their ubiquitous surveillance model— leaned on that decision.)
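
For illustration, here is how a server opts in or out of cross-site replay when setting cookies (cookie names and values are hypothetical):

const http = require("http");

http.createServer((req, res) => {
  res.setHeader("Set-Cookie", [
    // No samesite attribute: browsers of that era replayed this cookie even
    // on cross-site subresource requests -- the behavior that aided Acme here.
    "remember_me=alice%40example.com; Secure; HttpOnly",
    // SameSite=Lax: withheld whenever a foreign page embeds our content.
    "session=abc123; Secure; HttpOnly; SameSite=Lax",
  ]);
  res.end("ok");
}).listen(8080);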

Luckily for Acme, the relevant cookies here were not restricted by the samesite attribute. As for browser distribution, IE was irrelevant with market share in the single digits, primarily restricted to enterprise users in managed IT environments, a far cry from the target audience for Acme. Safari on the other hand did have a non-negligible share, especially among mobile clients, since Apple did not allow independent browser engines such as Chrome’s at the time. (That restriction would only be lifted in 2024 when the EU reset Apple’s expectations around monopolistic behavior.)

In the end, it was possible to identify and take protective actions for the vast majority of customers known to have visited the malicious website. More importantly, intelligence gathered from one interaction is frequently useful in protecting other customers, even when the latter are not directly identified as being targeted by an attack. For example, an adversary often has access to only a handful of IPv4 addresses when attempting to cash in stolen credentials. When an IP address is observed attempting to impersonate a known phishing victim, every other login from that IP can be treated with higher suspicion. Detection mechanisms can be invaluable even with less than 100% coverage.

Verdict on third-party cookies 

Does this prove third-party cookies have some redeeming virtue and the web will be less safe when— or at this rate, if— they are fully deprecated? No. This anecdote is more the exception proving the rule. The advertising industry has been rallying in defense of third-party cookies for over a decade, spinning increasingly desperate and far-fetched scenarios. From the alleged death of “free” content (more accurately, ad-supported content where the real product being peddled is audience eyeballs for ad networks) to allegedly reduced capability for detecting fraud and malicious activity online, predictions of doom have been a constant part of the narrative.

To be clear: this incident does not in any way provide more ammunition for such thinly-veiled attempts at defending a fundamentally broken business model. For starters, there is nothing intrinsic to phishing attacks that requires help from third-party cookies for detection. The crooks behind this particular campaign made an elementary mistake: they left a reference to the original site when cloning the content for their malicious replica. There is no rule that says other crooks are required to follow suit. While this is an optimistic assumption built into defense strategies such as canary tokens, new web standards have made it increasingly easy to avoid such mistakes. For example content security policy allows a website to precisely delineate which other websites can be contacted for fetching embedded resources. It would have been a trivial step for crooks to add CSP headers and prevent any accidental references back to the original domain, neutralizing any javascript logic lurking in there to alert defenders.
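
For example, a single response header on the phishing server would have stopped the leak. A hypothetical sketch in Node:

const http = require("http");

http.createServer((req, res) => {
  // Illustrative policy: block fetches to any origin other than the clone's
  // own, preventing accidental references back to the legitimate site.
  res.setHeader("Content-Security-Policy", "default-src 'self'");
  res.end("(cloned page HTML would be served here)");
}).listen(8080);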

Ultimately the only robust solution for phishing is using authentication schemes that are not vulnerable to phishing. Defense and enterprise sectors have always had the option of deploying PKI with smart-cards for their employees. More recently consumer-oriented services have convenient access to a (greatly watered-down) version of that capability with passkeys. The jury is out on whether passkeys will gain any traction or remain consigned to a niche audience, owing to the morass of confusing, incompatible implementations.

CP

The missing identity layer for DeFi

Bootstrapping permissioned applications

To paraphrase the infamous 1993 New Yorker cartoon: “On the blockchain, nobody knows that you are a dog.” All participants are identified by opaque addresses with no connection to their real-world identity. Privacy by default is a virtue, but even those who voluntarily want to link addresses to their identity have few good options that would be persuasive. This blog post can lay claim to the address 0xa12Db34D434A073cBEE0162bB99c0A3121698879 on Ethereum, but can readers be certain? (Maybe the Ethereum Name Service, or ENS, can help.) On the one hand, there is an undeniable egalitarian ethos here: if the only relevant facts about an address are those represented on-chain— its balance in cryptocurrency, holdings of NFTs, track record of participating in DAO governance votes— there is no way to discriminate between addresses based on such “irrelevant” factors as the citizenship or geographic location of the person/entity controlling that address. Yet such discrimination based on real-world identity is exactly what many scenarios call for. To cite a few examples:

  1. Combating illicit financing of sanctioned entities. This is particularly relevant given that rogue states including North Korea have increasingly pivoted to committing digital asset theft as their access to the mainstream financial system is cut off.
  2. Launching a regulated financial service where the target audience must be limited by law, for example to citizens of a particular country only.
  3. Flip-side of the coin: excluding participants from a particular country (for example, the United States) in order to avoid triggering additional regulatory requirements that would come into play when serving customers in that jurisdiction.
  4. Limiting participation in high-risk products to accredited investors only. While this may seem trivial to check by inspecting the balance on-chain, the relevant criteria are total holdings of that person, which are unlikely to be all concentrated in one address.

As things stand, there are at best some half-baked solutions to the first problem. Blockchain analytics companies such as Chainalysis, TRM Labs and Elliptic surveil public blockchains, tracing funds movement associated with known criminal activity as these actors hop from address to address. Customers of these services can in turn receive intelligence about the state of an address or even an application such as a lending pool. Chainalysis even makes this information conveniently accessible on-chain: the company maintains smart-contracts on Ethereum and other EVM-compatible chains containing a list of OFAC-sanctioned addresses. Any other contract can consult this registry to check on the status of an address it is interacting with.
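
As a sketch, off-chain code can consult that registry with a few lines of javascript using ethers.js. The function signature reflects the oracle’s published interface; the contract address below is an assumption to be verified independently:

const { ethers } = require("ethers"); // ethers v6

// Chainalysis sanctions oracle on Ethereum mainnet (assumed address; verify!)
const ORACLE_ADDRESS = "0x40C57923924B5c5c5455c48D93317139ADDaC8fb";
const ORACLE_ABI = ["function isSanctioned(address addr) view returns (bool)"];

async function isSanctioned(address) {
  const provider = ethers.getDefaultProvider("mainnet");
  const oracle = new ethers.Contract(ORACLE_ADDRESS, ORACLE_ABI, provider);
  return oracle.isSanctioned(address); // true if the address is on the OFAC list
}

isSanctioned("0xa12Db34D434A073cBEE0162bB99c0A3121698879").then(console.log);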

The problem with these services is three-fold:

  1. The classifications are reactive. New addresses are innocent until proven guilty, which only happens after they are involved in illicit activity. At that point, the damage has been done: other participants may have interacted with the address or allowed the address to participate in their decentralized applications. In some cases it may be possible to unwind specific transactions or isolate the impact. In other situations, such as a lending pool where funds from multiple participants are effectively blended together, it is difficult to identify which transactions are now “tainted” by association and which ones are clean.
  2. “Not a terrorist organization” is a low bar to meet. Even if this could be ascertained promptly and 100% accurately, most applications have additional requirements of their participants. Some of the examples alluded to above involve location, country of citizenship or accredited-investor status. Excluding the tiny fraction of bad actors in the cross-hairs of FinCEN is useful but insufficient for building the types of regulated dapps that can take DeFi mainstream.
  3. All of these services follow a “blacklist” model: excluding bad actors. In information security it is a well-known principle that this model is inferior to “whitelisting”— only accepting known good actors. In other words, a blacklist fails open: any address not on the list is assumed clean by default. The onus is on the maintainer of the list to keep up with the thousands of new addresses that crop up, not to mention any sign of nefarious activity by existing addresses previously considered safe. By contrast, whitelists require an affirmative step before addresses are considered trusted. If the maintainer is slow to react, the system fails safe: a good address is considered untrusted because the administrator has not gotten around to including it.

What would an ideal identity verification layer for blockchains look like? Some high-level requirements are:

  • Flexible. Instead of expressing a binary distinction between sanctioned vs not-yet-sanctioned, it must be capable of representing a range of different identity attributes as required by a wide range of decentralized apps.
  • Opt-in. The decision to go through identity verification for an address must reside strictly with the person or persons controlling that address. While we can not stop existing analytics companies from continuing to conduct surveillance of all blockchain activity and trying to deanonymize addresses, we must avoid creating additional incentives or pressure for participants to voluntarily surrender their privacy.
  • Universally accepted. The value of an authentication system increases with the number of applications and services accepting that identity. If each system is only useful for onboarding with a handful of dapps, it is difficult for participants to justify the time and cost of jumping through the hoops to get verified. Government identities such as driver’s licenses are valuable precisely because they are accepted everywhere. Imagine an alternative model where every bar had to operate its own age verification system and issue its own permits— not recognized by any other establishment— in order to enforce laws around drinking age.
  • Privacy respecting. The protocols involved in proving identity must limit information disclosed to the minimum required to achieve the objective. Since onboarding requirements vary between dapps, there is a risk of disclosing too much information to prove compliance. For example, if a particular dapp is only open to US residents, that is the only piece of information that must be disclosed, and not, for example, the exact address where the owner resides. Similarly proof of accredited-investor status does not require disclosing total holdings, and proving that a person is not a minor can be done without revealing their exact date of birth. (This requirement has implications on design. In particular, it rules out simplistic approaches around issuing publicly readable “identity papers” directly on-chain, for example as a “soul-bound” token attached to the address.)

Absent such an identity layer, deploying permissioned DeFi apps is challenging. Aave’s dedicated ARC pool is an instructive example. Restricted to KYCed entities vetted by the custodian Fireblocks, it never achieved more than a small fraction of the total value locked (TVL) in the main Aave lending pool. While there were many headwinds facing the product due to timing and the general implosion of cryptocurrency markets in 2022 (here is a good post-mortem thread), the difficulty of scaling a market where participants must be hand-picked was one of the challenges. While ARC may have been one of the first and more prominent examples, competing projects are likely to face the same odds in bootstrapping their own identity systems. In fact they do not even stand to benefit from the work done by the ARC team: while participants went through rigorous checks to gain access to that walled garden, there is no reusable, portable identity resulting from that process. Absent an open and universally recognized KYC standard, each project is required to engage in a wasteful effort to field its own identity system.

In many ways, the situation is worse than the early days of web authentication. Before identity federation standards such as SAML and OAuth emerged to allow interoperability, every website resorted to building its own login solution. Not surprisingly, many of these were poorly designed and riddled with security vulnerabilities. Even in the best case when each system functioned correctly in isolation, collectively they burdened customers with the challenge of managing dozens of independent usernames and passwords. Yet web authentication is a self-contained problem, much simpler than trying to link online identities to real-world ones.

What about participants’ incentives for jumping through the necessary hoops for on-boarding? Putting aside ARC, there is a chicken-and-egg problem to bootstrapping any identity system. Without interesting applications that are gated on having that ID, participants have no compelling reason to sign up for one; the value proposition is not there. Meanwhile if few people have onboarded with that ID system, no developer wants to build an application limited to customers with one of those rare IDs— that would be tantamount to choking off your own customer acquisition pipeline. Typically this vicious cycle is only broken in one of two ways:

  1. An existing application with a proprietary identity system, which is already compelling and self-sustaining, sees value in opening up that system such that verified identities can be used elsewhere. (Either because it can monetize those identity services or due to competitive pressure from competing applications offering the same flexibility to their customers for free.) If there are multiple such applications with comparable criteria for vetting customers, this can result in an efficient and competitive outcome. Users are free to take their verified identity anywhere and participate in any permissioned application, instead of being held hostage by the specific provider who happened to perform the initial verification. Meanwhile developers can focus on their core competency— building innovative applications— instead of reinventing the wheel to solve an ancillary problem around limiting access to the right audience.
  2. New regulations are introduced, forcing developers to enforce identity verification for their applications. This will often result in an inefficient scramble for each service provider to field something quickly to avoid the cost of noncompliance, leaving little room for industry-wide cooperation or standards to emerge. Alternatively it may result in a highly centralized outcome. One provider specializing in identity verification may be in the right place at the right time when rules go into effect, poised to become the de facto gatekeeper for all decentralized apps. 

In the case of DeFi, this second outcome is looking increasingly more likely.

CP

Of Twitter bots, Sybil attacks and verified identities

Seeking a middle-ground for online privacy

The exact prevalence of bots has become the linchpin of Elon Musk’s attempt to back out of the proposed acquisition of Twitter. The existence of bots is not disputed by either side; the only question is what percentage of accounts they constitute. Twitter itself puts the figure around 5%, using a particular metric called “monetizable daily active users” or mDAU for calculating the ratio. Mr. Musk disputes that number and claims it is much higher, without citing any evidence despite having obtained access to raw data from Twitter for carrying out his own research.

Any discussion involving bots and fake accounts naturally leads to the question: why is Twitter not verifying all accounts to make sure they are actual humans? After all, the company already has a concept of verified accounts sporting a blue badge, to signal that the account really belongs to the person it claims to be. This deceptively simple question leads into a tangle of complex trade-offs around exactly what verification can achieve and whether it would make any difference to the problem Twitter is trying to solve.

First we need to clarify what is meant by bot accounts. Suppose there is a magical way to perform identity verification online. While not 100% reliable, such solutions already exist: cryptocurrency exchanges and other online financial platforms rely on them to stay on the right side of Know Your Customer (KYC) regulations. These include a mix of collecting information from the customer— such as the time-honored abuse of social security numbers for authentication— uploading copies of government-issued identity documents and cross-checking all this against information maintained by data brokers. None of this is free, but suppose Twitter is willing to fork over a few dollars per customer on the theory that the resulting ecosystem will be much more friendly to advertisers. Will that eliminate bots?

The answer is clearly no, at least not according to the straightforward definition of bots. Among other things, nothing stops a legitimate person from going through ID verification and then transferring control of their account to a bot. There need not be any nefarious intent behind this move. For example, it could be a journalist who sets up the account to tweet links to their articles every time they publish a new one. In fact the definition of “bot” itself is ambiguous. If software is designed to queue up tweets from the author and publish them verbatim at specific future times, is that a bot? What if the software augments or edits human-authored content instead of publishing it as-is? Automation is not the problem per se. Having accounts that are controlled by software— even software that is generating content automatically without human intervention— may be perfectly benign. The real questions are:

  1. Who is really behind this account?
  2. Why are they using automation to generate content?

Motivation is ultimately unknowable from the outside, but the first question can be tracked down to a name, either a person or a corporate entity. Until such time as we have sentient AI creating its own social-media accounts, there is going to be someone behind the curtain, accountable for all content spewing from that account. Identity verification can point to the person pulling the levers. (For now we disregard the very real possibility of verified accounts being taken over or even deliberately resold to another actor by the rightful owner.) But that knowledge alone is not particularly useful. What would Twitter do with the information that “nickelbackfan123” is controlled by John Smith of New York, NY? Short of instituting a totalitarian social-credit system along the lines of China’s to gate access to social networks, there is no basis for turning away Mr. Smith or treating him differently than any other customer. Even if ID verification revealed that the customer is a known persona non grata to the US government— a fugitive on the FBI most-wanted list or an OFAC-sanctioned oligarch— Twitter has no positive obligation to participate in some collective punishment process by denying them an online presence. Social media presence is not a badge of civic integrity or proof of upstanding character, a conclusion entirely familiar to anyone who has spent time online.

But there is one scenario where Twitter can and should preemptively block account creation. Suppose this is not the first account but the 17th one Mr. Smith is creating? (Let’s posit that all the other accounts remain active, and this is not a case of starting over. After all, in America we stand for second acts and personal reinvention.) If one person is simultaneously in control of dozens of accounts, the potential for abuse is high— especially when this link is not clear to followers. Looked at another way: there is arguably no issue with a known employee of the Russian intelligence agency GRU registering for a Twitter account and using their presence to push disinformation. The danger comes not from the lone nut-job yelling at the cloud— that is an inevitable part of American politics— but from one person falsely amplifying their message using hundreds of seemingly independent sock-puppet accounts. In the context of information security, this is known as a “Sybil attack:” one actor masquerading as thousands of different actors in order to confuse or mislead systems where equal weight is given to every participant. That makes a compelling case for verified identities online: not stopping bad actors from creating an account, but stopping them from creating the second, third or perhaps the one-hundredth sock-puppet account.

There is no magic “safe” threshold for duplicate accounts; it varies from scenario to scenario. Insisting on a one-person-one-account policy is too restrictive and does not take into account— no pun intended— the use of social media by companies, where one person may have to represent multiple brands in addition to maintaining their own personal presence. Even when restricting our attention to individuals, many prefer to maintain a separation between work and personal identities, with separate social media accounts for different facets of their life. Pet lovers often curate separate accounts for their favorite four-legged companions— accounts that often eclipse their own “real” stream in popularity. If we contain multitudes, it is only fair that Twitter allow a multitude of accounts. In other cases, even two is too many. If someone is booted off the platform for violating terms of service, posting hate speech or threatening other participants, they should not be allowed to rejoin under another account. (Harder question: should all personal accounts associated with that person on the platform be shuttered? Does Fido the dog get to keep posting pictures if his companion just got booted for spreading election conspiracies under a different account?)

Beyond real-names

So far the discussion about verified identity has focused only on the relationship between an online service such as Twitter and an individual or corporation registering for an account on that platform. But on social media platforms, the crucial connections run laterally, between different users of the platform as peers. It is one thing for Twitter to have some assurance about the real-world identity connected to a user. What about other participants on the platform?

One does not have to look back too far to see a large-scale experiment in answering that question in the affirmative, and to evaluate how well that turned out. Google Plus, the failed social networking experiment designed to compete against Facebook, is today best remembered as the punchline to jokes— if it is remembered at all. But at the time of its launch, G+ was controversial for insisting on the use of “real names”. Of course the company had no way to enforce this at the time. Very few Google services interacted with real-world identities by requiring payment or involving existing financial institutions. (The use of a credit card suddenly allows for cross-checking names against those already verified by another institution such as a bank. While there is no requirement that the name on a credit card be identical to that appearing on government-issued ID, it is a good proxy in most cases.) Absent such consistency checks, all that Google could do was insist that the same name be used across all services— if you are sending email as “John Smith” then your G+ name shall be John Smith.

Given how ineffective this is at stopping users from fabricating names at the outset, there had to be a process for flagging accounts violating this rule. That policing function was naturally crowd-sourced to customers, with the expectation that G+ users would “snitch” on each other by escalating matters to customer support with a complaint when they spotted users with presumably fake names. While it is unclear if this half-baked implementation would have prevented G+ from turning into the cesspool of conspiracy theories and disinformation that Facebook evolved into, it certainly resulted in one predictable outcome: haphazard enforcement, with allegations of real-name violations used to harass individuals defending unpopular views. In a sense G+ combined the worst of both worlds: weak, low-quality identity verification by the platform provider, coupled with a requirement for consistency between this “verified” identity known to Google and the outward projection visible to other users.

Yet one can also imagine alternative designs that decouple identity verification from the freedom to use pseudonyms or assumed nicknames. Twitter could be 100% confident that the person who signed up is a certain John Smith from New York City in the offline world, while still allowing that customer to operate under a different name as far as all other users are concerned. This affords a reasonable compromise between freedom in expressing identity and discouraging abuse: if Mr. Smith is booted from the platform for threatening speech under a pseudonym, he is not coming back under any other pseudonym. (There is also an additional deterrence factor at play: if the behavior warrants referral to law enforcement, the platform can provide meaningful leads on the identity of the perpetrator, instead of an IP address to chase down.)

This model still raises some thorny questions. What if John Smith deliberately adopts the name of another person in their online profile to mislead other participants? What if the target of impersonation is a major investor or political figure whose perceived opinions could influence others and impact markets? Even the definition of “impersonation” is unclear. If someone is publishing stock advice under the pseudonym “NotWarrenBuffett,” is that parody or deliberate attempt at market manipulation? But these are well-known problems for existing social media platforms. Twitter has developed the blue checkmark scheme to cope with celebrity impostors: accounts with the blue check have been verified to be accurately stating their identity while those without are… presumably suspect?

That leads to one of the unintended side-effects of ubiquitous identity verification. Discouraging the use of pseudonyms (because participants using a pseudonym are relegated to second-class citizenship on the platform compared to those using their legal name) may have a chilling effect on expression. This is less a consequence of verified identities and more about the impact of making the outcome of that process prominently visible— the blue badge on your profile. Today the majority of Twitter accounts are not verified. While the presence of a blue badge elevates trust in a handful of accounts, its absence is not perceived as casting doubt on the credibility of the speaker. This is not necessarily by design, but an artifact of the difficulty of doing robust verification at scale (just ask cryptocurrency exchanges), especially for a service reliant on advertising revenue, where there is no guarantee the sunk cost can be recouped over the lifetime of the customer. In a world where most users sport the verification badge by agreeing to include their legal name in a public profile, those dynamics will get inverted: not disclosing your true identity will be seen as suspect and reduce the initial credibility assigned to the speaker. Given the level of disinformation circulating online, that increased skepticism may not be a bad outcome.

CP

Smart-cards vs USB tokens: esoteric form factors (part III)

[continued from part II]

A smart-card is a smart-card is a…

Once we take the literal interpretation of “card” out of smart-card, we can see the same underlying IC appearing in a host of other form factors. Here are some examples, roughly in historical order of appearance.

  • Trusted Platform Module or TPM. Defined by the Trusted Computing Group in the early 2000s, the TPM provides separate hardware intended to act as a root of trust and provide security services such as key management to the primary operating system. Their first major application was BitLocker full-disk encryption in the ill-fated Windows Vista operating system. In an entertaining example of concepts coming full-circle, Windows 7 introduced the notion of virtual smart-cards which leverage the TPM to simulate smart-cards for scenarios such as remote authentication.
  • Electronic passports, or Machine Readable Travel Documents (MRTD) as they are officially designated by the standardizing body ICAO. This is an unusual scenario where NFC is the only interface to the chip; there is no contact plate to interface with vanilla card readers.
  • Embedded secure elements on mobile devices. While it is possible to view these as “TPM for phones,” the integration model tends to be very different. In particular TPMs are tightly integrated into the boot process for PCs to provide measured-boot functionality, while the eSE has historically been limited to a handful of scenarios such as payments. Nokia and other manufacturers experimented with SEs in the late 2000s, before Google Wallet first introduced it to the US market at scale on Android devices. Apple Pay followed suit a few years later. The SE is effectively a dual-interface card. The “contact” interface is permanently connected to the mobile operating system (eg Android or iOS) while the contactless interface is hooked to the NFC antenna, with one caveat: there is an NFC controller in the middle actively involved in the communication. That controller can alter communications, for example directing traffic either to the embedded SE or the host operating system depending on which “card” application is being accessed. By contrast a vanilla card usually has no controller standing between the antenna and the chip.

One vulnerability, multiple shapes

An entertaining example of the shared heritage of this hardware involves the ROCA (Return of Coppersmith’s Attack) vulnerability from 2017. Researchers discovered a vulnerability in the RSA key generation logic of Infineon chips. This was not a classic case of randomness failure: the hardware did not lack for entropy, for a change. Instead it had faulty logic that over-constrained the large prime numbers chosen to create the RSA modulus. It turns out that moduli generated as the product of such primes were vulnerable to an ancient attack that allowed efficient factoring, breaking the security of such keys. This one bug affected a wide range of products all based on the same underlying platform, from national eID cards to TPMs to USB tokens.

One bug, one secure hardware platform, multiple manifestations.
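
The flawed key generation also left a telltale fingerprint that can be checked from the public modulus alone. Here is a simplified sketch of that test in Python, an illustration based on the detection approach described in the ROCA paper rather than the authors’ actual tool, with an abbreviated prime list:

    # Vulnerable primes had the form k*M + (65537^a mod M) for a primorial M.
    # Consequence: for every small prime p dividing M, the modulus N = p1*p2
    # lands in the small multiplicative subgroup generated by 65537 mod p.
    SMALL_PRIMES = [11, 13, 17, 19, 37, 53, 61, 71, 73, 79, 97, 103, 107, 109]

    def subgroup_generated_by(g, p):
        """Multiplicative subgroup generated by g modulo p."""
        members, x = set(), 1
        while x not in members:
            members.add(x)
            x = (x * g) % p
        return members

    def matches_roca_fingerprint(n):
        """True if modulus n is consistent with the Infineon key structure.
        A healthy random modulus fails at least one of these membership
        checks with overwhelming probability."""
        return all(n % p in subgroup_generated_by(65537, p)
                   for p in SMALL_PRIMES)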

Beyond form-factor: trusted user interface

There are some USB tokens with additional functionality that at least theoretically improves security compared to using vanilla cards. Most of these involve the introduction of a trusted user interface, augmenting the token with input/output devices so it can communicate information with the user independently of the host. Consider a scenario where digitally signing a message authorizes the transfer of money to a specified bank account. In addition to securing the secret key material that generates those signatures, one would have to be very careful about the messages being signed. An attacker does not need access to the key bits to wreak havoc, although that is certainly sufficient. They can trick the authorized owner of the key into signing a message with altered contents, diverting funds into a different account without ever gaining direct access to the private key.

This problem is difficult to solve with standard smart-cards, since the message being signed is delivered by the host system the card is attached to. On-card applications assume the message is correct. With dual-interface cards, there are some kludgy solutions involving the use of two different hosts. For example, the message can be streamed over the contact interface initially, but must be confirmed again over the contactless interface. This allows using a second machine to verify that the first one did not submit a bogus message. Note that dual-interface is necessary for this, since the card can not otherwise tell the difference between initial delivery and secondary approval.

A much better solution is to equip the “card” with UI that can display pertinent details about the message being signed, along with a mechanism for users to accept or cancel the operation. The Feitian LCD PKI token is an example of a USB token with exactly that capability. It features a simple LCD display and buttons. On-board logic parses and extracts crucial fields from messages submitted for signing, such as amount, currency and recipient information in the case of a money transfer. Instead of immediately signing the message, the token shows a summary on its display and waits for affirmative approval from the user via one of the buttons. (Those buttons are physically part of the token itself. Malicious code running on the host can not fake a button press.) Similar ideas have been adopted for cryptocurrency hardware wallets, with one important difference: most of those wallets are not built around a previously validated smart-card platform. The difference is apparent in the sheer number of trivial & embarrassing vulnerabilities in such products that have long been eliminated in more mature market segments such as EMV.
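
In pseudocode terms, the firmware flow amounts to “never sign what the user has not confirmed on trusted hardware.” A minimal sketch in Python, where the pipe-delimited message format and the lcd/buttons abstractions are hypothetical stand-ins rather than Feitian’s actual firmware interface:

    from dataclasses import dataclass

    @dataclass
    class Transfer:
        amount: str
        currency: str
        recipient: str

    def parse_transfer(raw: bytes) -> Transfer:
        # App-specific parsing hard-coded in the token; a malicious host
        # can choose the message but not how it is rendered to the user.
        amount, currency, recipient = raw.decode().split("|")
        return Transfer(amount, currency, recipient)

    def handle_sign_request(raw: bytes, lcd, buttons, sign) -> bytes:
        tx = parse_transfer(raw)
        lcd.show(f"Pay {tx.amount} {tx.currency} to {tx.recipient}?")
        if not buttons.confirm():   # physical press; host software can not fake it
            raise RuntimeError("user rejected transaction")
        return sign(raw)            # only now does the private key get used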

The challenge for all of these designs is that they are necessarily application-specific, because the token must make sense of the message and summarize it on the display in a succinct way for the user to make an informed decision. For all the time standards groups have spent putting angle brackets (XML) or curly braces (JSON) around data, there is no universal standard format for encoding information, much less its semantics. This requires at least some parsing and presentation logic hard-coded in the token itself, in a way that can not be influenced by the host. Otherwise if a malicious host could influence presentation on the display, it could make an unauthorized transaction appear benign, defeating the whole point of having a human in the loop to sanity-check what gets signed. Doing this in a general and secure way outside narrow applications such as cryptocurrency is an open problem.

CP

Smart-cards vs USB tokens: optimizing for logical access (part II)

[continued from part I]

The problem with cards

The card form-factor, or “ID1” as it is officially designated, is great for converged access: using the same credential for physical and logical access. Cards can open real doors in the physical world and virtual doors online. But they have an ungainly requirement: a smart-card reader to function properly. For physical access this is barely noticed: there is a badge-reader mounted somewhere near the door that everyone uses. Employees only need to remember to bring their card, not their own reader. For logical access, it is more problematic. While card-readers can be quite compact these days, it is still one more peripheral to carry around. Every person who needs to access an online resource gated by the card— connect to the company VPN, login to a protected website or decrypt a piece of encrypted email— also needs a reader. For the disappearing breed of fixed-base equipment such as desktop PCs and workstations, one could simply have a reader permanently glued to every desk.

Yet the modern work-force is highly mobile. Many companies only issue laptops to their employees, or at least expect that their personnel can function equally effectively when they are away from the office— good luck getting to that workstation in the office during an extended pandemic lockdown. While a handful of laptops can be configured with built-in readers, the vast majority are going to require a reader dangerously dangling from the side, ready to get snapped off, hanging on to a USB-to-USB-C adapter since most readers were not designed for the newer ports. There is also the hardware compatibility issue to worry about, since most manufacturers used to target Windows and rely on automatic driver installation through plug & play. (This is largely a solved problem today. Most readers comply with the CCID standard which has solid support through the open-source libccid package on Linux & OSX. Even special snowflake hardware is accompanied by drivers for Linux.)

USB tokens

This is where USB tokens come in handy. Tokens combine the reader and “card” into a single device. That might seem more complex and fragile but it actually simplifies detection from the point of view of the operating system. A reader may or may not have a card present, so the operating system must handle insertion and removal events. USB tokens appear as a reader with a card permanently present, removing one variable from the equation. In addition to featuring the same type of secure IC found in a card, these devices also have a USB controller sitting in front of that chip to mediate communication. Usually the controller does nothing fancier than “protocol arbitrage:” taking messages delivered over USB from the host and relaying them to the secure IC using ISO7816, the common protocol supported by smart-cards. But sometimes controllers can augment functionality, by presenting new interfaces such as simulating a keyboard to inject one-time passcodes.
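
This equivalence is visible from the host side. With the open-source pyscard library, the code for talking to a PIV token is identical to the code for talking to a PIV card sitting in a standalone reader. A sketch, assuming the token is the first reader enumerated:

    from smartcard.System import readers

    # PIV application identifier (AID) from NIST SP 800-73, truncated form
    PIV_AID = [0xA0, 0x00, 0x00, 0x03, 0x08, 0x00, 0x00, 0x10, 0x00]

    reader = readers()[0]            # USB token shows up as just another reader
    conn = reader.createConnection()
    conn.connect()                   # the "card" is always present in a token

    # ISO7816 SELECT command: CLA=00 INS=A4 P1=04 (select by AID) P2=00
    select = [0x00, 0xA4, 0x04, 0x00, len(PIV_AID)] + PIV_AID
    data, sw1, sw2 = conn.transmit(select)
    print(f"SELECT PIV returned SW={sw1:02X}{sw2:02X}")   # 9000 means success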

Some examples such as the Safenet eToken used in SWIFT authentication define their own card application standard. More often vendors choose to follow an existing standard that enjoys widespread software support. The US government PIV standard is a natural choice, given its ubiquity and out-of-box support on Windows without requiring any additional driver installation. In the late 2000s GoldKey became an early example of offering a PIV implementation in USB token format; they also went to the trouble of getting FIPS140 certification for their hardware. Taglio PIVKey followed shortly afterwards with USB tokens based on Feitian and later NXP platforms. Eventually other vendors such as Yubico copied the same idea by adding PIV functionality to their existing line.

In principle most USB tokens have no intrinsic functionality or security advantages over the equivalent hardware in card form. They are just a repackaging of the exact same secure IC running the exact same application. Extracting secrets from the token can not be appreciably more difficult than extracting secrets from the equivalent card. If anything, the introduction of an additional controller to handle USB communication can only make matters worse. That controller is privy to sensitive information flowing across the interface. For example when the user enters a PIN to unlock the card, that information is visible to the USB controller. Likewise if the token is used for encryption, all decrypted messages pass through the secondary controller. So the USB token arguably has a larger attack surface, involving an additional chip to target. Unlike the smart-card IC which has been developed with security and tamper-resistance objectives, this second chip is a sitting duck.

Usability matters

Yet from a deployment perspective, USB tokens greatly simplify rolling out strong two-factor authentication in an enterprise. By eliminating the ungainly card reader, they improve usability. Employees only need to carry around one piece of hardware that is easily transported and could even remain permanently attached to their laptop, albeit at the cost of slightly increasing certain risks from compromised devices. Examples:

  • In 2014 Airbnb began issuing GoldKey PIV tokens to employees for production SSH access. The Airbnb fleet is almost exclusively based on MacBooks. While the office space included fixed infrastructure such as monitors and docking stations at every desk, none of that would have been available for employees working from home or traveling— not uncommon for a company in the hospitality business.
  • Regulated cryptocurrency exchange & custodian Gemini issues PIV tokens to employees for access to restricted systems used for administering the exchange.

The common denominator for both of these scenarios is that logical access takes precedence over physical access. Recall that the CAC & PIV programs came to life under the auspices of the US defense department. Defense use-cases are traditionally driven by a territorial mindset of controlling physical space and limiting infiltration of that area by flesh-and-blood adversaries. That makes sense when valuable assets are tangible objects, such as weapons, ammunition and power plants. Technology and finance companies have a different threat model: their most valuable assets are not material in nature. It is not the office building or literal bars of gold sitting in a vault that need to be defended; even companies directly trading in commodities such as gold and oil frequently outsource actual custody of raw material to third-parties. Instead the nightmare scenarios revolve around remote attackers getting access to digital assets: accessing customer information, stealing intellectual property or manipulating payment systems to inflict direct monetary damages. When locking down logical access is the overarching goal, the usability and deployment advantages of tokens outweigh their incompatibility with physical-access infrastructure.

[continued]

CP

Updated: June 20th, with examples of hardware token models


Smart-cards vs USB tokens: when form factor matters (part I)

Early on in the development of cryptography, it became clear that off-the-shelf hardware intended for general-purpose computing was not well suited to safely managing secrets. Smart-cards were the answer: a miniature computer with its own modest CPU, modest amount of RAM and persistent storage, packaged into the shape of an ordinary identity card. Unlike ordinary PCs, these devices were not meant to be generic computers, with end-user choice of applications. Instead they were preprogrammed for a specific security application, such as credit-card payments in the case of chip & PIN or identity verification in the case of the US government identification programs.

Smart-cards solved a vexing problem in securing sensitive information such as credentials and secret keys: bits are inherently copyable. Smart-cards created an “oracle” abstraction for using secrets while making it difficult for attackers to extract that secret from the card. The point-of-sale terminal at the checkout counter can ask this oracle to authorize a payment, or the badge-reader mounted next to a door can ask the card for proof that it is in possession of a specific credential. But they can not ask the card to cough up the underlying secret used in those protocols. Not even the legitimate owner of the card can do that, at least not by design. Tamper-resistance features are built into the hardware design to frustrate attempts to extract any information from the card beyond what the intended interface exposes.
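
A minimal software sketch of that oracle abstraction, illustrative only: on real cards the boundary is enforced by tamper-resistant silicon rather than language visibility rules.

    import hashlib, hmac, os

    class CardOracle:
        """Answers challenges with the secret, never reveals the secret."""

        def __init__(self):
            self.__key = os.urandom(32)     # generated inside, never exported

        def authorize(self, challenge: bytes) -> bytes:
            # Proof of possession: a verifier holding a shared copy of the
            # key can check this, but the response does not leak the key.
            return hmac.new(self.__key, challenge, hashlib.sha256).digest()

        # Deliberately no get_key() method. In software this is a mere
        # convention; in a smart-card, hardware makes it (nearly) binding.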

“Insert or swipe card”

The original card form-factor proved convenient in many scenarios where credentials were already carried on passive media in the shape of cards. Consider the ubiquitous credit card. Digits of the card number, expiration date and identity of the cardholder are embossed on the card in raised lettering for the ultimate low-tech payment solution, where a mechanical reader presses the card imprint through carbon paper on a receipt. Magnetic stripes were added later for automated reading and slightly more discreet encoding of the same information. We can view these as two different “interfaces” to the card. Both involve retrieving information from a passive storage medium on the card. Both are easy targets for perpetrating fraud at scale. So when it came time for card networks to switch to a better cryptographic protocol to authorize payments, it made sense to keep the form factor constant while introducing a third interface to the card: the chip. By standardizing protocols for chip & PIN and incentivizing/strong-arming participants in the ecosystem to adopt newer smart-cards, the payment industry opened up a new market for secure IC manufacturers.

In fact EMV ended up introducing two additional interfaces, contact and contactless. As the name implies, the first one uses direct contact with a brass plate mounted on the card to communicate with the embedded chip. The latter uses a wireless communication protocol called NFC, or Near Field Communication, for shuttling the same bits back and forth. (The important point however is that in both cases those bits typically arrive at the same piece of hardware: while there exist some stacked designs where the card has two different chips operating independently, in most cases a single “dual-interface” IC handles traffic from both interfaces.)

Identity & access management

Identity badges followed a similar evolution. Federal employees in the US have always had to wear and display their badges for access to controlled areas. When presidential directive HSPD-12 called for raising the bar on authentication in the early 2000s, the logical next step was upgrading the vanilla plastic cards to smart-cards that supported public-key authentication protocols. This is what the Common Access Card (CAC) and its later incarnation Personal Identity Verification (PIV) systems achieved.

With NFC, smart-cards can open doors at the swipe of a badge. But they can do a lot more when it comes to access control online. In fact the most common use of smart-cards prior to the CAC/PIV programs was in the enterprise space. Companies operating in high-security environments issued smart-cards to employees for accessing information resources. Cards could be used to login to a PC, access websites through a browser and even encrypt email messages. In many ways the US government was behind the curve in adoption of smart-cards for logical access. Several European countries had already deployed large-scale electronic ID or eID systems for all citizens, offering government services online accessed using those cards.

Can you hear me now?

Another early application of smart-card technology for authentication appeared in wireless communication, with the ubiquitous SIM card. These cards hold the cryptographic secrets for authenticating the “handset”— in other words, a cell phone— to the wireless carrier providing service. Early SIM cards had the same dimensions as identity cards, called ID1. As cell-phones miniaturized to the point where some flip-phones were smaller than the card itself, SIM cards followed suit. The second generation of “mini-SIM” looked very much like a smart-card with the plastic surrounding the brass contact-plate trimmed away. In fact many SIM cards are still delivered this way, as a full-size card with a SIM-sized section that can be removed. Over time standards were developed for even smaller form-factors designated “micro” and “nano” SIM, chipping away at the empty space surrounding the contact plate.

This underscores the point that most of the space taken up by the card is wasted; it is inert plastic. All of the functionality is concentrated in a tiny area where the contact plate and secure IC are located. The rest of the card can be etched, punched or otherwise cut with no effect on functionality. (Incidentally this statement is not true of contactless cards, because the NFC antenna runs along the circumference of the card. This maximizes the surface area covered by the antenna, and with it NFC performance: recall that these cards have no battery or other internal source of power. When operating in contactless mode, the chip relies on induction to draw power from the electromagnetic field generated by the reader. Antenna size is a major limiting factor, which is why most NFC tags and cards have a loop antenna that closely follows the external contours of its physical shape.)

[continued]

CP

Filecoin, StorJ and the problem with decentralized storage (part II)

[continued from part I]

Quantifying reliability

Smart-contracts can not magically prevent hardware failures or compel a service provider at gunpoint to perform the advertised services. At best blockchains can facilitate contractual arrangements with a fairness criterion: the service provider gets paid if and only if they deliver the goods. Proofs-of-storage verified by a decentralized storage chain are an example of that model. It keeps service providers honest by making their revenue contingent on living up to the stated promise of storing customer data. But as the saying goes, past performance is no guarantee of future results. A service provider can produce the requisite proofs 99 times and then report all data is lost when it is time for the next one. This can happen because of an “honest” mistake or, more troubling, because it is more profitable for decentralized providers to break existing contracts.
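
To make the proof-of-storage idea concrete, here is a toy challenge-response scheme built on a Merkle tree. This is a deliberately simplified sketch; production systems such as Filecoin use far more elaborate proofs of replication and spacetime.

    import hashlib

    def H(data: bytes) -> bytes:
        return hashlib.sha256(data).digest()

    def build_tree(chunks):
        """All levels of a Merkle tree over the file chunks, leaves first."""
        levels = [[H(c) for c in chunks]]
        while len(levels[-1]) > 1:
            prev = levels[-1]
            if len(prev) % 2:
                prev = prev + [prev[-1]]          # duplicate odd node
            levels.append([H(prev[i] + prev[i + 1])
                           for i in range(0, len(prev), 2)])
        return levels

    def prove(levels, index):
        """Provider's response to 'show me chunk #index': sibling path."""
        path = []
        for level in levels[:-1]:
            if len(level) % 2:
                level = level + [level[-1]]
            sibling = index ^ 1
            path.append((level[sibling], sibling < index))
            index //= 2
        return path

    def verify(root, chunk, path):
        """Buyer checks the response against the root committed on-chain."""
        node = H(chunk)
        for sibling, sibling_is_left in path:
            node = H(sibling + node) if sibling_is_left else H(node + sibling)
        return node == root

    # The buyer keeps only the 32-byte root; each round they challenge a
    # random chunk and the provider must answer with chunk plus path.
    chunks = [bytes([i]) * 64 for i in range(8)]     # stand-in for file data
    levels = build_tree(chunks)
    root = levels[-1][0]
    assert verify(root, chunks[5], prove(levels, 5))

Failing to answer a challenge on schedule means no payment for that round, which is exactly the fairness criterion described above.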

When it comes to honest mistakes— random hardware failures resulting in unrecoverable data loss— the guarantees that can be provided by decentralized storage alone are weaker. This follows from a limitation of existing decentralized designs: their inability to express the reliability of storage systems, except in the most rudimentary ways. All storage systems are subject to risks of hardware failure and data loss. That goes for AWS and Google. For all the sophistication of their custom-designed hardware, they are still subject to the laws of physics. There is still a mean-time-to-failure associated with every component. It follows there must be a design in place to cope with those failures across the board, ranging from making regular backups to having diesel generators ready to kick in when the grid power fails. We take for granted the existence of this massive infrastructure behind the scenes when dealing with the likes of Amazon. There is no such guarantee for a random counterparty on the blockchain.

Filecoin uses a proof-of-replication intended to show that not only does the storage provider have the data, but that they have multiple copies. (Ironically that imposes even more work on the storage provider to format data for storage— otherwise they could fool the test by re-encrypting one copy into multiple replicas when necessary— further pushing the economics away from the allegedly zero marginal cost.) That may seem comparable to the redundancy of AWS but it is not. Five disks sitting in the same basement hooked up to the same PC can claim “5-way replication.” But that is not meaningful redundancy, because all five copies are subject to correlated risk, one lightning-strike or ransomware infection away from total data loss. By comparison Google operates data-centers around the world and can afford to put each of those five copies in a different facility. Each one of those facilities still has a non-zero chance of burning to the ground or losing power during a natural disaster. As long as the locations are far enough from each other, those risks are largely uncorrelated. That key distinction is lost in the primitive notion of “replication” expressed by smart-contracts.
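
A back-of-the-envelope calculation shows how much that distinction matters. The 1% annual loss figure here is an assumption chosen for illustration, not a measured failure rate:

    # Assume each storage site independently suffers catastrophic loss
    # (fire, flood, ransomware) with 1% probability per year.
    p_loss = 0.01

    # Five copies in five geographically separate facilities: all must fail.
    independent = p_loss ** 5

    # Five "replicas" on five disks in one basement: one event kills all.
    correlated = p_loss

    print(f"independent replicas lost: {independent:.0e} per year")  # 1e-10
    print(f"correlated replicas lost:  {correlated:.0e} per year")   # 1e-02
    # Same nominal replication factor, eight orders of magnitude apart.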

Unreliable by design

Reliability questions aside, there is a more troubling problem with the economics of decentralized storage. It may well be the most rational— read: profitable— strategy to operate an unreliable service deliberately designed to lose customer data. Here are two hypothetical examples to demonstrate the notion that on a blockchain, there is no success like failure.

Consider a storage system designed to store data and publish regular proofs of storage as promised, but with one catch: it would never return that data if the customer actually requested it. (From the customer perspective: you have backups but unbeknownst to you, they are unrecoverable.) Why would this design be more profitable? Because streaming a terabyte back to the customer is dominated by an entirely different type of operational expense than storing that terabyte in the first place: network bandwidth. It may well be profitable to set up a data storage operation in the middle-of-nowhere with cheap real-estate, abundant power but expensive bandwidth. Keeping data in storage while publishing the occasional proof involves very little bandwidth, because proof-of-storage protocols are very efficient in space. The only problem comes up if the customer actually wants their entire data streamed back. At that point a different cost structure involving network bandwidth comes into play and it may well be more profitable to walk away.

To make this more concrete: at the time of writing AWS charges ~1¢ per gigabyte per month for “infrequently accessed” data but 9¢ per gigabyte of data outbound over the network. Conveniently inbound traffic is free; uploading data to AWS costs nothing. As long as the prevailing Filecoin market price is higher than S3 prices, one can operate a Filecoin storage miner on AWS to arbitrage the difference— this is exactly what Dropbox used to do before figuring out how to operate its own datacenter. The only problem with this model is if the customer comes calling for their data too early or too often. In that case the bandwidth costs may well disrupt the profitability equation. If streaming the data back would lose money overall on that contract, the rational choice is to walk away.
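
Plugging in those price points for a hypothetical one-terabyte contract, where the contract rate is an assumption chosen for illustration:

    # S3 price points quoted above
    storage_cost = 0.01      # $/GB-month, infrequent-access tier
    egress_cost = 0.09       # $/GB streamed back out to the internet

    contract_price = 0.015   # $/GB-month paid by the customer (assumed)
    size_gb, months = 1024, 12

    revenue = contract_price * size_gb * months     # $184.32
    holding = storage_cost * size_gb * months       # $122.88
    egress = egress_cost * size_gb                  # $92.16, one time

    print(f"profit if data is never retrieved: ${revenue - holding:.2f}")
    print(f"profit if customer wants it back:  ${revenue - holding - egress:.2f}")
    # +$61.44 vs -$30.72: serving the one request that matters flips the
    # contract from profitable to loss-making, so the rational move is to walk.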

Walking away from the contract for profit

Recall that storage providers are paid in funny money, namely the utility token associated with the blockchain. That currency is unlikely to work for purchasing anything in the real world and must be converted into dollars, euros or some other unit of measure accepted by the utility company to keep the datacenter lights on. That conversion in turn hinges on a volatile exchange rate. While there are reasonably mature markets in major cryptocurrencies such as Bitcoin and Ethereum, the tail-end of the ICO landscape is characterized by thin order-books and highly speculative trading. Against the backdrop of what amounts to an extreme version of FX risk, the service provider enters into a contract to store data for an extended period of time, effectively betting that the economics will work out. It need not be profitable today but perhaps it is projected to become profitable in the near future based on rosy forecasts of prices going to the moon. What happens if that bet proves incorrect? Again the rational choice is to walk away from the contract and drop all customer data.

For that matter, what happens when a better opportunity comes along? Suppose the exchange rate is stable or those risks are managed using a stablecoin, while the market value of storage increases. Buyers are willing to pay more of the native currency per byte of data stashed away. Or another blockchain comes along, promising more profitable utilization of spare disk capacity. That may seem like great news for the storage provider except for one problem: they are stuck with existing customers paying lower rates negotiated earlier. The optimal choice is to renege on those commitments: delete existing customer data and reallocate the scarce space to higher-paying customers.

It is not clear if blockchain incentives can be tweaked to discourage this without creating unfavorable dynamics for honest service providers. Suppose we impose penalties on providers for abandoning storage contracts midway. These penalties can not be clawback provisions for past payments; the provider may well have already spent that money to cover operational expenses. For the same reason, it is not feasible to withhold payment until the very end of the contract period without creating the risk that the buyer may walk away. Another option is requiring service providers to put up a surety bond. Before they are allowed to participate in the ecosystem, they must set aside a lump sum on the blockchain held in escrow. These funds would be used to compensate any customers harmed by failure to honor storage contracts. But this has the effect of creating additional barriers to entry and locking away capital in a very unproductive way. Similarly the idea of taking monetary damages out of future earnings does not work. It seems plausible that if a service provider screws over Alice because Bob offered a better price, recurring fees paid by Bob should be diverted to compensate Alice. But the service provider can trivially circumvent that penalty while still doing business with Bob: just start over with a new identity completely unlinkable to that “other” provider who screwed over Alice. To paraphrase the New Yorker cartoon on identity: on the blockchain, nobody knows you are a crook.

Reputation revisited

Readers may object: surely such an operation will go out of business once the market recognizes their modus operandi and no one is willing to entrust them with storing data? Aside from the fact that the lack of identity on a blockchain renders “going out of business” meaningless, this posits there is such a thing as “reputation” buyers take into account when making decisions. The whole point of operating a storage market on chain is to allow customers to select the lowest bidder while relying on the smart-contract logic to guarantee that both sides hold up their side of the bargain. But if we are to invoke some fuzzy notion of reputation as a selection criterion for service providers, why bother with a blockchain? Amazon, MSFT and Google have stellar reputations in delivering high-reliability, low-cost storage, with no horror stories of customers randomly getting ripped off because Google decided one day it would be more profitable to drop all of their files. Not to mention, legions of plaintiffs’ attorneys would have a field day with any US company that reneged on contracts in such cavalier fashion, assuming a newly awakened FTC does not get in on the action first. There is no reason to accept the inefficiencies of a blockchain or invoke elaborate cryptographic proofs if reputation is a good enough proxy.

CP


Off-by-one: the curious case of 2047-bit RSA keys

This is the story of an implementation bug discovered while operating an enterprise public-key infrastructure system. It is common in high-security scenarios for private keys to be stored on dedicated cryptographic hardware rather than managed as ordinary files on the commodity operating system. Smart-cards, hardware tokens and TPMs are examples of popular form factors. In this deployment, every employee was issued a USB token designed to connect to old-fashioned USB-A ports. USB tokens have a usability advantage over smart-cards in situations when most employees are using laptops: there is no separate card reader required, eliminating one more piece of gear to lug around. The gadget presents itself to the operating system as a combined card reader with a card always present. Cards on the other hand have an edge for “converged access” scenarios involving both logical and physical access. Dual-interface cards with NFC can also be tapped against badge readers to open doors. (While it is possible to shoe-horn NFC into compact gadgets and this has been done, physical constraints on antenna size all but guarantee poor RF performance. Not to mention one decidedly low-tech but crucial aspect of an identity badge is having enough real-estate for the obligatory photograph and name of the employee spelled out in a legible font.)

The first indication of something awry with the type of token used came from a simple utility rejecting the RSA public-key for a team member. That public-key had been part of a pair generated on the token, in keeping with the usual provisioning process that guarantees keys live on the token throughout their entire lifecycle. To recap that sequence:

  • Generate a key-pair with the desired characteristics, in this case 2048-bit RSA. This can be surprisingly slow with RSA, on the order of 30 seconds, considering the hardware in question is powered by relatively modest SoCs under the hood.
  • Sign a certificate signing-request (CSR) containing the public-key. This is commonly done as a single operation at the time of key-generation, due to an implementation quirk: most card standards such as PIV require a certificate present before clients can use the card, because they do not have a way to identify private-keys in isolation.
  • Submit that CSR to the enterprise certificate authority to obtain a certificate. In principle certificates can be issued out of thin air. In reality most CA software can only accept a CSR containing the public-key of the subject, signed with the corresponding private key— and they will verify that criterion.
  • Load the issued certificate on the token. At this point the token is ready for use in any scenario demanding PKI credentials, be it VPN, TLS client authentication in a web-browser, login to the operating system or disk-encryption.

On this particular token, that sequence resulted in a 2047-bit RSA key, one bit off the mark and falling short of the NIST recommendations to boot. A quick glance showed the provisioning process was not at fault. Key generation was executed on Windows using the tried-and-true certreq utility. (Provisioning new credentials is commonly under-specified compared to steady-state usage of existing credentials, and vendors often deign to publish software for Windows only.) That utility takes an INF file as configuration, specifying the type of key to generate. A look at the INF file confirmed the number 2048 had not bit-rotted into 2047.
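
For reference, a representative certreq INF file looks like this. This is an illustrative sketch; the subject and provider names are placeholders, not the actual deployment configuration:

    ; request.inf -- consumed by: certreq -new request.inf request.csr
    [NewRequest]
    Subject = "CN=Jane Doe, OU=Engineering, O=Example Corp"
    KeyAlgorithm = RSA
    KeyLength = 2048                ; the instruction the token ignored
    KeySpec = 1                     ; AT_KEYEXCHANGE
    Exportable = FALSE              ; private key never leaves the hardware
    ProviderName = "Microsoft Base Smart Card Crypto Provider"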

Something else lower in the stack was ignoring those instructions or failing to generate keys according to the specifications. Looking through other public-keys in the system showed that this was not in fact an isolated case. The culprit appeared to be the key-generation logic on the card itself.
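
Sweeping a directory of issued certificates for stragglers takes only a few lines with the pyca/cryptography package. A sketch, where the directory layout is an assumption:

    import glob

    from cryptography import x509

    for path in sorted(glob.glob("issued/*.pem")):
        with open(path, "rb") as f:
            cert = x509.load_pem_x509_certificate(f.read())
        bits = cert.public_key().key_size
        if bits != 2048:
            print(f"{path}: {bits}-bit modulus")   # flags the 2047-bit keys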

Recall that when we speak of a “2048-bit RSA key,” the count refers to the size of the modulus. An RSA modulus is the product of two large primes of comparable size. Generating a 2048-bit RSA key then comes down to generating two random primes of half that size, at 1024 bits, and multiplying those two values together.

There is one catch with this logic: the product of two numbers, each N bits in size, is not guaranteed to be 2·N bits. That intuitive-sounding 2·N result is an upper bound: the actual product can be 2·N or 2·N−1 bits. Here is an example involving tractable numbers and the more familiar decimal notation. It takes two digits to express the prime numbers 19 and 29. But multiplying them we get 19 * 29 = 551, a number spanning three digits instead of four. By contrast the product of the two-digit primes 37 and 59 is 2183, which is four digits as expected.

Informally, we can say not all N-bit numbers are alike. Some are “small,” meaning they are close to the lower bound of 2^(N−1), the smallest possible N-bit number. At the other end of the spectrum are “large” N-bit numbers, closer to the high end of the permissible range just below 2^N. Multiplying large N-bit numbers produces the expected 2·N-bit product, while multiplying small ones can fall short of the goal.

RSA implementations commonly correct for this by setting some of the leading bits of the prime to 1, forcing each generated prime to be “large.” In other words, the random primes are not selected from the full interval [2^(N−1), 2^N − 1] but from a tighter interval that excludes “small” primes. (Why not roll the dice on the full interval and check the product after the fact? See the earlier point about the time-consuming nature of RSA key generation. Starting over from scratch is expensive.) This is effectively an extension of logic that is already present for prime generation, namely setting the most significant bit to one. Otherwise the naive way to “choose a random N-bit prime” by considering the entire interval [0, 2^N − 1] can result in a much shorter prime, one that begins with an unfortunate run of leading zeroes. That guarantees failure: if one of the factors is strictly less than N bits, the final modulus can never hit the target of 2·N bits.

So we know this token has a design flaw in prime generation that occasionally outputs a 2047-bit modulus when asked for 2048. How occasional? If it were only setting the MSB to one and otherwise picking primes uniformly in the permissible interval, the error rate can be approximated by the probability that two random variables X and Y, selected independently and uniformly from the range [1, 2], have a product less than 2. Visualized geometrically, this is the area under the curve x·y < 2 in a square region defined by sides in the same interval. That is a standard calculus problem that can be solved by integration; the answer works out to 2·ln 2 − 1, predicting roughly 39% of RSA moduli falling short by one bit. That fraction is not consistent with the observed frequency, which is closer to 1 in 10, an unlikely outcome from a Bayesian perspective if this were an accurate model for what the token is doing. (Note that if the two leading bits were forced to one on both primes, the error case would be completely eliminated. From that perspective, the manufacturer was “off-by-one” according to more than one meaning of the phrase.)
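
Both claims are easy to sanity-check empirically. A quick Monte Carlo sketch, where random odd integers stand in for primes, which does not change the bit-length argument:

    import math
    import random

    N, TRIALS = 1024, 100_000

    def candidate(forced_top_bits: int) -> int:
        """Random odd N-bit integer with the given number of leading 1 bits."""
        x = random.getrandbits(N) | 1
        for i in range(forced_top_bits):
            x |= 1 << (N - 1 - i)
        return x

    for forced in (1, 2):
        short = sum(
            (candidate(forced) * candidate(forced)).bit_length() < 2 * N
            for _ in range(TRIALS)
        )
        print(f"{forced} forced leading bit(s): {short / TRIALS:.3f} short moduli")

    print(f"calculus prediction for 1 bit: {2 * math.log(2) - 1:.3f}")
    # Expect roughly 0.386 with one forced bit and exactly 0.000 with two.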

So how damaging is this particular quirk to the security of RSA keys? It is certainly embarrassing, by virtue of how easy it should have been to catch during testing; that does not reflect well on quality assurance. Yet the difficulty of factoring 2047-bit keys is only marginally lower than that for full 2048-bit keys— which is to say far outside the range of currently known algorithms and computing power available to anyone outside the NSA. (Looked at another way, forcing another bit to 1 when generating the primes also reduces the entropy.) Assuming this is the only bug in RSA key generation, there is no reason to throw away these tokens or lash out at the vendor.

Also to be clear: this particular token was not susceptible to the ROCA vulnerability that affected all hardware using Infineon chips. In contrast to shaving one bit off the generated key, the Infineon RSA library produced full-size keys that were in fact much weaker than they appeared, due to the special structure of the primes. That wreaked havoc on many large-scale systems, including the latest generation Yubicrap (after a dubious switch from NXP to Infineon hardware) and the Estonian government electronic ID card system. In fact the irony of ROCA is proving that key length is far from being the only criterion for security. Due to the variable strategies used by Infineon to generate primes at different lengths, customers were better off using shorter RSA keys on the vulnerable hardware:


(Figure 1, taken from the paper “The Return of Coppersmith’s Attack: Practical Factorization of Widely Used RSA Moduli”)

The counter-intuitive nature of ROCA is that the estimated worst-case factorization time (marked by the blue crosses in the figure) does not increase in an orderly manner with key length. Instead there are sharp drops around 1000 and 2000 bits, creating a sweet spot for the attack where the cost of recovering keys is drastically lower. Meanwhile the regions shaded yellow and orange correspond to key lengths where the attack is not feasible. To pick one example from the graph, 1920-bit keys would not have been vulnerable to the factorization scheme described in the paper. Even 1800-bit keys would have been a better choice than the NIST-anointed 2048. While 1800-bit keys were still susceptible to the attack, it would have required too much computing power— note the Y-axis has a logarithmic scale— while 2048-bit keys were well within range of factoring with commodity hardware that can be leased from AWS.

It turns out that sometimes it is better missing the mark by one bit than hitting the target dead-on with the wrong algorithm.

CP