Unanswered questions about the Flame certificate forgery (2/2)

Second question: why bother with an MD5 collision in the first place?

As explained in the SummerCon presentation, this particular forgery depended on exquisite timing. First, the expiration date of the certificate is exactly one year from the moment it is issued, measured in seconds. Second, the serial number of the certificate issued by the TS licensing server is a function of two semi-predictable variables (a rough sketch of this encoding follows the list below):

  • Number of other certificates issued before
  • Current time, in millisecond resolution
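
To make the prediction problem concrete, here is a minimal Python sketch of how such a serial number could be modeled. The exact encoding used by the TS licensing server is not public; the layout below– a monotonic counter combined with a millisecond-resolution timestamp– is an assumption for illustration only.

    import time

    def predicted_serial(counter, issue_time_ms):
        """Hypothetical model of a serial number derived from an incrementing
        counter and the issuance time in milliseconds."""
        # Assumption: high bits carry the timestamp, low bits the counter.
        return (issue_time_ms << 32) | (counter & 0xFFFFFFFF)

    # Attacker's guess: the next license will be certificate #1042,
    # issued roughly 90 seconds from now.
    now_ms = int(time.time() * 1000)
    guess = predicted_serial(counter=1042, issue_time_ms=now_ms + 90_000)
    print(hex(guess))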

This poses quite a challenge for an attacker seeking to exploit a collision against MD5. Recall that the attack depends on crafting two certificates with identical hash– one is the certificate that the attacker predicts the licensing server will issue, the other is the certificate that the attacker actually wants to obtain. The ability to find collisions in MD5 ensures that a signature on the first one is just as good as a signature on the second one. But “predict” is the operative word here: after all, it is up to the issuing authority to decide on the serial number and expiration date of the certificate that will be issued. Randomly chosen serial numbers would have trivially defeated the attack. (In fact this is such a good idea that randomization is required for the so-called extended-validation class of certificates, which breathed new life into the defunct CA business model by allowing companies to charge a lot more money for the same basic due-diligence they should have performed for every SSL certificate.) Instead the licensing server used the current time and an incrementing counter to generate the serial number.
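
For contrast, this is all it takes to generate the kind of serial number that would have shut the attack down: draw the value from a cryptographically secure random source rather than deriving it from a counter and a clock. A generic sketch, not the code of any particular CA product:

    import secrets

    def random_serial(bits=64):
        """Unpredictable certificate serial number: with this many random bits
        an attacker cannot precompute the certificate the CA will sign."""
        return secrets.randbits(bits)

    print(hex(random_serial()))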

That is a lucky break for the attacker: guess the values correctly when crafting the collision, and the licensing server has unwittingly issued a code-signing certificate made out in the name of MSFT. Making life easier is the ability to do dry runs of the attack, acquiring licenses freely to observe the counter and the current “time” according to the server: the TS licensing server will happily issue “licenses” to any enterprise user in a certain Active Directory group, e.g. every employee of the company with a valid business reason to access Windows servers. But even with this advantage, it is a fragile process because it requires making projections about the variables above, namely:

  • How many other users will have obtained a license between now and when the collision is going to be submitted.
  • Exactly what time– down to the millisecond– that collision will be processed. Note this is not when the attacker submits the request to the licensing server; it is when the server gets around to issuing the certificate. All types of latency, from simple network jitter to OS scheduling delays, could throw this off. (A back-of-the-envelope estimate of the odds appears after this list.)
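
One way to see how fragile the timing is: suppose dry runs show that the interval between submitting a request and the server stamping the certificate is spread over some window of milliseconds. The sketch below, using made-up latency numbers, estimates the chance of hitting the exact millisecond and the expected number of attempts.

    from statistics import mean

    # Hypothetical dry-run measurements: request-to-issuance latency in ms.
    latencies = [212, 198, 231, 205, 219, 226, 201, 215]

    spread = max(latencies) - min(latencies)     # observed jitter window
    p_hit = 1.0 / max(spread, 1)                 # crude model: uniform over the window
    expected_attempts = 1.0 / p_hit

    print(f"mean latency  : {mean(latencies):.1f} ms")
    print(f"jitter window : {spread} ms")
    print(f"P(right ms)   : {p_hit:.3f}")
    print(f"expected tries: {expected_attempts:.0f}")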

The first one is easy to work around: plan the attack over a weekend or official holiday, and perhaps at a time of day when few legitimate users are going to be requesting licenses to interfere with the attack. Assuming long-term persistence on the enterprise network, one can observe the fluctuation in demand over time to spot such opportunities. A different strategy is to target an organization where the TS licensing server is idle by design– perhaps it has just been set up but is not yet activated, or it is being decommissioned in favor of a different licensing model. (In this case the assumption is that the creators of Flame had the resources to compromise lots of different organizations, so they could pick one with the right TS licensing setup.) In all cases, timing the attack against an idle server can partially help with the second problem as well– if the server is only responding to the attacker, there is no concern about high load leading to variable processing times. But the jury is out on whether millisecond accuracy can be obtained this way, so the attacker may still have to try multiple times. As a comparison point: even with a perfectly sequential serial number and one-second time resolution, the forgery attack from 2008 required several tries.

The other side of the equation is the cost of each MD5 collision– the computational resources “wasted” as a result of guessing incorrectly. Latest estimates put this on the order of $10K-$100K per collision using publicly known techniques. It is likely that the Flame authors had access to novel cryptanalytic attacks lowering that cost, in addition to large amounts of computing power that might wash it away altogether. (As economists like to point out, CPU cycles are a perishable resource. Once the upfront investment in a couple million servers is paid for, one might as well put them to use spinning on a problem.)
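
Putting the two numbers together gives a rough expected cost per successful forgery. The figures below are illustrative assumptions only– the upper end of the public per-collision estimate and a guessed success probability:

    cost_per_collision = 100_000   # upper end of the public estimate, in USD
    p_correct_guess = 1 / 30       # assumed chance of guessing counter + millisecond

    expected_cost = cost_per_collision / p_correct_guess
    print(f"Expected cost per forged certificate: ${expected_cost:,.0f}")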

Another option is to mess with the licensing server’s notion of “time.” In most modern systems, including Windows, the current clock is obtained from the network using a protocol such as NTP. If the Flame authors had access to exploits against the time-synchronization scheme, they could “roll back” the clock and try again after each failed attempt with the same timestamp. But this will not necessarily reduce the number of collisions required, since the count of issued certificates still increments regardless of success or failure, requiring a different collision to pair up with the expected serial number the CA will choose.
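
For context on why the clock is attackable at all: the sketch below queries an NTP server the way a typical client does, over plain unauthenticated UDP. Nothing in the exchange proves the response came from the real server, which is what makes a roll-back by a network-level attacker plausible. (The server name is a placeholder; this is a minimal SNTP-style query, not the Windows Time service implementation.)

    import socket, struct, time

    NTP_EPOCH_OFFSET = 2208988800  # seconds between 1900-01-01 and 1970-01-01

    def ntp_time(server="pool.ntp.org"):
        """Minimal unauthenticated SNTP query."""
        packet = b'\x1b' + 47 * b'\0'          # LI=0, VN=3, Mode=3 (client)
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            s.settimeout(5)
            s.sendto(packet, (server, 123))
            response, _ = s.recvfrom(512)
        transmit_secs = struct.unpack("!I", response[40:44])[0]
        return transmit_secs - NTP_EPOCH_OFFSET

    print(time.ctime(ntp_time()))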

So far this discussion has only considered attacks that treat the licensing server as a black box– attacker interactions are restricted to submitting seemingly valid license requests to the server and perhaps attacking its surrounding environment, such as disrupting clock synchronization. What they are not doing is outright breaking into that machine, exercising the signing capability directly on any chosen message, or exporting the private key for future use. Why not? Licensing servers are not a particularly sensitive or critical part of the infrastructure. They are regular Windows servers, configured in a specific role. They are not specially hardened for better security or closely monitored for any sign of trouble. (After all, the raison d’être of the system is to guard a MSFT revenue source; it has no intrinsic value to the enterprise.) Few enterprises protect these keys with hardware security modules or other key-management techniques.

In addition to maintaining a long-term presence in the enterprise network, it is also a safe bet that the Flame creators have access to a significant collection of vulnerabilities, both public and zero-day. Why would they not go after the target directly by compromising the licensing server? Recall that, due to the flawed setup of the MSFT trust chain, any organization anywhere in the world operating a TS licensing server will do. In the highly unlikely event that the first enterprise they tried has a clue about security and runs bullet-proof Windows servers, the attackers need only move on to a different one. Surely some enterprise somewhere in the world is running a vulnerable licensing server ripe for the extraction of private keys? Armed with the key, the attackers need not waste any time trying to find MD5 collisions; they can issue the certificate directly. (The conspiratorially minded could argue this is exactly what happened, with the additional twist that a single MD5 collision was crafted *after the fact* to create the appearance of an interactive attack. This has the nice property of providing misdirection: the actual timestamp on the certificate with the colliding hash is no longer meaningful. It could have been back- or forward-dated to the point that trying to mine the CA logs from the alleged incident window turns up only noise.)
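
To see why key extraction is so much more convenient than collision hunting, here is a sketch of what “issue the certificate directly” looks like using the pyca/cryptography library. Every name, key and validity value below is a placeholder– this is not the content of the actual forged certificate, only an illustration of how little work remains once the issuing key is in hand.

    import datetime
    from cryptography import x509
    from cryptography.x509.oid import NameOID, ExtendedKeyUsageOID
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import rsa

    # Placeholder keys: in the scenario above, ca_key would be the private key
    # extracted from a compromised licensing server.
    ca_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    attacker_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

    issuer = x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, u"Compromised TS Licensing CA")])
    subject = x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, u"Impersonated Publisher")])

    now = datetime.datetime.utcnow()
    cert = (
        x509.CertificateBuilder()
        .subject_name(subject)
        .issuer_name(issuer)
        .public_key(attacker_key.public_key())
        .serial_number(x509.random_serial_number())
        .not_valid_before(now)
        .not_valid_after(now + datetime.timedelta(days=365))
        # Code-signing EKU: the very privilege the TS licensing chain never
        # should have carried in the first place.
        .add_extension(x509.ExtendedKeyUsage([ExtendedKeyUsageOID.CODE_SIGNING]), critical=False)
        .sign(ca_key, hashes.SHA256())
    )
    print(cert.serial_number)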

Finally there is the question of disclosing offensive capabilities. By attacking the licensing server directly, the attackers risk burning a 0-day in case they are found out. (That is the worst-case scenario– more likely the server has known unpatched vulnerabilities with readily available weaponized exploits.) This is not exactly the end of the world. Stuxnet contained multiple 0-days, and the authors presumably took into account the possibility that one day the malware samples would be reverse engineered. Not to mention that, thanks to the likes of VUPEN, everyone and their uncle has access to Windows zero-days these days. Finding one being used in the wild might prompt an out-of-band patch from MSFT and some temporary indignation, but then everyone moves on.

Using an MD5 collision and embedding such a certificate in Flame, on the other hand, reveals an entirely different capability: access to novel cryptographic techniques that are unknown in the civilian world. For an organization interested in hiding its capabilities, this is far more revealing than the loss of a couple of Windows zero-days that could have blended into the background noise of vulnerability research.

CP

Unanswered questions about the Flame certificate forgery (1/2)

1. Which enterprise was it?

The authors of Flame exploited a series of design flaws in the way MSFT operated terminal services licensing to obtain a code-signing certificate impersonating MSFT. This step involved interacting with some TS licensing server that was already set up to issue these licenses, which also double as code-signing certificates due to a blatant violation of the least-privilege principle. Typically such licensing servers are operated by large enterprises, to simplify the problem of granting their users licenses to connect to Windows servers.

That raises an obvious question: which organization was it? While each licensing server receives a certificate with the same nondescript name, Microsoft LSRA PA (which does not in fact identify the organization it belongs to, in yet another example of bad design), they each have unique signing keys. As long as MSFT was keeping logs of the subordinate CA certificates it issued, it is possible to identify conclusively which enterprise the forged certificate chains up through. So far MSFT has not publicly named the organization, nor have the implicated parties come forward of their own accord. It is entirely plausible that the organization did not realize it was their TS licensing infrastructure that was used to facilitate the Flame attack. This is similar to the pair of semiconductor firms that had to be alerted that their signatures were found on Stuxnet– how many organizations proactively checked their own CA against the forged Flame certificate chain? But it is equally likely that MSFT or perhaps a law-enforcement agency would have reached out by now (keeping in mind this could be an organization anywhere in the world) and let these folks know they were the unlucky ones. So far this appears to be handled quietly, perhaps to protect the “guilty”– for most enterprises, having experienced a security breach is something to be swept under the rug. This is unfortunate, because it would have been possible to infer something about the modus operandi of the Flame creators based on why they picked that organization. That brings us to the second question.

[continued]

CP

Taking security seriously, while failing spectacularly at it

It’s become nearly impossible to state “we take security [of our users] seriously” with a straight face.

This week three different companies suffered three unrelated incidents (two of them sharing the same underlying root cause), and all of them resorted to the same damage-control clichés.

Here is the MSFT mea culpa on the MD5 collision debacle covered earlier on this blog:

“Microsoft takes the security of its customers seriously”

Here is LinkedIn, responding to 6.5M unsalted password hashes floating around– as an aside, nothing like starting the day by discovering your own password hash in a contraband file shared by colleagues before receiving a data breach notification from the clueless website in question:

“We take the security of our members very seriously.”

Here is the online dating site eHarmony spinning the leak of 1.5 million passwords:

“The security of our customers’ information is extremely important to us, and we do not take this situation lightly.”

If this is what taking security seriously looks like, attackers hardly need anyone to take a more light-hearted approach. One can imagine the Joker asking: “Why so serious?”

It is difficult to know from the outside how these vulnerabilities came about. (Full disclosure: the author is an ex-MSFT employee, but was not involved in terminal services and has no information about the incident beyond what is available from public sources.) Were they unknown to the organization? Is that because that aspect of the system was never reviewed by qualified personnel? Or was it missed because the reviewers assumed it was an acceptable solution? Or perhaps the issue was properly flagged by the security team but postponed or neglected by an engineering organization single-mindedly focused on schedule? It is safe to say there will be a post-mortem at MSFT. If LinkedIn and eHarmony have a culture of accountability and learning from mistakes, perhaps they will also conduct their own internal investigations. But the useful information and frank conclusions reached in such exercises rarely leave the safe confines of the company– and that is assuming the leadership can resist the temptation to turn the post-mortem into an exercise in whitewashing.

So we can only draw conclusions based on public information, without the benefit of mitigating circumstances to absolve the guilty. And these conclusions are not flattering for any of the companies involved.

LinkedIn and eHarmony committed a major design flaw in storing passwords as unsalted SHA1 hashes. This is a clear-cut case of bad judgment to which even the CISSP-friendly excuse “we follow industry best practices” does not apply– the importance of salting to frustrate dictionary attacks is well known. For comparison, the crypt scheme used for storing UNIX passwords dates back to the late 1970s and includes salting. (Both sites could also have chosen intentionally slow hashes– for example by iterating the underlying cryptographic primitive– to further reduce the rate of password recovery, but this is a relatively small omission in comparison.) The fact that both sites experienced a breach resulting in access to password hashes is almost less of a PR black eye, given the frequency of such mishaps across the industry, than the skeletons in the closet revealed as a result of the breach.
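
For reference, the difference between what the two sites did and what they should have done is a few lines of code. The sketch below contrasts the raw unsalted SHA1 approach with a salted, deliberately slow PBKDF2 construction; the parameter choices (salt length, iteration count) are illustrative, not prescriptive.

    import hashlib, hmac, os

    def naive_hash(password: str) -> str:
        # What leaked from LinkedIn: identical passwords produce identical
        # hashes, and a precomputed dictionary cracks them wholesale.
        return hashlib.sha1(password.encode()).hexdigest()

    def store_password(password: str, iterations: int = 200_000):
        # Per-user random salt defeats precomputed dictionaries; the iteration
        # count makes each individual guess expensive.
        salt = os.urandom(16)
        digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
        return salt, iterations, digest

    def verify_password(password: str, salt: bytes, iterations: int, digest: bytes) -> bool:
        candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
        return hmac.compare_digest(candidate, digest)

    record = store_password("correct horse battery staple")
    print(verify_password("correct horse battery staple", *record))   # True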

The MSFT incident is far more nuanced and complex than the basic LinkedIn/eHarmony failure to understand cryptography. It was not a single mistake, but a pattern of persistently poor judgment that gave the Flame authors the ability to create a forged signing certificate:

  • Using a subordinate CA chaining up to the MSFT product root for terminal server licensing. The product root is extremely valuable because it is trusted for code signing. As explained by Ryan Hurst, there was already another MSFT root that would have been perfectly suitable for the TS purpose.
  • Leaving the code-signing EKU in the certificate, even though terminal server licensing appears to have no conceivable scenario that requires this feature.
  • Attempting to mitigate the excessive privilege from the previous mistake by using critical extensions, such as the Hydra OID. (As an aside: “Hydra” was the ancient codename for terminal services in NT4 days.) This would have worked if Windows XP had correctly implemented the concept of critical extensions– if the application does not recognize one, it must reject the certificate as invalid (a sketch of the intended behavior appears after this list). On XP, Authenticode certificates still pass muster even in the presence of unknown critical extensions– and that is almost one third of Windows installations, given the relatively “recent” age of Windows 7 and the abysmal uptake of its predecessor Vista.
  • Issuing certificates using the MD5 algorithm in 2010 (!!)– that is nearly two years after a dramatic demonstration that certificate forgery is feasible against CAs using MD5, and six years after the publication of the first MD5 collisions.
  • Using a sequential serial number for each certificate issued, which is critical to enabling the MD5 collision attack because it allows attackers to predict exactly what the CA will be signing.
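
The critical-extension rule that XP got wrong is straightforward to state in code: a conforming validator walks the extensions, and any critical extension it does not understand is grounds for rejection. A sketch using the pyca/cryptography library, with a placeholder list of recognized OIDs:

    from cryptography import x509
    from cryptography.x509.oid import ExtensionOID

    # OIDs this (hypothetical) application actually knows how to process.
    RECOGNIZED = {
        ExtensionOID.BASIC_CONSTRAINTS,
        ExtensionOID.KEY_USAGE,
        ExtensionOID.EXTENDED_KEY_USAGE,
    }

    def check_critical_extensions(cert: x509.Certificate) -> None:
        """The rule XP failed to enforce: an unrecognized critical extension
        must cause the certificate to be rejected as invalid."""
        for ext in cert.extensions:
            if ext.critical and ext.oid not in RECOGNIZED:
                raise ValueError(f"unrecognized critical extension {ext.oid.dotted_string}")
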
Individually, any of these five blunders could be attributed to routine oversight. By itself any one mistake would have been survivable, as long as the remaining defenses were correctly implemented. But collectively botching all of them undermines the credibility of the “taking security seriously” PR spin.

CP

Economics and incentives: terminal services licensing vulnerability

Or: “Don’t confuse the interests of the user with those of the software vendor.”

Researchers recently identified one of the zero-days used by the ultra-sophisticated malware Flame, likely created by nation states for the express purpose of industrial espionage. It turns out to be yet another PKI vulnerability that allows attackers to create malicious code seemingly signed by MSFT. Carrying the MSFT signature is the ultimate seal of approval for software on Windows. Such applications typically get unique privileges or a free pass on security checks due to the implicit trust placed in their pedigree. In response MSFT swiftly moved to revoke the issuing certificate authorities associated with these forged signatures. (The cruel irony, as pointed out by many commentators, is that Windows maintains those trust roots via Windows Update.)

While PKI vulnerabilities in Windows have devastating consequences, they are not new. Moxie’s find in 2002 regarding the failure to check basic constraints properly during chain validation was equally damaging, although it was reported the old-fashioned way by a security researcher instead of being reverse-engineered out of malware in the wild. What is unusual about this case is that a large chunk of the problem can be attributed to operational errors in the way MSFT handled licensing. My former colleague Ryan Hurst gives a great breakdown of the root causes behind security advisory 2718704. Consistent with other epic security failures of this magnitude, it was not a single isolated mistake but a combination of engineering incompetence, poor design judgment and operational laziness (in choosing the path of least resistance for managing a new certificate authority) that culminated in the current crisis.

There is a different way to look at this vulnerability, in terms of incentives. It is axiomatic that software is not perfect– neither in functionality nor in security. Once this is acknowledged, the problem becomes one of managing risks rather than trying to eliminate them completely. Given that every piece of code we deploy might harbor some unknown but catastrophic vulnerability, the question comes down to whether the benefits derived from that application outweigh the risks.

Case in point: web browsers are among the most closely scrutinized and heavily targeted pieces of software. In the most recent Pwn2Own competition at CanSecWest in Vancouver BC, every single major web browser was successfully exploited. Yet few security professionals would realistically suggest that users stop visiting websites altogether. (The more cautious among us may advise disabling a few bells-and-whistles such as Flash– a notoriously buggy component whose main “value proposition” is the promise of animated hamsters on every web page.) The benefits of web applications are so compelling and obvious that we have invested massively in building more resilient browsers, so that users can continue to accrue those benefits at lower risk.

That brings up the question of what great value users derive from the software implicated in the latest debacle: the Terminal Services licensing server. Answering it calls for a slight detour into the arcane world of Windows pricing for enterprises. Most home users’ experience with Windows software licensing is thankfully limited to occasionally entering the 25-character product activation key when installing a new copy of the operating system. (There is also the occasional false-positive nag when the OS decides that its environment has changed– because the user installed some new hardware, for example– and subtly accuses the user of pirating software.) Each such key corresponds to one license, either included as part of the purchase of the machine or bought as a new copy of the OS at retail price.

Enterprises have far more complex models when it comes to paying for software, especially “server class” software where a set of centralized, shared machines is offering applications– such as printing, file sharing or remote desktop– to a population of users. In these scenarios the enterprise pays not just for installing the server, but also for each user connecting to that server to access its functionality.

It is instructive to compare this with the open-source model: licensing schemes of this kind appear fundamentally incompatible with “free” software, where the adjective refers to the rights accorded the user rather than monetary terms. A vendor could not charge users more to run that software on a machine with 16 processors instead of 4, or to serve a thousand users instead of a hundred. Any such artificial restrictions would be “edited” out of the source code in a matter of minutes. Open-source software is limited only by the capabilities of the hardware it runs on, and uses them to their full potential. Proprietary software, on the other hand, can be artificially throttled to implement “price discrimination,” where two customers end up paying different amounts for the exact same service. (Note the software distributed is identical whether it is set to serve 10 or 100 users. One could argue that scaling the application to handle the higher load is itself a cost for the developer. In that case this extra cost is being recouped by over-charging the heaviest users and under-charging the lower-volume ones.)

The catch is that doing this requires additional machinery. Left to its own devices, software will run at full speed and service all requests. It takes extra effort to slow it down, or to limit it to doing its job only after the check has cleared. This is where the likes of Terminal Services licensing come in. Its mission in life is to make sure that a Windows server product only does its “serving” for those clients that have paid for it. This is undoubtedly valuable to MSFT, in terms of enforcing the desired pricing model and collecting additional revenue. From the customer perspective, it could go either way. Economically it may lead to a more efficient outcome for the enterprise, if paying based on actual server demand measured in real time is cheaper than trying to estimate peak load and buy licenses upfront. Of course it could also be construed as nickel-and-diming those users: pay several thousand dollars for the server and then another $10-$50 for each client.
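
As a toy illustration of that “extra machinery,” here is what a per-client license gate boils down to. A real implementation such as TS licensing wraps this in certificates and protocol handshakes, but the economic function is the same; everything below is a hypothetical sketch, not the actual licensing logic.

    class LicenseGate:
        """Toy per-client license check: the server refuses work beyond
        the number of seats that have been paid for."""

        def __init__(self, seats_purchased: int):
            self.seats_purchased = seats_purchased
            self.active_clients = set()

        def admit(self, client_id: str) -> bool:
            if client_id in self.active_clients:
                return True
            if len(self.active_clients) >= self.seats_purchased:
                return False          # no seat available: turn the client away
            self.active_clients.add(client_id)
            return True

    gate = LicenseGate(seats_purchased=2)
    print([gate.admit(c) for c in ("alice", "bob", "carol")])   # [True, True, False]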

The Flame zero-day incident shows that economics is not the only dimension to this problem. Confronted with the risk of getting the enterprise 0wned, the prudent CSO would opt to pay more for software upfront instead of worrying about one more useless component that creates additional opportunities for attack without any redeeming value– if they had the choice. All but the largest enterprises lack bargaining power when negotiating such deals. In fact, lack of meaningful choice is a recurring theme: the licensing software is proprietary (although the protocol has been disclosed, compliments of the consent decree) and there are no “better” or “more secure” alternatives that can be deployed instead.

More importantly, the flaws in Terminal Services licensing create negative externalities for everyone, due to the way MSFT implemented licensing by granting subordinate-CA status to enterprises. If some enterprise were to mismanage its signing privileges, it is no surprise that its own users could be compromised. That part is expected, and even creates the proper incentive for each enterprise to secure its own signing infrastructure. But due to the stunning design flaw, malware signed by one misbehaving enterprise certificate authority (or “licensing server,” to use the preferred terminology of software toll-extraction) can be used against every other enterprise and even against home users who have no business accessing any server software.

That is a suboptimal risk tradeoff.

CP