On the limits of binary interoperability


Once upon a time there was a diversity of CPU architectures: when this blogger worked on Internet Explorer in the late 90s, the browser had to work on every architecture Windows NT supported. Intel x86 was by far the most prevalent by market share, but there were also DEC Alpha, PowerPC, MIPS and SPARC. (There was also a version of IE for the Apple Macintosh; while those machines also used PowerPC chips, that code base was completely different, effectively requiring an “emulation” layer to pretend that the Win32 API was available on a foreign platform.)

The 2000s witnessed a massive consolidation of hardware. MSFT dropped SPARC, MIPS and PowerPC in quick succession. DEC Alpha, an architecture far ahead of its time, would limp along for a little longer because it was the only functioning 64-bit hardware at a time when MSFT was trying to port Windows to 64 bits. Code-named Project Sundown in a not-so-subtle dig at Sun Microsystems (then flying high with Java, long before the ignominious Oracle acquisition), this effort originally targeted the Intel Itanium. But working systems featuring the Itanium would not arrive until much later. Eventually Itanium would prove to be one of the biggest fiascoes in Intel’s history, earning the nickname “Itanic.” It survived in niche enterprise markets in a zombie-like state until 2022, when it was finally put out to pasture.

Even Apple, once a pillar of the alliance pushing for PowerPC, surprised everyone by switching to x86 during the 2000s. For a while it appeared that Intel was on top of the world. Suddenly “Intel inside” was true of laptops, workstations and servers. Only mobile devices requiring low-power CPUs had an alternative in ARM. AMD provided the only competition, and its ISA was largely identical to x86. When AMD leapfrogged Intel by defining a successful 64-bit extension of x86 while Intel was running in circles with Itanium going nowhere, the Santa Clara company simply adopted AMD64 into what collectively became x64. The late 2010s then represented something of a reversal in fortunes for both Intel and the x86/x64 architecture in general. Mobile devices began to outsell PCs, Apple jumped on the ARM bandwagon for its laptops, AWS began shilling for ARM in the datacenter and RISC-V made the leap from academic exercise to production hardware.

Old software meets new hardware

When a platform changes its underlying CPU architecture, there is a massive disruption to the available software ecosystem. By design, most application development targets a specific hardware architecture. When the platform shifts to different hardware, existing software must at a minimum be recompiled for the new hardware. More often than not, source code changes are required as well, as in the move from targeting 32-bit Windows to 64-bit Windows.
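
A minimal sketch of the kind of source change involved, not drawn from any particular code base: a common 32-bit-era habit of stashing a pointer in a 32-bit integer still compiles on 64-bit Windows (with warnings at best) but silently truncates the pointer, because Windows uses the LLP64 model where long and DWORD stay 4 bytes while pointers grow to 8.

    // Illustration of why a clean recompile is often not enough when moving
    // from 32-bit to 64-bit Windows.
    #include <windows.h>
    #include <cstdio>

    int main() {
        int value = 42;
        int* p = &value;

        // 32-bit-era pattern: stashing a pointer in a 32-bit integer. On
        // 32-bit Windows this "worked"; on x64/ARM64 it truncates the pointer
        // and the round trip may yield garbage.
        DWORD stashed = (DWORD)(ULONG_PTR)p;       // truncation on 64-bit
        int* recovered = (int*)(ULONG_PTR)stashed; // may no longer point at 'value'

        // The portable fix is a pointer-sized integer type throughout.
        ULONG_PTR stashedOk = (ULONG_PTR)p;
        int* recoveredOk = (int*)stashedOk;

        printf("original %p, truncated round trip %p, correct round trip %p\n",
               (void*)p, (void*)recovered, (void*)recoveredOk);
        return 0;
    }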

Unlike operating system releases, this is not a process the platform owner can manage on its own. Consider how conservative MSFT used to be about breaking changes. The company maintained an aggressive QA process to check that existing software released for previous versions of Windows continued to work on the upcoming one. Sometimes that meant being bug-for-bug compatible with behavior going back to Windows 3.1. Deliberate changes breaking existing software were rare. That is because Redmond understood the flywheel effect between operating systems and applications. The more applications exist for an operating system, the more motivated customers are to demand that OS for their next purchase. The more popular an OS becomes, the more likely independent software developers are to take notice and begin writing new applications or porting their existing offerings to that OS. If one operating system has 90% of market share and its competitor only has 10%, the rational choice for ISVs is to prioritize the first one. That prioritization could take different shapes. It could be that the application only ships for the dominant OS: the revenue from that marginal 10% expansion may not be enough to compensate for the steep increase in development cost, especially when the platforms are as fundamentally different as Windows and MacOS, leaving very little room for code-sharing and a lot of duplicated effort. Or the ISV could deprioritize the effort, eventually releasing a greatly scaled-back version for the less-popular OS, with fewer features and less attention to quality. (Exhibit A: MSFT Office for Mac.)

That makes hardware changes a treacherous time for platform owners to navigate, threatening to break this lucrative flywheel. If the new environment does not have the same rich ecosystem of third-party software support, customers may delay upgrading until their favorite apps are ported. Or worse, they may even consider switching to a competing platform: if migrating to 64-bit Windows is going to be a radical change and force a significant overhaul of the corporate IT platform anyway, why not consider going all the way and switching to MacOS?

Enter binary compatibility. Windows has multiple compatibility layers for running applications that never expected to run on the current version of Windows. For example:

  • WoW64 (“Windows 32-bit on Windows 64-bit”) allows 32-bit binaries to execute on 64-bit Windows. It is the newer iteration of the original “WoW” layer that helped run 16-bit DOS and Windows 3.x binaries on 32-bit Windows NT/2000/XP.
  • A similar layer exists on Windows on ARM to run binaries compiled for the x86 and x64 architectures.

In both cases, applications are given a synthesized (fabricated? falsified?) view of the operating system. There is a “Program Files” directory, but it is not the one where native applications are installed. There is a “registry” with all the usual keys for controlling software settings, but it is not the same location referenced by native applications. Called registry redirection (with an analogous mechanism for the file system), this compatibility feature allows applications written for a “foreign” architecture (for example, 32-bit applications on a native 64-bit operating system) to operate as if they were running on their original target without interfering with the rest of the system.
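
A sketch of what that redirection looks like from code, under the assumption that it is built and run as a 32-bit binary on 64-bit Windows: the same HKLM path yields different answers depending on which view is requested, using the documented KEY_WOW64_64KEY and KEY_WOW64_32KEY access flags. (ProgramFilesDir is a convenient value to compare because the two views typically report “Program Files” versus “Program Files (x86)”.)

    #include <windows.h>
    #include <cstdio>
    #pragma comment(lib, "advapi32.lib")

    // Print the ProgramFilesDir value as seen through a particular registry view.
    static void ShowProgramFiles(const wchar_t* label, REGSAM viewFlag) {
        HKEY key = nullptr;
        if (RegOpenKeyExW(HKEY_LOCAL_MACHINE,
                          L"SOFTWARE\\Microsoft\\Windows\\CurrentVersion",
                          0, KEY_QUERY_VALUE | viewFlag, &key) != ERROR_SUCCESS) {
            wprintf(L"%ls: could not open key\n", label);
            return;
        }
        wchar_t buf[MAX_PATH] = {};
        DWORD size = sizeof(buf) - sizeof(wchar_t); // leave room for a terminator
        RegQueryValueExW(key, L"ProgramFilesDir", nullptr, nullptr,
                         reinterpret_cast<BYTE*>(buf), &size);
        wprintf(L"%ls: ProgramFilesDir = %ls\n", label, buf);
        RegCloseKey(key);
    }

    int main() {
        // In a 32-bit (WOW64) process the default view is the redirected one.
        ShowProgramFiles(L"default view", 0);
        ShowProgramFiles(L"explicit 64-bit view", KEY_WOW64_64KEY);
        ShowProgramFiles(L"explicit 32-bit view", KEY_WOW64_32KEY);
        return 0;
    }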

The phantom mini-driver on Windows ARM

Impressive as these tricks are, there are limits to interoperability between binaries compiled for different hardware architectures. Case in point: this blogger recently tried using a new brand of cryptographic hardware token on Windows ARM. Typically this requires installing the so-called mini-driver associated with that specific model. The “driver” terminology is misleading, as these are not the usual low-level device drivers running in kernel mode. Modern smart-cards and USB hardware tokens look the same at that layer, such that they can be handled uniformly by a single PC/SC driver. Instead, the mini-driver abstraction is a user-mode extensibility mechanism introduced by the Windows smart-card stack. By creating a common abstraction that all hardware conforms to, it allows high-level applications such as web browsers and email clients to use cryptographic keys in a uniform manner, without worrying about the vendor-specific implementation details of how those keys are stored. Effectively this becomes the “middleware” every vendor is responsible for providing if they want Windows applications to take advantage of their cryptographic widget.
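
To make that division of labor concrete, here is a minimal sketch of the application side of the bargain: the program talks only to the Microsoft Smart Card Key Storage Provider through the CNG key-storage API, and whichever vendor mini-driver answers underneath is invisible to it. (Error handling is trimmed for brevity.)

    #include <windows.h>
    #include <ncrypt.h>
    #include <cstdio>
    #pragma comment(lib, "ncrypt.lib")

    int main() {
        // Open the built-in smart-card key storage provider. The vendor
        // mini-driver for whatever card is inserted gets loaded underneath.
        NCRYPT_PROV_HANDLE provider = 0;
        SECURITY_STATUS status = NCryptOpenStorageProvider(
            &provider, MS_SMART_CARD_KEY_STORAGE_PROVIDER, 0);
        if (status != ERROR_SUCCESS) {
            printf("NCryptOpenStorageProvider failed: 0x%08x\n", (unsigned)status);
            return 1;
        }

        // Enumerate keys on the card. Nothing here is specific to any vendor.
        NCryptKeyName* keyName = nullptr;
        void* enumState = nullptr;
        while (NCryptEnumKeys(provider, nullptr, &keyName, &enumState, 0) == ERROR_SUCCESS) {
            wprintf(L"found key: %ls (algorithm %ls)\n",
                    keyName->pszName, keyName->pszAlgid);
            NCryptFreeBuffer(keyName);
        }
        if (enumState) NCryptFreeBuffer(enumState);
        NCryptFreeObject(provider);
        return 0;
    }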

In this case the vendor helpfully provided an MSI installer for their middleware, with one catch: there was only an x86 version. No problem, since Windows ARM can run x86 binaries after all. Sure enough, the installer ran without a hitch and after a few clicks reported successful installation. Except when it came time to use the hardware associated with the mini-driver: at that point, the system continued to fall back to the default PIV mini-driver instead of the vendor-specific one. (This is a problem. As discussed in previous posts, the built-in PIV driver on Windows is effectively read-only. It can use existing objects on the card, but cannot handle key generation or provision new certificates.) That means the smart-card stack could not locate the more specific vendor driver with its additional capabilities. Did the installer hallucinate its successful execution?

Internet Explorer and its doppelganger

An interview question from the late 2000s for software engineer candidates claiming a high level of proficiency with Windows:

“Why are there two versions of Internet Explorer on 64-bit Windows, one 32-bit and another 64-bit?”

The answer comes down to one of those limits on interoperability: 64-bit processes can only load 64-bit DLLs in process. Likewise, 32-bit processes can only load 32-bit DLLs. Before MSFT finally deprecated native-code extensions such as ActiveX and browser helper objects (“BHO”, historically one of the best ways to author malware for shadowing all browsing activity), it was not uncommon for websites to rely on the presence of such add-ons. For example, Java and Adobe Flash were implemented this way. But historically such native extensions were all written for 32-bit Windows, and they could not have loaded successfully into a 64-bit browser process.
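
What that failure looks like in practice, as a minimal sketch (the DLL path below is a made-up placeholder): a 64-bit process calling LoadLibrary on a 32-bit image gets back ERROR_BAD_EXE_FORMAT rather than a module handle.

    #include <windows.h>
    #include <cstdio>

    int main() {
        // Hypothetical path to some legacy 32-bit extension DLL.
        HMODULE mod = LoadLibraryW(L"C:\\LegacyAddons\\SomeToolbar32.dll");
        if (!mod) {
            DWORD err = GetLastError();
            // A wrong-architecture image fails with ERROR_BAD_EXE_FORMAT (193,
            // "%1 is not a valid Win32 application").
            printf("LoadLibrary failed, GetLastError() = %lu%s\n", err,
                   err == ERROR_BAD_EXE_FORMAT ? " (ERROR_BAD_EXE_FORMAT)" : "");
            return 1;
        }
        printf("loaded successfully (architectures match)\n");
        FreeLibrary(mod);
        return 0;
    }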

MSFT is notorious for its deference to backwards compatibility and its reluctance to make breaking changes, for good reason: it did not turn out all that well when that principle was abandoned in one bizarre spell of hubris with Vista. So it was a foregone conclusion that 64-bit Windows must include 32-bit IE for down-level compatibility; there was no way hundreds of independent software publishers would get their act together in time to have 64-bit versions of their extensions ready when Vista hit the shelves. (Turns out they need not have worried; those copies were not exactly flying off the shelves.) The concession to backwards compatibility went much deeper than simply shipping two instances of the browser: 32-bit IE remained the default browser until a critical mass of those third-party extensions were ported to 64 bits, despite all of its limitations and its lack of the security-hardening features available to true 64-bit applications. (From a consumer point of view, one could argue the unavailability of popular native-code extensions is very much a feature. With Adobe Flash being a cesspool of critical vulnerabilities, having a browser that can never run Flash is an unambiguous win for security.)

Worth calling out: this was decidedly not the case for every Windows application. There were not two copies of Notepad or Solitaire; those are not extensible apps. Notepad did not offer a platform inviting third-party developers to add functionality packaged into DLLs meant for loading in-process.

Foreign architectures

This mismatch also explains the case of the “ghost” smart-card driver: those drivers are user-mode DLLs intended for in-process loading by the Windows cryptography API. Most applications interfacing with smart-cards ship only one version, compiled for the native architecture. For example, there is exactly one copy of certreq for generating new certificate requests. On Windows ARM that is an ARM64 binary. It cannot load DLLs written for the x64 architecture, even if those DLLs happen to be present on the system.
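
One way to see the mismatch for yourself is to look at the Machine field in a DLL's PE header, which records the architecture it was compiled for. The sketch below reads it directly, the same information dumpbin /headers reports.

    #include <windows.h>
    #include <cstdio>

    int main(int argc, char** argv) {
        if (argc < 2) { printf("usage: %s <path-to-dll>\n", argv[0]); return 1; }

        FILE* f = fopen(argv[1], "rb");
        if (!f) { printf("cannot open %s\n", argv[1]); return 1; }

        // The DOS header's e_lfanew field points at the "PE\0\0" signature,
        // immediately followed by the COFF file header containing Machine.
        IMAGE_DOS_HEADER dos = {};
        fread(&dos, sizeof(dos), 1, f);
        if (dos.e_magic != IMAGE_DOS_SIGNATURE) { printf("not a PE file\n"); return 1; }

        DWORD signature = 0;
        IMAGE_FILE_HEADER fileHeader = {};
        fseek(f, dos.e_lfanew, SEEK_SET);
        fread(&signature, sizeof(signature), 1, f);
        fread(&fileHeader, sizeof(fileHeader), 1, f);
        fclose(f);

        switch (fileHeader.Machine) {
            case IMAGE_FILE_MACHINE_I386:  printf("x86 (32-bit)\n"); break;
            case IMAGE_FILE_MACHINE_AMD64: printf("x64\n"); break;
            case IMAGE_FILE_MACHINE_ARM64: printf("ARM64\n"); break;
            default: printf("other machine type: 0x%04x\n", fileHeader.Machine);
        }
        return 0;
    }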

That vindicates the installer: it was not hallucinating when it reported success. All of the necessary DLLs were indeed dropped into the folder the installer was led to believe was the right place for shared libraries. Appropriate “registry” entries were created to inform Windows that a new smart-card driver was present and associated with cards of a particular model. But those changes happened in the simulated environment presented to emulated x86/x64 processes on Windows ARM. As far as the native ARM64 certreq process is concerned, there is no evidence of this driver in the registry. (Manually creating the registry keys in the correct, non-redirected location will not solve the problem; it will only move the inevitable failure one step forward. The DLL those entries point to has the wrong architecture and will not load successfully.)

One could ask why the installer even proceeded in this situation: if “successful” completion of the install still results in an unusable setup guaranteed to fail when using the vendor hardware, why bother? But that assumes the installer is even aware that it is executing on a foreign architecture. Chances are that when this installer was initially designed, the authors did not expect their product to be used on anything other than Intel x86/x64 hardware. That is because binaries are by definition specific to one hardware platform and often a narrow range of operating system versions. The authors would have logically assumed there was no need to check for installation on ARM any more than they had to check for RISC-V or PDP-11: code execution would never reach that point if the condition being checked were true. It is redundant in the same way as checking whether the system is currently experiencing a blue-screen.
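
For completeness, here is roughly what that never-written check could look like; a hedged sketch rather than anything the vendor's installer actually contains. IsWow64Process2 (Windows 10 and later) reports both the process's own architecture and the real hardware, so an x86 or x64 binary can notice that it is running under emulation on ARM64.

    #include <windows.h>
    #include <cstdio>

    int main() {
        USHORT processMachine = IMAGE_FILE_MACHINE_UNKNOWN;
        USHORT nativeMachine  = IMAGE_FILE_MACHINE_UNKNOWN;
        if (!IsWow64Process2(GetCurrentProcess(), &processMachine, &nativeMachine)) {
            printf("IsWow64Process2 failed: %lu\n", GetLastError());
            return 1;
        }

    #if defined(_M_IX86) || defined(_M_X64)
        // This translation unit is compiled for x86 or x64. If the real
        // hardware is ARM64, everything here runs under emulation, and any
        // DLLs we install will be unloadable by native ARM64 processes.
        if (nativeMachine == IMAGE_FILE_MACHINE_ARM64) {
            printf("x86/x64 build running emulated on ARM64; aborting install\n");
            return 2;
        }
    #endif
        printf("running on the architecture this binary was built for\n");
        return 0;
    }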

The surprise is not that the installer incorrectly reported success. It is that the installer executed at all when invoked on a machine with a completely unexpected hardware architecture, where every instruction in the binary has to be emulated to make it work. That is a testament to how well the ARM/x64 interoperability layer on Windows sustains the illusion of a native Intel environment for emulated apps.

Post-script: limited workarounds

Interestingly, MSFT did come up with a limited work-around for some of its core OS functionality, specifically the Windows shell, better known as “explorer.” The local predecessor of the more notorious Internet Explorer, this shell provides the GUI experience on Windows. Not surprisingly, it has plenty of extensibility points for third-party developers to enhance (or detract from) the user experience with additional functionality. For example, custom actions can be added to the right-click context menu, such as uploading a file to a cloud drive, decompressing an archive in some obscure format or scanning for malware. Behind the scenes those additional functions are powered by COM components loaded in-process within explorer.

That model breaks when the shell is an x64 binary while the COM object in question was compiled for x86. Unlike Internet Explorer, where one has a choice of two different versions to launch and can even run them side by side, there can only be one Windows GUI active on the system. But this is where the use of the COM standard also provides a general-purpose solution. COM supports remote procedure calls, “marshaling” data across process or even machine boundaries. By creating a 32-bit COM surrogate process, the shell can continue leveraging all of those 32-bit legacy extensions. They are now loaded in that separate 32-bit surrogate process and invoked cross-process from the hosting application. This trick is 100% transparent to the extension: as far as that COM object is concerned, it was loaded in-process by a 32-bit application exactly as the developers originally envisioned. (This is important because many extensions are not designed to deal with out-of-process calls.)
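
The general-purpose COM machinery behind this is available to any application, not just the shell. A minimal sketch of the client side, with a placeholder CLSID: the class needs to be registered with a DllSurrogate AppID so COM can spin up a surrogate host, and the CLSCTX_ACTIVATE_32_BIT_SERVER flag asks for the 32-bit flavor. (How explorer itself hosts its legacy extensions is an internal detail; this only illustrates the underlying mechanism.)

    #include <windows.h>
    #include <objbase.h>
    #include <cstdio>
    #pragma comment(lib, "ole32.lib")

    int main() {
        HRESULT hr = CoInitializeEx(nullptr, COINIT_APARTMENTTHREADED);
        if (FAILED(hr)) return 1;

        // Placeholder CLSID standing in for some legacy 32-bit extension.
        CLSID clsid = {};
        CLSIDFromString(L"{00000000-0000-0000-0000-000000000000}", &clsid);

        // Ask for an out-of-process activation and prefer a 32-bit server.
        // With a DllSurrogate registration the object lands in a separate
        // 32-bit surrogate process and every call from here is marshaled
        // across the process boundary, invisibly to the extension itself.
        IUnknown* obj = nullptr;
        hr = CoCreateInstance(clsid, nullptr,
                              CLSCTX_LOCAL_SERVER | CLSCTX_ACTIVATE_32_BIT_SERVER,
                              IID_PPV_ARGS(&obj));
        printf("CoCreateInstance returned 0x%08lx\n", (unsigned long)hr);
        if (obj) obj->Release();
        CoUninitialize();
        return 0;
    }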

While that works for the shell, it does not work in general. Not every extensibility layer is designed with marshaling and remote-call capabilities in mind. Windows smart-card drivers are strictly based on a local API. While one could conceivably write a custom proxy to convert those calls into remote ones (indeed, the ability to forward smart-cards to a different machine over remote desktop proves this is possible), doing so is not as straightforward as opting into the existing RPC capabilities of COM. Applications such as certreq do not have a magical interoperability layer to transparently use drivers written for a foreign architecture.

CP

** This story would become even more convoluted, and arguably an even better interview question, with later updates to Internet Explorer. When IE8 introduced a multi-process architecture after being beaten to the punch by Google Chrome, its original design involved a 64-bit “browser” process hosting 32-bit “renderers” for individual tabs. Later versions introduced an enhanced protected mode that also used 64-bit renderers to leverage security improvements native to x64 binaries. This clunky and unsafe-by-default model persisted all the way through IE11. It was not until Edge that 32-bit browsers and native-code extensions were finally deprecated.
