The unbounded development team: promise and perils of AI coding assistants

Once upon a time, there was a software engineer we will call “Bob.” Bob worked at a large technology company that followed a traditional waterfall model of development, with a separation of roles between program managers (“PM”) who defined the functional requirements and software design engineers (“SDE”) responsible for writing the code to turn those specs into reality. But program managers did not dream up specifications out of thin air: like every responsible corporate denizen, they were cognizant of so-called “resource constraints,” a euphemism for available developer time.

“Features, quality, schedule; fix any two and the remaining one is automatically determined.”

Schedules are often driven by external factors such as synchronizing with an operating system release or hitting the shelves in time for the Christmas shopping season. Product teams have little influence over these hard deadlines. Meanwhile, no self-respecting PM wants to sacrifice quality. “Let’s ship something buggy with minimal testing that crashes half the time” is not a statement a professional is supposed to write down on paper, although no doubt many have expressed that sentiment in triage meetings when hard decisions must be made as the release deadline approaches. That leaves features as the knob easiest to tweak, and this is where developer estimates come in.

Bob had an interesting quirk. Whenever he was asked to guesstimate the time required to implement some proposed product feature, his estimates followed a strictly bimodal distribution with two peaks:

  • Half-day
  • Two weeks

Over time a pattern emerged: features Bob approved of seemed to fit in an afternoon, even when they looked complicated and daunting to other software engineers, who preferred to steer clear of those implementation challenges. Features that looked straightforward on the surface were recast by Bob as two-week excursions into debugging tar-pits.

In Bob’s defense: estimating software schedules is a notoriously difficult problem that every large-scale project has suffered from since long before Fred Brooks made his immortal observations about the mythical man-month. Nor would Bob be the first or last engineer in history whose estimates were unduly influenced by an aesthetic judgment of the proposal. Highly bureaucratic software development shops prevalent in the 20th century relegated engineers to the role of errand boys/girls, tasked with the unglamorous job of “merely” implementing brilliant product visions thrown over the wall from the PM organization. In those dysfunctional environments, playing games with fudged schedule estimates became the primary means of influencing product direction. (It did not help that in these regimented organizations, program management and senior leadership were often drawn from non-technical backgrounds, lacking the credibility to call shenanigans on bogus estimates.)

Underlying this bizarre dynamic is the assumption that engineering time is scarce. There is an abundance of brilliant feature ideas that could delight customers or increase market share, if only their embodiment as running code could see the light of day.

AI coding assistants such as Codex and their integration into agentic development flows have now turned that wisdom on its head. It is easier than ever to go from idea to execution, from functional requirements to code-complete, with code that is actually complete: accompanied by a suite of unit tests, properly commented and separately documented. “Code is king” or “code always wins” used to be the thought-terminating cliché at large software companies, implying that a flawed, half-baked idea implemented in working code beats the most elegant but currently theoretical idea on the drawing board. It is safe to say this code-cowboy mentality idolizing implementation over design is now completely bankrupt: it is easier than ever to turn ideas into working applications. Those ideas need not even be expressed in a meticulous specification document with sections dedicated to covering every edge case. Vibe-coding is lowering the barrier to entry across the board, not just for implementation knowledge. When it comes to prompting LLMs, precision in writing still matters; garbage-in-garbage-out still holds. But being able to specify requirements in a structured manner with UML or another formal language is not necessary. If anything, the LLM can reverse-engineer that after the fact from its own implementation, in a hilarious twist on another tenet of the code-cowboy mindset: “the implementation is the spec.”

There is an irony here: LLMs have delivered in the blink of an eye the damage experts once feared outsourcing would wreak on the industry, turning software implementation from the most critical aspect of development, practiced by rarefied talent, into a commodity that could be shipped off to the lowest bidder in Bangalore. (The implications of this change for the “craft” of development are already being lamented.)

The jury is still out on whether flesh-and-blood developers can maintain that torrent of AI-generated code down the road, should old-fashioned manual modifications ever prove necessary. One school of thought expects a looming disaster: clueless engineers blindly shipping code they do not understand to production, knowing full well they are on the hook for troubleshooting when things go sideways. No doubt some are betting they will have long moved on and that responsibility will fall on the shoulders of some other unfortunate soul tasked with making sense of the imperfectly functioning yet perfectly incomprehensible code spat out by AI. Another view says such concerns are about as archaic as fretting over a developer having to jump in and hand-optimize, or worse hand-correct, assembly language generated by their compiler. In highly esoteric or niche areas of development where LLMs lack sufficient samples to train on, it may well happen that human judgment is still necessary to achieve correctness. But for most engineers, plan B for a misbehaving LLM assistant is asking a different LLM assistant to step in and debug its way out of the mess.

Software designers are now confronted with a variant of the soul-searching question: “If you knew you could not fail, what would you do?” For software projects, failure is and will remain very much an option. But its root causes are bound to be different. LLMs have taken the most romanticized view of failed projects off the table: an ambitious product vision crashing against the hard reality of finite engineering time, or limited developer expertise failing to rise to the occasion. Everyone can now wield the power of a hundred-person team of mercenary engineers with expertise in every imaginable specialty, from low-level systems programming to tweaking webpage layouts. That does not guarantee success, but it does ensure the eventual outcome will take place on a grander scale than was possible before. Good ideas will receive their due and reach their target market, no longer held back by a mismatch between willpower and resources, or the vagaries of chancing upon the right VC willing to bankroll the operation.

At least, that is the charitable prediction. The downside is that the same logic applies to terrible ideas: they too will be executed to perfection. Perhaps those wildly skewed schedule estimates from engineer Bob served a purpose after all: they were a not-so-subtle signal that some proposed feature was a Bad Idea™ that had not been thought through properly. Notorious for sycophancy, AI coding assistants are the exact opposite of that critical mindset. They will not push back. They will not question underlying assumptions or sanity-check the logic behind the product specification. They will simply carry out the instructions as prompted, in what may well become the most pervasive example of “malicious compliance.” In the same way that social media handing everyone a bullhorn did not improve the average quality of discourse on the internet, giving every aspiring product manager the equivalent of 100 developers working around the clock to implement their whims is unlikely to yield the next game-changing application. If anything, making engineering costs “too cheap to meter” may result in companies doubling down on obviously failing ideas for strategic reasons. Imagine if Microsoft did not have to face the harsh reality of market discipline, but could keep iterating on Clippy or Vista indefinitely in hopes that the next iteration will finally take off. In a world where engineering time is scarce, companies are incentivized to cull failures early, to redirect precious resources towards more productive avenues. Those constraints disappear when shipping one more variant of the same bankrupt corporate shibboleth (think Google Buzz/Wave/Plus, the Amazon Fire phone, the Windows mobile platform, the Apple Vision Pro) is just a matter of instructing the LLM to “think harder” and spin a few hundred more hours iterating on the codebase.

CP

On the limits of binary interoperability

Once upon a time there was a diversity of CPU architectures: when this blogger worked on Internet Explorer in the late 90s, the browser had to work on every architecture Windows NT shipped on: Intel x86 being by far the most prevalent by market share, but also DEC Alpha, PowerPC, MIPS and SPARC. (There was also a version of IE for the Apple Macintosh; while those machines also used PowerPC chips, that code base was completely different, effectively requiring an “emulation” layer to pretend the Win32 API was available on a foreign platform.)

The 2000s witnessed a massive consolidation of hardware. MSFT dropped SPARC, MIPS and PowerPC in quick succession. DEC Alpha, an architecture far ahead of its time, would limp along for a little longer because it was the only functioning 64-bit hardware at a time when MSFT was trying to port Windows to 64 bits. Code-named Project Sundown, in a not-so-subtle dig at Sun Microsystems (then flying high with Java, long before the ignominious Oracle acquisition), this effort originally targeted the Intel Itanium. But working systems featuring the Itanium would not arrive until much later. Eventually Itanium would prove to be one of the biggest fiascoes in Intel’s history, earning the nickname “Itanic.” It survived in niche enterprise markets in a zombie-like state until 2022, when it was finally put out to pasture.

Even Apple, once a pillar of the alliance pushing PowerPC, surprised everyone by switching to x86 during the 2000s. For a while it appeared that Intel was on top of the world. Suddenly “Intel inside” was true of laptops, workstations and servers. Only mobile devices requiring low-power CPUs had an alternative in ARM. AMD provided the only competition, but their hardware ISA was largely identical to x86. When AMD leapfrogged Intel by defining a successful 64-bit extension of x86 while Intel was running in circles with Itanium going nowhere, the Santa Clara company simply adopted AMD64 into what collectively became x64. The late 2010s then represented something of a reversal in fortunes for both Intel and the x86/x64 architecture in general. Mobile devices began to outsell PCs, Apple jumped on the ARM bandwagon for its laptops, AWS began shilling for ARM in the datacenter and RISC-V made the leap from academic exercise to production hardware.

Old software meets new hardware

When a platform changes its underlying CPU architecture, there is a massive disruption to the available software ecosystem. By design, most application development targets a specific hardware architecture. When the platform shifts to different hardware, existing software must be recompiled for the new hardware at a minimum. More often than not, source code changes are also required, as in the transition from targeting 32-bit Windows to 64-bit Windows.

Unlike an operating system release, this is not a process the platform owner can handle alone. Consider how conservative MSFT used to be about breaking changes. The company maintained an aggressive QA process to check that existing software released for previous versions of Windows continued to work on the upcoming one. Sometimes that meant being bug-for-bug compatible with behavior going back to Windows 3.1. Deliberate changes breaking existing software were rare. That is because Redmond understood the flywheel effect between operating systems and applications. The more applications exist for an operating system, the more motivated customers are to demand that OS for their next purchase. The more popular an OS becomes, the more likely independent software developers are to take notice and begin writing new applications or porting their existing offerings to that OS. If one operating system has 90% of the market and its competitor only 10%, the rational choice for ISVs is to prioritize the first one. That could take different shapes. It could be that the application only ships for that OS: the revenue from that marginal 10% expansion may not be enough to compensate for the steep increase in development cost, especially when the platforms are fundamentally different, as with Windows and MacOS, where there is often very little code-sharing possible and a lot of duplicated effort. Or the ISV could deprioritize the effort, eventually releasing a greatly scaled-back version for the less popular OS, with fewer features and less attention to quality. (Exhibit A: MSFT Office for Mac.)

That makes hardware changes a treacherous time for platform owners to navigate, threatening to break this lucrative flywheel. If the new environment does not have the same rich ecosystem of third-party software support, customers may delay upgrading until their favorite apps are ported. Or worse, they may consider switching to a competing platform: if migrating to 64-bit Windows is going to be a radical change and force a significant overhaul of the corporate IT environment, why not go all the way and switch to MacOS?

Enter binary compatibility. Windows has multiple compatibility layers for running applications that never expected to run on the current version of Windows. For example:

  • WoW64 (“Windows-on-Windows”) allows 32-bit binaries to execute on 64-bit Windows. This is the newer iteration of the original “WoW” that helped run 16-bit DOS and Windows 3.x binaries on 32-bit Windows NT/2000/XP.
  • A similar layer exists for Windows on ARM to run binaries compiled for the x86/x64 architectures.

In both cases, applications are given a synthesized (fabricated? falsified?) view of the operating system. There is a “Program Files” directory, but it is not the one where native applications are installed. There is a “registry” with all the usual keys for controlling software settings, but it is not the same location referenced by native applications. Called registry redirection (with a file-system counterpart for directories), this compatibility feature allows applications written for a “foreign” architecture (for example, 32-bit applications on a native 64-bit operating system) to operate as if they were running on their original target, without interfering with the rest of the system.
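
The redirection is easy to observe from code. The following is a minimal sketch, not tied to any particular product: built as a 32-bit x86 binary and run on 64-bit Windows, it reports the WoW view of “Program Files” and the system directory, while a native build of the same source reports the real locations. IsWow64Process, ExpandEnvironmentStringsW and GetSystemDirectoryW are standard, documented Win32 calls.

    #include <windows.h>
    #include <stdio.h>

    int wmain()
    {
        wchar_t programFiles[MAX_PATH] = L"";
        wchar_t systemDir[MAX_PATH] = L"";
        BOOL underWow = FALSE;

        // For a 32-bit process on 64-bit Windows this resolves to
        // "C:\Program Files (x86)"; a native process sees "C:\Program Files".
        ExpandEnvironmentStringsW(L"%ProgramFiles%", programFiles, MAX_PATH);

        // The returned string still says "system32", but file access from a
        // 32-bit process is silently redirected to SysWOW64 underneath.
        GetSystemDirectoryW(systemDir, MAX_PATH);

        // TRUE when the process is running inside the WoW64 emulation layer.
        IsWow64Process(GetCurrentProcess(), &underWow);

        wprintf(L"running under WoW64: %s\n", underWow ? L"yes" : L"no");
        wprintf(L"%%ProgramFiles%%   : %s\n", programFiles);
        wprintf(L"system directory : %s\n", systemDir);
        return 0;
    }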

The phantom mini-driver on Windows ARM

Impressive as these tricks are, there are limits to interoperability between binaries compiled for different hardware architectures. Case in point: this blogger recently tried using a new brand of cryptographic hardware token on Windows ARM. Typically this requires installing the so-called mini-driver associated with that specific model. The “driver” terminology is misleading, as these are not the usual low-level device drivers running in kernel mode. Modern smart-cards and USB hardware tokens look identical at that layer, such that they can be handled uniformly by a single PC/SC driver. Instead, the “mini-driver” abstraction is a user-mode extensibility mechanism introduced by the Windows smart-card stack. By creating a common abstraction that all hardware conforms to, it allows high-level applications such as web browsers and email clients to use cryptographic keys in a uniform manner, without worrying about vendor-specific details of how the key is stored. Effectively this becomes the “middleware” every vendor is responsible for providing if they want Windows applications to take advantage of their cryptographic widget.
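
That uniformity is visible at the API level. Here is an illustrative sketch (the provider name is the documented constant from ncrypt.h, not anything vendor-specific) that opens the Microsoft Smart Card Key Storage Provider and enumerates keys on whatever card is inserted; the same few calls work regardless of which vendor mini-driver ends up servicing the request.

    #include <windows.h>
    #include <ncrypt.h>
    #include <stdio.h>
    #pragma comment(lib, "ncrypt")

    int wmain()
    {
        NCRYPT_PROV_HANDLE hProv = 0;

        // The smart-card KSP dispatches to whichever mini-driver matches the
        // inserted card; callers never reference the vendor DLL directly.
        SECURITY_STATUS status = NCryptOpenStorageProvider(
            &hProv, MS_SMART_CARD_KEY_STORAGE_PROVIDER, 0);
        if (status != ERROR_SUCCESS)
        {
            wprintf(L"NCryptOpenStorageProvider failed: 0x%08lx\n", status);
            return 1;
        }

        NCryptKeyName* keyName = nullptr;
        void* enumState = nullptr;
        while (NCryptEnumKeys(hProv, nullptr, &keyName, &enumState, 0) == ERROR_SUCCESS)
        {
            wprintf(L"key: %s (algorithm: %s)\n", keyName->pszName, keyName->pszAlgid);
            NCryptFreeBuffer(keyName);
        }

        if (enumState != nullptr) NCryptFreeBuffer(enumState);
        NCryptFreeObject(hProv);
        return 0;
    }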

In this case the vendor helpfully provided an MSI installer for their middleware, with one catch: there was only an x86 version. No problem, since Windows ARM can run x86 binaries after all. Sure enough, the installer ran without a hitch and after a few clicks reported successful installation. Except when it came time to use the hardware associated with the mini-driver: at that point, the system continued to fall back to the default PIV mini-driver instead of the vendor-specific one. (This is a problem. As discussed in previous posts, the built-in PIV driver on Windows is effectively read-only. It can use existing objects on the card, but cannot handle key generation or provision new certificates.) In other words, the smart-card stack could not locate the more capable vendor driver. Did the installer hallucinate its successful execution?

Internet Explorer and its doppelganger

Interview question from the late 2000s for software engineer candidates claiming a high level of proficiency with Windows:

“Why are there two versions of Internet Explorer on 64-bit Windows, one 32-bit and another 64-bit?”

The answer comes down to one of those limits on interoperability: a 64-bit process can only load 64-bit DLLs in-process, and likewise a 32-bit process can only load 32-bit DLLs. Before MSFT finally deprecated native-code extensions such as ActiveX and browser helper objects (“BHO,” historically one of the best ways to author malware for shadowing all browsing activity), it was not uncommon for websites to rely on the presence of such add-ons. For example, Java and Adobe Flash were implemented this way. But historically such native extensions were all written for 32-bit Windows, and they could not have loaded successfully into a 64-bit browser process.
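
The failure mode is easy to reproduce. This sketch (illustrative, not tied to any particular extension) simply calls LoadLibrary on a DLL path passed on the command line; built as a 64-bit binary and pointed at a 32-bit DLL, it fails with ERROR_BAD_EXE_FORMAT rather than loading the module.

    #include <windows.h>
    #include <stdio.h>

    int wmain(int argc, wchar_t** argv)
    {
        if (argc < 2)
        {
            wprintf(L"usage: loadprobe <path-to-dll>\n");
            return 1;
        }

        HMODULE mod = LoadLibraryW(argv[1]);
        if (mod == nullptr)
        {
            DWORD err = GetLastError();
            // ERROR_BAD_EXE_FORMAT (193) is what a 64-bit process gets when
            // handed a 32-bit DLL, and vice versa.
            wprintf(L"LoadLibrary failed with error %lu%s\n", err,
                    err == ERROR_BAD_EXE_FORMAT ? L" (wrong architecture)" : L"");
            return 1;
        }

        wprintf(L"loaded %s at %p\n", argv[1], (void*)mod);
        FreeLibrary(mod);
        return 0;
    }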

MSFT is notorious for its deference to backwards compatibility and its reluctance to make breaking changes, for good reason: it did not turn out all that well when that principle was abandoned in one bizarre spell of hubris with Vista. So it was a foregone conclusion that 64-bit Windows must include 32-bit IE for down-level compatibility; there was no way hundreds of independent software publishers would get their act together in time to have 64-bit versions of their extensions ready when Vista hit the shelves. (Turns out they need not have worried; those copies were not exactly flying off the shelves.) The concession to backwards compatibility went much deeper than simply shipping two instances of the browser: 32-bit IE remained the default browser until a critical mass of those third-party extensions were ported to 64 bits, despite all of its limitations and the lack of security-hardening features available to true 64-bit applications. (From a consumer point of view, one could argue the unavailability of popular native-code extensions is very much a feature. With Adobe Flash being a cesspool of critical vulnerabilities, having a browser that can never run Flash is an unambiguous win for security.)

Worth calling out: this was decidedly not the case for every Windows application. There were not two copies of Notepad or Solitaire; those are not extensible apps. Notepad did not offer a platform inviting third-party developers to add functionality packaged into DLLs meant for loading in-process.

Foreign architectures

This mismatch also explains the case of the “ghost” smart-card driver: those drivers are user-mode DLLs intended for in-process loading by the Windows cryptography API. Most applications interfacing with smart-cards ship only one version, built for the native architecture. For example, there is exactly one copy of certreq for generating new certificate requests. On Windows ARM that is an ARM64 binary. It cannot load DLLs written for the x86/x64 architectures, even if those DLLs happen to be present on the system.

That vindicates the installer: it was not hallucinating when it reported success. All of the necessary DLLs were indeed dropped into the right folder, or at least the folder the installer was led to believe is the right place for shared libraries. Appropriate “registry” entries were created to inform Windows that a new smart-card driver was present and associated with cards of a particular model. But those changes happened in the simulated environment presented to emulated x86/x64 processes on Windows ARM. As far as the native ARM64 certreq process is concerned, there is no evidence of this driver in the registry. (Manually creating the registry keys in the correct, non-redirected location will not solve the problem; it only moves the inevitable failure one step forward. The DLL those entries point to has the wrong architecture and will not load successfully.)
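
This is straightforward to confirm from a native process. The sketch below is illustrative: Calais\SmartCards is where the Windows smart-card stack registers card-to-driver mappings, and the KEY_WOW64_* flags are the standard access rights for choosing a registry view. It lists registered card models in both the native view and the redirected 32-bit view, making the phantom registration visible in one and absent from the other.

    #include <windows.h>
    #include <stdio.h>

    // List card models registered under Calais\SmartCards in a given registry view.
    static void ListRegisteredCards(const wchar_t* label, REGSAM viewFlag)
    {
        HKEY hKey = nullptr;
        LONG rc = RegOpenKeyExW(HKEY_LOCAL_MACHINE,
                                L"SOFTWARE\\Microsoft\\Cryptography\\Calais\\SmartCards",
                                0, KEY_READ | viewFlag, &hKey);
        if (rc != ERROR_SUCCESS)
        {
            wprintf(L"[%s] key not found (error %ld)\n", label, rc);
            return;
        }

        wprintf(L"[%s]\n", label);
        wchar_t name[256];
        DWORD index = 0;
        DWORD cch = ARRAYSIZE(name);
        while (RegEnumKeyExW(hKey, index++, name, &cch,
                             nullptr, nullptr, nullptr, nullptr) == ERROR_SUCCESS)
        {
            wprintf(L"  %s\n", name);
            cch = ARRAYSIZE(name);
        }
        RegCloseKey(hKey);
    }

    int wmain()
    {
        // What a native process such as certreq sees.
        ListRegisteredCards(L"native view", 0);
        // The redirected view where a 32-bit x86 installer would have written.
        ListRegisteredCards(L"32-bit WoW view", KEY_WOW64_32KEY);
        return 0;
    }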

One could ask why the installer even proceeded in this situation: if “successful” completion of the install still results in an unusable setup guaranteed to fail at using the vendor hardware, why bother? But that assumes the installer is even aware that it is executing on a foreign architecture. Chances are that when this installer was initially designed, the authors did not expect their product to be used on anything other than Intel x86/x64 hardware. That is because binaries are by definition specific to one hardware platform and often a narrow range of operating system versions. The authors would have logically assumed there is no need to check for installation on ARM any more than they had to check for RISC-V or a PDP-11: code execution would never reach that point if the condition being checked were true. It is redundant in the same way as checking whether the system is currently experiencing a blue-screen.

The surprise is not that the installer incorrectly reported success. It is that the installer executed at all when invoked on a machine with a completely unexpected hardware architecture, where every instruction in the binary is being emulated to make it work. That is a testament to how well the x86/x64 interoperability layer on Windows ARM sustains the illusion of a native Intel environment for emulated apps.
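
To be fair, the illusion is not airtight: an installer that wanted to know could ask. The sketch below is illustrative; IsWow64Process2 is a documented Win32 API (available since Windows 10 version 1709) that reports both the architecture a binary was compiled for and the machine it actually runs on, which is enough for an x86 build to notice it has landed on ARM64.

    #include <windows.h>
    #include <stdio.h>

    int wmain()
    {
        USHORT processMachine = IMAGE_FILE_MACHINE_UNKNOWN;
        USHORT nativeMachine = IMAGE_FILE_MACHINE_UNKNOWN;

        if (!IsWow64Process2(GetCurrentProcess(), &processMachine, &nativeMachine))
        {
            wprintf(L"IsWow64Process2 failed: %lu\n", GetLastError());
            return 1;
        }

        // processMachine is IMAGE_FILE_MACHINE_UNKNOWN when the process is not
        // running under WOW64; for an emulated x86 process on Windows ARM it is
        // IMAGE_FILE_MACHINE_I386 while nativeMachine is IMAGE_FILE_MACHINE_ARM64.
        wprintf(L"process machine: 0x%04x, native machine: 0x%04x\n",
                processMachine, nativeMachine);

        if (nativeMachine == IMAGE_FILE_MACHINE_ARM64 &&
            processMachine != IMAGE_FILE_MACHINE_UNKNOWN)
        {
            wprintf(L"running under emulation on ARM64 hardware\n");
        }
        return 0;
    }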

Post-script: limited workarounds

Interestingly, MSFT did come up with a limited workaround for some of its core OS functionality, specifically the Windows shell, better known as “explorer.” The local predecessor of the more notorious Internet Explorer, this shell provides the GUI experience on Windows. Not surprisingly, it has plenty of extensibility points for third-party developers to enhance (or detract from) the user experience with additional functionality. For example, custom actions can be added to the right-click context menu, such as uploading a file to a cloud drive, decompressing an archive in some obscure format or scanning it for malware. Behind the scenes those additional functions are powered by COM components loaded in-process within explorer.

That model breaks when the shell is an x64 binary while the COM object in question was compiled for x86. Unlike Internet Explorer, where one has a choice of two different versions to launch and can even run them side by side, there can be only one Windows GUI active on the system. But this is where the use of the COM standard also provides a general-purpose solution. COM supports remote procedure calls, “marshaling” data across process or even machine boundaries. By creating a 32-bit COM surrogate process, the shell can continue leveraging all of those 32-bit legacy extensions. They are now loaded in this separate 32-bit surrogate process and invoked cross-process from the hosting application. This trick is 100% transparent to the extension: as far as that COM object is concerned, it was loaded in-process by a 32-bit application exactly as the developers originally envisioned. (This is important because many extensions were never designed to deal with out-of-process calls.)
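
From the host’s side, the heavy lifting reduces to an activation flag plus registration. The sketch below is illustrative: the CLSID is a placeholder, and a real 32-bit extension would additionally need an AppID registration with a DllSurrogate value so COM knows to host the DLL in a separate surrogate process (dllhost.exe). It asks COM to activate a class out-of-process with CLSCTX_LOCAL_SERVER and marshal all calls across the process boundary, which is why the caller’s bitness no longer has to match the DLL’s.

    #include <windows.h>
    #include <objbase.h>
    #include <stdio.h>
    #pragma comment(lib, "ole32")

    int wmain()
    {
        HRESULT hr = CoInitializeEx(nullptr, COINIT_APARTMENTTHREADED);
        if (FAILED(hr)) return 1;

        // Placeholder CLSID standing in for a legacy 32-bit shell extension.
        CLSID clsid = {};
        CLSIDFromString(L"{00000000-0000-0000-0000-000000000000}", &clsid);

        // CLSCTX_LOCAL_SERVER requests out-of-process activation: COM spins up
        // (or connects to) a separate server/surrogate process and hands back
        // a proxy, so the 32-bit DLL never has to load into this process.
        IUnknown* pUnk = nullptr;
        hr = CoCreateInstance(clsid, nullptr, CLSCTX_LOCAL_SERVER, IID_PPV_ARGS(&pUnk));
        wprintf(L"activation result: 0x%08lx\n", hr);

        if (pUnk != nullptr) pUnk->Release();
        CoUninitialize();
        return 0;
    }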

While that works for the shell, it does not work in general. Not every extensibility layer is designed with marshaling and remote-call capabilities in mind. Windows smart-card drivers are strictly based on a local API. One could conceivably write a custom proxy to convert those calls into remote ones (indeed, the ability to forward smart-cards to a different machine over Remote Desktop proves this is possible), but doing so is not as straightforward as opting into the existing RPC capabilities of COM. Applications such as certreq do not have a magical interoperability layer to transparently use drivers written for a foreign architecture.

CP

** This story would become even more convoluted, and arguably an even better interview question, with subsequent updates to Internet Explorer. When IE8 introduced a multi-process architecture after being beaten to the punch by Google Chrome, its original design involved a 64-bit “browser” process hosting 32-bit “renderers” for individual tabs. Later versions introduced an enhanced protected mode that also used 64-bit renderers to leverage security improvements native to x64 binaries. This clunky and unsafe-by-default model persisted all the way through IE11. It was not until Edge that 32-bit browsers and native-code extensions were finally deprecated.