Blame it on Bitcoin: ransomware and regulation [part I]

[Full disclosure: this blogger worked for a regulated US cryptocurrency exchange]

The disruptive ransomware attack on Colonial Pipeline and subsequent revelations of an even larger ransom paid earlier by the insurer CNA have renewed calls for increased regulation of cryptocurrency. Predictably, an expanding chorus of critics has revived the time-honored “blame-it-on-Bitcoin” school of thought. This post takes a closer look at how additional regulation may impact ransomware. Fittingly for the “pipeline” model of Colonial, we will follow the flow of ransomware funds from origin to recipient and ask, at each stage, how unilateral action by regulators could cut off that flow.

Here is a quick recap on the flow of funds in the aftermath of a ransomware attack:

  1. The business experiencing the ransomware attack decides that paying the ransom is the most effective way of restoring operations
  2. They contract with a third-party service to negotiate with the perpetrators and facilitate payment. (Some organizations may choose to handle this on their own but most companies lack know-how in handling cryptocurrency.)
  3. Bitcoin for payment is sourced, typically from a cryptocurrency exchange
  4. Funds are sent to the recipient by broadcasting a Bitcoin transaction. Miners confirm the transaction by including it in a block
  5. Perpetrators convert their Bitcoin into another cryptocurrency or fiat money, also by using a cryptocurrency exchange

What can be accomplished with additional regulation for each step?

Victims: the case against capitulation

Some have argued that the act of paying the ransom could be illegal depending on the country where the perpetrators are based. Regardless of whether it is covered by existing laws on the books, there is an economic case for intervention based on the “greater good” of the ecosystem. While paying up may be the expedient or even optimal course of action for one individual victim in isolation, it creates negative externalities downstream for other individuals. For starters, each payment further incentivizes similar attacks by the same threat actor or copycat groups, by proving the viability of a business model built on ransomware. More importantly, it provides direct funding to the perpetrator which can be used to purchase additional capabilities— such as acquiring zero-day exploits on the black market— that enable even more damaging attacks in the future.

There is a spectrum of tools from economic theory for addressing negative externalities: fines, taxation and more creative solutions such as cap-and-trade for carbon emissions. In all cases, the objective is to reflect externalities back on the actor responsible for generating them in the first place so they are factored into the cost/benefit analysis. For example, companies that opt to pay the ransom may be required to contribute an equivalent amount to a fund created for combatting ransomware. That pool of funds would be earmarked to support law enforcement activities against ransomware groups (for example, taking down their C&C infrastructure) or to invest directly in promising technologies that can help accelerate recovery for companies targeted in future attacks.

Middlemen: negotiators and facilitators

Extending the same logic to intermediaries, the US could impose additional economic costs on any company profiting from ransomware activity. Even as unwitting participants, these intermediaries have interests aligned with ransomware actors: more attacks mean more payments to arrange and more business for the negotiators.

Granted similar criticism can be leveled at the information security industry: more viruses, more business opportunities for antivirus vendors hawking products by playing up fears of virus infections destroying PCs. Yet few would seriously argue that antivirus solutions are somehow aiding and abetting the underground malware economy. Reputable AV companies can earn a living even when their customers suffer no adverse consequences— in fact that is their ideal steady state arrangement. AV is a preventive technology aimed at stopping malware infections before they occur, not arranging for wealth transfer from affected customer to perpetrator after the fact.

To the extent a ransomware negotiation or payment facilitation service exists as a distinct industry segment, it derives its revenues entirely from successful attacks. This is the equivalent of a mercenary fire-department that only gets paid each time it puts out a fire. While these firemen may not take up arson on the side, their interests are not aligned with the homeowners they are ostensibly protecting. Real life fire-departments care about building codes and functioning sprinklers because they would like to see as few fires as possible in their community. Our hypothetical mercenary FD has no such incentive, and prefers that the neighborhood burn down frequently, with the added benefit that, unlike real firefighters, they take on no personal risk while combatting blazes. Even if we are willing to tolerate such a business as a necessity (because in the online world there is no real equivalent to the community-supported fire-department to save the day) we can impose additional costs on these transactions to compensate for their externalities.

Marketplaces: acquiring cryptocurrency

Moving downstream and looking at the acquisition of bitcoin for the ransom payment, the regulatory landscape gets even more complicated. There are dozens of venues where bitcoin can be purchased in exchange for fiat. Some are online such as Coinbase, others operate offline. Until 2019 the exchange LocalBitcoins arranged for buyers and sellers to meet in real life and trade using cash. Some exchanges are regulated and implement KYC (Know-Your-Customer) programs to verify real-world identity before onboarding new customers. These exchanges are selective in who they are willing to admit, and they will screen against the OFAC sanctions list. Other exchanges are based offshore, ignore US regulations and are willing to do business with anyone with a heartbeat. There are even decentralized exchanges that operate autonomously on blockchains, but these are typically only capable of trading cryptocurrencies against each other. They can operate in fiat indirectly using stablecoins (cryptocurrencies designed to track the price of a currency such as the dollar or euro) but that does not help a first-time buyer such as Colonial starting out with a bundle of fiat.

It is difficult to see how additional regulation could be effective in cutting access to all imaginable avenues for a motivated buyer intent on making a ransomware payment. There is already self-selection in effect when it comes to compliance. Regulated exchanges do not want to be involved in ransomware payments in any capacity, not even as the unwitting platform where funds are sourced. While the purchase may generate a small commission in trading fees, the reputational risk and PR impact of making headlines for the wrong reason far exceeds any such short-term gain. On the other hand, it is difficult to see how exchanges can stop an otherwise legitimate customer from diverting funds acquired on the platform to a ransomware payment. First, there is no a priori reason to block reputable US companies— such as Colonial or CNA— from trading on a cryptocurrency exchange under their authentic corporate identity. Considering that Tesla, Square and Microstrategy have included BTC in the mix for their corporate treasury holdings, it is not unexpected that other CFOs may want to jump in and start building positions. More importantly, buyers are not filling out forms to declare the ostensible purpose of their trade (“for ransomware payment”) when they place orders. Even if an exchange were to block known addresses for ransomware payments— and many regulated exchanges follow OFAC lists of sanctioned blockchain addresses— the customer can simply move funds to a private unhosted wallet first before moving them to the eventual payout address. Exchanges can trace fund movements and kick out customers who are found to have engaged in ransomware payments in any capacity. While this is a laudable goal for the compliance department, given the infrequency of ransomware payments, being permanently barred from the exchange is hardly consequential for the buyer.

Of greater concern is the game of jurisdictional arbitrage played by offshore exchanges including Binance— the single largest exchange by volume. These exchanges claim to operate outside the reach of US regulations based on their location, accompanied by half-hearted and often imperfect attempts at excluding US customers from transacting on their platform. The challenge is not one of having sufficient regulations but convincing these offshore exchanges that they are not outside the purview of US financial regulations.

Trying to hold other participants in the marketplace accountable for the trade makes even less sense; their involvement is even more peripheral than the trading platform. Trade execution by necessity involves identifiable counter-parties on the other side who received USD in exchange for parting with their bitcoin. But the identity of those counter-parties is a roll of the dice:  it could be a high-frequency trading hedge fund working as market-maker to provide liquidity, an individual investor cashing out gains on their portfolio or a large fund slowly reducing their long exposure to bitcoin. None of them have any inkling of what their counterparty will eventually do with the funds once they leave the exchange.

[continued – part II]

CP

Matching gifts with cryptocurrency: the fine-print in contracts (part II)

[continued from part I]

Avoiding contractual scams

Time to revisit a question glossed over earlier. While the smart-contract sketched above sounds good on paper, skeptical donors will rightfully ask a pragmatic question: how can they be confident that a matching-gifts campaign launched by a sponsor at some blockchain address is in fact operating according to these rules? If the contract functions as described above, all is well. But what if the contract has a backdoor designed to divert funds to a private wallet, instead of delivering them to the nonprofit? Since the trigger for such malicious logic could be arbitrary, past performance is no guarantee of future results. For example, the contract may act honestly for the first few donations— perhaps those arranged by accomplices of the sponsor to help build confidence— only to start embezzling funds after a certain trigger is hit.

Transparency of blockchains goes a long way to alleviate these risks. In particular, the sponsor can publish the source code of the contract, along with the version of the Solidity compiler used to convert that code into low-level EVM byte-code. Automated tools already exist for verifying this correspondence; see Etherscan for examples of verified contracts. This reduces the problem of verifying contract behavior to source code auditing, which is somewhat more tractable than reverse engineering EVM byte-code. There are still shenanigans possible at the source code level, as starkly demonstrated by the Solidity Underhanded Contest, a competition to come up with the most creative backdoor that can stay undetected by human reviewers. In practice there would be one “canonical” matching campaign contract, already audited and in widespread use, similar to the canonical multi-sig wallet contract. Establishing the authenticity of an alleged matching campaign boils down to verifying that a copy of that exact contract has been deployed. (There is an interesting edge-case involving the CREATE2 extension: until recently, Ethereum contracts were considered immutable. A contract at a given address could self-destruct but it could not be replaced by another contract. This is no longer the case for contracts launched via CREATE2, so it is important to also verify that the contract was deployed using the standard, original CREATE instruction, or alternatively that its initialization code has no external dependencies that may differ between multiple invocations.)
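As a rough illustration, the “verify before you donate” step can be automated by comparing the byte-code deployed at the alleged campaign address against a known, audited deployment. The sketch below uses web3.py; the RPC endpoint and both addresses are placeholders, not real deployments.

    from web3 import Web3

    # Placeholder endpoint and addresses — substitute a real node and real deployments.
    w3 = Web3(Web3.HTTPProvider("https://rpc.example.invalid"))

    campaign_address = "0x0000000000000000000000000000000000000000"   # alleged matching campaign
    reference_address = "0x0000000000000000000000000000000000000001"  # known audited contract

    deployed = w3.eth.get_code(campaign_address)    # runtime byte-code at the campaign address
    reference = w3.eth.get_code(reference_address)  # runtime byte-code of the audited contract

    # Caveat: the Solidity compiler appends a metadata hash to runtime byte-code, so two
    # builds of identical source can differ in the trailing bytes; a careful comparison
    # strips that suffix before testing for equality.
    print("matches audited contract:", deployed == reference)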

In addition to verifying contract source code, it is necessary to inspect parameters such as the destination address for the nonprofit receiving donations, committed match ratio (in case this is not hard-coded as one-for-one in code) and funding level of the contract.

Difficult case: Bitcoin

In contrast to Ethereum’s full-fledged programming language for smart contracts, Bitcoin has a far more limited scripting language to express spending conditions. This makes it difficult to achieve parity with the Ethereum implementation of a matching campaign. A more limited notion of “matching” can be achieved by leveraging different signature types in Bitcoin, but at the expense of reverting to all-or-none semantics. Similar to the prior art in Ethereum, the sponsor is only on the hook for matching donations if one or more other participants materialize with donations exceeding a threshold. Below that threshold, nothing happens.

There is also precedent for constructing this type of crowd-funding transaction. To make this more concrete, suppose the sponsor is willing to match donations up to 1 bitcoin to a specific charity. As proof of her commitment, the sponsor creates and signs a partial transaction:
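A structural sketch of that partial transaction follows, with illustrative field names rather than a real transaction encoding (the sighash flag shown anticipates the discussion below):

    partial_tx = {
        "inputs": [
            # The sponsor's 1 BTC pledge. Signing with SIGHASH_ALL | SIGHASH_ANYONECANPAY
            # locks down the outputs while explicitly inviting additional inputs.
            {"utxo": "sponsor coin worth 1.0 BTC", "sighash": "ALL | ANYONECANPAY"},
        ],
        "outputs": [
            # The full matched amount, payable only to the charity's published address.
            {"to": "charity donation address", "amount_btc": 2.0},
        ],
    }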

As it stands, this TX is bogus: consensus rules require that the inputs provide an amount of funds greater than or equal to the outputs, with the difference going to miners as incentive to include the transaction. Since 1 < 2, this transaction can never get mined— as it stands. But this is where use of SIGHASH_ANYONECANPAY comes in; additional inputs can be added to the “source” side of the transaction, as long as outputs on the “destination” remain the same. This allows building the transaction up, layer by layer, with more participants chipping in with a signed input of their own, until the total inputs add up to 2 BTC— or ideally slightly more than 2 BTC to make room for transaction fees. Once that threshold is reached, the transaction can be broadcast.

Compared to the Ethereum case, this construction comes with some caveats and limitations. First, the activity of building up to the full amount must be coordinated off-chain, for example using an old-fashioned website. It is not possible to broadcast a partial TX and have it sit in the mempool while collecting additional inputs; an invalid TX with insufficient funds will not be relayed around the network. This stands in contrast to Ethereum, where all future donations can be processed on chain once the contract is launched. Second, the sponsor can bail out at any time by broadcasting a different transaction that spends the source input in a different way. It is not even considered a double-spend, since there were no other valid transactions involving that input as far as the mempool is concerned. (While the input address can be constrained using time-locks in its redeem script, the same restriction will also apply to the donation. A fully funded TX will also get stuck and not deliver any funds to the nonprofit until the time-lock expires.)

Change is tricky

As sketched above, the arrangement also requires exactly sized inputs, because there is no meaningful way to redirect change. Consider the situation after a first volunteer pledges 0.9 BTC, leaving the campaign just 0.1 BTC away from the goalpost. If a second volunteer has a UTXO worth 0.2 BTC, they would first have to carve out a separate 0.1 BTC output. Directly feeding in the 0.2 BTC UTXO would result in half the funds getting wasted as mining fees. The outputs are already fixed and agreed upon by previous signatures; there is no way for the last volunteer to redirect any excess contribution to a change address. This can be addressed using a different signature scheme combining SIGHASH_ANYONECANPAY and SIGHASH_SINGLE. This latter flag indicates that a given input is signing only its corresponding output, rather than all outputs. That allows each donor (other than the sponsor) to also designate a change address corresponding to their contribution, in case they only want to donate a fraction of one of their UTXOs. Unfortunately this arrangement also allows the sponsor to abscond with funds. Since SIGHASH_SINGLE means individual donors are not in fact validating the first output— ostensibly going to the nonprofit— a dishonest sponsor can collect additional inputs, switch the first output to send 2 BTC to a private wallet and broadcast that altered transaction.
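To make the index pairing concrete, here is a structural sketch of the change-friendly variant for the 0.2 BTC example above. Field names and the exact flag assignments are illustrative; this is not a real transaction encoding.

    tx = {
        "inputs": [
            # With SIGHASH_SINGLE | SIGHASH_ANYONECANPAY, input i commits only to output i.
            {"index": 0, "utxo": "sponsor, 1.0 BTC", "sighash": "SINGLE | ANYONECANPAY"},
            {"index": 1, "utxo": "second donor, 0.2 BTC", "sighash": "SINGLE | ANYONECANPAY"},
        ],
        "outputs": [
            {"index": 0, "to": "charity donation address", "amount_btc": 2.0},
            {"index": 1, "to": "second donor change address", "amount_btc": 0.1},
        ],
    }
    # Inputs are still 0.9 BTC short of covering the outputs; more input/output pairs can
    # be appended until they do. The catch described in the paragraph above: no donor
    # signature pins down output 0, so a dishonest sponsor can re-sign their own input
    # against a version of the transaction that redirects those 2 BTC elsewhere.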

A variant of that problem can happen even with an honest sponsor and unwitting contributors racing each other. Suppose Alice and Bob both come across a partially signed transaction that has garnered multiple donations, but has fallen 0.1 BTC short of the goal to trigger the matching promise. Both spontaneously decide to chip in 0.1 BTC to push that campaign across the finish line. If they both sign using SIGHASH_ANYONECANPAY and attempt to broadcast the now valid transaction, there is an opportunity for an unscrupulous miner to steal funds. Instead of considering these conflicting TX as double-spends and only accepting one, an opportunistic miner could merge the contributions from Alice and Bob into a single TX. Since both signatures only commit to outputs but expressly allow additional inputs, this merge will not invalidate the signatures. The result is a new TX where the input side has 0.1 BTC excess, which will line the miners’ pockets as excess transaction fee instead of reaching the charitable organization. One mitigation is to ensure that anyone adding the “final” input that will trigger the donation uses SIGHASH_ALL to cover all inputs, preventing any other inputs from being merged. The problem with that logic is that it assumes global coordination among participants. In a public campaign, typically no one can know in advance when the funding objectives are reached. (Suppose the campaign was 0.2 BTC short of the goal and three people each decide to chip in 0.1 BTC, each individually assuming that the threshold is still not met after their contribution.)

For this reason, this construction is only suitable for “small group matching”— a single, large donation in response to a pledge for a comparable amount from the sponsor. Alice creates the original 1 → 2 transaction pledging one-for-one matching; Bob adds his own exact 1 BTC input and signs all inputs/outputs prior to broadcasting the transaction. If Carol happened to be doing the same, these two transactions could not be merged and either Bob’s or Carol’s attempt would fail without any loss of funds. For now the construction of a more efficient structure for incrementally raising funds with a matching campaign on Bitcoin remains an open problem.

CP

Matching gifts with cryptocurrency: scripting for a good cause (part I)

On-chain philanthropy

Cryptocurrencies are programmable money: they allow specifying rules and conditions around how funds are transmitted directly in the monetary system itself. Instead of relying on contracts and attorneys, blockchains can encode policies that were previously written in legalese. For example, one can earmark a pool of funds to be locked until a certain date, require two signatures to withdraw and allow it to be sent only to a specific recipient. (The last one is only possible with Ethereum and similarly expressive blockchains. Such covenants are not expressible in the rudimentary scripting language of Bitcoin yet, although extensions are known that would make it possible.) While the recent rise of ransomware and incidents such as the Colonial Pipeline closure have put the spotlight on corrosive uses of cryptocurrency— irreversible payments for criminal activity— the same capabilities can also be put to more beneficial uses. Here we explore an example of implementing a philanthropic campaign using Ethereum or Bitcoin.

Quick primer on matching gifts. “Gift” in this context refers to donations made to a charitable organization. A matching campaign is a commitment by an entity to make additional contributions using its own funds, in some proportion to every donation received by the nonprofit, subject to some limits and qualifications. For example it is very common for large companies in America to offer 1:1 matching for donations to 501c3 organizations. (501c3 is a reference to the section of the US tax code granting special recognition to nonprofits that meet specific criteria, and allowing their donors to receive favorable tax treatment.) As a data-point: in the early 2000s MSFT offered dollar-for-dollar matching up to $12,000 per year per full-time employee.[2] Such corporate matching policies are continuous. Other campaigns may be one-time. For example, a philanthropist may issue a one-time challenge to the leadership of a nonprofit organization, offering to double the impact of any funds received during a specific time window.

Hard-coded, immutable generosity

Blockchains are good at codifying this type of conditional logic— “I will contribute when someone else does.” To better illustrate how such commitments can be implemented, we will consider two different blockchains. First up is Ethereum, which is the easy case thanks to its Turing-complete programming language. Second is Bitcoin, where the implementation gets tricky and somewhat kludgy.

In both cases there are some simplifying assumptions to make the problem more tractable for a blog post:

  • This is a one-time campaign focused on one specific charitable organization. That side-steps certain problems, including reverse-mapping blockchain addresses to 501c3 organizations and deciding whether a given transfer qualifies for the campaign.
  • The nonprofit in question has a well-known blockchain address for receiving donations. This applies to more and more organizations as they partner with intermediaries for accepting cryptocurrency donations or directly publish such addresses on their website. For example Heifer International advertises addresses for Bitcoin, Ethereum, Litecoin, Stellar and Ripple.
  • There is a cap on total funds available for matching but no per-donor quotas. Otherwise we would have a difficult problem trying to decide when a given participant has reached their quota. It is trivial to create an unbounded number of blockchain addresses, and there is no easy way to infer whether two seemingly independent donations originating from different addresses were in fact associated with the same entity.

Easy case: Ethereum

Recall that the objective is a matching campaign enforced by blockchain rules. Specifically we want to move beyond solutions that involve continuous monitoring and active intervention. For example, an outside observer could watch for all transactions to the nonprofit donation address and publish additional transactions of equivalent amount corresponding to each one. That would be a direct translation of how matching campaigns work  off-chain: the donor makes a contribution and then proves to the campaign sponsor that they made a donation of so many dollars, usually by means of a receipt issued by the nonprofit. After verifying the evidence, the sponsor writes out a check of their own to the same organization.

While there is nothing wrong with carrying the same arrangement over to a blockchain, we can do better: in particular, the sponsor can make a commitment once such that they have no way to renege on the promise, neither by outright defection nor by inadvertent failure to keep up their end of the bargain when it comes to writing checks in a timely manner. With Ethereum, the sponsor can create a smart-contract once and fund it with the maximum amount they are willing to match. Once the contract is launched, the remainder of the campaign runs on auto-pilot: immutable contract logic enforced by blockchain rules sees to it that every qualifying donation is properly matched.

This is hardly a novel observation; in fact there is at least one example of such a contract announced on Reddit and launched on-chain. Unfortunately the campaign does not seem to have gotten much traction since launch and not a single Wei has been passed over to the intended nonprofit recipient. Part of the problem is a design choice in the contract to set an all-or-nothing threshold. Similar to crowd-funding campaigns such as KickStarter, matching is conditioned on a threshold of donations being reached, after which the entire amount is matched. Here is an alternative design premised on processing and matching donations immediately as they arrive:

  • As before, the sponsor launches a smart-contract on Ethereum and funds it with the maximum amount of ETH pledged. It could also be funded with an ERC-20 token or a stablecoin such as GUSD to avoid the price volatility associated with cryptocurrencies.
  • Donors interested in taking advantage of the campaign send funds to the contract, not directly to the charitable organization. (This raises an important trust question that will be addressed later: how can they be confident the contract is going to work as promised instead of embezzling the funds?)
  • When incoming funds are received, either the “receive” or “fallback” function for the contract is invoked. That code will inspect the incoming amount and use one of the send/transfer/call functions to transfer twice the amount to the well-known address of the nonprofit. Note that each donation is processed individually and delivered immediately to the recipient. There is no waiting on fulfillment of some global conditions around total funds raised.
  • There is one edge case to address in processing incoming funds: what if the contract does not have sufficient funds left to match the full amount? The naive logic sketched above will fail and, depending on the attempted transfer mechanism, cause the entire transaction to be reverted, resulting in no changes other than wasted gas. An alternative is to match up to the maximum amount possible and still forward the entire donation. But one could argue that fails on fairness grounds: the sender was promised one-for-one amplification of their donation. Perhaps they would have redirected their funds elsewhere had they known a 100% match was no longer available. (This can happen due to contention between multiple donations, through no fault of either party. Suppose the contract has 3 ETH left for matching, and two people send 2 ETH contributions in the same Ethereum block. Depending on the order those transactions are mined into a block, one will be fully matched while the other will run into this edge-case.)
    A better solution is to match contributions up to the remaining amount in the contract and return any unmatched portion back to the caller to reconsider their options. They can always resend funds directly to the nonprofit with the understanding that no match will be forthcoming, or they can wait for another opportunity. This means that once the contract runs out of funds, all incoming donations bounce back to senders. (Of course the sponsor can always resuscitate the campaign with a fresh injection of capital, to accommodate higher than expected demand. But the arrival of such additional funding can not be enforced by the contract.) A short sketch of this match-and-refund logic appears after this list.
  • Finally the smart-contract can also impose a deadline such as 1 month for the campaign to avoid sponsor funds being locked up indefinitely. The end-date for the campaign must be specified when the contract is created and remain immutable, to prevent the sponsor from bailing out earlier. Once the deadline has elapsed, the sponsor can call a function on the contract to withdraw remaining funds. After that point, all donation attempts will bounce. This is preferable to simply destroying the contract using the Ethereum self-destruct operation; if the contract were to disappear altogether, incoming donations would be black-holed and irretrievably lost.
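For concreteness, here is a minimal logic sketch of that match-and-refund behavior, written in plain Python rather than Solidity. The names (MatchingCampaign, send) are illustrative stand-ins; an actual contract would express the same rules in its receive/fallback handler.

    def send(recipient: str, amount: float) -> None:
        """Stand-in for an on-chain value transfer."""
        print(f"send {amount} ETH to {recipient}")

    class MatchingCampaign:
        def __init__(self, charity: str, matching_pool: float):
            self.charity = charity
            self.pool = matching_pool              # sponsor funds still available for matching

        def donate(self, sender: str, amount: float) -> None:
            matched = min(amount, self.pool)       # match as much as remaining funds allow
            self.pool -= matched
            if matched > 0:
                send(self.charity, matched * 2)    # matched slice of the donation plus the equal match
            refund = amount - matched
            if refund > 0:
                send(sender, refund)               # unmatched portion bounces back to the donor

    campaign = MatchingCampaign("charity donation address", matching_pool=3.0)
    campaign.donate("alice", 2.0)   # fully matched: 4.0 ETH forwarded to the charity
    campaign.donate("bob", 2.0)     # only 1.0 ETH left: 2.0 forwarded, 1.0 refunded to bob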

The next post in this series will tackle the problem of establishing trust in such a smart-contract and the challenges of replicating the same arrangement using more primitive bitcoin scripts.

[continued – part II]

CP

[2] That may appear extremely generous or fiscally irresponsible, depending on your perspective: a public company with fifty-thousand blue-badge employees effectively signed up for a six-hundred-million dollar liability in the worst-case scenario. Given human nature and the lackluster reputation of the tech community for giving, actual expenditures never amounted to more than a small fraction of this upper bound— an ironic commentary for a company founded by the preeminent philanthropist of our generation.

Marathon, “clean mining” and Bitcoin censorship

The signal in Marathon virtue-signaling

Last week the Marathon mining pool generated plenty of controversy by carrying through on a promise to mine “clean” blocks. First, the adjective is misleading. Given renewed focus on the energy consumption associated with Bitcoin, it would be natural to assume some environmental connotation, specifically using renewable sources to supply the massive amount of electricity required for producing that block. Instead, for Marathon the measure of block hygiene turns out to involve an altogether different yardstick: compliance with the OFAC list of sanctioned addresses. The Office of Foreign Assets Control is part of the US Treasury Department. It is responsible for maintaining lists of foreign individuals and entities that US companies are barred from doing business with due to national security and foreign policy concerns. In other words, OFAC is the reason that El Chapo or GRU officers can not sign up for a credit card, open a savings account or apply for a mortgage with a regulated US bank.

OFAC has long taken an interest in cryptocurrencies. It has sanctioned blockchain addresses on Bitcoin, Ethereum and even privacy-friendly coins such as Monero and ZCash. Regulated US-based cryptocurrency companies already take these lists into account. For example, they can block outbound transfers by their own customers (the classic scenario is stopping payments to ransomware operators) or freeze incoming transfers from blackballed addresses to prevent those funds from being laundered by trading into another asset. In that sense, there is nothing new about some blockchain addresses becoming “radioactive” in the eyes of financial institutions. Where Marathon has crossed into uncharted territory is applying these rules to mining new blocks, in a way that affects all participants globally. In the process, it opens a can of worms about the concentration of mining power and whether governments can exert influence on an allegedly decentralized system by squeezing a handful of key participants.

Meaningless gestures

First let’s start with the MARA pool. With an estimated share of less than 8% of total hashrate, this clean mining campaign is unlikely to make a dent. (Incidentally that estimate comes from a January Marathon press-release, as an upper-bound on the hashrate achievable if 100% of its capacity were directed at bitcoin.) When Marathon wins the race to mint the next block— which happens 8% of the time on average— it may exclude a pending transaction that would otherwise have been eligible according to standard rules. But other miners remain unencumbered by that self-imposed rule and are happy to include the same TX next time around, when they win the race to mint a block. The crucial point is that once any miner includes the transaction, Marathon is stuck mining on top of that block. Quite ironically, Marathon is now helping the sanctioned actor, adding more confirmations to the verboten transaction and helping push it deeper into blockchain history. The net effect from the sender perspective is a slight delay. On average, sanctioned addresses will find their transactions taking slightly longer to confirm than other addresses. At an 8% share, the difference is imperceptible. Even at 20% it would barely register. Recall that only the initial appearance in a block can be delayed by the Marathon “clean mining” policy. Once included in any block by any other miner, additional confirmations will arrive at the regular rate. Considering that most counterparties require 3-6 confirmations, the additional delay on the first block is negligible. This is a scenario where the intransigent minority can not enforce its rules on the majority.
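A back-of-the-envelope model makes the point quantitatively. If a fraction p of hash power refuses to include a transaction, the number of “clean” blocks mined before some other miner picks it up follows a geometric distribution, so the expected extra wait is p / (1 − p) blocks. (A simplified sketch that ignores fee dynamics and variance in block times.)

    def expected_extra_blocks(p: float) -> float:
        """Expected number of censoring blocks found before a non-compliant miner wins."""
        return p / (1 - p)

    for share in (0.08, 0.20, 0.40):
        extra = expected_extra_blocks(share)
        print(f"{share:.0%} compliant hash power -> ~{extra:.2f} extra blocks "
              f"(~{extra * 10:.0f} min at 10 min/block)")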

Corollary: Marathon “clean mining” is pure virtue signaling.

A conspiracy of pools?

What if additional pools jump on the clean mining bandwagon? Imagine a world where miners are divided into two camps. Unconstrained miners simply follow consensus rules and profit maximization when selecting which transactions to include in a block. Compliant miners on the other hand observe additional restrictions from OFAC or other regulatory regimes that result in the exclusion of certain transactions. At first this endeavor looks like a doomed enterprise regardless of the hash-rate commanded by the compliant miners. As long as any miner anywhere on earth is willing to include a transaction— including the proverbial lone hobbyist in his/her basement— it will eventually get confirmed. At best they can slow down its initial appearance. That is still problematic, in that it breaks the rule of fungibility: one bitcoin is no longer identical to another bitcoin. Some addresses are discriminated against when it comes to transaction speed. Still, this looks like a minor nuisance, considering that funds will eventually move. Worst case scenario, a sufficiently motivated actor can temporarily rent hash-power to mine their own censored transactions. That suggests any attempt at extending sanctioned address lists to mining is tilting at windmills unless it can achieve 100% coverage.

In fact total censorship can be achieved without control over all miners. Once the share of compliant miners approaches the 1/2 mark, the game dynamics shift. In effect compliant miners can execute a 51% attack against the minority to permanently exclude transactions. As before, consider the scenario where a miner who is unaware of or deliberately running afoul of OFAC rules mines a block including a transaction from a sanctioned address. Compliant miners now have an option that is not available to Marathon: ignore that block and continue mining on a private fork without the undesirable TX. Unconstrained miners may have a head-start after having found the block first. But the majority will eventually catch up and produce the longest chain, resulting in a chain reorganization that erases all traces of the censored transaction from bitcoin history.
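This is just the race analysis from the Bitcoin whitepaper in a different guise. A minimal sketch of that model (ignoring difficulty adjustment, network latency and selfish-mining refinements): if the compliant camp controls a fraction p of hash power and starts z blocks behind the block it wants to orphan, its chance of eventually overtaking the other chain is 1 when p exceeds one half, and (p / (1 − p))^z otherwise.

    def catch_up_probability(p: float, z: int) -> float:
        """Probability that a camp with hash-power share p, starting z blocks behind,
        eventually produces the longest chain (the classic gambler's-ruin result)."""
        q = 1 - p
        return 1.0 if p > q else (p / q) ** z

    print(catch_up_probability(0.55, 1))   # compliant majority: overtakes with certainty
    print(catch_up_probability(0.45, 3))   # compliant minority, 3 blocks behind: ~0.55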

Game-theory of righteous forks

Would regulated miners invoke that nuclear option? Initiating a fork to undo some disfavored transaction is an expensive proposition— even for the side guaranteed to win that fork battle. It is disruptive for the ecosystem too: consider that on both chains, block arrival times will slow down. Instead of 100% of hash-rate being applied to mining one chain, it is split between two forks, but with each fork having the difficulty level of the original chain. (Since these forks are expected to be resolved after a handful of blocks, difficulty adjustment will not arrive in time to compensate for the reduced hash-rate.) On the other hand, compliant miners may have no choice. Their actions are not driven by profit maximization alone; otherwise they would not be excluding perfectly valid transactions in the first place.

In terms of financial incentives, there is a silver-lining to being on the winning side of the fork: block rewards previously claimed by unconstrained miners are up for grabs again, as alternative versions of those blocks are produced. Since blockchain history is being revised, the coinbase rewards can be reclaimed by compliant miners who supplied an alternative version of that history on the winning side. When history is rewritten by the victors, coinbase rewards belong to those responsible for the revisionism. This incentive alone could compensate the regulated majority for going through the trouble of having to initiate 51% attacks in the name of keeping the blockchain clean. On the other hand there will be unpredictable second-order effects, since miner behavior is visible to all observers. Heavy-handed chain reorganizations and censorship may result in a loss of confidence in the network and a corresponding depreciation of Bitcoin, hurting miners’ bottom line again.

Unconstrained miners in the minority have even more to lose: not only will they waste resources mining on a doomed chain until the reorg, but they will also lose previously earned rewards for blocks that were replaced in the fork. A rational miner will seek to avoid that outcome. Going along with the majority is the path of least resistance, even if the miner has no ideological affiliation for or against the regulatory scheme in question. As long as the majority is committed to forking the blockchain at all costs in order to avoid running afoul of applicable regulations— the metaphorical gun to the head— game-theory predicts the minority will fall in line. There may be a handful of 51% attacks waged initially if unconstrained miners seek to test the resolve of the regulated ones. Ending up on the wrong side of such a fork will have the effect of quickly resetting the expectations of miners in the minority.

The tipping point may well arrive before 50%. Strategies such as selfish-mining allow smaller concentrations of hash power to attempt to hijack the chain. Compliant miners may even be compelled by regulation into temporarily withdrawing their hash power or attempting a Hail Mary reorg for a limited number of blocks, before giving up and resuming work on the longest chain even if it contains a tainted transaction. While the minority is likely to lose most of these uphill battles, even a small chance of victory and the associated redistribution of coinbase rewards could motivate unconstrained miners to play it safe.

Choosing the regulators

What does this portend for the future of cryptocurrency regulation? Marathon may have voluntarily indulged in this bit of meaningless virtue-signaling, but it is a safe bet that regulators elsewhere are taking note. The concern is not about OFAC or the selection criteria used by the US Treasury for its sanctions. There is a very good argument to be made that ransomware operators, Russian election-meddling groups, ISIS terrorists or North Korean dictators should be cut off from every financial system, including those based on blockchains. Companies operating in that ecosystem in any capacity— exchange, custodian or miner— have a part to play in implementing those policies.

The problem is that US regulators are not the only ones drawing up lists of personae non gratae and declaring that transactions by those actors must forever be consigned to the memory pool. In fact the concentration of mining power in China— dramatically illustrated by the drop in hash-rate during a recent power outage— hints at a darker possibility: the CCP could point to the Marathon example to strong-arm mining pools into blacklisting addresses belonging to political dissidents, human-rights organizations, Uighur communities and Tibetan activists. Marathon was not under the gun and acted voluntarily. Mining pools in China may have a very literal gun pointed at their heads when instructed to comply with censorship rules.

CP

On CAPTCHAs and accessibility (part II)

[continued from part I]

Accessibility as a value system

Accessibility has always been part of the design conversation during product development on every MSFT team this blogger worked on. One could cynically attribute this to commercial incentives originating from the US government requirement that software comply with the Americans with Disabilities Act. Federal sales are a massive source of Windows revenue, and failing a core requirement would keep the operating system out of that lucrative market— an unthinkable outcome. But the commitment to accessibility extended beyond the operating system division. Online services under the MSN umbrella arguably had an even greater focus on inclusiveness and making sure all of the web properties would be usable for customers with disabilities. As with all other aspects of software engineering, individual bugs and oversights could happen, but you could count on every team having a program manager with accessibility in their portfolio, responsible for championing these considerations during development.

Luckily it was not particularly difficult to get accessibility right either, at least when designing websites. By the early 2000s, standardization efforts around core web technologies had already laid the foundations with features specifically designed for accessibility. For example, HTML images have an alternative text or alt-text attribute describing that image in words. In situations where users can not see images, screen-reader software working in conjunction with the web browser could instead speak those words aloud. The World Wide Web Consortium had already published guidelines with hints like this— include meaningful alternative text with every image— to educate web developers. MSFT itself had additional internal guidelines for accessibility.

For teams operating in the brave new world of “online services” (as distinct from the soon-to-be-antiquated rich-client or shrink-wrap models of delivering software for local installation) accessibility was essentially a solved problem, much like the problem of internationalization, or translating software into multiple languages, which used to bedevil many a software project until ground rules were worked out. As long as you followed certain guidelines— an obvious one being to not hard-code English language text intended for users in your code— your software could be easily translated for virtually any market without changing the code. In the same spirit, as long as you followed specific guidelines around designing a website, browsers and screen readers would take care of the rest and make your service accessible to all customers. Unless, that is, you went out of your way to introduce a feature that is inaccessible by design— such as visual CAPTCHAs.

Take #2: audio CAPTCHAs

To the extent CAPTCHAs are difficult enough to stop “offensive” software working on behalf of spammers, they also frustrate “honest” software that exists to help users with disabilities navigate the user interface. A strict interpretation of W3C guidelines dictates that every CAPTCHA image be accompanied by alternative text along the lines of “this picture contains the distorted sequence of letters X3JRQA.” Of course if we actually did that, spammers could cheat the puzzle, using automated software to learn the solution from the same hint.

The natural fallback was an audio CAPTCHA: instead of recognizing letters in a deliberately distorted image, users would be asked to recognize letters spoken out in a voice recording with deliberate noise added. Once again the trick is knowing exactly how to distort that soundtrack such that humans have an easy time while off-the-shelf voice-recognition software stumbles. And once again, Microsoft Research came to the rescue. Our colleagues knew that simply adding white noise (aka Gaussian noise) would not do the trick; voice recognition had become very good at tuning that out. Instead the difficulty of the audio CAPTCHA would rely on background “babble”— normal conversation sounds layered on top of the soundtrack at slightly lower volume. The perceptual challenge here is similar to carrying on a conversation in a loud space, focusing on the speaker in front of us while tuning out the cacophony of all the other voices echoing around the room.
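The mixing step itself is conceptually simple; the hard part is choosing the right recordings and levels. A toy sketch of the general idea (illustrative only, not the MSR algorithm), assuming both tracks are mono float samples in [-1, 1] at the same sample rate:

    import numpy as np

    def add_babble(speech: np.ndarray, babble: np.ndarray, babble_level: float = 0.6) -> np.ndarray:
        """Overlay background conversation "babble" on the spoken-letter track
        at a slightly lower volume. The 0.6 gain is an arbitrary placeholder."""
        babble = np.resize(babble, speech.shape)   # loop or trim babble to match length
        mixed = speech + babble_level * babble
        return np.clip(mixed, -1.0, 1.0)           # keep samples in range to avoid clipping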

As with visual CAPTCHAs, there were various knobs for adjusting the difficulty level of the puzzles. Chastened by the weak security configuration of the original rollout, this time more conservative choices were made. We recognized we were dealing with an example of the weakest-link effect: while honest users with accessibility needs are constrained to use the audio CAPTCHA, spammers have their choice of attacking either one. If either option is significantly easier to break, that is the one they are going to target. If it turned out that voice-recognition software could break the audio, it would not matter how good the original CAPTCHA was. All of the previous work optimizing visual CAPTCHAs would be undermined as rational spammers shifted over to breaking the audio to continue registering bogus accounts.

Fast forward to when the feature rolled out: that dreaded scenario did not come to pass. There was no spike in registrations coming through with audio puzzles. The initial version simply recreated the image puzzle in sound, but later iterations used distinct puzzles. This is important for determining in each case whether someone solved the image or audio version. But even when using the same puzzle, you would expect attackers to request a large number of audio puzzles if they had an automated break, along with other signals such as a large number of “near misses” where the submitted solution is almost correct except for a letter or two. There was no such spike in the data. Collective sigh of relief all around.

Calibrating the difficulty

Except it turned out the design missed in the opposite direction this time. It is unclear if spammers even bothered attacking the audio CAPTCHA, much less whether they eventually gave up in frustration and violently chucked their copy of Dragon Naturally Speaking voice-recognition software across the room. There is little visibility into how the crooks operate. But one thing became clear over time: our audio CAPTCHA was also too difficult for honest users trying to sign up for accounts.

It’s not that anyone made a conscious decision to ship unsolvable puzzles. On the contrary, deliberate steps were taken to control difficulty. Sound-alike consonants such as “B” and “P” were excluded, since they were considered too difficult to distinguish. This is similar to the visual CAPTCHA avoiding symbols that look identical, such as the digit “1” and letter “I,” or the letters “O” and “Q” which are particularly likely to morph into each other as random segments are added around letters. The problem is that all of these intuitions around what qualifies as the “right” difficulty level were never validated against actual users.

Widespread suspicion existed within the team that we were overdoing it on the difficulty scale. To anyone actually listening to sample audio clips, the letters were incomprehensible. Those of us raising that objection were met with a bit of folk-psychology wisdom: while the puzzles may sound incomprehensible to our untrained ears, users with visual disabilities are likely to have a far more heightened sense of hearing. They would be just fine, this theory went: our subjective evaluation of difficulty is not an accurate gauge because we are not the target audience. That collective delusion might have persisted, were it not for a proper usability study conducted with real users.

Reality check

The wake-up moment occurred in the usability labs on the MSFT Redmond-West (“Red-West”) campus. Our usability engineer helped recruit volunteers with specific accessibility requirements involving screen readers. These men and women sat down in front of a computer to work through a scripted task as members of the Passport team stood helpless, observing from behind one-way glass. To control for other accessibility issues that may exist in registration flows, the tasks focused on solving audio CAPTCHAs, stripping away every other extraneous action from the study. Volunteers were simply given dozens of audio CAPTCHA samples calibrated for different settings, some easier and some harder than what we had deployed in production.

After two days, the verdict was in: our audio CAPTCHAs were far more difficult than we realized. Even more instructive were the post-study debriefings. One user said he would likely have asked for help from a relative to complete registering for an account— the worst way to fail customers is making them feel they need help from other people in order to go about their business. Another volunteer wondered aloud if the person designing these audio CAPTCHAs was influenced by John Cage and other avant-garde composers. The folk-psychology theory was bunk: users with visual disabilities were just as frustrated trying to make sense of these mangled audio clips as everyone else.

To be clear: this failure rests 100% with the Passport team— not our colleagues in MSFT Research who provided the basic building blocks. If anything, it was an exemplary case of “technology transfer” from research to product: MSR teams carried out innovative work pushing the envelope on a hard problem, handed over working proof-of-concept code and educated the product team on the choice of settings. It was our call setting the difficulty level high, and our cavalier attitude towards usability that green-lighted a critical feature absent any empirical evidence that real users could solve it, all the while patting ourselves on the back that accessibility requirements were satisfied. Mission accomplished, Passport team!

In software engineering we rarely come face-to-face with our errors. Our customers are distant abstractions, caricatured into helpful stereotypes by marketing: “Abby” is the home-user who prioritizes online safety, “Todd” owns a small business and appreciates time-saving features, while “Fred” the IT administrator is always looking to reduce technology costs. Yet we never get to hear directly from Abby, Fred or Todd on how well our work product actually helps them achieve those objectives. Success can be celebrated in metrics trending up— new registrations, logins per day— and, less commonly, in metrics trending down— fewer password resets, less outbound spam originating from Hotmail. Failures are abstract, if not entirely out of sight. Usability studies are the one exception, when rank-and-file engineers have an opportunity to meet these mythical “users” in the flesh and recognize beyond doubt when our work products have failed our customers.

CP

On CAPTCHAs and accessibility (part I)

[This is an expanded version of what started out as a Twitter thread]

Fighting spam and failing our customers

When Twitter announced its tweet-by-voice feature, they were probably not expecting the backlash from users with disabilities pointing out that the functionality would be unusable in its present state. It was not the first time technology companies forgot about accessibility in the rush to ship features out the door. This is one such story from this blogger’s time at MSFT.

Outbound spam problem

In the early 2000s, MSFT Passport was the identity service for all customer-facing online services the company provided: Hotmail, MSN properties, Xbox Live and even developer-facing services including MSDN. (Later renamed Windows Live and now known simply as MSFT Accounts, not to be confused with a completely unrelated Windows authentication feature named “Passport” that was retired in 2016.) This put the Passport team— my team— squarely in the midst of a raging battle against outbound spam. Anyone hosting email has to contend with inbound spam and keeping that constant stream out of their customers’ inboxes. But service providers who give away free email accounts also have to worry about the opposite problem: crooks registering for thousands of such free accounts and enlisting the massive resources available to a large-scale provider like Hotmail to push out their fraudulent messages. Since Passport handled all aspects of identity including account registration, it became the first line of defense in keeping spammers out.

While most problems in economics have to do with pricing, the problem of spam originates with the complete absence of cost. If customers had to pay for every piece of email they sent, or were even charged a monthly subscription fee for the privilege of having a Hotmail account, no spammer would find it profitable to use Hotmail accounts for their campaigns. But the Original Sin of the web is an unshakeable conviction that every service must be “free,” at least on the surface. To the extent that companies are to make money, this doctrine goes, services shall be monetized indirectly— subsidized by some other profitable line of business such as hardware coupled to the service or, increasingly, by data-mining customer information for ever more targeted and intrusive advertising, which begat our present form of Surveillance Capitalism. To the extent charges could be levied, they had to be indirect.

Enter CAPTCHAs. This unwieldy acronym stands for “Completely Automated Public Turing-test to tell Computers and Humans Apart.” In the early 2000s the terminology had not been standardized; at MSFT we used the simpler acronym HIP, for Human Interaction Proof. The basic idea is having a puzzle that is easy for humans but difficult for computers to solve. The most common example is recognizing distorted letters inside an image. Solving that puzzle becomes the new toll booth for access to an otherwise “free” service. So there is a price introduced, but it is charged in the currency of human cognitive workload. Of course spammers are human too: they can sit down and solve these puzzles all day long— or pay other people in developing countries to do so, as researchers eventually discovered happening in the wild. But they can not scale it the same way any longer: before, they could register accounts about as fast as their script could post data to Passport servers. Now each one of those registration attempts must be accompanied by proof that someone somewhere devoted a few seconds worth of attention to solving a puzzle.

Designing CAPTCHAs

So what does an ideal CAPTCHA look like? Recall that the ideal puzzle is easy for humans but difficult for computers. This is a moving target: while our cognitive capacity changes very slowly over generations of evolution, the field of artificial intelligence moves much faster to close the gap.

A visualization of the CAPTCHA design space. Advances in AI continue to push the boundary between problems that are solvable by computers and those that are only solvable by humans— so far. Ideal CAPTCHAs are just beyond the reach of AI, while still easy for most people to solve.

Philosophically, there is no small measure of irony in computer scientists devising such puzzles. The very idea of a CAPTCHA contradicts common interpretations of the Church-Turing thesis, one of the founding principles of computer science. Named after Alan Turing and his advisor Alonzo Church, the thesis states that the notion of a Turing machine— an idealized theoretical model of the computers we can construct today, but with infinite memory— captures the notion of computability. According to this thesis, any computational problem that is amenable to “solution” by mechanical procedures can be solved by a Turing machine. Its original formulation was squarely in the realm of mathematics, but it did not take long for inevitable connections to physics and the philosophy of mind to emerge. One interpretation holds that since the human mind can solve certain complex problems— recognizing faces or understanding language— it ought to be possible to implement the same steps for solving that problem on a computer. In this view, popular among AI researchers, there can not be a computational problem that is magically solvable by humans and forever out of reach of algorithms. That makes computer science research on CAPTCHAs somewhat akin to engineers designing perpetual motion machines.

Of course few researchers actually believe such problems fundamentally exist. Instead we are simply exploiting a temporary gap between human and AI capabilities. It is a given that AI will continue to improve and encroach into that space of problems temporarily labelled “only solvable by humans.” In other words, we are operating on borrowed time. It is not surprising that many CAPTCHA designs originated with AI researchers: defense and offense feed each other. Less well known is that Microsoft Research was at the forefront of this field in the early 2000s. In addition to designing the Passport CAPTCHA, MSR groups broke CAPTCHAs deployed by Ticketmaster, Yahoo and Google, publishing a paper on the subject. (This blogger reached out to affected companies ahead of time with vulnerability notifications, to make sure there were no objections to publication.) For CAPTCHAs based on recognizing letters, we knew that simple distortion or playing tricks with colors would not be effective. Neural networks are too good at recognizing stand-alone letters. Instead the key is preventing segmentation: make it difficult for OCR to break up the image into distinct letters, by introducing artificial strokes that connect the archipelago of letters into one uninterrupted web of pixels.
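To make the segmentation-resistance idea concrete, here is a toy sketch using Pillow. It is illustrative only— not the Passport generator— and the jitter, spacing and stroke parameters are arbitrary placeholders; the point is simply that overlapping glyphs plus connecting strokes leave no clean boundaries for an OCR engine to split on.

    import random
    from PIL import Image, ImageDraw, ImageFont

    def render_captcha(text: str, size=(220, 70)) -> Image.Image:
        """Toy segmentation-resistant CAPTCHA: jittered, tightly packed letters
        with random strokes drawn across the whole word."""
        img = Image.new("L", size, color=255)            # white grayscale canvas
        draw = ImageDraw.Draw(img)
        font = ImageFont.load_default()
        x = 15
        for ch in text:
            y = 25 + random.randint(-10, 10)             # vertical jitter per letter
            draw.text((x, y), ch, fill=0, font=font)
            x += 8 + random.randint(-4, 0)               # tight spacing encourages overlap
        for _ in range(4):                                # strokes connecting the letters
            x0, x1 = random.randint(0, 30), random.randint(size[0] - 40, size[0] - 1)
            y0, y1 = random.randint(10, 60), random.randint(10, 60)
            draw.line((x0, y0, x1, y1), fill=0, width=2)
        return img

    render_captcha("X3JRQA").save("captcha.png")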

Example of original Passport CAPTCHA, from ArsTechnica

Some trial-and-error was necessary to find the right balance. The first version proved way too easy. In fact, around the same time Microsoft Office introduced an OCR capability for scanning images into a Word document, and that feature alone could partially decode some of the CAPTCHAs. Facepalm moment: random feature in one MSFT product 0wns the security feature of another MSFT product. We can only hope this at least spurred the sales— or more likely pirating— of Office 2003 among enterprising spammers. After some tweaks to the difficulty parameters, image CAPTCHAs settled on a healthy middle-ground, stemming the tide of bogus accounts created by spammers without stopping honest customers from signing up.

There was one major problem remaining however: accessibility. Visual CAPTCHAs work fine for users who can see the images. What about customers with visual disabilities?

[continued]

CP

Smart-cards vs USB tokens: esoteric form factors (part III)

[continued from part II]

A smart-card is a smart-card is a…

Once we set aside the literal interpretation of “card” in smart-card, we can see the same underlying IC appearing in a host of other form factors. Here are some examples, roughly in historical order of appearance.

  • Trusted Platform Module or TPM. Defined by the Trusted Computing Group in the early 2000s, the TPM provides separate hardware intended to act as a root of trust and provide security services such as key management to the primary operating system. Their first major application was Bitlocker full-disk encryption in the ill-fated Windows Vista operating system. In an entertaining example of concepts coming full circle, Windows 7 introduced the notion of virtual smart-cards, which leverage the TPM to simulate smart-cards for scenarios such as remote authentication.
  • Electronic passports, or Machine Readable Travel Documents (MRTD) as they are officially designated by the standardizing body ICAO. This is an unusual scenario where NFC is the only interface to the chip; there is no contact plate to interface with vanilla card readers.
  • Embedded secure elements on mobile devices. While it is possible to view these as “TPM for phones,” the integration model tends to be very different. In particular TPMs are tightly integrated into the boot process for PCs to provide measured-boot functionality, while the eSE historically has been limited to a handful of scenarios such as payments. Nokia and other manufacturers experimented with SEs in the late 2000s, before Google Wallet first introduced them to the US market at scale on Android devices. Apple Pay followed suit a few years later. The SE is effectively a dual-interface card. The “contact” interface is permanently connected to the mobile operating system (eg Android or iOS) while the contactless interface is hooked to the NFC antenna, with one caveat: there is an NFC controller in the middle actively involved in the communication. That controller can alter communications, for example directing traffic either to the embedded SE or to the host operating system depending on which “card” application is being accessed, as sketched after this list. By contrast a vanilla card usually has no controller standing between antenna and chip.
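
To illustrate the routing role played by that NFC controller, here is a hedged sketch in Python. The AID values and routing-table layout are purely illustrative assumptions; real controllers implement this in firmware according to the NCI specification, not anything resembling the code below.

```python
# Illustrative sketch: route contactless SELECT commands to the eSE or to the host OS by AID.
EMBEDDED_SE, HOST_OS = "eSE", "host"

ROUTING_TABLE = {
    bytes.fromhex("A000000004"): EMBEDDED_SE,   # e.g. a payment applet living on the eSE
    bytes.fromhex("F0010203040506"): HOST_OS,   # e.g. an app emulating a card on the host
}

def route(apdu: bytes) -> str:
    # SELECT by AID has the header CLA=00 INS=A4 P1=04 P2=00, followed by Lc and the AID
    if apdu[:4] == bytes.fromhex("00A40400"):
        aid = apdu[5:5 + apdu[4]]
        for prefix, dest in ROUTING_TABLE.items():
            if aid.startswith(prefix):
                return dest
    return HOST_OS   # a common default when no route matches

print(route(bytes.fromhex("00A4040007A0000000041010")))  # -> eSE
```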

One vulnerability, multiple shapes

An entertaining example of the shared heritage of this hardware involves the ROCA (Return Of Coppersmith Attack) vulnerability from 2017. Researchers discovered a vulnerability in the RSA key generation logic in Infineon chips. This was not a classic case of randomness failure: the hardware did not lack for entropy, for a change. Instead it had faulty logic that over-constrained the large prime numbers chosen to create the RSA modulus. It turns out that moduli generated as the product of such primes were vulnerable to an old attack due to Coppersmith that allowed efficient factoring, breaking the security of such keys. This one bug affected a wide range of products all based on the same underlying platform:

One bug, one secure hardware platform, multiple manifestations.
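
Because the flawed primes leave a detectable footprint in the modulus itself, affected keys can be fingerprinted without factoring them. The sketch below mirrors the idea behind the published ROCA detection tools; it is a simplified illustration rather than the researchers’ actual code, and the short list of small primes is an assumption (the real tool uses a longer list).

```python
# Simplified ROCA fingerprint check: vulnerable moduli fall into the subgroup
# generated by 65537 modulo each small prime, because of how the primes were constructed.
SMALL_PRIMES = [11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]

def _generator_residues(p: int) -> set[int]:
    """Residues reachable as powers of 65537 modulo p (the cyclic subgroup it generates)."""
    seen, x = set(), 1
    while x not in seen:
        seen.add(x)
        x = (x * 65537) % p
    return seen

def looks_like_roca(n: int) -> bool:
    """Heuristic: True if the modulus matches the fingerprint for every small prime."""
    return all(n % p in _generator_residues(p) for p in SMALL_PRIMES)
```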

Beyond form-factor: trusted user interface

There are some USB tokens with additional functionality that, at least theoretically, improves security compared to using vanilla cards. Most of these involve the introduction of a trusted user interface, augmenting the token with input/output devices so it can communicate information to the user independently of the host. Consider a scenario where digitally signing a message authorizes the transfer of money to a specified bank account. In addition to securing the secret key material that generates those signatures, one would have to be very careful about the messages being signed. An attacker does not need access to the key bits to wreak havoc, although that is certainly sufficient. They can trick the authorized owner of the key into signing a message with altered contents, diverting funds into a different account without ever gaining direct access to the private key.

This problem is difficult to solve with standard smart-cards, since the message being signed is delivered by the host system the card is attached to. On-card applications assume the message is correct. With dual-interface cards, there are some kludgy solutions involving the use of two different hosts. For example, the message can be streamed over the contact interface initially but must be confirmed again over the contactless interface. This allows using a second machine to verify that the first one did not submit a bogus message. Note that dual-interface is necessary for this, since the card can not otherwise detect the difference between initial delivery and secondary approval.

A much better solution is to equip the “card” with a UI that can display pertinent details about the message being signed, along with a mechanism for users to accept or cancel the operation. The Feitian LCD PKI token is an example of a USB token with exactly that capability. It features a simple LCD display and buttons. On-board logic parses and extracts crucial fields from messages submitted for signing, such as amount, currency and recipient information in the case of a money transfer. Instead of immediately signing the message, the token shows a summary on the display and waits for affirmative approval from the user via one of the buttons. (Those buttons are physically part of the token itself. Malicious code running on the host can not fake a button press.) Similar ideas have been adopted for cryptocurrency hardware wallets, with one important difference: most of those wallets are not built around a previously validated smart-card platform. The difference is apparent in the sheer number of trivial & embarrassing vulnerabilities in such products that have long been eliminated in more mature market segments such as EMV.
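
Here is a hedged sketch of that confirm-before-sign flow. Every name in it is an illustrative stand-in; real tokens implement the equivalent logic in firmware, with a physical display and buttons the host cannot emulate.

```python
# Toy model of a trusted-UI signing flow: parse on-token, display, require approval, then sign.
from dataclasses import dataclass

@dataclass
class Transfer:
    amount: str
    currency: str
    recipient: str

def parse_transfer(raw: bytes) -> Transfer:
    # stand-in parser: real firmware parses a fixed, well-defined message format
    amount, currency, recipient = raw.decode().split("|")
    return Transfer(amount, currency, recipient)

def handle_sign_request(raw: bytes, display, button, signer) -> bytes | None:
    tx = parse_transfer(raw)                       # never trust the host's rendering of the message
    display(f"Pay {tx.amount} {tx.currency} to {tx.recipient}?")
    if not button():                               # physical button press; host cannot fake it
        return None                                # user declined or timed out: refuse to sign
    return signer(raw)                             # sign only what was displayed

# toy usage: the console stands in for the LCD, input() for the approval button
result = handle_sign_request(
    b"100.00|USD|DE89 3704 0044 0532 0130 00",
    display=print,
    button=lambda: input("approve? [y/N] ").strip().lower() == "y",
    signer=lambda raw: b"signature-over-" + raw,
)
```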

The challenge for all of these designs is that they are necessarily application specific, because the token must make sense of the message and summarize it in a succinct way for the user to make an informed decision. For all the time standards groups have spent putting angle brackets (XML) or curly braces (JSON) around data, there is no universal standard format for encoding information, much less its semantics. This requires at least some parsing and presentation logic hard-coded in the token itself, in a way that can not be influenced by the host. Otherwise, if a malicious host can influence presentation on the display, it could make an unauthorized transaction appear benign, defeating the whole point of having a human in the loop to sanity check what gets signed. Doing this in a general and secure way outside narrow applications such as cryptocurrency is an open problem.

CP

The righteous exploit: Facebook & the ethics of attacking your own customers

Vice magazine recently reported on Facebook using a 0-day exploit for the Tails Linux distribution against one of its own users. The target was under investigation by law enforcement, for a series of highly disturbing criminal acts targeting teenage girls on the platform. Previous efforts to unmask the true identity of the suspect had been unsuccessful because they were accessing Facebook using the Tor anonymizing network. For a change from the usual Facebook narrative involving a platform hopelessly run over by trolls, hate-speech and disinformation, this story has an uplifting conclusion: the exploit works. The suspect is arrested. Yet the story was met with mixed reactions, with many seeming to conflate the ethical issue— was it “right” for Facebook security team to have done this?— with more pragmatic questions around how exactly they went about the objective.

Answering the first question is easy. There is a special place in hell reserved for those who prey on the weak and, if these allegations are true, this crook squarely belongs there. There is no reason to doubt that the Facebook security team acted in good faith. They had access to ample internal information to judge the credibility of the allegations and conclude that this suspect posed such a risk to other people that it warranted going beyond any reasonable definition of “assisting a criminal investigation,” into the uncharted territory of actively attacking their own customer with a 0-day exploit. Ultimately the question of guilt is a matter for courts to decide. Contrary to knee-jerk reactions on social media, Facebook did not act as judge-jury-and-executioner in this episode. The suspect may have been apprehended but due process stands. Mr. Hernandez was still entitled to his day in court in front of a real judge with a real jury to argue his innocence, and is unlikely to be strapped into an electric chair by a real executioner anytime soon. (In fact the perp pleaded guilty earlier this year and currently awaits sentencing.)

More troubling questions about the episode emerge when looking closer at exactly how Facebook collaborated with the FBI in bringing the suspect to justice. These questions are likely to come up again in other contexts, without the benefit of a comparable villain to short-cut the ethical questions. That is, if they have not already come up in other criminal investigations waiting for an enterprising journalist to unearth the story.

Timing

Let’s start with the curious timing of publication: the events chronicled in the article take place from 2015 to 2016. Why did the story come out now? There was no new public disclosure, such as the publication of court documents that could reveal— to our collective surprise— how the FBI magically tracked down the true IP address of the criminal using the Tor network. (A question that incidentally has never been answered satisfactorily in the case of the takedown of Ross Ulbricht, né Dread Pirate Roberts, for Silk Road.) The outline of facts is attributed to current and former Facebook employees speaking as anonymous sources. Why now? A slightly conspiratorial view is that Facebook PR desperately wanted a positive story in the current moment, when the company is under fire for refusing to fact-check disinformation on its platform, a stance made even more difficult to defend after Twitter took the unprecedented step of labelling tweets from the President. Facebook may have assumed the story could be a happy distraction and score cheap brownie points: “We will condone rampant political disinformation on our platform, but look over here— we went out of our way to help bust this awful criminal.” It is not uncommon for companies to play journalists this way and intentionally leak the desired narrative at the right time. If that was the calculation, public reaction suggests they badly misjudged the reception.

Facebook & Tor: strange bed-fellows

A second issue that has been overlooked in most accounts of this incident is that for Facebook, the challenge of deanonymizing miscreants was an entirely self-inflicted problem. The suspect in question used Tor to access Facebook, leaving no identifiable IP address for law enforcement to pursue. There is a wide range of opinion on the merits of anonymous access and censorship resistance, but there is no question that many companies have decided more harm than good originates from anonymizing proxies, whether the vanilla centralized VPN variety or Tor. Netflix has an ongoing arms race to block VPNs, while VPN providers compete by advertising that their service grants access to Netflix. Cloudflare has drawn the ire of the privacy community by throwing up time-wasting CAPTCHAs in the way of any user trying to access websites fronted by their CDN. Yet Facebook went against the grain: it not only allowed Tor access but made Facebook available as a Tor hidden service, even obtaining the first SSL certificate ever issued for a hidden service under the “.onion” domain.

Such an embrace of Tor is quite puzzling, coming from the poster-child of Surveillance Capitalism with a checkered history of commitment to privacy. Tor gives ordinary citizens the power to access information and services without disclosing private information about themselves, even in the presence of “curious” third-parties trying to scrape together profiles from every available signal. That model makes less sense for accessing a social network where identifying yourself and using your real name is the prerequisite to meaningful participation. The implied threat model is incoherent: worrying about hiding your IP address while revealing intimate information about your life to a social network that profits by surveilling its own customers. A less charitable view is that Facebook chose to pander to the privacy community in an attempt to white-wash its less than impressive record after multiple miscues, including the Beacon debacle and the 2011 FTC settlement.

There is a stronger case to be made around avoiding censorship: direct access to Facebook is frequently blocked by autocratic regimes. Tor is arguably the most reliable way to bypass such restrictions. Granted, the assumption that expanding access to Facebook results in a better world all around looks laughably absurd today. Between Russian interference in the 2016 election, the Cambridge Analytica scandal, a large-scale data breach, discriminatory advertising, ongoing political disinformation and even ethnic violence being orchestrated via Facebook, one could argue the world just might be better off with fewer people accessing this particular platform. But it is easy to forgive Facebook for this bit of self-serving naïveté in 2014, a time when technology companies were still lionized, their negative externalities yet to manifest themselves.

How much of Facebook usage over Tor is legitimate and how much is criminal behavior— such as the Hernandez case— disinformation, fraud and spam? As with most facts about Facebook, these data points are not known outside the company. (It is possible they are not even known inside Facebook. For all their unparalleled data-mining capabilities, technology companies have a knack for not posing questions that may have inconvenient answers— such as what percentage of accounts are fake, what fraction of advertising clicks are engineered by bots and how much activity from your vaunted Tor hidden service is malicious.) What is undisputed is that the crook repeatedly registered new accounts after being booted off the platform. Without a way to identify the miscreant when he returned, Facebook was playing a game of whack-a-mole with these accounts. Services have many options between outright blocking anonymizing proxies and giving them unfettered access to the platform. For example, users could be subject to additional checks, or their access to high-risk features— such as unsolicited messaging of other users or video sharing, both implicated in this incident— could be restricted until the account builds sufficient reputation over time or existing accounts vouch for it.

Crossing the Rubicon

Putting aside the question of whether Facebook could have prevented these actions ahead of time, we turn to the more fraught issue of response. Reading between the lines of the Vice article, victims referred the matter to law enforcement and the FBI initiated a criminal investigation. In these scenarios it is common for the company in question to be subpoenaed for “all relevant information” related to the suspect, in order to identify them in real life. This is where the use of Tor frustrated the investigation. IP addresses are one of the most reliable pieces of forensic evidence that can be collected about actions occurring online. In most cases the IP address used by the person of interest leads directly to their residence or office. In other cases it may lead to a shared network such as a public library or coffee shop, in which case a little more sleuthing is necessary, perhaps looking at nearby video from surveillance cameras, license plate readers or any payments made at that establishment using credit cards. With Tor, the trail stops cold at the Tor exit node. If the user had instead used a commercial VPN service, there is a fighting chance the operator of the service could be subpoenaed for records. With a decentralized system such as Tor, there are too many possible nodes, distributed all over the world in different jurisdictions, with no single party that could be held accountable. In fact, that is exactly the strength of Tor and why it is so valuable when used in defense of free speech and privacy.

The Facebook security team could have stopped there after handing over what little information they had to the FBI. Instead they decided to go further and actively work on unmasking the identity of the customer. This is a difficult stance. In the opinion of this blogger, it is ethically the correct one. The miscreant in question caused significant harm to young, vulnerable individuals. This harm would have continued as long as the perp was allowed to operate on the platform. Absent the appetite to walk back the seemingly inviolable commitment to making Facebook available over Tor, the company had no choice other than going on the offensive with an exploit.

Sourcing the exploit

Once the decision is made to pursue active attacks, the only question becomes how. There is a wide range of options. On the very low-tech side of the spectrum, Facebook employees could impersonate a victim in chat messages and try to social-engineer identifying information out of the suspect. There is no mention in the Vice article of such tactics being attempted. It is unlikely that a perp with meticulous attention to opsec would reveal identifying information in a moment of carelessness. What is implied by the article is that the FBI immediately reached for high-tech solutions in their arsenal. The first exploit attempt failed, likely because it was designed for a different platform— operating system and browser combination— than the esoteric setup this crook had involving the Tails Linux distribution.

Luckily there is no shortage of vulnerabilities to exploit in software. Take #2 witnessed Facebook contracting an “outside vendor” to develop a custom exploit chain for the specific platform used by the suspect. This is a questionable move, because going outside to source a brand-new exploit all but guarantees the independent availability of that exploit for others. Sure, Facebook can contractually demand “exclusivity” as a condition for commissioning the work, but let’s not kid ourselves. In the market for exploits, there is no honor among thieves. It is unclear whether this outsourcing was a deliberate decision to distance the Facebook security team itself from exploit development or whether they simply lacked the talent in-house. (If this were Google, one imagines Project Zero cranking out a reliable exploit in an afternoon’s work.)

Pulling the trigger

The next questionable step involved the actual delivery of the exploit, although Facebook may not have had any choice in the matter. According to Vice, Facebook handed over the exploit to the FBI for eventual delivery to the perp. It is as if a locksmith, asked to open the door of one particular household for a lawful search, simply handed over the master key to the police and went home. At this point Facebook had gone far beyond the original mission of unmasking one noxious criminal: they handed over a 0-day exploit to the FBI, ready for use in any other situation the agency deems appropriate. (Senator Wyden is quoted in the Vice article questioning exactly this logic, asking whether the FBI later submitted the exploit to the Vulnerabilities Equities Process.)

Legally the company may have had no other option. In an ideal world, Facebook holds on to the exploit— they paid for it, after all— and delivers it to the suspect directly, with the tacit agreement of the FBI and some form of immunity against prosecution for what would otherwise be a criminal act committed in the process: Facebook breaking into a machine it is not authorized to access. It is unlikely that option exists in the real world or, even if it did, that the FBI would willingly pass on the opportunity to add a shiny new 0-day to its arsenal at no cost.

Disarming the exploit?

Given that Facebook had already sourced the exploit from a third-party, there is no guarantee the FBI would not have received a copy through alternate channels, even if Facebook managed to hold on to it internally. That brings up the most problematic part of this episode: vulnerability disclosure. According to Vice, the Facebook security team decided that formal vulnerability notification to Tails was not required because “the vulnerable code in question had already been removed in the next version.”

That is either a weak after-the-fact excuse for inaction or a stunning lapse of judgment. There is a material difference between a routine software update that inadvertently fixes a critical security vulnerability (or worse, fixes it silently, deliberately trying to hide its existence) and one that is explicitly billed as a critical security update. Only in the latter case are users put on notice that there is urgency to applying the update in order to protect their systems.

Given that the exploit was already in the hands of the FBI and likely being resold by the original author, disclosure was the only option available to Facebook to neutralize its downstream effects. Had they disclosed the issue to the Tails team after the crook was apprehended, it would have been a great example of responsible use of an exploit to achieve a limited objective with every ethical justification: delivering a criminal suspect into the hands of the justice system. Instead Facebook gave away a free exploit to the FBI knowing full well it can be used in completely unrelated investigations over which the company has no say. If it is used to bring down another Hernandez or comparable offender, society is better off and we can all cheer from the sidelines for another judicious use of an exploit. If the next target is an immigrant union organizer wanted for jaywalking or a Black Lives Matter activist singled out for surveillance based on her race, the same argument can not be made. From the moment this exploit was brought into existence until every last vulnerable Tails instance has been patched, the Facebook security team bears some responsibility for the outcomes, good or bad.

It turns out trafficking in exploits is not that different from connecting the world’s population and giving everyone a platform to spread their ideas— without first stopping to ask whether they are going to use that capability for charity or malice.

CP

Smart-cards vs USB tokens: optimizing for logical access (part II)

[continued from part I]

The problem with cards

The card form factor, or “ID1” as it is officially designated, is great for converged access: using the same credential for physical and logical access. Cards can open real doors in the physical world and virtual doors online. But they come with an ungainly requirement: smart-card readers are needed for them to function. For physical access this is barely noticed: there is a badge-reader mounted somewhere near the door that everyone uses. Employees only need to remember to bring their card, not their own reader. For logical access, it is more problematic. While card-readers can be quite compact these days, it is still one more peripheral to carry around. Every person who needs to access an online resource gated by the card— connect to the company VPN, login to a protected website or decrypt a piece of encrypted email— also needs a reader. For the disappearing breed of fixed-base equipment such as desktop PCs and workstations, one could simply have a reader permanently glued to every desk.

Yet the modern work-force is highly mobile. Many companies only issue laptops to their employees, or at least expect that their personnel can function just as effectively when they are away from the office— good luck getting to that workstation in the office during an extended pandemic lockdown. While a handful of laptops can be configured with built-in readers, the vast majority are going to require a reader dangling dangerously from the side, ready to get snapped off, hanging on to a USB-to-USB-C adapter since most readers were not designed for the newer ports. There is also the hardware compatibility issue to worry about, since most manufacturers used to target Windows and rely on automatic driver installation through plug & play. (This is largely a solved problem today. Most readers comply with the CCID standard, which has solid support through the open-source libccid package on Linux & OSX. Even special snowflake hardware is accompanied by drivers for Linux.)

USB tokens

This is where USB tokens come in handy. Tokens combine the reader and “card” into a single device. That might seem more complex and fragile but it actually simplifies detection from the point of view of the operating system. A reader may or may not have a card present, so the operating system must handle insertion and removal events. USB tokens appear as a reader with a card permanently present, removing one variable from the equation. In addition to featuring the same type of secure IC found in a card, these devices also have a USB controller sitting in front of that chip to mediate communication. Usually the controller does nothing fancier than “protocol arbitrage:” taking messages delivered over USB from the host and relaying them to the secure IC using ISO 7816, the common protocol supported by smart-cards. But sometimes controllers can augment functionality, by presenting new interfaces such as simulating a keyboard to inject one-time passcodes.

Some examples such as the Safenet eToken used in SWIFT authentication define their own card application standard. More often vendors choose to follow an existing standard that enjoys widespread software support. The US government PIV standard is a natural choice, given its ubiquity and out-of-box support on Windows without requiring any additional driver installation. In the late 2000s GoldKey became an early example of offering a PIV implementation in USB token format; they also went to the trouble of getting FIPS 140 certification for their hardware. Taglio PIVKey followed shortly afterwards with USB tokens based on Feitian and later NXP platforms. Eventually other vendors such as Yubico copied the same idea by adding PIV functionality to their existing line.
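
From the host’s perspective the result is indistinguishable from a reader with a PIV card permanently inserted. The snippet below is a minimal sketch using the open-source pyscard library (the assumption being that a PC/SC-compliant token or reader with a PIV applet is attached); it selects the PIV application by its registered AID from NIST SP 800-73.

```python
# Minimal sketch: talk to a PIV token through PC/SC using pyscard.
from smartcard.System import readers
from smartcard.util import toHexString

r = readers()[0]                  # first PC/SC reader; a USB token is both reader and "card"
conn = r.createConnection()
conn.connect()                    # succeeds immediately: the token's card is always present

# SELECT the PIV application by its AID (A0 00 00 03 08 00 00 10 00)
select_piv = [0x00, 0xA4, 0x04, 0x00, 0x09,
              0xA0, 0x00, 0x00, 0x03, 0x08, 0x00, 0x00, 0x10, 0x00]
data, sw1, sw2 = conn.transmit(select_piv)
print("status:", hex(sw1), hex(sw2), "response:", toHexString(data))
```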

In principle most USB tokens have no intrinsic functionality or security advantages over the equivalent hardware in card form. They are just a repackaging of the exact same secure IC running the exact same application. Extracting secrets from the token can not be appreciably more difficult than extracting secrets from the equivalent card. If anything, the introduction of an additional controller to handle USB communication can only make matters worse. That controller is privy to sensitive information flowing across the interface. For example, when the user enters a PIN to unlock the card, that information is visible to the USB controller. Likewise if the token is used for encryption, all decrypted messages pass through the secondary controller. So the USB token arguably has a larger attack surface, involving an additional chip to target. Unlike the smart-card IC, which has been developed with security and tamper-resistance objectives, this second chip is a sitting duck.

Usability matters

Yet from a deployment perspective, USB tokens greatly simplify rolling out strong two-factor authentication in an enterprise. By eliminating the ungainly card reader, they improve usability. Employees only need to carry around one piece of hardware that is easily transported and could even remain permanently attached to their laptop, albeit at the cost of slightly increasing certain risks from compromised devices. Examples:

  • In 2014 Airbnb began issuing GoldKey PIV tokens to employees for production SSH access. The Airbnb fleet is almost exclusively based on MacBooks. While the office space included fixed infrastructure such as monitors and docking stations at every desk, none of that would have been available for employees working from home or traveling— not uncommon for a company in the hospitality business.
  • Regulated cryptocurrency exchange & custodian Gemini issues PIV tokens to employees for access to restricted systems used for administering the exchange.

The common denominator for both of these scenarios is that logical access takes precedence over physical access. Recall that the CAC & PIV programs came to life under the auspices of the US defense department. Defense use-cases are traditionally driven by a territorial mindset of controlling physical space and limiting infiltration of that area by flesh-and-blood adversaries. That makes sense when valuable assets are tangible objects, such as weapons, ammunition and power plants. Technology and finance companies have a different threat model: their most valuable assets are not material in nature. It is not the office building or literal bars of gold sitting in a vault that need to be defended; even companies directly trading in commodities such as gold and oil frequently outsource actual custody of raw material to third-parties. Instead the nightmare scenarios revolve around remote attackers getting access to digital assets: accessing customer information, stealing intellectual property or manipulating payment systems to inflict direct monetary damages. When locking down logical access is the overarching goal, the usability and deployment advantages of tokens outweigh their incompatibility with physical-access infrastructure.

[continued]

CP

Updated: June 20th, with examples of hardware token models

 

Smart-cards vs USB tokens: when form factor matters (part I)

Early on in the development of cryptography, it became clear that off-the-shelf hardware intended for general purpose computing was not well suited to safely managing secrets. Smart-cards were the answer: a miniature computer with its own modest CPU, modest amount of RAM and persistent storage, packaged into the shape of an ordinary identity card. Unlike ordinary PCs, these devices were not meant to be generic computers with end-user choice of applications. Instead they were preprogrammed for a specific security application, such as credit-card payments in the case of chip & PIN or identity verification in the case of US government identification programs.

Smart-cards solved a vexing problem in securing sensitive information such as credentials and secret keys: bits are inherently copyable. Smart-cards created an “oracle” abstraction for using secrets while making it difficult for attackers to extricate those secrets out of the card. The point-of-sale terminal at the checkout counter can ask this oracle to authorize a payment, or the badge-reader mounted next to a door can ask for proof that it is in possession of a specific credential. But they can not ask the card to cough up the underlying secret used in those protocols. Not even the legitimate owner of the card can do that, at least not by design. Tamper-resistance features are built into the hardware design to frustrate attempts to extract any information from the card beyond what the intended interface exposes.
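
For readers who prefer code to prose, here is a toy sketch of that oracle abstraction in Python using the cryptography package. It is purely illustrative; a real card keeps the key in tamper-resistant silicon and speaks APDUs, not Python method calls. The object will sign challenges and export its public key, but deliberately offers no way to read the private key back out.

```python
# Toy model of the smart-card "oracle": use the secret without ever exposing it.
from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.hazmat.primitives import hashes

class CardOracle:
    def __init__(self):
        self._key = ec.generate_private_key(ec.SECP256R1())   # never leaves the "card"

    def public_key(self):
        return self._key.public_key()                          # safe to export and share

    def sign_challenge(self, challenge: bytes) -> bytes:
        # answer a challenge-response protocol, e.g. from a terminal or badge-reader
        return self._key.sign(challenge, ec.ECDSA(hashes.SHA256()))

    # Note: no method returns self._key; that is the whole point of the abstraction.

card = CardOracle()
signature = card.sign_challenge(b"terminal nonce 1234")
card.public_key().verify(signature, b"terminal nonce 1234", ec.ECDSA(hashes.SHA256()))
```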

“Insert or swipe card”

The original card form-factor proved convenient in many scenarios where credentials were already carried on passive media in the shape of cards. Consider the ubiquitous credit card. Digits of the card number, expiration date and identity of the cardholder are embossed on the card in raised lettering for the ultimate low-tech payment solution, where a mechanical reader presses the card imprint through carbon paper onto a receipt. Magnetic stripes were added later for automated reading and slightly more discreet encoding of the same information. We can view these as two different “interfaces” to the card. Both involve retrieving information from a passive storage medium on the card. Both are easy targets for perpetrating fraud at scale. So when it came time for card networks to switch to a better cryptographic protocol to authorize payments, it made sense to keep the form factor constant while introducing a third interface to the card: the chip. By standardizing protocols for chip & PIN and incentivizing/strong-arming participants in the ecosystem to adopt the newer smart-cards, the payment industry opened up a new market for secure IC manufacturers.

In fact EMV ended up introducing two additional interfaces, contact and contactless. As the name implies, the first uses direct contact with a brass plate mounted on the card to communicate with the embedded chip. The latter uses a wireless communication protocol called NFC, or Near Field Communication, for shuttling the same bits back and forth. (The important point however is that in both cases those bits typically arrive at the same piece of hardware: while there exist some stacked designs where the card has two different chips operating independently, in most cases a single “dual-interface” IC handles traffic from both interfaces.)

Identity & access management

Identity badges followed a similar evolution. Federal employees in the US have always had to wear and display badges for access to controlled areas. When presidential directive HSPD-12 called for raising the bar on authentication in the early 2000s, the logical next step was upgrading the vanilla plastic cards to smart-cards that supported public-key authentication protocols. This is what the Common Access Card (CAC) and its later incarnation, Personal Identity Verification (PIV), achieved.

With NFC, smart-cards can open doors at the swipe of a badge. But they can do a lot more when it comes to access control online. In fact the most common use of smart-cards prior to the CAC/PIV programs was in the enterprise space. Companies operating in high-security environments issued smart-cards to employees for accessing information resources. Cards could be used to login to a PC, access websites through a browser and even encrypt email messages. In many ways the US government was behind the curve in adoption of smart-cards for logical access. Several European countries had already deployed large-scale electronic ID or eID systems for all citizens, offering government services online accessed using those cards.

Can you hear me now?

Another early application of smart-card technology for authentication appeared in wireless communication, with the ubiquitous SIM card. These cards hold the cryptographic secrets for authenticating the “handset”— in other words, a cell phone— to the wireless carrier providing service. Early SIM cards had the same dimensions as identity cards, namely ID1. As cell-phones miniaturized to the point where some flip-phones were smaller than the card itself, SIM cards followed suit. The second generation “mini-SIM” looked very much like a smart-card with the plastic surrounding the brass contact plate trimmed away. In fact many SIM cards are still delivered this way, as a full-size card with a SIM-sized section that can be punched out. Over time standards were developed for even smaller form-factors designated “micro” and “nano” SIM, chipping away at the empty space surrounding the contact plate.

This underscores the point that most of the space taken up by the card is wasted; it is inert plastic. All of the functionality is concentrated in the tiny area where the contact plate and secure IC are located. The rest of the card can be etched, punched or otherwise cut with no effect on functionality. (Incidentally this statement is not true of contactless cards, because the NFC antenna runs along the circumference of the card. The antenna is routed this way to maximize the surface area it covers, which in turn maximizes NFC performance: recall that these cards have no battery or other internal source of power. When operating in contactless mode, the chip relies on induction to draw power from the electromagnetic field generated by the reader. Antenna size is a major limiting factor, which is why most NFC tags and cards have a loop antenna that closely follows the external contours of the card.)

[continued]

CP