Crafting unusable public-keys [part I]


A motivating example

In modern applications it is very common for access rights to be expressed in terms of control over a cryptographic key. All blockchains operate on this model: control of funds comes down to control over a private key, usually ECDSA. Someone who can create valid digital signature using that key is assumed to be owner of those assets.

There are unusual edge-cases when a system requires such a key to be defined as part of its configuration and yet it is important to prove to third-parties that one does not have control over that key. One example is creating firmware images for secure boot. The UEFI secure boot specification defines a number of keys such as platform key (PK) and key-exchange-key (KEK) that need to be set before a system can transition into secure boot mode. Interestingly PK and KEK are not used directly validate what operating systems are allowed to boot: that is controlled by a different EFI variable, labeled database or “db” for short, which contains a list of keys or hashes. The trust chain looks like this:

PK → KEK → DB

Changes to the DB must be signed by one of the KEK entries, and changes to the KEK in turn must be signed by the PK.

Now suppose a privacy-conscious PC manufacturer wants to ship an image with DB configured in a specific way. For example, include the Microsoft third-party signing key in DB to permit Ubuntu, Debian and other operating systems with a signed shim to be installed. At the same time, this manufacturer wants to prove they do not control PK or KEK. (Since either PK and KEK allow modifying the DB, it would amount to a manufacturer backdoor into every unit shipped.) This requires creating PK and KEK entries of a particular type:

  • They must look like valid keys as far as the EFI specification is concerned; otherwise the firmware could reject them, error out or ignore secure boot configuration.
  • OEM must be able to convince customers that they do not control the corresponding private key. More specifically that they can not create signed updates to alter the KEK using PK, or alter the DB using KEK. (As we will see, “not controlling the private key” is not entirely equivalent to this formulation.)

Threat model

Proving that one knows the private key corresponding to a given public key is easy: do something that requires knowledge of the private half and which can be verified with the public key, such as digitally signing a message or decrypting a random ciphertext. Proving the negative— that one does not know it— is more tricky. Intuitively, public and private-key halves of the key are generated together, typically with the public key derived from the private key. Naively that would imply for a given public key, somebody somewhere must have known the private key.

To be clear, we want to rule out some evasions that fail to fully address the trust problem:

  • Claiming that one deleted the private-half after key generation and only retained the public piece. While such “pinky-promise” approaches may be valuable for compliance purposes and appeasing auditors, a mathematical guarantee is much preferable.
  • Claiming someone else unaffiliated generated the key. For example one could pick the public-key from Google’s TLS certificate, asserting he/she does not have the corresponding private key. While that would be a plausible statement— unless the speaker is affiliated with Google— it does not side-step the trust assumption: someone still has the private key and retains a backdoor into any system where that public key is installed. It merely deflects the problem to another trusted third-party (Google) along with a dubious theory about that entity being more competent or trustworthy at key management.
  • Claiming ownership of the key is distributed among a group of trusted third-parties where some minimum quorum must collude for any successful use. Again this does not lift the assumption of good faith, only diffuses it from a single trusted third-party to a coalition of thereof. This rules out approaches such as distributed/join key generation.

The ideal solution is one where one can offer a public key and demonstrate that no person or group ever had access to the private key.

Easy case: ECDSA

If one is willing to depart from standard key generation procedures, it is not too difficult to achieve that goal for some public-key cryptosystems. For example in the case of ECDSA, we have:

  • The private key is a plain integer (called “scalar” in EC terminology) between 1 and the order of the elliptic curve
  • The public key is a point on the curve

For a properly generated key, these two have a relationship: the public key is equal to the product of the secret scalar with a fixed point on the curve, called the “generator” because every point on the curve can be reached by taking successive multiples of that point. It follows that traditional key generation works by:

  1. Randomly selecting a private scalar from the appropriate range
  2. Multiplying that with the generation point to get the public point

What if we invert this sequence? Choose a curve point in a semi-deterministic way (more on this below) without knowing the scalar. The logical next step would be working backwards towards the private key, by calculating the scalar which would yield that point when multiplied by the generator. This is an instance of the “discrete logarithm” problem, akin to an analog of division over finite groups. But that runs up against a practical yet insurmountable problem: solving discrete logs over the type of large curves used in cryptography is computationally out of reach of present systems. In fact the security of every public-key system depends on such an assumption: deriving the private-key from the public-key must be “infeasible” in the sense that requires an amount of computation far beyond what is possible given present day understanding of the mathematics and state-of-the-art technology.1

So if we can prove that the public-key was selected according to a verifiable procedure, it provides high confidence that the corresponding private key is not known to anyone. There are several ways to do this in a nothing-up-my-sleeve manner starting with an English language statement or the digits of pi as the seed.

  • Use a cryptographic hash function such SHA256 to map the seed into a curve point. There are some nuances here. Curve points are ordered pairs <x, y> in a particular range satisfying the curve equation. While the hash output can be used to select x, there is no guarantee that a corresponding y exists. There may be no such point on the curve with that x-coordinate, or there will be exactly two requiring a choice between them. As such combining the hash operation with an incrementing counter may be necessary until a valid point is found.
  • Alternatively use a dedicated hash-to-curve scheme. These are designed to map inputs exactly into curve points of unknown discrete logarithms, without the awkward trial-and-error approach of using a plain hash.

It is important that the choice of seed be free of arbitrary decisions as much as possible. For example a seed containing a very specific sequence of digits with no explanation is bound to raise suspicion. Perhaps the prover “cooked” the seed: they tried trillions of variations on those digits until hitting on a curve point where the discrete logarithm problem was easier to solve by sheer chance. (To be clear: there are presently no such known shortcuts or concept of “weak keys” on ECDSA where one out of a trillion public keys is much easier to break.) By comparison the starting digits of pi are effectively a fact about the universe that no prover can influence.

Difficult case: RSA

Returning to the motivating example for this blog post, it turns out that most UEFI implementations do not handle ECDSA keys very well. RSA being the least common denominator for a portable solution, we run into a problem: RSA is not easily amenable to the previous approach of deterministically selecting a public-key without worrying about the private half. In effect there is shared structure linking the halves:

  • Private key is (N, d) where N is the modulus, a product of two large primes
  • Public key is (N, e) with N same as before and e the inverse of d modulo ɸ(N) the order of the multiplicative group.

In practice key generation proceeds as:

  1. Find two primes p and q of appropriate size to generate the modulus N = p × q
  2. Public exponent e is often fixed by standards or convention, for example 1 + 216 is a popular choice.
  3. Private exponent d is then calculated from e, using the known factorization of N.

If one generates N according to this sequence, then at some point that person knew the individual factors p and q, and could have used that knowledge to calculate the corresponding private exponent d. Meanwhile generating N naively in a deterministic way by hashing an agreed upon seed does quite provide the same security guarantee.

Recall that the security of RSA rests on the difficulty of factoring the modulus. Proper RSA moduli are generated as a product of two roughly equal size primes, and have exactly those two factors. By comparison, a random number of the same size is extremely unlikely to have such a clean structure. Suppose we were were to pick a random 2048-bit number and declare it to be an RSA modulus. What are the odds that number can be completely factored? Emphasis on complete factorization because deriving the private exponent from the public one requires full factorization of the group.

Factoring in the factors

If we interpret “random” in a strict sense, we can say that half the time, that number will be divisible 2. About a third of the time, it will be divisible by 3. More generally for a given “small” prime w, the number will be divisible by that prime 1/w of the time. Whether or not factoring such a composite is feasible depends on, well, many “factors”— specifically the size of the second largest factor.

There is mixed news here. Good news in that statistically speaking large numbers are also likely to have larger factors. More precisely, there is a measure called “smoothness:” a number is called B-smooth if all of its prime factors are less than B. For any given B, the probability of a randomly selected number being B-smooth rapidly declines with the size of that number. So it is astronomically improbable that our random 2048-bit number factors cleanly into a bunch of 32-bit integers, requiring no fancy factorization algorithm beyond trying the first few billion primes.

But there are specialized factoring algorithms for chipping away at small factors, specifically the Elliptic Curve Method or ECM which can reliably get at factors in the ~200 bit range. So the relevant question becomes:

“Given a random N bitnumber, what is the probability that its second-large factor is W bits or less?”

for various choices of N and W. It turns out this probability is not at all negligible for reasonable choices. For example if we use the common 2048 bit RSA key size and 200 bits as the threshold for ECM factoring, Claude Opus 4.6 estimates 1 in 5 chances of complete factorization. Even with key-sizes cranked up to 4096 bits, there is still non-negligible probability over 1%. Those are not at comforting odds for a cryptographic application.

There is another practical problem with this approach even if one is willing to roll the dice on probabilities: it is unclear how one could convince everyone else that the chosen modulus can not be factored easily. At best one can say “we ran ECM for X hours and it failed to find any factors.” That is a not a very reassuring statement. It offers little confidence that the next person trying a little harder with more computing resources could not successfully discover one more factor.

[continued in part II]

CP

  1. Granted both of those can advance. While the mathematics behind discrete logarithm problem have been studied for a long time with only incremental progress in attacks, it is entirely possible that the second dimension— computing technology— will experience a drastic leap with the introduction of quantum computers. ↩︎

Leave a comment