In the crowded field of online backup services, SpiderOak is an example of a company trying to distinguish itself on privacy. Billing itself as a “zero-knowledge privacy environment,” the company emphasizes what it cannot do:
SpiderOak is, in fact, truly zero knowledge. The only thing we know for sure about your data is how many encrypted data blocks it uses […] On the servers, we only see sequentially numbered data blocks — not your foldernames, filenames, etc.
As expected, this also translates into a limitation around password reset:
How is this reconciled with our ability to do a password reset? The short answer is: It isn’t! We cannot reset your password. When you create a SpiderOak account, the setup process happens on your computer […] and there your password is used in combination with a strong key derivation function to create your outer layer encryption keys. Your password is never stored as part of the data sent to SpiderOak servers.
So far, so good. All user data is encrypted using keys derived from the password before that information is backed up to the cloud. The password in turn is never communicated to the cloud provider. On the surface this appears to satisfy property #3 (and by implication #2) alluded to in the previous post: the service provider cannot access user data even with full use of its own resources.
But there is a catch: values derived from the password are stored. The details are buried in the engineering matters section, under “User Authentication Details.” Ostensibly written to assure users that the protocol for verifying knowledge of the password is sound, it amounts to an admission that there is something stored by the service provider that can be used to distinguish correct versus incorrect password submissions. Specifically:
- Two random salts, stored in the clear by necessity
- A serialized RSA public key, also stored as plaintext
- A “challenge key” computed as the output of a key-derivation function applied to the password and the first salt, namely PBKDF2(password, first salt)
- The full RSA key, including the private half, AES-encrypted using the output of PBKDF2(password, second salt) as the encryption key
That combination serves as a password hash, and it can be brute-forced. Given the first random salt and the challenge key, it is possible to check whether a password guess such as “asdfgh” is correct by re-running the same PBKDF2 key derivation and comparing the result to the stored value. That means it is in fact possible to recover data by trying a large number of candidate passwords. While the effectiveness of such an attack depends on the user's choice of password and the computing power available to the attacker, the risk calculus is the same in all cases. Data recovery can be attempted by the service provider going rogue, by a disgruntled employee acting independently, or by a law-enforcement or intelligence agency that obtains the encrypted data from the provider. This is in fact corroborated by one of the privacy FAQs, which directly takes up the question of whether user data can be recovered with access to the bits stored in the cloud:
Unless there are significant advances in mathematics […] password derivation techniques on the SpiderOak key structure are very difficult. The key derivation functions we use are strongly designed to withstand heavy brute force password techniques and pre-computation, such that even on a very modern computer, each password guess takes about one second. […] Of course, if you were to choose a password that is made entirely from words in a dictionary, fewer attempts may be needed to guess it.
That is the glass-half-full view. Key derivation does indeed use PBKDF2, with the iteration count set to a reasonable 16384. But hobbyists have already built password-cracking systems that achieve billions of hashes per second, where the hash function is the underlying primitive operation. Bumping up the iteration count helps quantitatively, but does not address the root cause. As Ars Technica found out, much to its surprise, the random-looking “qeadzcwrsfxv1331” may not be a great choice after all.
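To make the offline guess-and-compare attack concrete, here is a minimal sketch in Python. It is not SpiderOak's code: the choice of HMAC-SHA256 as the PBKDF2 primitive, the 32-byte output, the 16-byte salt and the tiny word list are all assumptions made for illustration. Only the iteration count of 16384 and the sample password “asdfgh” come from the text above.

```python
import hashlib
import hmac
import os

ITERATIONS = 16384  # iteration count quoted in the text


def derive_challenge_key(password: str, salt: bytes) -> bytes:
    """PBKDF2(password, salt). HMAC-SHA256 and dklen=32 are assumptions."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS, dklen=32
    )


def find_password(candidates, salt, stored_challenge_key):
    """Return the first candidate whose derived key matches the stored value."""
    for guess in candidates:
        derived = derive_challenge_key(guess, salt)
        if hmac.compare_digest(derived, stored_challenge_key):
            return guess
    return None


# What the provider holds: the salt in the clear and the derived challenge key.
first_salt = os.urandom(16)  # hypothetical salt length
stored = derive_challenge_key("asdfgh", first_salt)

# The attacker needs nothing from the user's machine, only guesses.
wordlist = ["password", "letmein", "asdfgh", "qwerty"]  # hypothetical word list
recovered = find_password(wordlist, first_salt, stored)
```

The point of the sketch is that `find_password` takes only values stored server-side plus a list of guesses. A higher iteration count makes each call to `derive_challenge_key` slower, but the loop itself remains possible.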
In case this seems like an inescapable consequence of how encryption works, consider a hypothetical alternative design. Suppose a user manages their own RSA encryption key, stored on their machine. This key is used to encrypt a randomly generated AES key, which is in turn used to encrypt bulk data uploaded to the cloud. In this model, there is no password to brute-force from any data uploaded to the cloud. Ciphertext available to the cloud provider is encrypted under a truly random 128-bit key, where all possible choices of the key are equally likely. (As an aside: that RSA private key may be locally encrypted with a user-chosen passphrase, which sounds like rearranging deck chairs. There is a critical difference: brute-forcing that passphrase requires access to the user's machine. Nothing uploaded to the cloud helps.) Of course this would mean the data is not accessible on other devices unless the private key can be roamed there. That is why the ideal implementation would use smart cards instead of storing keys locally on disk. Still, the possibility of excluding brute-force attacks can be demonstrated without resorting to any fancy gadgets.
There is a broader architectural flaw here. Designs in the spirit of SpiderOak are badly conflating two orthogonal problems:
- Encrypting user data with keys that are managed directly by the user and not available to any third-party
- Saving the resulting ciphertext after encryption to a third-party cloud provider
Many popular solutions already exist for the first problem, with different security properties, key management options and cross-platform availability: BitLocker, PGP disk encryption, TrueCrypt, loop-AES, FileVault, … There is little reason to introduce yet another arbitrary scheme with new risks, in this case susceptibility to brute-forcing by the cloud provider.
Following posts will look at experimental ways to “compose” existing local encryption schemes with cloud backup services transparently, without giving up any control over cryptography and key management.
CP