Cloud backup and privacy: the problem with SpiderOak (part I)

Continuing the theme from an earlier post– that the economic incentives of cloud computing favor service providers having access to user data, rather than serving as repositories of opaque bits– here we look at a service that attempts to swim against the current.

In the wake of FUD created around cloud computing by the PRISM allegations, SpiderOak has come to the forefront as an exemplary service that optimizes for user privacy. SpiderOak provides a remote backup and file access service, allowing users to save copies of their data in the cloud and access it from any of their devices. This is a crowded space with many competitors, ranging from startups specializing in that one field (Dropbox, Mozy) to established companies (SkyDrive from MSFT, Google Drive from Google) offering cloud storage as one piece of their product portfolio. Wikipedia has a comparison of online backup services, with a helpful table that can be sorted on each attribute.

From a privacy perspective the interesting column is the one labeled “personal encryption.” This non-descriptive label is probably owing to the successful campaign of disinformation cloud service providers have embarked on to reassure users. Every service provider throws around phrases like “military grade encryption” and “256-bit AES” without any consideration of the overall threat model– what exactly is that fancy cryptography designed to protect against? Stripping away this usage of encryption as magic pixie dust, there are three distinct scenarios where it can be effective:

  1. Protecting data in transit. This assumes a bad guy eavesdropping on the network, trying to snoop on private information as it is being backed up to the cloud or, going in the opposite direction, as it is being downloaded from the cloud. This is a well-understood problem, with established solutions. A standard communication protocol such as TLS can be used to set up an encrypted channel from the user to the service provider.
  2. Protecting data at rest, from unauthorized access. This is a slightly more nebulous threat model. Perhaps the provider backs up their own data on tape archives offsite, or sends off defective drives for repair– situations where media containing user data could be stolen. In this case bad guys– who are not affiliated with the cloud provider– obtain physical possession of the storage. Proper encryption can still prevent them from recovering any useful information from that media.
  3. Protecting data from the service provider itself. This is the most demanding threat model. One envisions the provider itself going rogue, as opposed to #2 where they are only assumed to be incompetent/accident-prone. The standard scenario is the disgruntled employee with full access to the service, who decides to violate company policy and dig through private information belonging to customers. A slightly different but very contemporary issue is that of law enforcement access.
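The difference between #2 and #3 comes down to who holds the keys. A minimal sketch of the #3 approach– the client derives the encryption key from a passphrase the provider never sees, and uploads only ciphertext– might look like the following. (This is an illustration of the key-possession argument only, not SpiderOak's actual design; the hash-based stream cipher here is a toy stand-in for a real cipher such as AES, and should not be used in production.)

```python
import hashlib
import hmac
import secrets

def derive_key(passphrase: str, salt: bytes) -> bytes:
    # Key is derived client-side; the provider never learns the passphrase.
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode(), salt, 200_000)

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy CTR-style keystream built from SHA-256 (stand-in for a real cipher).
    out, counter = b"", 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def encrypt(passphrase: str, plaintext: bytes):
    # Everything returned here is safe to hand to the provider:
    # salt, nonce, ciphertext, and MAC reveal nothing without the passphrase.
    salt, nonce = secrets.token_bytes(16), secrets.token_bytes(16)
    key = derive_key(passphrase, salt)
    ct = bytes(a ^ b for a, b in zip(plaintext, keystream(key, nonce, len(plaintext))))
    tag = hmac.new(key, nonce + ct, "sha256").digest()  # integrity check
    return salt, nonce, ct, tag

def decrypt(passphrase: str, salt: bytes, nonce: bytes, ct: bytes, tag: bytes) -> bytes:
    key = derive_key(passphrase, salt)
    if not hmac.compare_digest(tag, hmac.new(key, nonce + ct, "sha256").digest()):
        raise ValueError("wrong passphrase or tampered ciphertext")
    return bytes(a ^ b for a, b in zip(ct, keystream(key, nonce, len(ct))))
```

The point of the sketch: the server-side blob is useless without the passphrase, so a disgruntled employee, a subpoena, or a compromise of the provider recovers only ciphertext– which is precisely the property #2 alone does not give you.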

These properties are not entirely orthogonal. If data is not protected in transit, clearly it can not be protected from the service provider either– it is exposed during upload at a minimum. Likewise #3 implies #2: if the service provider can not decrypt the data even with full malicious intent, there is no act of negligence they can commit to enable third-parties to decrypt it either. The converse does not hold. Consider encrypted backups. If done correctly, low-skilled burglars walking off with backup tapes can not recover any user data. But since the provider can decrypt that data using keys held in its own system, so can others with access to the full capabilities of the company. That means not only the disgruntled employee looking for retribution, but also law enforcement showing up with appropriate papers– not to mention the APT halfway around the world that has 0wned the service provider.

It is easy to verify that #1 has been implemented, since that part can be observed by any user. Granted, there are many ways to get TLS wrong, some more subtle than others. But that pales in comparison to the difficulty of independently verifying the internal processes used by the provider. Are they indeed encrypting data at rest as claimed? Is there an unencrypted copy left somewhere accidentally? This is why designs that provide stringent guarantees about #3 are very appealing. If user data can not be recovered by the provider, it matters much less what goes on behind the closed doors of their data center.

[continued]

CP