Convergent encryption explained

Convergent encryption, also known as content hash keying, is a cryptosystem that produces identical ciphertext from identical plaintext files. This has applications in cloud computing to remove duplicate files from storage without the provider having access to the encryption keys.[1] The combination of deduplication and convergent encryption was described in a backup system patent filed by Stac Electronics in 1995.[2] This combination has been used by Farsite,[3] Permabit,[4] Freenet, MojoNation, GNUnet, flud, and the Tahoe Least-Authority File Store.[5]

The system gained additional visibility in 2011 when cloud storage provider Bitcasa announced that it was using convergent encryption to enable deduplication of data in its cloud storage service.[6]

Overview

  1. The system computes a cryptographic hash of the plaintext in question.
  2. The system then encrypts the plaintext by using the hash as a key.
  3. Finally, the hash itself is stored, encrypted with a key chosen by the user.
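
The three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a description of any particular product: the function names are hypothetical, SHA-256 is assumed as the hash, and a SHAKE-256 keystream XOR stands in for a real cipher so that the example needs only the standard library.

```python
import hashlib
import secrets


def _keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher (assumption): XOR data with a SHAKE-256 keystream.

    Illustration only. A real system would use a vetted cipher such as AES,
    and would not reuse one keystream to wrap several content keys.
    """
    stream = hashlib.shake_256(key).digest(len(data))
    return bytes(d ^ s for d, s in zip(data, stream))


def convergent_encrypt(plaintext: bytes, user_key: bytes):
    """Return (ciphertext, wrapped_key) for the given plaintext."""
    # Step 1: compute a cryptographic hash of the plaintext.
    content_key = hashlib.sha256(plaintext).digest()
    # Step 2: encrypt the plaintext using the hash as the key.
    ciphertext = _keystream_xor(content_key, plaintext)
    # Step 3: store the hash encrypted under a key chosen by the user.
    wrapped_key = _keystream_xor(user_key, content_key)
    return ciphertext, wrapped_key


def convergent_decrypt(ciphertext: bytes, wrapped_key: bytes, user_key: bytes) -> bytes:
    """Unwrap the content key with the user's key, then decrypt."""
    content_key = _keystream_xor(user_key, wrapped_key)
    return _keystream_xor(content_key, ciphertext)


# Two users with unrelated keys store the same file.
alice_key = secrets.token_bytes(32)
bob_key = secrets.token_bytes(32)
c_alice, w_alice = convergent_encrypt(b"shared report", alice_key)
c_bob, w_bob = convergent_encrypt(b"shared report", bob_key)
```

Here `c_alice` and `c_bob` are byte-for-byte identical, so a storage provider could keep a single copy, while the wrapped keys differ because each is protected by that user's own key.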

Known Attacks

Convergent encryption is open to a "confirmation of a file" attack, in which an attacker confirms whether a target possesses a certain file by encrypting a plaintext copy of it and comparing the output with the files the target holds.[7] The attack is a problem for a user storing information that is not unique, that is, information that is publicly available or already held by the adversary, such as banned books or files that infringe copyright. Adding a unique piece of data, such as a few random characters, to the plaintext before encryption arguably makes the attack less effective: the uploaded file becomes unique and therefore produces a unique ciphertext. However, some implementations split the plaintext into blocks based on file content and convergently encrypt each block independently. These may inadvertently defeat attempts to make the file unique by adding bytes at the beginning or end, because the blocks not touched by the added bytes still encrypt identically.[8]
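
The confirmation check can be sketched as below. This is an illustrative toy under assumptions: `convergent_encrypt` and `target_has_file` are hypothetical names, SHA-256 is assumed as the hash, a SHAKE-256 keystream XOR stands in for a real cipher, and the attacker is assumed to be able to see the target's stored ciphertexts.

```python
import hashlib


def convergent_encrypt(plaintext: bytes) -> bytes:
    """Toy convergent encryption (assumption): key = SHA-256(plaintext),
    cipher = XOR with a SHAKE-256 keystream. Illustration only."""
    key = hashlib.sha256(plaintext).digest()
    stream = hashlib.shake_256(key).digest(len(plaintext))
    return bytes(p ^ s for p, s in zip(plaintext, stream))


def target_has_file(candidate: bytes, stored_ciphertexts: set) -> bool:
    """Encrypt the attacker's own copy and look for a matching ciphertext."""
    return convergent_encrypt(candidate) in stored_ciphertexts


# What the attacker can observe: ciphertexts only, no keys.
target_store = {
    convergent_encrypt(b"holiday photos"),
    convergent_encrypt(b"text of a banned book"),
}

print(target_has_file(b"text of a banned book", target_store))  # True
print(target_has_file(b"an unrelated book", target_store))      # False
```

The attacker never learns a key. Possessing the same plaintext is enough, because encryption depends on nothing but the plaintext.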

More serious than the confirmation attack is the "learn the remaining information" attack described by Drew Perttula in 2008.[9] It applies to encrypted files that are only slight variations of a public document. For example, if the defender encrypts a bank form that includes a ten-digit bank account number, an attacker who knows the generic form layout can recover the number by producing the form for every possible account number, encrypting each one, and comparing the results with the defender's encrypted file. The attack can be extended to many targets at once (all spelling variations of a target bank customer's name in the example above, or even all potential bank customers), and it applies to any kind of form document: tax returns, financial documents, healthcare forms, employment forms, and so on. There is no known method for decreasing the severity of this attack. Adding a few random bytes to files as they are stored does not help, since those bytes can likewise be attacked with the "learn the remaining information" approach. The only effective mitigations are to encrypt file contents with a non-convergent secret before storing them (which negates any benefit of convergent encryption) or not to use convergent encryption at all.
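
The bank form example can be sketched as a brute-force search. This is a toy under assumptions: the `FORM` template, the function names, and the cipher (SHA-256 as the hash, a SHAKE-256 keystream XOR as a stand-in for a real cipher) are all hypothetical, and the account number is shortened to four digits so the loop finishes instantly. The search over ten digits has the same structure.

```python
import hashlib

# Hypothetical public form layout; only the account number is secret.
FORM = "Bank form\nName: A. Customer\nAccount number: {account}\n"


def convergent_encrypt(plaintext: bytes) -> bytes:
    """Toy convergent encryption (assumption): key = SHA-256(plaintext),
    cipher = XOR with a SHAKE-256 keystream. Illustration only."""
    key = hashlib.sha256(plaintext).digest()
    stream = hashlib.shake_256(key).digest(len(plaintext))
    return bytes(p ^ s for p, s in zip(plaintext, stream))


def recover_account(target_ciphertext: bytes, digits: int = 4):
    """Encrypt the form for every possible account number and compare."""
    for n in range(10 ** digits):
        account = str(n).zfill(digits)
        guess = FORM.format(account=account).encode()
        if convergent_encrypt(guess) == target_ciphertext:
            return account
    return None  # the stored file is not an instance of this form


# The defender stores an encrypted, filled-in form.
victim_ciphertext = convergent_encrypt(FORM.format(account="4071").encode())

print(recover_account(victim_ciphertext))  # prints 4071
```

The cost of the search grows with the number of possible values for the unknown fields, not with the key size of the cipher, which is why low-entropy form fields are recoverable.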

Notes and References

  1. Secure Data Deduplication, Mark W. Storer, Kevin Greenan, Darrell D. E. Long, Ethan L. Miller, http://www.ssrc.ucsc.edu/Papers/storer-storagess08.pdf
  2. System for backing up files from disk volumes on multiple nodes of a computer network, US Patent 5,778,395 filed October 1995, http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.htm&r=1&f=G&l=50&s1=5778395.PN.&OS=PN/5778395&RS=PN/5778395
  3. Reclaiming Space from Duplicate Files in a Serverless Distributed File System, MSR-TR-2002-30, http://research.microsoft.com/apps/pubs/default.aspx?id=69954
  4. Data repository and method for promoting network storage of data, US Patent 7,412,462 provisionally filed Feb 2000, http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.htm&r=1&f=G&l=50&s1=7,412,462.PN.&OS=PN/7,412,462&RS=PN/7,412,462
  5. Drew Perttula and Attacks on Convergent Encryption https://tahoe-lafs.org/hacktahoelafs/drew_perttula.html
  6. Finally! Bitcasa CEO Explains How The Encryption Works, September 18th, 2011, https://techcrunch.com/2011/09/18/bitcasa-explains-encryption/
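  7. Drew Perttula and Attacks on Convergent Encryption, https://tahoe-lafs.org/hacktahoelafs/drew_perttula.html
  8. Secure Data Deduplication, Mark W. Storer, Kevin Greenan, Darrell D. E. Long, Ethan L. Miller, University of California at Santa Cruz, http://www.ssrc.ucsc.edu/Papers/storer-storagess08.pdf
  9. Drew Perttula and Attacks on Convergent Encryption, https://tahoe-lafs.org/hacktahoelafs/drew_perttula.html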