Reconstruction attack explained

A reconstruction attack is any method for partially reconstructing a private dataset from public aggregate information. Typically, the dataset contains sensitive information about individuals whose privacy needs to be protected. The attacker has no or only partial access to the dataset, but has access to public aggregate statistics about the dataset, which could be exact or distorted, for example by added noise. If the public statistics are not sufficiently distorted, the attacker can accurately reconstruct a large portion of the original private data. Reconstruction attacks are relevant to the analysis of private data, as they show that, in order to preserve even a very weak notion of individual privacy, any published statistics need to be sufficiently distorted. This phenomenon was called the Fundamental Law of Information Recovery by Dwork and Roth, and formulated as "overly accurate answers to too many questions will destroy privacy in a spectacular way."[1]

The Dinur-Nissim Attack

In 2003, Irit Dinur and Kobbi Nissim proposed a reconstruction attack based on noisy answers to multiple statistical queries.[2] Their work was recognized by the 2013 ACM PODS Alberto O. Mendelzon Test-of-Time Award in part for being the seed for the development of differential privacy.[3]

Dinur and Nissim model a private database as a sequence of bits $D = (d_1, \ldots, d_n)$, where each bit is the private information of a single individual. A database query is specified by a subset $S \subseteq \{1, \ldots, n\}$ and is defined to equal $q_S(D) = \sum_{i \in S} d_i$. They show that, given approximate answers $a_1, \ldots, a_m$ to queries specified by sets $S_1, \ldots, S_m$, such that $|a_i - q_{S_i}(D)| \le \mathcal{E}$ for all $i \in \{1, \ldots, m\}$, if $\mathcal{E}$ is sufficiently small and $m$ is sufficiently large, then an attacker can reconstruct most of the private bits in $D$. Here the error bound $\mathcal{E}$ can be a function of $m$ and $n$. Dinur and Nissim's attack works in two regimes: in one regime, $m$ is exponential in $n$, and the error $\mathcal{E}$ can be linear in $n$; in the other regime, $m$ is polynomial in $n$, and the error $\mathcal{E}$ is on the order of $\sqrt{n}$.
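The exponential regime can be illustrated by exhaustive search: if every noisy answer is within $\mathcal{E}$ of the true subset sum, then any candidate database that is also consistent with all the answers must agree with the true database on most bits. The following is a minimal pure-Python sketch under toy assumptions (small $n$, random subset queries, noise of magnitude at most 1); the parameter choices and helper names are illustrative, not taken from the paper:

```python
import itertools
import random

def reconstruct(n, queries, answers, bound):
    """Brute-force reconstruction (exponential regime): return any
    candidate bit-vector whose subset sums all lie within `bound`
    of the published noisy answers."""
    for cand in itertools.product([0, 1], repeat=n):
        if all(abs(sum(cand[i] for i in S) - a) <= bound
               for S, a in zip(queries, answers)):
            return cand
    return None

random.seed(0)
n = 10
secret = [random.randint(0, 1) for _ in range(n)]

# Publish noisy answers to random subset queries; each answer is
# distorted by at most `bound` (the error E from the text).
bound = 1
queries = [{i for i in range(n) if random.random() < 0.5} for _ in range(60)]
answers = [sum(secret[i] for i in S) + random.choice([-1, 0, 1])
           for S in queries]

guess = reconstruct(n, queries, answers, bound)
errors = sum(g != s for g, s in zip(guess, secret))
print(errors)  # number of bits the attacker got wrong
```

Because the true database itself satisfies every constraint, the search always succeeds; and since any surviving candidate agrees with the true answers to within $2\mathcal{E}$ on every query, it can disagree with the secret on only a few bits when the queries are random and numerous.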

References

  1. Cynthia Dwork and Aaron Roth. The Algorithmic Foundations of Differential Privacy. http://www.cis.upenn.edu/~aaroth/Papers/privacybook.pdf
  2. Irit Dinur and Kobbi Nissim. 2003. Revealing information while preserving privacy. In Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems (PODS '03). ACM, New York, NY, USA, 202–210.
  3. Web site: ACM PODS Alberto O. Mendelzon Test-of-Time Award.