The exponential mechanism is a technique for designing differentially private algorithms. It was developed by Frank McSherry[1] and Kunal Talwar[2] in 2007. Their work was recognized as a co-winner of the 2009 PET Award for Outstanding Research in Privacy Enhancing Technologies.[3]
Most of the initial research in the field of differential privacy revolved around real-valued functions which have relatively low sensitivity to change in the data of a single individual and whose usefulness is not hampered by small additive perturbations. A natural question is what happens in the situation when one wants to preserve more general sets of properties. The exponential mechanism helps to extend the notion of differential privacy to address these issues. Moreover, it describes a class of mechanisms that includes all possible differentially private mechanisms.
Source:[4]
In very generic terms, a privacy mechanism maps a set of
n
l{D}
l{R}
l{D}
l{R}
l{D}
l{R}
\mu
l{R}
q:l{D}n x l{R} → R
(d,r)
d\inl{D}n
r\inl{R}
(d,r)
d\inl{D}n
r\inl{R}
q(d,r)
\varepsilon(d) | |
l{E} | |
q |
q:(l{D}n x l{R}) → R
\mu
l{R}
\varepsilon(d):= | |
l{E} | |
q |
r
e\varepsilon x \mu(r)
d\inl{D}n,r\inl{R}
r
q(d,r)
\mu
r
q(d,r)
\varepsilon(d) | |
l{E} | |
q |
\intre\varepsilon x \mu(r)
Theorem (differential privacy):
\varepsilon(d) | |
l{E} | |
q |
(2\varepsilon\Deltaq)
Proof: The probability density of
\varepsilon(d) | |
l{E} | |
q |
r
e\varepsilon\mu(r) | |
\inte\varepsilon\mu(r)dr |
.
Now, if a single change in
d
q
\Deltaq
e\varepsilon\Delta
e-\varepsilon\Delta
d
\exp(2\varepsilon\Deltaq)
We would ideally want the random draws of
r
\varepsilon(d) | |
l{E} | |
q |
q(d,r)
maxrq(d,r)
OPT
OPT
\mu
r
q
Lemma: Let
St=\{r:q(d,r)>OPT-t\}
\bar{S}2t=\{r:q(d,r)\leqOPT-2t\}
p(\bar{S}2t)
\exp(-\varepsilont)/\mu(St)
l{R}
Proof: The probability
p(\bar{S}2t)
p(\bar{S}2t)/p(St)
p(\bar{S | |
2t |
)}{p(St)}=
\int\bar{S2t | |
\exp(\varepsilon |
q(d,r))\mu(r)
dr}{\int | |
St |
\exp(\varepsilonq(d,r))\mu(r)dr}\leq\exp(-\varepsilont)
\mu(\bar{S | |
2t |
)}{\mu(St)}.
The value of
\mu(\bar{S}2t)
Theorem (Accuracy): For those values of
t\geqln\left(
OPT | |
t\mu(St) |
\right)/\varepsilon
\varepsilon(d))]\geq | |
E[q(d,l{E} | |
q |
OPT-3t
Proof: It follows from the previous lemma that the probability of the score being at least
OPT-2t
1-\exp(-\varepsilont)/\mu(St)
t\geqln\left(
OPT | |
t\mu(St) |
\right)/\varepsilon
t
1-t/OPT
OPT-2t
We can assume
\mu(A)
A\subseteql{R}
\mu(l{R})
Source:[5]
Before we get into the details of the example let us define some terms which we will be using extensively throughout our discussion.
Definition (global sensitivity): The global sensitivity of a query
Q
D1,D
n | |
2\inl{D} |
GSQ=
max | |
D1,D2:d(D1,D2)=1 |
|(Q(D1)-Q(D2))|.
Definition: A predicate query
Q\varphi
\varphi
Q\varphi=
|\{x\inD:\varphi(x)\ | |
|}{|D|}. |
Note that
GS | |
Q\varphi |
\leq1/n
\varphi
The following is due to Avrim Blum, Katrina Ligett and Aaron Roth.
Definition (Usefulness): A mechanism
l{A}
(\alpha,\delta)
H
1-\delta
\forallh\inH
D
\widehat{D}=l{A}(D)
|Qh(\widehat{D})-Qh(D)|\leq\alpha
Informally, it means that with high probability the query
Qh
D
\widehat{D}
D
n
k
(x1,x2,...,xk)
xi\in\{0,1\}
\pi1x1+\pi2x2+ … +\pik-1xk-1\geqxk
\pi1,\pi2,...,\pik-1
\widehat{D}
D
In this section we show that it is possible to release a dataset which is useful for concepts from a polynomial VC-Dimension class and at the same time adhere to
\varepsilon
Theorem: For any class of functions
H
D\subset\{0,1\}k
|D|\geqO\left(
k ⋅ \operatorname{VCDim | |
(H)log(1/\alpha)}{\alpha |
| ||||
\right)
(\alpha,\delta)
\widehat{D}
\varepsilon
One interesting fact is that the algorithm which we are going to develop generates a synthetic dataset whose size is independent of the original dataset; in fact, it only depends on the VC-dimension of the concept class and the parameter
\alpha
\tilde{O}(\operatorname{VCDim}(H)/\alpha2)
We borrow the Uniform Convergence Theorem from combinatorics and state a corollary of it which aligns to our need.
Lemma: Given any dataset
D
\widehat{D}
=O(\operatorname{VCDim}(H)log(1/\alpha))/\alpha2
maxh\in|Qh(D)-Qh(\widehat{D})|\leq\alpha/2
Proof:
We know from the uniform convergence theorem that
\begin{align} &\Pr\left[\left|Qh(D)-Qh(\widehat{D})\right|\geq
\alpha | |
2 |
forsomeh\inH\right]\\[5pt] \leq{}&2\left(
em | |
\operatorname{VCDim |
(H)}\right)\operatorname{VCDim(H)} ⋅
-\alpha2m/8 | |
e |
, \end{align}
where probability is over the distribution of the dataset. Thus, if the RHS is less than one then we know for sure that the data set
\widehat{D}
m\geqλ(\operatorname{VCDim}(H)log(m/\operatorname{VCDim}(H))/\alpha2)
λ
\tilde{O}(\operatorname{VCDim}(H)/\alpha2)
m
m\geqλ(\operatorname{VCDim}(H)log(1/\alpha)/\alpha2)
Now we invoke the exponential mechanism.
Definition: For any function
q:((\{0,1\}k)n x (\{0,1\}k)m) → R
D
\widehat{D}
eq(D,\widehat{D)\varepsilonn/2}
From the exponential mechanism we know this preserves
(\varepsilonnGSq)
We define
(q(D),q(\widehat{D}))=-maxh\in|Qh(D)-Qh(\widehat{D})|
To show that the mechanism satisfies the
(\alpha,\delta)
\widehat{D}
q(D,\widehat{D})\geq-\alpha
1-\delta
2km
q(D,\widehat{D})\leq-\alpha
e-\varepsilon\alpha
\widehat{D}
2kme-\varepsilon\alpha
\widehat{D}\in(\{0,1\}k)m
q(D,\widehat{D})\geq-\alpha/2
e-\alpha\varepsilon
Let
A:=
\widehat{D}
q(D,\widehat{D})\geq-\alpha/2
B:=
\widehat{D}
q(D,\widehat{D})\leq-\alpha
\therefore
\Pr[A] | |
\Pr[B] |
\geq
e-\alpha\varepsilon | = | |
2kme-\alpha\varepsilon |
e\alpha\varepsilon | |
2km |
.
1/\delta\geq(1-\delta)/\delta
n\geq | 4 | \left(km+ln |
\varepsilon\alpha |
1 | |
\delta |
\right)\geqO\left(
d ⋅ \operatorname{VCDim | |
(H)log(1/\alpha)}{\alpha |
| ||||
\right).
And hence we prove the theorem.
In the above example of the usage of exponential mechanism, one can output a synthetic dataset in a differentially private manner and can use the dataset to answer queries with good accuracy. Other private mechanisms, such as posterior sampling,[6] which returns parameters rather than datasets, can be made equivalent to the exponential one.[7]
Apart from the setting of privacy, the exponential mechanism has also been studied in the context of auction theory and classification algorithms.[8] In the case of auctions the exponential mechanism helps to achieve a truthful auction setting.