In mathematics and theoretical computer science, analysis of Boolean functions is the study of real-valued functions on $\{0,1\}^n$ or $\{-1,1\}^n$ (such functions are sometimes known as pseudo-Boolean functions) from a spectral perspective.
We will mostly consider functions defined on the domain $\{-1,1\}^n$. Sometimes it is more convenient to work with the domain $\{0,1\}^n$ instead. If $f$ is defined on $\{-1,1\}^n$, then the corresponding function defined on $\{0,1\}^n$ is
$$f_{01}(x_1,\ldots,x_n)=f((-1)^{x_1},\ldots,(-1)^{x_n}).$$
Similarly, for us a Boolean function is a $\{-1,1\}$-valued function, though often it is more convenient to consider $\{0,1\}$-valued functions instead.
Every real-valued function $f\colon\{-1,1\}^n\to\mathbb{R}$ has a unique expansion as a multilinear polynomial:
$$f(x)=\sum_{S\subseteq[n]}\hat{f}(S)\chi_S(x),\qquad \chi_S(x)=\prod_{i\in S}x_i.$$
(Note that even if the function is 0-1 valued this is not a sum mod 2, but just an ordinary sum of real numbers.)
This is the Hadamard transform of the function $f$, considering it as a function on the group $\mathbb{Z}_2^n$. The coefficients $\hat{f}(S)$ are known as the Fourier coefficients of $f$, and the entire sum is known as the Fourier expansion of $f$. The functions $\chi_S$ are known as Fourier characters, and they form an orthonormal basis for the space of all functions over $\{-1,1\}^n$ with respect to the inner product
$$\langle f,g\rangle=2^{-n}\sum_{x\in\{-1,1\}^n}f(x)g(x).$$
The Fourier coefficients can be calculated using an inner product:
$$\hat{f}(S)=\langle f,\chi_S\rangle.$$
In particular, this shows that $\hat{f}(\emptyset)=\operatorname{E}[f]$, where the expected value is taken with respect to the uniform distribution over $\{-1,1\}^n$. Parseval's identity states that
$$\|f\|^2=\operatorname{E}[f^2]=\sum_S\hat{f}(S)^2.$$
If we skip $S=\emptyset$, then we get the variance of $f$:
$$\operatorname{Var}[f]=\sum_{S\neq\emptyset}\hat{f}(S)^2.$$
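As a concrete illustration (not part of the original article), the following Python sketch computes the Fourier coefficients of the 3-bit majority function by brute force, and verifies Parseval's identity and the variance formula; the choice of `maj` is just an example.

```python
from itertools import product, combinations

n = 3
cube = list(product([-1, 1], repeat=n))            # the domain {-1,1}^n

def maj(x):                                        # example function: Maj_3
    return 1 if sum(x) > 0 else -1

def chi(S, x):                                     # character chi_S(x) = prod_{i in S} x_i
    r = 1
    for i in S:
        r *= x[i]
    return r

subsets = [S for k in range(n + 1) for S in combinations(range(n), k)]

# Fourier coefficient via the inner product: fhat(S) = 2^{-n} sum_x f(x) chi_S(x)
fhat = {S: sum(maj(x) * chi(S, x) for x in cube) / 2 ** n for S in subsets}
print(fhat)   # Maj_3 = (1/2)(x_0 + x_1 + x_2) - (1/2) x_0 x_1 x_2

# Parseval: E[f^2] = sum_S fhat(S)^2 (equals 1 since f is {-1,1}-valued)
assert abs(sum(c ** 2 for c in fhat.values()) - 1) < 1e-9

# Variance: the same sum restricted to nonempty S (here 1.0, since E[Maj_3] = 0)
print(sum(c ** 2 for S, c in fhat.items() if S))
```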
The degree of a function $f\colon\{-1,1\}^n\to\mathbb{R}$ is the maximum $d$ such that $\hat{f}(S)\neq0$ for some set $S$ of size $d$. In other words, the degree of $f$ is its degree as a multilinear polynomial.
It is convenient to decompose the Fourier expansion into levels: the Fourier coefficient $\hat{f}(S)$ is on level $|S|$.
The degree $d$ part of $f$ is
$$f^{=d}=\sum_{|S|=d}\hat{f}(S)\chi_S.$$
It is obtained from $f$ by zeroing out all Fourier coefficients not on level $d$.
We similarly define $f^{>d}$, $f^{<d}$, $f^{\geq d}$, $f^{\leq d}$.
The $i$-th influence of a function $f\colon\{-1,1\}^n\to\mathbb{R}$ can be defined in two equivalent ways:
$$\operatorname{Inf}_i[f]=\operatorname{E}\left[\left(\frac{f-f^{\oplus i}}{2}\right)^2\right]=\sum_{S\ni i}\hat{f}(S)^2,$$
$$f^{\oplus i}(x_1,\ldots,x_n)=f(x_1,\ldots,x_{i-1},-x_i,x_{i+1},\ldots,x_n).$$
If $f$ is Boolean, then $\operatorname{Inf}_i[f]$ is the probability that flipping the $i$-th coordinate flips the value of the function:
$$\operatorname{Inf}_i[f]=\Pr[f(x)\neq f^{\oplus i}(x)].$$
If $\operatorname{Inf}_i[f]=0$, then $f$ doesn't depend on the $i$-th coordinate.
The total influence of $f$ is the sum of all of its influences:
$$\operatorname{Inf}[f]=\sum_{i=1}^n\operatorname{Inf}_i[f]=\sum_S|S|\hat{f}(S)^2.$$
The total influence of a Boolean function is also the average sensitivity of the function. The sensitivity of a Boolean function $f$ at a given point is the number of coordinates $i$ such that flipping the $i$-th coordinate changes the value of the function; the total influence is the average of this quantity over all points.
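The following self-contained sketch (an illustration, again using 3-bit majority) checks the probabilistic definition of influence and the identity between total influence and average sensitivity.

```python
from itertools import product

n = 3
cube = list(product([-1, 1], repeat=n))
maj = lambda x: 1 if sum(x) > 0 else -1
flip = lambda x, i: x[:i] + (-x[i],) + x[i + 1:]   # x with coordinate i negated

# Inf_i[f] = Pr[f(x) != f^{xor i}(x)]: each coordinate of Maj_3 has influence 1/2,
# since flipping coordinate i matters exactly when the other two bits disagree.
for i in range(n):
    print(i, sum(maj(x) != maj(flip(x, i)) for x in cube) / 2 ** n)

# Total influence = average sensitivity: E_x[#{i : f(x) != f^{xor i}(x)}] = 3/2,
# matching sum_S |S| fhat(S)^2 = 3*(1/4) + 3*(1/4) for Maj_3.
print(sum(sum(maj(x) != maj(flip(x, i)) for i in range(n)) for x in cube) / 2 ** n)
```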
The total influence can also be defined using the discrete Laplacian of the Hamming graph, suitably normalized:
$$\operatorname{Inf}[f]=\langle f,Lf\rangle.$$
A generalized form of influence is the $\rho$-stable influence, defined by:
$$\operatorname{Inf}_i^{(\rho)}[f]=\operatorname{Stab}_\rho[\operatorname{D}_i f]=\sum_{S\ni i}\rho^{|S|-1}\hat{f}(S)^2.$$
The corresponding total influence is
$$\operatorname{I}^{(\rho)}[f]=\frac{d}{d\rho}\operatorname{Stab}_\rho[f]=\sum_S|S|\rho^{|S|-1}\hat{f}(S)^2.$$
One can prove that a function $f\colon\{-1,1\}^n\to\{-1,1\}$ has at most "constantly" many "stably influential" coordinates:
$$|\{i\in[n]:\operatorname{Inf}_i^{(1-\delta)}[f]\geq\epsilon\}|\leq\frac{1}{\delta\epsilon}.$$
Given $-1\leq\rho\leq1$, we say that two random vectors $x,y\in\{-1,1\}^n$ are $\rho$-correlated if the marginal distributions of $x,y$ are uniform and $\operatorname{E}[x_iy_i]=\rho$ for each coordinate. Concretely, a pair of $\rho$-correlated random vectors can be generated by choosing $x,z\in\{-1,1\}^n$ uniformly at random and then constructing $y$ coordinate by coordinate according to either of the following equivalent rules:
$$y_i=\begin{cases}x_i & \text{w.p. }\rho,\\ z_i & \text{w.p. }1-\rho,\end{cases}\qquad\text{or}\qquad y_i=\begin{cases}x_i & \text{w.p. }\frac{1+\rho}{2},\\ -x_i & \text{w.p. }\frac{1-\rho}{2}.\end{cases}$$
We denote this distribution by $y\sim N_\rho(x)$.
The noise stability of a function $f\colon\{-1,1\}^n\to\mathbb{R}$ at $\rho$ can be defined in two equivalent ways:
$$\operatorname{Stab}_\rho[f]=\operatorname{E}_{x;\,y\sim N_\rho(x)}[f(x)f(y)]=\sum_S\rho^{|S|}\hat{f}(S)^2.$$
For $0\leq\delta\leq1$, the noise sensitivity of $f$ at $\delta$ is
$$\operatorname{NS}_\delta[f]=\frac{1}{2}-\frac{1}{2}\operatorname{Stab}_{1-2\delta}[f].$$
If $f$ is Boolean, then this is the probability that the value of $f$ changes if we flip each coordinate independently with probability $\delta$.
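A small Monte Carlo sketch (illustrative only; it uses the Fourier expansion of $\operatorname{Maj}_3$ computed above, namely $\frac12(x_1+x_2+x_3)-\frac12 x_1x_2x_3$) comparing the two sides of the definition:

```python
import random

rho, trials = 0.4, 200_000
maj = lambda x: 1 if sum(x) > 0 else -1

acc = 0
for _ in range(trials):
    x = [random.choice([-1, 1]) for _ in range(3)]
    # y_i = x_i w.p. (1+rho)/2 and y_i = -x_i w.p. (1-rho)/2, independently
    y = [xi if random.random() < (1 + rho) / 2 else -xi for xi in x]
    acc += maj(x) * maj(y)

print(acc / trials)                      # Monte Carlo estimate, approx 0.316
print(3 / 4 * rho + 1 / 4 * rho ** 3)    # spectral side: sum_S rho^{|S|} fhat(S)^2
```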
The noise operator $T_\rho$ is an operator taking a function $f\colon\{-1,1\}^n\to\mathbb{R}$ and returning another function $T_\rho f\colon\{-1,1\}^n\to\mathbb{R}$ given by
$$(T_\rho f)(x)=\operatorname{E}_{y\sim N_\rho(x)}[f(y)]=\sum_S\rho^{|S|}\hat{f}(S)\chi_S.$$
When $\rho>0$, the noise operator can also be defined using a continuous-time Markov chain in which each coordinate is flipped independently at rate 1: $T_\rho$ corresponds to running this chain for time $\frac{1}{2}\log\frac{1}{\rho}$ starting at $x$, and taking the expected value of $f$ at the final state.
Noise stability can be defined in terms of the noise operator: $\operatorname{Stab}_\rho[f]=\langle f,T_\rho f\rangle$.
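This identity can be checked exactly by brute force; the sketch below (an illustration, not from the article) evaluates $(T_\rho f)(x)$ directly as an average over $y\sim N_\rho(x)$ and recovers the same value as the spectral formula:

```python
from itertools import product

n, rho = 3, 0.4
cube = list(product([-1, 1], repeat=n))
maj = lambda x: 1 if sum(x) > 0 else -1

def T(f, x):
    # (T_rho f)(x) = E_{y ~ N_rho(x)}[f(y)]; each y_i equals x_i w.p. (1+rho)/2
    total = 0.0
    for y in cube:
        weight = 1.0
        for xi, yi in zip(x, y):
            weight *= (1 + rho) / 2 if xi == yi else (1 - rho) / 2
        total += weight * f(y)
    return total

# Stab_rho[f] = <f, T_rho f> = 2^{-n} sum_x f(x) (T_rho f)(x)
print(sum(maj(x) * T(maj, x) for x in cube) / 2 ** n)   # 0.316 = (3/4)rho + (1/4)rho^3
```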
For $1\leq q<\infty$, the $L_q$-norm of a function $f\colon\{-1,1\}^n\to\mathbb{R}$ is defined by
$$\|f\|_q=\sqrt[q]{\operatorname{E}[|f|^q]}.$$
We also define
$$\|f\|_\infty=\max_{x\in\{-1,1\}^n}|f(x)|.$$
The hypercontractivity theorem states that for any $q>2$ and $q'=1/(1-1/q)$, if $\rho\leq1/\sqrt{q-1}$ then
$$\|T_\rho f\|_q\leq\|f\|_2 \quad\text{and}\quad \|T_\rho f\|_2\leq\|f\|_{q'}.$$
Hypercontractivity is closely related to the logarithmic Sobolev inequalities of functional analysis.[2]
A similar result for $q<2$ is known as the reverse hypercontractivity theorem.
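A quick numeric sanity check (my illustration; it takes the critical value $\rho=1/\sqrt{q-1}$ from the statement above) of the inequality $\|T_\rho f\|_4\leq\|f\|_2$ on random Boolean functions over 4 bits:

```python
import random
from itertools import product

n, q = 4, 4
rho = 1 / (q - 1) ** 0.5                 # the critical rho = 1/sqrt(q-1)
cube = list(product([-1, 1], repeat=n))

def T(f, x):                              # (T_rho f)(x) = E_{y ~ N_rho(x)}[f(y)]
    total = 0.0
    for y in cube:
        w = 1.0
        for xi, yi in zip(x, y):
            w *= (1 + rho) / 2 if xi == yi else (1 - rho) / 2
        total += w * f[y]
    return total

for _ in range(100):
    f = {x: random.choice([-1, 1]) for x in cube}                # a random Boolean f
    lhs = (sum(T(f, x) ** q for x in cube) / 2 ** n) ** (1 / q)  # ||T_rho f||_4
    rhs = (sum(f[x] ** 2 for x in cube) / 2 ** n) ** 0.5         # ||f||_2 = 1
    assert lhs <= rhs + 1e-9
print("hypercontractive inequality held on all samples")
```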
In many situations the input to the function is not uniformly distributed over $\{-1,1\}^n$, but instead biased toward $-1$ or $1$. In these situations it is customary to consider functions over the domain $\{0,1\}^n$. For $0<p<1$, the $p$-biased measure $\mu_p$ is given by
$$\mu_p(x)=p^{\sum_i x_i}(1-p)^{\sum_i(1-x_i)}.$$
This measure can be generated by choosing each coordinate independently to be 1 with probability $p$ and 0 with probability $1-p$.
The classical Fourier characters are no longer orthogonal with respect to this measure. Instead, we use the following characters:
$$\omega_S(x)=\prod_{i\in S}\frac{x_i-p}{\sqrt{p(1-p)}},$$
so that each factor equals $\sqrt{(1-p)/p}$ when $x_i=1$ and $-\sqrt{p/(1-p)}$ when $x_i=0$.
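A brute-force check (my illustration; `omega` implements the character defined above) that the $p$-biased characters are indeed orthonormal with respect to $\mu_p$:

```python
from itertools import product, combinations

n, p = 3, 0.3
cube = list(product([0, 1], repeat=n))
mu = lambda x: p ** sum(x) * (1 - p) ** (n - sum(x))      # the p-biased measure

def omega(S, x):                # omega_S(x) = prod_{i in S} (x_i - p)/sqrt(p(1-p))
    r = 1.0
    for i in S:
        r *= (x[i] - p) / (p * (1 - p)) ** 0.5
    return r

subsets = [S for k in range(n + 1) for S in combinations(range(n), k)]
for S in subsets:
    for U in subsets:
        ip = sum(mu(x) * omega(S, x) * omega(U, x) for x in cube)
        assert abs(ip - (1.0 if S == U else 0.0)) < 1e-9   # orthonormality
print("the p-biased characters are orthonormal under mu_p")
```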
The $p$-biased Fourier expansion of $f$ is the expansion of $f$ as a linear combination of the $p$-biased characters:
$$f=\sum_S\hat{f}(S)\omega_S.$$
We can extend the definitions of influence and the noise operator to the p-biased setting by using their spectral definitions.
The $i$-th influence is defined by
$$\operatorname{Inf}_i[f]=\sum_{S\ni i}\hat{f}(S)^2=p(1-p)\operatorname{E}[(f-f^{\oplus i})^2].$$
The total influence is the sum of the individual influences:
$$\operatorname{Inf}[f]=\sum_{i=1}^n\operatorname{Inf}_i[f]=\sum_S|S|\hat{f}(S)^2.$$
A pair of $\rho$-correlated inputs can be generated by choosing $x,z\sim\mu_p$ independently and letting $y\sim N_\rho(x)$, where $N_\rho$ is defined by
$$y_i=\begin{cases}x_i & \text{w.p. }\rho,\\ z_i & \text{w.p. }1-\rho.\end{cases}$$
The noise operator is then given by
$$(T_\rho f)(x)=\sum_S\rho^{|S|}\hat{f}(S)\omega_S(x)=\operatorname{E}_{y\sim N_\rho(x)}[f(y)].$$
Using this we can define the noise stability and the noise sensitivity, as before.
The Russo–Margulis formula (also called the Margulis–Russo formula) states that for monotone Boolean functions $f\colon\{0,1\}^n\to\{0,1\}$:
$$\frac{d}{dp}\operatorname{E}_{x\sim\mu_p}[f(x)]=\frac{\operatorname{Inf}[f]}{p(1-p)}=\sum_{i=1}^n\Pr[f\neq f^{\oplus i}].$$
Both the influence and the probabilities are taken with respect to $\mu_p$, and on the right-hand side we have the average sensitivity of $f$. If we think of $f$ as a property, then the formula states that as $p$ varies, the derivative of the probability that $f$ holds equals the average sensitivity at $p$.
The Russo–Margulis formula is key for proving sharp threshold theorems such as Friedgut's.
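The formula is easy to verify numerically; the sketch below (an illustration) uses the monotone function $\mathrm{OR}_3$, for which both sides equal $3(1-p)^2$:

```python
from itertools import product

n = 3
cube = list(product([0, 1], repeat=n))
f = lambda x: 1 if any(x) else 0                    # OR_3, a monotone function
flip = lambda x, i: x[:i] + (1 - x[i],) + x[i + 1:]
mu = lambda x, q: q ** sum(x) * (1 - q) ** (n - sum(x))

p, h = 0.3, 1e-6
mean = lambda q: sum(mu(x, q) * f(x) for x in cube)  # E_{mu_q}[f] = 1 - (1-q)^3
print((mean(p + h) - mean(p - h)) / (2 * h))         # numerical d/dp, approx 1.47

# Right-hand side: the average sensitivity sum_i Pr_{mu_p}[f != f^{xor i}]
print(sum(mu(x, p) * sum(f(x) != f(flip(x, i)) for i in range(n)) for x in cube))
```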
One of the deepest results in the area, the invariance principle, connects the distribution of functions on the Boolean cube $\{-1,1\}^n$ to their distribution on Gaussian space, which is the space $\mathbb{R}^n$ endowed with the standard $n$-dimensional Gaussian measure.
Many of the basic concepts of Fourier analysis on the Boolean cube have counterparts in Gaussian space. The counterpart of the Fourier expansion is the Hermite expansion, an expansion into an infinite sum (converging in $L^2$) of multivariate Hermite polynomials. The counterpart of the noise operator is the Ornstein–Uhlenbeck operator
$$(U_\rho f)(x)=\operatorname{E}_z[f(\rho x+\sqrt{1-\rho^2}\,z)],$$
where $z$ is a standard $n$-dimensional Gaussian; equivalently, $(U_\rho f)(x)=\operatorname{E}[f(y)]$, where $x,y$ is a pair of $\rho$-correlated standard Gaussians.
Gaussian space is more symmetric than the Boolean cube (for example, it is rotation invariant), and supports continuous arguments which may be harder to get through in the discrete setting of the Boolean cube. The invariance principle links the two settings, and allows deducing results on the Boolean cube from results on Gaussian space.
If $f\colon\{-1,1\}^n\to\{-1,1\}$ has degree at most 1, then $f$ is either constant, equal to a coordinate, or equal to the negation of a coordinate; in particular, $f$ is a dictatorship, a function depending on at most one coordinate.
The Friedgut–Kalai–Naor theorem,[4] also known as the FKN theorem, states that if $f$ almost has degree 1 then it is close to a dictatorship. Quantitatively, if $f\colon\{-1,1\}^n\to\{-1,1\}$ satisfies $\|f^{>1}\|^2<\varepsilon$, then $f$ is $O(\varepsilon)$-close to a dictatorship; that is, $\|f-g\|^2=O(\varepsilon)$ for some Boolean dictatorship $g$, or equivalently, $\Pr[f\neq g]=O(\varepsilon)$ for some Boolean dictatorship $g$.
Similarly, a Boolean function of degree at most $d$ depends on at most $C_W 2^d$ coordinates, making it a junta (a function depending on a constant number of coordinates), where $C_W$ is an absolute constant. The Kindler–Safra theorem generalizes this to higher degrees: if $f\colon\{-1,1\}^n\to\{-1,1\}$ satisfies $\|f^{>d}\|^2<\varepsilon$, then $f$ is $O(\varepsilon)$-close to a Boolean function of degree at most $d$.
The Poincaré inequality for the Boolean cube (which follows from formulas appearing above) states that for a function $f\colon\{-1,1\}^n\to\mathbb{R}$,
$$\operatorname{Var}[f]\leq\operatorname{Inf}[f]\leq\deg f\cdot\operatorname{Var}[f].$$
This implies that $\max_i\operatorname{Inf}_i[f]\geq\frac{\operatorname{Var}[f]}{n}$.
The Kahn–Kalai–Linial theorem,[7] also known as the KKL theorem, states that if $f$ is Boolean then
$$\max_i\operatorname{Inf}_i[f]=\Omega\left(\operatorname{Var}[f]\,\frac{\log n}{n}\right).$$
The bound given by the Kahn–Kalai–Linial theorem is tight, and is achieved by the Tribes function of Ben-Or and Linial:[8]
$$(x_{1,1}\land\cdots\land x_{1,w})\lor\cdots\lor(x_{2^w,1}\land\cdots\land x_{2^w,w}).$$
The Kahn–Kalai–Linial theorem was one of the first results in the area, and was the one introducing hypercontractivity into the context of Boolean functions.
If $f\colon\{-1,1\}^n\to\{-1,1\}$ is an $M$-junta (a function depending on at most $M$ coordinates), then $\operatorname{Inf}[f]\leq M$ according to the Poincaré inequality.
Friedgut's theorem[9] is a converse to this result. It states that for any $\varepsilon>0$, $f$ is $\varepsilon$-close to a Boolean function depending on at most $\exp(\operatorname{Inf}[f]/\varepsilon)$ coordinates.
Combined with the Russo–Margulis lemma, Friedgut's junta theorem implies that for every $p$, every monotone function is close to a junta with respect to $\mu_q$ for some $q\approx p$.
The invariance principle[10] generalizes the Berry–Esseen theorem to non-linear functions.
The Berry–Esseen theorem states (among else) that if $f=\sum_{i=1}^n c_ix_i$ and no $c_i$ is too large compared to the rest, then the distribution of $f$ over $\{-1,1\}^n$ is close to a normal distribution with the same mean and variance.
The invariance principle (in a special case) informally states that if $f$ is a multilinear polynomial of bounded degree over $x_1,\ldots,x_n$ and all influences of $f$ are small, then the distribution of $f$ under the uniform measure over $\{-1,1\}^n$ is close to its distribution in Gaussian space.
More formally, let $\psi$ be a univariate Lipschitz function, let $f=\sum_S\hat{f}(S)\chi_S$, let $k=\deg f$, and let $\varepsilon=\max_i\sum_{S\ni i}\hat{f}(S)^2$. Suppose that $\sum_S\hat{f}(S)^2\leq1$. Then
$$\left|\operatorname{E}_{x\sim\{-1,1\}^n}[\psi(f(x))]-\operatorname{E}_g[\psi(f(g))]\right|=O(k9^k\varepsilon),$$
where $g$ is a vector of independent standard Gaussian variables.
By choosing appropriate $\psi$, this implies that the distributions of $f$ under the two measures are close in the sense that $\sup_t|\Pr[f(x)<t]-\Pr[f(g)<t]|$ is small.
The invariance principle was the key ingredient in the original proof of the Majority is Stablest theorem.
A Boolean function $f\colon\{-1,1\}^n\to\{-1,1\}$ is linear if it satisfies $f(xy)=f(x)f(y)$, where $xy=(x_1y_1,\ldots,x_ny_n)$. It is not hard to show that the Boolean linear functions are exactly the characters $\chi_S$.
In property testing we want to test whether a given function is linear. It is natural to try the following test: choose $x,y\in\{-1,1\}^n$ uniformly at random, and check that $f(xy)=f(x)f(y)$. If $f$ is linear, then it always passes the test. Blum, Luby and Rubinfeld showed that if the test passes with probability $1-\varepsilon$, then $f$ is $O(\varepsilon)$-close to a Fourier character.
Bellare et al.[12] gave an extremely simple Fourier-analytic proof, which also shows that if the test succeeds with probability $1/2+\varepsilon$, then $f$ is correlated with a Fourier character. Their proof relies on the following formula for the success probability of the test:
$$\frac{1}{2}+\frac{1}{2}\sum_S\hat{f}(S)^3.$$
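This formula is easy to check by brute force; the sketch below (illustrative, not from the article) runs the test exhaustively on the non-linear function $\operatorname{Maj}_3$ and compares with the Fourier-side expression:

```python
from itertools import product, combinations

n = 3
cube = list(product([-1, 1], repeat=n))
f = lambda x: 1 if sum(x) > 0 else -1              # Maj_3, a non-linear example

def chi(S, x):
    r = 1
    for i in S:
        r *= x[i]
    return r

subsets = [S for k in range(n + 1) for S in combinations(range(n), k)]
fhat = {S: sum(f(x) * chi(S, x) for x in cube) / 2 ** n for S in subsets}

mul = lambda x, y: tuple(a * b for a, b in zip(x, y))     # coordinatewise product
succ = sum(f(mul(x, y)) == f(x) * f(y) for x in cube for y in cube) / 4 ** n
print(succ)                                               # 0.625
print(1 / 2 + 1 / 2 * sum(c ** 3 for c in fhat.values()))  # also 0.625
```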
Arrow's impossibility theorem states that for three or more candidates, the only unanimous voting rule for which there is always a Condorcet winner is a dictatorship.
The usual proof of Arrow's theorem is combinatorial. Kalai[13] gave an alternative proof of this result in the case of three candidates using Fourier analysis. If $f\colon\{-1,1\}^n\to\{-1,1\}$ is the rule that assigns a winner among two candidates given their relative orders in the votes, then the probability that there is a Condorcet winner given uniformly random votes is
$$\frac{3}{4}-\frac{3}{4}\operatorname{Stab}_{-1/3}[f],$$
from which the theorem easily follows.
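As a quick illustration (not from the article), plugging the noise stability of $\operatorname{Maj}_3$ into this formula recovers the classical fact that with three voters the probability of a Condorcet winner is $17/18$:

```python
rho = -1 / 3
stab = 3 / 4 * rho + 1 / 4 * rho ** 3   # Stab_{-1/3}[Maj_3], from its Fourier expansion
print(3 / 4 - 3 / 4 * stab)             # 0.9444... = 17/18
```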
The FKN theorem implies that if $f$ is a rule for which there is almost always a Condorcet winner, then $f$ is close to a dictatorship.
A classical result in the theory of random graphs states that the probability that a $G(n,p)$ random graph is connected tends to $e^{-e^{-c}}$ if $p\sim\frac{\log n+c}{n}$. This is an example of a sharp threshold: the width of the "threshold window", which is $O(1/n)$, is asymptotically smaller than the threshold itself, which is roughly $\frac{\log n}{n}$. In contrast, the probability that a $G(n,p)$ graph is triangle-free tends to $e^{-c^3/6}$ when $p\sim\frac{c}{n}$. Here both the threshold window and the threshold itself are $\Theta(1/n)$, and so this is a coarse threshold.
Friedgut's sharp threshold theorem[14] states, roughly speaking, that a monotone graph property (a graph property is a property which doesn't depend on the names of the vertices) has a sharp threshold unless it is correlated with the appearance of small subgraphs. This theorem has been widely applied to analyze random graphs and percolation.
On a related note, the KKL theorem implies that the width of the threshold window is always at most $O(1/\log n)$.
Let $\operatorname{Maj}_n\colon\{-1,1\}^n\to\{-1,1\}$ denote the majority function on $n$ coordinates. Sheppard's formula gives the asymptotic noise stability of majority:
$$\operatorname{Stab}_\rho[\operatorname{Maj}_n]\longrightarrow1-\frac{2}{\pi}\arccos\rho.$$
This is related to the probability that if we choose $x\in\{-1,1\}^n$ uniformly at random and form $y\in\{-1,1\}^n$ by flipping each coordinate of $x$ with probability $\frac{1-\rho}{2}$, then the majority stays the same:
$$\operatorname{Stab}_\rho[\operatorname{Maj}_n]=2\Pr[\operatorname{Maj}_n(x)=\operatorname{Maj}_n(y)]-1.$$
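A Monte Carlo sketch (illustration only) showing the convergence toward Sheppard's limit for a moderately large $n$:

```python
import math, random

n, rho, trials = 1001, 0.5, 20_000
acc = 0
for _ in range(trials):
    x = [random.choice([-1, 1]) for _ in range(n)]
    # flip each coordinate independently with probability (1 - rho)/2
    y = [xi if random.random() < (1 + rho) / 2 else -xi for xi in x]
    acc += (sum(x) > 0) == (sum(y) > 0)

print(2 * acc / trials - 1)                  # approx Stab_rho[Maj_n]
print(1 - 2 / math.pi * math.acos(rho))      # Sheppard limit: 1/3
```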
There are Boolean functions with larger noise stability. For example, a dictatorship $x_i$ has noise stability $\rho$.
The Majority is Stablest theorem states, informally, that the only functions having noise stability larger than majority have influential coordinates. Formally, for every $\varepsilon>0$ there exists $\tau>0$ such that if $f\colon\{-1,1\}^n\to\{-1,1\}$ has expectation zero and $\max_i\operatorname{Inf}_i[f]\leq\tau$, then $\operatorname{Stab}_\rho[f]\leq1-\frac{2}{\pi}\arccos\rho+\varepsilon$ for all $0\leq\rho<1$.
The first proof of this theorem used the invariance principle in conjunction with an isoperimetric theorem of Borell in Gaussian space; more direct proofs have since been devised.[16]
Majority is Stablest implies that the Goemans–Williamson approximation algorithm for MAX-CUT is optimal, assuming the unique games conjecture. This implication, due to Khot et al., was the impetus behind proving the theorem.