For certain applications in linear algebra, it is useful to know properties of the probability distribution of the largest eigenvalue of a finite sum of random matrices. Suppose $\{X_k\}$ is a finite sequence of random matrices. Analogous to the well-known Chernoff bound for sums of scalars, a bound on the following is sought for a given parameter $t$:

$$\Pr\left\{\lambda_{\max}\left(\sum_k X_k\right) \geq t\right\}.$$

The theorems below give such bounds under various assumptions, named by analogy with their classical scalar counterparts.
For matrix Gaussian and Rademacher series, consider a finite sequence $\{A_k\}$ of fixed, self-adjoint matrices with dimension $d$, and let $\{\xi_k\}$ be a finite sequence of independent standard normal or independent Rademacher random variables. Then, for all $t \geq 0$,

$$\Pr\left\{\lambda_{\max}\left(\sum_k \xi_k A_k\right) \geq t\right\} \leq d \cdot e^{-t^2/2\sigma^2},$$

where

$$\sigma^2 = \left\|\sum_k A_k^2\right\|.$$
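This bound is easy to probe numerically. The following is a minimal Python sketch, assuming an arbitrary batch of randomly generated symmetric matrices $A_k$; the dimensions, the threshold $t$, and the trial count are illustrative choices, not from the source. It compares the empirical tail of $\lambda_{\max}(\sum_k \xi_k A_k)$ for Rademacher signs $\xi_k$ against the theorem's bound.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, trials = 10, 50, 2000

# Fixed self-adjoint matrices A_k (illustrative random data).
A = rng.standard_normal((n, d, d))
A = (A + A.transpose(0, 2, 1)) / 2

# Variance parameter: sigma^2 = || sum_k A_k^2 || (spectral norm).
sigma2 = np.linalg.norm(np.einsum("kij,kjl->il", A, A), ord=2)

t = 2.0 * np.sqrt(sigma2)
bound = d * np.exp(-t**2 / (2 * sigma2))

# Empirical tail probability of lambda_max(sum_k xi_k A_k) >= t.
hits = 0
for _ in range(trials):
    xi = rng.choice([-1.0, 1.0], size=n)   # Rademacher signs
    S = np.einsum("k,kij->ij", xi, A)
    hits += np.linalg.eigvalsh(S)[-1] >= t
print(f"empirical tail {hits / trials:.4f}  vs  bound {bound:.4f}")
```

The leading factor $d$ reflects the union-bound-like cost of controlling all eigenvalues at once; the empirical tail is typically far below the bound.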
An analogous result holds for rectangular matrices. Consider a finite sequence $\{B_k\}$ of fixed matrices with dimension $d_1 \times d_2$, and let $\{\xi_k\}$ be a finite sequence of independent standard normal or independent Rademacher random variables. Define the variance parameter

$$\sigma^2 = \max\left\{\left\|\sum_k B_k B_k^*\right\|, \left\|\sum_k B_k^* B_k\right\|\right\}.$$

Then, for all $t \geq 0$,

$$\Pr\left\{\left\|\sum_k \xi_k B_k\right\| \geq t\right\} \leq (d_1 + d_2) \cdot e^{-t^2/2\sigma^2}.$$
The classical Chernoff bounds concern the sum of independent, nonnegative, and uniformly bounded random variables. In the matrix setting, the analogous theorem concerns a sum of positive semi-definite random matrices subject to a uniform eigenvalue bound.
Consider a finite sequence $\{X_k\}$ of independent, random, self-adjoint matrices with dimension $d$. Assume that each random matrix satisfies $X_k \succeq 0$ and $\lambda_{\max}(X_k) \leq R$ almost surely. Define

$$\mu_{\min} = \lambda_{\min}\left(\sum_k \operatorname{E} X_k\right) \quad \text{and} \quad \mu_{\max} = \lambda_{\max}\left(\sum_k \operatorname{E} X_k\right).$$

Then

$$\Pr\left\{\lambda_{\min}\left(\sum_k X_k\right) \leq (1-\delta)\mu_{\min}\right\} \leq d \cdot \left[\frac{e^{-\delta}}{(1-\delta)^{1-\delta}}\right]^{\mu_{\min}/R} \quad \text{for } \delta \in [0,1),$$

and

$$\Pr\left\{\lambda_{\max}\left(\sum_k X_k\right) \geq (1+\delta)\mu_{\max}\right\} \leq d \cdot \left[\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right]^{\mu_{\max}/R} \quad \text{for } \delta \geq 0.$$
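As a worked example, the sketch below evaluates both bounds for a toy model that is an assumption of this illustration, not part of the source: each $X_k = u_k u_k^*$ is a rank-one projection onto a uniformly random unit vector in $\mathbb{R}^d$, so $X_k \succeq 0$, $\lambda_{\max}(X_k) \leq 1 = R$, $\operatorname{E} X_k = I/d$, and hence $\mu_{\min} = \mu_{\max} = n/d$.

```python
import numpy as np

# Toy model: X_k = u_k u_k^T, u_k uniform on the unit sphere in R^d,
# so sum_k E X_k = (n/d) I and mu_min = mu_max = n/d.
d, n, R = 5, 200, 1.0
mu_min = mu_max = n / d

def lower_tail(delta):
    # Pr{ lambda_min(sum_k X_k) <= (1 - delta) * mu_min }
    return d * (np.exp(-delta) / (1 - delta) ** (1 - delta)) ** (mu_min / R)

def upper_tail(delta):
    # Pr{ lambda_max(sum_k X_k) >= (1 + delta) * mu_max }
    return d * (np.exp(delta) / (1 + delta) ** (1 + delta)) ** (mu_max / R)

print(f"lower-tail bound at delta=0.5: {lower_tail(0.5):.3e}")
print(f"upper-tail bound at delta=0.5: {upper_tail(0.5):.3e}")
```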
A second form of the matrix Chernoff inequality concerns the average of the sequence. Consider a sequence $\{X_k : k = 1, 2, \ldots, n\}$ of independent, random, self-adjoint matrices with dimension $d$ that satisfy $X_k \succeq 0$ and $\lambda_{\max}(X_k) \leq 1$ almost surely. Compute the minimum and maximum eigenvalues of the average expectation,

$$\bar{\mu}_{\min} = \lambda_{\min}\left(\frac{1}{n}\sum_{k=1}^n \operatorname{E} X_k\right) \quad \text{and} \quad \bar{\mu}_{\max} = \lambda_{\max}\left(\frac{1}{n}\sum_{k=1}^n \operatorname{E} X_k\right).$$

Then

$$\Pr\left\{\lambda_{\min}\left(\frac{1}{n}\sum_{k=1}^n X_k\right) \leq \alpha\right\} \leq d \cdot e^{-n D(\alpha \,\|\, \bar{\mu}_{\min})} \quad \text{for } 0 \leq \alpha \leq \bar{\mu}_{\min},$$

and

$$\Pr\left\{\lambda_{\max}\left(\frac{1}{n}\sum_{k=1}^n X_k\right) \geq \alpha\right\} \leq d \cdot e^{-n D(\alpha \,\|\, \bar{\mu}_{\max})} \quad \text{for } \bar{\mu}_{\max} \leq \alpha \leq 1.$$

Here

$$D(a \,\|\, u) = a\left(\log a - \log u\right) + (1-a)\left(\log(1-a) - \log(1-u)\right)$$

is the binary Kullback–Leibler divergence, defined for $a, u \in [0,1]$.
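Since the exponent is the binary Kullback–Leibler divergence, the bound is simple to evaluate. Below is a small helper; the function names and the sample parameters are my own, chosen only for illustration, and the snippet computes the upper-tail bound $d \cdot e^{-n D(\alpha \| \bar{\mu}_{\max})}$.

```python
import numpy as np

def binary_kl(a, u):
    # D(a || u) = a log(a/u) + (1-a) log((1-a)/(1-u)), with 0 log 0 = 0.
    total = 0.0
    if a > 0:
        total += a * np.log(a / u)
    if a < 1:
        total += (1 - a) * np.log((1 - a) / (1 - u))
    return total

def chernoff_upper_tail(d, n, mu_max_bar, alpha):
    # Pr{ lambda_max((1/n) sum X_k) >= alpha } <= d * exp(-n D(alpha || mu_max_bar))
    assert mu_max_bar <= alpha <= 1
    return d * np.exp(-n * binary_kl(alpha, mu_max_bar))

# Example: d = 10, n = 100 summands, average expectation 0.3, threshold 0.5.
print(f"{chernoff_upper_tail(10, 100, 0.3, 0.5):.3e}")
```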
In the scalar setting, the Bennett and Bernstein inequalities describe the upper tail of a sum of independent, zero-mean random variables that are either bounded or subexponential. In the matrix case, the analogous results concern a sum of zero-mean random matrices.
In the bounded case, consider a finite sequence $\{X_k\}$ of independent, random, self-adjoint matrices with dimension $d$. Assume that each random matrix satisfies $\operatorname{E} X_k = 0$ and $\lambda_{\max}(X_k) \leq R$ almost surely. Compute the norm of the total variance,

$$\sigma^2 = \left\|\sum_k \operatorname{E}\left(X_k^2\right)\right\|.$$

Then, the following chain of inequalities holds for all $t \geq 0$:

$$\begin{align} \Pr\left\{\lambda_{\max}\left(\sum_k X_k\right) \geq t\right\} &\leq d \cdot \exp\left(-\frac{\sigma^2}{R^2} \cdot h\left(\frac{Rt}{\sigma^2}\right)\right)\\ &\leq d \cdot \exp\left(\frac{-t^2/2}{\sigma^2 + Rt/3}\right)\\ &\leq \begin{cases} d \cdot \exp\left(-3t^2/8\sigma^2\right) & \text{for } t \leq \sigma^2/R;\\ d \cdot \exp\left(-3t/8R\right) & \text{for } t \geq \sigma^2/R, \end{cases} \end{align}$$

where $h(u) = (1+u)\log(1+u) - u$ for $u \geq 0$.
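The three expressions in the chain can be compared directly. The sketch below evaluates them at a few values of $t$, with placeholder values for $d$, $R$, and $\sigma^2$ (my own choices), illustrating that each successive bound is weaker but simpler.

```python
import numpy as np

d, R, sigma2 = 10, 1.0, 4.0

def bennett(t):
    h = lambda u: (1 + u) * np.log(1 + u) - u
    return d * np.exp(-(sigma2 / R**2) * h(R * t / sigma2))

def bernstein(t):
    return d * np.exp(-(t**2 / 2) / (sigma2 + R * t / 3))

def split(t):
    # Piecewise simplification; the crossover point is t = sigma^2 / R.
    if t <= sigma2 / R:
        return d * np.exp(-3 * t**2 / (8 * sigma2))
    return d * np.exp(-3 * t / (8 * R))

for t in [1.0, 4.0, 16.0]:   # below, at, and above the crossover
    print(f"t={t:5.1f}  {bennett(t):.3e} <= {bernstein(t):.3e} <= {split(t):.3e}")
```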
A matrix Bernstein inequality also holds for summands with subexponential moment growth. Consider a finite sequence $\{X_k\}$ of independent, random, self-adjoint matrices with dimension $d$. Assume that

$$\operatorname{E} X_k = 0 \quad \text{and} \quad \operatorname{E}\left(X_k^p\right) \preceq \frac{p!}{2} \cdot R^{p-2} A_k^2 \quad \text{for } p = 2, 3, 4, \ldots$$

Compute the variance parameter,

$$\sigma^2 = \left\|\sum_k A_k^2\right\|.$$

Then, the following chain of inequalities holds for all $t \geq 0$:

$$\begin{align} \Pr\left\{\lambda_{\max}\left(\sum_k X_k\right) \geq t\right\} &\leq d \cdot \exp\left(\frac{-t^2/2}{\sigma^2 + Rt}\right)\\ &\leq \begin{cases} d \cdot \exp\left(-t^2/4\sigma^2\right) & \text{for } t \leq \sigma^2/R;\\ d \cdot \exp\left(-t/4R\right) & \text{for } t \geq \sigma^2/R. \end{cases} \end{align}$$
The analogous result for general rectangular matrices reads as follows. Consider a finite sequence $\{Z_k\}$ of independent, random matrices with dimension $d_1 \times d_2$. Assume that each random matrix satisfies $\operatorname{E} Z_k = 0$ and $\|Z_k\| \leq R$ almost surely. Define the variance parameter

$$\sigma^2 = \max\left\{\left\|\sum_k \operatorname{E}\left(Z_k Z_k^*\right)\right\|, \left\|\sum_k \operatorname{E}\left(Z_k^* Z_k\right)\right\|\right\}.$$

Then, for all $t \geq 0$,

$$\Pr\left\{\left\|\sum_k Z_k\right\| \geq t\right\} \leq (d_1 + d_2) \cdot \exp\left(\frac{-t^2/2}{\sigma^2 + Rt/3}\right)$$

holds.[1]
The scalar version of Azuma's inequality states that a scalar martingale exhibits normal concentration about its mean value, with the scale for deviations controlled by the total maximum squared range of the difference sequence. The following is the extension to the matrix setting.
Consider a finite adapted sequence $\{X_k\}$ of self-adjoint matrices with dimension $d$, and a fixed sequence $\{A_k\}$ of self-adjoint matrices that satisfy

$$\operatorname{E}_{k-1} X_k = 0 \quad \text{and} \quad X_k^2 \preceq A_k^2 \quad \text{almost surely}.$$

Compute the variance parameter

$$\sigma^2 = \left\|\sum_k A_k^2\right\|.$$

Then, for all $t \geq 0$,

$$\Pr\left\{\lambda_{\max}\left(\sum_k X_k\right) \geq t\right\} \leq d \cdot e^{-t^2/8\sigma^2}.$$
The constant 1/8 can be improved to 1/2 when there is additional information available. One case occurs when each summand $X_k$ is conditionally symmetric; another occurs when $X_k$ commutes almost surely with $A_k$.
Placing the additional assumption that the summands in the matrix Azuma inequality are independent gives a matrix extension of Hoeffding's inequalities.
Consider a finite sequence $\{X_k\}$ of independent, random, self-adjoint matrices with dimension $d$, and let $\{A_k\}$ be a sequence of fixed self-adjoint matrices. Assume that each random matrix satisfies

$$\operatorname{E} X_k = 0 \quad \text{and} \quad X_k^2 \preceq A_k^2 \quad \text{almost surely}.$$

Then, for all $t \geq 0$,

$$\Pr\left\{\lambda_{\max}\left(\sum_k X_k\right) \geq t\right\} \leq d \cdot e^{-t^2/8\sigma^2},$$

where

$$\sigma^2 = \left\|\sum_k A_k^2\right\|.$$
An improvement of this result was established in subsequent work: for all $t \geq 0$,

$$\Pr\left\{\lambda_{\max}\left(\sum_k X_k\right) \geq t\right\} \leq d \cdot e^{-t^2/2\sigma^2},$$

where

$$\sigma^2 = \frac{1}{2}\left\|\sum_k \left(A_k^2 + \operatorname{E} X_k^2\right)\right\| \leq \left\|\sum_k A_k^2\right\|.$$
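The improvement is visible numerically whenever $\operatorname{E} X_k^2$ is strictly smaller than $A_k^2$. In the assumed model below (my own illustration), $X_k = \xi_k A_k$ with $\xi_k$ uniform on $[-1, 1]$, so $X_k^2 \preceq A_k^2$ holds while $\operatorname{E} X_k^2 = A_k^2/3$, and the improved variance parameter is $2/3$ of the original.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 8, 30

# Illustrative model: X_k = xi_k * A_k, xi_k uniform on [-1, 1], so
# E X_k = 0, X_k^2 <= A_k^2 almost surely, and E X_k^2 = A_k^2 / 3.
A = rng.standard_normal((n, d, d))
A = (A + A.transpose(0, 2, 1)) / 2
A2 = np.einsum("kij,kjl->kil", A, A)        # the matrices A_k^2

sigma2_hoeffding = np.linalg.norm(A2.sum(axis=0), ord=2)
sigma2_improved = 0.5 * np.linalg.norm((A2 + A2 / 3).sum(axis=0), ord=2)

print(f"{sigma2_improved:.2f} <= {sigma2_hoeffding:.2f}")  # ratio 2/3 here
```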
In the scalar setting, McDiarmid's inequality provides one common way of bounding the differences, by applying Azuma's inequality to a Doob martingale. A version of the bounded differences inequality holds in the matrix setting.
Let $\{Z_k : k = 1, 2, \ldots, n\}$ be an independent family of random variables, and let $H$ be a function that maps $n$ variables to a self-adjoint matrix of dimension $d$. Consider a sequence $\{A_k\}$ of fixed self-adjoint matrices that satisfy

$$\left(H(z_1, \ldots, z_k, \ldots, z_n) - H(z_1, \ldots, z_k', \ldots, z_n)\right)^2 \preceq A_k^2,$$

where $z_i$ and $z_i'$ range over all possible values of $Z_i$ for each index $i$. Compute the variance parameter

$$\sigma^2 = \left\|\sum_k A_k^2\right\|.$$

Then, for all $t \geq 0$,

$$\Pr\left\{\lambda_{\max}\left(H(\mathbf{z}) - \operatorname{E} H(\mathbf{z})\right) \geq t\right\} \leq d \cdot e^{-t^2/8\sigma^2},$$

where $\mathbf{z} = (Z_1, \ldots, Z_n)$.
An improvement of this result was also established in subsequent work: for all $t \geq 0$,

$$\Pr\left\{\lambda_{\max}\left(H(\mathbf{z}) - \operatorname{E} H(\mathbf{z})\right) \geq t\right\} \leq d \cdot e^{-t^2/\sigma^2},$$

where $\mathbf{z} = (Z_1, \ldots, Z_n)$ and

$$\sigma^2 = \left\|\sum_k A_k^2\right\|.$$
The first bounds of this type were derived by Ahlswede and Winter. Recall the theorem above for self-adjoint matrix Gaussian and Rademacher bounds: for a finite sequence $\{A_k\}$ of fixed, self-adjoint matrices with dimension $d$ and a finite sequence $\{\xi_k\}$ of independent standard normal or independent Rademacher random variables,

$$\Pr\left\{\lambda_{\max}\left(\sum_k \xi_k A_k\right) \geq t\right\} \leq d \cdot e^{-t^2/2\sigma^2}, \quad \text{where } \sigma^2 = \left\|\sum_k A_k^2\right\|.$$

The Ahlswede–Winter argument gives the same bound, except with

$$\sigma_{AW}^2 = \sum_k \lambda_{\max}\left(A_k^2\right).$$

By comparison, the $\sigma^2$ in the theorem above swaps the order of $\sum$ and $\lambda_{\max}$: it is the largest eigenvalue of the sum rather than the sum of the largest eigenvalues. By the triangle inequality it is never larger than the Ahlswede–Winter value, and it can be much smaller.
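The gap between the two variance parameters is easy to observe numerically. In the sketch below (with illustrative random symmetric matrices of my own choosing), the norm of the sum comes out much smaller than the sum of the largest eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(3)
d, n = 8, 100

A = rng.standard_normal((n, d, d))
A = (A + A.transpose(0, 2, 1)) / 2
A2 = np.einsum("kij,kjl->kil", A, A)          # the matrices A_k^2

sigma2_tropp = np.linalg.norm(A2.sum(axis=0), ord=2)      # ||sum_k A_k^2||
sigma2_aw = sum(np.linalg.eigvalsh(M)[-1] for M in A2)    # sum_k lambda_max(A_k^2)

print(f"sigma^2 = {sigma2_tropp:.1f}  <=  sigma^2_AW = {sigma2_aw:.1f}")
```

For independent random summands the sum $\sum_k A_k^2$ enjoys cancellation across directions, while $\sigma_{AW}^2$ accumulates the worst direction of every summand, so the gap typically grows with the number of summands.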
The chief contribution of Ahlswede and Winter was the extension of the Laplace-transform method used to prove the scalar Chernoff bound (see Chernoff bound#Additive form (absolute error)) to the case of self-adjoint matrices. The procedure is given in the derivation below. All of the recent works on this topic follow the same procedure, and the chief differences arise in subsequent steps: Ahlswede and Winter use the Golden–Thompson inequality to proceed, whereas Tropp uses Lieb's theorem.
Suppose one wished to vary the length of the series ($n$) and the dimension of the matrices ($d$) while keeping the right-hand side approximately constant. Then $n$ must vary approximately as the logarithm of $d$. Several papers have attempted to establish a bound without a dependence on dimension. Rudelson and Vershynin give a result for matrices which are the outer product of two vectors. Later work provides a result without the dimensional dependence for low-rank matrices; the original result was derived independently of the Ahlswede–Winter approach, but a similar result has since been proved using it.
Finally, Oliveira proves a result for matrix martingales independently of the Ahlswede–Winter framework, and Tropp slightly improves upon that result using the framework. Neither result is presented in this article.
The Laplace transform argument found in Ahlswede and Winter's work is a significant result in its own right. Let $Y$ be a random self-adjoint matrix. Then

$$\Pr\left\{\lambda_{\max}(Y) \geq t\right\} \leq \inf_{\theta > 0}\left\{e^{-\theta t} \cdot \operatorname{E}\left[\operatorname{tr} e^{\theta Y}\right]\right\}.$$

To prove this, fix $\theta > 0$. Then

$$\begin{align} \Pr\left\{\lambda_{\max}(Y) \geq t\right\} &= \Pr\left\{\lambda_{\max}(\theta Y) \geq \theta t\right\}\\ &= \Pr\left\{e^{\lambda_{\max}(\theta Y)} \geq e^{\theta t}\right\}\\ &\leq e^{-\theta t}\operatorname{E}\, e^{\lambda_{\max}(\theta Y)}\\ &\leq e^{-\theta t}\operatorname{E}\operatorname{tr} e^{\theta Y}. \end{align}$$

The second-to-last inequality is Markov's inequality. The last inequality holds since $e^{\lambda_{\max}(\theta Y)} = \lambda_{\max}(e^{\theta Y}) \leq \operatorname{tr}(e^{\theta Y})$, because all eigenvalues of the exponential of a self-adjoint matrix are positive. Since the left-hand side is independent of $\theta$, taking the infimum over $\theta > 0$ preserves the bound.
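The spectral facts used in the last step can be checked directly. The following is a minimal sketch with arbitrary random data: the identity $e^{\lambda_{\max}(\theta Y)} = \lambda_{\max}(e^{\theta Y})$ holds up to floating-point error, and the trace dominates.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(4)
d, theta = 6, 0.7
Y = rng.standard_normal((d, d))
Y = (Y + Y.T) / 2                  # a self-adjoint matrix

lhs = np.exp(theta * np.linalg.eigvalsh(Y)[-1])    # e^{lambda_max(theta Y)}
E = expm(theta * Y)                                # matrix exponential
# lambda_max(e^{theta Y}) equals lhs, and the trace dominates it because
# every eigenvalue of e^{theta Y} is positive.
print(lhs, np.linalg.eigvalsh(E)[-1], np.trace(E))
```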
Thus, our task is to understand $\operatorname{E}[\operatorname{tr}(e^{\theta Y})]$; this is the point where the approaches of Ahlswede–Winter and Tropp diverge. In what follows, call $M_Y(\theta) := \operatorname{E} e^{\theta Y}$ the matrix generating function.
The Golden–Thompson inequality implies that, for independent $X_1$ and $X_2$,

$$\operatorname{tr} M_{X_1 + X_2}(\theta) \leq \operatorname{tr}\left[\left(\operatorname{E} e^{\theta X_1}\right)\left(\operatorname{E} e^{\theta X_2}\right)\right] = \operatorname{tr}\left[M_{X_1}(\theta)\, M_{X_2}(\theta)\right].$$

Suppose $Y = \sum_k X_k$ with independent summands. Combining this bound with the fact that $\operatorname{tr}(AB) \leq \operatorname{tr}(A)\,\lambda_{\max}(B)$ for positive semi-definite $A$ and $B$ gives

$$\operatorname{tr} M_Y(\theta) \leq \operatorname{tr}\left[\left(\operatorname{E} e^{\theta \sum_{k=1}^{n-1} X_k}\right)\left(\operatorname{E} e^{\theta X_n}\right)\right] \leq \operatorname{tr}\left(\operatorname{E} e^{\theta \sum_{k=1}^{n-1} X_k}\right) \lambda_{\max}\left(\operatorname{E} e^{\theta X_n}\right).$$

Iterating over all summands yields

$$\operatorname{tr} M_Y(\theta) \leq (\operatorname{tr} I)\left[\prod_k \lambda_{\max}\left(\operatorname{E} e^{\theta X_k}\right)\right] = d\, e^{\sum_k \lambda_{\max}\left(\log \operatorname{E} e^{\theta X_k}\right)}.$$
So far we have found a bound with an infimum over $\theta$. In this form, one can also see how the Ahlswede–Winter parameter $\sigma_{AW}^2$ arises as a sum of largest eigenvalues over the separate summands.
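The Golden–Thompson inequality that drives this derivation, $\operatorname{tr} e^{A+B} \leq \operatorname{tr}(e^A e^B)$ for self-adjoint $A$ and $B$, can be verified numerically. The following is a minimal sketch with arbitrary random symmetric matrices.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(5)
d = 6
def sym():
    M = rng.standard_normal((d, d))
    return (M + M.T) / 2

A, B = sym(), sym()
# Golden-Thompson: tr e^{A+B} <= tr(e^A e^B); equality iff A and B commute.
print(np.trace(expm(A + B)), "<=", np.trace(expm(A) @ expm(B)))
```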
The major contribution of Tropp was the application of Lieb's theorem where Ahlswede and Winter had applied the Golden–Thompson inequality. Tropp's corollary is the following: if $H$ is a fixed self-adjoint matrix and $X$ is a random self-adjoint matrix, then

$$\operatorname{E}\operatorname{tr} e^{H + X} \leq \operatorname{tr} e^{H + \log\left(\operatorname{E} e^X\right)}.$$

Proof: Let $Y = e^X$. Then Lieb's theorem tells us that $f(Y) = \operatorname{tr} e^{H + \log(Y)}$ is concave, so Jensen's inequality moves the expectation inside the function:

$$\operatorname{E}\operatorname{tr} e^{H + \log(Y)} \leq \operatorname{tr} e^{H + \log\left(\operatorname{E} Y\right)}.$$
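Tropp's corollary admits a quick numerical check. The sketch below is illustrative only: it takes $X$ to be a two-point random matrix, equal to one of two fixed symmetric matrices with probability 1/2 each.

```python
import numpy as np
from scipy.linalg import expm, logm

rng = np.random.default_rng(6)
d = 5
def sym():
    M = rng.standard_normal((d, d))
    return (M + M.T) / 2

H, X1, X2 = sym(), sym(), sym()   # X equals X1 or X2 with probability 1/2

lhs = 0.5 * (np.trace(expm(H + X1)) + np.trace(expm(H + X2)))  # E tr e^{H+X}
M = 0.5 * (expm(X1) + expm(X2))                                # E e^{X}
rhs = np.trace(expm(H + logm(M).real))              # tr e^{H + log E e^X}
print(lhs, "<=", rhs)
```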
This gives us the major result of the paper: the subadditivity of the log of the matrix generating function.
Let $\{X_k\}$ be a finite sequence of independent, random, self-adjoint matrices. Then, for all $\theta \in \mathbb{R}$,

$$\operatorname{tr} M_{\sum_k X_k}(\theta) \leq \operatorname{tr} e^{\sum_k \log M_{X_k}(\theta)}.$$

Proof: It is sufficient to let $\theta = 1$, since the general case follows by rescaling each $X_k$ by $\theta$. Expanding the definitions, we must show that

$$\operatorname{E}\operatorname{tr} e^{\sum_k \theta X_k} \leq \operatorname{tr} e^{\sum_k \log \operatorname{E} e^{\theta X_k}}.$$
To complete the proof, we use the law of total expectation. Let $\operatorname{E}_k$ denote the expectation conditioned on $X_1, \ldots, X_k$. Since we assume all the $X_i$ are independent,

$$\operatorname{E}_{k-1} e^{X_k} = \operatorname{E} e^{X_k}.$$

Define

$$\Xi_k = \log \operatorname{E}_{k-1} e^{X_k} = \log M_{X_k}(\theta).$$
Finally, we have

$$\begin{align} \operatorname{E}\operatorname{tr} e^{\sum_{k=1}^n X_k} &= \operatorname{E}_0 \cdots \operatorname{E}_{n-1} \operatorname{tr} e^{\sum_{k=1}^{n-1} X_k + X_n}\\ &\leq \operatorname{E}_0 \cdots \operatorname{E}_{n-2} \operatorname{tr} e^{\sum_{k=1}^{n-1} X_k + \log\left(\operatorname{E}_{n-1} e^{X_n}\right)}\\ &= \operatorname{E}_0 \cdots \operatorname{E}_{n-2} \operatorname{tr} e^{\sum_{k=1}^{n-1} X_k + \Xi_n}\\ &\;\;\vdots\\ &= \operatorname{tr} e^{\sum_{k=1}^n \Xi_k}, \end{align}$$

where at each step $m$ we apply Tropp's corollary with

$$H_m = \sum_{k=1}^{m-1} X_k + \sum_{k=m+1}^n \Xi_k.$$
The following master tail bound is immediate from the previous result:

$$\Pr\left\{\lambda_{\max}\left(\sum_k X_k\right) \geq t\right\} \leq \inf_{\theta > 0}\left\{e^{-\theta t}\operatorname{tr} e^{\sum_k \log M_{X_k}(\theta)}\right\}.$$
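As a sanity check, the master bound can be evaluated numerically. For a Rademacher series $\sum_k \xi_k A_k$, the summand generating functions have the closed form $M_{X_k}(\theta) = \operatorname{E} e^{\theta \xi_k A_k} = \cosh(\theta A_k)$, so the infimum can be approximated by a one-dimensional search. The data, threshold, and search interval below are illustrative choices, not from the source.

```python
import numpy as np
from scipy.linalg import expm, logm
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(7)
d, n = 4, 20
A = rng.standard_normal((n, d, d))
A = (A + A.transpose(0, 2, 1)) / 2

def master_bound(t):
    # inf_theta e^{-theta t} tr exp( sum_k log M_{X_k}(theta) ),
    # with M_{X_k}(theta) = cosh(theta A_k) for Rademacher summands.
    def objective(theta):
        L = sum(logm((expm(theta * Ak) + expm(-theta * Ak)) / 2).real
                for Ak in A)
        return np.exp(-theta * t) * np.trace(expm(L))
    res = minimize_scalar(objective, bounds=(1e-3, 5.0), method="bounded")
    return res.fun

print(f"master tail bound at t=10: {master_bound(10.0):.3e}")
```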