In mathematics and multivariate statistics, the centering matrix[1] is a symmetric and idempotent matrix, which when multiplied with a vector has the same effect as subtracting the mean of the components of the vector from every component of that vector.
The centering matrix of size n is defined as the n-by-n matrix
$$C_n = I_n - \tfrac{1}{n} J_n$$
where $I_n$ is the identity matrix of size n and $J_n$ is the n-by-n matrix of all ones.
For example
$$C_1 = \begin{bmatrix} 0 \end{bmatrix}$$
$$C_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} - \frac{1}{2} \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} \frac{1}{2} & -\frac{1}{2} \\ -\frac{1}{2} & \frac{1}{2} \end{bmatrix}$$
$$C_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} - \frac{1}{3} \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} = \begin{bmatrix} \frac{2}{3} & -\frac{1}{3} & -\frac{1}{3} \\ -\frac{1}{3} & \frac{2}{3} & -\frac{1}{3} \\ -\frac{1}{3} & -\frac{1}{3} & \frac{2}{3} \end{bmatrix}$$
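The definition can be turned into a few lines of code. The following is a minimal sketch in plain Python, using exact rational arithmetic so that the entries match the fractions above; the function name `centering_matrix` is a hypothetical helper chosen for this illustration, not part of any library.

```python
from fractions import Fraction

def centering_matrix(n):
    """Return C_n = I_n - (1/n) J_n as a list of rows with exact Fraction entries."""
    return [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
            for i in range(n)]

C2 = centering_matrix(2)
C3 = centering_matrix(3)

# Print C_3 row by row as readable fractions.
for row in C3:
    print([str(x) for x in row])
# ['2/3', '-1/3', '-1/3']
# ['-1/3', '2/3', '-1/3']
# ['-1/3', '-1/3', '2/3']
```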
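Given a column vector $v$ of size n, the centering property of $C_n$ can be expressed as
$$C_n v = v - \left(\tfrac{1}{n} J_{n,1}^{\mathrm{T}} v\right) J_{n,1}$$
where $J_{n,1}$ is a column vector of n ones and $\tfrac{1}{n} J_{n,1}^{\mathrm{T}} v$ is the mean of the components of $v$.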
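The centering property can be checked on a small example by comparing the matrix product with a direct subtraction of the mean. This is only an illustrative sketch; the helpers `centering_matrix` and `matvec` and the sample vector are assumptions made for the example.

```python
from fractions import Fraction

def centering_matrix(n):
    """Hypothetical helper: C_n = I_n - (1/n) J_n with exact Fraction entries."""
    return [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
            for i in range(n)]

def matvec(A, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

v = [Fraction(x) for x in (4, 7, 1, 8)]   # arbitrary sample vector
mean = sum(v) / len(v)                    # 20 / 4 = 5

centered_by_matrix = matvec(centering_matrix(len(v)), v)
centered_directly = [x - mean for x in v]

print(centered_by_matrix == centered_directly)   # True
print([int(x) for x in centered_by_matrix])      # [-1, 2, -4, 3]
```

The centered components sum to zero, as expected once the mean has been removed.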
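$C_n$ is symmetric positive semi-definite.
$C_n$ is idempotent, so that $C_n^k = C_n$ for $k = 1, 2, \ldots$. Once the mean has been removed, it is zero, and removing it again has no effect.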
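$C_n$ is singular: the effect of applying the transformation $C_n v$ cannot be reversed.
$C_n$ has the eigenvalue 1 with multiplicity $n-1$ and the eigenvalue 0 with multiplicity 1.
$C_n$ has a null space of dimension 1, spanned by the all-ones vector $J_{n,1}$.
$C_n$ is an orthogonal projection matrix: $C_n v$ is the projection of $v$ onto the $(n-1)$-dimensional subspace orthogonal to the null space $J_{n,1}$, which is the subspace of all n-vectors whose components sum to zero.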
The trace of $C_n$ is $n(n-1)/n = n-1$.
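Several of these algebraic properties (symmetry, idempotence, the all-ones vector being mapped to zero, and the trace) can be verified exactly for a small n. The sketch below uses plain Python with rational arithmetic; `centering_matrix` and `matmul` are hypothetical helpers, and n = 4 is an arbitrary choice.

```python
from fractions import Fraction

def centering_matrix(n):
    """Hypothetical helper: C_n = I_n - (1/n) J_n with exact Fraction entries."""
    return [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
            for i in range(n)]

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

n = 4
C = centering_matrix(n)

is_symmetric = C == [list(col) for col in zip(*C)]   # C equals its transpose
is_idempotent = matmul(C, C) == C                    # C * C equals C
ones_image = [sum(row) for row in C]                 # C applied to the all-ones vector
trace = sum(C[i][i] for i in range(n))

print(is_symmetric, is_idempotent, ones_image == [0] * n, trace == n - 1)
# True True True True
```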
Although multiplication by the centering matrix is not a computationally efficient way of removing the mean from a vector, it is a convenient analytical tool. It can be used not only to remove the mean of a single vector, but also of multiple vectors stored in the rows or columns of an m-by-n matrix $X$.
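Left multiplication by $C_m$ subtracts the corresponding mean from each of the n columns, so that each column of the product $C_m X$ has zero mean. Similarly, right multiplication by $C_n$ subtracts the corresponding mean from each of the m rows, so that each row of the product $X C_n$ has zero mean. Multiplying on both sides yields the doubly centered matrix $C_m X C_n$, whose row and column means are all zero.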
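To illustrate how the side of the multiplication determines what gets centered, the sketch below applies the centering matrix to a small 2-by-3 matrix from the left, from the right, and from both sides. The data and the helper names (`centering_matrix`, `matmul`, `col_sums`, `row_sums`) are assumptions made for this example.

```python
from fractions import Fraction

def centering_matrix(n):
    """Hypothetical helper: C_n = I_n - (1/n) J_n with exact Fraction entries."""
    return [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
            for i in range(n)]

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def col_sums(A):
    return [sum(col) for col in zip(*A)]

def row_sums(A):
    return [sum(row) for row in A]

# Arbitrary sample matrix with m = 2 rows and n = 3 columns.
X = [[Fraction(x) for x in row] for row in ([1, 2, 3], [4, 6, 11])]
m, n = len(X), len(X[0])

col_centered = matmul(centering_matrix(m), X)   # C_m X: every column sums to zero
row_centered = matmul(X, centering_matrix(n))   # X C_n: every row sums to zero
doubly_centered = matmul(col_centered, centering_matrix(n))  # C_m X C_n

print(col_sums(col_centered) == [0] * n)        # True
print(row_sums(row_centered) == [0] * m)        # True
print(col_sums(doubly_centered) == [0] * n,
      row_sums(doubly_centered) == [0] * m)     # True True
```

A sum of zero is equivalent to a mean of zero here, so checking sums avoids an extra division.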
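The centering matrix provides in particular a succinct way to express the scatter matrix,
$$S = (X - \mu J_{n,1}^{\mathrm{T}})(X - \mu J_{n,1}^{\mathrm{T}})^{\mathrm{T}}$$
of a data sample $X$, where $\mu = \tfrac{1}{n} X J_{n,1}$ is the sample mean. Because $X C_n = X - \mu J_{n,1}^{\mathrm{T}}$ and $C_n$ is symmetric and idempotent, the scatter matrix can be written more compactly as
$$S = X C_n (X C_n)^{\mathrm{T}} = X C_n C_n^{\mathrm{T}} X^{\mathrm{T}} = X C_n X^{\mathrm{T}}.$$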
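The equality of the two scatter matrix expressions can be confirmed on a small data set. The following sketch assumes, as in the formula for the sample mean, that the n samples are stored as the columns of an m-by-n matrix; the data and helper names are hypothetical.

```python
from fractions import Fraction

def centering_matrix(n):
    """Hypothetical helper: C_n = I_n - (1/n) J_n with exact Fraction entries."""
    return [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
            for i in range(n)]

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

# Arbitrary sample: n = 3 observations (columns) of an m = 2 dimensional variable.
X = [[Fraction(x) for x in row] for row in ([1, 2, 3], [4, 6, 11])]
n = len(X[0])

# Direct definition: subtract the sample mean from every column, then form D D^T.
mu = [sum(row) / n for row in X]                         # sample mean, one entry per row
D = [[x - mu_i for x in row] for row, mu_i in zip(X, mu)]
S_direct = matmul(D, transpose(D))

# Compact form: S = X C_n X^T.
S_compact = matmul(matmul(X, centering_matrix(n)), transpose(X))

print(S_direct == S_compact)                             # True
print([[int(s) for s in row] for row in S_direct])       # [[2, 7], [7, 26]]
```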
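$C_n$ is the covariance matrix of the multinomial distribution with n trials, in the special case where the number of categories of that distribution is $k = n$ and $p_1 = p_2 = \cdots = p_k = \tfrac{1}{n}$.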