Centering matrix explained

In mathematics and multivariate statistics, the centering matrix^[1] is a symmetric and idempotent matrix, which when multiplied with a vector has the same effect as subtracting the mean of the components of the vector from every component of that vector.

Definition

The centering matrix of size n is defined as the n-by-n matrix

C_n=I_n-\tfrac{1}{n}J_n

where

I_n

is the identity matrix of size n and

J_n

is an n-by-n matrix of all 1's.

For example

C_1=\begin{bmatrix} 0\end{bmatrix}

C_2=\left[\begin{array}{rr}1&0\\ 0&1\end{array}\right]-\tfrac{1}{2}\left[\begin{array}{rr}1&1\\ 1&1\end{array}\right]=\left[\begin{array}{rr}\tfrac{1}{2}&-\tfrac{1}{2}\\ -\tfrac{1}{2}&\tfrac{1}{2}\end{array}\right]

C_3=\left[\begin{array}{rrr}1&0&0\\ 0&1&0\\ 0&0&1\end{array}\right]-\tfrac{1}{3}\left[\begin{array}{rrr}1&1&1\\ 1&1&1\\ 1&1&1\end{array}\right]=\left[\begin{array}{rrr}\tfrac{2}{3}&-\tfrac{1}{3}&-\tfrac{1}{3}\\ -\tfrac{1}{3}&\tfrac{2}{3}&-\tfrac{1}{3}\\ -\tfrac{1}{3}&-\tfrac{1}{3}&\tfrac{2}{3}\end{array}\right]
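The definition translates directly into code. The following is a minimal sketch in plain Python, using exact fractions so the entries match the matrices written above; the helper name centering_matrix is an illustrative choice, not something taken from the source.

```python
from fractions import Fraction


def centering_matrix(n):
    """Return C_n = I_n - (1/n) J_n as a list of rows of exact fractions."""
    return [
        [Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
        for i in range(n)
    ]


# Print the small cases C_1, C_2 and C_3.
for n in (1, 2, 3):
    print(f"C_{n}:")
    for row in centering_matrix(n):
        print("  ", [str(entry) for entry in row])
```

Exact fractions are used here only to keep the comparison with the hand-written matrices free of rounding; a floating-point array library would serve equally well in practice.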

Properties

Given a column vector

v

of size n, the centering property of

C_n

can be expressed as

C_nv=v-(\tfrac{1}{n}J_{n,1}^{\mathrm T}v)J_{n,1}

where

J_{n,1}

is a column vector of ones and

\tfrac{1}{n}J_{n,1}^{\mathrm T}v

is the mean of the components of

v

.
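The centering property can be checked on a small vector. The sketch below assumes plain Python lists and exact fractions; the helpers centering_matrix and matvec and the sample vector (2, 4, 9) are illustrative choices, not part of the source.

```python
from fractions import Fraction


def centering_matrix(n):
    """Return C_n = I_n - (1/n) J_n as a list of rows of exact fractions."""
    return [
        [Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
        for i in range(n)
    ]


def matvec(A, v):
    """Multiply the matrix A (list of rows) by the vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


v = [Fraction(x) for x in (2, 4, 9)]
n = len(v)

mean = sum(v) / n                                  # (1/n) J_{n,1}^T v
centered_by_matrix = matvec(centering_matrix(n), v)
centered_directly = [x - mean for x in v]          # v - mean * J_{n,1}

print(centered_by_matrix == centered_directly)
print(sum(centered_by_matrix))                     # components now sum to zero
```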

C_n

is symmetric positive semi-definite.

C_n

is idempotent, so that

C_n^k=C_n

, for

k=1,2,\ldots

. Once the mean has been removed, it is zero and removing it again has no effect.
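Idempotence can also be seen algebraically. The short derivation below is a sketch that relies on one fact not spelled out in the text: the square of the all-ones matrix is n times itself, since every entry of the product is a sum of n ones.

```latex
J_n^2 = nJ_n
\quad\Longrightarrow\quad
C_n^2 = \left(I_n-\tfrac{1}{n}J_n\right)^2
      = I_n-\tfrac{2}{n}J_n+\tfrac{1}{n^2}J_n^2
      = I_n-\tfrac{2}{n}J_n+\tfrac{1}{n}J_n
      = I_n-\tfrac{1}{n}J_n
      = C_n
```

The case of general k then follows by induction on k.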

C_n

is singular. The effects of applying the transformation

C_nv

cannot be reversed.

C_n

has the eigenvalue 1 of multiplicity n - 1 and eigenvalue 0 of multiplicity 1.

C_n

has a nullspace of dimension 1, along the vector

J_{n,1}

.

C_n

is an orthogonal projection matrix. That is,

C_nv

is a projection of

v

onto the (n - 1)-dimensional subspace that is orthogonal to the nullspace spanned by

J_{n,1}

. (This is the subspace of all n-vectors whose components sum to zero.)

The trace of

C_n

is

n(n-1)/n=n-1

.
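Several of the properties listed above can be checked exactly for a particular n. The following is a minimal sketch for n = 4 in plain Python with exact fractions; the helper names and the zero-sum test vector are illustrative assumptions. It is a spot check for one size, not a proof.

```python
from fractions import Fraction


def centering_matrix(n):
    """Return C_n = I_n - (1/n) J_n as a list of rows of exact fractions."""
    return [
        [Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
        for i in range(n)
    ]


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


n = 4
C = centering_matrix(n)
ones = [[Fraction(1)] for _ in range(n)]             # J_{n,1} as a column
zero_sum = [[Fraction(x)] for x in (3, -1, -2, 0)]   # components sum to zero

is_symmetric = C == [list(col) for col in zip(*C)]
is_idempotent = matmul(C, C) == C
trace = sum(C[i][i] for i in range(n))
kills_ones = matmul(C, ones) == [[0]] * n            # J_{n,1} lies in the nullspace
fixes_zero_sum = matmul(C, zero_sum) == zero_sum     # zero-sum vectors are unchanged

print(is_symmetric, is_idempotent, trace, kills_ones, fixes_zero_sum)
```

The last two checks correspond to the eigenvalue statement: the all-ones vector is an eigenvector for eigenvalue 0, and every vector whose components sum to zero is an eigenvector for eigenvalue 1.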

Application

Although multiplication by the centering matrix is not a computationally efficient way of removing the mean from a vector, it is a convenient analytical tool. It can be used not only to remove the mean of a single vector, but also of multiple vectors stored in the rows or columns of an m-by-n matrix

X

.

The left multiplication by

C_m

subtracts a corresponding mean value from each of the n columns, so that each column of the product

C_mX

has a zero mean. Similarly, the multiplication by

C_n

on the right subtracts a corresponding mean value from each of the m rows, and each row of the product

XC_n

has a zero mean. The multiplication on both sides creates a doubly centered matrix

C_mXC_n

, whose row and column means are equal to zero.
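The three kinds of centering can be illustrated on a small matrix. The sketch below assumes a hypothetical 2-by-3 matrix X and plain Python helpers (centering_matrix, matmul, row_sums, col_sums) that are not part of the source; a zero sum is equivalent to a zero mean.

```python
from fractions import Fraction


def centering_matrix(n):
    """Return C_n = I_n - (1/n) J_n as a list of rows of exact fractions."""
    return [
        [Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
        for i in range(n)
    ]


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def row_sums(A):
    return [sum(row) for row in A]


def col_sums(A):
    return [sum(col) for col in zip(*A)]


# Hypothetical m-by-n data matrix with m = 2 rows and n = 3 columns.
X = [[Fraction(x) for x in row] for row in ((1, 2, 3), (4, 6, 11))]
m, n = len(X), len(X[0])
C_m, C_n = centering_matrix(m), centering_matrix(n)

column_centered = matmul(C_m, X)               # C_m X: every column sums to zero
row_centered = matmul(X, C_n)                  # X C_n: every row sums to zero
doubly_centered = matmul(matmul(C_m, X), C_n)  # C_m X C_n: both hold

print(col_sums(column_centered))
print(row_sums(row_centered))
print(row_sums(doubly_centered), col_sums(doubly_centered))
```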

The centering matrix provides in particular a succinct way to express the scatter matrix,

S=(X-\mu J_{n,1}^{\mathrm T})(X-\mu J_{n,1}^{\mathrm T})^{\mathrm T}

of a data sample

X

, where

\mu=\tfrac{1}{n}XJ_{n,1}

is the sample mean. The centering matrix allows us to express the scatter matrix more compactly as

S=XC_n(XC_n)^{\mathrm T}=XC_nC_n^{\mathrm T}X^{\mathrm T}=XC_nX^{\mathrm T}

.
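The agreement between the two expressions for the scatter matrix can be verified on a small example. The sketch below assumes a hypothetical 2-by-3 data matrix whose n = 3 columns are the samples, and uses illustrative plain-Python helpers that do not come from the source.

```python
from fractions import Fraction


def centering_matrix(n):
    """Return C_n = I_n - (1/n) J_n as a list of rows of exact fractions."""
    return [
        [Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
        for i in range(n)
    ]


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


# Hypothetical data: each of the n = 3 columns is one sample of dimension m = 2.
X = [[Fraction(x) for x in row] for row in ((1, 2, 3), (4, 6, 11))]
m, n = len(X), len(X[0])

# Sample mean mu = (1/n) X J_{n,1}, one entry per row of X.
mu = [sum(row) / n for row in X]

# Direct form: (X - mu J^T)(X - mu J^T)^T.
D = [[X[i][j] - mu[i] for j in range(n)] for i in range(m)]
S_direct = matmul(D, transpose(D))

# Compact form: X C_n X^T.
S_compact = matmul(matmul(X, centering_matrix(n)), transpose(X))

print(S_direct == S_compact)
```

The compact form needs no explicit mean, because symmetry and idempotence of the centering matrix collapse the product of the two centered factors into a single multiplication.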

C_n

is the covariance matrix of the multinomial distribution, in the special case where the parameters of that distribution are

k=n

, and

p_1=p_2=\cdots=p_n=\tfrac{1}{n}

.
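This special case can be checked against the usual closed form for the multinomial covariance, k(diag(p) - pp^T). That formula is a standard result and is an assumption here, since the source does not state it; the function name multinomial_covariance is likewise illustrative.

```python
from fractions import Fraction


def centering_matrix(n):
    """Return C_n = I_n - (1/n) J_n as a list of rows of exact fractions."""
    return [
        [Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
        for i in range(n)
    ]


def multinomial_covariance(k, p):
    """Covariance of a multinomial with k trials: k * (diag(p) - p p^T).

    Assumed standard formula, not taken from the source text.
    """
    size = len(p)
    return [
        [k * ((p[i] if i == j else 0) - p[i] * p[j]) for j in range(size)]
        for i in range(size)
    ]


n = 5
p = [Fraction(1, n)] * n          # equal probabilities p_1 = ... = p_n = 1/n
covariance = multinomial_covariance(n, p)

print(covariance == centering_matrix(n))
```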

Notes and References

John I. Marden, Analyzing and Modeling Rank Data, Chapman & Hall, 1995, page 59.