Projection (linear algebra) explained

In linear algebra and functional analysis, a projection is a linear transformation $P$ from a vector space to itself (an endomorphism) such that $P\circ P = P$. That is, whenever $P$ is applied twice to any vector, it gives the same result as if it were applied once (i.e. $P$ is idempotent). It leaves its image unchanged.^[1] This definition of "projection" formalizes and generalizes the idea of graphical projection. One can also consider the effect of a projection on a geometrical object by examining the effect of the projection on points in the object.

Definitions

A projection on a vector space $V$ is a linear operator $P\colon V \to V$ such that $P^2 = P$.

When $V$ has an inner product and is complete, i.e. when $V$ is a Hilbert space, the concept of orthogonality can be used. A projection $P$ on a Hilbert space $V$ is called an orthogonal projection if it satisfies $\langle Px, y\rangle = \langle x, Py\rangle$ for all $x, y \in V$. A projection on a Hilbert space that is not orthogonal is called an oblique projection.
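Both defining conditions can be checked mechanically for small real matrices. The following is a minimal sketch, assuming matrices are stored as Python lists of rows and the inner product is the standard dot product, so that the condition $\langle Px, y\rangle = \langle x, Py\rangle$ reduces to $P^\mathsf{T} = P$. The helper names are illustrative and not taken from any library; exact `Fraction` arithmetic avoids floating-point tolerances.

```python
from fractions import Fraction


def matmul(A, B):
    """Matrix product of A and B, both given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def is_projection(P):
    """Idempotence: applying P twice equals applying it once."""
    return matmul(P, P) == P


def is_orthogonal_projection(P):
    """Real case, dot product: <Px, y> = <x, Py> for all x, y iff P^T = P."""
    return is_projection(P) and transpose(P) == P


half = Fraction(1, 2)
onto_diagonal = [[half, half], [half, half]]  # orthogonal projection onto the line y = x
oblique = [[1, 1], [0, 0]]                    # idempotent but not symmetric
shear = [[1, 1], [0, 1]]                      # not idempotent at all

print(is_projection(onto_diagonal), is_orthogonal_projection(onto_diagonal))  # True True
print(is_projection(oblique), is_orthogonal_projection(oblique))              # True False
print(is_projection(shear))                                                   # False
```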

Projection matrix

A square matrix $P$ is called a projection matrix if it is equal to its square, i.e. if $P^2 = P$.^[2]

A square matrix $P$ is called an orthogonal projection matrix if $P^2 = P = P^\mathsf{T}$ for a real matrix, and respectively $P^2 = P = P^*$ for a complex matrix, where $P^\mathsf{T}$ denotes the transpose of $P$ and $P^*$ denotes the adjoint or Hermitian transpose of $P$.^[3]

A projection matrix that is not an orthogonal projection matrix is called an oblique projection matrix.

The eigenvalues of a projection matrix must be 0 or 1.

Examples

Orthogonal projection

For example, the function which maps the point $(x, y, z)$ in three-dimensional space $\mathbb R^3$ to the point $(x, y, 0)$ is an orthogonal projection onto the xy-plane. This function is represented by the matrix

P = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}.

The action of this matrix on an arbitrary vector is $P \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} x \\ y \\ 0 \end{bmatrix}.$

To see that $P$ is indeed a projection, i.e., $P = P^2$, we compute

P^2 \begin{bmatrix} x \\ y \\ z \end{bmatrix} = P \begin{bmatrix} x \\ y \\ 0 \end{bmatrix} = \begin{bmatrix} x \\ y \\ 0 \end{bmatrix} = P \begin{bmatrix} x \\ y \\ z \end{bmatrix}.

Observing that $P^\mathsf{T} = P$ shows that the projection is an orthogonal projection.
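As a quick check of this example, the sketch below applies $P$ to one arbitrarily chosen vector and confirms $P^2 = P$ and $P^\mathsf{T} = P$. It uses plain Python lists of rows; no external library is assumed.

```python
def matmul(A, B):
    """Matrix product of A and B, both given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


# Orthogonal projection of R^3 onto the xy-plane.
P = [[1, 0, 0],
     [0, 1, 0],
     [0, 0, 0]]

v = [[3], [-2], [5]]        # column vector (x, y, z) = (3, -2, 5)
print(matmul(P, v))         # [[3], [-2], [0]]: the z-coordinate is discarded

assert matmul(P, P) == P    # idempotent, so P is a projection
assert transpose(P) == P    # symmetric, so the projection is orthogonal
```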

Oblique projection

A simple example of a non-orthogonal (oblique) projection is $P = \begin{bmatrix} 0 & 0 \\ \alpha & 1 \end{bmatrix}.$

Via matrix multiplication, one sees that $P^2 = \begin{bmatrix} 0 & 0 \\ \alpha & 1 \end{bmatrix} \begin{bmatrix} 0 & 0 \\ \alpha & 1 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ \alpha & 1 \end{bmatrix} = P,$ showing that $P$ is indeed a projection.

The projection $P$ is orthogonal if and only if $\alpha = 0$, because only then $P^\mathsf{T} = P$.
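The same check for the oblique example is sketched below for a few arbitrarily chosen values of $\alpha$. Idempotence holds for every $\alpha$, while symmetry holds only for $\alpha = 0$. The helper `oblique_example` is a hypothetical name for the matrix in the text.

```python
from fractions import Fraction


def matmul(A, B):
    """Matrix product of A and B, both given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def oblique_example(alpha):
    """The 2x2 matrix [[0, 0], [alpha, 1]] from the text."""
    return [[0, 0], [alpha, 1]]


for alpha in (Fraction(0), Fraction(1, 2), Fraction(3)):
    P = oblique_example(alpha)
    idempotent = matmul(P, P) == P
    symmetric = transpose(P) == P
    print(f"alpha={alpha}: projection={idempotent}, orthogonal={symmetric}")
# Every alpha prints projection=True; only alpha=0 prints orthogonal=True.
```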

Properties and classification

Idempotence

By definition, a projection $P$ is idempotent (i.e. $P^2 = P$).

Open map

Every projection is an open map, meaning that it maps each open set in the domain to an open set in the subspace topology of the image. That is, for any vector $x$ and any ball $B_x$ (with positive radius) centered on $x$, there exists a ball $B_{Px}$ (with positive radius) centered on $Px$ that is wholly contained in the image $P(B_x)$.

Complementarity of image and kernel

Let $W$ be a finite-dimensional vector space and $P$ be a projection on $W$. Suppose the subspaces $U$ and $V$ are the image and kernel of $P$ respectively. Then $P$ has the following properties:

- $P$ is the identity operator $I$ on $U$: $\forall \mathbf x \in U: P \mathbf x = \mathbf x$.
- We have a direct sum $W = U \oplus V$. Every vector $x \in W$ may be decomposed uniquely as $x = u + v$ with $u = Px$ and $v = x - Px = \left(I - P\right)x$, and where $u \in U, v \in V$.

The image and kernel of a projection are complementary, as are $P$ and $Q = I - P$. The operator $Q$ is also a projection, as the image and kernel of $P$ become the kernel and image of $Q$ and vice versa. We say $P$ is a projection along $V$ onto $U$ (kernel/image) and $Q$ is a projection along $U$ onto $V$.
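A minimal sketch of the decomposition $x = Px + (I - P)x$, using the oblique projection $P = \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}$ onto $\operatorname{span}\{(1, 0)\}$ along $\operatorname{span}\{(1, -1)\}$. This matrix and the vector $x$ are illustrative choices, not taken from the text. The code checks that $Q = I - P$ is again a projection and that each component lands in the expected subspace.

```python
def matmul(A, B):
    """Matrix product of A and B, both given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def matvec(M, x):
    """Apply matrix M (list of rows) to vector x (list)."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


P = [[1, 1], [0, 0]]   # image U = span{(1, 0)}, kernel V = span{(1, -1)}
Q = [[int(i == j) - P[i][j] for j in range(2)] for i in range(2)]   # Q = I - P

assert matmul(Q, Q) == Q                   # Q is also a projection

x = [3, 4]
u = matvec(P, x)                           # component in U (image of P)
v = matvec(Q, x)                           # component in V (kernel of P)
print(u, v)                                # [7, 0] [-4, 4]

assert [a + b for a, b in zip(u, v)] == x  # x = u + v
assert matvec(P, v) == [0, 0]              # v lies in ker P
assert matvec(Q, u) == [0, 0]              # u lies in ker Q: image and kernel swap roles
```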

Spectrum

In infinite-dimensional vector spaces, the spectrum of a projection is contained in $\{0, 1\}$, as

(\lambda I - P)^{-1} = \frac{1}{\lambda} I + \frac{1}{\lambda(\lambda - 1)} P \quad \text{for } \lambda \notin \{0, 1\}.

Only 0 or 1 can be an eigenvalue of a projection. Since an orthogonal projection is also self-adjoint, this implies that an orthogonal projection $P$ is always a positive semi-definite matrix. In general, the corresponding eigenspaces are (respectively) the kernel and range of the projection. Decomposition of a vector space into direct sums is not unique. Therefore, given a subspace $V$, there may be many projections whose range (or kernel) is $V$.

If a projection is nontrivial, it has minimal polynomial $x^2 - x = x(x - 1)$, which factors into distinct linear factors, and thus $P$ is diagonalizable.
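The resolvent identity above can be verified exactly for a concrete projection. This sketch uses an arbitrarily chosen oblique $P$ and a few sample values $\lambda \notin \{0, 1\}$, multiplies $\lambda I - P$ by the right-hand side, and checks that the product is the identity. The helper name `resolvent` is illustrative.

```python
from fractions import Fraction


def matmul(A, B):
    """Matrix product of A and B, both given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def resolvent(P, lam):
    """(1/lam) I + 1/(lam (lam - 1)) P: the claimed inverse of lam I - P (lam not 0 or 1)."""
    lam = Fraction(lam)
    a, b = 1 / lam, 1 / (lam * (lam - 1))
    n = len(P)
    I = identity(n)
    return [[a * I[i][j] + b * P[i][j] for j in range(n)] for i in range(n)]


P = [[1, 1], [0, 0]]   # an oblique projection: P @ P == P
I = identity(2)
for lam in (2, -3, Fraction(1, 2)):
    shifted = [[lam * I[i][j] - P[i][j] for j in range(2)] for i in range(2)]  # lam I - P
    assert matmul(shifted, resolvent(P, lam)) == I
print("resolvent identity holds for the sampled lambdas")
```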

Product of projections

The product of projections is not in general a projection, even if they are orthogonal. If two projections commute then their product is a projection, but the converse is false: the product of two non-commuting projections may be a projection.

If two orthogonal projections commute then their product is an orthogonal projection. If the product of two orthogonal projections is an orthogonal projection, then the two orthogonal projections commute (more generally: two self-adjoint endomorphisms commute if and only if their product is self-adjoint).
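Small instances of these statements, chosen for illustration, can be checked with the sketch below. It shows two non-commuting orthogonal projections whose product is not a projection, two commuting ones whose product is, and a non-commuting pair whose product is still a projection, confirming that the converse fails.

```python
from fractions import Fraction


def matmul(A, B):
    """Matrix product of A and B, both given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def is_projection(M):
    return matmul(M, M) == M


half = Fraction(1, 2)
onto_x_axis = [[1, 0], [0, 0]]                # orthogonal projection onto the x-axis
onto_diagonal = [[half, half], [half, half]]  # orthogonal projection onto y = x

# Non-commuting orthogonal projections: the product is not a projection.
prod = matmul(onto_x_axis, onto_diagonal)     # [[1/2, 1/2], [0, 0]]
print(is_projection(prod))                                              # False
print(prod == matmul(onto_diagonal, onto_x_axis))                       # False

# Commuting coordinate projections in R^3: the product is again a projection.
D1 = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]
D2 = [[1, 0, 0], [0, 0, 0], [0, 0, 1]]
print(matmul(D1, D2) == matmul(D2, D1), is_projection(matmul(D1, D2)))  # True True

# The converse fails: these two do not commute, yet their product is a projection.
oblique = [[1, 1], [0, 0]]
print(is_projection(matmul(onto_x_axis, oblique)),
      matmul(onto_x_axis, oblique) == matmul(oblique, onto_x_axis))    # True False
```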

Orthogonal projections

See main articles: Hilbert projection theorem and Complemented subspace.

When the vector space $W$ has an inner product and is complete (is a Hilbert space) the concept of orthogonality can be used. An orthogonal projection is a projection for which the range $U$ and the kernel $V$ are orthogonal subspaces. Thus, for every $x$ and $y$ in $W$, $\langle Px, (y - Py)\rangle = \langle (x - Px), Py\rangle = 0$. Equivalently:

\langle \mathbf x, P \mathbf y \rangle = \langle P \mathbf x, P \mathbf y \rangle = \langle P \mathbf x, \mathbf y \rangle.

A projection is orthogonal if and only if it is self-adjoint. Using the self-adjoint and idempotent properties of $P$, for any $x$ and $y$ in $W$ we have $Px \in U$, $y - Py \in V$, and

\langle P \mathbf x, \mathbf y - P \mathbf y \rangle = \langle \mathbf x, \left(P-P^2\right) \mathbf y \rangle = 0

where $\langle \cdot, \cdot \rangle$ is the inner product associated with $W$. Therefore, $P$ and $I - P$ are orthogonal projections.^[4] The other direction, namely that if $P$ is orthogonal then it is self-adjoint, follows from the implication from $\langle (x - Px), Py\rangle = \langle Px, (y - Py)\rangle = 0$ to

\langle \mathbf x, P \mathbf y \rangle = \langle P \mathbf x, P\mathbf y \rangle = \langle P \mathbf x, \mathbf y \rangle = \langle \mathbf x, P^* \mathbf y \rangle

for every $x$ and $y$ in $W$; thus $P = P^*$.

The existence of an orthogonal projection onto a closed subspace follows from the Hilbert projection theorem.

Properties and special cases

An orthogonal projection is a bounded operator. This is because for every $v$ in the vector space we have, by the Cauchy–Schwarz inequality:

\left\| P \mathbf v\right\|^2 = \langle P \mathbf v, P \mathbf v \rangle = \langle P \mathbf v, \mathbf v \rangle \leq \left\|P \mathbf v\right\| \cdot \left\|\mathbf v\right\|

Thus, dividing by $\|Pv\|$ when it is nonzero, $\|Pv\| \leq \|v\|$.

For finite-dimensional complex or real vector spaces, the standard inner product can be substituted for $\langle \cdot, \cdot \rangle$.
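The bound $\|Pv\| \le \|v\|$ is specific to orthogonal projections. This floating-point sketch checks it for an orthogonal projection on a few arbitrarily chosen vectors. It then shows an oblique projection, an illustrative matrix not from the text, stretching a unit vector to length 3.

```python
import math


def matvec(M, x):
    """Apply matrix M (list of rows) to vector x (list)."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def norm(x):
    return math.sqrt(sum(t * t for t in x))


onto_diagonal = [[0.5, 0.5], [0.5, 0.5]]   # orthogonal projection onto y = x
oblique = [[1.0, 3.0], [0.0, 0.0]]         # projection onto span{(1, 0)} along span{(3, -1)}

samples = [[1.0, 0.0], [0.0, 1.0], [2.0, -5.0], [-1.5, 0.25]]
for s in samples:
    # The orthogonal projection never lengthens a vector (small float tolerance).
    assert norm(matvec(onto_diagonal, s)) <= norm(s) + 1e-12

v = [0.0, 1.0]
print(norm(matvec(oblique, v)), norm(v))   # 3.0 1.0: an oblique projection can stretch
```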

Formulas

A simple case occurs when the orthogonal projection is onto a line. If $\mathbf u$ is a unit vector on the line, then the projection is given by the outer product

P_\mathbf{u} = \mathbf u \mathbf u^\mathsf{T}.

(If $\mathbf u$ is complex-valued, the transpose in the above equation is replaced by a Hermitian transpose.) This operator leaves $\mathbf u$ invariant, and it annihilates all vectors orthogonal to $\mathbf u$, proving that it is indeed the orthogonal projection onto the line containing $\mathbf u$.^[5] A simple way to see this is to consider an arbitrary vector $\mathbf x$ as the sum of a component on the line (i.e. the projected vector we seek) and another perpendicular to it, $\mathbf x = \mathbf x_\parallel + \mathbf x_\perp$. Applying the projection, we get

P_\mathbf{u} \mathbf x = \mathbf u \mathbf u^\mathsf{T} \mathbf x_\parallel + \mathbf u \mathbf u^\mathsf{T} \mathbf x_\perp = \mathbf u \left(\operatorname{sgn}\left(\mathbf u^\mathsf{T} \mathbf x_\parallel\right) \left\| \mathbf x_\parallel \right\| \right) + \mathbf u \cdot \mathbf 0 = \mathbf x_\parallel

by the properties of the dot product of parallel and perpendicular vectors.
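A minimal exact sketch of $P_\mathbf{u} = \mathbf u \mathbf u^\mathsf{T}$, assuming the illustrative unit vector $\mathbf u = (1, 2, 2)/3$ and test vector $\mathbf x = (4, 0, 1)$. It checks that $\mathbf u$ is left invariant and that $\mathbf x - P_\mathbf{u}\mathbf x$ is orthogonal to the line and annihilated by $P_\mathbf{u}$.

```python
from fractions import Fraction


def matvec(M, x):
    """Apply matrix M (list of rows) to vector x (list)."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def outer(u, v):
    """Outer product u v^T as a list of rows."""
    return [[a * b for b in v] for a in u]


u = [Fraction(1, 3), Fraction(2, 3), Fraction(2, 3)]  # unit vector: 1/9 + 4/9 + 4/9 = 1
assert dot(u, u) == 1

P_u = outer(u, u)                         # P_u = u u^T
x = [4, 0, 1]
x_par = matvec(P_u, x)                    # component along the line: (u . x) u = 2u
x_perp = [a - b for a, b in zip(x, x_par)]

print(x_par)                              # [Fraction(2, 3), Fraction(4, 3), Fraction(4, 3)]
assert matvec(P_u, u) == u                # u is left invariant
assert dot(x_perp, u) == 0                # the remainder is orthogonal to the line
assert matvec(P_u, x_perp) == [0, 0, 0]   # and is annihilated by P_u
```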

This formula can be generalized to orthogonal projections on a subspace of arbitrary dimension. Let $\mathbf u_1, \ldots, \mathbf u_k$ be an orthonormal basis of the subspace $U$, with the assumption that the integer $k \geq 1$, and let $A$ denote the $n \times k$ matrix whose columns are $\mathbf u_1, \ldots, \mathbf u_k$, i.e., $A = \begin{bmatrix} \mathbf u_1 & \cdots & \mathbf u_k \end{bmatrix}$. Then the projection is given by:^[6]

P_A = A A^\mathsf{T}

which can be rewritten as

P_A = \sum_i \langle \mathbf u_i, \cdot \rangle \mathbf u_i.

The matrix $A^\mathsf{T}$ is the partial isometry that vanishes on the orthogonal complement of $U$, and $A$ is the isometry that embeds $U$ into the underlying vector space. The range of $P_A$ is therefore the final space of $A$. It is also clear that $A A^\mathsf{T}$ is the identity operator on $U$.
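The formula $P_A = A A^\mathsf{T}$ and its rank-one expansion can be compared exactly. This sketch assumes the illustrative orthonormal columns $\mathbf u_1 = (1, 2, 2)/3$ and $\mathbf u_2 = (2, 1, -2)/3$ in $\mathbb R^3$.

```python
from fractions import Fraction as F


def matmul(A, B):
    """Matrix product of A and B, both given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


# Columns u1 = (1, 2, 2)/3 and u2 = (2, 1, -2)/3 are orthonormal.
A = [[F(1, 3), F(2, 3)],
     [F(2, 3), F(1, 3)],
     [F(2, 3), F(-2, 3)]]
At = transpose(A)                      # rows of At are u1 and u2

assert matmul(At, A) == [[1, 0], [0, 1]]   # A^T A = I_k: A is an isometry

P_A = matmul(A, At)                    # P_A = A A^T
# The same operator as a sum of rank-one terms u_i u_i^T.
S = [[sum(u[i] * u[j] for u in At) for j in range(3)] for i in range(3)]
assert P_A == S
assert matmul(P_A, P_A) == P_A and transpose(P_A) == P_A
assert matmul(P_A, A) == A             # P_A acts as the identity on U
```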

The orthonormality condition can also be dropped. If $\mathbf u_1, \ldots, \mathbf u_k$ is a (not necessarily orthonormal) basis with $k \geq 1$, and $A$ is the matrix with these vectors as columns, then the projection is:^[7]

P_A = A \left(A^\mathsf{T} A\right)^{-1} A^\mathsf{T}.

The matrix $A$ still embeds $U$ into the underlying vector space but is no longer an isometry in general. The matrix $\left(A^\mathsf{T} A\right)^{-1}$ is a "normalizing factor" that recovers the norm. For example, the rank-1 operator $\mathbf u \mathbf u^\mathsf{T}$ is not a projection if $\left\|\mathbf u\right\| \neq 1$. After dividing by $\mathbf u^\mathsf{T} \mathbf u = \left\|\mathbf u\right\|^2$, we obtain the projection $\mathbf u \left(\mathbf u^\mathsf{T} \mathbf u\right)^{-1} \mathbf u^\mathsf{T}$ onto the subspace spanned by $\mathbf u$.
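A minimal exact sketch of $P_A = A(A^\mathsf{T}A)^{-1}A^\mathsf{T}$ for a non-orthonormal basis, including the rank-1 normalization example with $\|\mathbf u\| = 3$. The helper names `inverse` and `projector` are hypothetical. `inverse` is plain Gauss–Jordan elimination, assumed adequate only for small non-singular inputs.

```python
from fractions import Fraction


def matmul(A, B):
    """Matrix product of A and B, both given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def inverse(M):
    """Inverse of a non-singular square matrix by exact Gauss-Jordan elimination."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        p = next(r for r in range(c, n) if aug[r][c] != 0)   # pivot row
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [x / pivot for x in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def projector(A):
    """A (A^T A)^{-1} A^T: orthogonal projector onto the column space of A."""
    At = transpose(A)
    return matmul(matmul(A, inverse(matmul(At, A))), At)


A = [[1, 0], [1, 1], [0, 1]]   # basis (1, 1, 0), (0, 1, 1): not orthonormal
P = projector(A)
assert matmul(P, P) == P and transpose(P) == P
assert matmul(P, A) == A       # the columns of A are left unchanged

u = [[1], [2], [2]]            # ||u|| = 3
uuT = matmul(u, transpose(u))
print(matmul(uuT, uuT) == uuT)     # False: u u^T is not idempotent
P_u = projector(u)                 # u (u^T u)^{-1} u^T = u u^T / 9
print(matmul(P_u, P_u) == P_u)     # True
```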

In the general case, we can have an arbitrary positive definite matrix $D$ defining an inner product $\langle x, y\rangle_D = y^\dagger D x$, and the projection $P_A$ is given by $P_A x = \operatorname{argmin}_{y \in \operatorname{range}(A)} \left\|x - y\right\|^2_D$. Then

P_A = A \left(A^\mathsf{T} D A\right)^{-1} A^\mathsf{T} D.
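A sketch of the weighted formula, assuming a real symmetric positive definite $D$ (so $y^\dagger = y^\mathsf{T}$). The specific $A$, $D$, and $x$ are illustrative. It checks idempotence and self-adjointness with respect to $\langle\cdot,\cdot\rangle_D$, that is, $DP_A$ is symmetric. It also checks that $x - P_A x$ is $D$-orthogonal to the range of $A$, which is the optimality condition of the minimization above. The helper `weighted_projector` is a hypothetical name.

```python
from fractions import Fraction


def matmul(A, B):
    """Matrix product of A and B, both given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def inverse(M):
    """Inverse of a non-singular square matrix by exact Gauss-Jordan elimination."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        p = next(r for r in range(c, n) if aug[r][c] != 0)   # pivot row
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [x / pivot for x in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def weighted_projector(A, D):
    """A (A^T D A)^{-1} A^T D: projection onto range(A), orthogonal w.r.t. <x, y>_D."""
    AtD = matmul(transpose(A), D)
    return matmul(matmul(A, inverse(matmul(AtD, A))), AtD)


A = [[1], [1]]              # range(A) = span{(1, 1)}
D = [[2, 0], [0, 1]]        # positive definite weight (illustrative)
P = weighted_projector(A, D)
assert P == [[Fraction(2, 3), Fraction(1, 3)], [Fraction(2, 3), Fraction(1, 3)]]
assert matmul(P, P) == P    # idempotent
assert transpose(P) != P    # not symmetric in the standard inner product...
DP = matmul(D, P)
assert transpose(DP) == DP  # ...but self-adjoint with respect to <., .>_D

x = [[3], [0]]
Px = matmul(P, x)           # [[2], [2]]: the D-closest point to x on the line
residual = [[a[0] - b[0]] for a, b in zip(x, Px)]
assert matmul(matmul(transpose(A), D), residual) == [[0]]   # D-orthogonal to range(A)
```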

When the range space of the projection is generated by a frame (i.e. the number of generators is greater than its dimension), the formula for the projection takes the form $P_A = A A^+$. Here $A^+$ stands for the Moore–Penrose pseudoinverse. This is just one of many ways to construct the projection operator.

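If $\begin{bmatrix} A & B \end{bmatrix}$ is a non-singular matrix and $A^\mathsf{T} B = 0$ (i.e., $B$ is the null space matrix of $A^\mathsf{T}$),^[8] the following holds:

\begin{aligned} I &= \begin{bmatrix} A & B \end{bmatrix} \begin{bmatrix} A & B \end{bmatrix}^{-1} \begin{bmatrix} A^\mathsf{T} \\ B^\mathsf{T} \end{bmatrix}^{-1} \begin{bmatrix} A^\mathsf{T} \\ B^\mathsf{T} \end{bmatrix} \\ &= \begin{bmatrix} A & B \end{bmatrix} \left(\begin{bmatrix} A^\mathsf{T} \\ B^\mathsf{T} \end{bmatrix} \begin{bmatrix} A & B \end{bmatrix}\right)^{-1} \begin{bmatrix} A^\mathsf{T} \\ B^\mathsf{T} \end{bmatrix} \\ &= \begin{bmatrix} A & B \end{bmatrix} \begin{bmatrix} A^\mathsf{T} A & O \\ O & B^\mathsf{T} B \end{bmatrix}^{-1} \begin{bmatrix} A^\mathsf{T} \\ B^\mathsf{T} \end{bmatrix} \\[4pt] &= A \left(A^\mathsf{T} A\right)^{-1} A^\mathsf{T} + B \left(B^\mathsf{T} B\right)^{-1} B^\mathsf{T} \end{aligned}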
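The identity $I = A(A^\mathsf{T}A)^{-1}A^\mathsf{T} + B(B^\mathsf{T}B)^{-1}B^\mathsf{T}$ can be checked exactly on a small illustrative case in $\mathbb R^3$. There, $B = (1, -1, 1)^\mathsf{T}$ spans the vectors orthogonal to both columns of $A$. The helper names are hypothetical, and `inverse` is plain Gauss–Jordan elimination for small non-singular inputs.

```python
from fractions import Fraction


def matmul(A, B):
    """Matrix product of A and B, both given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def inverse(M):
    """Inverse of a non-singular square matrix by exact Gauss-Jordan elimination."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        p = next(r for r in range(c, n) if aug[r][c] != 0)   # pivot row
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [x / pivot for x in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def projector(A):
    """A (A^T A)^{-1} A^T: orthogonal projector onto the column space of A."""
    At = transpose(A)
    return matmul(matmul(A, inverse(matmul(At, A))), At)


A = [[1, 0], [1, 1], [0, 1]]
B = [[1], [-1], [1]]                              # spans the null space of A^T
assert matmul(transpose(A), B) == [[0], [0]]      # A^T B = 0

P_A, P_B = projector(A), projector(B)
total = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(P_A, P_B)]
assert total == [[int(i == j) for j in range(3)] for i in range(3)]   # P_A + P_B = I
```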

If the orthogonal condition is enhanced to $A^\mathsf{T} W B = A^\mathsf{T} W^\mathsf{T} B = 0$ with $W$ non-singular, the following holds:

I = \begin{bmatrix} A & B \end{bmatrix} \begin{bmatrix} \left(A^\mathsf{T} W A\right)^{-1} A^\mathsf{T} \\ \left(B^\mathsf{T} W B\right)^{-1} B^\mathsf{T} \end{bmatrix} W.

All these formulas also hold for complex inner product spaces, provided that the conjugate transpose is used instead of the transpose. Further details on sums of projectors can be found in Banerjee and Roy (2014). Also see Banerjee (2004) for application of sums of projectors in basic spherical trigonometry.

Oblique projections

The term oblique projections is sometimes used to refer to non-orthogonal projections. These projections are also used to represent spatial figures in two-dimensional drawings (see oblique projection), though not as frequently as orthogonal projections. Whereas calculating the fitted value of an ordinary least squares regression requires an orthogonal projection, calculating the fitted value of an instrumental variables regression requires an oblique projection.

A projection is defined by its kernel and the basis vectors used to characterize its range (which is a complement of the kernel). When these basis vectors are orthogonal to the kernel, then the projection is an orthogonal projection. When these basis vectors are not orthogonal to the kernel, the projection is an oblique projection, or just a projection.

A matrix representation formula for a nonzero projection operator

Let $P\colon V \to V$ be a linear operator on a space $V$ of dimension $n$ such that $P^2 = P$, and assume that $P$ is not the zero operator. Let the vectors $\mathbf u_1, \ldots, \mathbf u_k$ form a basis for the range of $P$, and assemble these vectors in the $n \times k$ matrix $A$. Then $k \geq 1$, since $k = 0$ would make $P$ the zero operator. The range and the kernel are complementary spaces, so the kernel has dimension $n - k$. It follows that the orthogonal complement of the kernel has dimension $k$. Let $\mathbf v_1, \ldots, \mathbf v_k$ form a basis for the orthogonal complement of the kernel of the projection, and assemble these vectors in the matrix $B$. Then the projection $P$ (with the condition $k \geq 1$) is given by

P = A \left(B^\mathsf{T} A\right)^{-1} B^\mathsf{T}.

This expression generalizes the formula for orthogonal projections given above.^[9] A standard proof of this expression is the following. For any vector $x$ in the vector space $V$, we can decompose $x = x_1 + x_2$, where vector $x_1 = P(x)$ is in the image of $P$, and vector $x_2 = x - P(x)$. So $P(x_2) = P(x) - P^2(x) = 0$, and then $x_2$ is in the kernel of $P$, which is the null space of $B^\mathsf{T}$. In other words, the vector $x_1$ is in the column space of $A$, so $x_1 = Aw$ for some $k$-dimensional vector $w$, and the vector $x_2$ satisfies $B^\mathsf{T} x_2 = 0$ by the construction of $B$. Put these conditions together, and we find a vector $w$ so that $B^\mathsf{T}(x - Aw) = 0$. The matrices $A$ and $B$ are of full rank $k$ by their construction, and the $k \times k$ matrix $B^\mathsf{T} A$ is invertible: if $B^\mathsf{T} A w = 0$, then $Aw$ lies in both the range and the kernel of $P$, so $Aw = 0$ and hence $w = 0$. So the equation $B^\mathsf{T}(x - Aw) = 0$ gives the vector $w = (B^\mathsf{T} A)^{-1} B^\mathsf{T} x$. In this way, $Px = x_1 = Aw = A(B^\mathsf{T} A)^{-1} B^\mathsf{T} x$ for any vector $x \in V$, and hence $P = A(B^\mathsf{T} A)^{-1} B^\mathsf{T}$.
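The formula $P = A(B^\mathsf{T} A)^{-1} B^\mathsf{T}$ can be exercised on a small illustrative oblique projection in $\mathbb R^3$. It has range $\operatorname{span}\{e_1, e_2\}$ (the columns of $A$) and kernel $\operatorname{span}\{(1, 1, 1)\}$. The columns of $B$ are then a basis of the plane $v_1 + v_2 + v_3 = 0$, the orthogonal complement of the kernel. These choices are for illustration only.

```python
from fractions import Fraction


def matmul(A, B):
    """Matrix product of A and B, both given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def inverse(M):
    """Inverse of a non-singular square matrix by exact Gauss-Jordan elimination."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        p = next(r for r in range(c, n) if aug[r][c] != 0)   # pivot row
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [x / pivot for x in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


A = [[1, 0], [0, 1], [0, 0]]     # basis of the range: e1, e2
B = [[1, 0], [-1, 1], [0, -1]]   # basis of (ker P)^perp, where ker P = span{(1, 1, 1)}
Bt = transpose(B)

P = matmul(matmul(A, inverse(matmul(Bt, A))), Bt)   # P = A (B^T A)^{-1} B^T
assert P == [[1, 0, -1], [0, 1, -1], [0, 0, 0]]
assert matmul(P, P) == P                            # idempotent
assert matmul(P, [[1], [1], [1]]) == [[0], [0], [0]]  # the prescribed kernel
assert matmul(P, A) == A                            # the prescribed range is fixed
assert transpose(P) != P                            # oblique, not orthogonal
```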

In the case that $P$ is an orthogonal projection, we can take $A = B$, and it follows that $P = A\left(A^\mathsf{T} A\right)^{-1} A^\mathsf{T}$. By using this formula, one can easily check that $P = P^\mathsf{T}$. In general, if the vector space is over the complex number field, one then uses the Hermitian transpose $A^*$ and has the formula $P = A\left(A^* A\right)^{-1} A^*$. Recall that one can express the Moore–Penrose inverse of the matrix $A$ by $A^+ = (A^* A)^{-1} A^*$, since $A$ has full column rank, so $P = A A^+$.

Singular values

If $P = A(B^\mathsf{T} A)^{-1} B^\mathsf{T}$ is an oblique projection, then $I - P$ is also an oblique projection. The singular values of $P$ and $I - P$ can be computed by an orthonormal basis of the range of $A$. Let $Q_A$ be an orthonormal basis of the range of $A$, and let $Q_A^\perp$ be an orthonormal basis of its orthogonal complement. Denote the singular values of the matrix $Q_A^\mathsf{T} A (B^\mathsf{T} A)^{-1} B^\mathsf{T} Q_A^\perp$ by the positive values $\gamma_1 \ge \gamma_2 \ge \ldots \ge \gamma_k$. With this, the singular values for $P$ are:

\sigma_i = \begin{cases} \sqrt{1 + \gamma_i^2} & 1 \le i \le k \\ 0 & \text{otherwise} \end{cases}

and the singular values for $I - P$ are

\sigma_i = \begin{cases} \sqrt{1 + \gamma_i^2} & 1 \le i \le k \\ 1 & k+1 \le i \le n-k \\ 0 & \text{otherwise} \end{cases}

This implies that the largest singular values of $P$ and $I - P$ are equal, and thus that the matrix norms of the two oblique projections are the same. However, the condition numbers satisfy the relation

\kappa(I - P) = \frac{\sigma_1}{1} \ge \frac{\sigma_1}{\sigma_k} = \kappa(P),

and are therefore not necessarily equal.
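Consider the $2 \times 2$ oblique block $P = \begin{bmatrix} 1 & \gamma \\ 0 & 0 \end{bmatrix}$. Its range is spanned by $e_1$, so $Q_A = e_1$ and $Q_A^\perp = e_2$, and the matrix $Q_A^\mathsf{T} P Q_A^\perp$ is the single entry $\gamma$. The formulas above then predict that both $P$ and $I - P$ have singular values $\sqrt{1 + \gamma^2}$ and $0$ (here $n = 2$ and $k = 1$). The floating-point sketch below checks this for the arbitrary value $\gamma = 2$. It computes singular values from the eigenvalues of the Gram matrix $MM^\mathsf{T}$; the helper name is hypothetical.

```python
import math


def singular_values_2x2(M):
    """Singular values of a real 2x2 matrix, via the eigenvalues of the Gram matrix M M^T."""
    (a, b), (c, d) = M
    g11, g12, g22 = a * a + b * b, a * c + b * d, c * c + d * d
    tr, det = g11 + g22, g11 * g22 - g12 * g12
    disc = math.sqrt(max(tr * tr - 4 * det, 0.0))
    # Clamp tiny negative rounding errors before taking square roots.
    return math.sqrt((tr + disc) / 2), math.sqrt(max((tr - disc) / 2, 0.0))


gamma = 2.0
P = [[1.0, gamma], [0.0, 0.0]]             # oblique block; gamma_1 = gamma here
I_minus_P = [[0.0, -gamma], [0.0, 1.0]]

print(singular_values_2x2(P))              # largest value is sqrt(1 + gamma^2) = sqrt(5)
print(singular_values_2x2(I_minus_P))      # same largest value, as predicted
```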

Finding projection with an inner product

Let $V$ be a vector space (in this case a plane) spanned by orthogonal vectors $\mathbf u_1, \mathbf u_2, \ldots, \mathbf u_p$. Let $\mathbf y$ be a vector. One can define a projection of $\mathbf y$ onto $V$ as

\operatorname{proj}_V \mathbf y = \frac{\mathbf y \cdot \mathbf u^i}{\mathbf u^i \cdot \mathbf u^i} \mathbf u^i

where repeated indices are summed over (Einstein sum notation). The vector $\mathbf y$ can be written as an orthogonal sum such that $\mathbf y = \operatorname{proj}_V \mathbf y + \mathbf z$. The vector $\operatorname{proj}_V \mathbf y$ is sometimes denoted as $\hat{\mathbf y}$. There is a theorem in linear algebra that states that the length of this $\mathbf z$ is the smallest distance (the orthogonal distance) from $\mathbf y$ to $V$; this is commonly used in areas such as machine learning.
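A minimal exact sketch of $\operatorname{proj}_V \mathbf y$, assuming an orthogonal (not necessarily normalized) spanning set. The vectors $\mathbf u_1 = (1, 1, 0)$, $\mathbf u_2 = (1, -1, 1)$ and $\mathbf y = (2, 3, 4)$ are illustrative. The code checks that $\mathbf z = \mathbf y - \hat{\mathbf y}$ is orthogonal to $V$. It also checks, on a small integer grid of points of $V$, that none is closer to $\mathbf y$ than $\hat{\mathbf y}$.

```python
from fractions import Fraction


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def proj(y, basis):
    """Sum of (y . u)/(u . u) u over an orthogonal (not necessarily normalized) basis."""
    out = [Fraction(0)] * len(y)
    for u in basis:
        c = Fraction(dot(y, u), dot(u, u))
        out = [o + c * t for o, t in zip(out, u)]
    return out


u1, u2 = [1, 1, 0], [1, -1, 1]      # orthogonal: u1 . u2 = 0
y = [2, 3, 4]
y_hat = proj(y, [u1, u2])           # (7/2, 3/2, 1)
z = [a - b for a, b in zip(y, y_hat)]
assert dot(z, u1) == 0 and dot(z, u2) == 0   # z is orthogonal to V

# No grid point s*u1 + t*u2 of V is closer to y than y_hat.
best = dot(z, z)
for s in range(-4, 5):
    for t in range(-4, 5):
        w = [s * a + t * b for a, b in zip(u1, u2)]
        d = [a - b for a, b in zip(y, w)]
        assert dot(d, d) >= best
```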

Canonical forms

Any projection $P = P^2$ on a vector space of dimension $d$ over a field is a diagonalizable matrix, since its minimal polynomial divides $x^2 - x$, which splits into distinct linear factors. Thus there exists a basis in which $P$ has the form

P = I_r \oplus 0_{d-r}

where $r$ is the rank of $P$. Here $I_r$ is the identity matrix of size $r$, $0_{d-r}$ is the zero matrix of size $d - r$, and $\oplus$ is the direct sum operator. If the vector space is complex and equipped with an inner product, then there is an orthonormal basis in which the matrix of $P$ is^[10]

P = \begin{bmatrix} 1 & \sigma_1 \\ 0 & 0 \end{bmatrix} \oplus \cdots \oplus \begin{bmatrix} 1 & \sigma_k \\ 0 & 0 \end{bmatrix} \oplus I_m \oplus 0_s

where $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_k > 0$. The integers $k, s, m$ and the real numbers $\sigma_i$ are uniquely determined. Note that $2k + s + m = d$. The factor $I_m \oplus 0_s$ corresponds to the maximal invariant subspace on which $P$ acts as an orthogonal projection (so that $P$ itself is orthogonal if and only if $k = 0$), and the $\sigma_i$-blocks correspond to the oblique components.

Projections on normed vector spaces

When the underlying vector space $X$ is a (not necessarily finite-dimensional) normed vector space, analytic questions, irrelevant in the finite-dimensional case, need to be considered. Assume now $X$ is a Banach space.

Many of the algebraic results discussed above survive the passage to this context. A given direct sum decomposition of $X$ into complementary subspaces still specifies a projection, and vice versa. If $X$ is the direct sum $X = U \oplus V$, then the operator defined by $P(u + v) = u$ is still a projection with range $U$ and kernel $V$. It is also clear that $P^2 = P$. Conversely, if $P$ is a projection on $X$, i.e. $P^2 = P$, then it is easily verified that $(1 - P)^2 = (1 - P)$. In other words, $1 - P$ is also a projection. The relation $P^2 = P$ implies $1 = P + (1 - P)$, and $X$ is the direct sum $\operatorname{rg}(P) \oplus \operatorname{rg}(1 - P)$.

However, in contrast to the finite-dimensional case, projections need not be continuous in general. If a subspace $U$ of $X$ is not closed in the norm topology, then the projection onto $U$ is not continuous. In other words, the range of a continuous projection $P$ must be a closed subspace. Furthermore, the kernel of a continuous projection (in fact, of a continuous linear operator in general) is closed. Thus a continuous projection $P$ gives a decomposition of $X$ into two complementary closed subspaces: $X = \operatorname{rg}(P) \oplus \ker(P) = \ker(1 - P) \oplus \ker(P)$.

The converse holds also, with an additional assumption. Suppose $U$ is a closed subspace of $X$. If there exists a closed subspace $V$ such that $X = U \oplus V$, then the projection $P$ with range $U$ and kernel $V$ is continuous. This follows from the closed graph theorem. Suppose $x_n \to x$ and $Px_n \to y$. One needs to show that $Px = y$. Since $U$ is closed and $\{Px_n\} \subset U$, $y$ lies in $U$, i.e. $Py = y$. Also, $x_n - Px_n = (I - P)x_n \to x - y$. Because $V$ is closed and $\{(I - P)x_n\} \subset V$, we have $x - y \in V$, i.e. $P(x - y) = Px - Py = Px - y = 0$, which proves the claim.

The above argument makes use of the assumption that both $U$ and $V$ are closed. In general, given a closed subspace $U$, there need not exist a complementary closed subspace $V$, although for Hilbert spaces this can always be done by taking the orthogonal complement. For Banach spaces, a one-dimensional subspace always has a closed complementary subspace. This is an immediate consequence of the Hahn–Banach theorem. Let $U$ be the linear span of a nonzero vector $u$. By Hahn–Banach, there exists a bounded linear functional $\varphi$ such that $\varphi(u) = 1$. The operator $P(x) = \varphi(x)u$ satisfies $P^2(x) = \varphi(x)\varphi(u)u = P(x)$, i.e. it is a projection. Boundedness of $\varphi$ implies continuity of $P$, and therefore $\ker(P) = \operatorname{rg}(I - P)$ is a closed complementary subspace of $U$.

Applications and further considerations

Projections (orthogonal and otherwise) play a major role in algorithms for certain linear algebra problems:

- QR decomposition (see Householder transformation and Gram–Schmidt decomposition)
- Singular value decomposition
- Reduction to Hessenberg form (the first step in many eigenvalue algorithms)
- Linear regression
- Projective elements of matrix algebras are used in the construction of certain K-groups in Operator K-theory

As stated above, projections are a special case of idempotents. Analytically, orthogonal projections are non-commutative generalizations of characteristic functions. Idempotents are used in classifying, for instance, semisimple algebras, while measure theory begins with considering characteristic functions of measurable sets. Therefore, as one can imagine, projections are very often encountered in the context of operator algebras. In particular, a von Neumann algebra is generated by its complete lattice of projections.

Generalizations

More generally, given a map between normed vector spaces $T\colon V \to W$, one can analogously ask for this map to be an isometry on the orthogonal complement of the kernel: that $(\ker T)^\perp \to W$ be an isometry (compare Partial isometry); in particular it must be onto. The case of an orthogonal projection is when $W$ is a subspace of $V$. In Riemannian geometry, this is used in the definition of a Riemannian submersion.

References

- Dunford, N.; Schwartz, J. T. (1958). Linear Operators, Part I: General Theory. Interscience.
- Meyer, Carl D. (2000). Matrix Analysis and Applied Linear Algebra. Society for Industrial and Applied Mathematics. ISBN 978-0-89871-454-8.
- Brezinski, Claude (1997). Projection Methods for Systems of Equations. North-Holland. ISBN 0-444-82777-3.


Notes and References

1. Meyer, pp. 386–387.
2. Horn, Roger A.; Johnson, Charles R. (2013). Matrix Analysis (2nd ed.). Cambridge University Press. ISBN 9780521839402.
3. Horn, Roger A.; Johnson, Charles R. (2013). Matrix Analysis (2nd ed.). Cambridge University Press. ISBN 9780521839402.
4. Meyer, p. 433.
5. Meyer, p. 431.
6. Meyer, equation (5.13.4).
7. Meyer, equation (5.13.3).
8. See also Linear least squares (mathematics) § Properties of the least-squares estimators.
9. Meyer, equation (7.10.39).
10. Doković, D. Ž. (August 1991). "Unitary similarity of projectors". Aequationes Mathematicae. 42 (1): 220–224. doi:10.1007/BF01818492. S2CID 122704926.
