In mathematics, and in particular linear algebra, the Moore–Penrose inverse of a matrix, often called the pseudoinverse, is the most widely known generalization of the inverse matrix. It was independently described by E. H. Moore in 1920,[1] Arne Bjerhammar in 1951,[2] and Roger Penrose in 1955.[3] Earlier, Erik Ivar Fredholm had introduced the concept of a pseudoinverse of integral operators in 1903. The terms pseudoinverse and generalized inverse are sometimes used as synonyms for the Moore–Penrose inverse of a matrix, but sometimes applied to other elements of algebraic structures which share some but not all properties expected for an inverse element.
A common use of the pseudoinverse is to compute a "best fit" (least squares) approximate solution to a system of linear equations that lacks an exact solution (see below under § Applications).Another use is to find the minimum (Euclidean) norm solution to a system of linear equations with multiple solutions. The pseudoinverse facilitates the statement and proof of results in linear algebra.
The pseudoinverse is defined and unique for all matrices whose entries are real or complex numbers. It can be computed using the singular value decomposition. In the special case where is a normal matrix (for example, a Hermitian matrix), the pseudoinverse annihilates the kernel of and acts as a traditional inverse of on the subspace orthogonal to the kernel.
In the following discussion, the following conventions are adopted.
K=R
A*=A\operatorname{T}
For
A\inKm x
Note that
A+A
AA+
(AA+)2=AA+
(A+A)2=A+A
A+A
AT
A
AA+
A
A
A+A
AA+
AA+
A
(AA+)A=A
A+A
AT
(A+A)A+=A+
The pseudoinverse
A+
A\inKm x
A
A+A=I
A
A
AA+=I
In the more general case, the pseudoinverse can be expressed leveraging the singular value decomposition. Any matrix can be decomposed as
A=UDV*
U,V
D
A+=VD+U*
D+
D
AA+=UU*
A+A=VV*
A
As discussed above, for any matrix there is one and only one pseudoinverse .[4]
A matrix satisfying only the first of the conditions given above, namely , is known as a generalized inverse. If the matrix also satisfies the second condition, namely , it is called a generalized reflexive inverse. Generalized inverses always exist but are not in general unique. Uniqueness is a consequence of the last two conditions.
Proofs for the properties below can be found at .
A+=A-1
\left(A+\right)+=A
\ker\left(A+\right)=\ker\left(A*\right)
\operatorname{ran}\left(A+\right)=\operatorname{ran}\left(A*\right)
The following identity formula can be used to cancel or expand certain subexpressions involving pseudoinverses:Equivalently, substituting
A+
A
A*
A
The computation of the pseudoinverse is reducible to its construction in the Hermitian case. This is possible through the equivalences:
as and are Hermitian.
The equality does not hold in general. Rather, suppose . Then the following are equivalent:[6]
A^+ A BB^* A^* & = BB^* A^*, \\BB^+ A^* A B & = A^* A B.\end
\left(A^+ A BB^*\right)^* & = A^+ A BB^*, \\\left(A^* A BB^+\right)^* & = A^* A BB^+.\end
A^+ A B & = B (AB)^+ AB, \\BB^+ A^* & = A^* A B (AB)^+.\end
The following are sufficient conditions for :
A*A=A+A=In
BB*=BB+=In
A+A=I
BB+=I
B=A*
B=A+
The following is a necessary condition for :
(A+A)(BB+)=(BB+)(A+A)
The fourth sufficient condition yields the equalities
Here is a counterexample where :
P=AA+
Q=A+A
P=P*
Q=Q*
P2=P
Q2=Q
PA=AQ=A
A+P=QA+=A+
I-Q=I-A+A
I-P=I-AA+
The last two properties imply the following identities:
A \left(I-A+A\right)=\left(I-AA+\right)A =0
A*\left(I-AA+\right)=\left(I-A+A\right)A*=0
Another property is the following: if is Hermitian and idempotent (true if and only if it represents an orthogonal projection), then, for any matrix the following equation holds:[7]
This can be proven by defining matrices
C=BA
D=A(BA)+
From the last property it follows that, if is Hermitian and idempotent, for any matrix
Finally, if is an orthogonal projection matrix, then its pseudoinverse trivially coincides with the matrix itself, that is,
A+=A
If we view the matrix as a linear map over the field then can be decomposed as follows. We write for the direct sum, for the orthogonal complement, for the kernel of a map, and for the image of a map. Notice that
Kn=\left(\kerA\right)\perp ⊕ \kerA
Km=\operatorname{ran}A ⊕ \left(\operatorname{ran}A\right)\perp
A:\left(\kerA\right)\perp\to\operatorname{ran}A
\left(\operatorname{ran}A\right)\perp.
In other words: To find for given in, first project orthogonally onto the range of, finding a point in the range. Then form, that is, find those vectors in that sends to . This will be an affine subspace of parallel to the kernel of . The element of this subspace that has the smallest length (that is, is closest to the origin) is the answer we are looking for. It can be found by taking an arbitrary member of and projecting it orthogonally onto the orthogonal complement of the kernel of .
This description is closely related to the minimum-norm solution to a linear system.
The pseudoinverse are limits:(see Tikhonov regularization). These limits exist even if or do not exist.[4]
In contrast to ordinary matrix inversion, the process of taking pseudoinverses is not continuous: if the sequence converges to the matrix (in the maximum norm or Frobenius norm, say), then need not converge to . However, if all the matrices have the same rank as, will converge to .[8]
Let
x\mapstoA(x)
x\mapstoA+(x)
x0
A
x0
A
A+
x0
A:=A(x0)
A+:=
+(x | |
A | |
0) |
Since for invertible matrices the pseudoinverse equals the usual inverse, only examples of non-invertible matrices are considered below.
A=\begin{pmatrix}0&0\ 0&0\end{pmatrix},
A+=\begin{pmatrix}0&0\ 0&0\end{pmatrix}.
A+=A+AA+
A=\begin{pmatrix}1&0\ 1&0\end{pmatrix},
A+=\begin{pmatrix}
1 | |
2 |
&
1 | |
2 |
\ 0&0\end{pmatrix}
Indeed,
AA+=\begin{pmatrix}
1 | |
2 |
&
1 | \ | |
2 |
1 | |
2 |
&
1 | |
2 |
\end{pmatrix}
AA+A=\begin{pmatrix}1&0\ 1&0\end{pmatrix}=A
A+A=\begin{pmatrix}1&0\ 0&0\end{pmatrix}
A+AA+=\begin{pmatrix}
1 | |
2 |
&
1 | |
2 |
\ 0&0\end{pmatrix}=A+
Note that is neither injective nor surjective, and thus the pseudoinverse cannot be computed via
A+=\left(A*A\right)-1A*
A+=A*\left(AA*\right)-1
A*A
AA*
A+
Nonetheless, the pseudoinverse can be computed via SVD observing that
A=\sqrt2\left(
e1+e2 | |
\sqrt2 |
\right)
* | |
e | |
1 |
| ||||
A |
e1\left(
e1+e2 | |
\sqrt2 |
\right)*
A=\begin{pmatrix}1&0\ -1&0\end{pmatrix},
A+=\begin{pmatrix}
1 | |
2 |
&-
1 | |
2 |
\ 0&0\end{pmatrix}.
A=\begin{pmatrix}1&0\ 2&0\end{pmatrix},
A+=\begin{pmatrix}
1 | |
5 |
&
2 | |
5 |
\ 0&0\end{pmatrix}
5=12+22
A=\begin{pmatrix}1&1\ 1&1\end{pmatrix},
A+=\begin{pmatrix}
1 | |
4 |
&
1 | \ | |
4 |
1 | |
4 |
&
1 | |
4 |
\end{pmatrix}.
A=\begin{pmatrix}1&0\ 0&1\ 0&1\end{pmatrix},
A+=\begin{pmatrix}1&0&0\ 0&
1 | |
2 |
&
1 | |
2 |
\end{pmatrix}
For this matrix, the left inverse exists and thus equals
A+
A+A=\begin{pmatrix}1&0\ 0&1\end{pmatrix}.
It is also possible to define a pseudoinverse for scalars and vectors. This amounts to treating these as matrices. The pseudoinverse of a scalar is zero if is zero and the reciprocal of otherwise:
The pseudoinverse of the null (all zero) vector is the transposed null vector. The pseudoinverse of a non-null vector is the conjugate transposed vector divided by its squared magnitude:
The pseudoinverse of a squared diagonal matrix is obtained by taking the reciprocal of the nonzero diagonal elements. Formally, if
D
D=\tildeD ⊕ 0k x
\tildeD>0
D+=\tildeD-1 ⊕ 0k x
A
m x n
Aij=\deltaijai
ai\inK
A+
n x m
Aii ≠ 0\implies
+ | |
A | |
ii |
=1/Aii
If the rank of is identical to its column rank,, (for,) there are linearly independent columns, and is invertible. In this case, an explicit formula is:
It follows that is then a left inverse of :
A+A=In
If the rank of is identical to its row rank,, (for,) there are linearly independent rows, and is invertible. In this case, an explicit formula is:
It follows that is a right inverse of :
AA+=Im
This is a special case of either full column rank or full row rank (treated above). If has orthonormal columns (
A*A=In
AA*=Im
If is normal, that is, it commutes with its conjugate transpose, then its pseudoinverse can be computed by diagonalizing it, mapping all nonzero eigenvalues to their inverses, and mapping zero eigenvalues to zero. A corollary is that commuting with its transpose implies that it commutes with its pseudoinverse.
A (square) matrix is said to be an EP matrix if it commutes with its pseudoinverse. In such cases (and only in such cases), it is possible to obtain the pseudoinverse as a polynomial in . A polynomial
p(t)
A+=p(A)
This is a special case of a normal matrix with eigenvalues 0 and 1. If is an orthogonal projection matrix, that is,
A=A*
A2=A
For a circulant matrix, the singular value decomposition is given by the Fourier transform, that is, the singular values are the Fourier coefficients. Let be the Discrete Fourier Transform (DFT) matrix; then[13]
Let denote the rank of . Then can be (rank) decomposed as
A=BC
A+=C+B+=C*\left(CC*\right)-1\left(B*B\right)-1B*
For
K\in\{R,C\}
Consider the case when is of full column rank, so that
A+=\left(A*A\right)-1A*
A*A=R*R
which may be solved by forward substitution followed by back substitution.
The Cholesky decomposition may be computed without forming explicitly, by alternatively using the QR decomposition of
A=QR
Q
Q*Q=I
so is the Cholesky factor of .
The case of full row rank is treated similarly by using the formula
A+=A*\left(AA*\right)-1
For an arbitrary, one has that
A*A
p(t)
(A*A)+=p(A*A)
A computationally simple and accurate way to compute the pseudoinverse is by using the singular value decomposition.[4] [14] If
A=U\SigmaV*
A+=V\Sigma+U*
The computational cost of this method is dominated by the cost of computing the SVD, which is several times higher than matrix–matrix multiplication, even if a state-of-the art implementation (such as that of LAPACK) is used.
The above procedure shows why taking the pseudoinverse is not a continuous operation: if the original matrix has a singular value 0 (a diagonal entry of the matrix above), then modifying slightly may turn this zero into a tiny positive number, thereby affecting the pseudoinverse dramatically as we now have to take the reciprocal of a tiny number.
Optimized approaches exist for calculating the pseudoinverse of block-structured matrices.
Another method for computing the pseudoinverse (cf. Drazin inverse) uses the recursion
which is sometimes referred to as hyper-power sequence. This recursion produces a sequence converging quadratically to the pseudoinverse of if it is started with an appropriate satisfying
A0A=\left(A0A\right)*
A0=\alphaA*
0<\alpha<
2 | |
2/\sigma | |
1(A) |
A0A=\left(A0A\right)*
A0:=\left(A*A+\deltaI\right)-1A*
For the cases where has full row or column rank, and the inverse of the correlation matrix (for with full row rank or for full column rank) is already known, the pseudoinverse for matrices related to can be computed by applying the Sherman–Morrison–Woodbury formula to update the inverse of the correlation matrix, which may need less work. In particular, if the related matrix differs from the original one by only a changed, added or deleted row or column, incremental algorithms exist that exploit the relationship.[17] [18]
Similarly, it is possible to update the Cholesky factor when a row or column is added, without creating the inverse of the correlation matrix explicitly. However, updating the pseudoinverse in the general rank-deficient case is much more complicated.[19] [20]
High-quality implementations of SVD, QR, and back substitution are available in standard libraries, such as LAPACK. Writing one's own implementation of SVD is a major programming project that requires a significant numerical expertise. In special circumstances, such as parallel computing or embedded computing, however, alternative implementations by QR or even the use of an explicit inverse might be preferable, and custom implementations may be unavoidable.
The Python package NumPy provides a pseudoinverse calculation through its functions matrix.I
and linalg.pinv
; its pinv
uses the SVD-based algorithm. SciPy adds a function scipy.linalg.pinv
that uses a least-squares solver.
The MASS package for R provides a calculation of the Moore–Penrose inverse through the ginv
function.[21] The ginv
function calculates a pseudoinverse using the singular value decomposition provided by the svd
function in the base R package. An alternative is to employ the pinv
function available in the pracma package.
The Octave programming language provides a pseudoinverse through the standard package function pinv
and the pseudo_inverse
method.
In Julia (programming language), the LinearAlgebra package of the standard library provides an implementation of the Moore–Penrose inverse pinv
implemented via singular-value decomposition.[22]
See also: Linear least squares (mathematics).
The pseudoinverse provides a least squares solution to a system of linear equations.[23] For, given a system of linear equations
in general, a vector that solves the system may not exist, or if one does exist, it may not be unique. More specifically, a solution exists if and only if
b
A
A
\left\|Ax-b\right\|2\ge\left\|Az-b\right\|2
z=A+b
\| ⋅ \|2
x=A+b+\left(I-A+A\right)w
This result is easily extended to systems with multiple right-hand sides, when the Euclidean norm is replaced by the Frobenius norm. Let .
\|AX-B\|F\ge\|AZ-B\|F
Z=A+B
\| ⋅ \|F
If the linear system
has any solutions, they are all given by[25]
for arbitrary vector . Solution(s) exist if and only if
AA+b=b
For linear systems
Ax=b,
\|x\|2
Ax=b
z=A+b
\|z\|2\le\|x\|2
This result is easily extended to systems with multiple right-hand sides, when the Euclidean norm is replaced by the Frobenius norm. Let .
AX=B
Z=A+B
\|Z\|F\le\|X\|F
Using the pseudoinverse and a matrix norm, one can define a condition number for any matrix:
A large condition number implies that the problem of finding least-squares solutions to the corresponding system of linear equations is ill-conditioned in the sense that small errors in the entries of can lead to huge errors in the entries of the solution.
In order to solve more general least-squares problems, one can define Moore–Penrose inverses for all continuous linear operators between two Hilbert spaces and, using the same four conditions as in our definition above. It turns out that not every continuous linear operator has a continuous linear pseudoinverse in this sense.[26] Those that do are precisely the ones whose range is closed in .
A notion of pseudoinverse exists for matrices over an arbitrary field equipped with an arbitrary involutive automorphism. In this more general setting, a given matrix doesn't always have a pseudoinverse. The necessary and sufficient condition for a pseudoinverse to exist is that
\operatorname{rank}(A)=\operatorname{rank}\left(A*A\right)=\operatorname{rank}\left(AA*\right)
A*
A
A=\begin{bmatrix}1&i\end{bmatrix}\operatorname{T}
\operatorname{rank}\left(AA\operatorname{T}\right)=1
\operatorname{rank}\left(A\operatorname{T}A\right)=0
In abstract algebra, a Moore–Penrose inverse may be defined on a