In linear algebra, the Cayley–Hamilton theorem (named after the mathematicians Arthur Cayley and William Rowan Hamilton) states that every square matrix over a commutative ring (such as the real or complex numbers or the integers) satisfies its own characteristic equation.
The characteristic polynomial of an n × n matrix A is defined as
p_A(\lambda) = \det(\lambda I_n - A),
where \det denotes the determinant, I_n is the n × n identity matrix, and \lambda is a scalar variable. Since the entries of the matrix (\lambda I_n - A) are constant or linear in \lambda, its determinant is a monic polynomial of degree n in \lambda,
p_A(\lambda) = \lambda^n + c_{n-1}\lambda^{n-1} + \cdots + c_1\lambda + c_0.
Replacing the scalar variable \lambda by the matrix A (and the constant term c_0 by c_0 I_n) defines an analogous matrix polynomial expression,
p_A(A) = A^n + c_{n-1}A^{n-1} + \cdots + c_1 A + c_0 I_n.
The Cayley–Hamilton theorem states that this expression equals the zero matrix, that is, p_A(A) = 0; in other words, the characteristic polynomial p_A is an annihilating polynomial for A.
One use for the Cayley–Hamilton theorem is that it allows A^n to be expressed as a linear combination of the lower matrix powers of A:
A^n = -c_{n-1}A^{n-1} - \cdots - c_1 A - c_0 I_n.
When the ring is a field, the Cayley–Hamilton theorem is equivalent to the statement that the minimal polynomial of a square matrix divides its characteristic polynomial.
A special case of the theorem was first proved by Hamilton in 1853 in terms of inverses of linear functions of quaternions. This corresponds to the special case of certain 4 × 4 real or 2 × 2 complex matrices. Cayley in 1858 stated the result for 3 × 3 and smaller matrices, but only published a proof for the 2 × 2 case. As for n × n matrices, Cayley stated "..., I have not thought it necessary to undertake the labor of a formal proof of the theorem in the general case of a matrix of any degree". The general case was first proved by Ferdinand Frobenius in 1878.
For a 1 × 1 matrix A = (a), the characteristic polynomial is given by p(\lambda) = \lambda - a, and so p(A) = (a) - a(1) = (0) is trivial.
As a concrete example, let
A = \begin{pmatrix}1&2\\3&4\end{pmatrix}.
Its characteristic polynomial is given by
p(\lambda) = \det(\lambda I_2 - A) = \det\begin{pmatrix}\lambda-1&-2\\-3&\lambda-4\end{pmatrix} = (\lambda-1)(\lambda-4) - (-2)(-3) = \lambda^2 - 5\lambda - 2.
The Cayley–Hamilton theorem claims that, if we define
p(X) = X^2 - 5X - 2I_2,
then
p(A) = A^2 - 5A - 2I_2 = \begin{pmatrix}0&0\\0&0\end{pmatrix}.
We can verify by computation that indeed,
A^2 - 5A - 2I_2 = \begin{pmatrix}7&10\\15&22\end{pmatrix} - \begin{pmatrix}5&10\\15&20\end{pmatrix} - \begin{pmatrix}2&0\\0&2\end{pmatrix} = \begin{pmatrix}0&0\\0&0\end{pmatrix}.
For a generic 2 × 2 matrix,
A = \begin{pmatrix}a&b\\c&d\end{pmatrix},
the characteristic polynomial is given by p(\lambda) = \lambda^2 - (a+d)\lambda + (ad-bc), so the Cayley–Hamilton theorem states that
p(A) = A^2 - (a+d)A + (ad-bc)I_2 = \begin{pmatrix}0&0\\0&0\end{pmatrix},
which is indeed always the case, evident by working out the entries of A^2.
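As an aside, the generic 2 × 2 identity can be checked symbolically; the following sketch (an illustration added here, not part of the original text) uses the SymPy library, with the symbol names chosen arbitrarily.

```python
import sympy as sp

a, b, c, d = sp.symbols('a b c d')
A = sp.Matrix([[a, b], [c, d]])
I2 = sp.eye(2)

# Characteristic polynomial: lambda**2 - (a + d)*lambda + (a*d - b*c)
lam = sp.symbols('lambda')
print(A.charpoly(lam).as_expr())

# Cayley–Hamilton for a generic 2x2 matrix: A^2 - (a + d) A + (a d - b c) I_2 = 0
residual = A**2 - (a + d)*A + (a*d - b*c)*I2
print(residual.expand())   # Matrix([[0, 0], [0, 0]])
```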
For a general n × n invertible matrix A, i.e., one with nonzero determinant, A^{-1} can thus be written as an (n-1)-th order polynomial expression in A. As indicated, the Cayley–Hamilton theorem amounts to the identity
p(A) = A^n + c_{n-1}A^{n-1} + \cdots + c_1 A + (-1)^n\det(A)\,I_n = 0.
The coefficients c_i are given by the elementary symmetric polynomials of the eigenvalues of A. Using Newton identities, the elementary symmetric polynomials can in turn be expressed in terms of power sum symmetric polynomials of the eigenvalues,
s_k = \sum_{i=1}^{n}\lambda_i^k = \operatorname{tr}(A^k),
where \operatorname{tr}(A^k) is the trace of the matrix A^k. Thus, we can express the c_i in terms of the traces of powers of A.
In general, the formula for the coefficients c_i is given in terms of complete exponential Bell polynomials as[1]
c_{n-k} = \frac{(-1)^k}{k!}\, B_k\!\left(s_1,\, -1!\,s_2,\, 2!\,s_3,\, \ldots,\, (-1)^{k-1}(k-1)!\,s_k\right).
In particular, the determinant of A equals (-1)^n c_0. Thus, the determinant can be written as the trace identity
\det(A) = \frac{1}{n!}\, B_n\!\left(s_1,\, -1!\,s_2,\, 2!\,s_3,\, \ldots,\, (-1)^{n-1}(n-1)!\,s_n\right).
Likewise, the characteristic polynomial identity p(A) = 0 can be rearranged as
(-1)^{n-1}\det(A)\,I_n = A\left(A^{n-1} + c_{n-1}A^{n-2} + \cdots + c_1 I_n\right),
and, by multiplying both sides by A^{-1} (note that \det(A) \neq 0), one is led to an expression for the inverse of A as a trace identity,
A^{-1} = \frac{(-1)^{n-1}}{\det A}\left(A^{n-1} + c_{n-1}A^{n-2} + \cdots + c_1 I_n\right),
in which each coefficient c_i can be expressed through the traces s_k = \operatorname{tr}(A^k) as above.
Another method for obtaining these coefficients c_k for a general n × n matrix, provided no root be zero, relies on the following alternative expression for the determinant,
p(\lambda) = \det(\lambda I_n - A) = \lambda^n \exp\!\left(\operatorname{tr}\log\!\left(I_n - A/\lambda\right)\right).
Hence, by virtue of the Mercator series,
p(\lambda) = \lambda^n \exp\!\left(-\operatorname{tr}\sum_{m=1}^{\infty}\frac{(A/\lambda)^m}{m}\right),
where the exponential only needs be expanded to order \lambda^{-n}, since p(\lambda) is of order \lambda^n, the net negative powers of \lambda automatically vanishing by the C–H theorem. (Again, this requires a ring containing the rational numbers.) Differentiation of this expression with respect to \lambda allows one to express the coefficients of the characteristic polynomial for general n as determinants of m × m matrices,[2]
c_{n-m} = \frac{(-1)^m}{m!}
\begin{vmatrix}
\operatorname{tr}A & m-1 & 0 & \cdots & 0\\
\operatorname{tr}A^2 & \operatorname{tr}A & m-2 & \cdots & 0\\
\vdots & \vdots & & \ddots & \vdots\\
\operatorname{tr}A^{m-1} & \operatorname{tr}A^{m-2} & \cdots & \cdots & 1\\
\operatorname{tr}A^{m} & \operatorname{tr}A^{m-1} & \cdots & \cdots & \operatorname{tr}A
\end{vmatrix}.
Using these to specify the coefficients of the characteristic polynomial of a 2 × 2 matrix A yields
c_2 = 1, \qquad c_1 = -\operatorname{tr}A, \qquad c_0 = \tfrac{1}{2}\left((\operatorname{tr}A)^2 - \operatorname{tr}(A^2)\right).
The coefficient c_0 gives the determinant of the 2 × 2 matrix, c_1 minus its trace, while its inverse is given by
A^{-1} = \frac{1}{\det A}\left((\operatorname{tr}A)\,I_2 - A\right) = \frac{2\left((\operatorname{tr}A)\,I_2 - A\right)}{(\operatorname{tr}A)^2 - \operatorname{tr}(A^2)}.
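The 2 × 2 trace identity for the inverse is easy to verify numerically; the short NumPy sketch below (illustrative only, for one arbitrary test matrix) does so.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
I2 = np.eye(2)

tr_A  = np.trace(A)
tr_A2 = np.trace(A @ A)
det_A = 0.5 * (tr_A**2 - tr_A2)              # determinant from traces

A_inv = (tr_A * I2 - A) / det_A              # inverse via Cayley–Hamilton
print(np.allclose(A_inv, np.linalg.inv(A)))  # True
```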
It is apparent from the general formula for c_{n-k}, expressed in terms of Bell polynomials, that the expressions
-\operatorname{tr}A \qquad\text{and}\qquad \tfrac{1}{2}\left((\operatorname{tr}A)^2 - \operatorname{tr}(A^2)\right)
always give the coefficients c_{n-1} of \lambda^{n-1} and c_{n-2} of \lambda^{n-2} in the characteristic polynomial of any n × n matrix, respectively. So, for a 3 × 3 matrix A, the statement of the Cayley–Hamilton theorem can also be written as
A^3 - (\operatorname{tr}A)A^2 + \tfrac{1}{2}\left((\operatorname{tr}A)^2 - \operatorname{tr}(A^2)\right)A - \det(A)\,I_3 = 0,
where the right-hand side designates a 3 × 3 matrix with all entries reduced to zero. Likewise, this determinant in the n = 3 case, is now
\det(A) = \tfrac{1}{6}\left((\operatorname{tr}A)^3 - 3\operatorname{tr}A\,\operatorname{tr}(A^2) + 2\operatorname{tr}(A^3)\right).
This expression gives the negative of the coefficient c_{n-3} of \lambda^{n-3} in the general case, as seen below.
Similarly, one can write for a 4 × 4 matrix A,
A^4 - (\operatorname{tr}A)A^3 + \tfrac{1}{2}\left((\operatorname{tr}A)^2 - \operatorname{tr}(A^2)\right)A^2 - \tfrac{1}{6}\left((\operatorname{tr}A)^3 - 3\operatorname{tr}A\,\operatorname{tr}(A^2) + 2\operatorname{tr}(A^3)\right)A + \det(A)\,I_4 = 0,
where, now, the determinant is
\det(A) = \tfrac{1}{24}\left((\operatorname{tr}A)^4 - 6(\operatorname{tr}A)^2\operatorname{tr}(A^2) + 3\left(\operatorname{tr}(A^2)\right)^2 + 8\operatorname{tr}A\,\operatorname{tr}(A^3) - 6\operatorname{tr}(A^4)\right),
and so on for larger matrices. The increasingly complex expressions for the coefficients c_k are deducible from Newton's identities or the Faddeev–LeVerrier algorithm.
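As a hedged illustration of how such coefficients can be generated for any size, the sketch below computes them from the power sums s_k = tr(A^k) via Newton's identities and compares the result with the coefficients obtained from the eigenvalues; the helper name charpoly_coeffs_from_traces is chosen here for illustration.

```python
import numpy as np

def charpoly_coeffs_from_traces(A):
    """Coefficients [1, c_{n-1}, ..., c_0] of det(lambda*I - A),
    computed from power sums s_k = tr(A^k) via Newton's identities."""
    n = A.shape[0]
    s = [np.trace(np.linalg.matrix_power(A, k)) for k in range(1, n + 1)]
    e = [1.0]   # elementary symmetric polynomials e_0, ..., e_n
    for k in range(1, n + 1):
        e_k = sum((-1)**(i - 1) * e[k - i] * s[i - 1] for i in range(1, k + 1)) / k
        e.append(e_k)
    return [(-1)**k * e[k] for k in range(n + 1)]   # c_{n-k} = (-1)^k e_k

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
print(charpoly_coeffs_from_traces(A))
print(np.poly(A))   # same coefficients, computed from the eigenvalues
```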
The Cayley–Hamilton theorem always provides a relationship between the powers of A (though not always the simplest one), which allows one to simplify expressions involving such powers, and evaluate them without having to compute the power A^n or any higher powers of A.
As an example, for
A=\begin{pmatrix}1&2\\3&4\end{pmatrix},
the theorem gives
A^2 = 5A + 2I_2.
Then, to calculate A^4, observe
A^3 = (5A + 2I_2)A = 5A^2 + 2A = 5(5A + 2I_2) + 2A = 27A + 10I_2.
Likewise,
A^4 = A^3 A = (27A + 10I_2)A = 27A^2 + 10A = 27(5A + 2I_2) + 10A = 145A + 54I_2.
Notice that we have been able to write the matrix power A^4 as the sum of two terms. In fact, a matrix power of any order k can be written as a matrix polynomial of degree at most n - 1, where n is the size of the square matrix. This is an instance where the Cayley–Hamilton theorem can be used to express a matrix function, which we will discuss below systematically.
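To make the reduction concrete, here is a small illustrative sketch (not from the original text) that keeps track of the pair (x, y) with A^k = xA + yI_2 for the 2 × 2 example above, using only the relation A^2 = 5A + 2I_2.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
I2 = np.eye(2, dtype=int)

def power_via_cayley_hamilton(k):
    """Return (x, y) such that A^k = x*A + y*I_2, using A^2 = 5A + 2I_2."""
    x, y = 0, 1                        # A^0 = 0*A + 1*I_2
    for _ in range(k):
        # A^(m+1) = x*A^2 + y*A = x*(5A + 2I_2) + y*A = (5x + y)*A + (2x)*I_2
        x, y = 5*x + y, 2*x
    return x, y

x, y = power_via_cayley_hamilton(4)
print(x, y)                                                       # 145 54
print(np.array_equal(x*A + y*I2, np.linalg.matrix_power(A, 4)))   # True
```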
Given an analytic function
f(x) = \sum_{k=0}^{\infty} a_k x^k
and the characteristic polynomial p(x) of degree n of an n × n matrix A, the function can be expressed using long division as
f(x) = q(x)\,p(x) + r(x),
where q(x) is some quotient polynomial and r(x) is a remainder polynomial such that 0 \le \deg r(x) < n.
By the Cayley–Hamilton theorem, replacing x by the matrix A gives p(A) = 0, so one has
f(A) = r(A).
Thus, the analytic function of the matrix A can be expressed as a matrix polynomial of degree less than n.
Let the remainder polynomial be
r(x) = c_0 + c_1 x + \cdots + c_{n-1}x^{n-1}.
Since p(\lambda_i) = 0, evaluating the function f(x) at the n eigenvalues of A yields
f(\lambda_i) = r(\lambda_i) = c_0 + c_1\lambda_i + \cdots + c_{n-1}\lambda_i^{n-1}, \qquad i = 1, 2, \ldots, n.
This amounts to a system of n linear equations, which can be solved to determine the coefficients c_i. Thus, one has
f(A) = \sum_{k=0}^{n-1} c_k A^k.
When the eigenvalues are repeated, that is \lambda_i = \lambda_j for some i \ne j, two or more equations are identical, and hence the linear equations cannot be solved uniquely. For such cases, for an eigenvalue \lambda with multiplicity m, the first m - 1 derivatives of p(x) vanish at that eigenvalue. This leads to the extra m - 1 linearly independent equations
\left.\frac{d^k f(x)}{dx^k}\right|_{x=\lambda} = \left.\frac{d^k r(x)}{dx^k}\right|_{x=\lambda}, \qquad k = 1, 2, \ldots, m-1,
which, combined with the others, yield the required n equations to solve for the c_i.
Finding a polynomial that passes through the points (\lambda_i, f(\lambda_i)) is essentially an interpolation problem, and can be solved using Lagrange or Newton interpolation techniques, leading to Sylvester's formula.
For example, suppose the task is to find the polynomial representation of
f(A) = e^{At}, \qquad A = \begin{pmatrix}1&2\\0&3\end{pmatrix}.
The characteristic polynomial is p(x) = (x-1)(x-3) = x^2 - 4x + 3, and the eigenvalues are \lambda = 1, 3. Let r(x) = c_0 + c_1 x. Evaluating f(\lambda) = r(\lambda) at the eigenvalues, one obtains two linear equations, e^{t} = c_0 + c_1 and e^{3t} = c_0 + 3c_1.
Solving the equations yields c_0 = (3e^{t} - e^{3t})/2 and c_1 = (e^{3t} - e^{t})/2. Thus, it follows that
e^{At} = c_0 I_2 + c_1 A = \begin{pmatrix}e^{t} & e^{3t}-e^{t}\\ 0 & e^{3t}\end{pmatrix}.
If, instead, the function were f(A) = \sin At, then the coefficients would have been c_0 = (3\sin t - \sin 3t)/2 and c_1 = (\sin 3t - \sin t)/2; hence
\sin(At) = c_0 I_2 + c_1 A = \begin{pmatrix}\sin t & \sin 3t - \sin t\\ 0 & \sin 3t\end{pmatrix}.
As a further example, when considering
f(A) = e^{At}, \qquad A = \begin{pmatrix}0&1\\-1&0\end{pmatrix},
then the characteristic polynomial is p(x) = x^2 + 1, and the eigenvalues are \lambda = \pm i.
As before, evaluating the function at the eigenvalues gives us the linear equations e^{it} = c_0 + i c_1 and e^{-it} = c_0 - i c_1; the solution of which gives c_0 = \tfrac{1}{2}(e^{it} + e^{-it}) = \cos t and c_1 = \tfrac{1}{2i}(e^{it} - e^{-it}) = \sin t. Thus, for this case,
e^{At} = (\cos t)\,I_2 + (\sin t)\,A = \begin{pmatrix}\cos t & \sin t\\ -\sin t & \cos t\end{pmatrix},
which is a rotation matrix.
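The eigenvalue-interpolation recipe above is straightforward to check numerically; the sketch below (illustrative, with t fixed to an arbitrary value) compares the closed form (cos t)I_2 + (sin t)A against SciPy's general matrix exponential.

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0],
              [-1.0, 0.0]])
t = 0.7                                   # arbitrary test value

# Cayley–Hamilton / interpolation form: e^{At} = cos(t) I + sin(t) A
ch_form = np.cos(t) * np.eye(2) + np.sin(t) * A
print(np.allclose(ch_form, expm(A * t)))  # True
```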
A standard example of such usage is the exponential map from the Lie algebra of a matrix Lie group into the group. It is given by a matrix exponential,
e^{tA} = \sum_{k=0}^{\infty}\frac{t^k A^k}{k!}.
Such expressions have long been known for SU(2),
e^{i(\theta/2)(\hat{n}\cdot\sigma)} = I_2\cos(\theta/2) + i(\hat{n}\cdot\sigma)\sin(\theta/2),
where the \sigma are the Pauli matrices, and for SO(3),
e^{i\theta(\hat{n}\cdot\mathbf{J})} = I_3 + i(\hat{n}\cdot\mathbf{J})\sin\theta + (\hat{n}\cdot\mathbf{J})^2(\cos\theta - 1),
which is Rodrigues' rotation formula. For the notation, see 3D rotation group#A note on Lie algebras.
More recently, expressions have appeared for other groups, like the Lorentz group SO(3, 1), O(4, 2) and SU(2, 2), as well as GL(n, \R). The group O(4, 2) is the conformal group of spacetime, SU(2, 2) its simply connected cover (to be precise, the simply connected cover of the connected component SO^+(4, 2) of O(4, 2)). The expressions obtained apply to the standard representation of these groups. They require knowledge of (some of) the eigenvalues of the matrix to exponentiate. For SU(2) (and hence for SO(3)), closed expressions have been obtained for all irreducible representations, i.e. of any spin.
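As a further illustrative check (added here, not taken from the text), the SO(3) closed form follows from the relation K^3 = -K satisfied by a 3 × 3 cross-product generator K; the sketch below compares the resulting Rodrigues-type formula with a direct matrix exponential.

```python
import numpy as np
from scipy.linalg import expm

n = np.array([0.0, 0.0, 1.0])             # unit rotation axis (z-axis here)
K = np.array([[0.0, -n[2], n[1]],         # K v = n x v; K satisfies K^3 = -K
              [n[2], 0.0, -n[0]],
              [-n[1], n[0], 0.0]])
theta = 0.9

# Rodrigues' formula, a consequence of Cayley–Hamilton for K:
R = np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)
print(np.allclose(R, expm(theta * K)))    # True
```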
The Cayley–Hamilton theorem is an effective tool for computing the minimal polynomial of algebraic integers. For example, given a finite extension
\Q[\alpha_1, \ldots, \alpha_k]
of \Q and an algebraic integer
\alpha \in \Q[\alpha_1, \ldots, \alpha_k],
which is a non-zero \Q-linear combination of the products
\alpha_1^{n_1}\cdots\alpha_k^{n_k},
we can compute the minimal polynomial of \alpha by finding a matrix representing the \Q-linear transformation
T_\alpha\colon \Q[\alpha_1, \ldots, \alpha_k] \to \Q[\alpha_1, \ldots, \alpha_k], \qquad x \mapsto \alpha x.
If we call this transformation matrix A, then we can find the minimal polynomial by applying the Cayley–Hamilton theorem to A.
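For a concrete illustration (an example added here following the stated method, not taken from the text), one can compute a polynomial satisfied by α = √2 + √3 inside Q[√2, √3] by writing the multiplication-by-α map in the basis (1, √2, √3, √6) and taking the characteristic polynomial of the resulting matrix; SymPy is used below.

```python
import sympy as sp

# Multiplication by alpha = sqrt(2) + sqrt(3) on Q[sqrt(2), sqrt(3)],
# written in the basis (1, sqrt(2), sqrt(3), sqrt(6)).  Column j holds the
# coordinates of alpha * (j-th basis element).
A = sp.Matrix([
    [0, 2, 3, 0],
    [1, 0, 0, 3],
    [1, 0, 0, 2],
    [0, 1, 1, 0],
])

lam = sp.symbols('lambda')
p = A.charpoly(lam).as_expr()
print(sp.expand(p))                      # lambda**4 - 10*lambda**2 + 1

alpha = sp.sqrt(2) + sp.sqrt(3)
print(sp.expand(p.subs(lam, alpha)))     # 0: alpha satisfies this polynomial
```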
The Cayley–Hamilton theorem is an immediate consequence of the existence of the Jordan normal form for matrices over algebraically closed fields. In this section, direct proofs are presented.
As the examples above show, obtaining the statement of the Cayley–Hamilton theorem for an n × n matrix
A = (a_{ij})_{i,j=1}^{n}
requires two steps: first the coefficients c_i of the characteristic polynomial are determined by development as a polynomial in t of the determinant
p(t) = \det(tI_n - A) = t^n + c_{n-1}t^{n-1} + \cdots + c_1 t + c_0,
and then these coefficients are used in a linear combination of powers of A that is equated to the n × n zero matrix:
A^n + c_{n-1}A^{n-1} + \cdots + c_1 A + c_0 I_n = 0.
The left-hand side can be worked out to an n × n matrix whose entries are (enormous) polynomial expressions in the set of entries a_{ij} of A, so the Cayley–Hamilton theorem states that each of these n^2 expressions equals 0. For any fixed value of n, these identities can be obtained by tedious but straightforward algebraic manipulations. None of these computations, however, can show why the Cayley–Hamilton theorem should be valid for matrices of all possible sizes n, so a uniform proof for all n is needed.
If a vector v of size n is an eigenvector of A with eigenvalue \lambda, in other words if Av = \lambda v, then
p(A)\,v = A^n v + c_{n-1}A^{n-1}v + \cdots + c_1 Av + c_0 I_n v = \lambda^n v + c_{n-1}\lambda^{n-1}v + \cdots + c_1\lambda v + c_0 v = p(\lambda)\,v,
which is the zero vector since p(\lambda) = 0 (the eigenvalues of A are precisely the roots of p(t)). This holds for all possible eigenvalues \lambda, so the two matrices equated by the theorem certainly give the same (null) result when applied to any eigenvector. Now if A admits a basis of eigenvectors, in other words if A is diagonalizable, then the Cayley–Hamilton theorem must hold for A, since two matrices that give the same values when applied to each element of a basis must be equal.
Consider now the function
e\colon M_n \to M_n
which maps n × n matrices to n × n matrices given by the formula
e(A) = p_A(A),
i.e. which takes a matrix A and plugs it into its own characteristic polynomial. Not all matrices are diagonalizable, but for matrices with complex coefficients many of them are: the set D of diagonalizable complex square matrices of a given size is dense in the set of all such square matrices (for a matrix to be diagonalizable it suffices for instance that its characteristic polynomial not have any multiple roots). Now viewed as a function
e\colon \C^{n^2} \to \C^{n^2}
(since matrices have n^2 entries), we see that this function is continuous, because the entries of the image matrix are given by polynomials in the entries of the matrix. Since e vanishes on D, and since the set D is dense, continuity forces e to vanish on the whole set of n × n complex matrices. The Cayley–Hamilton theorem therefore holds for matrices with complex entries, and must then also hold for \Q- or \R-valued matrices, since these are special cases of complex matrices.
While this provides a valid proof, the argument is not very satisfactory, since the identities represented by the theorem do not in any way depend on the nature of the matrix (diagonalizable or not), nor on the kind of entries allowed (for matrices with real entries the diagonalizable ones do not form a dense set, and it seems strange one would have to consider complex matrices to see that the Cayley–Hamilton theorem holds for them). We shall therefore now consider only arguments that prove the theorem directly for any matrix using algebraic manipulations only; these also have the benefit of working for matrices with entries in any commutative ring.
There is a great variety of such proofs of the Cayley–Hamilton theorem, of which several will be given here. They vary in the amount of abstract algebraic notions required to understand the proof. The simplest proofs use just those notions needed to formulate the theorem (matrices, polynomials with numeric entries, determinants), but involve technical computations that render somewhat mysterious the fact that they lead precisely to the correct conclusion. It is possible to avoid such details, but at the price of involving more subtle algebraic notions: polynomials with coefficients in a non-commutative ring, or matrices with unusual kinds of entries.
All proofs below use the notion of the adjugate matrix \operatorname{adj}(M) of an n × n matrix M, the transpose of its cofactor matrix. This is a matrix whose coefficients are given by polynomial expressions in the coefficients of M (in fact, by certain (n-1) × (n-1) determinants), in such a way that the following fundamental relations hold,
\operatorname{adj}(M)\cdot M = \det(M)\,I_n = M\cdot\operatorname{adj}(M).
These relations are a direct consequence of the basic properties of determinants: evaluation of the (i, j) entry of the matrix product on the left gives the expansion along column i of the determinant of the matrix obtained from M by replacing column i by a copy of column j, which is \det(M) if i = j and zero otherwise; the matrix product on the right is similar, but for expansions by rows.
Being a consequence of just algebraic expression manipulation, these relations are valid for matrices with entries in any commutative ring (commutativity must be assumed for determinants to be defined in the first place). This is important to note here, because these relations will be applied below for matrices with non-numeric entries such as polynomials.
This proof uses just the kind of objects needed to formulate the Cayley–Hamilton theorem: matrices with polynomials as entries. The matrix tI_n - A whose determinant is the characteristic polynomial of A is such a matrix, and since polynomials form a commutative ring, it has an adjugate
B = \operatorname{adj}(tI_n - A).
Then, according to the right-hand fundamental relation of the adjugate, one has
(tI_n - A)\cdot B = \det(tI_n - A)\,I_n = p(t)\,I_n.
Since B is also a matrix with polynomials in t as entries, one can, for each i, collect the coefficients of t^i in each entry to form a matrix B_i of numbers, such that one has
B = \sum_{i=0}^{n-1} t^i B_i.
(The way the entries of B are defined makes clear that no powers higher than t^{n-1} occur.) While this looks like a polynomial with matrices as coefficients, we shall not consider such a notion; it is just a way to write a matrix with polynomial entries as a linear combination of n constant matrices, and the coefficient t^i has been written to the left of the matrix to stress this point of view.
Now, one can expand the matrix product in our equation by bilinearity:
p(t)\,I_n = (tI_n - A)B = (tI_n - A)\sum_{i=0}^{n-1} t^i B_i = \sum_{i=0}^{n-1} t^{i+1}B_i - \sum_{i=0}^{n-1} t^i A B_i = t^n B_{n-1} + \sum_{i=1}^{n-1} t^i \left(B_{i-1} - A B_i\right) - A B_0.
Writing
p(t)\,I_n = t^n I_n + c_{n-1}t^{n-1}I_n + \cdots + c_1 t I_n + c_0 I_n,
one obtains an equality of two matrices with polynomial entries, written as linear combinations of constant matrices with powers of t as coefficients.
Such an equality can hold only if in any matrix position the entry that is multiplied by a given power t^i is the same on both sides; it follows that the constant matrices with coefficient t^i in both expressions must be equal. Writing these equations then for i from n down to 0, one finds
B_{n-1} = I_n, \qquad B_{i-1} - A B_i = c_i I_n \quad (1 \le i \le n-1), \qquad -A B_0 = c_0 I_n.
Finally, multiply the equation of the coefficients of t^i from the left by A^i, and sum up:
A^n B_{n-1} + \sum_{i=1}^{n-1}\left(A^i B_{i-1} - A^{i+1}B_i\right) - A B_0 = A^n + c_{n-1}A^{n-1} + \cdots + c_1 A + c_0 I_n.
The left-hand sides form a telescoping sum and cancel completely; the right-hand sides add up to p(A):
0 = p(A).
This completes the proof.
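The bookkeeping in this proof is easy to reproduce on a small example; the following sketch (illustrative only, for one particular 3 × 3 matrix) extracts the coefficient matrices B_i of the adjugate of tI_n − A with SymPy and checks the three displayed relations.

```python
import sympy as sp

t = sp.symbols('t')
A = sp.Matrix([[2, 1, 0],
               [0, 1, -1],
               [3, 0, 1]])
n = A.shape[0]
I = sp.eye(n)

B = (t*I - A).adjugate()                   # entries are polynomials in t
p = (t*I - A).det().as_poly(t)
c = [p.coeff_monomial(t**i) for i in range(n + 1)]   # c_0, ..., c_n (c_n = 1)

def coeff(expr, k):
    """Coefficient of t**k in a polynomial entry."""
    return sp.Poly(expr, t).coeff_monomial(t**k)

B_mats = [B.applyfunc(lambda e, k=i: coeff(e, k)) for i in range(n)]  # B_0..B_{n-1}

print(B_mats[n - 1] == I)                                              # B_{n-1} = I_n
print(all(B_mats[i - 1] - A*B_mats[i] == c[i]*I for i in range(1, n))) # middle relations
print(-A*B_mats[0] == c[0]*I)                                          # -A B_0 = c_0 I_n
```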
This proof is similar to the first one, but tries to give meaning to the notion of polynomial with matrix coefficients that was suggested by the expressions occurring in that proof. This requires considerable care, since it is somewhat unusual to consider polynomials with coefficients in a non-commutative ring, and not all reasoning that is valid for commutative polynomials can be applied in this setting.
Notably, while arithmetic of polynomials over a commutative ring models the arithmetic of polynomial functions, this is not the case over a non-commutative ring (in fact there is no obvious notion of polynomial function in this case that is closed under multiplication). So when considering polynomials in with matrix coefficients, the variable must not be thought of as an "unknown", but as a formal symbol that is to be manipulated according to given rules; in particular one cannot just set to a specific value.
Let
M(n, R)
be the ring of n × n matrices with entries in a commutative ring R containing the entries of A. Matrices with polynomials in the variable t as entries, such as
tI_n - A
or its adjugate B, are elements of
M(n, R[t]).
By collecting like powers of t, such matrices can be written as "polynomials" in t with constant matrices as coefficients; write
M(n, R)[t]
for the set of such polynomials. Since this set is in bijection with
M(n, R[t]),
one defines arithmetic operations on it correspondingly; in particular, multiplication is given by
\left(\sum_i M_i t^i\right)\left(\sum_j N_j t^j\right) = \sum_{i,j}\left(M_i N_j\right) t^{i+j},
respecting the order of the coefficient matrices from the two operands, which gives a non-commutative multiplication.
Thus, the identity
(tI_n - A)B = p(t)\,I_n
from the first proof can be viewed as one involving a multiplication of elements in
M(n, R)[t].
At this point, it is tempting to simply set t equal to the matrix A, which makes the first factor on the left equal to the zero matrix, and the right hand side equal to p(A); however, this is not an allowed operation when the coefficients do not commute. It is possible to define a "right-evaluation map" \operatorname{ev}_A, which replaces each t^i by the matrix power A^i of A, where one stipulates that the power is always to be multiplied on the right to the corresponding coefficient. But this map is not a ring homomorphism: the right-evaluation of a product differs in general from the product of the right-evaluations. This is so because multiplication of polynomials with matrix coefficients does not model multiplication of expressions containing unknowns: a product
M t^i \cdot N t^j = (M \cdot N)\,t^{i+j}
is defined assuming that t commutes with N, but this may fail if t is replaced by the matrix A.
One can work around this difficulty in the particular situation at hand, since the above right-evaluation map does become a ring homomorphism if the matrix A is in the center of the ring of coefficients, so that it commutes with all the coefficients of the polynomials (the argument proving this is straightforward, exactly because commuting t with coefficients is now justified after evaluation).
Now, A is not always in the center of M(n, R), but we may replace M(n, R) with a smaller ring provided it contains all the coefficients of the polynomials in question:
I_n,
A, and the coefficients
B_i
of the polynomial B. The obvious candidate for such a subring is the centralizer Z of A, the subring of all matrices that commute with A; by definition A is in the center of Z.
This centralizer obviously contains
I_n
and A, but one has to show that it contains the matrices
B_i.
To do this, one combines the two fundamental relations for adjugates, writing out the adjugate B as a polynomial:
\left(\sum_{i=0}^{n-1} B_i t^i\right)(tI_n - A) = (tI_n - A)\left(\sum_{i=0}^{n-1} B_i t^i\right),
\sum_{i=0}^{n-1}\left(B_i t^{i+1} - B_i A\, t^i\right) = \sum_{i=0}^{n-1}\left(B_i t^{i+1} - A B_i\, t^i\right).
Equating the coefficients shows that for each i, we have A B_i = B_i A, as desired. Having found the proper setting in which \operatorname{ev}_A is indeed a homomorphism of rings, one can complete the proof as suggested above:
\operatorname{ev}_A\bigl(p(t)\,I_n\bigr) = \operatorname{ev}_A\bigl((tI_n - A)B\bigr),
p(A) = \operatorname{ev}_A(tI_n - A)\cdot\operatorname{ev}_A(B) = (A - A)\cdot\operatorname{ev}_A(B) = 0.
This completes the proof.
In the first proof, one was able to determine the coefficients B_i of B based on the right-hand fundamental relation for the adjugate only. In fact the first n equations derived can be interpreted as determining the quotient B of the Euclidean division of the polynomial p(t)\,I_n on the left by the monic polynomial tI_n - A, while the final equation expresses the fact that the remainder is zero. This division is performed in the ring of polynomials with matrix coefficients. Indeed, even over a non-commutative ring, Euclidean division by a monic polynomial P is defined, and always produces a unique quotient and remainder with the same degree condition as in the commutative case, provided it is specified at which side one wishes P to be a factor (here that is to the left).
To see that quotient and remainder are unique (which is the important part of the statement here), it suffices to write
PQ + r = PQ' + r'
as
P(Q - Q') = r' - r
and observe that since P is monic, P(Q - Q') cannot have a degree less than that of P, unless Q = Q'. But r' - r has degree strictly less than that of P, so one must have Q = Q', and then r = r' as well.
But the dividend p(t)\,I_n and divisor tI_n - A used here both lie in the subring (R[A])[t], where R[A] is the subring of the matrix ring M(n, R) generated by A: the R-linear span of all powers of A. Therefore, the Euclidean division can in fact be performed within that commutative polynomial ring, and of course it then gives the same quotient B and remainder 0 as in the larger ring; in particular this shows that B in fact lies in (R[A])[t].
But, in this commutative setting, it is valid to set t to A in the equation
p(t)\,I_n = (tI_n - A)B;
in other words, to apply the evaluation map
\operatorname{ev}_A\colon (R[A])[t] \to R[A],
which is a ring homomorphism, giving
p(A) = \operatorname{ev}_A(tI_n - A)\cdot\operatorname{ev}_A(B) = 0\cdot\operatorname{ev}_A(B) = 0,
just like in the second proof, as desired.
In addition to proving the theorem, the above argument tells us that the coefficients B_i of B are polynomials in A, while from the second proof we only knew that they lie in the centralizer Z of A; in general Z is a larger subring than R[A], and not necessarily commutative. In particular the constant term B_0 = \operatorname{adj}(-A) lies in R[A]. Since A is an arbitrary square matrix, this proves that \operatorname{adj}(A) can always be expressed as a polynomial in A (with coefficients that depend on A).
In fact, the equations found in the first proof allow successively expressing
B_{n-1},\ldots,B_1,B_0
as polynomials in A, which leads to the identity
\operatorname{adj}(-A) = B_0 = A^{n-1} + c_{n-1}A^{n-2} + \cdots + c_2 A + c_1 I_n,
valid for all n × n matrices. Note that this identity also implies the statement of the Cayley–Hamilton theorem: one may move \operatorname{adj}(-A) to the right hand side, multiply the resulting equation (on the left or on the right) by A, and use the fact that
-A\cdot\operatorname{adj}(-A) = \operatorname{adj}(-A)\cdot(-A) = \det(-A)\,I_n = c_0 I_n.
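This expression of the adjugate as a polynomial in A can be checked directly on an example; the sketch below (illustrative only) does so with SymPy for an arbitrary 3 × 3 integer matrix.

```python
import sympy as sp

A = sp.Matrix([[1, 2, 0],
               [3, -1, 4],
               [0, 5, 2]])
n = A.shape[0]
t = sp.symbols('t')

p = (t*sp.eye(n) - A).det().as_poly(t)
c = [p.coeff_monomial(t**i) for i in range(n + 1)]   # c_0, ..., c_n with c_n = 1

# adj(-A) = A^(n-1) + c_{n-1} A^(n-2) + ... + c_2 A + c_1 I_n
poly_in_A = sum((c[i + 1] * A**i for i in range(n)), sp.zeros(n, n))
print((-A).adjugate() == poly_in_A)                  # True
```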
See also: Faddeev–LeVerrier algorithm.
As was mentioned above, the matrix p(A) in the statement of the theorem is obtained by first evaluating the determinant and then substituting the matrix A for t; doing that substitution into the matrix
tI_n - A
before evaluating the determinant is not meaningful. Nevertheless, it is possible to give an interpretation where p(A) is obtained directly as the value of a certain determinant, but this requires a more complicated setting, one of matrices over a ring in which one can interpret both the entries
A_{i,j}
of A, and A itself. One could take for this the ring M(n, R) of n × n matrices over R, where the entry
A_{i,j}
is realised as
A_{i,j}I_n,
and A as itself. But considering matrices with matrices as entries might cause confusion with block matrices, which is not intended, as that gives the wrong notion of determinant (the determinant of a matrix is defined as a sum of products of its entries, and in the case of a block matrix this is generally not the same as the corresponding sum of products of its blocks). It is clearer to distinguish A from the endomorphism \varphi of an n-dimensional vector space V (or free R-module if R is not a field) defined by it in a basis
e_1,\ldots,e_n,
and to take matrices over the ring \operatorname{End}(V) of all such endomorphisms. Then \varphi \in \operatorname{End}(V) is a possible matrix entry, while A designates the element of M(n, \operatorname{End}(V)) whose (i, j) entry is the endomorphism of scalar multiplication by
A_{i,j};
similarly
I_n
will be interpreted as an element of M(n, \operatorname{End}(V)). However, since \operatorname{End}(V) is not a commutative ring, no determinant is defined on M(n, \operatorname{End}(V)); this can only be done for matrices over a commutative subring of \operatorname{End}(V). Now the entries of the matrix
\varphi I_n - A
all lie in the subring R[\varphi] generated by the identity and \varphi, which is commutative. Then a determinant map M(n, R[\varphi]) \to R[\varphi] is defined, and
\det(\varphi I_n - A)
evaluates to the value p(\varphi) of the characteristic polynomial of A at \varphi; the Cayley–Hamilton theorem states that p(\varphi) is the null endomorphism.
In this form, the following proof can be obtained from that of a more general statement related to the Nakayama lemma (one takes for the ideal in that proposition the whole ring R). The fact that A is the matrix of \varphi in the basis e_1, \ldots, e_n means that
\varphi(e_i) = \sum_{j=1}^{n} A_{j,i}\, e_j \qquad \text{for } i = 1, \ldots, n.
One can interpret these as n components of one equation in V^n, whose members can be written using the matrix-vector product M(n, \operatorname{End}(V)) \times V^n \to V^n that is defined as usual, but with individual entries \psi \in \operatorname{End}(V) and v \in V being "multiplied" by forming
\psi(v);
this gives
\varphi I_n \cdot E = A^{\operatorname{tr}} \cdot E,
where
E \in V^n
is the element whose component i is e_i (in other words it is the basis e_1, \ldots, e_n of V written as a column of vectors). Writing this equation as
\left(\varphi I_n - A^{\operatorname{tr}}\right)\cdot E = 0 \in V^n,
one recognizes the transpose of the matrix
\varphi I_n - A
considered above, whose determinant (as an element of M(n, R[\varphi])) is also p(\varphi). To derive from this equation that p(\varphi) = 0 \in \operatorname{End}(V), one left-multiplies by the adjugate matrix of
\varphi I_n - A^{\operatorname{tr}},
which is defined in the matrix ring M(n, R[\varphi]), giving
0 = \operatorname{adj}\!\left(\varphi I_n - A^{\operatorname{tr}}\right)\cdot\left(\left(\varphi I_n - A^{\operatorname{tr}}\right)\cdot E\right) = \left(\operatorname{adj}\!\left(\varphi I_n - A^{\operatorname{tr}}\right)\cdot\left(\varphi I_n - A^{\operatorname{tr}}\right)\right)\cdot E = \left(p(\varphi)\,I_n\right)\cdot E.
Component i of this equation says that p(\varphi)(e_i) = 0 \in V; thus p(\varphi) vanishes on all the e_i, and since these elements generate V, it follows that p(\varphi) = 0 \in \operatorname{End}(V), completing the proof.
One additional fact that follows from this proof is that the matrix A whose characteristic polynomial is taken need not be identical to the value \varphi substituted into that polynomial; it suffices that \varphi be an endomorphism of V satisfying the initial equations
\varphi(e_i) = \sum_j A_{j,i}\, e_j
for some sequence of elements e_1, \ldots, e_n that generate V (which space might have smaller dimension than n, or in case the ring R is not a field it might not be a free module at all).
One persistent elementary but incorrect argument for the theorem is to "simply" take the definition
p(\lambda) = \det(\lambda I_n - A)
and substitute A for \lambda, obtaining
p(A) = \det(A I_n - A) = \det(A - A) = \det(0) = 0.
There are many ways to see why this argument is wrong. First, in the Cayley–Hamilton theorem, p(A) is an n × n matrix. However, the right hand side of the above equation is the value of a determinant, which is a scalar. So they cannot be equated unless n = 1 (i.e. A is just a scalar). Second, in the expression
\det(\lambda I_n - A),
the variable \lambda actually occurs at the diagonal entries of the matrix
\lambda I_n - A.
To illustrate, consider the characteristic polynomial in the previous example again:
\det\begin{pmatrix}\lambda - 1 & -2\\ -3 & \lambda - 4\end{pmatrix}.
If one substitutes the entire matrix A for \lambda in those positions, one obtains
\det\begin{pmatrix}A - 1 & -2\\ -3 & A - 4\end{pmatrix},
in which the "matrix" expression is simply not a valid one. Note, however, that if scalar multiples of identity matrices instead of scalars are subtracted in the above, i.e. if the substitution is performed as
\det\begin{pmatrix}A - I_2 & -2I_2\\ -3I_2 & A - 4I_2\end{pmatrix},
then the determinant is indeed zero, but the expanded matrix in question does not evaluate to
A I_n - A;
nor can its determinant (a scalar) be compared to p(A) (an n × n matrix). So the argument that
p(A) = \det(A I_n - A) = 0
still does not apply.
Actually, if such an argument holds, it should also hold when other multilinear forms instead of the determinant are used. For instance, if we consider the permanent function and define
q(\lambda) = \operatorname{perm}(\lambda I_n - A),
then by the same argument, we should be able to "prove" that q(A) = 0. But this statement is demonstrably wrong: in the 2 × 2 case, for instance, the permanent of a matrix is given by
\operatorname{perm}\begin{pmatrix}a&b\\c&d\end{pmatrix} = ad + bc.
So, for the matrix A in the previous example,
q(\lambda) = \operatorname{perm}(\lambda I_2 - A) = \operatorname{perm}\begin{pmatrix}\lambda-1&-2\\-3&\lambda-4\end{pmatrix} = (\lambda-1)(\lambda-4) + (-2)(-3) = \lambda^2 - 5\lambda + 10.
Yet one can verify that
q(A) = A^2 - 5A + 10I_2 = 12I_2 \neq 0.
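A two-line numerical check of this counterexample (illustrative only):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
q_of_A = np.linalg.matrix_power(A, 2) - 5*A + 10*np.eye(2, dtype=int)
print(q_of_A)   # [[12  0]
                #  [ 0 12]]  -- not the zero matrix
```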
One of the proofs for the Cayley–Hamilton theorem above bears some similarity to the argument that
p(A) = \det(A I_n - A) = 0.
By introducing a matrix with non-numeric coefficients, one can actually let A live inside a matrix entry, but then
A I_n
is not equal to A, and the conclusion is reached differently.
Basic properties of Hasse–Schmidt derivations on the exterior algebra of some R-module (supposed to be free and of finite rank) have also been used to prove the Cayley–Hamilton theorem.
A proof based on developing the Leibniz formula for the characteristic polynomial was given by Straubing,[4] and a generalization was given using the trace monoid theory of Foata and Cartier.
The above proofs show that the Cayley–Hamilton theorem holds for matrices with entries in any commutative ring R, and that p(\varphi) = 0 will hold whenever \varphi is an endomorphism of an R-module generated by elements e_1, \ldots, e_n that satisfies
\varphi(e_i) = \sum_j a_{j,i}\, e_j.
This more general version of the theorem is the source of the celebrated Nakayama lemma in commutative algebra and algebraic geometry.
The Cayley–Hamilton theorem also holds for matrices over the quaternions, a noncommutative ring.[5]
Hamilton, William Rowan (1853). Lectures on Quaternions. Dublin.
Householder, Alston Scott (2006). The Theory of Matrices in Numerical Analysis. Dover Books on Mathematics. ISBN 978-0486449722.
There also exists an equivalent, related recursive algorithm introduced by Urbain Le Verrier and Dmitry Konstantinovich Faddeev—the Faddeev–LeVerrier algorithm, which reads
M_0 \equiv 0, \qquad c_n = 1 \qquad (k = 0);
M_k \equiv A M_{k-1} + c_{n-k+1} I_n, \qquad c_{n-k} = -\tfrac{1}{k}\operatorname{tr}(A M_k), \qquad k = 1, \ldots, n.
Observe A M_n + c_0 I_n = 0 as the recursion terminates. See the algebraic proof in the following section, which relies on the modes of the adjugate, B_k \equiv M_{n-k}. Specifically,
(\lambda I - A)B = I\,p(\lambda),
and the expansion of B in powers of \lambda yields the above recursions, in turn.
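A direct transcription of this recursion into code (an illustrative sketch, using floating-point arithmetic and NumPy) yields the characteristic-polynomial coefficients; at termination the relation A M_n + c_0 I_n = 0 is exactly the Cayley–Hamilton identity.

```python
import numpy as np

def faddeev_leverrier(A):
    """Return the characteristic-polynomial coefficients [c_n, ..., c_0]
    (with c_n = 1) of an n x n matrix A via the Faddeev–LeVerrier recursion."""
    n = A.shape[0]
    M = np.zeros_like(A, dtype=float)   # M_0 = 0
    c = [1.0]                           # c_n = 1
    for k in range(1, n + 1):
        M = A @ M + c[-1] * np.eye(n)   # M_k = A M_{k-1} + c_{n-k+1} I
        c.append(-np.trace(A @ M) / k)  # c_{n-k} = -(1/k) tr(A M_k)
    # At termination, A M_n + c_0 I = 0, which is the Cayley–Hamilton identity.
    return c

A = np.array([[3.0, 1.0, 5.0],
              [3.0, 3.0, 1.0],
              [4.0, 6.0, 4.0]])
print(faddeev_leverrier(A))   # compare with the eigenvalue-based result below
print(np.poly(A))
```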