See main article: Perturbation theory.

In mathematics, an eigenvalue perturbation problem is that of finding the eigenvectors and eigenvalues of a system $Ax=\lambda x$ that is perturbed from one with known eigenvectors and eigenvalues $A_0x_0=\lambda_0x_0$. This is useful for studying how sensitive the original system's eigenvectors and eigenvalues $x_{0i},\ \lambda_{0i},\ i=1,\dots,n$ are to changes in the system.
The derivations in this article are essentially self-contained and can be found in many texts on numerical linear algebra or numerical functional analysis. This article focuses on the case of the perturbation of a simple eigenvalue (see multiplicity of eigenvalues).
In the entry applications of eigenvalues and eigenvectors we find numerous scientific fields in which eigenvalues are used to obtain solutions. Generalized eigenvalue problems are less widespread, but are key in the study of vibrations. They are useful when we use the Galerkin method or the Rayleigh–Ritz method to find approximate solutions of partial differential equations modeling the vibrations of structures such as strings and plates; the paper of Courant (1943) [2] is fundamental. The finite element method is a widespread special case.
In classical mechanics, generalized eigenvalue problems appear when we look for vibrations of multiple-degrees-of-freedom systems close to equilibrium: the kinetic energy provides the mass matrix $M$, and the potential strain energy provides the stiffness matrix $K$. With both methods, we obtain a system of differential equations (a matrix differential equation)
$$M\ddot{x}+B\dot{x}+Kx=0,$$
with mass matrix $M$, damping matrix $B$, and stiffness matrix $K$. If we neglect the damping effect, we set $B=0$ and look for a solution of the form $x=e^{i\omega t}u$; we obtain that the mode shape $u$ and the squared frequency $\omega^2$ are solutions of the generalized eigenvalue problem
$$-\omega^2Mu+Ku=0.$$
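To make this reduction concrete, here is a minimal numerical sketch (the 3-degree-of-freedom spring–mass chain is a made-up example, not data from the article): `scipy.linalg.eigh` solves $Ku=\omega^2Mu$ for symmetric $K$ and positive definite $M$, and returns $M$-orthonormal eigenvectors, which is exactly the normalization (2) adopted below.

```python
import numpy as np
from scipy.linalg import eigh

# Hypothetical mass and stiffness matrices of a 3-mass spring chain.
M = np.diag([2.0, 1.0, 1.0])
K = np.array([[ 2.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  1.0]])

# eigh solves K u = lambda M u; lambda = omega^2 gives the natural
# frequencies, and the eigenvectors satisfy u_i^T M u_j = delta_ij.
lam, U = eigh(K, M)
print(np.sqrt(lam))        # natural angular frequencies omega
print(U.T @ M @ U)         # ~ identity matrix: M-orthonormality
```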
Suppose we have solutions to the generalized eigenvalue problem,
$$K_0x_{0i}=\lambda_{0i}M_0x_{0i}, \qquad (0)$$
where $K_0$ and $M_0$ are known matrices; that is, we know the eigenvalues $\lambda_{0i}$ and eigenvectors $x_{0i}$ for $i=1,\dots,N$.
Now suppose we want to change the matrices by a small amount. That is, we want to find the eigenvalues and eigenvectors of
$$Kx_i=\lambda_iMx_i, \qquad (1)$$
with
$$K=K_0+\delta K, \qquad M=M_0+\delta M,$$
where the perturbations $\delta K$ and $\delta M$ are small compared to $K$ and $M$ respectively. Then we expect the new eigenvalues and eigenvectors to be similar to the originals, up to small perturbations:
$$\lambda_i=\lambda_{0i}+\delta\lambda_i, \qquad x_i=x_{0i}+\delta x_i.$$
We assume that the matrices are symmetric and positive definite, and assume we have scaled the eigenvectors such that
$$x_{0j}^\top M_0\,x_{0i}=\delta_{ij}, \qquad x_i^\top M\,x_j=\delta_{ij}, \qquad (2)$$
where $\delta_{ij}$ is the Kronecker delta. Now we want to solve the equation
$$Kx_i-\lambda_iMx_i=0.$$
Substituting in (1), we get
$$(K_0+\delta K)(x_{0i}+\delta x_i)=\left(\lambda_{0i}+\delta\lambda_i\right)\left(M_0+\delta M\right)\left(x_{0i}+\delta x_i\right),$$
which expands to
\begin{align}
K_0x_{0i}+\delta K\,x_{0i}+K_0\,\delta x_i+\delta K\,\delta x_i={}&\lambda_{0i}M_0x_{0i}+\lambda_{0i}M_0\,\delta x_i+\lambda_{0i}\,\delta M\,x_{0i}+\delta\lambda_i\,M_0x_{0i}\\
&+\lambda_{0i}\,\delta M\,\delta x_i+\delta\lambda_i\,\delta M\,x_{0i}+\delta\lambda_i\,M_0\,\delta x_i+\delta\lambda_i\,\delta M\,\delta x_i.
\end{align}
Canceling from (0) ($K_0x_{0i}=\lambda_{0i}M_0x_{0i}$) leaves
\begin{align}
\delta K\,x_{0i}+K_0\,\delta x_i+\delta K\,\delta x_i={}&\lambda_{0i}M_0\,\delta x_i+\lambda_{0i}\,\delta M\,x_{0i}+\delta\lambda_i\,M_0x_{0i}\\
&+\lambda_{0i}\,\delta M\,\delta x_i+\delta\lambda_i\,\delta M\,x_{0i}+\delta\lambda_i\,M_0\,\delta x_i+\delta\lambda_i\,\delta M\,\delta x_i.
\end{align}
Removing the higher-order terms, this simplifies to
$$K_0\,\delta x_i+\delta K\,x_{0i}=\lambda_{0i}M_0\,\delta x_i+\lambda_{0i}\,\delta M\,x_{0i}+\delta\lambda_i\,M_0x_{0i}. \qquad (3)$$
In other words, equation (3) is the linearized equation that determines the first-order perturbations $\delta\lambda_i$ and $\delta x_i$.
As the matrix $M_0$ is symmetric, the unperturbed eigenvectors are $M_0$-orthogonal, and we can use them as a basis for the perturbation of the eigenvector:
$$\delta x_i=\sum_{j=1}^{N}\varepsilon_{ij}\,x_{0j}, \qquad (4)$$
where $\varepsilon_{ij}=x_{0j}^\top M_0\,\delta x_i$.
In the same way, substituting in (2) and removing higher-order terms, we get
$$\delta x_j^\top M_0\,x_{0i}+x_{0j}^\top M_0\,\delta x_i+x_{0j}^\top\delta M\,x_{0i}=0. \qquad (5)$$
The derivation can proceed along two forks.

For the first fork, we start with (3),
$$K_0\,\delta x_i+\delta K\,x_{0i}=\lambda_{0i}M_0\,\delta x_i+\lambda_{0i}\,\delta M\,x_{0i}+\delta\lambda_i\,M_0x_{0i};$$
we left-multiply with $x_{0i}^\top$ and use the unperturbed problem (0) together with the normalization $x_{0i}^\top M_0x_{0i}=1$ to obtain
$$x_{0i}^\top\delta K\,x_{0i}=\lambda_{0i}\,x_{0i}^\top\delta M\,x_{0i}+\delta\lambda_i,$$
that is,
$$\delta\lambda_i=x_{0i}^\top\delta K\,x_{0i}-\lambda_{0i}\,x_{0i}^\top\delta M\,x_{0i}.$$
We notice that $\delta\lambda_i$ is the first-order variation of the generalized Rayleigh quotient with fixed $x_{0i}$:
$$R(K,M;x_{0i})=x_{0i}^\top K\,x_{0i}\big/\left(x_{0i}^\top M\,x_{0i}\right), \qquad \text{with } x_{0i}^\top M\,x_{0i}=1.$$
Moreover, for $M=I$ the formula reduces to $\delta\lambda_i=x_{0i}^\top\delta K\,x_{0i}$, the classical result for the ordinary eigenvalue problem.
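As a numerical sanity check of this first-order formula, the following sketch (randomly generated symmetric positive definite test matrices, not data from the article) compares the prediction $\lambda_{0i}+\delta\lambda_i$ with the exactly recomputed eigenvalues:

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
n = 5
def spd():
    # random symmetric positive definite test matrix
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)

K0, M0 = spd(), spd()
lam0, X0 = eigh(K0, M0)              # columns of X0 are M0-orthonormal
t = 1e-6                             # size of the perturbation
dK, dM = t * spd(), t * spd()

lam, _ = eigh(K0 + dK, M0 + dM)      # exact perturbed eigenvalues
# first-order prediction for every eigenvalue
pred = lam0 + np.array([X0[:, i] @ dK @ X0[:, i]
                        - lam0[i] * (X0[:, i] @ dM @ X0[:, i])
                        for i in range(n)])
print(np.max(np.abs(lam - pred)))    # O(t^2): far smaller than t
```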
Next, we left-multiply (3) with $x_{0j}^\top$, $j\ne i$, and obtain
$$x_{0j}^\top K_0\,\delta x_i+x_{0j}^\top\delta K\,x_{0i}=\lambda_{0i}\,x_{0j}^\top M_0\,\delta x_i+\lambda_{0i}\,x_{0j}^\top\delta M\,x_{0i}+\delta\lambda_i\,x_{0j}^\top M_0x_{0i}.$$
We use $x_{0j}^\top K_0=\lambda_{0j}\,x_{0j}^\top M_0$ and $x_{0j}^\top M_0x_{0i}=0$ for $j\ne i$, which gives
$$\lambda_{0j}\,x_{0j}^\top M_0\,\delta x_i+x_{0j}^\top\delta K\,x_{0i}=\lambda_{0i}\,x_{0j}^\top M_0\,\delta x_i+\lambda_{0i}\,x_{0j}^\top\delta M\,x_{0i},$$
or, rearranging,
$$(\lambda_{0j}-\lambda_{0i})\,x_{0j}^\top M_0\,\delta x_i+x_{0j}^\top\delta K\,x_{0i}=\lambda_{0i}\,x_{0j}^\top\delta M\,x_{0i}.$$
Because the eigenvalues are simple, $\lambda_{0j}-\lambda_{0i}\ne 0$ for $j\ne i$, and we obtain the coefficients of (4):
$$\varepsilon_{ij}=x_{0j}^\top M_0\,\delta x_i=\frac{\lambda_{0i}\,x_{0j}^\top\delta M\,x_{0i}-x_{0j}^\top\delta K\,x_{0i}}{\lambda_{0j}-\lambda_{0i}}, \qquad i=1,\dots,N;\ j=1,\dots,N;\ j\ne i.$$
The remaining coefficient follows from the first-order variation (5) of the normalization, taken with $j=i$:
$$2\varepsilon_{ii}=2\,x_{0i}^\top M_0\,\delta x_i=-x_{0i}^\top\delta M\,x_{0i}.$$
The perturbation $\delta x_i$ is then recovered from (4).
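These coefficients can also be checked numerically. The following sketch (same kind of random test matrices as in the check above; all names hypothetical) assembles $\delta x_i$ from the $\varepsilon_{ij}$ and compares it with the exact perturbed eigenvector, after aligning the arbitrary sign of the computed eigenvector:

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
n = 5
def spd():
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)

K0, M0 = spd(), spd()
lam0, X0 = eigh(K0, M0)                  # M0-orthonormal basis x_0j
t = 1e-6
dK, dM = t * spd(), t * spd()

i = 2                                    # eigenpair to perturb
eps = np.empty(n)
for j in range(n):
    if j == i:                           # from 2*eps_ii = -x0i^T dM x0i
        eps[j] = -0.5 * (X0[:, i] @ dM @ X0[:, i])
    else:                                # coefficients eps_ij of (4)
        eps[j] = (lam0[i] * (X0[:, j] @ dM @ X0[:, i])
                  - X0[:, j] @ dK @ X0[:, i]) / (lam0[j] - lam0[i])
dx = X0 @ eps                            # delta x_i = sum_j eps_ij x_0j

_, X = eigh(K0 + dK, M0 + dM)
xi = X[:, i] * np.sign(X[:, i] @ M0 @ X0[:, i])   # fix sign freedom
print(np.linalg.norm(xi - (X0[:, i] + dx)))       # O(t^2)
```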
For the second fork, substituting (4) into (3) and rearranging gives
\begin{align}
K_0\sum_{j=1}^{N}\varepsilon_{ij}x_{0j}+\delta K\,x_{0i}&=\lambda_{0i}M_0\sum_{j=1}^{N}\varepsilon_{ij}x_{0j}+\lambda_{0i}\,\delta M\,x_{0i}+\delta\lambda_i\,M_0x_{0i}\\
\sum_{j=1}^{N}\varepsilon_{ij}K_0x_{0j}+\delta K\,x_{0i}&=\lambda_{0i}M_0\sum_{j=1}^{N}\varepsilon_{ij}x_{0j}+\lambda_{0i}\,\delta M\,x_{0i}+\delta\lambda_i\,M_0x_{0i}&&\text{(applying }K_0\text{ to the sum)}\\
\sum_{j=1}^{N}\varepsilon_{ij}\lambda_{0j}M_0x_{0j}+\delta K\,x_{0i}&=\lambda_{0i}M_0\sum_{j=1}^{N}\varepsilon_{ij}x_{0j}+\lambda_{0i}\,\delta M\,x_{0i}+\delta\lambda_i\,M_0x_{0i}&&\text{(using Eq. (0))}
\end{align}
Because the eigenvectors are $M_0$-orthogonal when $M_0$ is positive definite, we can remove the summations by left-multiplying by $x_{0i}^\top$:
$$x_{0i}^\top\varepsilon_{ii}\lambda_{0i}M_0x_{0i}+x_{0i}^\top\delta K\,x_{0i}=\lambda_{0i}\,x_{0i}^\top M_0\,\varepsilon_{ii}x_{0i}+\lambda_{0i}\,x_{0i}^\top\delta M\,x_{0i}+\delta\lambda_i\,x_{0i}^\top M_0x_{0i}.$$
By use of equation (0) again:
$$x_{0i}^\top K_0\,\varepsilon_{ii}x_{0i}+x_{0i}^\top\delta K\,x_{0i}=\lambda_{0i}\,x_{0i}^\top M_0\,\varepsilon_{ii}x_{0i}+\lambda_{0i}\,x_{0i}^\top\delta M\,x_{0i}+\delta\lambda_i\,x_{0i}^\top M_0x_{0i}. \qquad (6)$$
The two terms containing $\varepsilon_{ii}$ are equal, because left-multiplying (0) by $x_{0i}^\top$ gives
$$x_{0i}^\top K_0\,x_{0i}=\lambda_{0i}\,x_{0i}^\top M_0\,x_{0i}.$$
Canceling those terms in (6) leaves
$$x_{0i}^\top\delta K\,x_{0i}=\lambda_{0i}\,x_{0i}^\top\delta M\,x_{0i}+\delta\lambda_i\,x_{0i}^\top M_0\,x_{0i}.$$
Rearranging gives
$$\delta\lambda_i=\frac{x_{0i}^\top\left(\delta K-\lambda_{0i}\,\delta M\right)x_{0i}}{x_{0i}^\top M_0\,x_{0i}}.$$
But by (2), this denominator is equal to 1. Thus
$$\delta\lambda_i=x_{0i}^\top\left(\delta K-\lambda_{0i}\,\delta M\right)x_{0i}.$$
Then, as $\lambda_i\ne\lambda_k$ for $i\ne k$, left-multiplying the expanded equation by $x_{0k}^\top$ gives
$$\varepsilon_{ik}=\frac{x_{0k}^\top\left(\delta K-\lambda_{0i}\,\delta M\right)x_{0i}}{\lambda_{0i}-\lambda_{0k}}, \qquad i\ne k.$$
Or by changing the name of the indices:
$$\varepsilon_{ij}=\frac{x_{0j}^\top\left(\delta K-\lambda_{0i}\,\delta M\right)x_{0i}}{\lambda_{0i}-\lambda_{0j}}, \qquad i\ne j.$$
To find $\varepsilon_{ii}$, use the fact that $x_i^\top M\,x_i=1$ implies
$$\varepsilon_{ii}=-\tfrac{1}{2}\,x_{0i}^\top\delta M\,x_{0i}.$$
In the case where all the matrices are Hermitian positive definite and all the eigenvalues are distinct,
\begin{align}
\lambda_i&=\lambda_{0i}+x_{0i}^\top\left(\delta K-\lambda_{0i}\,\delta M\right)x_{0i}\\
x_i&=x_{0i}\left(1-\tfrac{1}{2}\,x_{0i}^\top\delta M\,x_{0i}\right)+\sum_{j=1\atop j\ne i}^{N}\frac{x_{0j}^\top\left(\delta K-\lambda_{0i}\,\delta M\right)x_{0i}}{\lambda_{0i}-\lambda_{0j}}\,x_{0j}
\end{align}
for infinitesimal $\delta K$ and $\delta M$.
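These first-order formulas can also be probed empirically. In the sketch below (random symmetric positive definite test matrices, a hypothetical setup), shrinking the perturbation size $t$ tenfold shrinks the residual roughly a hundredfold, consistent with a remainder of second order in $\|\delta K\|+\|\delta M\|$:

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(1)
n = 5
def spd():
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)

K0, M0 = spd(), spd()
lam0, X0 = eigh(K0, M0)
dK1, dM1 = spd(), spd()                  # fixed perturbation directions

for t in [1e-2, 1e-3, 1e-4]:
    lam, _ = eigh(K0 + t * dK1, M0 + t * dM1)
    pred = lam0 + t * np.array([X0[:, i] @ dK1 @ X0[:, i]
                                - lam0[i] * (X0[:, i] @ dM1 @ X0[:, i])
                                for i in range(n)])
    # the residual shrinks like t^2, i.e. it is o(t), as claimed
    print(t, np.max(np.abs(lam - pred)))
```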
So far, we have not proved that these higher-order terms may be neglected. This point can be derived using the implicit function theorem; in the next section, we summarize the use of this theorem in order to obtain a first-order expansion.
In the next paragraph, we shall use the implicit function theorem (statement of the theorem). We notice that for a continuously differentiable function $f:\R^{n+m}\to\R^m,\ (x,y)\mapsto f(x,y)$, with an invertible partial Jacobian $J_{f,y}(x_0,y_0)$ at a point $(x_0,y_0)$ such that $f(x_0,y_0)=0$, the equation $f(x,y)=0$ defines, for $x$ close to $x_0$, a continuously differentiable function $y=g(x)$ whose Jacobian satisfies
$$J_{f,y}(x,g(x))\,J_{g,x}(x)+J_{f,x}(x,g(x))=0. \qquad (6')$$
The first-order perturbation of $g$ follows: writing $f(x_0+\delta x,\,y_0+\delta y)=0$ and keeping first-order terms,
$$J_{f,x}(x,g(x))\,\delta x+J_{f,y}(x,g(x))\,\delta y=0,$$
so that $\delta y=J_{g,x}(x)\,\delta x$, which is equivalent to $(6')$.
We use the previous paragraph (perturbation of an implicit function) with somewhat different notation suited to eigenvalue perturbation. We introduce
$$\tilde{f}:\R^{2n^2}\times\R^{n+1}\to\R^{n+1}, \qquad \tilde{f}(K,M,\lambda,x)=\binom{f(K,M,\lambda,x)}{f_{n+1}(M,x)},$$
with
$$f(K,M,\lambda,x)=Kx-\lambda Mx, \qquad f_{n+1}(M,x)=x^\top Mx-1.$$
In order to apply the implicit function theorem, we study the invertibility of the partial Jacobian $J_{\tilde{f};\lambda,x}(K,M;\lambda_{0i},x_{0i})$, which acts on $(\delta\lambda,\delta x_i)$ as
$$J_{\tilde{f};\lambda,x}(K,M;\lambda_i,x_i)(\delta\lambda,\delta x_i)=\binom{-Mx_i}{0}\delta\lambda+\binom{K-\lambda M}{2x_i^\top M}\delta x_i.$$
The solution of
$$J_{\tilde{f};\lambda,x}(K,M;\lambda_{0i},x_{0i})(\delta\lambda_i,\delta x_i)=\binom{y}{y_{n+1}}$$
may be computed by expanding $\delta x_i$ on the basis of unperturbed eigenvectors: left-multiplying the first row by $x_{0i}^\top$ gives $\delta\lambda_i=-x_{0i}^\top y$; left-multiplying by $x_{0j}^\top$, $j\ne i$, gives $x_{0j}^\top M\,\delta x_i=x_{0j}^\top y/(\lambda_{0j}-\lambda_{0i})$; and the last row gives $2x_{0i}^\top M\,\delta x_i=y_{n+1}$. When $\lambda_{0i}$ is a simple eigenvalue, since the eigenvectors $x_{0j},\ j=1,\dots,n,$ form an orthonormal basis, we obtain a unique solution for any right-hand side: the Jacobian is invertible.
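Numerically, the pair $(\delta x_i,\delta\lambda_i)$ can be obtained by solving this Jacobian system directly. The sketch below (random test matrices, hypothetical names; the bordered-matrix assembly is just one way to organize the system above) recovers the same $\delta\lambda_i$ as the explicit formula:

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
n = 5
def spd():
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)

K0, M0 = spd(), spd()
lam0, X0 = eigh(K0, M0)
t = 1e-6
dK, dM = t * spd(), t * spd()

i = 2
x0, l0 = X0[:, i], lam0[i]
# bordered Jacobian of f = K x - lambda M x, f_{n+1} = x^T M x - 1,
# with respect to (x, lambda), at the unperturbed eigenpair
J = np.block([[K0 - l0 * M0,           -(M0 @ x0)[:, None]],
              [2 * (M0 @ x0)[None, :],  np.zeros((1, 1))]])
rhs = -np.concatenate([(dK - l0 * dM) @ x0, [x0 @ dM @ x0]])
sol = np.linalg.solve(J, rhs)            # invertible: lambda_0i is simple
dx, dl = sol[:-1], sol[-1]
print(dl, x0 @ dK @ x0 - l0 * (x0 @ dM @ x0))   # same first-order value
```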
The implicit function theorem then provides a continuously differentiable function $(K,M)\mapsto(\lambda_i(K,M),x_i(K,M))$, hence the expansions
$$\lambda_i=\lambda_{0i}+\delta\lambda_i+o(\|\delta K\|+\|\delta M\|), \qquad x_i=x_{0i}+\delta x_i+o(\|\delta K\|+\|\delta M\|),$$
with
$$\delta\lambda_i=x_{0i}^\top\delta K\,x_{0i}-\lambda_{0i}\,x_{0i}^\top\delta M\,x_{0i};$$
$$\delta x_i=\sum_{j\ne i}\left(x_{0j}^\top M_0\,\delta x_i\right)x_{0j}, \quad\text{with}\quad x_{0j}^\top M_0\,\delta x_i=\frac{\lambda_{0i}\,x_{0j}^\top\delta M\,x_{0i}-x_{0j}^\top\delta K\,x_{0i}}{\lambda_{0j}-\lambda_{0i}}, \qquad i=1,\dots,n;\ j=1,\dots,n;\ j\ne i,$$
the component along $x_{0i}$ being fixed by the normalization, as before.
This means it is possible to efficiently do a sensitivity analysis on $\lambda_i$ as a function of changes in the entries of the matrices. (Recall that the matrices are symmetric, so changing $K_{(k\ell)}$ will also change $K_{(\ell k)}$, hence the $(2-\delta_{k\ell})$ term below.)
\begin{align}
\frac{\partial\lambda_i}{\partial K_{(k\ell)}}&=\frac{\partial}{\partial K_{(k\ell)}}\left(\lambda_{0i}+x_{0i}^\top\left(\delta K-\lambda_{0i}\,\delta M\right)x_{0i}\right)=x_{0i(k)}\,x_{0i(\ell)}\left(2-\delta_{k\ell}\right)\\
\frac{\partial\lambda_i}{\partial M_{(k\ell)}}&=\frac{\partial}{\partial M_{(k\ell)}}\left(\lambda_{0i}+x_{0i}^\top\left(\delta K-\lambda_{0i}\,\delta M\right)x_{0i}\right)=-\lambda_i\,x_{0i(k)}\,x_{0i(\ell)}\left(2-\delta_{k\ell}\right).
\end{align}
Similarly,
\begin{align}
\frac{\partial x_i}{\partial K_{(k\ell)}}&=\sum_{j=1\atop j\ne i}^{N}\frac{x_{0j(k)}\,x_{0i(\ell)}\left(2-\delta_{k\ell}\right)}{\lambda_{0i}-\lambda_{0j}}\,x_{0j}\\
\frac{\partial x_i}{\partial M_{(k\ell)}}&=-x_{0i}\,\frac{x_{0i(k)}\,x_{0i(\ell)}}{2}\left(2-\delta_{k\ell}\right)-\sum_{j=1\atop j\ne i}^{N}\frac{\lambda_{0i}\,x_{0j(k)}\,x_{0i(\ell)}}{\lambda_{0i}-\lambda_{0j}}\,x_{0j}\left(2-\delta_{k\ell}\right).
\end{align}
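The entry-wise eigenvalue derivative is easy to validate by finite differences. The sketch below (random test matrices, hypothetical index choices) perturbs the symmetric entry pair $(k,\ell)$ of $K$ and compares with $x_{0i(k)}\,x_{0i(\ell)}(2-\delta_{k\ell})$:

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
n = 5
def spd():
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)

K0, M0 = spd(), spd()
lam0, X0 = eigh(K0, M0)

i, k, l = 1, 0, 2                        # eigenvalue i, entry (k, l)
h = 1e-7
dK = np.zeros((n, n))
dK[k, l] += h
if k != l:
    dK[l, k] += h                        # keep K symmetric
lam, _ = eigh(K0 + dK, M0)
fd = (lam[i] - lam0[i]) / h              # finite-difference derivative
formula = X0[k, i] * X0[l, i] * (2.0 - (k == l))
print(fd, formula)                       # agree up to O(h)
```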
A simple case is $K=\begin{bmatrix}2&b\\ b&0\end{bmatrix}$; its smallest eigenvalue is $\lambda=1-\sqrt{b^2+1}$, which can be computed directly, giving
$$\frac{\partial\lambda}{\partial b}=\frac{-x}{\sqrt{x^2+1}},$$
evaluated at $b=x$. The corresponding (unnormalized) eigenvector is $\tilde{x}_0=[x,\ -(\sqrt{x^2+1}+1)]^\top$. For the normalized eigenvector, $x_{01}x_{02}=\tilde{x}_{01}\tilde{x}_{02}/\|\tilde{x}_0\|^2$, with
$$\|\tilde{x}_0\|^2=2\sqrt{x^2+1}\left(\sqrt{x^2+1}+1\right), \qquad \tilde{x}_{01}\tilde{x}_{02}=-x\left(\sqrt{x^2+1}+1\right),$$
so that
$$x_{01}x_{02}=-\frac{x}{2\sqrt{x^2+1}}.$$
The sensitivity formula then gives
$$\frac{\partial\lambda}{\partial b}=2x_{01}x_{02},$$
in agreement with the direct computation, and $\delta\lambda=2x_{01}x_{02}\,\delta b$.
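The whole example can be replayed numerically in a few lines (a sketch; the value $b=0.7$ is an arbitrary choice):

```python
import numpy as np

b = 0.7
K = np.array([[2.0, b], [b, 0.0]])
lam, X = np.linalg.eigh(K)               # eigenvalues in ascending order
x0 = X[:, 0]                             # eigenvector of the smallest one
print(lam[0], 1 - np.sqrt(b**2 + 1))     # same value
print(2 * x0[0] * x0[1],                 # sensitivity formula 2*x01*x02
      -b / np.sqrt(b**2 + 1))            # closed-form d(lambda)/db
```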
Note that in the above example we assumed that both the unperturbed and the perturbed systems involved symmetric matrices, which guaranteed the existence of $N$ linearly independent eigenvectors. An eigenvalue problem involving non-symmetric matrices is not guaranteed to have $N$ linearly independent eigenvectors, though a sufficient condition is that $K$ and $M$ be simultaneously diagonalizable.
A technical report of Rellich [4] on the perturbation of eigenvalue problems provides several examples; the elementary examples are in chapter 2, and the report may be downloaded from archive.org. We present an example in which the eigenvectors behave badly.
Consider the matrices
$$B(\epsilon)=\epsilon\begin{bmatrix}\cos(2/\epsilon)&\sin(2/\epsilon)\\ \sin(2/\epsilon)&-\cos(2/\epsilon)\end{bmatrix}, \qquad A(\epsilon)=I-e^{-1/\epsilon^2}B(\epsilon); \quad A(0)=I.$$
For $\epsilon\ne 0$, the matrix $A(\epsilon)$ has eigenvectors
$$\Phi_1=[\cos(1/\epsilon),\ \sin(1/\epsilon)]^T, \qquad \Phi_2=[\sin(1/\epsilon),\ -\cos(1/\epsilon)]^T,$$
belonging to the eigenvalues
$$\lambda_1=1-\epsilon\,e^{-1/\epsilon^2}, \qquad \lambda_2=1+\epsilon\,e^{-1/\epsilon^2}.$$
Since $\lambda_1\ne\lambda_2$ for $\epsilon\ne 0$, any normalized eigenvectors $u_j(\epsilon),\ j=1,2,$ belonging to the eigenvalues $\lambda_j(\epsilon),\ j=1,2,$ are necessarily of the form
$$u_j=e^{i\alpha_j(\epsilon)}\,\Phi_j(\epsilon),$$
where $\alpha_j,\ j=1,2,$ are real-valued functions defined for $\epsilon\ne 0$. However, no choice of $\alpha_1(\epsilon)$ makes $u_1(\epsilon)$ continuous at $\epsilon=0$: as $\epsilon\to 0$, the magnitude of its first component, $|\cos(1/\epsilon)|$, oscillates and has no limit.
Note that in this example the entries $A_{jk}(\epsilon)$ of $A(\epsilon)$ depend infinitely differentiably on $\epsilon$; the bad behavior is entirely in the eigenvectors.
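The oscillation is easy to observe numerically. A sketch of the matrix as reconstructed above (note that in double precision the factor $e^{-1/\epsilon^2}$ falls below machine precision relative to 1 once $\epsilon$ is much below $0.2$, so only moderate values of $\epsilon$ are usable):

```python
import numpy as np

def A(eps):
    c, s = np.cos(2 / eps), np.sin(2 / eps)
    B = eps * np.array([[c, s], [s, -c]])
    return np.eye(2) - np.exp(-1 / eps**2) * B

for eps in [0.5, 0.35, 0.25, 0.2]:
    _, V = np.linalg.eigh(A(eps))
    # |first component| of the eigenvector of the smaller eigenvalue
    # equals |cos(1/eps)|, which keeps oscillating as eps -> 0
    print(eps, abs(V[0, 0]), abs(np.cos(1 / eps)))
```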
This example is less nasty than the previous one. Suppose $[K_0]$ is the $2\times 2$ identity matrix; then any vector is an eigenvector, in particular $u_0=[1,1]^\top/\sqrt{2}$. Now perturb it to
$$[K]=[K_0]+\begin{bmatrix}\epsilon&0\\0&0\end{bmatrix}.$$
Then the eigenvectors are $v_1=[1,0]^\top$ and $v_2=[0,1]^\top$ for any $\epsilon\ne 0$, however small; consequently $\|u_0-v_1\|$ does not tend to zero as $\epsilon\to 0$: close to a multiple eigenvalue, the eigenvectors are not continuous functions of the matrix entries.
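A sketch of this phenomenon (assuming, as reconstructed above, $K_0=I$):

```python
import numpy as np

u0 = np.array([1.0, 1.0]) / np.sqrt(2)   # eigenvector of K0 = I
for eps in [1e-1, 1e-4, 1e-8]:
    K = np.eye(2) + np.diag([eps, 0.0])
    _, V = np.linalg.eigh(K)
    v1 = V[:, 1]                         # eigenvector of 1 + eps
    v1 = v1 * np.sign(v1[0])             # fix sign: v1 = [1, 0]
    # the gap stays ~0.765 no matter how small eps is
    print(eps, np.linalg.norm(u0 - v1))
```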
Rellich, F., & Berkowitz, J. (1969). Perturbation Theory of Eigenvalue Problems. CRC Press.