Eigenvalue perturbation explained

See main article: Perturbation theory. In mathematics, an eigenvalue perturbation problem is that of finding the eigenvectors and eigenvalues of a system

Ax = \lambda x

that is perturbed from one with known eigenvectors and eigenvalues

A_0 x_0 = \lambda_0 x_0.

This is useful for studying how sensitive the original system's eigenvectors and eigenvalues

x_{0i}, \lambda_{0i}, \quad i = 1, \dots, n

are to changes in the system. This type of analysis was popularized by Lord Rayleigh, in his investigation of harmonic vibrations of a string perturbed by small inhomogeneities.[1]

The derivations in this article are essentially self-contained and can be found in many texts on numerical linear algebra or numerical functional analysis. This article focuses on the case of the perturbation of a simple eigenvalue (see multiplicity of eigenvalues); the case of repeated eigenvalues is discussed in a later section.

Why generalized eigenvalues?

In the entry Applications of eigenvalues and eigenvectors we find numerous scientific fields in which eigenvalues are used to obtain solutions. Generalized eigenvalue problems are less widespread but are key in the study of vibrations. They are useful when we use the Galerkin method or the Rayleigh-Ritz method to find approximate solutions of partial differential equations modeling vibrations of structures such as strings and plates; the paper of Courant (1943)[2] is fundamental. The finite element method is a widespread particular case.

In classical mechanics, we may find generalized eigenvalues when we look for vibrations of multiple-degree-of-freedom systems close to equilibrium; the kinetic energy provides the mass matrix M, and the potential strain energy provides the rigidity matrix K. For details, see for example the first section of the article of Weinstein (1941, in French).[3]

With both methods, we obtain a system of differential equations (a matrix differential equation)

M\ddot{x} + B\dot{x} + Kx = 0

with the mass matrix M, the damping matrix B, and the rigidity matrix K. If we neglect the damping effect, we set B = 0; we can then look for a solution of the form

x = e^{i\omega t} u,

and we obtain that u and \omega^2 are solutions of the generalized eigenvalue problem

-\omega^2 M u + K u = 0.
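As an illustration, such a problem can be solved numerically; the following minimal sketch uses scipy.linalg.eigh, which handles the symmetric-definite pencil (K, M). The two-degree-of-freedom values below are arbitrary, chosen only for the example.

    import numpy as np
    from scipy.linalg import eigh

    # Hypothetical two-degree-of-freedom system; the numbers are arbitrary.
    M = np.array([[2.0, 0.0],
                  [0.0, 1.0]])        # mass matrix
    K = np.array([[ 6.0, -2.0],
                  [-2.0,  4.0]])      # rigidity (stiffness) matrix

    omega2, U = eigh(K, M)            # solves K u = omega^2 M u
    print("angular frequencies:", np.sqrt(omega2))
    print("mode shapes (columns):")
    print(U)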

Setting of perturbation for a generalized eigenvalue problem

Suppose we have solutions to the generalized eigenvalue problem,

K_0 x_{0i} = \lambda_{0i} M_0 x_{0i},    (0)

where K_0 and M_0 are matrices. That is, we know the eigenvalues \lambda_{0i} and eigenvectors x_{0i} of system (0). It is also required that the eigenvalues are distinct.

Now suppose we want to change the matrices by a small amount. That is, we want to find the eigenvalues and eigenvectors of

K x_i = \lambda_i M x_i    (1)

where

K = K_0 + \delta K, \quad M = M_0 + \delta M

with the perturbations \delta K and \delta M much smaller than K and M respectively. Then we expect the new eigenvalues and eigenvectors to be similar to the original, plus small perturbations:

\lambda_i = \lambda_{0i} + \delta\lambda_i, \quad x_i = x_{0i} + \delta x_i.

Steps

We assume that the matrices are symmetric and positive definite, and assume we have scaled the eigenvectors such that

x_{0j}^\top M_0 x_{0i} = \delta_{ij}, \quad x_i^\top M x_j = \delta_{ij}    (2)

where \delta_{ij} is the Kronecker delta. Now we want to solve the equation

K x_i - \lambda_i M x_i = 0.
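Numerically, scipy.linalg.eigh returns eigenvectors with exactly this scaling; a minimal check (the matrices are the placeholders from the earlier sketch):

    import numpy as np
    from scipy.linalg import eigh

    M0 = np.array([[2.0, 0.0], [0.0, 1.0]])    # placeholder mass matrix
    K0 = np.array([[6.0, -2.0], [-2.0, 4.0]])  # placeholder rigidity matrix

    lam0, X0 = eigh(K0, M0)                    # columns X0[:, i] solve (0)
    # eigh returns M0-orthonormal eigenvectors, i.e. exactly the scaling (2):
    print(np.allclose(X0.T @ M0 @ X0, np.eye(2)))   # True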

In this article we restrict the study to first order perturbation.

First order expansion of the equation

Substituting in (1), we get

(K_0 + \delta K)(x_{0i} + \delta x_i) = \left(\lambda_{0i} + \delta\lambda_i\right)\left(M_0 + \delta M\right)\left(x_{0i} + \delta x_i\right),

which expands to

K_0 x_{0i} + \delta K\, x_{0i} + K_0\, \delta x_i + \delta K\, \delta x_i = \lambda_{0i} M_0 x_{0i} + \lambda_{0i} M_0\, \delta x_i + \lambda_{0i}\, \delta M\, x_{0i} + \delta\lambda_i M_0 x_{0i} + \lambda_{0i}\, \delta M\, \delta x_i + \delta\lambda_i\, \delta M\, x_{0i} + \delta\lambda_i M_0\, \delta x_i + \delta\lambda_i\, \delta M\, \delta x_i.

Canceling from (0) (K_0 x_{0i} = \lambda_{0i} M_0 x_{0i}) leaves

\delta K\, x_{0i} + K_0\, \delta x_i + \delta K\, \delta x_i = \lambda_{0i} M_0\, \delta x_i + \lambda_{0i}\, \delta M\, x_{0i} + \delta\lambda_i M_0 x_{0i} + \lambda_{0i}\, \delta M\, \delta x_i + \delta\lambda_i\, \delta M\, x_{0i} + \delta\lambda_i M_0\, \delta x_i + \delta\lambda_i\, \delta M\, \delta x_i.

Removing the higher-order terms, this simplifies to

K_0\, \delta x_i + \delta K\, x_{0i} = \lambda_{0i} M_0\, \delta x_i + \lambda_{0i}\, \delta M\, x_{0i} + \delta\lambda_i M_0 x_{0i}.    (3)

In other words, \delta\lambda_i no longer denotes the exact variation of the eigenvalue but its first-order approximation.

As the matrix M_0 is symmetric, the unperturbed eigenvectors are M_0-orthogonal, and so we use them as a basis for the perturbed eigenvectors. That is, we want to construct

\delta x_i = \sum_{j=1}^N \varepsilon_{ij}\, x_{0j}    (4)

with \varepsilon_{ij} = x_{0j}^\top M\, \delta x_i, where the \varepsilon_{ij} are small constants that are to be determined.

In the same way, substituting in (2), and removing higher-order terms, we get

\delta x_j^\top M_0 x_{0i} + x_{0j}^\top M_0\, \delta x_i + x_{0j}^\top \delta M\, x_{0i} = 0.    (5)

The derivation can proceed along two forks.

First fork: eigenvalue perturbation first

Eigenvalue perturbation

We start with (3),

K_0\, \delta x_i + \delta K\, x_{0i} = \lambda_{0i} M_0\, \delta x_i + \lambda_{0i}\, \delta M\, x_{0i} + \delta\lambda_i M_0 x_{0i};

we left-multiply with x_{0i}^\top and use (2) as well as its first-order variation (5); we get

x_{0i}^\top \delta K\, x_{0i} = \lambda_{0i}\, x_{0i}^\top \delta M\, x_{0i} + \delta\lambda_i

or

\delta\lambda_i = x_{0i}^\top \delta K\, x_{0i} - \lambda_{0i}\, x_{0i}^\top \delta M\, x_{0i}.

We notice that it is the first-order perturbation of the generalized Rayleigh quotient with fixed x_{0i}:

R(K, M; x_{0i}) = x_{0i}^\top K x_{0i} / x_{0i}^\top M x_{0i}, \quad with \quad x_{0i}^\top M x_{0i} = 1.

Moreover, for M = I, the formula

\delta\lambda_i = x_{0i}^\top \delta K\, x_{0i}

should be compared with the Bauer-Fike theorem, which provides a bound for eigenvalue perturbation.
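This first-order formula is easy to check numerically; the sketch below compares it with an exact eigensolve (the random test matrices and the 1e-6 perturbation size are arbitrary choices, not part of the theory):

    import numpy as np
    from scipy.linalg import eigh

    rng = np.random.default_rng(0)
    def sym(A): return (A + A.T) / 2           # helper: symmetrize a matrix

    n = 4
    B  = rng.standard_normal((n, n))
    M0 = B @ B.T + n * np.eye(n)               # symmetric positive definite
    K0 = sym(rng.standard_normal((n, n)))
    dK = 1e-6 * sym(rng.standard_normal((n, n)))   # small symmetric perturbations
    dM = 1e-6 * sym(rng.standard_normal((n, n)))

    lam0, X0 = eigh(K0, M0)                    # unperturbed pencil, x^T M0 x = 1
    lam, _   = eigh(K0 + dK, M0 + dM)          # exact perturbed eigenvalues

    for i in range(n):
        x = X0[:, i]
        dlam = x @ dK @ x - lam0[i] * (x @ dM @ x)   # first-order prediction
        print(lam[i] - lam0[i], dlam)          # the two columns agree closely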
Eigenvector perturbation

We left-multiply (3) with x_{0j}^\top for j \neq i and get

x_{0j}^\top K_0\, \delta x_i + x_{0j}^\top \delta K\, x_{0i} = \lambda_{0i}\, x_{0j}^\top M_0\, \delta x_i + \lambda_{0i}\, x_{0j}^\top \delta M\, x_{0i} + \delta\lambda_i\, x_{0j}^\top M_0 x_{0i}.

We use

x_{0j}^\top K_0 = \lambda_{0j}\, x_{0j}^\top M_0 \quad and \quad x_{0j}^\top M_0 x_{0i} = 0, \quad for \; j \neq i.

This gives

\lambda_{0j}\, x_{0j}^\top M_0\, \delta x_i + x_{0j}^\top \delta K\, x_{0i} = \lambda_{0i}\, x_{0j}^\top M_0\, \delta x_i + \lambda_{0i}\, x_{0j}^\top \delta M\, x_{0i}

or

(\lambda_{0j} - \lambda_{0i})\, x_{0j}^\top M_0\, \delta x_i + x_{0j}^\top \delta K\, x_{0i} = \lambda_{0i}\, x_{0j}^\top \delta M\, x_{0i}.

As the eigenvalues are assumed to be simple, for j \neq i

\varepsilon_{ij} = x_{0j}^\top M_0\, \delta x_i = \frac{-x_{0j}^\top \delta K\, x_{0i} + \lambda_{0i}\, x_{0j}^\top \delta M\, x_{0i}}{\lambda_{0j} - \lambda_{0i}}, \quad i = 1, \dots, N; \; j = 1, \dots, N; \; j \neq i.

Moreover (5) (the first-order variation of (2)) yields

2\varepsilon_{ii} = 2\, x_{0i}^\top M_0\, \delta x_i = -x_{0i}^\top \delta M\, x_{0i}.

We have obtained all the components of \delta x_i.
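These components can be checked numerically as well; here is a minimal sketch with the same kind of arbitrary test matrices as above, comparing the reconstructed \delta x_i with an exact eigensolve:

    import numpy as np
    from scipy.linalg import eigh

    rng = np.random.default_rng(1)
    def sym(A): return (A + A.T) / 2

    n = 4
    B  = rng.standard_normal((n, n))
    M0 = B @ B.T + n * np.eye(n)               # symmetric positive definite
    K0 = sym(rng.standard_normal((n, n)))
    dK = 1e-6 * sym(rng.standard_normal((n, n)))
    dM = 1e-6 * sym(rng.standard_normal((n, n)))

    lam0, X0 = eigh(K0, M0)
    x0 = X0[:, 0]                              # perturb the eigenpair i = 0

    dx = -0.5 * (x0 @ dM @ x0) * x0            # the epsilon_ii component
    for j in range(1, n):                      # the epsilon_ij components, j != i
        xj = X0[:, j]
        eps = (-(xj @ dK @ x0) + lam0[0] * (xj @ dM @ x0)) / (lam0[j] - lam0[0])
        dx += eps * xj

    lam, X = eigh(K0 + dK, M0 + dM)            # exact perturbed problem
    x = X[:, 0] * np.sign(X[:, 0] @ M0 @ x0)   # fix the sign ambiguity
    print(np.linalg.norm(x - (x0 + dx)))       # tiny: first order is accurate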

Second fork: Straightforward manipulations

Substituting (4) into (3) and rearranging gives

K_0 \sum_{j=1}^N \varepsilon_{ij}\, x_{0j} + \delta K\, x_{0i} = \lambda_{0i} M_0 \sum_{j=1}^N \varepsilon_{ij}\, x_{0j} + \lambda_{0i}\, \delta M\, x_{0i} + \delta\lambda_i M_0 x_{0i}    (5)

\sum_{j=1}^N \varepsilon_{ij}\, K_0 x_{0j} + \delta K\, x_{0i} = \lambda_{0i} M_0 \sum_{j=1}^N \varepsilon_{ij}\, x_{0j} + \lambda_{0i}\, \delta M\, x_{0i} + \delta\lambda_i M_0 x_{0i}    (applying K_0 to the sum)

\sum_{j=1}^N \varepsilon_{ij}\, \lambda_{0j} M_0 x_{0j} + \delta K\, x_{0i} = \lambda_{0i} M_0 \sum_{j=1}^N \varepsilon_{ij}\, x_{0j} + \lambda_{0i}\, \delta M\, x_{0i} + \delta\lambda_i M_0 x_{0i}    (using Eq. (0))

Because the eigenvectors are M_0-orthogonal when M_0 is positive definite, we can remove the summations by left-multiplying by x_{0i}^\top:

\varepsilon_{ii}\, \lambda_{0i}\, x_{0i}^\top M_0 x_{0i} + x_{0i}^\top \delta K\, x_{0i} = \lambda_{0i}\, \varepsilon_{ii}\, x_{0i}^\top M_0 x_{0i} + \lambda_{0i}\, x_{0i}^\top \delta M\, x_{0i} + \delta\lambda_i\, x_{0i}^\top M_0 x_{0i}.

By use of equation (0) again:

\varepsilon_{ii}\, x_{0i}^\top K_0 x_{0i} + x_{0i}^\top \delta K\, x_{0i} = \lambda_{0i}\, \varepsilon_{ii}\, x_{0i}^\top M_0 x_{0i} + \lambda_{0i}\, x_{0i}^\top \delta M\, x_{0i} + \delta\lambda_i\, x_{0i}^\top M_0 x_{0i}.    (6)

The two terms containing \varepsilon_{ii} are equal because left-multiplying (0) by x_{0i}^\top gives

x_{0i}^\top K_0 x_{0i} = \lambda_{0i}\, x_{0i}^\top M_0 x_{0i}.

Canceling those terms in (6) leaves

x_{0i}^\top \delta K\, x_{0i} = \lambda_{0i}\, x_{0i}^\top \delta M\, x_{0i} + \delta\lambda_i\, x_{0i}^\top M_0 x_{0i}.

Rearranging gives

\delta\lambda_i = \frac{x_{0i}^\top \left(\delta K - \lambda_{0i}\, \delta M\right) x_{0i}}{x_{0i}^\top M_0 x_{0i}}.

But by (2), this denominator is equal to 1. Thus

\delta\lambda_i = x_{0i}^\top \left(\delta K - \lambda_{0i}\, \delta M\right) x_{0i}.

Then, as \lambda_i \neq \lambda_k for i \neq k (assumption of simple eigenvalues), left-multiplying equation (5) by x_{0k}^\top gives

\varepsilon_{ik} = \frac{x_{0k}^\top \left(\delta K - \lambda_{0i}\, \delta M\right) x_{0i}}{\lambda_{0i} - \lambda_{0k}}, \quad i \neq k.

Or by changing the name of the indices:

\varepsilon_{ij} = \frac{x_{0j}^\top \left(\delta K - \lambda_{0i}\, \delta M\right) x_{0i}}{\lambda_{0i} - \lambda_{0j}}, \quad i \neq j.

To find \varepsilon_{ii}, use the fact that

x_i^\top M x_i = 1

implies

\varepsilon_{ii} = -\tfrac{1}{2}\, x_{0i}^\top \delta M\, x_{0i}.

Summary of the first order perturbation result

In the case where all the matrices are Hermitian positive definite and all the eigenvalues are distinct,

\lambda_i = \lambda_{0i} + x_{0i}^\top \left(\delta K - \lambda_{0i}\, \delta M\right) x_{0i}

x_i = x_{0i}\left(1 - \tfrac{1}{2}\, x_{0i}^\top \delta M\, x_{0i}\right) + \sum_{j=1,\, j \neq i}^N \frac{x_{0j}^\top \left(\delta K - \lambda_{0i}\, \delta M\right) x_{0i}}{\lambda_{0i} - \lambda_{0j}}\, x_{0j}

for infinitesimal \delta K and \delta M (the higher-order terms in (3) being neglected).

So far, we have not proved that these higher-order terms may be neglected. This point may be derived using the implicit function theorem; in the next section, we summarize the use of this theorem in order to obtain a first-order expansion.
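Before turning to the theory, a quick numerical experiment illustrates the claim (the test matrices are arbitrary): scaling the perturbation by t makes the error of the first-order eigenvalue formula shrink like t^2, so the neglected terms are indeed of higher order.

    import numpy as np
    from scipy.linalg import eigh

    rng = np.random.default_rng(2)
    def sym(A): return (A + A.T) / 2

    n = 4
    B  = rng.standard_normal((n, n))
    M0 = B @ B.T + n * np.eye(n)
    K0 = sym(rng.standard_normal((n, n)))
    dK = sym(rng.standard_normal((n, n)))      # fixed perturbation directions
    dM = sym(rng.standard_normal((n, n)))

    lam0, X0 = eigh(K0, M0)
    x0 = X0[:, 0]
    for t in [1e-2, 5e-3, 2.5e-3]:             # shrink the perturbation by halves
        lam, _ = eigh(K0 + t * dK, M0 + t * dM)
        pred = t * (x0 @ dK @ x0 - lam0[0] * (x0 @ dM @ x0))
        print(t, abs((lam[0] - lam0[0]) - pred))   # error drops ~4x per halving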

Theoretical derivation

Perturbation of an implicit function.

In the next paragraph, we shall use the implicit function theorem (see Implicit function theorem, statement of the theorem); we notice that for a continuously differentiable function

f : \R^{n+m} \to \R^m, \quad f : (x, y) \mapsto f(x, y),

with an invertible Jacobian matrix J_{f,y}(x_0, y_0), starting from a point (x_0, y_0) solution of f(x_0, y_0) = 0, we get solutions of f(x, y) = 0 with x close to x_0 in the form y = g(x), where g is a continuously differentiable function; moreover, the Jacobian matrix of g is provided by the linear system

J_{f,y}(x, g(x))\, J_{g,x}(x) + J_{f,x}(x, g(x)) = 0.    (6)

As soon as the hypothesis of the theorem is satisfied, the Jacobian matrix of g may be computed with a first-order expansion of f(x_0 + \delta x, y_0 + \delta y) = 0; we get

J_{f,x}(x, g(x))\, \delta x + J_{f,y}(x, g(x))\, \delta y = 0;

as \delta y = J_{g,x}(x)\, \delta x, this is equivalent to equation (6).

Eigenvalue perturbation: a theoretical basis.

We use the previous paragraph (Perturbation of an implicit function) with somewhat different notations suited to eigenvalue perturbation; we introduce

\tilde{f} : \R^{2n^2} \times \R^{n+1} \to \R^{n+1}, \quad \tilde{f}(K, M, \lambda, x) = \binom{f(K, M, \lambda, x)}{f_{n+1}(M, x)}

with

f(K, M, \lambda, x) = Kx - \lambda Mx, \quad f_{n+1}(M, x) = x^\top M x - 1.

In order to use the implicit function theorem, we study the invertibility of the Jacobian J_{\tilde{f}; \lambda, x}(K, M; \lambda_{0i}, x_{0i}) with

J_{\tilde{f}; \lambda, x}(K, M; \lambda_i, x_i)(\delta\lambda, \delta x) = \binom{-M x_i}{0}\, \delta\lambda + \binom{K - \lambda_i M}{2 x_i^\top M}\, \delta x.

Indeed, the solution of

J_{\tilde{f}; \lambda, x}(K, M; \lambda_{0i}, x_{0i})(\delta\lambda_i, \delta x_i) = \binom{y}{y_{n+1}}

may be derived with computations similar to the derivation of the expansion above:

\delta\lambda_i = -x_{0i}^\top y, \quad (\lambda_{0i} - \lambda_{0j})\, x_{0j}^\top M\, \delta x_i = x_{0j}^\top y, \; j = 1, \dots, n, \; j \neq i;

or

x_{0j}^\top M\, \delta x_i = x_{0j}^\top y / (\lambda_{0i} - \lambda_{0j}), \quad and \quad 2\, x_{0i}^\top M\, \delta x_i = y_{n+1}.

When \lambda_i is a simple eigenvalue, as the eigenvectors x_{0j}, j = 1, \dots, n, form an orthonormal basis, for any right-hand side we have obtained one solution; therefore, the Jacobian is invertible.

The implicit function theorem provides a continuously differentiable function

(K, M) \mapsto (\lambda_i(K, M), x_i(K, M)),

hence the expansion with little-o notation:

\lambda_i = \lambda_{0i} + \delta\lambda_i + o(\|\delta K\| + \|\delta M\|), \quad x_i = x_{0i} + \delta x_i + o(\|\delta K\| + \|\delta M\|),

with

\delta\lambda_i = x_{0i}^\top \delta K\, x_{0i} - \lambda_{0i}\, x_{0i}^\top \delta M\, x_{0i};

\delta x_i = \sum_{j=1}^n \left(x_{0j}^\top M_0\, \delta x_i\right) x_{0j}, \quad with \quad x_{0j}^\top M_0\, \delta x_i = \frac{-x_{0j}^\top \delta K\, x_{0i} + \lambda_{0i}\, x_{0j}^\top \delta M\, x_{0i}}{\lambda_{0j} - \lambda_{0i}}, \quad i = 1, \dots, n; \; j = 1, \dots, n; \; j \neq i,

the coefficient on x_{0i} itself being fixed by the normalization, as in the first fork. This is the first-order expansion of the perturbed eigenvalues and eigenvectors, which is thus proved.

Results of sensitivity analysis with respect to the entries of the matrices

The results

This means it is possible to efficiently do a sensitivity analysis on \lambda_i as a function of changes in the entries of the matrices. (Recall that the matrices are symmetric and so changing K_{(k\ell)} will also change K_{(\ell k)}, hence the (2 - \delta_{k\ell}) term.)

\frac{\partial \lambda_i}{\partial K_{(k\ell)}} = \frac{\partial}{\partial K_{(k\ell)}} \left(\lambda_{0i} + x_{0i}^\top \left(\delta K - \lambda_{0i}\, \delta M\right) x_{0i}\right) = x_{0i(k)}\, x_{0i(\ell)} \left(2 - \delta_{k\ell}\right)

\frac{\partial \lambda_i}{\partial M_{(k\ell)}} = \frac{\partial}{\partial M_{(k\ell)}} \left(\lambda_{0i} + x_{0i}^\top \left(\delta K - \lambda_{0i}\, \delta M\right) x_{0i}\right) = -\lambda_i\, x_{0i(k)}\, x_{0i(\ell)} \left(2 - \delta_{k\ell}\right).

Similarly,

\frac{\partial x_i}{\partial K_{(k\ell)}} = \sum_{j=1,\, j \neq i}^N \frac{x_{0j(k)}\, x_{0i(\ell)} \left(2 - \delta_{k\ell}\right)}{\lambda_{0i} - \lambda_{0j}}\, x_{0j}

\frac{\partial x_i}{\partial M_{(k\ell)}} = -x_{0i}\, \frac{x_{0i(k)}\, x_{0i(\ell)}}{2} \left(2 - \delta_{k\ell}\right) - \sum_{j=1,\, j \neq i}^N \frac{\lambda_{0i}\, x_{0j(k)}\, x_{0i(\ell)}}{\lambda_{0i} - \lambda_{0j}}\, x_{0j} \left(2 - \delta_{k\ell}\right).
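These entry-wise sensitivities are easy to validate by finite differences; here is a minimal sketch (the test matrices, the indices k and \ell, and the step size are arbitrary choices):

    import numpy as np
    from scipy.linalg import eigh

    rng = np.random.default_rng(3)
    def sym(A): return (A + A.T) / 2

    n = 4
    B  = rng.standard_normal((n, n))
    M0 = B @ B.T + n * np.eye(n)
    K0 = sym(rng.standard_normal((n, n)))
    lam0, X0 = eigh(K0, M0)

    i, k, l = 0, 0, 2                          # sensitivity of lambda_0 to K(0,2)
    x0 = X0[:, i]
    formula = x0[k] * x0[l] * (2 - (k == l))   # the closed-form derivative

    h = 1e-7                                   # symmetric finite-difference probe:
    E = np.zeros((n, n)); E[k, l] = E[l, k] = 1.0  # changing K(k,l) changes K(l,k)
    lam_p, _ = eigh(K0 + h * E, M0)
    print((lam_p[i] - lam0[i]) / h, formula)   # the two values agree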

Eigenvalue sensitivity, a small example

A simple case is

K = \begin{bmatrix} 2 & b \\ b & 0 \end{bmatrix};

its eigenvalues and eigenvectors can be computed by hand or with the help of online tools such as https://wims.univ-cotedazur.fr/wims/wims.cgi (see the introduction in Wikipedia, WIMS) or using SageMath. The smallest eigenvalue is

\lambda = 1 - \sqrt{b^2 + 1},

and an explicit computation gives

\frac{\partial \lambda}{\partial b} = \frac{-b}{\sqrt{b^2 + 1}};

moreover, an associated eigenvector is

\tilde{x}_0 = \left[b, -\left(\sqrt{b^2 + 1} + 1\right)\right]^\top;

it is not a unit vector, so

x_{01} x_{02} = \tilde{x}_{01} \tilde{x}_{02} / \|\tilde{x}_0\|^2.

We get

\|\tilde{x}_0\|^2 = 2\sqrt{b^2 + 1}\left(\sqrt{b^2 + 1} + 1\right) \quad and \quad \tilde{x}_{01} \tilde{x}_{02} = -b\left(\sqrt{b^2 + 1} + 1\right);

hence

x_{01} x_{02} = \frac{-b}{2\sqrt{b^2 + 1}}.

For this example, we have thus checked that

\frac{\partial \lambda}{\partial b} = 2\, x_{01} x_{02}, \quad or \quad \delta\lambda = 2\, x_{01} x_{02}\, \delta b.
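A short numerical check of this example (the point b = 0.7 and the step size are arbitrary choices):

    import numpy as np

    b, db = 0.7, 1e-6                          # arbitrary point and step
    lam = 1 - np.sqrt(b**2 + 1)                # smallest eigenvalue, closed form
    x0 = np.array([b, -(np.sqrt(b**2 + 1) + 1)])
    x0 /= np.linalg.norm(x0)                   # normalize the eigenvector

    pred = 2 * x0[0] * x0[1]                   # the formula d(lambda)/db = 2 x01 x02
    lam_p = np.linalg.eigvalsh(np.array([[2.0, b + db],
                                         [b + db, 0.0]]))[0]
    print((lam_p - lam) / db, pred, -b / np.sqrt(b**2 + 1))   # all three agree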

Existence of eigenvectors

Note that in the above example we assumed that both the unperturbed and the perturbed systems involved symmetric matrices, which guaranteed the existence of N linearly independent eigenvectors. An eigenvalue problem involving non-symmetric matrices is not guaranteed to have N linearly independent eigenvectors, though a sufficient condition is that K and M be simultaneously diagonalizable.

The case of repeated eigenvalues

A technical report of Rellich [4] on perturbation of eigenvalue problems provides several examples; the elementary ones are in chapter 2. The report may be downloaded from archive.org. We present an example in which the eigenvectors behave badly.

Example 1

Consider the following matrix

B(\epsilon) = \epsilon \begin{bmatrix} \cos(2/\epsilon) & \sin(2/\epsilon) \\ \sin(2/\epsilon) & -\cos(2/\epsilon) \end{bmatrix}

and

A(\epsilon) = I - e^{-1/\epsilon^2} B(\epsilon); \quad A(0) = I.

For \epsilon \neq 0, the matrix A(\epsilon) has eigenvectors

\Phi_1 = \left[\cos(1/\epsilon), \sin(1/\epsilon)\right]^\top, \quad \Phi_2 = \left[\sin(1/\epsilon), -\cos(1/\epsilon)\right]^\top

belonging to the eigenvalues

\lambda_1 = 1 - \epsilon\, e^{-1/\epsilon^2}, \quad \lambda_2 = 1 + \epsilon\, e^{-1/\epsilon^2}.

Since \lambda_1 \neq \lambda_2 for \epsilon \neq 0, if u_j(\epsilon), j = 1, 2, are any normalized eigenvectors belonging to \lambda_j(\epsilon), j = 1, 2, respectively, then

u_j(\epsilon) = e^{i\alpha_j(\epsilon)}\, \Phi_j(\epsilon)

where \alpha_j, j = 1, 2, are real for \epsilon \neq 0.

It is obviously impossible to define \alpha_1(\epsilon), say, in such a way that u_1(\epsilon) tends to a limit as \epsilon \to 0, because the modulus of its first component, |\cos(1/\epsilon)|, has no limit as \epsilon \to 0.

Note in this example that A_{jk}(\epsilon) is not only continuous but also has continuous derivatives of all orders. Rellich draws the following important consequence: "Since in general the individual eigenvectors do not depend continuously on the perturbation parameter even though the operator A(\epsilon) does, it is necessary to work, not with an eigenvector, but rather with the space spanned by all the eigenvectors belonging to the same eigenvalue."
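The oscillation is easy to observe numerically; a minimal sketch (the values of \epsilon are arbitrary, kept large enough that e^{-1/\epsilon^2} does not underflow):

    import numpy as np

    def A(eps):
        c, s = np.cos(2 / eps), np.sin(2 / eps)
        B = eps * np.array([[c, s], [s, -c]])
        return np.eye(2) - np.exp(-1 / eps**2) * B

    for eps in [1.0, 0.7, 0.5, 0.4, 0.3]:
        w, V = np.linalg.eigh(A(eps))
        v = V[:, 0]                    # eigenvector of the smaller eigenvalue
        angle = np.degrees(np.arctan2(v[1], v[0])) % 180.0
        print(eps, angle)              # the direction keeps rotating: no limit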

Example 2

This example is less nasty that the previous one. Suppose

[K0]

is the 2x2 identity matrix, any vector is an eigenvector; then

u0=[1,1]T/\sqrt{2}

is one possible eigenvector. But if one makes a small perturbation, such as

[K]=[K0]+\begin{bmatrix}\epsilon&0\\0&0\end{bmatrix}

Then the eigenvectors are

v1=[1,0]T

and

v2=[0,1]T

; they are constant with respect to

\epsilon

so that

\|u0-v1\|

is constant and does not go to zero.
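A minimal numerical illustration (the perturbation sizes are arbitrary):

    import numpy as np

    u0 = np.array([1.0, 1.0]) / np.sqrt(2)     # an eigenvector of K0 = I
    for eps in [1e-1, 1e-4, 1e-8]:
        K = np.eye(2); K[0, 0] += eps          # the perturbed matrix [K]
        w, V = np.linalg.eigh(K)
        v = V[:, 0]                            # eigenvector for the eigenvalue 1
        print(eps, min(np.linalg.norm(u0 - v),
                       np.linalg.norm(u0 + v)))   # stays ~0.765, not -> 0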

References

  1. Rayleigh, J. W. S. (1894). The Theory of Sound, 2nd ed., vol. 1. London: Macmillan, pp. 114–118. ISBN 1-152-06023-6.
  2. Courant, R. (1943). "Variational Methods for the Solution of Problems of Equilibrium and Vibrations". Bulletin of the American Mathematical Society, 49, 1–23. doi:10.1090/S0002-9904-1943-07818-4.
  3. Weinstein, A. (1941). "Les vibrations et le calcul des variations" (in French). Portugaliae Mathematica, 2, 36–55.
  4. Rellich, F. (1954). Perturbation Theory of Eigenvalue Problems. CRC Press.

