Rayleigh–Ritz method explained

The Rayleigh–Ritz method is a direct numerical method of approximating eigenvalues that originated in the context of solving physical boundary value problems and is named after Lord Rayleigh and Walther Ritz.

In this method, an infinite-dimensional linear operator is approximated by a finite-dimensional compression, on which we can use an eigenvalue algorithm.

It is used in all applications that involve approximating eigenvalues and eigenvectors, often under different names. In quantum mechanics, where a system of particles is described using a Hamiltonian, the Ritz method uses trial wave functions to approximate the ground state eigenfunction with the lowest energy. In the finite element method context, mathematically the same algorithm is commonly called the Ritz-Galerkin method. The Rayleigh–Ritz method or Ritz method terminology is typical in mechanical and structural engineering to approximate the eigenmodes and resonant frequencies of a structure.

Naming and attribution

The name of the method and its origin story have been debated by historians.^[1] ^[2] It has been called the Ritz method after Walther Ritz, since the numerical procedure was published by Walther Ritz in 1908-1909. According to A. W. Leissa, Lord Rayleigh wrote a paper congratulating Ritz on his work in 1911, but stating that he himself had used Ritz's method in many places in his book and in another publication. This statement, although later disputed, and the fact that the method in the trivial case of a single vector results in the Rayleigh quotient make the case for the name Rayleigh–Ritz method. According to S. Ilanko,^[2] citing Richard Courant, both Lord Rayleigh and Walther Ritz independently conceived the idea of utilizing the equivalence between boundary value problems of partial differential equations on the one hand and problems of the calculus of variations on the other hand for numerical calculation of the solutions, by substituting for the variational problems simpler approximating extremum problems in which a finite number of parameters need to be determined. Ironically for the debate, the modern justification of the algorithm drops the calculus of variations in favor of the simpler and more general approach of orthogonal projection, as in the Galerkin method named after Boris Galerkin, thus leading also to the Ritz-Galerkin method naming.

Method

Let $T$ be a linear operator on a Hilbert space $\mathcal{H}$, with inner product $(\cdot, \cdot)$. Now consider a finite set of functions $\mathcal{L} = \{\varphi_1, \ldots, \varphi_n\}$. Depending on the application these functions may be:

A subset of the orthonormal basis of the original operator;^[3]
A space of splines (as in the Galerkin method);^[4]
A set of functions which approximate the eigenfunctions of the operator.^[5]

One could use the orthonormal basis generated from the eigenfunctions of the operator, which will produce diagonal approximating matrices, but in this case we would have already had to calculate the spectrum.

We now approximate $T$ by $T_{\mathcal{L}}$, which is defined as the matrix with entries

$(T_{\mathcal{L}})_{i,j} = (T \varphi_i, \varphi_j),$

and solve the eigenvalue problem $T_{\mathcal{L}} u = \lambda u$. It can be shown that the matrix $T_{\mathcal{L}}$ is the compression of $T$ to $\mathcal{L}$.

For differential operators (such as Sturm-Liouville operators), the inner product $(\cdot, \cdot)$ can be replaced by the weak formulation $\mathcal{A}(\cdot, \cdot)$.^[6]

If a subset of the orthonormal basis was used to find the matrix, the eigenvectors of $T_{\mathcal{L}}$ will be linear combinations of orthonormal basis functions, and as a result they will be approximations of the eigenvectors of $T$.^[7]

Properties

Spectral pollution

It is possible for the Rayleigh–Ritz method to produce values which do not converge to actual values in the spectrum of the operator as the truncation gets large. These values are known as spectral pollution.^[8] In some cases (such as for the Schrödinger equation), there is no approximation which both includes all eigenvalues of the equation, and contains no pollution.^[9]

The spectrum of the compression (and thus pollution) is bounded by the numerical range of the operator; in many cases it is bounded by a subset of the numerical range known as the essential numerical range.^[10] ^[11]

For matrix eigenvalue problems

In numerical linear algebra, the Rayleigh–Ritz method is commonly^[12] applied to approximate an eigenvalue problem $A \mathbf{x} = \lambda \mathbf{x}$ for the matrix $A \in \mathbb{C}^{N \times N}$ of size $N$ using a projected matrix of a smaller size $m < N$, generated from a given matrix $V \in \mathbb{C}^{N \times m}$ with orthonormal columns. The matrix version of the algorithm is the simplest:

1. Compute the $m \times m$ matrix $V^* A V$, where $V^*$ denotes the complex-conjugate transpose of $V$.
2. Solve the eigenvalue problem $V^* A V \mathbf{y}_i = \mu_i \mathbf{y}_i$.
3. Compute the Ritz vectors $\tilde{\mathbf{x}}_i = V \mathbf{y}_i$ and the Ritz values $\tilde{\lambda}_i = \mu_i$.
4. Output approximations $(\tilde{\lambda}_i, \tilde{\mathbf{x}}_i)$, called the Ritz pairs, to eigenvalues and eigenvectors of the original matrix $A$.
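The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function name `rayleigh_ritz`, the random symmetric test matrix, and the random subspace are all assumptions made for the example.

```python
import numpy as np

def rayleigh_ritz(A, V):
    """Return Ritz values and Ritz vectors of A for the subspace spanned by V.

    Assumes the columns of V are orthonormal; A may be any square matrix.
    """
    H = V.conj().T @ A @ V      # step 1: m x m projected matrix V* A V
    mu, Y = np.linalg.eig(H)    # step 2: small eigenvalue problem
    X = V @ Y                   # step 3: Ritz vectors x_i = V y_i
    return mu, X                # step 4: Ritz pairs (mu[i], X[:, i])

rng = np.random.default_rng(0)
N, m = 6, 3
A = rng.standard_normal((N, N))
A = A + A.T                                        # symmetric test matrix
V, _ = np.linalg.qr(rng.standard_normal((N, m)))   # orthonormal columns
ritz_values, ritz_vectors = rayleigh_ritz(A, V)
```

By construction the residual of every Ritz pair is orthogonal to the columns of `V`, which is the Galerkin condition behind the method.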

If the subspace with the orthonormal basis given by the columns of the matrix $V \in \mathbb{C}^{N \times m}$ contains $k \leq m$ vectors that are close to eigenvectors of the matrix $A$, the Rayleigh–Ritz method above finds $k$ Ritz vectors that well approximate these eigenvectors. The easily computable quantity $\|A \tilde{\mathbf{x}}_i - \tilde{\lambda}_i \tilde{\mathbf{x}}_i\|$ determines the accuracy of such an approximation for every Ritz pair.

In the easiest case $m = 1$, the $N \times m$ matrix $V$ turns into a unit column-vector $v$, the $m \times m$ matrix $V^* A V$ is a scalar that is equal to the Rayleigh quotient $\rho(v) = v^* A v / v^* v$, the only $i = 1$ solution to the eigenvalue problem is $y_i = 1$ and $\mu_i = \rho(v)$, and the only Ritz vector is $v$ itself. Thus, the Rayleigh–Ritz method turns into computing the Rayleigh quotient if $m = 1$.

Another useful connection to the Rayleigh quotient is that $\mu_i = \rho(\tilde{\mathbf{x}}_i)$ for every Ritz pair $(\tilde{\lambda}_i, \tilde{\mathbf{x}}_i)$, which allows some properties of the Ritz values $\mu_i$ to be derived from the corresponding theory for the Rayleigh quotient. For example, if $A$ is a Hermitian matrix, its Rayleigh quotient (and thus its every Ritz value) is real and takes values within the closed interval of the smallest and largest eigenvalues of $A$.
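The properties discussed above (the residual norm as an accuracy measure, the $m = 1$ case, and the bounds on Ritz values for a Hermitian matrix) can be checked numerically. The sketch below assumes a random Hermitian test matrix and random subspaces; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
N, m = 8, 3
B = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
A = (B + B.conj().T) / 2                 # Hermitian test matrix
eigs = np.linalg.eigvalsh(A)             # exact eigenvalues, ascending

# Case m = 1: a single unit vector gives the Rayleigh quotient.
v = rng.standard_normal(N) + 1j * rng.standard_normal(N)
v /= np.linalg.norm(v)
rho = (v.conj() @ A @ v).real            # v* A v, with v* v = 1

# General m: Ritz pairs and their residual norms ||A x - lambda x||.
V, _ = np.linalg.qr(rng.standard_normal((N, m))
                    + 1j * rng.standard_normal((N, m)))
mu, Y = np.linalg.eigh(V.conj().T @ A @ V)
X = V @ Y
residuals = np.linalg.norm(A @ X - X * mu, axis=0)
```

Here `rho` and every entry of `mu` fall between `eigs[0]` and `eigs[-1]`, while `residuals` is generally not small because the subspace is random.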

Example

The matrix $A = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 1 \\ 0 & 1 & 2 \end{bmatrix}$ has eigenvalues $1, 2, 3$ and the corresponding eigenvectors

$\mathbf{x}_{\lambda=1} = \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix}, \quad \mathbf{x}_{\lambda=2} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad \mathbf{x}_{\lambda=3} = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}.$

Let us take

$V = \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix},$

then

$V^* A V = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$

with eigenvalues $1, 3$ and the corresponding eigenvectors

$\mathbf{y}_{\mu=1} = \begin{bmatrix} 1 \\ -1 \end{bmatrix}, \quad \mathbf{y}_{\mu=3} = \begin{bmatrix} 1 \\ 1 \end{bmatrix},$

so that the Ritz values are $1, 3$ and the Ritz vectors are

$\tilde{\mathbf{x}}_{\tilde{\lambda}=1} = \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix}, \quad \tilde{\mathbf{x}}_{\tilde{\lambda}=3} = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}.$

We observe that each one of the Ritz vectors is exactly one of the eigenvectors of $A$ for the given $V$, and the Ritz values are exactly two of the three eigenvalues of $A$. A mathematical explanation for the exact approximation is based on the fact that the column space of the matrix $V$ happens to be exactly the same as the subspace spanned by the two eigenvectors $\mathbf{x}_{\lambda=1}$ and $\mathbf{x}_{\lambda=3}$ in this example.
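The example can be reproduced numerically. The following sketch uses NumPy with the matrices $A$ and $V$ from the text; note that `numpy.linalg.eigh` returns unit-length eigenvectors with arbitrary sign, so the computed Ritz vectors match the ones above only up to scaling.

```python
import numpy as np

A = np.array([[2.0, 0.0, 0.0],
              [0.0, 2.0, 1.0],
              [0.0, 1.0, 2.0]])
V = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])

H = V.T @ A @ V                  # projected 2 x 2 matrix V* A V
mu, Y = np.linalg.eigh(H)        # Ritz values in ascending order
X = V @ Y                        # Ritz vectors (unit length, sign arbitrary)
residuals = np.linalg.norm(A @ X - X * mu, axis=0)

print(H)          # the projected matrix
print(mu)         # the Ritz values
print(residuals)  # residual norms of the two Ritz pairs
```

The residual norms vanish up to rounding, confirming that both Ritz pairs are exact eigenpairs of $A$ for this choice of $V$.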

For matrix singular value problems

Truncated singular value decomposition (SVD) in numerical linear algebra can also use the Rayleigh–Ritz method to find approximations to left and right singular vectors of the matrix $M \in \mathbb{C}^{M \times N}$ of size $M \times N$ in given subspaces by turning the singular value problem into an eigenvalue problem.

Using the normal matrix

The definition of the singular value $\sigma$ and the corresponding left and right singular vectors is $M v = \sigma u$ and $M^* u = \sigma v$. Having found one set (left or right) of approximate singular vectors and singular values by naively applying the Rayleigh–Ritz method to the Hermitian normal matrix $M^* M \in \mathbb{C}^{N \times N}$ or $M M^* \in \mathbb{C}^{M \times M}$, whichever is smaller, one could determine the other set of left or right singular vectors simply by dividing by the singular values, i.e., $u = M v / \sigma$ and $v = M^* u / \sigma$. However, the division is unstable or fails for small or zero singular values.

An alternative approach, e.g., defining the normal matrix as $A = M^* M \in \mathbb{C}^{N \times N}$ of size $N \times N$, takes advantage of the fact that for a given $N \times m$ matrix $W \in \mathbb{C}^{N \times m}$ with orthonormal columns the eigenvalue problem of the Rayleigh–Ritz method for the $m \times m$ matrix

$W^* A W = W^* M^* M W = (M W)^* M W$

can be interpreted as a singular value problem for the $M \times m$ matrix $M W$. This interpretation allows simple simultaneous calculation of both left and right approximate singular vectors as follows.

1. Compute the $M \times m$ matrix $M W$.
2. Compute the thin SVD $M W = \mathbf{U} \Sigma \mathbf{V}_h$, with $M \times m$ matrix $\mathbf{U}$, $m \times m$ diagonal matrix $\Sigma$, and $m \times m$ matrix $\mathbf{V}_h$.
3. Compute the matrices of the Ritz left $U = \mathbf{U}$ and right $V_h = \mathbf{V}_h W^*$ singular vectors.
4. Output approximations $U, \Sigma, V_h$, called the Ritz singular triplets, to selected singular values and the corresponding left and right singular vectors of the original matrix $M$, representing an approximate truncated singular value decomposition (SVD) with right singular vectors restricted to the column-space of the matrix $W$.

The algorithm can be used as a post-processing step where the matrix $W$ is an output of an eigenvalue solver, e.g., LOBPCG, approximating numerically selected eigenvectors of the normal matrix $A = M^* M$.
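The procedure above can be sketched in NumPy as follows. The function name `rayleigh_ritz_svd` and the test setup are assumptions made for illustration; here $W$ is built, purely for demonstration, as a rotated orthonormal basis of the span of the two dominant right singular vectors of a random matrix, so the Ritz singular triplets come out exact up to rounding.

```python
import numpy as np

def rayleigh_ritz_svd(M, W):
    """Approximate singular triplets of M with right vectors in span(W).

    Assumes W has orthonormal columns. Returns U, sigma, Vh, where the
    rows of Vh are the conjugated Ritz right singular vectors.
    """
    MW = M @ W                                                   # product M W
    U, sigma, Vh_small = np.linalg.svd(MW, full_matrices=False)  # thin SVD
    Vh = Vh_small @ W.conj().T                                   # V_h W*
    return U, sigma, Vh

rng = np.random.default_rng(2)
M = rng.standard_normal((7, 5))
_, s_exact, Vh_exact = np.linalg.svd(M, full_matrices=False)
Q, _ = np.linalg.qr(rng.standard_normal((2, 2)))   # random 2 x 2 rotation
W = Vh_exact[:2].T @ Q                             # orthonormal, spans v_1, v_2
U, sigma, Vh = rayleigh_ritz_svd(M, W)
```

No division by singular values occurs anywhere, which is the practical advantage over recovering one set of singular vectors from the other.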

Example

The matrix $M = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 4 \\ 0 & 0 & 0 & 0 \end{bmatrix}$ has its normal matrix $A = M^* M = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 4 & 0 & 0 \\ 0 & 0 & 9 & 0 \\ 0 & 0 & 0 & 16 \end{bmatrix},$ singular values $1, 2, 3, 4$ and the corresponding thin SVD

$M = \begin{bmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} 4 & 0 & 0 & 0 \\ 0 & 3 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{bmatrix},$

where the columns of the first multiplier are taken from the complete set of the left singular vectors of the matrix $M$, the diagonal entries of the middle term are the singular values, and the columns of the last multiplier transposed (although the transposition does not change it)

$\begin{bmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{bmatrix}^* \quad = \quad \begin{bmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{bmatrix}$

are the corresponding right singular vectors.

Let us take $W = \begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & -1/\sqrt{2} \\ 0 & 0 \\ 0 & 0 \end{bmatrix}$ with the column-space that is spanned by the two exact right singular vectors $\begin{bmatrix} 0 & 1 \\ 1 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}$ corresponding to the singular values 1 and 2.

The thin SVD of the product is $M W = \mathbf{U} \Sigma \mathbf{V}_h$ with

$\mathbf{U} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}, \quad \Sigma = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}, \quad \mathbf{V}_h = \begin{bmatrix} 1/\sqrt{2} & -1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix}.$

Thus we already obtain the singular values 2 and 1 from $\Sigma$ and from $\mathbf{U}$ the corresponding two left singular vectors $[0, 1, 0, 0, 0]^*$ and $[1, 0, 0, 0, 0]^*$, which span the column-space of the matrix $\mathbf{U}$, explaining why the approximations are exact for the given $W$.

Finally, step 3 computes the matrix $V_h = \mathbf{V}_h W^*$

$V_h = \begin{bmatrix} 1/\sqrt{2} & -1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix} \, \begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2} & 0 & 0 \\ 1/\sqrt{2} & -1/\sqrt{2} & 0 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{bmatrix}$

recovering from its rows the two right singular vectors $[0, 1, 0, 0]^*$ and $[1, 0, 0, 0]^*$. We validate the first vector: $M v = \sigma u$

$\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 4 \\ 0 & 0 & 0 & 0 \end{bmatrix} \, \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix} = \, 2 \, \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}$

and $M^* u = \sigma v$

$\begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 & 0 \\ 0 & 0 & 3 & 0 & 0 \\ 0 & 0 & 0 & 4 & 0 \end{bmatrix} \, \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix} = \, 2 \, \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix}.$

Thus, for the given matrix $W$ with its column-space that is spanned by two exact right singular vectors, we determine these right singular vectors, as well as the corresponding left singular vectors and the singular values, all exactly. For an arbitrary matrix $W$, we obtain approximate singular triplets which are optimal given $W$ in the sense of optimality of the Rayleigh–Ritz method.

Applications and examples

In quantum physics

In quantum physics, where the spectrum of the Hamiltonian is the set of discrete energy levels allowed by a quantum mechanical system, the Rayleigh–Ritz method is used to approximate the energy states and wavefunctions of a complicated atomic or nuclear system. In fact, for any system more complicated than a single hydrogen atom, there is no known exact solution for the spectrum of the Hamiltonian.

In this case, a trial wave function, $\Psi$, is tested on the system. This trial function is selected to meet boundary conditions (and any other physical constraints). The exact function is not known; the trial function contains one or more adjustable parameters, which are varied to find a lowest energy configuration.

It can be shown that the ground state energy, $E_0$, satisfies an inequality:

$E_0 \le \frac{\langle \Psi | \hat{H} | \Psi \rangle}{\langle \Psi | \Psi \rangle}.$

That is, the ground-state energy is less than or equal to this value. The trial wave function will always give an expectation value larger than or equal to the ground-state energy.

If the trial wave function is known to be orthogonal to the ground state, then it will provide an upper bound for the energy of some excited state.

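The Ritz ansatz function is a linear combination of N known basis functions $\left\lbrace \Psi_i \right\rbrace$, parametrized by unknown coefficients:

$\Psi = \sum_{i=1}^N c_i \Psi_i.$

With a known Hamiltonian, we can write its expected value as $\varepsilon = \frac{\left\langle \sum_{i=1}^N c_i \Psi_i \right| \hat{H} \left| \sum_{i=1}^N c_i \Psi_i \right\rangle}{\left\langle \sum_{i=1}^N c_i \Psi_i \middle| \sum_{i=1}^N c_i \Psi_i \right\rangle} = \frac{\sum_{i=1}^N \sum_{j=1}^N c_i^* c_j H_{ij}}{\sum_{i=1}^N \sum_{j=1}^N c_i^* c_j S_{ij}} \equiv \frac{A}{B}.$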

The basis functions are usually not orthogonal, so that the overlap matrix S has nonzero nondiagonal elements. Either $\left\lbrace c_i \right\rbrace$ or $\left\lbrace c_i^* \right\rbrace$ (the conjugation of the first) can be used to minimize the expectation value. For instance, by making the partial derivatives of $\varepsilon$ over $\left\lbrace c_i^* \right\rbrace$ zero, the following equality is obtained for every k = 1, 2, ..., N:

$\frac{\partial \varepsilon}{\partial c_k^*} = \frac{\sum_{j=1}^N c_j \left(H_{kj} - \varepsilon S_{kj}\right)}{B} = 0,$

which leads to a set of N secular equations:

$\sum_{j=1}^N c_j \left(H_{kj} - \varepsilon S_{kj}\right) = 0 \quad \text{for} \quad k = 1, 2, \dots, N.$

In the above equations, the energy $\varepsilon$ and the coefficients $\left\lbrace c_j \right\rbrace$ are unknown. With respect to c, this is a homogeneous set of linear equations, which has a nontrivial solution when the determinant of the coefficients to these unknowns is zero:

$\det \left(H - \varepsilon S\right) = 0,$

which in turn is true only for N values of $\varepsilon$. Furthermore, since the Hamiltonian is a hermitian operator, the H matrix is also hermitian and the values of $\varepsilon_i$ will be real. The lowest value among $\varepsilon_i$ (i=1,2,..,N), $\varepsilon_0$, will be the best approximation to the ground state for the basis functions used. The remaining N-1 energies are estimates of excited state energies. An approximation for the wave function of state i can be obtained by finding the coefficients $\left\lbrace c_j \right\rbrace$ from the corresponding secular equation.
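As a concrete illustration of the secular equations, consider a particle in a one-dimensional box on $[0, 1]$ in units where $\hbar = m = 1$, so that $\hat{H} = -\tfrac{1}{2}\, d^2/dx^2$ and the exact ground-state energy is $\pi^2/2$. The sketch below is only one possible setup: the non-orthogonal polynomial basis $\Psi_k(x) = x^k (1 - x)$, the function name `box_ritz_energies`, and the Cholesky reduction of the generalized eigenvalue problem are choices made for this example.

```python
import numpy as np
from numpy.polynomial import Polynomial

def box_ritz_energies(n_basis):
    """Ritz energies for H = -(1/2) d^2/dx^2 on [0, 1], psi(0) = psi(1) = 0.

    Assumed non-orthogonal basis: psi_k(x) = x**k * (1 - x), k = 1..n_basis.
    """
    x = Polynomial([0.0, 1.0])
    basis = [x**k * (1 - x) for k in range(1, n_basis + 1)]

    def integral01(p):
        return p.integ()(1.0)   # the antiderivative is zero at x = 0

    # H_ij = (1/2) * integral of psi_i' psi_j' (integration by parts);
    # S_ij = overlap integral of psi_i psi_j.
    H = np.array([[0.5 * integral01(p.deriv() * q.deriv()) for q in basis]
                  for p in basis])
    S = np.array([[integral01(p * q) for q in basis] for p in basis])

    # Reduce H c = eps S c to a standard symmetric problem via S = L L^T.
    L = np.linalg.cholesky(S)
    Linv = np.linalg.inv(L)
    eps, Z = np.linalg.eigh(Linv @ H @ Linv.T)
    C = Linv.T @ Z              # columns are the coefficient vectors c
    return eps, C, H, S

eps, C, H, S = box_ritz_energies(3)
exact_ground = np.pi**2 / 2     # exact ground-state energy in these units
```

With a single basis function the estimate is 5, and with three it lies much closer to, but never below, `exact_ground`, in line with the variational inequality.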

In mechanical engineering

The Rayleigh–Ritz method is often used in mechanical engineering for finding the approximate real resonant frequencies of multi degree of freedom systems, such as spring mass systems or flywheels on a shaft with varying cross section. It is an extension of Rayleigh's method. It can also be used for finding buckling loads and post-buckling behaviour for columns.

Consider the case whereby we want to find the resonant frequency of oscillation of a system. First, write the oscillation in the form $y(x,t) = Y(x) \cos\omega t$ with an unknown mode shape $Y(x)$. Next, find the total energy of the system, consisting of a kinetic energy term and a potential energy term. The kinetic energy term involves the square of the time derivative of $y(x,t)$ and thus gains a factor of $\omega^2$. Thus, we can calculate the total energy of the system and express it in the following form:

$E = T + V \equiv A[Y(x)] \omega^2 \sin^2 \omega t + B[Y(x)] \cos^2 \omega t$

By conservation of energy, the average kinetic energy must be equal to the average potential energy. Thus, $\omega^2 = \frac{B[Y(x)]}{A[Y(x)]} = R[Y(x)]$ which is also known as the Rayleigh quotient. Thus, if we knew the mode shape $Y(x)$, we would be able to calculate $A[Y(x)]$ and $B[Y(x)]$, and in turn get the eigenfrequency. However, we do not yet know the mode shape. In order to find this, we can approximate $Y(x)$ as a combination of a few approximating functions $Y_i(x)$

$Y(x) = \sum_{i=1}^N c_i Y_i(x)$

where $c_1, c_2, \ldots, c_N$ are constants to be determined. In general, if we choose a random set of $c_1, c_2, \ldots, c_N$, it will describe a superposition of the actual eigenmodes of the system. However, if we seek $c_1, c_2, \ldots, c_N$ such that the eigenfrequency $\omega^2$ is minimised, then the mode described by this set of $c_1, c_2, \ldots, c_N$ will be close to the lowest possible actual eigenmode of the system. Thus, this finds the lowest eigenfrequency. If we find eigenmodes orthogonal to this approximated lowest eigenmode, we can approximately find the next few eigenfrequencies as well.

In general, we can express $A[Y(x)]$ and $B[Y(x)]$ as a collection of terms quadratic in the coefficients $c_i$:

$B[Y(x)] = \sum_i \sum_j c_i c_j K_{ij} = \mathbf{c}^\mathsf{T} K \mathbf{c}$

$A[Y(x)] = \sum_i \sum_j c_i c_j M_{ij} = \mathbf{c}^\mathsf{T} M \mathbf{c}$

where $K$ and $M$ are the stiffness matrix and mass matrix of a discrete system respectively.

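The minimization of $\omega^2$ becomes:

$\frac{\partial \omega^2}{\partial c_i} = \frac{\partial}{\partial c_i} \frac{\mathbf{c}^\mathsf{T} K \mathbf{c}}{\mathbf{c}^\mathsf{T} M \mathbf{c}} = 0$

Solving this, $\mathbf{c}^\mathsf{T} M \mathbf{c} \frac{\partial \mathbf{c}^\mathsf{T} K \mathbf{c}}{\partial \mathbf{c}} - \mathbf{c}^\mathsf{T} K \mathbf{c} \frac{\partial \mathbf{c}^\mathsf{T} M \mathbf{c}}{\partial \mathbf{c}} = 0$ $K \mathbf{c} - \frac{\mathbf{c}^\mathsf{T} K \mathbf{c}}{\mathbf{c}^\mathsf{T} M \mathbf{c}} M \mathbf{c} = \mathbf{0}$ $K \mathbf{c} - \omega^2 M \mathbf{c} = \mathbf{0}$

For a non-trivial solution of c, we require the determinant of the matrix coefficient of c to be zero. $\det(K - \omega^2 M) = 0$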

This gives a solution for the first N eigenfrequencies and eigenmodes of the system, with N being the number of approximating functions.
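A small worked case may help make the matrices concrete. The sketch below assumes, purely for illustration, a uniform rod in axial vibration, fixed at $x = 0$ and free at $x = 1$, with unit stiffness, density and length, so that $B[Y] = \int_0^1 Y'^2\,dx$, $A[Y] = \int_0^1 Y^2\,dx$ (the common factor of one half cancels in the quotient) and the exact fundamental frequency is $\pi/2$. The approximating functions $Y_i(x) = x^i$ and the function name `rod_frequencies` are assumptions of this example.

```python
import numpy as np

def rod_frequencies(n_terms):
    """Approximate natural frequencies of a rod fixed at x = 0, free at x = 1.

    Assumptions for illustration: unit stiffness, density and length, and
    polynomial shapes Y_i(x) = x**i, which satisfy Y_i(0) = 0.
    """
    i = np.arange(1, n_terms + 1)
    I, J = np.meshgrid(i, i, indexing="ij")
    K = I * J / (I + J - 1.0)    # K_ij = integral of Y_i' Y_j' over [0, 1]
    M = 1.0 / (I + J + 1.0)      # M_ij = integral of Y_i Y_j over [0, 1]

    # Solve K c = omega^2 M c through the Cholesky factor of M.
    L = np.linalg.cholesky(M)
    Linv = np.linalg.inv(L)
    omega2, Z = np.linalg.eigh(Linv @ K @ Linv.T)
    return np.sqrt(omega2), Linv.T @ Z, K, M

omega, C, K, M = rod_frequencies(3)
exact_fundamental = np.pi / 2
```

With one term the estimate is $\sqrt{3}$; adding terms lowers the estimate toward `exact_fundamental` from above.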

Simple case of double spring-mass system

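The following discussion uses the simplest case, where the system has two lumped springs and two lumped masses, and only two mode shapes are assumed. Hence $M = [m_1, m_2]$ and $K = [k_1, k_2]$.

A mode shape is assumed for the system as a combination of two terms, one of which is weighted by a factor B. In simple harmonic motion, the velocity at the time when the deflection is zero is the angular frequency $\omega$ times the deflection (y) at time of maximum deflection. In this example the kinetic energy (KE) for each mass is $\frac{1}{2} \omega^2 Y_1^2 m_1$ etc., and the potential energy (PE) for each spring is $\frac{1}{2} k_1 Y_1^2$ etc.

We also know that without damping, the maximal KE equals the maximal PE. Thus, $\sum_{i=1}^2 \left(\frac{1}{2} \omega^2 Y_i^2 M_i\right) = \sum_{i=1}^2 \left(\frac{1}{2} K_i Y_i^2\right)$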

The overall amplitude of the mode shape cancels out from each side, always. That is, the actual size of the assumed deflection does not matter, just the mode shape.

Mathematical manipulations then obtain an expression for $\omega$, in terms of B, which can be differentiated with respect to B, to find the minimum, i.e. when $d\omega/dB = 0$. This gives the value of B for which $\omega$ is lowest. Because the mode shape is assumed, this value is an upper bound for the fundamental frequency of the system that $\omega$ is meant to predict, but we have found the lowest value of that upper bound, given our assumptions, because B is used to find the optimal 'mix' of the two assumed mode shape functions.
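The minimisation over B can be sketched numerically. Everything specific in the following block is an assumption made for illustration: the mass and stiffness values, the two assumed shapes `shape_a` and `shape_b`, and the use of a simple grid scan in place of differentiating by hand. The energy balance is the one stated above, $\sum_i \tfrac{1}{2} \omega^2 Y_i^2 M_i = \sum_i \tfrac{1}{2} K_i Y_i^2$.

```python
import numpy as np

m = np.array([1.0, 2.0])          # hypothetical lumped masses m_1, m_2
k = np.array([3.0, 4.0])          # hypothetical spring stiffnesses k_1, k_2
shape_a = np.array([1.0, 1.0])    # first assumed mode shape (an assumption)
shape_b = np.array([1.0, -1.0])   # second assumed mode shape (an assumption)

def omega_squared(B):
    """omega^2 from max KE = max PE for the trial shape shape_a + B * shape_b."""
    Y = shape_a + B * shape_b
    return np.sum(k * Y**2) / np.sum(m * Y**2)

# Scan B and pick the value that minimises the upper bound on omega^2.
B_grid = np.linspace(-3.0, 3.0, 601)
values = np.array([omega_squared(B) for B in B_grid])
B_best = B_grid[np.argmin(values)]
omega_best = np.sqrt(values.min())
```

Scaling the trial shape by any nonzero constant leaves `omega_squared` unchanged, which is the cancellation of the overall amplitude noted above.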

There are many tricks with this method; the most important is to choose realistic assumed mode shapes. For example, in the case of beam deflection problems it is wise to use a deformed shape that is analytically similar to the expected solution. A quartic may fit most of the easy problems of simply linked beams even if the order of the deformed solution may be lower. The springs and masses do not have to be discrete; they can be continuous (or a mixture), and this method can be easily used in a spreadsheet to find the natural frequencies of quite complex distributed systems, if you can describe the distributed KE and PE terms easily, or else break the continuous elements up into discrete parts.

This method could be used iteratively, adding additional mode shapes to the previous best solution, or you can build up a long expression with many Bs and many mode shapes, and then differentiate them partially.

In dynamical systems

The Koopman operator allows a finite-dimensional nonlinear system to be encoded as an infinite-dimensional linear system. In general, both of these problems are difficult to solve, but for the latter we can use the Ritz-Galerkin method to approximate a solution.^[13]

The relationship with the finite element method

In the language of the finite element method, the matrix $H_{kj}$ is precisely the stiffness matrix of the Hamiltonian in the piecewise linear element space, and the matrix $S_{kj}$ is the mass matrix. In the language of linear algebra, the value $\varepsilon$ is an eigenvalue of the discretized Hamiltonian, and the vector $c$ is a discretized eigenvector.

Notes and references

Ritz, Walther (1909). "Über eine neue Methode zur Lösung gewisser Variationsprobleme der mathematischen Physik". Journal für die Reine und Angewandte Mathematik. 135: 1–61.
MacDonald, J. K. (1933). "Successive Approximations by the Rayleigh-Ritz Variation Method". Phys. Rev. 43.

External links

Course on Calculus of Variations, has a section on Rayleigh–Ritz method.
Ritz method in the Encyclopedia of Mathematics
Gander, Martin J.; Wanner, Gerhard (2012). "From Euler, Ritz, and Galerkin to Modern Computing". SIAM Review. 54 (4): 627–666. doi:10.1137/100804036. CiteSeerX 10.1.1.297.5697.

Notes and References

1. Leissa, A. W. (2005). "The historical bases of the Rayleigh and Ritz methods". Journal of Sound and Vibration. 287 (4–5): 961–978. doi:10.1016/j.jsv.2004.12.021. Bibcode:2005JSV...287..961L.
2. Ilanko, Sinniah (2009). "Comments on the historical bases of the Rayleigh and Ritz methods". Journal of Sound and Vibration. 319 (1–2): 731–733. doi:10.1016/j.jsv.2008.06.001. Bibcode:2009JSV...319..731I.
3. Davies, E. B.; Plum, M. (2003). "Spectral Pollution". IMA Journal of Numerical Analysis.
4. Süli, Endre; Mayers, David (2003). An Introduction to Numerical Analysis. Cambridge University Press. ISBN 0521007941.
5. Levitin, Michael; Shargorodsky, Eugene (2004). "Spectral pollution and second order relative spectra for self-adjoint operators". IMA Journal of Numerical Analysis.
6. Pryce, John D. (1994). Numerical Solution of Sturm-Liouville Problems. Oxford University Press. ISBN 0198534159.
7. Arfken, George B.; Weber, Hans J. (2005). Mathematical Methods For Physicists (6th ed.). Academic Press.
8. Colbrook, Matthew. "Unscrambling the Infinite: Can we Compute Spectra?". Mathematics Today. Institute of Mathematics and its Applications.
9. Colbrook, Matthew; Roman, Bogdan; Hansen, Anders (2019). "How to Compute Spectra with Error Control". Physical Review Letters.
10. Pokrzywa, Andrzej (1979). "Method of orthogonal projections and approximation of the spectrum of a bounded operator". Studia Mathematica.
11. Bögli, Sabine; Marletta, Marco; Tretter, Christiane (2020). "The essential numerical range for unbounded linear operators". Journal of Functional Analysis.
12. Trefethen, Lloyd N.; Bau, III, David (1997). Numerical Linear Algebra. SIAM. p. 254. ISBN 978-0-89871-957-4.
13. Servadio, Simone; Arnas, David; Linares, Richard. "A Koopman Operator Tutorial with Orthogonal Polynomials". arXiv.
