In mathematics, the generalized minimal residual method (GMRES) is an iterative method for the numerical solution of an indefinite nonsymmetric system of linear equations. The method approximates the solution by the vector in a Krylov subspace with minimal residual. The Arnoldi iteration is used to find this vector.
The GMRES method was developed by Yousef Saad and Martin H. Schultz in 1986.[1] It is a generalization and improvement of the MINRES method due to Paige and Saunders in 1975.[2][3] The MINRES method requires that the matrix is symmetric, but has the advantage that it only requires handling of three vectors. GMRES is a special case of the DIIS method developed by Peter Pulay in 1980; DIIS is also applicable to non-linear systems.
Denote the Euclidean norm of any vector v by \|v\|. Denote the (square) system of linear equations to be solved by Ax = b. The matrix A is assumed to be invertible of size m-by-m. Furthermore, it is assumed that b is normalized, i.e., that \|b\| = 1.
The n-th Krylov subspace for this problem is
K_n = K_n(A, r_0) = \operatorname{span}\{ r_0, A r_0, A^2 r_0, \ldots, A^{n-1} r_0 \},
where r_0 = b - A x_0 is the initial residual given an initial guess x_0 \ne 0. Clearly r_0 = b if x_0 = 0.
GMRES approximates the exact solution of Ax = b by the vector x_n \in x_0 + K_n that minimizes the Euclidean norm of the residual r_n = b - A x_n.
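To make the minimization concrete, the following MATLAB/Octave sketch (arbitrary small test matrix, illustration only) minimizes the residual over K_n directly, using the monomial basis r_0, A r_0, \ldots, A^{n-1} r_0. This is not how GMRES is implemented, since that basis can be very ill-conditioned, which is exactly what motivates the Arnoldi basis described next.

% Illustration only: minimize ||b - A(x0 + K*y)|| over the n-th Krylov
% subspace using the monomial basis (not how GMRES is implemented).
m = 6; n = 3;
A = rand(m) + m*eye(m);              % arbitrary nonsingular test matrix
b = rand(m, 1); x0 = zeros(m, 1);
r0 = b - A*x0;
K = zeros(m, n);
K(:, 1) = r0;
for j = 2:n
    K(:, j) = A*K(:, j-1);           % columns r0, A*r0, ..., A^(n-1)*r0
end
y  = (A*K) \ r0;                     % least squares for min ||r0 - A*K*y||
xn = x0 + K*y;                       % the n-th minimal-residual iterate
norm(b - A*xn)                       % the minimal residual norm over K_n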
The vectors r_0, A r_0, \ldots, A^{n-1} r_0 might be close to linearly dependent, so instead of this basis, the Arnoldi iteration is used to find orthonormal vectors q_1, q_2, \ldots, q_n which form a basis for K_n. In particular, q_1 = \|r_0\|_2^{-1} r_0.
Therefore, the vector x_n \in x_0 + K_n can be written as x_n = x_0 + Q_n y_n with y_n \in R^n, where Q_n is the m-by-n matrix formed by q_1, \ldots, q_n. Finding x_n thus amounts to finding the coefficient vector y_n.
The Arnoldi process also constructs \tilde{H}_n, an (n+1)-by-n upper Hessenberg matrix which satisfies
A Q_n = Q_{n+1} \tilde{H}_n,
an equality which is used to simplify the calculation of y_n (see below).
Because the columns of Q_{n+1} are orthonormal, we have
\|r_n\| = \|b - A x_n\| = \|b - A(x_0 + Q_n y_n)\| = \|r_0 - A Q_n y_n\| = \|\beta q_1 - Q_{n+1} \tilde{H}_n y_n\| = \|Q_{n+1}(\beta e_1 - \tilde{H}_n y_n)\| = \|\beta e_1 - \tilde{H}_n y_n\|,
where e_1 = (1, 0, 0, \ldots, 0)^T is the first vector in the standard basis of R^{n+1}, and \beta = \|r_0\|, r_0 being the initial residual (equal to b when x_0 = 0). Hence, x_n can be found by minimizing the Euclidean norm of the residual
\tilde{r}_n = \tilde{H}_n y_n - \beta e_1.
This is a linear least squares problem of size n.
This yields the GMRES method. On the n-th iteration:
1. calculate q_n with the Arnoldi method;
2. find the y_n which minimizes \|\tilde{r}_n\|;
3. compute x_n = x_0 + Q_n y_n;
4. repeat if the residual is not yet small enough (these steps are sketched in code below).
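As a minimal sketch, assuming an arbitrary small test matrix and skipping the incremental QR update used in practice, the steps above can be written directly in MATLAB/Octave (the example code at the end of the article does the same work incrementally with Givens rotations):

% Minimal sketch of the iteration above (illustrative test problem; no
% restarts, no incremental QR update).
m = 6;
A = rand(m) + m*eye(m);              % arbitrary nonsingular test matrix
b = rand(m, 1); x0 = zeros(m, 1);
r0 = b - A*x0; beta = norm(r0);
Q = r0/beta; H = [];
for k = 1:4
    % step 1: extend the Arnoldi basis (modified Gram-Schmidt)
    v = A*Q(:, k);
    for i = 1:k
        H(i, k) = Q(:, i)' * v;
        v = v - H(i, k) * Q(:, i);
    end
    H(k+1, k) = norm(v);
    Q(:, k+1) = v / H(k+1, k);
    % step 2: solve the small least-squares problem min ||beta*e1 - H*y||
    e1 = zeros(k+1, 1); e1(1) = 1;
    y = H \ (beta*e1);
    % step 3: form the iterate and report its residual norm
    xk = x0 + Q(:, 1:k)*y;
    fprintf('k = %d, residual = %.3e\n', k, norm(b - A*xk));
end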
At every iteration, a matrix-vector product A q_n must be computed. This costs about 2m^2 floating-point operations for general dense matrices of size m, but the cost can decrease to O(m) for sparse matrices. In addition to the matrix-vector product, O(nm) floating-point operations must be computed at the n-th iteration.
The n-th iterate minimizes the residual in the Krylov subspace K_n. Since every subspace is contained in the next subspace, the residual does not increase. After m iterations, where m is the size of the matrix A, the Krylov space K_m is the whole of R^m and hence the GMRES method arrives at the exact solution. However, the idea is that after a small number of iterations (relative to m), the vector x_n is already a good approximation to the exact solution.
This does not happen in general. Indeed, a theorem of Greenbaum, Pták and Strakoš states that for every nonincreasing sequence a_1, \ldots, a_{m-1}, a_m = 0, one can find a matrix A such that \|r_n\| = a_n for all n, where r_n is the residual defined above. In particular, it is possible to find a matrix for which the residual stays constant for m - 1 iterations, and only drops to zero at the last iteration.
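A standard way to see such complete stagnation is the cyclic shift matrix with right-hand side e_1. The following MATLAB/Octave sketch (brute-force residual minimization over each Krylov subspace, purely for illustration) shows the residual staying at \|b\| until the last step:

% Sketch: complete stagnation on the cyclic shift matrix.  The minimal
% residual over K_n equals ||b|| for n < m and only vanishes at n = m.
m = 10;
A = circshift(eye(m), 1);            % A*e_k = e_(k+1), cyclically
b = [1; zeros(m-1, 1)];              % b = e_1; the exact solution is e_m
K = b;                               % Krylov basis, built column by column
res = zeros(1, m);
for n = 1:m
    res(n) = norm(b - A*K*((A*K) \ b));   % minimal residual over K_n
    K(:, n+1) = A*K(:, n);
end
disp(res)                            % 1, 1, ..., 1, ~0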
In practice, though, GMRES often performs well. This can be proven in specific situations. If the symmetric part of A, that is M = (A^T + A)/2, is positive definite, then
\|r_n\| \le \left( 1 - \frac{\lambda_{\min}^2(M)}{\lambda_{\max}(A^T A)} \right)^{n/2} \|r_0\|,
where \lambda_{\min}(M) and \lambda_{\max}(M) denote the smallest and largest eigenvalue of the matrix M, respectively.
If A is symmetric and positive definite, then we even have
\|r_n\| \le \left( \frac{\kappa_2(A)^2 - 1}{\kappa_2(A)^2} \right)^{n/2} \|r_0\|,
where \kappa_2(A) denotes the condition number of A in the Euclidean norm.
In the general case, where A is not positive definite, we have
\frac{\|r_n\|}{\|b\|} \le \inf_{p \in P_n} \|p(A)\| \le \kappa_2(V) \inf_{p \in P_n} \max_{\lambda \in \sigma(A)} |p(\lambda)|,
where P_n denotes the set of polynomials of degree at most n with p(0) = 1, V is the matrix appearing in the spectral decomposition of A, and \sigma(A) is the spectrum of A. Roughly speaking, this says that fast convergence occurs when the eigenvalues of A are clustered away from the origin and A is not too far from normality.[5]
All these inequalities bound only the residuals instead of the actual error, that is, the distance between the current iterate x_n and the exact solution.
Like other iterative methods, GMRES is usually combined with a preconditioning method in order to speed up convergence.
The cost of the iterations grows as O(n^2), where n is the iteration number. Therefore, the method is sometimes restarted after a number, say k, of iterations, with x_k as initial guess. The resulting method is called GMRES(k) or restarted GMRES. For non-positive-definite matrices, this method may suffer from stagnation in convergence, as the restarted subspace is often close to the earlier subspace.
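As a hedged illustration of how restarting and preconditioning are combined in practice, the following sketch calls the built-in gmres of MATLAB/Octave with an incomplete-LU preconditioner; the test matrix, restart length and tolerances are arbitrary choices for demonstration only.

% Sketch: restarted, preconditioned GMRES via the built-in solver.
A = gallery('wathen', 10, 10);       % a sparse test matrix (illustrative)
b = ones(size(A, 1), 1);
[L, U] = ilu(A);                     % incomplete-LU factors, M = L*U
restart = 20;                        % i.e. GMRES(20)
[x, flag, relres, iter] = gmres(A, b, restart, 1e-8, 50, L, U);

Passing the factors L and U tells gmres to use M = L*U as preconditioner; smaller restart values reduce memory and per-iteration cost at the risk of slower convergence or stagnation.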
The shortcomings of GMRES and restarted GMRES are addressed by the recycling of Krylov subspaces in GCRO-type methods such as GCROT and GCRODR.[6] Recycling of Krylov subspaces in GMRES can also speed up convergence when sequences of linear systems need to be solved.[7]
The Arnoldi iteration reduces to the Lanczos iteration for symmetric matrices. The corresponding Krylov subspace method is the minimal residual method (MinRes) of Paige and Saunders. Unlike the unsymmetric case, the MinRes method is given by a three-term recurrence relation. It can be shown that there is no Krylov subspace method for general matrices that is given by a short recurrence relation and yet minimizes the norms of the residuals, as GMRES does.
Another class of methods builds on the unsymmetric Lanczos iteration, in particular the BiCG method. These use a three-term recurrence relation, but they do not attain the minimum residual, and hence the residual does not decrease monotonically for these methods. Convergence is not even guaranteed.
The third class is formed by methods like CGS and BiCGSTAB. These also work with a three-term recurrence relation (hence, without optimality) and they can even terminate prematurely without achieving convergence. The idea behind these methods is to choose the generating polynomials of the iteration sequence suitably.
None of these three classes is the best for all matrices; there are always examples in which one class outperforms the other. Therefore, multiple solvers are tried in practice to see which one is the best for a given problem.
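In that spirit, a quick and purely illustrative comparison can be made by running the corresponding built-in MATLAB/Octave solvers on one and the same (arbitrary) test system and inspecting their convergence flags and relative residuals:

% Sketch: trying several Krylov solvers on the same system (illustrative).
n = 200;
A = speye(n) + 0.3*sprandn(n, n, 0.02);   % arbitrary nonsymmetric test matrix
b = randn(n, 1);
tol = 1e-8; maxit = n;
[x1, f1, r1] = gmres(A, b, [], tol, maxit);   % full GMRES
[x2, f2, r2] = bicgstab(A, b, tol, maxit);    % BiCGSTAB
[x3, f3, r3] = cgs(A, b, tol, maxit);         % CGS
[f1 f2 f3; r1 r2 r3]                          % flags and relative residuals

Here a flag of 0 means the solver converged to the requested tolerance; a nonzero flag indicates failure or stagnation.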
One part of the GMRES method is to find the vector y_n which minimizes
\|\tilde{H}_n y_n - \beta e_1\|.
Note that \tilde{H}_n is an (n+1)-by-n matrix, hence it gives an over-constrained linear system of n+1 equations for n unknowns. The minimum can be computed using a QR decomposition: find an (n+1)-by-(n+1) orthogonal matrix \Omega_n and an (n+1)-by-n upper triangular matrix \tilde{R}_n such that
\Omega_n \tilde{H}_n = \tilde{R}_n.
The triangular matrix has one more row than it has columns, so its bottom row consists of zeros. Hence, it can be decomposed as
\tilde{R}_n = \begin{bmatrix} R_n \\ 0 \end{bmatrix},
where R_n is an n-by-n (thus square) triangular matrix.
The QR decomposition can be updated cheaply from one iteration to the next, because the Hessenberg matrices differ only by a row of zeros and a column:
\tilde{H}_{n+1} = \begin{bmatrix} \tilde{H}_n & h_{n+1} \\ 0 & h_{n+2,n+1} \end{bmatrix},
where h_{n+1} = (h_{1,n+1}, \ldots, h_{n+1,n+1})^T. This implies that premultiplying the Hessenberg matrix with \Omega_n, augmented with zeroes and a row with multiplicative identity, yields almost a triangular matrix:
\begin{bmatrix} \Omega_n & 0 \\ 0 & 1 \end{bmatrix} \tilde{H}_{n+1} = \begin{bmatrix} R_n & r_{n+1} \\ 0 & \rho \\ 0 & \sigma \end{bmatrix}.
This would be triangular if \sigma is zero. To remedy this, one needs the Givens rotation
G_n = \begin{bmatrix} I_n & 0 & 0 \\ 0 & c_n & s_n \\ 0 & -s_n & c_n \end{bmatrix}
where
c_n = \frac{\rho}{\sqrt{\rho^2 + \sigma^2}} \quad \text{and} \quad s_n = \frac{\sigma}{\sqrt{\rho^2 + \sigma^2}}.
With this Givens rotation, we form
\Omega_{n+1} = G_n \begin{bmatrix} \Omega_n & 0 \\ 0 & 1 \end{bmatrix}.
Indeed,
\Omega_{n+1} \tilde{H}_{n+1} = \begin{bmatrix} R_n & r_{n+1} \\ 0 & r_{n+1,n+1} \\ 0 & 0 \end{bmatrix}
is a triangular matrix with r_{n+1,n+1} = \sqrt{\rho^2 + \sigma^2}.
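As a tiny numerical check (the values of \rho and \sigma are arbitrary), the 2-by-2 core of such a Givens rotation indeed zeroes the entry below the diagonal:

% Sketch: a single Givens rotation eliminating sigma below the diagonal.
rho = 3; sigma = 4;                  % arbitrary illustrative values
t = hypot(rho, sigma);
c = rho/t; s = sigma/t;
G = [c s; -s c];
G * [rho; sigma]                     % = [t; 0] = [5; 0]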
Given the QR decomposition, the minimization problem is easily solved by noting that
\|\tilde{H}_n y_n - \beta e_1\| = \|\Omega_n (\tilde{H}_n y_n - \beta e_1)\| = \|\tilde{R}_n y_n - \beta \Omega_n e_1\|.
Denoting the vector \beta \Omega_n e_1 by
\tilde{g}_n = \begin{bmatrix} g_n \\ \gamma_n \end{bmatrix}
with g_n \in R^n and \gamma_n \in R, this is
\|\tilde{R}_n y_n - \beta \Omega_n e_1\| = \left\| \begin{bmatrix} R_n \\ 0 \end{bmatrix} y_n - \begin{bmatrix} g_n \\ \gamma_n \end{bmatrix} \right\|.
The vector y_n that minimizes this expression is given by
y_n = R_n^{-1} g_n,
and the minimal residual norm equals |\gamma_n|. Again, the vectors g_n are easy to update.
function [x, e] = gmres(A, b, x, max_iterations, threshold)
  n = length(A);
  m = max_iterations;

  % use x as the initial vector
  r = b - A * x;

  b_norm = norm(b);
  error = norm(r) / b_norm;

  % initialize the 1D vectors
  sn = zeros(m, 1);
  cs = zeros(m, 1);
  %e1 = zeros(n, 1);
  e1 = zeros(m+1, 1);
  e1(1) = 1;
  e = [error];
  r_norm = norm(r);
  Q(:,1) = r / r_norm;

  % Note: this is not the beta scalar in section "The method" above but
  % the beta scalar multiplied by e1
  beta = r_norm * e1;

  for k = 1:m
    % run arnoldi
    [H(1:k+1, k), Q(:, k+1)] = arnoldi(A, Q, k);

    % eliminate the last element in H ith row and update the rotation matrix
    [H(1:k+1, k), cs(k), sn(k)] = apply_givens_rotation(H(1:k+1, k), cs, sn, k);

    % update the residual vector
    beta(k + 1) = -sn(k) * beta(k);
    beta(k)     = cs(k) * beta(k);
    error       = abs(beta(k + 1)) / b_norm;

    % save the error
    e = [e; error];

    if (error <= threshold)
      break;
    end
  end
  % if threshold is not reached, k = m at this point (and not m+1)

  % calculate the result
  y = H(1:k, 1:k) \ beta(1:k);
  x = x + Q(:, 1:k) * y;
end
%----------------------------------------------------%
%                  Arnoldi Function                   %
%----------------------------------------------------%
function [h, q] = arnoldi(A, Q, k)
  q = A*Q(:,k);   % Krylov Vector
  for i = 1:k     % Modified Gram-Schmidt, keeping the Hessenberg matrix
    h(i) = q' * Q(:, i);
    q = q - h(i) * Q(:, i);
  end
  h(k + 1) = norm(q);
  q = q / h(k + 1);
end
%---------------------------------------------------------------------%
%                  Applying Givens Rotation to H col                   %
%---------------------------------------------------------------------%
function [h, cs_k, sn_k] = apply_givens_rotation(h, cs, sn, k)
  % apply the previous rotations to the new column
  for i = 1:k-1
    temp   =  cs(i) * h(i) + sn(i) * h(i + 1);
    h(i+1) = -sn(i) * h(i) + cs(i) * h(i + 1);
    h(i)   = temp;
  end

  % update the next sin cos values for rotation
  [cs_k, sn_k] = givens_rotation(h(k), h(k + 1));

  % eliminate H(k + 1, k)
  h(k) = cs_k * h(k) + sn_k * h(k + 1);
  h(k + 1) = 0.0;
end
%%----Calculate the Givens rotation matrix----%%
function [cs, sn] = givens_rotation(v1, v2)
  % coefficients chosen so that [cs sn; -sn cs] * [v1; v2] = [t; 0]
  t  = sqrt(v1^2 + v2^2);
  cs = v1 / t;
  sn = v2 / t;
end