In numerical linear algebra, the conjugate gradient method is an iterative method for numerically solving the linear system
\boldsymbol{Ax}=\boldsymbol{b}
where $\boldsymbol{A}$ is a symmetric positive-definite matrix, without explicitly forming $\boldsymbol{A}^{-1}$. The conjugate gradient method can be derived from several different perspectives, including specialization of the conjugate direction method for optimization and variation of the Arnoldi/Lanczos iteration for eigenvalue problems.
The intent of this article is to document the important steps in these derivations.
The conjugate gradient method can be seen as a special case of the conjugate direction method applied to minimization of the quadratic function
f(\boldsymbol{x})=\boldsymbol{x}^\mathsf{T}\boldsymbol{A}\boldsymbol{x}-2\boldsymbol{b}^\mathsf{T}\boldsymbol{x}.
Geometrically, the quadratic function can be equivalently presented by writing down its value at every point in space. The points of equal value make up its contour surfaces, which are concentric ellipsoids with the equation
\boldsymbol{x}^\mathsf{T}\boldsymbol{A}\boldsymbol{x}-2\boldsymbol{b}^\mathsf{T}\boldsymbol{x}=C
for varying values of the constant $C$. As $C$ decreases, the ellipsoids shrink toward a single shared center.
Minimizing the quadratic function is then a problem of moving around the space, searching for that shared center of all those ellipsoids. The center could be found by computing $\boldsymbol{A}^{-1}\boldsymbol{b}$ directly, but that is exactly the computation an iterative method is meant to avoid.
The simplest method is greedy line search, where we start at some point $\boldsymbol{x}_0$, pick a direction $\boldsymbol{p}_0$ somehow, then minimize $f(\boldsymbol{x}_0+\alpha_0\boldsymbol{p}_0)$ over the scalar $\alpha_0$. This one-dimensional minimization has a simple closed-form solution involving only $\boldsymbol{A}$, $\boldsymbol{b}$, $\boldsymbol{x}_0$, and $\boldsymbol{p}_0$. We can now repeat this procedure, starting at our new point $\boldsymbol{x}_1=\boldsymbol{x}_0+\alpha_0\boldsymbol{p}_0$, picking a new direction $\boldsymbol{p}_1$, computing a new step size $\alpha_1$, and so on.
We can summarize this as the following algorithm:
Start by picking an initial guess $\boldsymbol{x}_0$, compute the initial residual $\boldsymbol{r}_0=\boldsymbol{b}-\boldsymbol{Ax}_0$, and then iterate:
\begin{align}
\alpha_i&=\frac{\boldsymbol{p}_i^\mathsf{T}\boldsymbol{r}_i}{\boldsymbol{p}_i^\mathsf{T}\boldsymbol{A}\boldsymbol{p}_i},\\
\boldsymbol{x}_{i+1}&=\boldsymbol{x}_i+\alpha_i\boldsymbol{p}_i,\\
\boldsymbol{r}_{i+1}&=\boldsymbol{r}_i-\alpha_i\boldsymbol{A}\boldsymbol{p}_i
\end{align}
where $\boldsymbol{p}_0,\boldsymbol{p}_1,\boldsymbol{p}_2,\ldots$ are search directions that still have to be specified. The iteration makes no further progress once $\alpha_i=0$, that is, once the residual is orthogonal to the current direction, so everything depends on how the directions $\boldsymbol{p}_0,\boldsymbol{p}_1,\boldsymbol{p}_2,\ldots$ are chosen.
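As a concrete illustration, here is a minimal NumPy sketch of this generic scheme; the helper `pick_direction` is a placeholder for whatever rule supplies the directions, which the method has not yet fixed.

```python
import numpy as np

def directional_minimization(A, b, x0, pick_direction, num_steps):
    """Generic line-search scheme for f(x) = x^T A x - 2 b^T x.

    pick_direction(i, x, r) supplies the search direction p_i; the rule for
    choosing it is deliberately left open, as in the text above.
    """
    x = x0.astype(float).copy()
    r = b - A @ x                          # r_0 = b - A x_0
    for i in range(num_steps):
        p = pick_direction(i, x, r)
        alpha = (p @ r) / (p @ (A @ p))    # exact minimizer along the line
        x = x + alpha * p                  # x_{i+1} = x_i + alpha_i p_i
        r = r - alpha * (A @ p)            # r_{i+1} = r_i - alpha_i A p_i
    return x

# Greedy line search simply walks along the current residual at every step.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5 * np.eye(5)                # symmetric positive-definite
b = rng.standard_normal(5)
x = directional_minimization(A, b, np.zeros(5), lambda i, x, r: r, 50)
print(np.linalg.norm(b - A @ x))           # the residual norm becomes small
```

Greedy line search, which takes $\boldsymbol{p}_i=\boldsymbol{r}_i$ at every step, typically converges only gradually; the next paragraphs choose directions for which the search terminates exactly.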
If the directions $\boldsymbol{p}_0,\boldsymbol{p}_1,\boldsymbol{p}_2,\ldots$ are chosen well, then $n$ of them suffice, where $n$ is the dimension of the space. To make this precise, write the displacement of the starting point from the center as a combination $t_0\boldsymbol{p}_0+\cdots+t_{n-1}\boldsymbol{p}_{n-1}$ of the directions. We would like the line search along each $\boldsymbol{p}_i$ to remove the component $t_i\boldsymbol{p}_i$ exactly, just as if the search had started at $\boldsymbol{c}+t_i\boldsymbol{p}_i$, independently of the components along the other directions $\{\boldsymbol{p}_j:j\neq i\}$; once every component has been removed, the iterate coincides with the center $\boldsymbol{c}$.
Note that each directional vector $\boldsymbol{p}_i$ needs to be paired with a suitable scale $t_i$: only the scaled component $t_i\boldsymbol{p}_i$, and hence the point $\boldsymbol{c}+t_i\boldsymbol{p}_i$, is determined by the starting point, so rescaling $\boldsymbol{p}_i$ simply rescales $t_i$.
Given an ellipsoid with equation $f(\boldsymbol{x})=C$ for some constant $C$, every other contour ellipsoid $f(\boldsymbol{x})=C'$ differs from it only in size, not in shape or center. Writing a point as $\boldsymbol{c}+\boldsymbol{d}$ and using the fact that $\boldsymbol{c}$ is the minimizer,
f(\boldsymbol{c}+\boldsymbol{d})=f(\boldsymbol{c})+\boldsymbol{d}^\mathsf{T}\boldsymbol{A}\boldsymbol{d}.
For $\boldsymbol{d}=t_0\boldsymbol{p}_0+\cdots+t_{n-1}\boldsymbol{p}_{n-1}$, the minimization over each $t_i$ is independent of the other coefficients when the cross terms $t_it_j\boldsymbol{p}_i^\mathsf{T}\boldsymbol{A}\boldsymbol{p}_j$ vanish, that is, when
\boldsymbol{p}_i^\mathsf{T}\boldsymbol{A}\boldsymbol{p}_j=0
whenever $i\neq j$. Directions with this property are called conjugate (or $\boldsymbol{A}$-orthogonal), which gives the conjugate direction method its name.
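As a quick numerical illustration with made-up data, the following snippet builds a mutually conjugate set of directions by taking the eigenvectors of $\boldsymbol{A}$ and checks that one exact line search along each of them, in any order, lands exactly on the center $\boldsymbol{A}^{-1}\boldsymbol{b}$.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)              # symmetric positive-definite
b = rng.standard_normal(n)
center = np.linalg.solve(A, b)           # the shared center A^{-1} b

# Eigenvectors of a symmetric A are mutually conjugate: q_i^T A q_j = 0 for i != j.
_, Q = np.linalg.eigh(A)
directions = [Q[:, k] for k in rng.permutation(n)]   # any order works

x = rng.standard_normal(n)               # arbitrary starting point
for p in directions:
    r = b - A @ x
    x = x + (p @ r) / (p @ (A @ p)) * p  # exact line search along p
print(np.allclose(x, center))            # True: n conjugate searches suffice
```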
The conjugate direction method is imprecise in the sense that no formulae are given for selection of the directions $\boldsymbol{p}_0,\boldsymbol{p}_1,\boldsymbol{p}_2,\ldots$
We can tabulate the equations that we need to set to zero:
|   | 1 | 2 | 3 | … |
| 0 | $\boldsymbol{p}_0^\mathsf{T}\boldsymbol{A}\boldsymbol{p}_1=0$ | $\boldsymbol{p}_0^\mathsf{T}\boldsymbol{A}\boldsymbol{p}_2=0$ | $\boldsymbol{p}_0^\mathsf{T}\boldsymbol{A}\boldsymbol{p}_3=0$ | … |
| 1 |   | $\boldsymbol{p}_1^\mathsf{T}\boldsymbol{A}\boldsymbol{p}_2=0$ | $\boldsymbol{p}_1^\mathsf{T}\boldsymbol{A}\boldsymbol{p}_3=0$ | … |
| 2 |   |   | $\boldsymbol{p}_2^\mathsf{T}\boldsymbol{A}\boldsymbol{p}_3=0$ | … |
| $\vdots$ |   |   |   | $\ddots$ |
This resembles the problem of orthogonalization, which requires $\boldsymbol{p}_i^\mathsf{T}\boldsymbol{p}_j=0$ whenever $i\neq j$ and $\boldsymbol{p}_i^\mathsf{T}\boldsymbol{p}_j=1$ whenever $i=j$. The conjugacy conditions can therefore be enforced by a Gram–Schmidt-like process, with the bilinear form $\boldsymbol{u}^\mathsf{T}\boldsymbol{A}\boldsymbol{v}$ playing the role of the ordinary dot product.
Start with an arbitrary direction $\boldsymbol{p}_0$. Then pick an arbitrary $\boldsymbol{p}_{10}$ and set
\boldsymbol{p}_1=\boldsymbol{p}_{10}-\frac{\boldsymbol{p}_0^\mathsf{T}\boldsymbol{A}\boldsymbol{p}_{10}}{\boldsymbol{p}_0^\mathsf{T}\boldsymbol{A}\boldsymbol{p}_0}\,\boldsymbol{p}_0.
Next pick an arbitrary $\boldsymbol{p}_{20}$ and set
\boldsymbol{p}_2=\boldsymbol{p}_{20}-\sum_{i=0}^{1}\frac{\boldsymbol{p}_i^\mathsf{T}\boldsymbol{A}\boldsymbol{p}_{20}}{\boldsymbol{p}_i^\mathsf{T}\boldsymbol{A}\boldsymbol{p}_i}\,\boldsymbol{p}_i,
and so on, until at the last step we pick an arbitrary $\boldsymbol{p}_{n-1,0}$ and set
\boldsymbol{p}_{n-1}=\boldsymbol{p}_{n-1,0}-\sum_{i=0}^{n-2}\frac{\boldsymbol{p}_i^\mathsf{T}\boldsymbol{A}\boldsymbol{p}_{n-1,0}}{\boldsymbol{p}_i^\mathsf{T}\boldsymbol{A}\boldsymbol{p}_i}\,\boldsymbol{p}_i.
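A direct transcription of this construction into NumPy might look as follows; the name `conjugate_directions` and the choice of candidate vectors are illustrative only.

```python
import numpy as np

def conjugate_directions(A, candidates):
    """Gram-Schmidt-like A-conjugation: from arbitrary candidate vectors
    p_{k,0}, build directions satisfying p_i^T A p_j = 0 for i != j."""
    ps = []
    for cand in candidates:
        p = np.array(cand, dtype=float)
        for q in ps:
            # subtract the A-projection of the candidate onto each earlier direction
            p -= (q @ (A @ cand)) / (q @ (A @ q)) * q
        ps.append(p)
    return ps

# Example with made-up data: conjugating the standard basis vectors.
rng = np.random.default_rng(4)
n = 5
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)                    # symmetric positive-definite
ps = conjugate_directions(A, list(np.eye(n)))
print(max(abs(ps[i] @ (A @ ps[j]))             # ~0 for every pair i != j
          for i in range(n) for j in range(i)))
```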
The conjugate gradient method results from choosing the initial candidate $\boldsymbol{p}_{k,0}$ at each step to be the gradient at the current iterate, $\boldsymbol{p}_{k,0}=\nabla f(\boldsymbol{x}_k)$, scaled by $-1/2$: since $\nabla f(\boldsymbol{x})=2(\boldsymbol{Ax}-\boldsymbol{b})$, this choice is simply $\boldsymbol{p}_{k,0}=\boldsymbol{r}_k$, and the construction becomes
\boldsymbol{p}_k=\boldsymbol{r}_k-\sum_{i=0}^{k-1}\frac{\boldsymbol{p}_i^\mathsf{T}\boldsymbol{A}\boldsymbol{r}_k}{\boldsymbol{p}_i^\mathsf{T}\boldsymbol{A}\boldsymbol{p}_i}\,\boldsymbol{p}_i.
If $\alpha_k=0$ at some step, the iteration has already converged, since then $\nabla f(\boldsymbol{x}_{k+1})=0$.
Proof. By construction, it would mean that $\boldsymbol{x}_{k+1}=\boldsymbol{x}_k$ and $\boldsymbol{r}_{k+1}=\boldsymbol{r}_k$; moreover, $\alpha_k=0$ requires $\boldsymbol{p}_k^\mathsf{T}\boldsymbol{r}_k=0$, which by Lemma 2 below equals $\boldsymbol{r}_k^\mathsf{T}\boldsymbol{r}_k$, so $\boldsymbol{r}_{k+1}=\boldsymbol{r}_k=\boldsymbol{0}$ and $\nabla f(\boldsymbol{x}_{k+1})=-2\boldsymbol{r}_{k+1}=\boldsymbol{0}$.
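At this point the method can already be written down, at the cost of keeping every previous direction; the following sketch is a literal transcription of the construction above (the early-exit test on the residual is an added practical guard).

```python
import numpy as np

def unsimplified_cg(A, b, x0, num_steps):
    """Conjugate direction method with p_{k,0} = r_k, exactly as constructed
    above; every previous direction must be stored and re-used."""
    x = x0.astype(float).copy()
    r = b - A @ x
    ps = []
    for _ in range(num_steps):
        if r @ r < 1e-28:                      # practical guard: already converged
            break
        # p_k = r_k - sum_i (p_i^T A r_k) / (p_i^T A p_i) p_i
        p = r.copy()
        for q in ps:
            p -= (q @ (A @ r)) / (q @ (A @ q)) * q
        alpha = (p @ r) / (p @ (A @ p))        # alpha_k = p_k^T r_k / p_k^T A p_k
        x += alpha * p                         # x_{k+1} = x_k + alpha_k p_k
        r -= alpha * (A @ p)                   # r_{k+1} = r_k - alpha_k A p_k
        ps.append(p)
    return x
```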
This algorithm can be significantly simplified by some lemmas, resulting in the conjugate gradient algorithm.
Lemma 1. $\boldsymbol{p}_i^\mathsf{T}\boldsymbol{r}_j=0$ and $\boldsymbol{r}_i^\mathsf{T}\boldsymbol{r}_j=0$ for all $i<j$.
Proof. By the geometric construction, the tangent plane to the ellipsoid at $\boldsymbol{x}_j$ contains the previous search directions $\boldsymbol{p}_0,\boldsymbol{p}_1,\ldots,\boldsymbol{p}_{j-1}$, while $\boldsymbol{r}_j$ is normal to that tangent plane; thus $\boldsymbol{p}_i^\mathsf{T}\boldsymbol{r}_j=0$ for all $i<j$. Since each of $\boldsymbol{r}_0,\boldsymbol{r}_1,\ldots,\boldsymbol{r}_{j-1}$ is a linear combination of $\boldsymbol{p}_0,\boldsymbol{p}_1,\ldots,\boldsymbol{p}_{j-1}$, it also follows that $\boldsymbol{r}_i^\mathsf{T}\boldsymbol{r}_j=0$ for all $i<j$.
Lemma 2. $\boldsymbol{p}_k^\mathsf{T}\boldsymbol{r}_k=\boldsymbol{r}_k^\mathsf{T}\boldsymbol{r}_k$.
Proof. By construction,
\boldsymbol{p}_k:=\boldsymbol{r}_k-\sum_{i=0}^{k-1}\frac{\boldsymbol{p}_i^\mathsf{T}\boldsymbol{A}\boldsymbol{r}_k}{\boldsymbol{p}_i^\mathsf{T}\boldsymbol{A}\boldsymbol{p}_i}\,\boldsymbol{p}_i,
so $\boldsymbol{p}_k^\mathsf{T}\boldsymbol{r}_k=\boldsymbol{r}_k^\mathsf{T}\boldsymbol{r}_k-\sum_{i=0}^{k-1}\frac{\boldsymbol{p}_i^\mathsf{T}\boldsymbol{A}\boldsymbol{r}_k}{\boldsymbol{p}_i^\mathsf{T}\boldsymbol{A}\boldsymbol{p}_i}\,\boldsymbol{p}_i^\mathsf{T}\boldsymbol{r}_k=\boldsymbol{r}_k^\mathsf{T}\boldsymbol{r}_k$, since every term of the sum vanishes by Lemma 1.
Lemma 3.
\boldsymbol{p}_i^\mathsf{T}\boldsymbol{A}\boldsymbol{r}_{k+1}=\begin{cases} 0, & i<k,\\ -\boldsymbol{r}_{k+1}^\mathsf{T}\boldsymbol{r}_{k+1}/\alpha_k, & i=k. \end{cases}
Proof. By construction, we have $\boldsymbol{r}_{i+1}=\boldsymbol{r}_i-\alpha_i\boldsymbol{A}\boldsymbol{p}_i$, so $\boldsymbol{A}\boldsymbol{p}_i=(\boldsymbol{r}_i-\boldsymbol{r}_{i+1})/\alpha_i$ and hence
\boldsymbol{p}_i^\mathsf{T}\boldsymbol{A}\boldsymbol{r}_{k+1}=\boldsymbol{r}_{k+1}^\mathsf{T}\boldsymbol{A}\boldsymbol{p}_i=\frac{\boldsymbol{r}_{k+1}^\mathsf{T}(\boldsymbol{r}_i-\boldsymbol{r}_{i+1})}{\alpha_i},
which vanishes for $i<k$ and equals $-\boldsymbol{r}_{k+1}^\mathsf{T}\boldsymbol{r}_{k+1}/\alpha_k$ for $i=k$ by Lemma 1.
Plugging lemmas 1-3 in, we have
\begin{align}
\alpha_k&=\frac{\boldsymbol{r}_k^\mathsf{T}\boldsymbol{r}_k}{\boldsymbol{p}_k^\mathsf{T}\boldsymbol{A}\boldsymbol{p}_k},\\
\boldsymbol{p}_{k+1}&:=\boldsymbol{r}_{k+1}+\frac{\boldsymbol{r}_{k+1}^\mathsf{T}\boldsymbol{r}_{k+1}}{\boldsymbol{r}_k^\mathsf{T}\boldsymbol{r}_k}\,\boldsymbol{p}_k,
\end{align}
which, together with the updates $\boldsymbol{x}_{k+1}=\boldsymbol{x}_k+\alpha_k\boldsymbol{p}_k$ and $\boldsymbol{r}_{k+1}=\boldsymbol{r}_k-\alpha_k\boldsymbol{A}\boldsymbol{p}_k$, is exactly the conjugate gradient method.
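Collecting the recurrences gives the following compact NumPy sketch of the method as derived (the residual-based stopping test is an added practical detail).

```python
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10, max_iter=None):
    """Conjugate gradient method as derived above: thanks to Lemmas 1-3,
    only the most recent direction p_k has to be kept."""
    x = x0.astype(float).copy()
    r = b - A @ x
    p = r.copy()
    rr = r @ r                              # r_k^T r_k
    for _ in range(max_iter if max_iter is not None else len(b)):
        if np.sqrt(rr) < tol:               # already converged
            break
        Ap = A @ p
        alpha = rr / (p @ Ap)               # alpha_k = r_k^T r_k / p_k^T A p_k
        x += alpha * p                      # x_{k+1} = x_k + alpha_k p_k
        r -= alpha * Ap                     # r_{k+1} = r_k - alpha_k A p_k
        rr_new = r @ r
        p = r + (rr_new / rr) * p           # p_{k+1} = r_{k+1} + beta_k p_k
        rr = rr_new
    return x
```

For a symmetric positive-definite $\boldsymbol{A}$ of size $n$, at most $n$ iterations are needed in exact arithmetic, and in practice far fewer when a loose tolerance suffices.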
See also: Arnoldi iteration and Lanczos iteration. The conjugate gradient method can also be seen as a variant of the Arnoldi/Lanczos iteration applied to solving linear systems.
In the Arnoldi iteration, one starts with a vector
$\boldsymbol{r}_0$ and gradually builds an orthonormal basis $\{\boldsymbol{v}_1,\boldsymbol{v}_2,\boldsymbol{v}_3,\ldots\}$ of the Krylov subspace
\mathcal{K}(\boldsymbol{A},\boldsymbol{r}_0)=\operatorname{span}\{\boldsymbol{r}_0,\boldsymbol{Ar}_0,\boldsymbol{A}^2\boldsymbol{r}_0,\ldots\}
by defining $\boldsymbol{v}_i=\boldsymbol{w}_i/\lVert\boldsymbol{w}_i\rVert_2$ where
\boldsymbol{w}_i=\begin{cases} \boldsymbol{r}_0 & \text{if }i=1,\\ \boldsymbol{Av}_{i-1}-\sum_{j=1}^{i-1}(\boldsymbol{v}_j^\mathsf{T}\boldsymbol{Av}_{i-1})\boldsymbol{v}_j & \text{if }i>1. \end{cases}
In other words, for $i>1$, $\boldsymbol{v}_i$ is found by Gram–Schmidt orthogonalizing $\boldsymbol{Av}_{i-1}$ against $\{\boldsymbol{v}_1,\boldsymbol{v}_2,\ldots,\boldsymbol{v}_{i-1}\}$ followed by normalization.
Put in matrix form, the iteration is captured by the equation
\boldsymbol{AV}_i=\boldsymbol{V}_{i+1}\boldsymbol{\tilde{H}}_i
where
\begin{align} \boldsymbol{V}_i&=\begin{bmatrix} \boldsymbol{v}_1&\boldsymbol{v}_2&\cdots&\boldsymbol{v}_i \end{bmatrix},\\ \boldsymbol{\tilde{H}}_i&=\begin{bmatrix} h_{11}&h_{12}&h_{13}&\cdots&h_{1,i}\\ h_{21}&h_{22}&h_{23}&\cdots&h_{2,i}\\ &h_{32}&h_{33}&\cdots&h_{3,i}\\ &&\ddots&\ddots&\vdots\\ &&&h_{i,i-1}&h_{i,i}\\ &&&&h_{i+1,i} \end{bmatrix}=\begin{bmatrix} \boldsymbol{H}_i\\ h_{i+1,i}\boldsymbol{e}_i^\mathsf{T} \end{bmatrix} \end{align}
with
h_{ji}=\begin{cases} \boldsymbol{v}_j^\mathsf{T}\boldsymbol{Av}_i & \text{if }j\leq i,\\ \lVert\boldsymbol{w}_{i+1}\rVert_2 & \text{if }j=i+1,\\ 0 & \text{if }j>i+1. \end{cases}
When applying the Arnoldi iteration to solving linear systems, one starts with
$\boldsymbol{r}_0=\boldsymbol{b}-\boldsymbol{Ax}_0$, the residual corresponding to a starting guess $\boldsymbol{x}_0$. After each step of iteration, one computes $\boldsymbol{y}_i=\boldsymbol{H}_i^{-1}(\lVert\boldsymbol{r}_0\rVert_2\boldsymbol{e}_1)$ and the new iterate $\boldsymbol{x}_i=\boldsymbol{x}_0+\boldsymbol{V}_i\boldsymbol{y}_i$.
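For illustration, here is a minimal NumPy sketch of the Arnoldi iteration and of this linear-system step; the function name `arnoldi` and the test problem are assumptions made for the example.

```python
import numpy as np

def arnoldi(A, r0, m):
    """Arnoldi iteration: build V_{m+1} and the (m+1) x m Hessenberg H_tilde
    so that A @ V[:, :m] = V[:, :m+1] @ H_tilde."""
    n = len(r0)
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = r0 / np.linalg.norm(r0)          # v_1 = r_0 / ||r_0||_2
    for i in range(m):
        w = A @ V[:, i]                         # A v_i
        for j in range(i + 1):                  # Gram-Schmidt against v_1..v_i
            H[j, i] = V[:, j] @ w
            w -= H[j, i] * V[:, j]
        H[i + 1, i] = np.linalg.norm(w)         # ||w_{i+1}||_2
        V[:, i + 1] = w / H[i + 1, i]
    return V, H

rng = np.random.default_rng(1)
n, m = 8, 5
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)                     # symmetric positive-definite
b = rng.standard_normal(n)
x0 = np.zeros(n)
r0 = b - A @ x0

V, H = arnoldi(A, r0, m)
print(np.allclose(A @ V[:, :m], V @ H))         # A V_m = V_{m+1} H_tilde_m

# Linear-system step: y_m = H_m^{-1} (||r_0|| e_1), x_m = x_0 + V_m y_m
e1 = np.zeros(m); e1[0] = 1.0
y = np.linalg.solve(H[:m, :m], np.linalg.norm(r0) * e1)
x_m = x0 + V[:, :m] @ y
print(np.linalg.norm(b - A @ x_m))              # residual shrinks as m grows
```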
For the rest of discussion, we assume that
$\boldsymbol{A}$ is symmetric positive-definite. With symmetry of $\boldsymbol{A}$, the upper Hessenberg matrix $\boldsymbol{H}_i=\boldsymbol{V}_i^\mathsf{T}\boldsymbol{AV}_i$ becomes symmetric and thus tridiagonal. It then can be more clearly denoted by
\boldsymbol{H}_i=\begin{bmatrix} a_1&b_2\\ b_2&a_2&b_3\\ &\ddots&\ddots&\ddots\\ &&b_{i-1}&a_{i-1}&b_i\\ &&&b_i&a_i \end{bmatrix}.
This enables a short three-term recurrence for $\boldsymbol{v}_i$ in the iteration, and the Arnoldi iteration is reduced to the Lanczos iteration.
Since $\boldsymbol{A}$ is symmetric positive-definite, so is $\boldsymbol{H}_i$. Hence, $\boldsymbol{H}_i$ can be LU factorized without partial pivoting into
\boldsymbol{H}_i=\boldsymbol{L}_i\boldsymbol{U}_i=\begin{bmatrix} 1\\ c_2&1\\ &\ddots&\ddots\\ &&c_{i-1}&1\\ &&&c_i&1 \end{bmatrix}\begin{bmatrix} d_1&b_2\\ &d_2&b_3\\ &&\ddots&\ddots\\ &&&d_{i-1}&b_i\\ &&&&d_i \end{bmatrix}
with convenient recurrences for $c_i$ and $d_i$:
\begin{align} c_i&=b_i/d_{i-1},\\ d_i&=\begin{cases} a_1 & \text{if }i=1,\\ a_i-c_ib_i & \text{if }i>1. \end{cases} \end{align}
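As a quick sanity check with made-up entries $a_i$ and $b_i$, the following snippet confirms that these recurrences reproduce the LU factors of a tridiagonal matrix of the stated form.

```python
import numpy as np

# Made-up tridiagonal H with diagonal a and off-diagonal b (b[0] unused),
# mirroring the notation a_1..a_i, b_2..b_i in the text.
a = np.array([4.0, 5.0, 6.0, 5.0, 4.0])
b = np.array([0.0, 1.0, 2.0, 1.5, 0.5])
i = len(a)
H = np.diag(a) + np.diag(b[1:], 1) + np.diag(b[1:], -1)

# Recurrences: c_k = b_k / d_{k-1};  d_1 = a_1,  d_k = a_k - c_k b_k.
c = np.zeros(i)
d = np.zeros(i)
d[0] = a[0]
for k in range(1, i):
    c[k] = b[k] / d[k - 1]
    d[k] = a[k] - c[k] * b[k]

L = np.eye(i) + np.diag(c[1:], -1)              # unit lower bidiagonal
U = np.diag(d) + np.diag(b[1:], 1)              # upper bidiagonal
print(np.allclose(L @ U, H))                    # True: H = L U
```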
Rewrite
$\boldsymbol{x}_i=\boldsymbol{x}_0+\boldsymbol{V}_i\boldsymbol{y}_i$ as
\begin{align} \boldsymbol{x}_i&=\boldsymbol{x}_0+\boldsymbol{V}_i\boldsymbol{H}_i^{-1}(\lVert\boldsymbol{r}_0\rVert_2\boldsymbol{e}_1)\\ &=\boldsymbol{x}_0+\boldsymbol{V}_i\boldsymbol{U}_i^{-1}\boldsymbol{L}_i^{-1}(\lVert\boldsymbol{r}_0\rVert_2\boldsymbol{e}_1)\\ &=\boldsymbol{x}_0+\boldsymbol{P}_i\boldsymbol{z}_i \end{align}
with
\begin{align} \boldsymbol{P}_i&=\boldsymbol{V}_i\boldsymbol{U}_i^{-1},\\ \boldsymbol{z}_i&=\boldsymbol{L}_i^{-1}(\lVert\boldsymbol{r}_0\rVert_2\boldsymbol{e}_1). \end{align}
It is now important to observe that
\begin{align} \boldsymbol{P}_i&=\begin{bmatrix} \boldsymbol{P}_{i-1}&\boldsymbol{p}_i \end{bmatrix},\\ \boldsymbol{z}_i&=\begin{bmatrix} \boldsymbol{z}_{i-1}\\ \zeta_i \end{bmatrix}. \end{align}
In fact, there are short recurrences for
$\boldsymbol{p}_i$ and $\zeta_i$ as well:
\begin{align} \boldsymbol{p}_i&=\frac{1}{d_i}(\boldsymbol{v}_i-b_i\boldsymbol{p}_{i-1}),\\ \zeta_i&=-c_i\zeta_{i-1}. \end{align}
With this formulation, we arrive at a simple recurrence for
$\boldsymbol{x}_i$:
\begin{align} \boldsymbol{x}_i&=\boldsymbol{x}_0+\boldsymbol{P}_i\boldsymbol{z}_i\\ &=\boldsymbol{x}_0+\boldsymbol{P}_{i-1}\boldsymbol{z}_{i-1}+\zeta_i\boldsymbol{p}_i\\ &=\boldsymbol{x}_{i-1}+\zeta_i\boldsymbol{p}_i. \end{align}
The relations above straightforwardly lead to the direct Lanczos method, which turns out to be slightly more complex than the conjugate gradient method.
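A minimal NumPy sketch of the direct Lanczos method assembled from these recurrences follows; it assumes no breakdown occurs ($d_i\neq 0$), and the variable names mirror the symbols above.

```python
import numpy as np

def direct_lanczos(A, b, x0, num_steps):
    """Direct Lanczos method for A x = b with A symmetric positive-definite.

    The Lanczos three-term recurrence produces v_i, a_i, b_i, and the
    LU-based recurrences produce c_i, d_i, zeta_i, p_i and the iterate
    x_i = x_{i-1} + zeta_i p_i, as in the text above.
    """
    x = x0.astype(float).copy()
    r0 = b - A @ x
    r0_norm = np.linalg.norm(r0)        # ||r_0||_2
    v_prev = np.zeros_like(r0)          # v_0 = 0
    v = r0 / r0_norm                    # v_1
    b_i = 0.0                           # b_1 (no v_0 term)
    p_prev = np.zeros_like(r0)          # p_0 = 0
    d_prev = zeta = None
    for i in range(1, num_steps + 1):
        Av = A @ v
        a_i = v @ Av                    # diagonal entry a_i
        if i == 1:
            d_i, zeta = a_i, r0_norm    # d_1 = a_1, zeta_1 = ||r_0||_2
        else:
            c_i = b_i / d_prev          # c_i = b_i / d_{i-1}
            d_i = a_i - c_i * b_i       # d_i = a_i - c_i b_i
            zeta = -c_i * zeta          # zeta_i = -c_i zeta_{i-1}
        p = (v - b_i * p_prev) / d_i    # p_i = (v_i - b_i p_{i-1}) / d_i
        x = x + zeta * p                # x_i = x_{i-1} + zeta_i p_i
        # Lanczos three-term recurrence for the next basis vector
        w = Av - a_i * v - b_i * v_prev
        b_next = np.linalg.norm(w)
        if b_next == 0.0:               # invariant subspace reached
            break
        v_prev, v, b_i, d_prev, p_prev = v, w / b_next, b_next, d_i, p
    return x
```

Running it for $n$ steps on an $n$-dimensional symmetric positive-definite system should reproduce `np.linalg.solve(A, b)` up to rounding.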
If we allow
$\boldsymbol{p}_i$ to scale and compensate for the scaling in the constant factor, we potentially can have simpler recurrences of the form:
\begin{align} \boldsymbol{x}_i&=\boldsymbol{x}_{i-1}+\alpha_{i-1}\boldsymbol{p}_{i-1},\\ \boldsymbol{r}_i&=\boldsymbol{r}_{i-1}-\alpha_{i-1}\boldsymbol{Ap}_{i-1},\\ \boldsymbol{p}_i&=\boldsymbol{r}_i+\beta_{i-1}\boldsymbol{p}_{i-1}. \end{align}
As premises for the simplification, we now derive the orthogonality of
$\boldsymbol{r}_i$ and the conjugacy of $\boldsymbol{p}_i$, i.e., for $i\neq j$,
\begin{align} \boldsymbol{r}_i^\mathsf{T}\boldsymbol{r}_j&=0,\\ \boldsymbol{p}_i^\mathsf{T}\boldsymbol{Ap}_j&=0. \end{align}
The residuals are mutually orthogonal because
$\boldsymbol{r}_i$ is essentially a multiple of $\boldsymbol{v}_{i+1}$, since for $i=0$, $\boldsymbol{r}_0=\lVert\boldsymbol{r}_0\rVert_2\boldsymbol{v}_1$, while for $i>0$,
\begin{align} \boldsymbol{r}_i&=\boldsymbol{b}-\boldsymbol{Ax}_i\\ &=\boldsymbol{b}-\boldsymbol{A}(\boldsymbol{x}_0+\boldsymbol{V}_i\boldsymbol{y}_i)\\ &=\boldsymbol{r}_0-\boldsymbol{AV}_i\boldsymbol{y}_i\\ &=\boldsymbol{r}_0-\boldsymbol{V}_{i+1}\boldsymbol{\tilde{H}}_i\boldsymbol{y}_i\\ &=\boldsymbol{r}_0-\boldsymbol{V}_i\boldsymbol{H}_i\boldsymbol{y}_i-h_{i+1,i}(\boldsymbol{e}_i^\mathsf{T}\boldsymbol{y}_i)\boldsymbol{v}_{i+1}\\ &=\lVert\boldsymbol{r}_0\rVert_2\boldsymbol{v}_1-\boldsymbol{V}_i(\lVert\boldsymbol{r}_0\rVert_2\boldsymbol{e}_1)-h_{i+1,i}(\boldsymbol{e}_i^\mathsf{T}\boldsymbol{y}_i)\boldsymbol{v}_{i+1}\\ &=-h_{i+1,i}(\boldsymbol{e}_i^\mathsf{T}\boldsymbol{y}_i)\boldsymbol{v}_{i+1}. \end{align}
To see the conjugacy of
$\boldsymbol{p}_i$, it suffices to show that $\boldsymbol{P}_i^\mathsf{T}\boldsymbol{AP}_i$ is diagonal:
\begin{align} \boldsymbol{P}_i^\mathsf{T}\boldsymbol{AP}_i&=\boldsymbol{U}_i^{-\mathsf{T}}\boldsymbol{V}_i^\mathsf{T}\boldsymbol{AV}_i\boldsymbol{U}_i^{-1}\\ &=\boldsymbol{U}_i^{-\mathsf{T}}\boldsymbol{H}_i\boldsymbol{U}_i^{-1}\\ &=\boldsymbol{U}_i^{-\mathsf{T}}\boldsymbol{L}_i\boldsymbol{U}_i\boldsymbol{U}_i^{-1}\\ &=\boldsymbol{U}_i^{-\mathsf{T}}\boldsymbol{L}_i \end{align}
is symmetric and lower triangular simultaneously and thus must be diagonal.
Now we can derive the constant factors
$\alpha_i$ and $\beta_i$ relative to the scaled $\boldsymbol{p}_i$ by solely imposing the orthogonality of $\boldsymbol{r}_i$ and the conjugacy of $\boldsymbol{p}_i$.
Due to the orthogonality of
$\boldsymbol{r}_i$, it is necessary that $\boldsymbol{r}_{i+1}^\mathsf{T}\boldsymbol{r}_i=(\boldsymbol{r}_i-\alpha_i\boldsymbol{Ap}_i)^\mathsf{T}\boldsymbol{r}_i=0$. As a result,
\begin{align} \alpha_i&=\frac{\boldsymbol{r}_i^\mathsf{T}\boldsymbol{r}_i}{\boldsymbol{r}_i^\mathsf{T}\boldsymbol{Ap}_i}\\ &=\frac{\boldsymbol{r}_i^\mathsf{T}\boldsymbol{r}_i}{(\boldsymbol{p}_i-\beta_{i-1}\boldsymbol{p}_{i-1})^\mathsf{T}\boldsymbol{Ap}_i}\\ &=\frac{\boldsymbol{r}_i^\mathsf{T}\boldsymbol{r}_i}{\boldsymbol{p}_i^\mathsf{T}\boldsymbol{Ap}_i}. \end{align}
Similarly, due to the conjugacy of
$\boldsymbol{p}_i$, it is necessary that $\boldsymbol{p}_{i+1}^\mathsf{T}\boldsymbol{Ap}_i=(\boldsymbol{r}_{i+1}+\beta_i\boldsymbol{p}_i)^\mathsf{T}\boldsymbol{Ap}_i=0$. As a result,
\begin{align} \beta_i&=-\frac{\boldsymbol{r}_{i+1}^\mathsf{T}\boldsymbol{Ap}_i}{\boldsymbol{p}_i^\mathsf{T}\boldsymbol{Ap}_i}\\ &=-\frac{\boldsymbol{r}_{i+1}^\mathsf{T}(\boldsymbol{r}_i-\boldsymbol{r}_{i+1})}{\alpha_i\boldsymbol{p}_i^\mathsf{T}\boldsymbol{Ap}_i}\\ &=\frac{\boldsymbol{r}_{i+1}^\mathsf{T}\boldsymbol{r}_{i+1}}{\boldsymbol{r}_i^\mathsf{T}\boldsymbol{r}_i}. \end{align}
This completes the derivation.
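The derived formulas can be checked numerically: the following snippet runs the recurrences with these $\alpha_i$ and $\beta_i$ on a random symmetric positive-definite system and verifies that the residuals come out mutually orthogonal and the directions mutually conjugate.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)            # symmetric positive-definite
b = rng.standard_normal(n)

x = np.zeros(n)
r = b - A @ x
p = r.copy()
R, P = [r.copy()], [p.copy()]          # keep history to test the relations
for _ in range(n):
    Ap = A @ p
    alpha = (r @ r) / (p @ Ap)         # alpha_i = r_i^T r_i / p_i^T A p_i
    x += alpha * p
    r_new = r - alpha * Ap
    beta = (r_new @ r_new) / (r @ r)   # beta_i = r_{i+1}^T r_{i+1} / r_i^T r_i
    p = r_new + beta * p
    r = r_new
    R.append(r.copy()); P.append(p.copy())

# r_i^T r_j ~ 0 and p_i^T A p_j ~ 0 for i != j, and A x ~ b after n steps.
print(max(abs(R[i] @ R[j]) for i in range(n) for j in range(i)))
print(max(abs(P[i] @ (A @ P[j])) for i in range(n) for j in range(i)))
print(np.linalg.norm(A @ x - b))
```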