The Kaczmarz method or Kaczmarz's algorithm is an iterative algorithm for solving systems of linear equations $Ax = b$.
The Kaczmarz method is applicable to any linear system of equations, but its computational advantage relative to other methods depends on the system being sparse. It has been demonstrated to be superior, in some biomedical imaging applications, to other methods such as the filtered backprojection method.
It has many applications ranging from computed tomography (CT) to signal processing. It can also be obtained by applying the method of successive projections onto convex sets (POCS) to the hyperplanes described by the linear system.
Let $Ax = b$ be a system of linear equations, let $m$ be the number of rows of $A$, let $a_i$ be the $i$th row of the complex-valued matrix $A$, and let $x^0$ be an arbitrary complex-valued initial approximation to the solution of $Ax = b$. For $k = 0, 1, \ldots$ compute

$x^{k+1} = x^k + \frac{b_i - \langle a_i, x^k \rangle}{\|a_i\|^2}\,\overline{a_i}, \qquad (1)$

where $i = (k \bmod m) + 1$ cycles through the rows $i = 1, 2, \ldots, m$, and $\overline{a_i}$ denotes the complex conjugate of $a_i$.
If the system is consistent, $x^k$ converges to the minimum-norm solution, provided that the iteration starts with the zero vector.
A more general algorithm can be defined using a relaxation parameter $\lambda^k$:

$x^{k+1} = x^k + \lambda^k\,\frac{b_i - \langle a_i, x^k \rangle}{\|a_i\|^2}\,\overline{a_i}.$
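For illustration, a minimal NumPy sketch of iteration (1) for a real-valued system, with the relaxation parameter included (the function name and the number of sweeps are illustrative, not part of the original formulation):

import numpy as np

def kaczmarz(A, b, sweeps=100, lam=1.0):
    # Cyclic Kaczmarz iteration for a real system Ax = b with relaxation lam.
    m, n = A.shape
    x = np.zeros(n)                # starting from zero yields the minimum-norm solution
    for k in range(sweeps * m):
        i = k % m                  # i = (k mod m) + 1 in the 1-based notation above
        ai = A[i]
        # Move a lam-fraction of the way onto the hyperplane <a_i, x> = b_i.
        x += lam * (b[i] - ai @ x) / (ai @ ai) * ai
    return x

With $\lambda^k = 1$ each step is an exact orthogonal projection onto the $i$th hyperplane; relaxation values $0 < \lambda^k < 2$ still yield convergence for consistent systems.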
There are versions of the method that converge to a regularized weighted least squares solution when applied to a system of inconsistent equations and, at least as far as initial behavior is concerned, at a lesser cost than other iterative methods, such as the conjugate gradient method.[1]
In 2009, a randomized version of the Kaczmarz method for overdetermined linear systems was introduced by Thomas Strohmer and Roman Vershynin in which the $i$-th equation is selected randomly with probability proportional to $\|a_i\|^2$.
This method can be seen as a particular case of stochastic gradient descent.
Under such circumstances $x_k$ converges exponentially fast to the solution of $Ax = b$, and the rate of convergence depends only on the scaled condition number $\kappa(A)$.
Theorem. Let $x$ be the solution of $Ax = b$. Then the randomized Kaczmarz method converges to $x$ in expectation, with the average error

$E\|x_k - x\|^2 \le \left(1 - \kappa(A)^{-2}\right)^k \cdot \|x_0 - x\|^2.$
Proof. We have

$\sum_{j=1}^{m} |\langle z, a_j \rangle|^2 \ge \frac{\|z\|^2}{\|A^{-1}\|^2} \quad \text{for all } z \in \mathbb{C}^n, \qquad (2)$

where $\kappa(A) = \|A\|\,\|A^{-1}\|$ and $\|A\|^2 = \sum_{j=1}^{m} \|a_j\|^2$ is the squared Frobenius norm of $A$. Dividing (2) by $\|A\|^2$, we can write it as

$\sum_{j=1}^{m} \frac{\|a_j\|^2}{\|A\|^2} \left|\left\langle z, \frac{a_j}{\|a_j\|} \right\rangle\right|^2 \ge \kappa(A)^{-2}\,\|z\|^2 \quad \text{for all } z \in \mathbb{C}^n. \qquad (3)$
The main point of the proof is to view the left hand side in (3) as an expectation of some random variable. Namely, recall that the solution space of the $j$-th equation of $Ax = b$ is the hyperplane $\{y : \langle y, a_j \rangle = b_j\}$, whose unit normal is $\frac{a_j}{\|a_j\|}$. Define a random vector $Z$ whose values are the normals to all the equations of $Ax = b$, with probabilities as in our algorithm:

$Z = \frac{a_j}{\|a_j\|} \quad \text{with probability} \quad \frac{\|a_j\|^2}{\|A\|^2}, \qquad j = 1, \ldots, m.$
Then (3) says that

$E\,|\langle z, Z \rangle|^2 \ge \kappa(A)^{-2}\,\|z\|^2 \quad \text{for all } z \in \mathbb{C}^n. \qquad (4)$
The orthogonal projection $P$ onto the solution space of a random equation of $Ax = b$ is given by

$Pz = z - \langle z - x, Z \rangle Z.$
Now we are ready to analyze our algorithm. We want to show that the error $\|x_k - x\|^2$ reduces at each step in average (conditioned on the previous steps) by at least the factor of $(1 - \kappa(A)^{-2})$. The next approximation $x_k$ is computed from $x_{k-1}$ as $x_k = P_k x_{k-1}$, where $P_1, P_2, \ldots$ are independent realizations of the random projection $P$. The vector $x_{k-1} - x_k$ is perpendicular to the solution space of the equation onto which $P_k$ projects, while the vector $x_k - x$ is parallel to it (recall that $x$ is the solution to all equations). The orthogonality of these two vectors then yields

$\|x_k - x\|^2 = \|x_{k-1} - x\|^2 - \|x_{k-1} - x_k\|^2.$
To complete the proof, we have to bound $\|x_{k-1} - x_k\|^2$ from below. By the definition of $x_k$, we have

$\|x_{k-1} - x_k\| = |\langle x_{k-1} - x, Z_k \rangle|,$

where $Z_1, Z_2, \ldots$ are independent realizations of the random vector $Z$.
Thus

$\|x_k - x\|^2 \le \left(1 - \left|\left\langle \frac{x_{k-1} - x}{\|x_{k-1} - x\|},\, Z_k \right\rangle\right|^2\right) \|x_{k-1} - x\|^2.$
Now we take the expectation of both sides conditional upon the choice of the random vectors $Z_1, \ldots, Z_{k-1}$ (hence we fix the choice of the random projections $P_1, \ldots, P_{k-1}$, and thus the random vectors $x_1, \ldots, x_{k-1}$, and we average over the random vector $Z_k$). Then

$E_{Z_1,\ldots,Z_{k-1}} \|x_k - x\|^2 = \left(1 - E_{Z_1,\ldots,Z_{k-1}} \left|\left\langle \frac{x_{k-1} - x}{\|x_{k-1} - x\|},\, Z_k \right\rangle\right|^2\right) \|x_{k-1} - x\|^2.$
By (4) and the independence,

$E_{Z_1,\ldots,Z_{k-1}} \|x_k - x\|^2 \le \left(1 - \kappa(A)^{-2}\right) \|x_{k-1} - x\|^2.$
Taking the full expectation of both sides, we conclude that
$E\|x_k - x\|^2 \le \left(1 - \kappa(A)^{-2}\right) E\|x_{k-1} - x\|^2. \qquad \blacksquare$
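As an illustration, a short NumPy sketch of the randomized variant for the real-valued case (the names are illustrative), sampling row $i$ with probability proportional to $\|a_i\|^2$:

import numpy as np

def randomized_kaczmarz(A, b, iters=10000, seed=0):
    # Sample row i with probability ||a_i||^2 / ||A||_F^2 and project onto it.
    rng = np.random.default_rng(seed)
    m, n = A.shape
    row_norms_sq = np.einsum('ij,ij->i', A, A)    # ||a_i||^2 for each row
    probs = row_norms_sq / row_norms_sq.sum()     # ||A||_F^2 = sum of ||a_i||^2
    x = np.zeros(n)
    for _ in range(iters):
        i = rng.choice(m, p=probs)
        x += (b[i] - A[i] @ x) / row_norms_sq[i] * A[i]
    return x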
The superiority of this selection was illustrated with the reconstruction of a bandlimited function from its nonuniformly spaced sampling values. However, it has been pointed out that the reported success by Strohmer and Vershynin depends on the specific choices that were made there in translating the underlying problem, whose geometrical nature is to find a common point of a set of hyperplanes, into a system of algebraic equations. There will always be legitimate algebraic representations of the underlying problem for which this selection method will perform in an inferior manner.
The Kaczmarz iteration (1) has a purely geometric interpretation: the algorithm successively projects the current iterate onto the hyperplane defined by the next equation. Hence, any scaling of the equations is irrelevant; it can also be seen from (1) that any (nonzero) scaling of the equations cancels out. Thus, in the randomized Kaczmarz method, one can use $\|a_i\|$ or any other weights that may be relevant.
In 2015, Robert M. Gower and Peter Richtarik developed a versatile randomized iterative method for solving a consistent system of linear equations $Ax = b$ which includes the randomized Kaczmarz algorithm as a special case.
Interesting new insights about the randomized Kaczmarz method that can be gained from the analysis of the method include the following. When applied to a system with a positive definite matrix $A$, the randomized Kaczmarz method is equivalent to the stochastic gradient descent (SGD) method (with a very special stepsize) for minimizing the strongly convex quadratic function $f(x) = \tfrac{1}{2} x^T A x - b^T x$. Since $f$ is convex, the minimizers of $f$ coincide with the solutions of $\nabla f(x) = 0$, which are precisely the solutions of $Ax = b$; since $f$ is strongly convex, the minimizer is unique and equal to $x^* = A^{-1} b$.
The Gower-Richtarik method enjoys six seemingly different but equivalent formulations, shedding additional light on how to interpret it (and, as a consequence, how to interpret its many variants, including randomized Kaczmarz): a sketching viewpoint (sketch and project), an optimization viewpoint (constrain and approximate), a geometric viewpoint (random intersect), two algebraic viewpoints (random linear solve and random update), and an analytic viewpoint (random fixed point).

We now describe some of these viewpoints. The method depends on two parameters:
a positive definite matrix $B$ giving rise to a weighted Euclidean inner product $\langle x, y \rangle_B := x^T B y$ and the induced norm $\|x\|_B = \left(\langle x, x \rangle_B\right)^{1/2}$, and a random matrix $S$ with as many rows as $A$.
Given the previous iterate $x^k$, the new point $x^{k+1}$ is computed by drawing a random matrix $S$ (in an i.i.d. fashion from some fixed distribution) and setting

$x^{k+1} = \underset{x}{\operatorname{arg\,min}}\ \|x - x^k\|_B \quad \text{subject to} \quad S^T A x = S^T b.$

That is, $x^{k+1}$ is obtained as the projection of $x^k$ onto the randomly sketched system $S^T A x = S^T b$. The idea behind the method is to pick $S$ in such a way that a projection onto the sketched system is substantially simpler than the solution of the original system $Ax = b$. The randomized Kaczmarz method is obtained by picking $B$ to be the identity matrix and $S$ to be the $i$th unit coordinate vector with probability $p_i = \|a_i\|^2 / \|A\|_F^2$, where $\|A\|_F$ denotes the Frobenius norm of $A$. Different choices of $B$ and $S$ lead to different variants of the method.
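As a concrete illustration (a minimal sketch with $B = I$; the helper name is ours), one sketch-and-project step amounts to a least-norm correction on the sketched system, and with $S = e_i$ it reproduces exactly a Kaczmarz step:

import numpy as np

def sketch_and_project_step(A, b, x, S):
    # Project x onto {z : S^T A z = S^T b} in the Euclidean norm (B = I).
    As = S.T @ A                        # sketched matrix S^T A
    rs = S.T @ b - As @ x               # sketched residual S^T (b - A x)
    return x + np.linalg.pinv(As) @ rs  # least-norm correction = projection

# With S = e_1 this is exactly a Kaczmarz step onto the first equation:
A = np.array([[2., 0.], [1., 3.]]); b = np.array([2., 7.])
x = np.zeros(2)
S = np.eye(2)[:, [0]]                   # first unit coordinate vector, as a 2x1 matrix
x1 = sketch_and_project_step(A, b, x, S)
assert np.allclose(x1, x + (b[0] - A[0] @ x) / (A[0] @ A[0]) * A[0])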
A seemingly different but entirely equivalent formulation of the method (obtained via Lagrangian duality) is

$x^{k+1} = \underset{x}{\operatorname{arg\,min}}\ \|x - x^*\|_B \quad \text{subject to} \quad x = x^k + B^{-1} A^T S y,$

where $y$ is allowed to vary and where $x^*$ is any solution of the system $Ax = b$. Hence, $x^{k+1}$ is obtained by first constraining the update to the linear subspace spanned by the columns of the random matrix $B^{-1} A^T S$, that is, to

$\left\{h : h = B^{-1} A^T S y, \ y \text{ can vary}\right\},$

and then choosing the point $x$ of this subspace which best approximates $x^*$. This formulation may look surprising, as it seems impossible to perform the approximation step when $x^*$ is not known ($x^*$ is, after all, what we are trying to compute!). It is nevertheless possible to do this, because $x^{k+1}$ computed this way is the same as the $x^{k+1}$ computed via the sketch-and-project formulation, in which $x^*$ does not appear.
The update can also be written explicitly as

$x^{k+1} = x^k - B^{-1} A^T S \left(S^T A B^{-1} A^T S\right)^{\dagger} S^T \left(A x^k - b\right),$

where $M^{\dagger}$ denotes the Moore-Penrose pseudoinverse of the matrix $M$. Hence, the method can be written in the form $x^{k+1} = x^k + h^k$, where $h^k$ is a random update vector.

Letting $M = S^T A B^{-1} A^T S$, it can be shown that the system $M y = S^T (A x^k - b)$ always has a solution $y^k$, and that for all such solutions the vector $x^k - B^{-1} A^T S y^k$ is the same. Hence it does not matter which solution is chosen, and the method can also be written as $x^{k+1} = x^k - B^{-1} A^T S y^k$. The pseudoinverse leads to just one particular solution.
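The explicit form translates directly into code; a sketch assuming a positive definite $B$ (the function name is ours, and $B^{-1}$ is applied via a solve rather than an explicit inverse):

import numpy as np

def random_update_step(A, b, x, S, B):
    # x^{k+1} = x^k - B^{-1} A^T S (S^T A B^{-1} A^T S)^+ S^T (A x - b)
    W = np.linalg.solve(B, A.T @ S)     # B^{-1} A^T S
    M = (S.T @ A) @ W                   # M = S^T A B^{-1} A^T S
    y = np.linalg.pinv(M) @ (S.T @ (A @ x - b))
    return x - W @ y                    # with B = I this matches sketch-and-project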
If we subtract $x^*$ from both sides of the random update formula, denote

$Z := A^T S \left(S^T A B^{-1} A^T S\right)^{\dagger} S^T A,$

and use the fact that $A x^* = b$, we arrive at the last formulation,

$x^{k+1} - x^* = \left(I - B^{-1} Z\right)\left(x^k - x^*\right),$

where $I$ is the identity matrix. The iteration matrix, $I - B^{-1} Z$, is random, whence the name of this formulation.
By taking conditional expectations in the random fixed point formulation (conditional on $x^k$), we obtain

$E\left[x^{k+1} - x^* \mid x^k\right] = \left(I - B^{-1} E[Z]\right)\left[x^k - x^*\right].$
By taking expectation again, and using the tower property of expectations, we obtain
$E\left[x^{k+1} - x^*\right] = \left(I - B^{-1} E[Z]\right) E\left[x^k - x^*\right].$
Gower and Richtarik show that
$\rho := \left\|I - B^{-1} E[Z]\right\|_B = \lambda_{\max}\left(I - B^{-1} E[Z]\right),$

where the matrix norm is defined by

$\|M\|_B := \max_{x \neq 0} \frac{\|M x\|_B}{\|x\|_B}.$
Moreover, without any assumptions on $S$ one has $0 \le \rho \le 1$. By taking norms of the recursion above and unrolling it, one obtains

$\left\|E\left[x^k - x^*\right]\right\|_B \le \rho^k \left\|x^0 - x^*\right\|_B.$
Remark. A sufficient condition for the expected residuals to converge to 0 is $\rho < 1$, which can be achieved if $A$ has full column rank and under very mild conditions on $S$.
It is also possible to show a stronger result:
The expected squared norms (rather than norms of expectations) converge at the same rate:
$E\left\|x^k - x^*\right\|_B^2 \le \rho^k \left\|x^0 - x^*\right\|_B^2.$
Remark. This second type of convergence is stronger due to the following identity, which holds for any random vector $x$ and any fixed vector $x^*$:

$\left\|E\left[x - x^*\right]\right\|^2 = E\left[\left\|x - x^*\right\|^2\right] - E\left[\left\|x - E[x]\right\|^2\right].$
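The identity is the standard variance decomposition; here is a quick numerical check over an empirical (equally weighted) distribution, for which all expectations are exact averages:

import numpy as np

rng = np.random.default_rng(1)
xs = rng.normal(size=(5, 3))     # five equally likely values of the random vector x
x_star = rng.normal(size=3)      # an arbitrary fixed vector x*

lhs = np.linalg.norm(xs.mean(axis=0) - x_star) ** 2
rhs = (np.mean(np.linalg.norm(xs - x_star, axis=1) ** 2)
       - np.mean(np.linalg.norm(xs - xs.mean(axis=0), axis=1) ** 2))
assert np.isclose(lhs, rhs)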
We have seen that the randomized Kaczmarz method appears as a special case of the Gower-Richtarik method for $B = I$ and $S$ being the $i$th unit coordinate vector with probability $p_i = \|a_i\|^2 / \|A\|_F^2$, where $a_i$ is the $i$th row of $A$.
It can be checked by direct calculation that

$\rho = \left\|I - B^{-1} E[Z]\right\|_B = 1 - \frac{\lambda_{\min}(A^T A)}{\|A\|_F^2}.$
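Indeed, in this special case $Z = a_i^T a_i / \|a_i\|^2$ (an outer product), so $E[Z] = A^T A / \|A\|_F^2$; the rate formula can then be verified numerically (a small sketch, variable names ours):

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(10, 4))

EZ = A.T @ A / np.linalg.norm(A, 'fro') ** 2    # E[Z] for randomized Kaczmarz
rho = np.linalg.eigvalsh(np.eye(4) - EZ).max()  # lambda_max(I - E[Z])
rate = 1 - np.linalg.eigvalsh(A.T @ A).min() / np.linalg.norm(A, 'fro') ** 2
assert np.isclose(rho, rate)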
Since the convergence of the (randomized) Kaczmarz method depends on a rate of convergence, the method may make slow progress on some practical problems. To ensure finite termination of the method, Johannes Brust and Michael Saunders have developed a process that generalizes the (randomized) Kaczmarz iteration and terminates in at most $m$ iterations at a solution of the consistent system $Ax = b$, as implemented in the projected linear systems solver (PLSS). A version of the Kaczmarz iteration that terminates in at most $m$ iterations is

$x^{k+1} = x^k + A_{1:k,:}^{T}\left(A_{1:k,:} A_{1:k,:}^{T}\right)^{\dagger}\left(b_{1:k} - A_{1:k,:} x^k\right),$
where $A_{1:k,:}$ denotes the matrix formed by the $k$ rows of $A$ with indices $\{i_1, \ldots, i_{k-1}, i_k\}$, each index $i_j$ being selected without resampling from $1, 2, \ldots, m$. At iteration $k = m$ all rows have been used, so that $A_{1:m,:} = A$ (up to a permutation of the rows), and hence

$A x^{m+1} = A x^m + A A^{T}\left(A A^{T}\right)^{\dagger}\left(b - A x^m\right) = b,$

and therefore $x^{m+1}$ solves the linear system.
algorithm PLSS-Kaczmarz is
    input: matrix A, right-hand side b
    output: solution x such that Ax = b
    x := 0, P := [0]
    for k in 1, 2, ..., m do
        a := A(ik, :)'                   // Select an index ik in 1, ..., m without resampling
        d := P' * a
        c1 := norm(a)
        c2 := norm(d)
        c3 := (b(ik) - x'*a) / ((c1 - c2) * (c1 + c2))
        p := c3 * (a - P*(P'*a))
        P := [P, p/norm(p)]              // Append a normalized update
        x := x + p
    return x
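A NumPy transcription of the pseudocode above (a sketch: we assume a consistent system with linearly independent rows, so that the denominator and norm(p) stay nonzero, and we simply take ik = k):

import numpy as np

def plss_kaczmarz(A, b):
    # PLSS-Kaczmarz: reaches a solution of Ax = b after the m-th iteration.
    m, n = A.shape
    x = np.zeros(n)
    P = np.zeros((n, 1))                    # P = [0]
    for k in range(m):                      # indices 1..m used without resampling
        a = A[k]
        d = P.T @ a
        c1 = np.linalg.norm(a)
        c2 = np.linalg.norm(d)
        c3 = (b[k] - x @ a) / ((c1 - c2) * (c1 + c2))
        p = c3 * (a - P @ (P.T @ a))
        P = np.hstack([P, (p / np.linalg.norm(p))[:, None]])  # append normalized update
        x = x + p
    return x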