Randomized (Block) Coordinate Descent Method is an optimization algorithm popularized by Nesterov (2010) and Richtárik and Takáč (2011). The first analysis of this method, when applied to the problem of minimizing a smooth convex function, was performed by Nesterov (2010). In Nesterov's analysis the method needs to be applied to a quadratic perturbation of the original function with an unknown scaling factor. Richtárik and Takáč (2011) give iteration complexity bounds which do not require this, i.e., the method is applied to the objective function directly. Furthermore, they generalize the setting to the problem of minimizing a composite function, i.e., sum of a smooth convex and a (possibly nonsmooth) convex block-separable function:
F(x)=f(x)+\Psi(x),
where
\Psi(x)=
n | |
\sum | |
i=1 |
(i) | |
\Psi | |
i(x |
),
x\inRN
n
x=(x(1),...,x(n))
\Psi1,...,\Psin
Example (block decomposition): If
x=(x1,x2,...,x5)\inR5
n=3
x(1)=(x1,x3),x(2)=(x2,x5)
x(3)=x4
Example (block-separable regularizers):
n=N;\Psi(x)=\|x\|1=
n | |
\sum | |
i=1 |
|xi|
N=N1+N2+...+Nn;\Psi(x)=
n | |
\sum | |
i=1 |
\|x(i)\|2
x(i)\in
Ni | |
R |
\| ⋅ \|2
Consider the optimization problem
min | |
x\inRn |
f(x),
where
f
Smoothness: By smoothness we mean the following: we assumethe gradient of
f
L1,L2,...,Ln
|\nablaif(x+hei)-\nablaif(x)|\leqLi|h|,
for all
x\inRn
h\inR
\nablai
x(i)
Nesterov, and Richtarik and Takac showed that the following algorithm converges to the optimal point: Input:
x0\inRn
x
i\in\{1,2,...,n\}
x(i)=x(i)-
1{L | |
i} |
\nablaif(x)
Since the iterates of this algorithm are random vectors, a complexity result would give a bound on the number of iterations needed for the method to output an approximate solution with high probability. It was shown in that if
k\geq
2nRL(x0) | |
\epsilon |
log\left(
| |||||||
\epsilon\rho |
\right)
RL(x)=maxy
max | |
x*\inX* |
\{
*\| | |
\|y-x | |
L |
:f(y)\leqf(x)\}
f*
f*=
min | |
x\inRn |
\{f(x)\}
\rho\in(0,1)
\epsilon>0
*> | |
Prob(f(x | |
k)-f |
\epsilon)\leq\rho
The following Figure showshow
xk
f(x)=\tfrac{1}{2}xT\left(\begin{array}{cc} 1&0.5\ 0.5&1 \end{array} \right) x-\left(\begin{array}{cc} 1.5&1.5 \end{array} \right)x, x0=\left(\begin{array}{cc} 0&0 \end{array} \right)T
One can naturally extend this algorithm not only just to coordinates, but to blocks of coordinates. Assume that we have space
R5
e1=
T, e | |
(1,0,0,0,0) | |
2 |
=
T, e | |
(0,1,0,0,0) | |
3 |
=
T, e | |
(0,0,1,0,0) | |
4 |
=
T, e | |
(0,0,0,1,0) | |
5 |
=(0,0,0,0,1)T