The Frank–Wolfe algorithm is an iterative first-order optimization algorithm for constrained convex optimization. Also known as the conditional gradient method,[1] reduced gradient algorithm and the convex combination algorithm, the method was originally proposed by Marguerite Frank and Philip Wolfe in 1956.[2] In each iteration, the Frank–Wolfe algorithm considers a linear approximation of the objective function, and moves towards a minimizer of this linear function (taken over the same domain).
Suppose $\mathcal{D}$ is a compact convex set in a vector space and $f \colon \mathcal{D} \to \mathbb{R}$ is a convex, differentiable real-valued function. The Frank–Wolfe algorithm solves the optimization problem

Minimize $f(x)$
subject to $x \in \mathcal{D}$.
Initialization: Let $k \leftarrow 0$, and let $x_0$ be any point in $\mathcal{D}$.
Step 1. Direction-finding subproblem: Find $s_k$ solving

Minimize $s^T \nabla f(x_k)$
Subject to $s \in \mathcal{D}$

(Interpretation: Minimize the linear approximation of the problem given by the first-order Taylor approximation of $f$ around $x_k$, constrained to stay within $\mathcal{D}$.)
Step 2. Step size determination: Set $\alpha \leftarrow \frac{2}{k+2}$, or alternatively find $\alpha$ that minimizes $f(x_k + \alpha(s_k - x_k))$ subject to $0 \le \alpha \le 1$.
Step 3. Update: Let $x_{k+1} \leftarrow x_k + \alpha(s_k - x_k)$, let $k \leftarrow k + 1$, and go to Step 1.
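To make the three steps concrete, here is a minimal NumPy sketch of the loop above. It is not from the original paper; the quadratic objective, the choice of the probability simplex as $\mathcal{D}$, and the helper names `frank_wolfe` and `simplex_lmo` are illustrative assumptions. On the simplex the direction-finding subproblem has a closed-form solution: the minimizing vertex is the standard basis vector at the smallest coordinate of the gradient.

```python
import numpy as np

def frank_wolfe(grad, x0, lmo, num_iters=100):
    """Frank-Wolfe loop with the default step size 2/(k+2).

    grad -- callable returning the gradient of f at x
    lmo  -- linear minimization oracle: returns argmin over s in D of s^T g
    """
    x = x0.copy()
    for k in range(num_iters):
        g = grad(x)                 # gradient at the current iterate
        s = lmo(g)                  # Step 1: direction-finding subproblem
        alpha = 2.0 / (k + 2.0)     # Step 2: default step size
        x = x + alpha * (s - x)     # Step 3: convex-combination update
    return x

# Example: minimize f(x) = ||x - b||^2 over the probability simplex.
# There the oracle is trivial: the minimizing vertex is the standard
# basis vector at the smallest coordinate of the gradient.
b = np.array([0.6, 0.5, -0.2])
grad = lambda x: 2.0 * (x - b)

def simplex_lmo(g):
    s = np.zeros_like(g)
    s[np.argmin(g)] = 1.0
    return s

x0 = np.ones(3) / 3.0               # start at the barycenter of the simplex
print(frank_wolfe(grad, x0, simplex_lmo))
```

Note that since each update is a convex combination of $x_k$ and a point $s_k \in \mathcal{D}$, every iterate remains feasible without any projection.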
While competing methods such as gradient descent for constrained optimization require a projection step back to the feasible set in each iteration, the Frank–Wolfe algorithm only needs the solution of a linear problem over the same set in each iteration, and automatically stays in the feasible set.
The convergence of the Frank–Wolfe algorithm is sublinear in general: the error in the objective function to the optimum is $O(1/k)$ after $k$ iterations, so long as the gradient is Lipschitz continuous.[3]
The iterates of the algorithm can always be represented as a sparse convex combination of the extreme points of the feasible set, which has contributed to the popularity of the algorithm for sparse greedy optimization in machine learning and signal processing problems,[4] as well as, for example, the optimization of minimum-cost flows in transportation networks.[5]
If the feasible set is given by a set of linear constraints, then the subproblem to be solved in each iteration becomes a linear program.
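In that case the subproblem can be handed to any LP solver. Below is a minimal sketch using `scipy.optimize.linprog`, assuming the feasible set is a polytope $\{s : A s \le b\}$; the helper name `lp_lmo` and the example box constraints are illustrative, not from the article.

```python
import numpy as np
from scipy.optimize import linprog

def lp_lmo(g, A_ub, b_ub):
    """Direction-finding subproblem as a linear program:
    minimize s^T g subject to A_ub @ s <= b_ub."""
    res = linprog(c=g, A_ub=A_ub, b_ub=b_ub, bounds=(None, None))
    return res.x

# Illustrative polytope: the unit box 0 <= s_i <= 1, written as A s <= b.
A_ub = np.vstack([np.eye(2), -np.eye(2)])
b_ub = np.array([1.0, 1.0, 0.0, 0.0])
print(lp_lmo(np.array([0.3, -0.7]), A_ub, b_ub))  # minimizer is the vertex (0, 1)
```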
While the worst-case convergence rate of $O(1/k)$ cannot be improved in general, faster convergence can be obtained for special problem classes, such as some strongly convex problems.[6]
Since $f$ is convex, for any two points $x, y \in \mathcal{D}$ we have:

$f(y) \geq f(x) + (y - x)^T \nabla f(x)$
This also holds for the (unknown) optimal solution $x^*$. That is, $f(x^*) \ge f(x) + (x^* - x)^T \nabla f(x)$. The best lower bound with respect to a given point $x$ is given by
$$\begin{align}
f(x^*) &\ge f(x) + (x^* - x)^T \nabla f(x) \\
&\geq \min_{y \in \mathcal{D}} \left\{ f(x) + (y - x)^T \nabla f(x) \right\} \\
&= f(x) - x^T \nabla f(x) + \min_{y \in \mathcal{D}} y^T \nabla f(x)
\end{align}$$
The latter optimization problem is solved in every iteration of the Frank–Wolfe algorithm, therefore the solution $s_k$ of the direction-finding subproblem of the $k$-th iteration can be used to determine increasing lower bounds $l_k$ during each iteration by setting $l_0 = -\infty$ and

$$l_k := \max(l_{k-1}, f(x_k) + (s_k - x_k)^T \nabla f(x_k))$$

Such lower bounds on the unknown optimal value are important in practice because they can be used as a stopping criterion, and give an efficient certificate of the approximation quality in every iteration, since always $l_k \leq f(x^*) \leq f(x_k)$.
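Both the bound and the resulting stopping criterion use only quantities the algorithm computes anyway. Here is a sketch of the bookkeeping, extending the `frank_wolfe` sketch above; the callable `f` and the gap tolerance `tol` are illustrative assumptions.

```python
import numpy as np

def frank_wolfe_with_gap(f, grad, x0, lmo, num_iters=100, tol=1e-6):
    """Frank-Wolfe tracking the lower bound l_k and stopping on the duality gap."""
    x = x0.copy()
    lower = -np.inf                        # l_0 = -infinity
    for k in range(num_iters):
        g = grad(x)
        s = lmo(g)                         # direction-finding subproblem
        # Lower bound from the linearization at x_k, solved by the oracle:
        lower = max(lower, f(x) + (s - x) @ g)
        gap = f(x) - lower                 # certificate: l_k <= f(x*) <= f(x_k)
        if gap <= tol:
            break
        alpha = 2.0 / (k + 2.0)
        x = x + alpha * (s - x)
    return x, lower
```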
It has been shown that this corresponding duality gap, that is the difference between $f(x_k)$ and the lower bound $l_k$, decreases with the same convergence rate, i.e. $f(x_k) - l_k = O(1/k)$.