Wolfe conditions explained
In the unconstrained minimization problem, the Wolfe conditions are a set of inequalities for performing inexact line search, especially in quasi-Newton methods, first published by Philip Wolfe in 1969.[1][2]

In these methods the idea is to find $\min_{\mathbf{x}} f(\mathbf{x})$ for some smooth $f\colon \mathbb{R}^n \to \mathbb{R}$. Each step often involves approximately solving the subproblem
$\min_{\alpha} f(\mathbf{x}_k + \alpha \mathbf{p}_k),$
where $\mathbf{x}_k$ is the current best guess, $\mathbf{p}_k \in \mathbb{R}^n$ is a search direction, and $\alpha \in \mathbb{R}$ is the step length.

The inexact line searches provide an efficient way of computing an acceptable step length $\alpha$ that reduces the objective function 'sufficiently', rather than minimizing the objective function over $\alpha \in \mathbb{R}^+$ exactly. A line search algorithm can use the Wolfe conditions as a requirement for any guessed $\alpha$, before finding a new search direction $\mathbf{p}_k$.
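As an illustration of this setup (a minimal sketch, not part of the original article), the following Python fragment shows the generic descent iteration in which an inexact line search supplies the step length; `line_search` is a hypothetical callable standing in for any routine that returns an acceptable $\alpha$.

```python
import numpy as np

def descent(f, grad_f, x0, line_search, max_iter=100, tol=1e-8):
    """Generic descent iteration x_{k+1} = x_k + alpha_k * p_k.

    `line_search` is any routine that, given (f, grad_f, x_k, p_k),
    returns an acceptable step length alpha_k -- for instance one
    satisfying the Wolfe conditions described below.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:
            break
        p = -g                        # steepest-descent direction, so p^T grad f(x) < 0
        alpha = line_search(f, grad_f, x, p)
        x = x + alpha * p
    return x
```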
Armijo rule and curvature
A step length $\alpha_k$ is said to satisfy the Wolfe conditions, restricted to the direction $\mathbf{p}_k$, if the following two inequalities hold:

i) $f(\mathbf{x}_k + \alpha_k \mathbf{p}_k) \le f(\mathbf{x}_k) + c_1 \alpha_k\, \mathbf{p}_k^{\mathrm{T}} \nabla f(\mathbf{x}_k)$,

ii) $-\mathbf{p}_k^{\mathrm{T}} \nabla f(\mathbf{x}_k + \alpha_k \mathbf{p}_k) \le -c_2\, \mathbf{p}_k^{\mathrm{T}} \nabla f(\mathbf{x}_k)$,

with $0 < c_1 < c_2 < 1$. (In examining condition (ii), recall that to ensure that $\mathbf{p}_k$ is a descent direction, we have $\mathbf{p}_k^{\mathrm{T}} \nabla f(\mathbf{x}_k) < 0$, as in the case of gradient descent, where $\mathbf{p}_k = -\nabla f(\mathbf{x}_k)$, or Newton–Raphson, where $\mathbf{p}_k = -\mathbf{H}^{-1} \nabla f(\mathbf{x}_k)$ with $\mathbf{H}$ positive definite.)

$c_1$ is usually chosen to be quite small while $c_2$ is much larger; Nocedal and Wright give example values of $c_1 = 10^{-4}$ and $c_2 = 0.9$ for Newton or quasi-Newton methods and $c_2 = 0.1$ for the nonlinear conjugate gradient method.[3] Inequality i) is known as the Armijo rule[4] and ii) as the curvature condition; i) ensures that the step length $\alpha_k$ decreases $f$ 'sufficiently', and ii) ensures that the slope has been reduced sufficiently. Conditions i) and ii) can be interpreted as respectively providing an upper and lower bound on the admissible step length values.
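As a concrete illustration of conditions i) and ii) (a sketch assuming NumPy arrays, not code from the article), the following function tests whether a trial step length satisfies both inequalities, with the Nocedal–Wright example values as defaults.

```python
import numpy as np

def satisfies_wolfe(f, grad_f, x, p, alpha, c1=1e-4, c2=0.9):
    """Return True if the step length `alpha` satisfies the Wolfe conditions.

    i)  f(x + alpha p) <= f(x) + c1 * alpha * p^T grad f(x)     (Armijo rule)
    ii) p^T grad f(x + alpha p) >= c2 * p^T grad f(x)           (curvature condition)
    with 0 < c1 < c2 < 1 and p a descent direction (p^T grad f(x) < 0).
    """
    slope = p @ grad_f(x)                       # directional derivative at x, negative
    armijo = f(x + alpha * p) <= f(x) + c1 * alpha * slope
    curvature = p @ grad_f(x + alpha * p) >= c2 * slope
    return bool(armijo and curvature)
```

For example, with $f(\mathbf{x}) = \|\mathbf{x}\|^2$, $\mathbf{x} = (1, 1)$ and $\mathbf{p} = -\nabla f(\mathbf{x})$, the step $\alpha = 0.5$ lands on the minimizer and satisfies both conditions, while a tiny step such as $\alpha = 10^{-6}$ satisfies i) but fails ii).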
Strong Wolfe condition on curvature
Denote a univariate function $\varphi$ restricted to the direction $\mathbf{p}_k$ as $\varphi(\alpha) = f(\mathbf{x}_k + \alpha \mathbf{p}_k)$. The Wolfe conditions can result in a value for the step length that is not close to a minimizer of $\varphi$. If we modify the curvature condition to the following,

iii) $\left|\mathbf{p}_k^{\mathrm{T}} \nabla f(\mathbf{x}_k + \alpha_k \mathbf{p}_k)\right| \le c_2 \left|\mathbf{p}_k^{\mathrm{T}} \nabla f(\mathbf{x}_k)\right|,$

then i) and iii) together form the so-called strong Wolfe conditions, and force $\alpha_k$ to lie close to a critical point of $\varphi$.
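A check for the strong conditions differs from the sketch above only in taking absolute values in the curvature inequality (again an illustrative sketch assuming NumPy arrays, not code from the article):

```python
def satisfies_strong_wolfe(f, grad_f, x, p, alpha, c1=1e-4, c2=0.9):
    """Return True if `alpha` satisfies the strong Wolfe conditions i) and iii)."""
    slope = p @ grad_f(x)                       # p^T grad f(x), negative for descent
    armijo = f(x + alpha * p) <= f(x) + c1 * alpha * slope
    strong_curvature = abs(p @ grad_f(x + alpha * p)) <= c2 * abs(slope)
    return bool(armijo and strong_curvature)
```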
Rationale
The principal reason for imposing the Wolfe conditions in an optimization algorithm where $\mathbf{x}_{k+1} = \mathbf{x}_k + \alpha_k \mathbf{p}_k$ is to ensure convergence of the gradient to zero. In particular, if the cosine of the angle between $\mathbf{p}_k$ and the gradient,
$\cos \theta_k = \dfrac{\nabla f(\mathbf{x}_k)^{\mathrm{T}} \mathbf{p}_k}{\|\nabla f(\mathbf{x}_k)\|\,\|\mathbf{p}_k\|},$
is bounded away from zero and conditions i) and ii) hold, then $\nabla f(\mathbf{x}_k) \to 0$.

An additional motivation, in the case of a quasi-Newton method, is that if $\mathbf{p}_k = -B_k^{-1} \nabla f(\mathbf{x}_k)$, where the matrix $B_k$ is updated by the BFGS or DFP formula, then if $B_k$ is positive definite, condition ii) implies that $B_{k+1}$ is also positive definite.
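The last point can be made explicit with a short, standard derivation (not spelled out in the article): writing $s_k = \alpha_k \mathbf{p}_k$ and $y_k = \nabla f(\mathbf{x}_{k+1}) - \nabla f(\mathbf{x}_k)$, condition ii) gives $s_k^{\mathrm{T}} y_k > 0$, which is precisely the curvature requirement under which the BFGS and DFP updates preserve positive definiteness. Indeed, since ii) states $\mathbf{p}_k^{\mathrm{T}} \nabla f(\mathbf{x}_{k+1}) \ge c_2\, \mathbf{p}_k^{\mathrm{T}} \nabla f(\mathbf{x}_k)$ with $0 < c_2 < 1$, $\alpha_k > 0$ and $\mathbf{p}_k^{\mathrm{T}} \nabla f(\mathbf{x}_k) < 0$,
$s_k^{\mathrm{T}} y_k = \alpha_k\, \mathbf{p}_k^{\mathrm{T}} \bigl(\nabla f(\mathbf{x}_{k+1}) - \nabla f(\mathbf{x}_k)\bigr) \ge \alpha_k (c_2 - 1)\, \mathbf{p}_k^{\mathrm{T}} \nabla f(\mathbf{x}_k) > 0.$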
Comments
Wolfe's conditions are more complicated than Armijo's condition, and a gradient descent algorithm based on Armijo's condition has a better theoretical guarantee than one based on Wolfe conditions (see the sections on "Upper bound for learning rates" and "Theoretical guarantee" in the Backtracking line search article).
See also
Further reading
- Nocedal, Jorge; Wright, Stephen J. (2006). "Line Search Methods". Numerical Optimization. Springer Series in Operations Research and Financial Engineering. pp. 30–32. doi:10.1007/978-0-387-40065-5_3. ISBN 978-0-387-30303-1.
- Nocedal, Jorge; Wright, Stephen J. (2006). "Quasi-Newton Methods". Numerical Optimization. Springer Series in Operations Research and Financial Engineering. pp. 135–163. doi:10.1007/978-0-387-40065-5_6. ISBN 978-0-387-30303-1.
Notes and References
- Wolfe, P. (1969). "Convergence Conditions for Ascent Methods". SIAM Review. 11 (2): 226–235. doi:10.1137/1011036. JSTOR 2028111.
- Wolfe, P. (1971). "Convergence Conditions for Ascent Methods. II: Some Corrections". SIAM Review. 13 (2): 185–188. doi:10.1137/1013035. JSTOR 2028821.
- Nocedal, Jorge; Wright, Stephen (1999). Numerical Optimization. p. 38.
- Armijo, Larry (1966). "Minimization of functions having Lipschitz continuous first partial derivatives". Pacific Journal of Mathematics. 16 (1): 1–3. doi:10.2140/pjm.1966.16.1.