Wolfe conditions explained
In the unconstrained minimization problem, the Wolfe conditions are a set of inequalities for performing inexact line search, especially in quasi-Newton methods, first published by Philip Wolfe in 1969.[1][2]

In these methods the idea is to find $\min_{\mathbf{x}} f(\mathbf{x})$ for some smooth $f\colon \mathbb{R}^n \to \mathbb{R}$. Each step often involves approximately solving the subproblem
$\min_{\alpha} f(\mathbf{x}_k + \alpha \mathbf{p}_k),$
where $\mathbf{x}_k$ is the current best guess, $\mathbf{p}_k \in \mathbb{R}^n$ is a search direction, and $\alpha \in \mathbb{R}$ is the step length.

The inexact line searches provide an efficient way of computing an acceptable step length $\alpha$ that reduces the objective function 'sufficiently', rather than minimizing the objective function over $\alpha \in \mathbb{R}^+$ exactly. A line search algorithm can use the Wolfe conditions as a requirement for any guessed $\alpha$, before finding a new search direction $\mathbf{p}_k$.
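As an illustration of this setup (a minimal sketch, not part of the original article), the following Python fragment shows the generic descent iteration in which an inexact line search supplies the step length; `line_search` is a hypothetical callable standing in for any routine that returns an acceptable $\alpha$.

```python
import numpy as np

def descent(f, grad_f, x0, line_search, max_iter=100, tol=1e-8):
    """Generic descent iteration x_{k+1} = x_k + alpha_k * p_k.

    `line_search` is any routine that, given (f, grad_f, x_k, p_k),
    returns an acceptable step length alpha_k -- for instance one
    satisfying the Wolfe conditions described below.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:
            break
        p = -g                        # steepest-descent direction, so p^T grad f(x) < 0
        alpha = line_search(f, grad_f, x, p)
        x = x + alpha * p
    return x
```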
Armijo rule and curvature
A step length $\alpha_k$ is said to satisfy the Wolfe conditions, restricted to the direction $\mathbf{p}_k$, if the following two inequalities hold:

i) $f(\mathbf{x}_k + \alpha_k \mathbf{p}_k) \le f(\mathbf{x}_k) + c_1 \alpha_k\, \mathbf{p}_k^{\mathrm{T}} \nabla f(\mathbf{x}_k)$,

ii) $-\mathbf{p}_k^{\mathrm{T}} \nabla f(\mathbf{x}_k + \alpha_k \mathbf{p}_k) \le -c_2\, \mathbf{p}_k^{\mathrm{T}} \nabla f(\mathbf{x}_k)$,

with $0 < c_1 < c_2 < 1$. (In examining condition (ii), recall that to ensure that $\mathbf{p}_k$ is a descent direction, we have $\mathbf{p}_k^{\mathrm{T}} \nabla f(\mathbf{x}_k) < 0$, as in the case of gradient descent, where $\mathbf{p}_k = -\nabla f(\mathbf{x}_k)$, or Newton–Raphson, where $\mathbf{p}_k = -\mathbf{H}^{-1} \nabla f(\mathbf{x}_k)$ with $\mathbf{H}$ positive definite.)

$c_1$ is usually chosen to be quite small while $c_2$ is much larger; Nocedal and Wright give example values of $c_1 = 10^{-4}$ and $c_2 = 0.9$ for Newton or quasi-Newton methods and $c_2 = 0.1$ for the nonlinear conjugate gradient method.[3] Inequality i) is known as the Armijo rule[4] and ii) as the curvature condition; i) ensures that the step length $\alpha_k$ decreases $f$ 'sufficiently', and ii) ensures that the slope has been reduced sufficiently. Conditions i) and ii) can be interpreted as respectively providing an upper and lower bound on the admissible step length values.
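As a concrete illustration of conditions i) and ii) (a sketch assuming NumPy arrays, not code from the article), the following function tests whether a trial step length satisfies both inequalities, with the Nocedal–Wright example values as defaults.

```python
import numpy as np

def satisfies_wolfe(f, grad_f, x, p, alpha, c1=1e-4, c2=0.9):
    """Return True if the step length `alpha` satisfies the Wolfe conditions.

    i)  f(x + alpha p) <= f(x) + c1 * alpha * p^T grad f(x)     (Armijo rule)
    ii) p^T grad f(x + alpha p) >= c2 * p^T grad f(x)           (curvature condition)
    with 0 < c1 < c2 < 1 and p a descent direction (p^T grad f(x) < 0).
    """
    slope = p @ grad_f(x)                       # directional derivative at x, negative
    armijo = f(x + alpha * p) <= f(x) + c1 * alpha * slope
    curvature = p @ grad_f(x + alpha * p) >= c2 * slope
    return bool(armijo and curvature)
```

For example, with $f(\mathbf{x}) = \|\mathbf{x}\|^2$, $\mathbf{x} = (1, 1)$ and $\mathbf{p} = -\nabla f(\mathbf{x})$, the step $\alpha = 0.5$ lands on the minimizer and satisfies both conditions, while a tiny step such as $\alpha = 10^{-6}$ satisfies i) but fails ii).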
Strong Wolfe condition on curvature
Denote a univariate function $\varphi$ restricted to the direction $\mathbf{p}_k$ as $\varphi(\alpha) = f(\mathbf{x}_k + \alpha \mathbf{p}_k)$. The Wolfe conditions can result in a value for the step length that is not close to a minimizer of $\varphi$. If we modify the curvature condition to the following,

iii) $\left|\mathbf{p}_k^{\mathrm{T}} \nabla f(\mathbf{x}_k + \alpha_k \mathbf{p}_k)\right| \le c_2 \left|\mathbf{p}_k^{\mathrm{T}} \nabla f(\mathbf{x}_k)\right|,$

then i) and iii) together form the so-called strong Wolfe conditions, and force $\alpha_k$ to lie close to a critical point of $\varphi$.
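A check for the strong conditions differs from the sketch above only in taking absolute values in the curvature inequality (again an illustrative sketch assuming NumPy arrays, not code from the article):

```python
def satisfies_strong_wolfe(f, grad_f, x, p, alpha, c1=1e-4, c2=0.9):
    """Return True if `alpha` satisfies the strong Wolfe conditions i) and iii)."""
    slope = p @ grad_f(x)                       # p^T grad f(x), negative for descent
    armijo = f(x + alpha * p) <= f(x) + c1 * alpha * slope
    strong_curvature = abs(p @ grad_f(x + alpha * p)) <= c2 * abs(slope)
    return bool(armijo and strong_curvature)
```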
Rationale
The principal reason for imposing the Wolfe conditions in an optimization algorithm where $\mathbf{x}_{k+1} = \mathbf{x}_k + \alpha_k \mathbf{p}_k$ is to ensure convergence of the gradient to zero. In particular, if the cosine of the angle between $\mathbf{p}_k$ and the gradient,
$\cos \theta_k = \dfrac{\nabla f(\mathbf{x}_k)^{\mathrm{T}} \mathbf{p}_k}{\|\nabla f(\mathbf{x}_k)\|\,\|\mathbf{p}_k\|},$
is bounded away from zero and conditions i) and ii) hold, then $\nabla f(\mathbf{x}_k) \to 0$.

An additional motivation, in the case of a quasi-Newton method, is that if $\mathbf{p}_k = -B_k^{-1} \nabla f(\mathbf{x}_k)$, where the matrix $B_k$ is updated by the BFGS or DFP formula, then if $B_k$ is positive definite, condition ii) implies that $B_{k+1}$ is also positive definite.
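The last point can be made explicit with a short, standard derivation (not spelled out in the article): writing $s_k = \alpha_k \mathbf{p}_k$ and $y_k = \nabla f(\mathbf{x}_{k+1}) - \nabla f(\mathbf{x}_k)$, condition ii) gives $s_k^{\mathrm{T}} y_k > 0$, which is precisely the curvature requirement under which the BFGS and DFP updates preserve positive definiteness. Indeed, since ii) states $\mathbf{p}_k^{\mathrm{T}} \nabla f(\mathbf{x}_{k+1}) \ge c_2\, \mathbf{p}_k^{\mathrm{T}} \nabla f(\mathbf{x}_k)$ with $0 < c_2 < 1$, $\alpha_k > 0$ and $\mathbf{p}_k^{\mathrm{T}} \nabla f(\mathbf{x}_k) < 0$,
$s_k^{\mathrm{T}} y_k = \alpha_k\, \mathbf{p}_k^{\mathrm{T}} \bigl(\nabla f(\mathbf{x}_{k+1}) - \nabla f(\mathbf{x}_k)\bigr) \ge \alpha_k (c_2 - 1)\, \mathbf{p}_k^{\mathrm{T}} \nabla f(\mathbf{x}_k) > 0.$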
Comments
Wolfe's conditions are more complicated than Armijo's condition, and a gradient descent algorithm based on Armijo's condition has a better theoretical guarantee than one based on Wolfe conditions (see the sections on "Upper bound for learning rates" and "Theoretical guarantee" in the Backtracking line search article).
See also
Further reading
- Nocedal, Jorge; Wright, Stephen J. (2006). "Line Search Methods". Numerical Optimization. Springer Series in Operations Research and Financial Engineering. pp. 30–32. doi:10.1007/978-0-387-40065-5_3. ISBN 978-0-387-30303-1.
- Nocedal, Jorge; Wright, Stephen J. (2006). "Quasi-Newton Methods". Numerical Optimization. Springer Series in Operations Research and Financial Engineering. pp. 135–163. doi:10.1007/978-0-387-40065-5_6. ISBN 978-0-387-30303-1.
Notes and References
- Wolfe, P. (1969). "Convergence Conditions for Ascent Methods". SIAM Review. 11 (2): 226–235. doi:10.1137/1011036. JSTOR 2028111.
- Wolfe, P. (1971). "Convergence Conditions for Ascent Methods. II: Some Corrections". SIAM Review. 13 (2): 185–188. doi:10.1137/1013035. JSTOR 2028821.
- Nocedal, Jorge; Wright, Stephen (1999). Numerical Optimization. p. 38.
- Armijo, Larry (1966). "Minimization of functions having Lipschitz continuous first partial derivatives". Pacific Journal of Mathematics. 16 (1): 1–3. doi:10.2140/pjm.1966.16.1.