The Barzilai-Borwein method[1] is an iterative gradient descent method for unconstrained optimization that uses either of two step sizes derived from the linear trend of the two most recent iterates. This method and its modifications are globally convergent under mild conditions,[2] and perform competitively with conjugate gradient methods for many problems.[3] Because the basic iteration depends only on the gradient and not on the objective itself, it can also solve some systems of linear and non-linear equations.
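To minimize a convex function $f:\mathbb{R}^n\to\mathbb{R}$ with gradient vector $g$ at point $x$, let there be two prior iterates $x_{k-1}$ and $x_k$ with gradients $g_{k-1} = g(x_{k-1})$ and $g_k = g(x_k)$, in which $x_k = x_{k-1} - \alpha_{k-1} g_{k-1}$, where $\alpha_{k-1}$ is the previous iteration's step size (not necessarily a Barzilai-Borwein step size). For brevity, let $\Delta x = x_k - x_{k-1}$ and $\Delta g = g_k - g_{k-1}$.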
A Barzilai-Borwein (BB) iteration is

$$x_{k+1} = x_k - \alpha_k g_k,$$

where the step size $\alpha_k$ is either

[long BB step] $\alpha_k^{\mathrm{LONG}} = \dfrac{\Delta x\cdot\Delta x}{\Delta x\cdot\Delta g}$, or

[short BB step] $\alpha_k^{\mathrm{SHORT}} = \dfrac{\Delta x\cdot\Delta g}{\Delta g\cdot\Delta g}$.
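The iteration can be sketched in a few lines of Python. This is a minimal, hypothetical implementation: the names `bb_minimize` and `bb_step`, the fixed first step `alpha0`, and the gradient-norm stopping rule are assumptions, not part of the method's definition. It has no safeguards, so it is only meant for strictly convex quadratics such as the example objective $f(x) = \tfrac12 x^T A x - b^T x$ used here.

```python
import math


def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def bb_step(dx, dg, variant="long"):
    """Long or short Barzilai-Borwein step size from Delta x and Delta g."""
    if variant == "long":
        return dot(dx, dx) / dot(dx, dg)
    if variant == "short":
        return dot(dx, dg) / dot(dg, dg)
    raise ValueError("variant must be 'long' or 'short'")


def bb_minimize(grad, x0, alpha0=1e-3, variant="long", tol=1e-8, max_iter=1000):
    """Unsafeguarded BB gradient iteration x_{k+1} = x_k - alpha_k g_k."""
    x_prev, g_prev = list(x0), grad(x0)
    # The very first step has no history, so it uses the fixed step alpha0.
    x = [xi - alpha0 * gi for xi, gi in zip(x_prev, g_prev)]
    g = grad(x)
    for _ in range(max_iter):
        if math.sqrt(dot(g, g)) < tol:
            break
        dx = [a - b for a, b in zip(x, x_prev)]
        dg = [a - b for a, b in zip(g, g_prev)]
        alpha = bb_step(dx, dg, variant)
        x_prev, g_prev = x, g
        x = [xi - alpha * gi for xi, gi in zip(x, g)]
        g = grad(x)
    return x


# Example: f(x) = 0.5 x^T A x - b^T x, so g(x) = A x - b; the minimizer is (0.2, 0.4).
A = [[3.0, 1.0], [1.0, 2.0]]
b = [1.0, 1.0]


def grad(x):
    return [dot(row, x) - bi for row, bi in zip(A, b)]


print(bb_minimize(grad, [0.0, 0.0]))
print(bb_minimize(grad, [0.0, 0.0], variant="short"))
```

Both calls should print values close to `[0.2, 0.4]`, the solution of $Ax = b$.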
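Barzilai-Borwein also applies to systems of equations $g(x) = 0$ for $g:\mathbb{R}^n\to\mathbb{R}^n$ whose Jacobian has a positive-definite symmetric part, so that $\Delta x\cdot\Delta g$ is necessarily positive.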
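Because the iteration only evaluates $g$, the same update can drive a residual map to zero directly. A minimal sketch, assuming a hypothetical nonsymmetric linear map $g(x) = Mx - c$ whose matrix $M$ has the positive-definite symmetric part $\mathrm{diag}(4, 3)$; the name `bb_solve`, the first step `alpha0`, and the tolerance are illustrative choices.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def bb_solve(g, x0, alpha0=0.1, tol=1e-10, max_iter=500):
    """Drive g(x) to zero with long BB steps; no objective function is used."""
    x_prev, r_prev = list(x0), g(x0)
    x = [xi - alpha0 * ri for xi, ri in zip(x_prev, r_prev)]
    r = g(x)
    for _ in range(max_iter):
        if math.sqrt(dot(r, r)) < tol:
            break
        dx = [a - b for a, b in zip(x, x_prev)]
        dr = [a - b for a, b in zip(r, r_prev)]
        # Positive here because dx.dr = dx^T M dx and M + M^T is positive definite.
        alpha = dot(dx, dx) / dot(dx, dr)
        x_prev, r_prev = x, r
        x = [xi - alpha * ri for xi, ri in zip(x, r)]
        r = g(x)
    return x


# Hypothetical test system: M is not symmetric, but its symmetric part is diag(4, 3).
M = [[4.0, 1.0], [-1.0, 3.0]]
c = [1.0, 2.0]


def residual(x):
    return [dot(row, x) - ci for row, ci in zip(M, c)]


print(bb_solve(residual, [0.0, 0.0]))   # exact solution is (1/13, 9/13)
```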
Despite its simplicity and optimality properties, Cauchy's classical steepest-descent method[4] for unconstrained optimization often performs poorly.[5] This has motivated many to propose alternate search directions, such as the conjugate gradient method. Jonathan Barzilai and Jonathan Borwein instead proposed new step sizes for the gradient by approximating the quasi-Newton method: they replace the Hessian with a scalar approximation estimated from the finite difference of the gradient between the two most recent iterates.
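In a quasi-Newton iteration,

$$x_{k+1} = x_k - B^{-1} g(x_k),$$

where $B$ is some approximation of the Jacobian matrix of $g$ (i.e. the Hessian of the objective function) which satisfies the secant equation $B_k\,\Delta x_k = \Delta g_k$. Barzilai and Borwein simplify $B$ to a scalar $1/\alpha$ (that is, $B \approx \tfrac{1}{\alpha} I$), which usually cannot satisfy the secant equation exactly, and instead approximate it as $\tfrac{1}{\alpha}\,\Delta x \approx \Delta g$. Approximations by two least-squares criteria are: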
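1. Minimize $\|\Delta x/\alpha - \Delta g\|^2$ with respect to $\alpha$, which yields the long BB step $\alpha^{\mathrm{LONG}}$, or
2. Minimize $\|\Delta x - \alpha\,\Delta g\|^2$ with respect to $\alpha$, which yields the short BB step $\alpha^{\mathrm{SHORT}}$.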
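Setting the derivative of each least-squares criterion to zero gives the closed forms. For the first criterion it is convenient to minimize over $\beta = 1/\alpha$ (assuming $\Delta x\cdot\Delta g \neq 0$, so that $\beta \neq 0$). Both objectives are convex quadratics in their variable, so the stationary point is the minimizer:

```latex
\begin{aligned}
&\text{(long)}\quad \frac{d}{d\beta}\,\|\beta\,\Delta x - \Delta g\|^2
  = 2\,\Delta x\cdot(\beta\,\Delta x - \Delta g) = 0
  \;\Rightarrow\; \beta = \frac{\Delta x\cdot\Delta g}{\Delta x\cdot\Delta x}
  \;\Rightarrow\; \alpha = \frac{1}{\beta}
  = \frac{\Delta x\cdot\Delta x}{\Delta x\cdot\Delta g} = \alpha^{\mathrm{LONG}}_k,\\[4pt]
&\text{(short)}\quad \frac{d}{d\alpha}\,\|\Delta x - \alpha\,\Delta g\|^2
  = -2\,\Delta g\cdot(\Delta x - \alpha\,\Delta g) = 0
  \;\Rightarrow\; \alpha = \frac{\Delta x\cdot\Delta g}{\Delta g\cdot\Delta g} = \alpha^{\mathrm{SHORT}}_k.
\end{aligned}
```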
In one dimension, both BB step sizes are equal and coincide with the step of the classical secant method.
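In one dimension both quotients reduce to $\Delta x/\Delta g$, so a BB update is exactly a secant update for $g(x) = 0$. A small check, using the hypothetical example $g(x) = x^3 - 2$ (the derivative of the convex $f(x) = x^4/4 - 2x$) and illustrative helper names:

```python
import math


def bb_steps_1d(x_prev, x, g_prev, g_cur):
    """Long and short BB step sizes for scalar x; both reduce to dx/dg."""
    dx, dg = x - x_prev, g_cur - g_prev
    return (dx * dx) / (dx * dg), (dx * dg) / (dg * dg)


def secant_update(x_prev, x, g_prev, g_cur):
    """One step of the classical secant method for solving g(x) = 0."""
    return x - g_cur * (x - x_prev) / (g_cur - g_prev)


def g(x):
    # Hypothetical example: derivative of the convex f(x) = x**4 / 4 - 2x.
    return x ** 3 - 2.0


x_prev, x = 1.0, 2.0
long_step, short_step = bb_steps_1d(x_prev, x, g(x_prev), g(x))
bb_next = x - long_step * g(x)
secant_next = secant_update(x_prev, x, g(x_prev), g(x))
print(math.isclose(long_step, short_step), math.isclose(bb_next, secant_next))
```

Here both step sizes equal $1/7$ and both updates give $8/7$; the same holds for any two distinct points with $g(x_k) \neq g(x_{k-1})$.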
The long BB step size is the same as a linearized Cauchy step, i.e. the first secant-method estimate of the exact line-search step along the previous direction, and for linear problems it is the Cauchy step itself. The short BB step size is the same as a linearized minimum-residual step. BB applies these step sizes to the new direction vector for the next iterate, instead of to the prior direction vector as another line-search step would.
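For a quadratic objective these correspondences can be checked directly: if the previous step was a steepest-descent step from $x_{k-1}$, the long BB step equals the exact (Cauchy) line-search step at $x_{k-1}$, and the short BB step equals the step that minimizes the gradient (residual) norm along $-g_{k-1}$. A sketch with a hypothetical $2\times 2$ quadratic; `compare_steps` and `matvec` are illustrative names.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    return [dot(row, v) for row in A]


def compare_steps(A, b, x_prev, alpha_prev):
    """Take one steepest-descent step of size alpha_prev on
    f(x) = 0.5 x^T A x - b^T x from x_prev, then return
    (long BB, Cauchy step at x_prev, short BB, minimum-residual step at x_prev)."""
    def grad(x):
        return [ax - bi for ax, bi in zip(matvec(A, x), b)]

    g_prev = grad(x_prev)
    x = [xi - alpha_prev * gi for xi, gi in zip(x_prev, g_prev)]
    dx = [xi - pi for xi, pi in zip(x, x_prev)]
    dg = [gi - pi for gi, pi in zip(grad(x), g_prev)]
    Ag = matvec(A, g_prev)
    return (
        dot(dx, dx) / dot(dx, dg),              # long BB step
        dot(g_prev, g_prev) / dot(g_prev, Ag),  # exact line-search (Cauchy) step
        dot(dx, dg) / dot(dg, dg),              # short BB step
        dot(g_prev, Ag) / dot(Ag, Ag),          # step minimizing ||g|| along -g_prev
    )


A = [[3.0, 1.0], [1.0, 2.0]]
b = [1.0, 1.0]
r = compare_steps(A, b, [0.0, 0.0], alpha_prev=0.1)
print(r, math.isclose(r[0], r[1]), math.isclose(r[2], r[3]))
```

The first two entries agree (2/7 for this data), as do the last two (0.28). Changing `alpha_prev` leaves them unchanged, because on a quadratic both BB quotients depend only on the direction of the previous step.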
Barzilai and Borwein proved that their method converges R-superlinearly for quadratic minimization in two dimensions. Raydan proved convergence for strictly convex quadratic problems in any number of dimensions. Convergence is usually non-monotone: neither the objective function nor the residual or gradient magnitude necessarily decreases at each iteration, even when the iteration converges to the solution.
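The non-monotone behaviour is easy to observe. The sketch below runs the long BB step on the hypothetical quadratic $f(x) = \tfrac12(x_1^2 + 10x_2^2)$ from $x_0 = (1, 1)$ with an arbitrary first step of 0.05, records $\|g_k\|$ at every iterate, and lists the iterations where it went up. On this example the gradient norm rises at least once (roughly fivefold between the fourth and fifth iterates) before the iteration converges. The helper name `bb_gradient_norms` is illustrative.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def bb_gradient_norms(grad, x0, alpha0, tol=1e-10, max_iter=200):
    """Run long-step BB and return ||g_k|| for every iterate visited."""
    x_prev, g_prev = list(x0), grad(x0)
    x = [xi - alpha0 * gi for xi, gi in zip(x_prev, g_prev)]
    g = grad(x)
    norms = [math.sqrt(dot(g_prev, g_prev)), math.sqrt(dot(g, g))]
    for _ in range(max_iter):
        if norms[-1] < tol:
            break
        dx = [a - b for a, b in zip(x, x_prev)]
        dg = [a - b for a, b in zip(g, g_prev)]
        alpha = dot(dx, dx) / dot(dx, dg)
        x_prev, g_prev = x, g
        x = [xi - alpha * gi for xi, gi in zip(x, g)]
        g = grad(x)
        norms.append(math.sqrt(dot(g, g)))
    return norms


# f(x) = 0.5 * (x1^2 + 10 x2^2), so g(x) = (x1, 10 x2); the minimizer is the origin.
def grad(x):
    return [x[0], 10.0 * x[1]]


norms = bb_gradient_norms(grad, [1.0, 1.0], alpha0=0.05)
increases = [k for k in range(1, len(norms)) if norms[k] > norms[k - 1]]
print(len(norms), increases)
```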
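If $f$ is a quadratic function with Hessian $A$ (so that $\Delta g = A\,\Delta x$), then $1/\alpha^{\mathrm{LONG}}$ is the Rayleigh quotient of $A$ by the vector $\Delta x$, and $1/\alpha^{\mathrm{SHORT}}$ is the Rayleigh quotient of $A$ by the vector $\sqrt{A}\,\Delta x$, here taking $\sqrt{A}$ as the symmetric positive-semidefinite square root of $A$, which satisfies $(\sqrt{A})^T\sqrt{A} = A$.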
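Both identities can be checked numerically. A sketch, assuming a hypothetical symmetric positive-definite $2\times 2$ Hessian and an arbitrary $\Delta x$; for the symmetric square root it uses the Cayley-Hamilton closed form $\sqrt{A} = (A + sI)/t$ with $s = \sqrt{\det A}$ and $t = \sqrt{\operatorname{tr} A + 2s}$, which is valid only for $2\times 2$ SPD matrices. The helper names are illustrative.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(M, v):
    return [dot(row, v) for row in M]


def rayleigh(M, v):
    """Rayleigh quotient v^T M v / v^T v."""
    return dot(v, matvec(M, v)) / dot(v, v)


def sqrtm_spd_2x2(A):
    """Symmetric square root of a 2x2 symmetric positive-definite matrix.

    By Cayley-Hamilton, (A + sI)^2 = (trace A + 2s) A with s = sqrt(det A).
    """
    s = math.sqrt(A[0][0] * A[1][1] - A[0][1] * A[1][0])
    t = math.sqrt(A[0][0] + A[1][1] + 2.0 * s)
    return [[(A[0][0] + s) / t, A[0][1] / t],
            [A[1][0] / t, (A[1][1] + s) / t]]


A = [[3.0, 1.0], [1.0, 2.0]]
dx = [0.5, -1.0]
dg = matvec(A, dx)                       # quadratic objective: Delta g = A Delta x
alpha_long = dot(dx, dx) / dot(dx, dg)
alpha_short = dot(dx, dg) / dot(dg, dg)
root = sqrtm_spd_2x2(A)
print(1.0 / alpha_long, rayleigh(A, dx))
print(1.0 / alpha_short, rayleigh(A, matvec(root, dx)))
```

Each printed pair should agree up to rounding.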
Fletcher compared its computational performance to conjugate gradient (CG) methods, finding that CG tends to be faster for linear problems, whereas BB is often faster than applicable CG-based methods for non-linear problems.
BB has low storage requirements, which makes it suitable for large systems with millions of elements in $x$.
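The ratio of the two step sizes is $\dfrac{\alpha^{\mathrm{SHORT}}}{\alpha^{\mathrm{LONG}}} = \cos^2(\text{angle between } \Delta x \text{ and } \Delta g)$, so the short step never exceeds the long step.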
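A quick numerical check of the cosine-squared identity, using arbitrary hypothetical vectors for $\Delta x$ and $\Delta g$ and an illustrative helper name:

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def step_ratio_and_cos2(dx, dg):
    """Return (alpha_short / alpha_long, cos^2 of the angle between dx and dg)."""
    alpha_long = dot(dx, dx) / dot(dx, dg)
    alpha_short = dot(dx, dg) / dot(dg, dg)
    angle = math.acos(dot(dx, dg) / math.sqrt(dot(dx, dx) * dot(dg, dg)))
    return alpha_short / alpha_long, math.cos(angle) ** 2


ratio, cos2 = step_ratio_and_cos2([1.0, 2.0, -1.0], [2.0, 1.0, 0.5])
print(ratio, cos2)
```

For these vectors both values equal $(\Delta x\cdot\Delta g)^2 / (\|\Delta x\|^2\|\Delta g\|^2) = 12.25/31.5 = 7/18$, up to rounding.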
Following Raydan's demonstration,[6] BB is often applied with the non-monotone safeguarding strategy of Grippo, Lampariello, and Lucidi.[7] This tolerates some rise of the objective, but an excessive rise triggers a backtracking line search with smaller step sizes, which ensures global convergence. Fletcher finds that allowing wider limits for non-monotonicity tends to result in more efficient convergence.
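A simplified sketch of long-step BB with a Grippo-Lampariello-Lucidi style non-monotone test: a trial step is accepted if the objective does not exceed the largest of the last `memory` objective values minus a small sufficient-decrease term; otherwise the step is halved. This is an illustration under assumptions, not Raydan's published algorithm. Plain halving, the parameter values (`memory`, `gamma`, the step bounds, the fallback step of 1.0, the cap of 60 halvings) and the separable test function are hypothetical choices.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def bb_gll(f, grad, x0, memory=10, gamma=1e-4, alpha_min=1e-10,
           alpha_max=1e10, tol=1e-6, max_iter=1000):
    """Long-step BB safeguarded by a non-monotone (GLL-style) backtracking test."""
    x = list(x0)
    fx, g = f(x), grad(x)
    recent = [fx]            # objective values of the last `memory` iterates
    alpha = 1.0              # hypothetical first trial step
    for _ in range(max_iter):
        gg = dot(g, g)
        if math.sqrt(gg) < tol:
            break
        f_ref = max(recent)  # reference is the worst recent value, not f(x_k)
        t = 1.0
        for _ in range(60):  # shrink the step only while the rise is excessive
            x_new = [xi - t * alpha * gi for xi, gi in zip(x, g)]
            f_new = f(x_new)
            if f_new <= f_ref - gamma * t * alpha * gg:
                break
            t *= 0.5
        g_new = grad(x_new)
        dx = [a - b for a, b in zip(x_new, x)]
        dg = [a - b for a, b in zip(g_new, g)]
        sy = dot(dx, dg)
        # Next long BB step, clipped to [alpha_min, alpha_max]; fall back to 1.0
        # when the measured curvature dx.dg is not positive.
        alpha = min(alpha_max, max(alpha_min, dot(dx, dx) / sy)) if sy > 0 else 1.0
        x, fx, g = x_new, f_new, g_new
        recent = (recent + [fx])[-memory:]
    return x


# Hypothetical strictly convex test function with minimizer x_i = ln 2.
def f(x):
    return sum(math.exp(xi) - 2.0 * xi for xi in x)


def grad(x):
    return [math.exp(xi) - 2.0 for xi in x]


print(bb_gll(f, grad, [1.0, -1.0]))
```

Comparing against the maximum of recent values rather than the current value is what lets an occasional BB step raise the objective without being rejected.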
Others[8][9][10][11] have identified a step size equal to the geometric mean of the long and short BB step sizes, which exhibits similar properties.
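Since $\alpha^{\mathrm{LONG}}\alpha^{\mathrm{SHORT}} = \|\Delta x\|^2/\|\Delta g\|^2$, the geometric-mean step is simply $\|\Delta x\|/\|\Delta g\|$, and it lies between the short and long steps. A small sketch with hypothetical vectors and an illustrative helper name:

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def bb_step_sizes(dx, dg):
    """Return (short, geometric-mean, long) step sizes for given dx and dg."""
    sxx, sxg, sgg = dot(dx, dx), dot(dx, dg), dot(dg, dg)
    long_step = sxx / sxg
    short_step = sxg / sgg
    geo_step = math.sqrt(long_step * short_step)   # equals ||dx|| / ||dg||
    return short_step, geo_step, long_step


dx = [1.0, 2.0, -1.0]
dg = [2.0, 1.0, 0.5]
short_step, geo_step, long_step = bb_step_sizes(dx, dg)
print(short_step, geo_step, long_step)
print(math.sqrt(dot(dx, dx)) / math.sqrt(dot(dg, dg)))   # same as geo_step
```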