In numerical optimization, the Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm is an iterative method for solving unconstrained nonlinear optimization problems. Like the related Davidon–Fletcher–Powell method, BFGS determines the descent direction by preconditioning the gradient with curvature information. It does so by gradually improving an approximation to the Hessian matrix of the loss function, obtained only from gradient evaluations (or approximate gradient evaluations) via a generalized secant method.
Since the updates of the BFGS curvature matrix do not require matrix inversion, its computational complexity is only $\mathcal{O}(n^2)$, compared with the $\mathcal{O}(n^3)$ cost of a Newton step.
The algorithm is named after Charles George Broyden, Roger Fletcher, Donald Goldfarb and David Shanno.
The optimization problem is to minimize $f(\mathbf{x})$, where $\mathbf{x}$ is a vector in $\mathbb{R}^n$ and $f$ is a differentiable scalar function. There are no constraints on the values that $\mathbf{x}$ can take.
The algorithm begins at an initial estimate $\mathbf{x}_0$ for the optimal value and proceeds iteratively to obtain a better estimate at each stage.
The search direction $\mathbf{p}_k$ at stage $k$ is given by the solution of the analogue of the Newton equation:
$$B_k \mathbf{p}_k = -\nabla f(\mathbf{x}_k),$$
where $B_k$ is an approximation to the Hessian matrix at $\mathbf{x}_k$, updated iteratively at each stage, and $\nabla f(\mathbf{x}_k)$ is the gradient of the function evaluated at $\mathbf{x}_k$. A line search in the direction $\mathbf{p}_k$ is then used to find the next point $\mathbf{x}_{k+1}$ by minimizing $f(\mathbf{x}_k + \gamma \mathbf{p}_k)$ over the scalar $\gamma > 0$.
The quasi-Newton condition imposed on the update of $B_k$ is
$$B_{k+1}(\mathbf{x}_{k+1} - \mathbf{x}_k) = \nabla f(\mathbf{x}_{k+1}) - \nabla f(\mathbf{x}_k).$$
Let $\mathbf{y}_k = \nabla f(\mathbf{x}_{k+1}) - \nabla f(\mathbf{x}_k)$ and $\mathbf{s}_k = \mathbf{x}_{k+1} - \mathbf{x}_k$; then $B_{k+1}$ satisfies
$$B_{k+1}\mathbf{s}_k = \mathbf{y}_k,$$
which is the secant equation.
The curvature condition $\mathbf{s}_k^\top \mathbf{y}_k > 0$ must hold for $B_{k+1}$ to remain positive definite; it can be verified by pre-multiplying the secant equation by $\mathbf{s}_k^\top$.
Instead of requiring the full Hessian matrix at the point $\mathbf{x}_{k+1}$ to be computed as $B_{k+1}$, the approximate Hessian at stage $k$ is updated by the addition of two matrices:
$$B_{k+1} = B_k + U_k + V_k.$$
Both $U_k$ and $V_k$ are symmetric rank-one matrices, so their sum is a rank-two update. In order to maintain the symmetry and positive definiteness of $B_{k+1}$, the update form can be chosen as
$$B_{k+1} = B_k + \alpha \mathbf{u}\mathbf{u}^\top + \beta \mathbf{v}\mathbf{v}^\top.$$
Imposing the secant condition $B_{k+1}\mathbf{s}_k = \mathbf{y}_k$ and choosing $\mathbf{u} = \mathbf{y}_k$ and $\mathbf{v} = B_k\mathbf{s}_k$, we obtain
$$\alpha = \frac{1}{\mathbf{y}_k^\top \mathbf{s}_k}, \qquad \beta = -\frac{1}{\mathbf{s}_k^\top B_k \mathbf{s}_k}.$$
Finally, we substitute $\alpha$ and $\beta$ into $B_{k+1} = B_k + \alpha \mathbf{u}\mathbf{u}^\top + \beta \mathbf{v}\mathbf{v}^\top$ and obtain the update equation for $B_{k+1}$:
$$B_{k+1} = B_k + \frac{\mathbf{y}_k \mathbf{y}_k^\top}{\mathbf{y}_k^\top \mathbf{s}_k} - \frac{B_k \mathbf{s}_k \mathbf{s}_k^\top B_k^\top}{\mathbf{s}_k^\top B_k \mathbf{s}_k}.$$
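As a concrete sketch of this rank-two update (the function name bfgs_update_B is made up for this illustration, not taken from any library), the formula translates directly into a few lines of NumPy:

```python
import numpy as np

def bfgs_update_B(B, s, y):
    """One BFGS rank-two update of the Hessian approximation B.

    B : (n, n) symmetric positive-definite Hessian approximation
    s : (n,) step s_k = x_{k+1} - x_k
    y : (n,) gradient difference y_k = grad f(x_{k+1}) - grad f(x_k)
    """
    Bs = B @ s
    return B + np.outer(y, y) / (y @ s) - np.outer(Bs, Bs) / (s @ Bs)
```

One can check numerically that the returned matrix multiplied by $\mathbf{s}_k$ recovers $\mathbf{y}_k$, i.e. that the secant equation holds.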
Consider the following unconstrained optimization problem:
$$\underset{\mathbf{x}\in\mathbb{R}^n}{\text{minimize}} \quad f(\mathbf{x}),$$
where $f:\mathbb{R}^n \to \mathbb{R}$ is a nonlinear objective function.
From an initial guess $\mathbf{x}_0 \in \mathbb{R}^n$ and an initial guess of the Hessian matrix $B_0 \in \mathbb{R}^{n \times n}$, the following steps are repeated as $\mathbf{x}_k$ converges to the solution:
1. Obtain a direction $\mathbf{p}_k$ by solving $B_k \mathbf{p}_k = -\nabla f(\mathbf{x}_k)$.
2. Perform a one-dimensional optimization (line search) to find an acceptable step size $\alpha_k$ in the direction found in the first step. If an exact line search is performed, then $\alpha_k = \arg\min_{\alpha} f(\mathbf{x}_k + \alpha \mathbf{p}_k)$. In practice, an inexact line search usually suffices, with an $\alpha_k$ satisfying the Wolfe conditions.
3. Set $\mathbf{s}_k = \alpha_k \mathbf{p}_k$ and update $\mathbf{x}_{k+1} = \mathbf{x}_k + \mathbf{s}_k$.
4. Set $\mathbf{y}_k = \nabla f(\mathbf{x}_{k+1}) - \nabla f(\mathbf{x}_k)$.
5. Update $B_{k+1} = B_k + \dfrac{\mathbf{y}_k \mathbf{y}_k^\top}{\mathbf{y}_k^\top \mathbf{s}_k} - \dfrac{B_k \mathbf{s}_k \mathbf{s}_k^\top B_k^\top}{\mathbf{s}_k^\top B_k \mathbf{s}_k}$.
Convergence can be determined by observing the norm of the gradient: given some $\epsilon > 0$, the iteration may be stopped once $\|\nabla f(\mathbf{x}_k)\| \leq \epsilon$.
If $B_0$ is initialized with $B_0 = I$, the first step is equivalent to a gradient descent step, but further steps are progressively refined by $B_k$, the approximation to the Hessian.
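A minimal sketch of the iteration above, assuming NumPy and using a simple backtracking (Armijo) line search as a stand-in for a Wolfe line search; all names are illustrative:

```python
import numpy as np

def bfgs_minimize(f, grad, x0, tol=1e-6, max_iter=200):
    """Sketch of BFGS with an explicit Hessian approximation B_k."""
    x = np.asarray(x0, dtype=float)
    B = np.eye(x.size)                          # B_0 = I: the first step is gradient descent
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) <= tol:            # stop when ||grad f(x_k)|| <= epsilon
            break
        p = np.linalg.solve(B, -g)              # step 1: B_k p_k = -grad f(x_k)
        alpha = 1.0                             # step 2: backtracking stand-in for a Wolfe search
        while f(x + alpha * p) > f(x) + 1e-4 * alpha * (g @ p):
            alpha *= 0.5
        s = alpha * p                           # step 3
        x_new = x + s
        g_new = grad(x_new)
        y = g_new - g                           # step 4
        if y @ s > 1e-12:                       # only update when the curvature condition holds
            Bs = B @ s
            B = B + np.outer(y, y) / (y @ s) - np.outer(Bs, Bs) / (s @ Bs)  # step 5
        x, g = x_new, g_new
    return x
```

For example, bfgs_minimize(lambda x: x @ x, lambda x: 2 * x, np.ones(3)) converges to the origin in a handful of iterations.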
The first step of the algorithm is carried out using the inverse of the matrix $B_k$; the corresponding update of the inverse follows from the Sherman–Morrison formula applied to step 5 above:
$$B_{k+1}^{-1} = \left(I - \frac{\mathbf{s}_k \mathbf{y}_k^\top}{\mathbf{y}_k^\top \mathbf{s}_k}\right) B_k^{-1} \left(I - \frac{\mathbf{y}_k \mathbf{s}_k^\top}{\mathbf{y}_k^\top \mathbf{s}_k}\right) + \frac{\mathbf{s}_k \mathbf{s}_k^\top}{\mathbf{y}_k^\top \mathbf{s}_k}.$$
This can be computed efficiently without temporary matrices, recognizing that $B_k^{-1}$ is symmetric and that $\mathbf{y}_k^\top B_k^{-1} \mathbf{y}_k$ and $\mathbf{s}_k^\top \mathbf{y}_k$ are scalars, using an expansion such as
$$B_{k+1}^{-1} = B_k^{-1} + \frac{(\mathbf{s}_k^\top \mathbf{y}_k + \mathbf{y}_k^\top B_k^{-1} \mathbf{y}_k)\,\mathbf{s}_k \mathbf{s}_k^\top}{(\mathbf{s}_k^\top \mathbf{y}_k)^2} - \frac{B_k^{-1} \mathbf{y}_k \mathbf{s}_k^\top + \mathbf{s}_k \mathbf{y}_k^\top B_k^{-1}}{\mathbf{s}_k^\top \mathbf{y}_k}.$$
Therefore, in order to avoid any matrix inversion, the inverse of the Hessian can be approximated instead of the Hessian itself: $H_k \overset{\operatorname{def}}{=} B_k^{-1}$.
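In this form, one BFGS update of $H_k$ needs only outer products and two scalar contractions, as in the following sketch (assuming NumPy; the function name is made up for this illustration):

```python
import numpy as np

def bfgs_update_Hinv(H, s, y):
    """One BFGS update of the inverse Hessian approximation H_k = B_k^{-1},
    using only outer products and the scalars s^T y and y^T H y."""
    sy = s @ y                      # scalar s_k^T y_k
    Hy = H @ y                      # vector H_k y_k
    yHy = y @ Hy                    # scalar y_k^T H_k y_k
    return (H
            + ((sy + yHy) / sy**2) * np.outer(s, s)
            - (np.outer(Hy, s) + np.outer(s, Hy)) / sy)
```

Because the work is dominated by a few $O(n^2)$ outer products, this is the form usually implemented in practice.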
From an initial guess $\mathbf{x}_0$ and an approximate inverse Hessian matrix $H_0$, the following steps are repeated as $\mathbf{x}_k$ converges to the solution:
1. Obtain a direction $\mathbf{p}_k$ by computing $\mathbf{p}_k = -H_k \nabla f(\mathbf{x}_k)$.
2. Perform a one-dimensional optimization (line search) to find an acceptable step size $\alpha_k$ in the direction found in the first step. If an exact line search is performed, then $\alpha_k = \arg\min_{\alpha} f(\mathbf{x}_k + \alpha \mathbf{p}_k)$. In practice, an inexact line search usually suffices, with an $\alpha_k$ satisfying the Wolfe conditions.
3. Set $\mathbf{s}_k = \alpha_k \mathbf{p}_k$ and update $\mathbf{x}_{k+1} = \mathbf{x}_k + \mathbf{s}_k$.
4. Set $\mathbf{y}_k = \nabla f(\mathbf{x}_{k+1}) - \nabla f(\mathbf{x}_k)$.
5. Update $H_{k+1} = H_k + \dfrac{(\mathbf{s}_k^\top \mathbf{y}_k + \mathbf{y}_k^\top H_k \mathbf{y}_k)\,\mathbf{s}_k \mathbf{s}_k^\top}{(\mathbf{s}_k^\top \mathbf{y}_k)^2} - \dfrac{H_k \mathbf{y}_k \mathbf{s}_k^\top + \mathbf{s}_k \mathbf{y}_k^\top H_k}{\mathbf{s}_k^\top \mathbf{y}_k}$.
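The loop sketched earlier for the $B_k$ form carries over almost unchanged (again assuming NumPy, with illustrative names): the linear solve in step 1 becomes a matrix-vector product, and step 5 uses the inverse update.

```python
import numpy as np

def bfgs_minimize_inv(f, grad, x0, tol=1e-6, max_iter=200):
    """Sketch of BFGS maintaining H_k = B_k^{-1}; no linear solves are required."""
    x = np.asarray(x0, dtype=float)
    H = np.eye(x.size)
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) <= tol:
            break
        p = -H @ g                              # step 1: p_k = -H_k grad f(x_k)
        alpha = 1.0                             # step 2: backtracking stand-in for a Wolfe search
        while f(x + alpha * p) > f(x) + 1e-4 * alpha * (g @ p):
            alpha *= 0.5
        s = alpha * p                           # step 3
        x_new = x + s
        g_new = grad(x_new)
        y = g_new - g                           # step 4
        sy = s @ y
        if sy > 1e-12:                          # skip the update if the curvature is too small
            Hy = H @ y
            H = (H + ((sy + y @ Hy) / sy**2) * np.outer(s, s)
                 - (np.outer(Hy, s) + np.outer(s, Hy)) / sy)   # step 5
        x, g = x_new, g_new
    return x
```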
In statistical estimation problems (such as maximum likelihood or Bayesian inference), credible intervals or confidence intervals for the solution can be estimated from the inverse of the final Hessian matrix. However, these quantities are technically defined by the true Hessian matrix, and the BFGS approximation may not converge to the true Hessian matrix.[1]
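As an illustration of this use (SciPy is assumed here purely for the example; it is not prescribed by the text), the BFGS implementation in scipy.optimize.minimize returns its final inverse-Hessian approximation as result.hess_inv, whose diagonal is sometimes used as a rough variance estimate in maximum-likelihood fits, subject to the caveat above:

```python
import numpy as np
from scipy.optimize import minimize

# Toy maximum-likelihood problem: estimate the mean of N(mu, 1) data.
rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=500)

def nll(theta):
    # Negative log-likelihood up to an additive constant (sigma fixed at 1).
    return 0.5 * np.sum((data - theta[0]) ** 2)

result = minimize(nll, x0=np.array([0.0]), method="BFGS")
approx_std = np.sqrt(result.hess_inv[0, 0])   # BFGS approximation, not the exact inverse Hessian
print(result.x, approx_std)                   # mean estimate near 2, std. error near 1/sqrt(500)
```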
The BFGS update formula heavily relies on the curvature $\mathbf{s}_k^\top \mathbf{y}_k$ being strictly positive and bounded away from zero. When the objective is nonconvex, or when the line search does not enforce the Wolfe conditions, the curvature can become negative or nearly zero, and the standard update can then fail to preserve positive definiteness. In such cases, one of the so-called damped BFGS updates can be used, which modify $\mathbf{s}_k$ and/or $\mathbf{y}_k$ in order to obtain a more robust update.
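The text does not fix a particular damping rule; one common choice is Powell's damping, sketched below as an assumption, which replaces $\mathbf{y}_k$ by a convex combination $\mathbf{r}_k$ of $\mathbf{y}_k$ and $B_k \mathbf{s}_k$ whenever the measured curvature is too small:

```python
import numpy as np

def damped_bfgs_update_B(B, s, y):
    """Powell-damped BFGS update of B (one common damping rule, shown for
    illustration): y_k is replaced by r_k, a convex combination of y_k and
    B_k s_k, so that the effective curvature s_k^T r_k stays positive."""
    Bs = B @ s
    sBs = s @ Bs
    sy = s @ y
    if sy >= 0.2 * sBs:
        theta = 1.0                          # curvature is adequate: ordinary BFGS update
    else:
        theta = 0.8 * sBs / (sBs - sy)
    r = theta * y + (1.0 - theta) * Bs       # guarantees s^T r = 0.2 * s^T B s when damped
    return B + np.outer(r, r) / (s @ r) - np.outer(Bs, Bs) / sBs
```

With this choice the effective curvature $\mathbf{s}_k^\top \mathbf{r}_k$ is at least $0.2\,\mathbf{s}_k^\top B_k \mathbf{s}_k > 0$, so the updated matrix remains positive definite.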
Notable open source implementations are:
- GNU Octave uses a form of BFGS in its fsolve function, with trust region extensions.

Notable proprietary implementations include: