The compact representation for quasi-Newton methods is a matrix decomposition used in gradient-based optimization algorithms and in methods for solving nonlinear systems. The decomposition expresses the direct and/or inverse Hessian, or the Jacobian of a nonlinear system, in a low-rank form. Because of this structure, the compact representation is well suited to large problems and to constrained optimization.
The compact representation of a quasi-Newton matrix expresses the inverse Hessian approximation $H_k$ or the direct Hessian approximation $B_k$ of an objective function $f(x)\colon \mathbb{R}^n \to \mathbb{R}$ in terms of the $2k$ difference vectors

$$\{\, s_{i-1} = x_i - x_{i-1},\; y_{i-1} = g_i - g_{i-1} \,\}_{i=1}^{k}, \qquad \text{where } \nabla f(x_k) = g_k.$$

With $r = k$ or $r = 2k$, $n \times r$ matrices $U_k, J_k$ and $r \times r$ matrices $M_k, N_k$, all formed from the $s_i, y_i$, the representations are

$$H_k = H_0 + U_k M_k^{-1} U_k^T \quad \text{and} \quad B_k = B_0 + J_k N_k^{-1} J_k^T.$$
Because of this special matrix decomposition the compact representation is implemented in state-of-the-art optimization software.[3] [4] [5] [6] When combined with limited-memory techniques it is a popular approach for constrained optimization with gradients.[7] Linear algebra operations such as matrix-vector products, solves, or eigendecompositions can be done efficiently. The representation can be combined with line-search and trust-region techniques, and it has been developed for many quasi-Newton updates. For instance, the matrix-vector product of the direct quasi-Newton Hessian with an arbitrary vector $g \in \mathbb{R}^n$ is computed in four steps:
$$\begin{align}
p_k^{(0)} &= J_k^T g \\
\text{solve } N_k\, p_k^{(1)} &= p_k^{(0)} \quad (N_k \text{ is small}) \\
p_k^{(2)} &= J_k\, p_k^{(1)} \\
p_k^{(3)} &= B_0\, g \\
B_k g &= p_k^{(3)} + p_k^{(2)}
\end{align}$$
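The four steps above can be sketched in NumPy. This is an illustrative implementation, not code from any particular solver; it assumes $J_k$, $N_k$, and a diagonal $B_0$ have already been formed:

```python
import numpy as np

def compact_matvec(B0_diag, J, N, g):
    """Compute B_k g = B_0 g + J_k N_k^{-1} J_k^T g in O(n r) time.

    B0_diag : diagonal of the initialization B_0, shape (n,)
    J       : the n-by-r factor J_k
    N       : the small r-by-r middle matrix N_k
    g       : the vector to multiply, shape (n,)
    """
    p0 = J.T @ g                 # p^(0) = J_k^T g
    p1 = np.linalg.solve(N, p0)  # solve N_k p^(1) = p^(0); N_k is small
    p2 = J @ p1                  # p^(2) = J_k p^(1)
    p3 = B0_diag * g             # p^(3) = B_0 g for a diagonal B_0
    return p3 + p2               # B_k g = p^(3) + p^(2)
```

Since $N_k$ is only $r \times r$ with $r \ll n$, the small solve is cheap and the overall cost is dominated by the two products with $J_k$.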
In the context of the GMRES method, Walker[8] showed that a product of Householder transformations (an identity plus a rank-1 matrix) can be expressed as a compact matrix formula. This led to the derivation of an explicit matrix expression for the product of $k$ update matrices $V_i$. With

$$Y_k = \begin{bmatrix} y_0 & y_1 & \ldots & y_{k-1} \end{bmatrix}, \quad (R_k)_{ij} = s_{i-1}^T y_{j-1} \ \text{ for } 1 \le i \le j \le k, \quad \rho_{i-1} = 1 / s_{i-1}^T y_{i-1},$$

the product of the $k$ matrices $V_i$ admits a compact form.
A parametric family of quasi-Newton updates includes many of the best known formulas.[9] For arbitrary vectors $v_k$ and $c_k$ such that $v_k^T y_k \ne 0$ and $c_k^T s_k \ne 0$, the family defines an update of $H_k$ (parametrized by $v_k$) and an update of $B_k$ (parametrized by $c_k$).
By making specific choices for the parameter vectors $v_k$ and $c_k$, well known methods are recovered (Table 1):

| $v_k$ | method | $c_k$ | method |
|---|---|---|---|
| $s_k$ | BFGS | $s_k$ | PSB (Powell Symmetric Broyden) |
| $y_k$ | Greenstadt's | $y_k$ | DFP |
| $s_k - H_k y_k$ | SR1 | $y_k - B_k s_k$ | SR1 |
| | | $s_k$ | MSS (Multipoint Symmetric Secant) |
Collecting the updating vectors of the recursive formulas into the matrices $S_k = \begin{bmatrix} s_0 & \ldots & s_{k-1} \end{bmatrix}$ and $Y_k = \begin{bmatrix} y_0 & \ldots & y_{k-1} \end{bmatrix}$, define the upper triangular matrix $R_k$ with $(R_k)_{ij} = s_{i-1}^T y_{j-1}$ for $i \le j$, the strictly lower triangular matrix $L_k$ with $(L_k)_{ij} = s_{i-1}^T y_{j-1}$ for $i > j$, and the diagonal matrix $D_k = \operatorname{diag}\!\left(s_0^T y_0, \ldots, s_{k-1}^T y_{k-1}\right)$, so that $S_k^T Y_k = R_k + L_k$.
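As a quick illustration of these definitions, the three matrices can be extracted from $S_k^T Y_k$ with a few NumPy calls (a sketch; column $i$ of `S` and `Y` holds $s_{i-1}$ and $y_{i-1}$):

```python
import numpy as np

def triangular_parts(S, Y):
    """Return (R_k, L_k, D_k) from the pair matrices S_k and Y_k."""
    SY = S.T @ Y                  # (S_k^T Y_k)_{ij} = s_{i-1}^T y_{j-1}
    R = np.triu(SY)               # upper triangular: entries with i <= j
    L = np.tril(SY, k=-1)         # strictly lower triangular: i > j
    D = np.diag(np.diag(SY))      # diagonal: s_{i-1}^T y_{i-1}
    return R, L, D                # note S_k^T Y_k = R_k + L_k
```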
With these definitions the compact representations of general rank-2 updates (including the well known quasi-Newton updates in Table 1) have been developed in Brust,[11] together with the corresponding formula for the direct Hessian. For instance, when $V_k = S_k$ the general formula reduces to the corresponding representation in Table 1. Prior to the development of these general compact representations, equivalent representations had been discovered for most known updates (see Table 1).
Along with the SR1 representation, the BFGS (Broyden-Fletcher-Goldfarb-Shanno) compact representation was the first compact formula known. In particular, the inverse representation is given by

$$H_k = H_0 + U_k M_k^{-1} U_k^T, \quad U_k = \begin{bmatrix} S_k & H_0 Y_k \end{bmatrix}, \quad M_k^{-1} = \begin{bmatrix} R_k^{-T}\left( D_k + Y_k^T H_0 Y_k \right) R_k^{-1} & -R_k^{-T} \\ -R_k^{-1} & 0 \end{bmatrix}.$$
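This formula can be checked numerically against the classical recursive inverse BFGS update. The sketch below is illustrative, not library code; it assumes $H_0$ symmetric positive definite and $s_i^T y_i > 0$ for every stored pair:

```python
import numpy as np

def bfgs_inverse_recursive(H0, S, Y):
    """k recursive inverse BFGS updates: H <- V^T H V + rho s s^T."""
    H, n = H0.copy(), H0.shape[0]
    for s, y in zip(S.T, Y.T):
        rho = 1.0 / (y @ s)
        V = np.eye(n) - rho * np.outer(y, s)
        H = V.T @ H @ V + rho * np.outer(s, s)
    return H

def bfgs_inverse_compact(H0, S, Y):
    """Compact form H_k = H_0 + U_k M_k^{-1} U_k^T with U_k = [S_k, H_0 Y_k]."""
    k = S.shape[1]
    SY = S.T @ Y
    R = np.triu(SY)                               # upper triangular R_k
    D = np.diag(np.diag(SY))                      # diagonal D_k
    Rinv = np.linalg.inv(R)
    U = np.hstack([S, H0 @ Y])                    # U_k = [S_k, H_0 Y_k]
    Minv = np.block([[Rinv.T @ (D + Y.T @ H0 @ Y) @ Rinv, -Rinv.T],
                     [-Rinv, np.zeros((k, k))]])  # M_k^{-1}
    return H0 + U @ Minv @ U.T
```

Both routines produce the same matrix whenever all curvature products $s_i^T y_i$ are positive, which makes $R_k$ invertible.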
The direct representation is

$$B_k = B_0 + J_k N_k^{-1} J_k^T, \quad J_k = \begin{bmatrix} B_0 S_k & Y_k \end{bmatrix}, \quad N_k = -\begin{bmatrix} S_k^T B_0 S_k & L_k \\ L_k^T & -D_k \end{bmatrix}.$$
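Analogously, the direct form can be compared with the recursive BFGS update of $B_k$ (again only an illustrative sketch under the same positive-curvature assumptions):

```python
import numpy as np

def bfgs_direct_recursive(B0, S, Y):
    """k recursive BFGS updates: B <- B - B s s^T B/(s^T B s) + y y^T/(y^T s)."""
    B = B0.copy()
    for s, y in zip(S.T, Y.T):
        Bs = B @ s
        B = B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)
    return B

def bfgs_direct_compact(B0, S, Y):
    """Compact form B_k = B_0 + J_k N_k^{-1} J_k^T with J_k = [B_0 S_k, Y_k]."""
    SY = S.T @ Y
    L = np.tril(SY, k=-1)                # strictly lower triangular L_k
    D = np.diag(np.diag(SY))             # diagonal D_k
    J = np.hstack([B0 @ S, Y])           # J_k
    N = -np.block([[S.T @ B0 @ S, L],
                   [L.T, -D]])           # N_k (note the leading minus)
    return B0 + J @ np.linalg.solve(N, J.T)
```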
The SR1 (Symmetric Rank-1) compact representation was first proposed together with the BFGS formulas above. Using the definitions of $D_k$, $L_k$ and $R_k$, the inverse Hessian representation is

$$H_k = H_0 + U_k M_k^{-1} U_k^T, \quad U_k = S_k - H_0 Y_k, \quad M_k = R_k + R_k^T - D_k - Y_k^T H_0 Y_k.$$
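A numerical cross-check of the SR1 compact form against the recursive rank-1 update (an illustrative sketch; it assumes all denominators $u_i^T y_i$ and the small matrix $M_k$ are nonsingular):

```python
import numpy as np

def sr1_inverse_recursive(H0, S, Y):
    """k recursive SR1 updates: H <- H + u u^T/(u^T y) with u = s - H y."""
    H = H0.copy()
    for s, y in zip(S.T, Y.T):
        u = s - H @ y
        H = H + np.outer(u, u) / (u @ y)
    return H

def sr1_inverse_compact(H0, S, Y):
    """Compact SR1: H_k = H_0 + U_k M_k^{-1} U_k^T, U_k = S_k - H_0 Y_k."""
    SY = S.T @ Y
    R = np.triu(SY)                       # upper triangular R_k
    D = np.diag(np.diag(SY))              # diagonal D_k
    U = S - H0 @ Y                        # U_k
    M = R + R.T - D - Y.T @ H0 @ Y        # symmetric M_k
    return H0 + U @ np.linalg.solve(M, U.T)
```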
The direct Hessian is obtained by the Sherman-Morrison-Woodbury identity and has the form

$$B_k = B_0 + J_k N_k^{-1} J_k^T, \quad J_k = Y_k - B_0 S_k, \quad N_k = D_k + L_k + L_k^T - S_k^T B_0 S_k.$$
The multipoint symmetric secant (MSS) method aims to satisfy multiple secant equations simultaneously. The recursive update formula was originally developed by Burdakov.[12] The compact representation for the direct Hessian was derived in [13]:

$$B_k = B_0 + J_k N_k^{-1} J_k^T, \quad J_k = \begin{bmatrix} S_k & Y_k - B_0 S_k \end{bmatrix},$$

$$N_k = \begin{bmatrix} W_k \left( S_k^T B_0 S_k - \left( R_k - D_k + R_k^T \right) \right) W_k & W_k \\ W_k & 0 \end{bmatrix}^{-1}, \quad W_k = \left( S_k^T S_k \right)^{-1}.$$

Another equivalent compact representation for the MSS matrix is derived by rewriting $J_k$ in the form $J_k = \begin{bmatrix} S_k & B_0 Y_k \end{bmatrix}$.
Since the DFP (Davidon Fletcher Powell) update is the dual of the BFGS formula (i.e., it is obtained by swapping $H_k \leftrightarrow B_k$, $H_0 \leftrightarrow B_0$ and $y_k \leftrightarrow s_k$), its compact representation follows from the BFGS representation by the same interchanges.
The PSB (Powell-Symmetric-Broyden) compact representation was developed for the direct Hessian approximation.[16] It is equivalent to substituting $C_k = S_k$ in the general representation:

$$B_k = B_0 + J_k N_k^{-1} J_k^T, \quad J_k = \begin{bmatrix} S_k & Y_k - B_0 S_k \end{bmatrix}, \quad N_k = \begin{bmatrix} 0 & R_k^{SS} \\ \left( R_k^{SS} \right)^T & R_k + R_k^T - \left( D_k + S_k^T B_0 S_k \right) \end{bmatrix},$$

where $R_k^{SS}$ denotes the upper triangular matrix with entries $\left( R_k^{SS} \right)_{ij} = s_{i-1}^T s_{j-1}$ for $i \le j$.
For structured optimization problems, in which the objective function can be decomposed into two parts as $f(x) = \widehat{k}(x) + \widehat{u}(x)$, where the derivatives of $\widehat{k}(x)$ are known and those of $\widehat{u}(x)$ are approximated, compact representations of the resulting structured quasi-Newton matrices have been developed, with corresponding factors $J_k$ and $N_k$.
The reduced compact representation (RCR) of BFGS applies to linear equality constrained optimization

$$\text{minimize } f(x) \quad \text{subject to: } Ax = b,$$

where $A$ is the constraint matrix. With the stored pairs $S_k, Y_k$ (the vectors $y_i$ depending on $A$) and the initializations

$$B_0 = \frac{1}{\gamma_k} I, \quad H_0 = \gamma_k I, \quad \gamma_k > 0,$$

the representation applies to the KKT matrix

$$K_k = \begin{bmatrix} B_k & A^T \\ A & 0 \end{bmatrix}.$$
The most common use of the compact representations is in the limited-memory setting, where $m \ll n$ (typical values are $m \in [5, 12]$): only the $m$ most recent pairs $\{(s_i, y_i)\}_{i=k-m}^{k-1}$ are stored, and for the parametric family the vectors $\{v_i\}_{i=k-m}^{k-1}$ and $\{c_i\}_{i=k-m}^{k-1}$. A common choice of initialization is

$$H_k^{(0)} = \gamma_k I, \quad \gamma_k = \frac{y_{k-1}^T s_{k-1}}{y_{k-1}^T y_{k-1}}, \quad B_k^{(0)} = \frac{1}{\gamma_k} I.$$

The storage matrices then have $n$ rows and at most $m$ columns, $S_k \in \mathbb{R}^{n \times l}$ and $Y_k \in \mathbb{R}^{n \times l}$ (and similarly $V_k, C_k$), with

$$S_k = \begin{bmatrix} s_{k-l-1} & \ldots & s_{k-1} \end{bmatrix}, \quad Y_k = \begin{bmatrix} y_{k-l-1} & \ldots & y_{k-1} \end{bmatrix}.$$
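A minimal sketch of the sliding-window storage (a hypothetical helper, not taken from any named library): keep only the $m$ most recent pairs and recompute $\gamma_k$ from the newest one:

```python
import numpy as np
from collections import deque

def lbfgs_window(pair_history, m):
    """Build S_k, Y_k and gamma_k from the m most recent (s, y) pairs."""
    window = deque(pair_history, maxlen=m)       # keeps only the last m pairs
    S = np.column_stack([s for s, _ in window])  # n rows, at most m columns
    Y = np.column_stack([y for _, y in window])
    s_new, y_new = window[-1]
    gamma = (y_new @ s_new) / (y_new @ y_new)    # gamma_k = y^T s / y^T y
    return S, Y, gamma
```

Using a bounded `deque` means the oldest pair is discarded automatically once more than $m$ pairs have been collected.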
Open source implementations include:

* R's `optim` general-purpose optimizer routine, which uses the L-BFGS-B method.

Non open source implementations include:
$$S_{k+1} = \begin{bmatrix} s_0 & \ldots & s_k \end{bmatrix}, \quad P_k = I - S_{k+1} \left( S_{k+1}^T S_{k+1} \right)^{-1} S_{k+1}^T,$$

where $P_k$ is the orthogonal projection onto the complement of the range of $S_{k+1}$.