Compact quasi-Newton representation

The compact representation for quasi-Newton methods is a matrix decomposition that is typically used in gradient-based optimization algorithms or for solving nonlinear systems. The decomposition expresses the direct and/or inverse Hessian, or the Jacobian of a nonlinear system, as a low-rank update of an initial matrix. Because of this structure, the compact representation is widely used for large-scale problems and constrained optimization.

Definition

The compact representation of a quasi-Newton matrix for the inverse Hessian $H_k$ or direct Hessian $B_k$ of a nonlinear objective function $f(x): \mathbb{R}^n \to \mathbb{R}$ expresses a sequence of recursive rank-1 or rank-2 matrix updates as one rank-$k$ or rank-$2k$ update of an initial matrix.[1] [2] Because it is derived from quasi-Newton updates, it uses differences of iterates and gradients $\nabla f(x_k) = g_k$ in its definition, $\{\, s_{i-1} = x_i - x_{i-1},\; y_{i-1} = g_i - g_{i-1} \,\}_{i=1}^{k}$. In particular, for $r = k$ or $r = 2k$ the rectangular $n \times r$ matrices $U_k, J_k$ and the $r \times r$ square symmetric systems $M_k, N_k$ depend on the $s_i, y_i$ pairs and define the quasi-Newton representations

$$H_k = H_0 + U_k M_k^{-1} U_k^T \quad \text{and} \quad B_k = B_0 + J_k N_k^{-1} J_k^T.$$
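
For concreteness, the pairs $s_{i-1}, y_{i-1}$ can be accumulated during any gradient-based iteration. A minimal NumPy sketch (the quadratic objective, step size, and iteration count are hypothetical choices for illustration only):

    import numpy as np

    # Collect quasi-Newton pairs from a short gradient-descent run on a
    # quadratic f(x) = 0.5 x^T A x - b^T x with gradient g(x) = A x - b.
    rng = np.random.default_rng(0)
    n = 5
    A = np.eye(n) + 0.1 * np.ones((n, n))    # a simple symmetric positive definite Hessian
    b = rng.standard_normal(n)

    def grad(x):
        return A @ x - b

    x = np.zeros(n)
    g = grad(x)
    S_cols, Y_cols = [], []
    for _ in range(3):
        x_new = x - 0.1 * g                   # fixed-step gradient descent
        g_new = grad(x_new)
        S_cols.append(x_new - x)              # s_{i-1} = x_i - x_{i-1}
        Y_cols.append(g_new - g)              # y_{i-1} = g_i - g_{i-1}
        x, g = x_new, g_new

    S = np.column_stack(S_cols)               # n x k matrix S_k
    Y = np.column_stack(Y_cols)               # n x k matrix Y_k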

Applications

Because of its special structure the compact representation is implemented in state-of-the-art optimization software.[3] [4] [5] [6] When combined with limited-memory techniques it is a popular method for constrained optimization with gradients.[7] Linear algebra operations, such as matrix-vector products, solves, or eigendecompositions, can be done efficiently. The representation can be combined with line-search and trust-region techniques, and it has been developed for many quasi-Newton updates. For instance, the matrix-vector product of the direct quasi-Newton Hessian with an arbitrary vector $g \in \mathbb{R}^n$ is:
$$\begin{align} p_k^{(0)} &= J_k^T g \\ \text{solve } N_k\, p_k^{(1)} &= p_k^{(0)} \quad (N_k \text{ is small}) \\ p_k^{(2)} &= J_k p_k^{(1)} \\ p_k^{(3)} &= B_0 g \\ p_k &= p_k^{(2)} + p_k^{(3)} \end{align}$$
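
A minimal NumPy sketch of this product (assuming $J_k$, $N_k$, and $B_0$ have already been formed; the function name is illustrative):

    import numpy as np

    # Product B_k @ g with B_k = B0 + J N^{-1} J^T, without ever forming the
    # n x n matrix B_k. J is n x r and N is r x r with r << n, so the only
    # linear system solved is small.
    def compact_matvec(B0, J, N, g):
        p0 = J.T @ g                   # p_k^(0) = J_k^T g
        p1 = np.linalg.solve(N, p0)    # solve N_k p_k^(1) = p_k^(0)  (N_k is small)
        p2 = J @ p1                    # p_k^(2) = J_k p_k^(1)
        p3 = B0 @ g                    # p_k^(3) = B_0 g
        return p2 + p3                 # p_k = p_k^(2) + p_k^(3)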

Background

In the context of the GMRES method, Walker[8] showed that a product of Householder transformations (each an identity plus a rank-1 matrix) can be expressed as a compact matrix formula. This led to the derivation of an explicit matrix expression for the product of $k$ identity-plus-rank-1 matrices. Specifically, for

$$S_k = \begin{bmatrix} s_0 & s_1 & \ldots & s_{k-1} \end{bmatrix}, \quad Y_k = \begin{bmatrix} y_0 & y_1 & \ldots & y_{k-1} \end{bmatrix}, \quad \big(R_k\big)_{ij} = s_{i-1}^T y_{j-1} \text{ for } 1 \le i \le j \le k, \quad \rho_{i-1} = 1/s_{i-1}^T y_{i-1},$$

and $V_i = I - \rho_{i-1} y_{i-1} s_{i-1}^T$, the product of $k$ rank-1 updates to the identity is

$$\prod_{i=1}^{k} V_i = \left(I - \rho_0 y_0 s_0^T \right) \cdots \left(I - \rho_{k-1} y_{k-1} s_{k-1}^T \right) = I - Y_k R_k^{-1} S_k^T.$$

The BFGS update can be expressed in terms of products of the $V_i$'s, which have a compact matrix formula. Therefore, the BFGS recursion can exploit these block matrix representations.
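
The identity is easy to check numerically; a small NumPy sketch with random data (variable names are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 8, 3
    S = rng.standard_normal((n, k))     # columns s_0, ..., s_{k-1}
    Y = rng.standard_normal((n, k))     # columns y_0, ..., y_{k-1}

    # Left-hand side: accumulate V_1 V_2 ... V_k with V_i = I - rho_{i-1} y_{i-1} s_{i-1}^T.
    P = np.eye(n)
    for i in range(k):
        rho = 1.0 / (S[:, i] @ Y[:, i])
        P = P @ (np.eye(n) - rho * np.outer(Y[:, i], S[:, i]))

    # Right-hand side: compact form I - Y_k R_k^{-1} S_k^T, where R_k is the
    # upper triangular part (including the diagonal) of S_k^T Y_k.
    R = np.triu(S.T @ Y)
    Q = np.eye(n) - Y @ np.linalg.solve(R, S.T)

    print(np.allclose(P, Q))            # True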

Recursive quasi-Newton updates

A parametric family of quasi-Newton updates includes many of the best known formulas.[9] For arbitrary vectors $v_k$ and $c_k$ such that $v_k^T y_k \ne 0$ and $c_k^T s_k \ne 0$, general recursive update formulas for the inverse and direct Hessian estimates are

$$H_{k+1} = H_k + \frac{(s_k - H_k y_k) v_k^T + v_k (s_k - H_k y_k)^T}{v_k^T y_k} - \frac{(s_k - H_k y_k)^T y_k}{\big(v_k^T y_k\big)^2} v_k v_k^T,$$

$$B_{k+1} = B_k + \frac{(y_k - B_k s_k) c_k^T + c_k (y_k - B_k s_k)^T}{c_k^T s_k} - \frac{(y_k - B_k s_k)^T s_k}{\big(c_k^T s_k\big)^2} c_k c_k^T.$$

By making specific choices for the parameter vectors $v_k$ and $c_k$, well known methods are recovered (Table 1); a numerical illustration follows the table.
Table 1: Quasi-Newton updates parametrized by vectors $v_k$ and $c_k$

| $v_k$ | method | $c_k$ | method |
|---|---|---|---|
| $s_k$ | BFGS | $s_k$ | PSB (Powell Symmetric Broyden) |
| $y_k$ | Greenstadt's | $y_k$ | DFP |
| $s_k - H_k y_k$ | SR1 | $y_k - B_k s_k$ | SR1 |
| | | $P^S_k s_k$ [10] | MSS (Multipoint-Symmetric-Secant) |
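
As a numerical illustration of the parametric family, the following sketch (using the general direct update written above; all data random) checks that the choice $c_k = y_k - B_k s_k$ from Table 1 collapses the rank-2 update to the rank-1 SR1 formula:

    import numpy as np

    # One step of the parametric rank-2 update of the direct Hessian estimate.
    def general_direct_update(B, s, y, c):
        r = y - B @ s                        # residual of the secant condition
        cs = c @ s                           # c_k^T s_k (must be nonzero)
        return (B + (np.outer(r, c) + np.outer(c, r)) / cs
                - (r @ s) / cs**2 * np.outer(c, c))

    rng = np.random.default_rng(1)
    n = 6
    B = np.eye(n)
    s, y = rng.standard_normal(n), rng.standard_normal(n)

    c = y - B @ s                            # SR1 choice of c_k from Table 1
    B_general = general_direct_update(B, s, y, c)
    B_sr1 = B + np.outer(c, c) / (c @ s)     # classical rank-1 SR1 update
    print(np.allclose(B_general, B_sr1))     # True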

Compact Representations

Collecting the updating vectors of the recursive formulas into matrices, define

$$S_k = \begin{bmatrix} s_0 & s_1 & \ldots & s_{k-1} \end{bmatrix}, \quad Y_k = \begin{bmatrix} y_0 & y_1 & \ldots & y_{k-1} \end{bmatrix}, \quad V_k = \begin{bmatrix} v_0 & v_1 & \ldots & v_{k-1} \end{bmatrix}, \quad C_k = \begin{bmatrix} c_0 & c_1 & \ldots & c_{k-1} \end{bmatrix},$$

the upper triangular matrices

$$\big(R_k\big)_{ij} := \big(R^{SY}_k\big)_{ij} = s_{i-1}^T y_{j-1}, \quad \big(R^{VY}_k\big)_{ij} = v_{i-1}^T y_{j-1}, \quad \big(R^{CS}_k\big)_{ij} = c_{i-1}^T s_{j-1}, \quad \text{for } 1 \le i \le j \le k,$$

the lower triangular matrices

$$\big(L_k\big)_{ij} := \big(L^{SY}_k\big)_{ij} = s_{i-1}^T y_{j-1}, \quad \big(L^{VY}_k\big)_{ij} = v_{i-1}^T y_{j-1}, \quad \big(L^{CS}_k\big)_{ij} = c_{i-1}^T s_{j-1}, \quad \text{for } 1 \le j < i \le k,$$

and the diagonal matrix

$$\big(D_k\big)_{ii} := \big(D^{SY}_k\big)_{ii} = s_{i-1}^T y_{i-1}, \quad \text{for } 1 \le i \le k.$$

With these definitions, compact representations of the general rank-2 updates (including the well known quasi-Newton updates in Table 1) have been developed in Brust:[11]

$$U_k = \begin{bmatrix} V_k & S_k - H_0 Y_k \end{bmatrix}, \quad M_k = \begin{bmatrix} 0_{k \times k} & R^{VY}_k \\ \big(R^{VY}_k\big)^T & R_k + R_k^T - \big(D_k + Y_k^T H_0 Y_k\big) \end{bmatrix},$$

and the formula for the direct Hessian is

$$J_k = \begin{bmatrix} C_k & Y_k - B_0 S_k \end{bmatrix}, \quad N_k = \begin{bmatrix} 0_{k \times k} & R^{CS}_k \\ \big(R^{CS}_k\big)^T & R_k + R_k^T - \big(D_k + S_k^T B_0 S_k\big) \end{bmatrix}.$$

For instance, when $V_k = S_k$ this representation reduces to the compact formula for the BFGS recursion.
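
The equivalence can be verified numerically. A short NumPy sketch comparing $k$ recursive BFGS updates of the inverse Hessian against the compact form with $V_k = S_k$ (random data; the algebraic identity holds whenever all updates are well defined):

    import numpy as np

    rng = np.random.default_rng(2)
    n, k = 7, 4
    H0 = np.eye(n)
    S = rng.standard_normal((n, k))
    Y = rng.standard_normal((n, k))

    # k recursive BFGS updates of the inverse Hessian approximation
    H = H0.copy()
    for i in range(k):
        s, y = S[:, i], Y[:, i]
        rho = 1.0 / (s @ y)
        V = np.eye(n) - rho * np.outer(y, s)
        H = V.T @ H @ V + rho * np.outer(s, s)

    # Compact form with V_k = S_k: H_k = H_0 + U_k M_k^{-1} U_k^T
    SY = S.T @ Y
    R = np.triu(SY)                        # (R_k)_{ij} = s_{i-1}^T y_{j-1}, i <= j
    D = np.diag(np.diag(SY))
    U = np.hstack([S, S - H0 @ Y])
    M = np.block([[np.zeros((k, k)), R],
                  [R.T, R + R.T - (D + Y.T @ H0 @ Y)]])
    H_compact = H0 + U @ np.linalg.solve(M, U.T)
    print(np.allclose(H, H_compact))       # True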

Specific Representations

Prior to the development of the general compact representations above, equivalent representations had been discovered for most known updates (see Table 1).

BFGS

Along with the SR1 representation, the BFGS (Broyden-Fletcher-Goldfarb-Shanno) compact representation was the first compact formula known. In particular, the inverse representation is given by

$$H_k = H_0 + U_k M_k^{-1} U_k^T, \quad U_k = \begin{bmatrix} S_k & H_0 Y_k \end{bmatrix}, \quad M_k^{-1} = \begin{bmatrix} R_k^{-T}\big(D_k + Y_k^T H_0 Y_k\big) R_k^{-1} & -R_k^{-T} \\ -R_k^{-1} & 0 \end{bmatrix}.$$

The direct Hessian approximation can be found by applying the Sherman-Morrison-Woodbury identity to the inverse Hessian:

$$B_k = B_0 + J_k N_k^{-1} J_k^T, \quad J_k = \begin{bmatrix} B_0 S_k & Y_k \end{bmatrix}, \quad N_k = -\begin{bmatrix} S_k^T B_0 S_k & L_k \\ L_k^T & -D_k \end{bmatrix}.$$

SR1

The SR1 (Symmetric Rank-1) compact representation was first proposed in Byrd, Nocedal and Schnabel.[7] Using the definitions of $D_k, L_k$ and $R_k$ from above, the inverse Hessian formula is given by

$$H_k = H_0 + U_k M_k^{-1} U_k^T, \quad U_k = S_k - H_0 Y_k, \quad M_k = R_k + R_k^T - \big(D_k + Y_k^T H_0 Y_k\big).$$

The direct Hessian is obtained by the Sherman-Morrison-Woodbury identity and has the form

$$B_k = B_0 + J_k N_k^{-1} J_k^T, \quad J_k = Y_k - B_0 S_k, \quad N_k = D_k + L_k + L_k^T - S_k^T B_0 S_k.$$
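
A corresponding numerical check for the SR1 formulas, comparing $k$ recursive rank-1 updates of the direct Hessian against the compact form (random data; the identity holds whenever all updates are well defined):

    import numpy as np

    rng = np.random.default_rng(3)
    n, k = 7, 3
    B0 = np.eye(n)
    S = rng.standard_normal((n, k))
    Y = rng.standard_normal((n, k))

    # k recursive SR1 updates of the direct Hessian approximation
    B = B0.copy()
    for i in range(k):
        s, y = S[:, i], Y[:, i]
        r = y - B @ s
        B = B + np.outer(r, r) / (r @ s)

    # Compact form: B_k = B_0 + J_k N_k^{-1} J_k^T
    SY = S.T @ Y
    D = np.diag(np.diag(SY))
    L = np.tril(SY, -1)                    # strictly lower triangular part L_k
    J = Y - B0 @ S
    N = D + L + L.T - S.T @ B0 @ S
    B_compact = B0 + J @ np.linalg.solve(N, J.T)
    print(np.allclose(B, B_compact))       # True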

MSS

The multipoint symmetric secant (MSS) method is a method that aims to satisfy multiple secant equations. The recursive update formula was originally developed by Burdakov.[12] The compact representation for the direct Hessian was derived in [13]

$$B_k = B_0 + J_k N_k^{-1} J_k^T, \quad J_k = \begin{bmatrix} S_k & Y_k - B_0 S_k \end{bmatrix}, \quad N_k = \begin{bmatrix} W_k^T \big( S_k^T B_0 S_k - \big(R_k - D_k + R_k^T\big) \big) W_k & W_k \\ W_k & 0 \end{bmatrix}^{-1}, \quad W_k = \big( S_k^T S_k \big)^{-1}.$$

Another equivalent compact representation for the MSS matrix is derived by rewriting $J_k$ in the form $J_k = \begin{bmatrix} S_k & B_0 Y_k \end{bmatrix}$.[14] The inverse representation can be obtained by application of the Sherman-Morrison-Woodbury identity.

DFP

Since the DFP (Davidon-Fletcher-Powell) update is the dual of the BFGS formula (i.e., swapping $H_k \leftrightarrow B_k$, $H_0 \leftrightarrow B_0$, and $y_k \leftrightarrow s_k$ in the BFGS update), the compact representation for DFP can be immediately obtained from the one for BFGS.[15]

PSB

The PSB (Powell-Symmetric-Broyden) compact representation was developed for the direct Hessian approximation.[16] It is equivalent to substituting $C_k = S_k$ in the general representation above:

$$B_k = B_0 + J_k N_k^{-1} J_k^T, \quad J_k = \begin{bmatrix} S_k & Y_k - B_0 S_k \end{bmatrix}, \quad N_k = \begin{bmatrix} 0_{k \times k} & R^{SS}_k \\ \big(R^{SS}_k\big)^T & R_k + R_k^T - \big(D_k + S_k^T B_0 S_k\big) \end{bmatrix}.$$

Structured BFGS

For structured optimization problems in which the objective function can be decomposed into two parts, $f(x) = \widehat{k}(x) + \widehat{u}(x)$, where the gradients and Hessian of $\widehat{k}(x)$ are known but only the gradient of $\widehat{u}(x)$ is known, structured BFGS formulas exist. The compact representation of these methods has the general form $B_k = B_0 + J_k N_k^{-1} J_k^T$, with specific $J_k$ and $N_k$.[17]

Reduced BFGS

The reduced compact representation (RCR) of BFGS is for linear equality constrained optimization

$$\text{minimize } f(x) \quad \text{subject to: } Ax = b,$$

where $A$ is underdetermined. In addition to the matrices $S_k, Y_k$, the RCR also stores the projections of the $y_i$'s onto the nullspace of $A$ in a matrix $Z_k$. With the BFGS matrix $B_k$ represented compactly (using a multiple of the identity as $B_0$), the (1,1) block of the inverse of the KKT matrix

$$K_k = \begin{bmatrix} B_k & A^T \\ A & 0 \end{bmatrix}, \quad B_0 = \frac{1}{\gamma_k} I, \quad H_0 = \gamma_k I, \quad \gamma_k > 0,$$

has the compact representation[18]

$$\big(K_k^{-1}\big)_{11} = H_0 + U_k M_k^{-1} U_k^T, \quad U_k = \begin{bmatrix} A^T & S_k & Z_k \end{bmatrix}, \quad M_k = \begin{bmatrix} -AA^T/\gamma_k & \\ & G_k \end{bmatrix}, \quad G_k = \begin{bmatrix} R_k^{-T}\big(D_k + Y_k^T H_0 Y_k\big) R_k^{-1} & -H_0 R_k^{-T} \\ -H_0 R_k^{-1} & 0 \end{bmatrix}^{-1}.$$

Limited Memory

The most common use of the compact representations is in the limited-memory setting, where $m \ll n$ denotes the memory parameter, with typical values around $m \in [5, 12]$ (see e.g.,[7]). Instead of storing the entire history of vectors, one limits this to the $m$ most recent pairs $\{ (s_i, y_i) \}_{i=k-m}^{k-1}$ and possibly $\{ v_i \}_{i=k-m}^{k-1}$ or $\{ c_i \}_{i=k-m}^{k-1}$. Further, the initialization is typically chosen as an adaptive multiple of the identity, $H_k^{(0)} = \gamma_k I$ with $\gamma_k = y_{k-1}^T s_{k-1} / y_{k-1}^T y_{k-1}$, and $B_k^{(0)} = \frac{1}{\gamma_k} I$. Limited-memory methods are frequently used for large-scale problems with many variables (i.e., $n$ can be large), in which case the limited-memory matrices $S_k \in \mathbb{R}^{n \times m}$ and $Y_k \in \mathbb{R}^{n \times m}$ (and possibly $V_k, C_k$) are tall and very skinny: $S_k = \begin{bmatrix} s_{k-m} & \ldots & s_{k-1} \end{bmatrix}$ and $Y_k = \begin{bmatrix} y_{k-m} & \ldots & y_{k-1} \end{bmatrix}$.
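
A minimal sketch of the limited-memory bookkeeping (update_memory is an illustrative helper, not part of any library):

    import numpy as np

    # Append the newest pair (s, y) and discard the oldest column once more
    # than m columns are stored. Also return the adaptive scaling gamma_k
    # used for the initializations H_k^(0) = gamma_k I and B_k^(0) = I / gamma_k.
    def update_memory(S, Y, s, y, m):
        S = np.hstack([S, s[:, None]])[:, -m:]
        Y = np.hstack([Y, y[:, None]])[:, -m:]
        gamma = (y @ s) / (y @ y)      # gamma_k = y_{k-1}^T s_{k-1} / y_{k-1}^T y_{k-1}
        return S, Y, gamma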

Implementations

Open source implementations include:

- the L-BFGS-B Fortran subroutines for large-scale bound-constrained optimization,[4] [21]
- the SC-SR1 MATLAB software for limited-memory SR1 trust-region methods,[5] [20] distributed through the Collected Algorithms of the ACM (CALGO),[19]
- the IPOPT interior-point solver for large-scale nonlinear programming.[6]

Non open source implementations include:

- the KNITRO solver for large-scale nonlinear optimization.[3]

Notes and References

  1. Book: Nocedal. J.. Wright. S.J.. 2006 . Numerical Optimization . Springer Series in Operations Research and Financial Engineering. Springer New York, NY . 10.1007/978-0-387-40065-5. 978-0-387-30303-1.
  2. Brust . J. J. . 2018 . Large-Scale Quasi-Newton Trust-Region Methods: High-Accuracy Solvers, Dense Initializations, and Extensions . PhD . University of California, Merced.
  3. Book: Byrd. R. H.. Nocedal. J.. Waltz. R. A.. 2006. KNITRO: An integrated package for nonlinear optimization. In: Di Pillo, G., Roma, M. (eds) Large-Scale Nonlinear Optimization. Nonconvex Optimization and Its Applications. 83. Springer, Boston, MA. 35-59. 10.1007/0-387-30065-1_4. 978-0-387-30063-4.
  4. 10.1145/279232.279236. Zhu. C.. Byrd. R. H.. Lu. P.. Nocedal. J.. Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization. 1997. ACM Transactions on Mathematical Software (TOMS). 23. 4. 550-560.
  5. 10.1145/3550269. Brust. J.. Burdakov. O.. Erway. J.. Marcia. R.. Algorithm 1030: SC-SR1: MATLAB software for limited-memory SR1 trust-region methods. 2022. ACM Transactions on Mathematical Software (TOMS). 48. 4. 1-33.
  6. 10.1007/s10107-004-0559-y. Wächter. A.. Biegler. L. T.. On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming. 2006. Mathematical Programming. 106. 25-57.
  7. 10.1007/BF01582063. Byrd. R. H.. Nocedal. J.. Schnabel. R. B.. Representations of Quasi-Newton Matrices and their use in Limited Memory Methods. 1994. Mathematical Programming. 63. 4. 129–156. 5581219.
  8. 10.1137/0909010. Walker. H. F.. Implementation of the GMRES Method Using Householder Transformations. 1988. SIAM Journal on Scientific and Statistical Computing. 9. 1. 152–163 .
  9. 10.1137/1019005. Dennis. Jr, J. E.. Moré. J. J.. Quasi-Newton methods, motivation and theory. 1977. SIAM Review. 19. 1. 46-89. 1813/6056 . free.
  10. $S_{k+1} = \begin{bmatrix} s_0 & \ldots & s_k \end{bmatrix}, \quad P^S_k = I - S_{k+1}\big(S_{k+1}^T S_{k+1}\big)^{-1} S_{k+1}^T$
  11. 2403.12206 . math.OC . J. J. . Brust . Useful Compact Representations for Data-Fitting . 2024.
  12. 10.1080/01630568308816160. Burdakov. O. P.. Methods of the secant type for systems of equations with symmetric Jacobian matrix. 1983. Numerical Functional Analysis and Optimization. 6. 2. 1–18.
  13. 10.1023/A:1021561204463. Burdakov. O. P.. Martínez. J. M. . Pilotta. E. A.. A limited-memory multipoint symmetric secant method for bound constrained optimization. 2002. Annals of Operations Research . 117. 51–70.
  14. 10.1080/10556788.2023.2296441. Brust. J. J.. Erway. J. B. . Marcia. R. F.. Shape-changing trust-region methods using multipoint symmetric secant matrices. 2024. Optimization Methods and Software . 39 . 5 . 990–1007. 2209.12057 .
  15. Erway . J. B. . Jain . V. . Marcia . R. F.. Shifted limited-memory DFP systems . 2013 . In 2013 Asilomar Conference on Signals, Systems and Computers . IEEE. 1033–1037 .
  16. 10.1007/s12532-023-00238-4. Kanzow. C. . Steck. D. . Regularization of limited memory quasi-Newton methods for large-scale nonconvex minimization. 2023. Mathematical Programming Computation. 15. 3. 417–444. free.
  17. 10.1007/s10589-021-00297-0. Brust. J. J . Di. Z.. Leyffer. S.. Petra. C. G. . Compact representations of structured BFGS matrices. 2021. Computational Optimization and Applications. 80. 1. 55–88.
  18. 10.1137/21M1393819. Brust. J. J . Marcia. R.F.. Petra. C.G.. Saunders. M. A. . Large-scale optimization with linear equality constraints using reduced compact representation. 2022. SIAM Journal on Scientific Computing. 44. 1. A103–A127. 2101.11048 . 2022SJSC...44A.103B .
  19. Web site: Collected Algorithms of the ACM (CALGO). calgo.acm.org.
  20. Web site: TOMS Alg. 1030. calgo.acm.org/1030.zip.
  21. Zhu. 10.1145/279232.279236. C.. Byrd. Richard H.. Lu. Peihuang. Nocedal. Jorge . L-BFGS-B: Algorithm 778: L-BFGS-B, FORTRAN routines for large scale bound constrained optimization . 1997. ACM Transactions on Mathematical Software. 23. 4. 550–560. 207228122. free.