Adjoint state method
The adjoint state method is a numerical method for efficiently computing the gradient of a function or operator in a numerical optimization problem.[1] It has applications in geophysics, seismic imaging, photonics and more recently in neural networks.[2]
The adjoint state space is chosen to simplify the physical interpretation of equation constraints.[3]
Adjoint state techniques allow the use of integration by parts, resulting in a form which explicitly contains the physically interesting quantity. An adjoint state equation is introduced, which involves a new unknown variable.
The adjoint method formulates the gradient of a function with respect to its parameters in a constrained optimization form. By using the dual form of this constrained optimization problem, the gradient can be calculated very quickly. A useful property is that the number of computations is independent of the number of parameters for which the gradient is sought. The adjoint method is derived from the dual problem[4] and is used e.g. in the Landweber iteration method.[5]
The name adjoint state method refers to the dual form of the problem, where the adjoint matrix A^* (the conjugate transpose of A) is used.
When the initial problem consists of calculating the product s^T x, where x must satisfy Ax = b, the dual problem can be realized as calculating the product r^T b (= s^T x), where r must satisfy A^* r = s. The vector r is called the adjoint state vector.
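In the real case A^* is simply A^T, and the identity r^T b = s^T x is easy to verify numerically. The following minimal sketch (hypothetical sizes and randomly generated data, not from the original article) shows both realizations agreeing, and illustrates the payoff: once r is known, s^T x can be re-evaluated for any new right-hand side b at the cost of a dot product, without solving for x again.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 5
    A = rng.standard_normal((n, n)) + n * np.eye(n)  # shifted to stay well conditioned
    s = rng.standard_normal(n)

    # Adjoint problem: solve A^T r = s once.
    r = np.linalg.solve(A.T, s)

    for _ in range(3):                     # many right-hand sides, a single adjoint solve
        b = rng.standard_normal(n)
        x = np.linalg.solve(A, b)          # direct: solve A x = b, then take s^T x
        assert np.isclose(s @ x, r @ b)    # dual: r^T b gives the same product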
General case
The original adjoint calculation method goes back to Jean Céa,[6] who used the Lagrangian of the optimization problem to compute the derivative of a functional with respect to a shape parameter.
For a state variable u ∈ 𝒰, an optimization variable v ∈ 𝒱, an objective functional J : 𝒰 × 𝒱 → ℝ is defined. The state variable u is often implicitly dependent on v through the (direct) state equation D_v(u) = 0 (usually the weak form of a partial differential equation), thus the considered objective is j(v) = J(u_v, v). Usually, one would be interested in calculating ∇j(v) using the chain rule:

\nabla j(v) = \nabla_v J(u_v, v) + \nabla_u J(u_v, v) \, \nabla_v u_v.
Unfortunately, the term ∇_v u_v is often very hard to differentiate analytically since the dependence is defined through an implicit equation. The Lagrangian functional can be used as a workaround for this issue. Since the state equation can be considered as a constraint in the minimization of j, the problem

\min_{u, v} \; J(u, v) \quad \text{subject to} \quad D_v(u) = 0

has an associate Lagrangian functional ℒ : 𝒰 × 𝒱 × 𝒰 → ℝ defined by

\mathcal{L}(u, v, \lambda) = J(u, v) + \langle D_v(u), \lambda \rangle,
where λ ∈ 𝒰 is a Lagrange multiplier or adjoint state variable and ⟨·, ·⟩ is an inner product on 𝒰. The method of Lagrange multipliers states that a solution to the problem has to be a stationary point of the Lagrangian, namely
\begin{cases}
d_u \mathcal{L}(u, v, \lambda; \delta u) = d_u J(u, v; \delta u) + \langle \delta u, D_v^*(\lambda) \rangle = 0 & \forall \delta u \in \mathcal{U}, \\
d_v \mathcal{L}(u, v, \lambda; \delta v) = d_v J(u, v; \delta v) + \langle d_v D_v(u; \delta v), \lambda \rangle = 0 & \forall \delta v \in \mathcal{V}, \\
d_\lambda \mathcal{L}(u, v, \lambda; \delta \lambda) = \langle D_v(u), \delta \lambda \rangle = 0 & \forall \delta \lambda \in \mathcal{U},
\end{cases}
where d_x F(x; δx) denotes the Gateaux derivative of F with respect to x in the direction δx. The last equation is equivalent to D_v(u) = 0, the state equation, to which the solution is u_v. The first equation is the so-called adjoint state equation,
\langle \delta u, D_v^*(\lambda) \rangle = - d_u J(u_v, v; \delta u) \quad \forall \delta u \in \mathcal{U},
because the operator involved is the adjoint operator of D_v, denoted D_v^*. Resolving this equation yields the adjoint state λ_v. The gradient of the quantity of interest j with respect to v is

\langle \nabla j(v), \delta v \rangle = d_v j(v; \delta v) = d_v \mathcal{L}(u_v, v, \lambda_v; \delta v)

(the second equation with u = u_v and λ = λ_v), thus it can be easily identified by subsequently resolving the direct and adjoint state equations. The process is even simpler when the operator D_v is self-adjoint or symmetric since the direct and adjoint state equations differ only by their right-hand side.
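In a discretized, finite-dimensional setting this recipe becomes purely mechanical: one direct solve, one adjoint solve, one assembly step. The sketch below is a minimal illustration, not part of the original exposition; the helper names solve_state, dJ_du, dJ_dv, dD_du and dD_dv are hypothetical stand-ins for user-supplied routines returning the state solution, the partial gradients of J and the partial Jacobians of D.

    import numpy as np

    def gradient_via_adjoint(v, solve_state, dJ_du, dJ_dv, dD_du, dD_dv):
        # Direct state equation: u_v solves D(u, v) = 0.
        u = solve_state(v)
        # Adjoint state equation (first stationarity condition):
        # (dD/du)^T lambda = -dJ/du.
        lam = np.linalg.solve(dD_du(u, v).T, -dJ_du(u, v))
        # Gradient assembly (second stationarity condition at u_v, lambda_v):
        # grad j(v) = dJ/dv + (dD/dv)^T lambda.
        return dJ_dv(u, v) + dD_dv(u, v).T @ lam

Note that the cost is dominated by the two linear solves and does not grow with the dimension of v, which is the central advantage of the method.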
Example: Linear case
In a real finite-dimensional linear programming context, the objective function could be J(u, v) = ⟨Au, v⟩, for v ∈ ℝ^n, u ∈ ℝ^m and A ∈ ℝ^{n×m}, and let the state equation be B_v u = b, with B_v ∈ ℝ^{m×m} and b ∈ ℝ^m.
The Lagrangian function of the problem is

\mathcal{L}(u, v, \lambda) = \langle Au, v \rangle + \langle B_v u - b, \lambda \rangle,

where λ ∈ ℝ^m.
The derivative of ℒ with respect to λ yields the state equation as shown before, and the state variable is u_v = B_v^{-1} b. The derivative of ℒ with respect to u is equivalent to the adjoint equation, which is, for every δu ∈ ℝ^m,

d_u \left[ \langle B_v \cdot {} - b, \lambda \rangle \right](u; \delta u) = -\langle A^\top v, \delta u \rangle \iff \langle B_v \delta u, \lambda \rangle = -\langle A^\top v, \delta u \rangle \iff \langle B_v^\top \lambda + A^\top v, \delta u \rangle = 0 \iff B_v^\top \lambda = -A^\top v.
Thus, we can write symbolically λ_v = -B_v^{-T} A^T v. The gradient would be

\langle \nabla j(v), \delta v \rangle = \langle A u_v, \delta v \rangle + \langle \nabla_v B_v : \lambda_v \otimes u_v, \delta v \rangle,
where ∇_v B_v is a third order tensor, λ_v ⊗ u_v is the dyadic product between the direct and adjoint states and : denotes a double tensor contraction. It is assumed that B_v has a known analytic expression that can be differentiated easily.
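Because every object in this example is a concrete matrix or tensor, the whole computation fits in a few lines. The following sketch (hypothetical dimensions and randomly generated data; B_v is taken affine in v so that ∇_v B_v is the constant tensor C) assembles the gradient from the direct and adjoint states and checks it against central finite differences:

    import numpy as np

    rng = np.random.default_rng(1)
    n, m = 3, 4
    A  = rng.standard_normal((n, m))            # J(u, v) = <A u, v>
    b  = rng.standard_normal(m)
    B0 = 2.0 * np.eye(m)                        # keeps B(v) safely invertible
    C  = 0.1 * rng.standard_normal((n, m, m))   # C[k] = dB_v / dv_k

    def B(v):                                   # state matrix, affine in v
        return B0 + np.einsum('k,kij->ij', v, C)

    def j(v):                                   # reduced objective j(v) = J(u_v, v)
        return v @ (A @ np.linalg.solve(B(v), b))

    v    = rng.standard_normal(n)
    u_v  = np.linalg.solve(B(v), b)             # direct state: B_v u = b
    lam  = np.linalg.solve(B(v).T, -A.T @ v)    # adjoint state: B_v^T lam = -A^T v

    # (grad j)_k = (A u_v)_k + lam^T (dB_v/dv_k) u_v, i.e. the double contraction
    # of the third order tensor grad_v B_v with the dyadic product lam (x) u_v.
    grad = A @ u_v + np.einsum('kij,i,j->k', C, lam, u_v)

    eps = 1e-6                                  # central finite-difference check
    fd = np.array([(j(v + eps * e) - j(v - eps * e)) / (2 * eps) for e in np.eye(n)])
    assert np.allclose(grad, fd, atol=1e-5)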
Numerical consideration for the self-adjoint case
If the operator B_v is self-adjoint, B_v = B_v^T, the direct state equation and the adjoint state equation have the same left-hand side. Since explicitly inverting a matrix is numerically slow, an LU decomposition can be used instead to solve the state equation, in O(m³) operations for the decomposition and O(m²) operations for the resolution. That same decomposition can then be used to solve the adjoint state equation in only O(m²) operations, since the matrices are the same.
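With SciPy, this factor-once, solve-twice pattern looks as follows; a minimal sketch with hypothetical stand-in data. As a side note, the trans argument of lu_solve lets the same factors be reused even in the non-self-adjoint case, since it solves with B_v^T directly.

    import numpy as np
    from scipy.linalg import lu_factor, lu_solve

    rng = np.random.default_rng(2)
    m = 200
    B = rng.standard_normal((m, m)) + m * np.eye(m)   # stand-in for B_v
    b = rng.standard_normal(m)                        # state right-hand side
    g = rng.standard_normal(m)                        # stand-in for -A^T v

    lu, piv = lu_factor(B)                 # O(m^3), done once
    u   = lu_solve((lu, piv), b)           # direct state solve, O(m^2)
    lam = lu_solve((lu, piv), g, trans=1)  # adjoint solve with B^T, O(m^2)

If B_v is additionally symmetric positive definite, a Cholesky factorization (scipy.linalg.cho_factor / cho_solve) halves the cost of the factorization step.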
References
- Pollini, Nicolò; Lavan, Oren; Amir, Oded (2018). "Adjoint sensitivity analysis and optimization of hysteretic dynamic systems with nonlinear viscous dampers". Structural and Multidisciplinary Optimization 57 (6): 2273–2289. doi:10.1007/s00158-017-1858-2.
- Chen, Ricky T. Q.; Rubanova, Yulia; Bettencourt, Jesse; Duvenaud, David (2018). "Neural Ordinary Differential Equations". Available online.
- Plessix, R-E. (2006). "A review of the adjoint-state method for computing the gradient of a functional with geophysical applications". Geophysical Journal International 167 (2): 495–503. Free access on the GJI website.
- McNamara, Antoine; Treuille, Adrien; Popović, Zoran; Stam, Jos (August 2004). "Fluid control using the adjoint method". ACM Transactions on Graphics 23 (3): 449–456. doi:10.1145/1015706.1015744. Archived 29 January 2022 at https://web.archive.org/web/20220129011505/https://www.dgp.toronto.edu/public_user/stam/reality/Research/pdf/sig04.pdf
- Lundvall, Johan (2007). "Data Assimilation in Fluid Dynamics using Adjoint Optimization". Linköping University, Sweden. Archived 9 October 2022 at https://web.archive.org/web/20221009083111/http://liu.diva-portal.org/smash/get/diva2:24091/FULLTEXT01.pdf
- Céa, Jean (1986). "Conception optimale ou identification de formes, calcul rapide de la dérivée directionnelle de la fonction coût" [Optimal design or shape identification: fast computation of the directional derivative of the cost function]. ESAIM: Mathematical Modelling and Numerical Analysis 20 (3): 371–402. doi:10.1051/m2an/1986200303711.