Hamilton–Jacobi–Bellman equation
The Hamilton-Jacobi-Bellman (HJB) equation is a nonlinear partial differential equation that provides necessary and sufficient conditions for optimality of a control with respect to a loss function.[1] Its solution is the value function of the optimal control problem which, once known, can be used to obtain the optimal control by taking the maximizer (or minimizer) of the Hamiltonian involved in the HJB equation.[2] [3]
The equation is a result of the theory of dynamic programming which was pioneered in the 1950s by Richard Bellman and coworkers.[4] [5] [6] The connection to the Hamilton–Jacobi equation from classical physics was first drawn by Rudolf Kálmán.[7] In discrete-time problems, the analogous difference equation is usually referred to as the Bellman equation.
While classical variational problems, such as the brachistochrone problem, can be solved using the Hamilton–Jacobi–Bellman equation,[8] the method can be applied to a broader spectrum of problems. Further it can be generalized to stochastic systems, in which case the HJB equation is a second-order elliptic partial differential equation.[9] A major drawback, however, is that the HJB equation admits classical solutions only for a sufficiently smooth value function, which is not guaranteed in most situations. Instead, the notion of a viscosity solution is required, in which conventional derivatives are replaced by (set-valued) subderivatives.[10]
Optimal Control Problems
Consider the following problem in deterministic optimal control over the time period $[0, T]$:

$$V(x(0), 0) = \min_u \left\{ \int_0^T C[x(t), u(t)] \, dt + D[x(T)] \right\}$$

where $C[\cdot]$ is the scalar cost rate function, $D[\cdot]$ is a function that gives the bequest value at the final state, $x(t)$ is the system state vector, $x(0)$ is assumed given, and $u(t)$ for $0 \le t \le T$ is the control vector that we are trying to find. Thus, $V(x(0), 0)$ is the value function.
The system must also be subject to

$$\dot{x}(t) = F[x(t), u(t)]$$

where $F[\cdot]$ gives the vector determining physical evolution of the state vector over time.
The Partial Differential Equation
For this simple system, the Hamilton–Jacobi–Bellman partial differential equation is

$$\frac{\partial V(x,t)}{\partial t} + \min_u \left\{ \frac{\partial V(x,t)}{\partial x} \cdot F(x, u) + C(x, u) \right\} = 0$$

subject to the terminal condition

$$V(x, T) = D(x).$$

As before, the unknown scalar function $V(x, t)$ in the above partial differential equation is the Bellman value function, which represents the cost incurred from starting in state $x$ at time $t$ and controlling the system optimally from then until time $T$.
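In particular, restating the introduction's remark in this notation: once $V$ is known, an optimal control at each state and time is obtained as a minimizer of the bracketed expression,

$$u^*(x, t) \in \arg\min_u \left\{ \frac{\partial V(x,t)}{\partial x} \cdot F(x, u) + C(x, u) \right\}.$$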
Deriving the Equation
Intuitively, the HJB equation can be derived as follows. If $V(x(t), t)$ is the optimal cost-to-go function (also called the 'value function'), then by Richard Bellman's principle of optimality, going from time $t$ to $t + dt$, we have

$$V(x(t), t) = \min_u \left\{ V(x(t+dt), t+dt) + \int_t^{t+dt} C(x(s), u(s)) \, ds \right\}.$$

Note that the Taylor expansion of the first term on the right-hand side is

$$V(x(t+dt), t+dt) = V(x(t), t) + \frac{\partial V(x,t)}{\partial t} \, dt + \frac{\partial V(x,t)}{\partial x} \cdot \dot{x}(t) \, dt + o(dt),$$

where $o(dt)$ denotes the terms in the Taylor expansion of higher order than one in little-$o$ notation. Then if we subtract $V(x(t), t)$ from both sides, divide by $dt$, and take the limit as $dt$ approaches zero, we obtain the HJB equation defined above.
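Spelled out, assuming the minimizing control is applied on $[t, t+dt]$ so that $\dot{x}(t) = F(x, u)$ and $\int_t^{t+dt} C(x(s), u(s)) \, ds = C(x, u) \, dt + o(dt)$, the substitution gives

$$0 = \min_u \left\{ C(x, u) \, dt + \frac{\partial V(x,t)}{\partial t} \, dt + \frac{\partial V(x,t)}{\partial x} \cdot F(x, u) \, dt + o(dt) \right\},$$

and dividing by $dt$ and letting $dt \to 0$ yields the HJB equation stated above.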
Solving the Equation
The HJB equation is usually solved backwards in time, starting from $t = T$ and ending at $t = 0$.[11]
When solved over the whole of state space and $V(x, t)$ is continuously differentiable, the HJB equation is a necessary and sufficient condition for an optimum when the terminal state is unconstrained.[12] If we can solve for $V$, then we can find from it a control $u$ that achieves the minimum cost.
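For intuition, the backward-in-time procedure can be mimicked numerically by discrete dynamic programming on a grid. The following sketch assumes a simple scalar example (the dynamics $\dot{x} = u$ with $|u| \le 1$, the quadratic costs, and the grid sizes are illustrative choices, not taken from the text above):

    # Minimal sketch: solve the HJB equation backwards in time on a grid,
    # assuming scalar dynamics x_dot = u, running cost x^2 + u^2, terminal cost x^2.
    import numpy as np

    x_grid = np.linspace(-2.0, 2.0, 201)   # state grid
    u_grid = np.linspace(-1.0, 1.0, 21)    # candidate controls
    dt, T = 0.01, 1.0

    # Terminal condition V(x, T) = D(x)
    V = x_grid ** 2

    # March backwards from t = T to t = 0, one Bellman backup per step
    for _ in range(int(T / dt)):
        V_new = np.empty_like(V)
        for i, x in enumerate(x_grid):
            x_next = x + u_grid * dt                              # next state for each control
            candidates = (x ** 2 + u_grid ** 2) * dt + np.interp(x_next, x_grid, V)
            V_new[i] = candidates.min()                           # minimize over the control
        V = V_new

    print("V(x = 1, t = 0) is approximately", np.interp(1.0, x_grid, V))

The inner minimization is the discrete analogue of the $\min_u$ in the HJB equation; refining the grids and $dt$ improves the approximation at the cost of more work per dimension.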
In the general case, the HJB equation does not have a classical (smooth) solution. Several notions of generalized solutions have been developed to cover such situations, including the viscosity solution (Pierre-Louis Lions and Michael Crandall),[13] the minimax solution, and others.
Approximate dynamic programming was introduced by D. P. Bertsekas and J. N. Tsitsiklis with the use of artificial neural networks (multilayer perceptrons) for approximating the Bellman function.[14] This is an effective strategy for reducing the impact of the curse of dimensionality, since it replaces the memorization of the complete function mapping over the whole state-space domain with the memorization of the neural network parameters alone. In particular, for continuous-time systems, an approximate dynamic programming approach that combines policy iterations with neural networks was introduced.[15] In discrete time, an approach to solving the HJB equation combining value iterations and neural networks was introduced.[16]
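As a rough sketch of this idea (the scalar system, cost coefficients, and network size below are hypothetical, and scikit-learn's MLPRegressor merely stands in for the neural network approximators of the cited works), backward fitted value iteration stores each step's cost-to-go in a small network rather than in a full table:

    # Hedged sketch of fitted value iteration with a neural-network value approximator.
    import numpy as np
    from sklearn.neural_network import MLPRegressor

    a, b, q, r = -1.0, 1.0, 1.0, 0.1               # assumed linear dynamics / quadratic cost
    dt, T = 0.05, 1.0
    controls = np.linspace(-3.0, 3.0, 41)          # candidate controls
    states = np.random.uniform(-2.0, 2.0, 400)     # sampled training states

    def value_next(x):                             # terminal condition V(x, T) = x^2
        return np.atleast_1d(x) ** 2

    # Backward induction: at each step, fit a network to the Bellman targets
    # produced by the approximation for the next time step.
    for step in range(int(T / dt)):
        targets = []
        for x in states:
            x_next = x + (a * x + b * controls) * dt                   # one step per control
            cost = (q * x ** 2 + r * controls ** 2) * dt + value_next(x_next)
            targets.append(cost.min())                                 # Bellman minimization
        model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000)
        model.fit(states.reshape(-1, 1), np.array(targets))
        value_next = lambda x, m=model: m.predict(np.atleast_1d(x).reshape(-1, 1))

    print("Approximate V(0.5, t = 0):", value_next(0.5))

Only the network parameters need to be stored at each step, which is the point of the approach when the state space is too large to tabulate.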
Alternatively, it has been shown that sum-of-squares optimization can yield a polynomial approximation to the solution of the Hamilton–Jacobi–Bellman equation that is arbitrarily accurate with respect to the $L^1$ norm.[17]
Extension to Stochastic Problems
The idea of solving a control problem by applying Bellman's principle of optimality and then working backwards in time toward an optimizing strategy can be generalized to stochastic control problems. Consider a problem similar to the one above, but now with

$$\min_u \mathbb{E} \left\{ \int_0^T C(t, X_t, u_t) \, dt + D(X_T) \right\}$$

where $(X_t)_{t \in [0,T]}$ is the stochastic process to optimize and $(u_t)_{t \in [0,T]}$ is the steering. By first using Bellman's principle and then expanding $V(X_t, t)$ with Itô's rule, one finds the stochastic HJB equation

$$\min_u \left\{ \mathcal{A} V(x, t) + C(t, x, u) \right\} = 0,$$

where $\mathcal{A}$ represents the stochastic differentiation operator, subject to the terminal condition

$$V(x, T) = D(x).$$
Note that the randomness has disappeared. In this case a solution $V$ of the latter does not necessarily solve the primal problem; it is only a candidate, and a further verification argument is required. This technique is widely used in financial mathematics to determine optimal investment strategies in the market (see, for example, Merton's portfolio problem).
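For a controlled Itô diffusion of the form $dX_t = \mu(t, X_t, u_t) \, dt + \sigma(t, X_t, u_t) \, dW_t$, for instance, the operator $\mathcal{A}$ acts on $V$ as

$$\mathcal{A} V(x, t) = \frac{\partial V}{\partial t} + \mu(t, x, u) \cdot \frac{\partial V}{\partial x} + \frac{1}{2} \operatorname{tr}\!\left( \sigma \sigma^{\top}(t, x, u) \, \frac{\partial^2 V}{\partial x^2} \right),$$

which is how the second-order term mentioned in the introduction enters.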
Application to LQG-Control
As an example, we can look at a system with linear stochastic dynamics and quadratic cost. If the system dynamics is given by

$$dx_t = (a x_t + b u_t) \, dt + \sigma \, dw_t,$$

and the cost accumulates at rate $C(x_t, u_t) = r(t) u_t^2 / 2 + q(t) x_t^2 / 2$, the HJB equation is given by

$$-\frac{\partial V(x,t)}{\partial t} = \frac{1}{2} q(t) x^2 + \frac{\partial V(x,t)}{\partial x} a x - \frac{b^2}{2 r(t)} \left( \frac{\partial V(x,t)}{\partial x} \right)^2 + \frac{\sigma^2}{2} \frac{\partial^2 V(x,t)}{\partial x^2}$$

with the optimal action given by

$$u_t = -\frac{b}{r(t)} \frac{\partial V(x,t)}{\partial x}.$$

Assuming a quadratic form for the value function, we obtain the usual Riccati equation for the Hessian of the value function, as is usual for linear-quadratic-Gaussian control.
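As a sketch of that last step, take (for illustration) the quadratic ansatz $V(x, t) = \tfrac{1}{2} S(t) x^2 + m(t)$. Substituting into the HJB equation above and matching powers of $x$ gives

$$-\dot{S}(t) = q(t) + 2 a S(t) - \frac{b^2}{r(t)} S(t)^2, \qquad -\dot{m}(t) = \frac{\sigma^2}{2} S(t),$$

the first of which is the scalar Riccati equation for the Hessian $S(t) = \partial^2 V / \partial x^2$, solved backwards from the terminal condition.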
See also
- Bellman equation, discrete-time counterpart of the Hamilton–Jacobi–Bellman equation.
- Pontryagin's maximum principle, a necessary but not sufficient condition for an optimum, obtained by maximizing a Hamiltonian; it has the advantage over the HJB equation of needing to hold only along the single trajectory being considered.
Further reading
- Bertsekas, Dimitri P. (2005). Dynamic Programming and Optimal Control. Athena Scientific.
- Pham, Huyên (2009). "The Classical PDE Approach to Dynamic Programming". Continuous-time Stochastic Control and Optimization with Financial Applications. Springer. pp. 37–60. ISBN 978-3-540-89499-5. https://books.google.com/books?id=xBsagiBp1SYC&pg=PA37
- Stengel, Robert F. (1994). "Conditions for Optimality". Optimal Control and Estimation. New York: Dover. pp. 201–222. ISBN 0-486-68200-5. https://books.google.com/books?id=jDjPxqm7Lw0C&pg=PA201
Notes and References
- Kirk, Donald E. (1970). Optimal Control Theory: An Introduction. Englewood Cliffs, NJ: Prentice-Hall. pp. 86–90. ISBN 0-13-638098-0.
- Yong, Jiongmin; Zhou, Xun Yu (1999). "Dynamic Programming and HJB Equations". Stochastic Controls: Hamiltonian Systems and HJB Equations. Springer. pp. 157–215 [p. 163]. ISBN 0-387-98723-1. https://books.google.com/books?id=CdHuD7E-7XIC&pg=PA163
- Naidu, Desineni S. (2003). "The Hamilton–Jacobi–Bellman Equation". Optimal Control Systems. Boca Raton: CRC Press. pp. 277–283 [p. 280]. ISBN 0-8493-0892-5. https://books.google.com/books?id=hGxurdEZVtkC&pg=PA280
- Bellman, R. E. (1954). "Dynamic Programming and a new formalism in the calculus of variations". Proceedings of the National Academy of Sciences. 40 (4): 231–235. doi:10.1073/pnas.40.4.231. PMID 16589462. Bibcode:1954PNAS...40..231B.
- Bellman, R. E. (1957). Dynamic Programming. Princeton, NJ: Princeton University Press.
- Bellman, R.; Dreyfus, S. (1959). "An Application of Dynamic Programming to the Determination of Optimal Satellite Trajectories". J. Br. Interplanet. Soc. 17: 78–83.
- Kálmán, Rudolf E. (1963). "The Theory of Optimal Control and the Calculus of Variations". In Bellman, Richard (ed.). Mathematical Optimization Techniques. Berkeley: University of California Press. pp. 309–331.
- Kemajou-Brown, Isabelle (2016). "Brief History of Optimal Control Theory and Some Recent Developments". In Budzban, Gregory; Hughes, Harry Randolph; Schurz, Henri (eds.). Probability on Algebraic and Geometric Structures. Contemporary Mathematics. Vol. 668. pp. 119–130. doi:10.1090/conm/668/13400. ISBN 9781470419455.
- Chang, Fwu-Ranq (2004). Stochastic Optimization in Continuous Time. Cambridge, UK: Cambridge University Press. pp. 113–168. ISBN 0-521-83406-6.
- Bardi, Martino; Capuzzo-Dolcetta, Italo (1997). Optimal Control and Viscosity Solutions of Hamilton–Jacobi–Bellman Equations. Boston: Birkhäuser. ISBN 0-8176-3640-4.
- Lewis, Frank L.; Vrabie, Draguna; Syrmos, Vassilis L. (2012). Optimal Control (3rd ed.). Wiley. p. 278. ISBN 978-0-470-63349-6.
- Bertsekas, Dimitri P. (2005). Dynamic Programming and Optimal Control. Athena Scientific.
- Bardi, Martino; Capuzzo-Dolcetta, Italo (1997). Optimal Control and Viscosity Solutions of Hamilton–Jacobi–Bellman Equations. Boston: Birkhäuser. ISBN 0-8176-3640-4.
- Bertsekas, Dimitri P.; Tsitsiklis, John N. (1996). Neuro-dynamic Programming. Athena Scientific. ISBN 978-1-886529-10-6.
- Abu-Khalaf, Murad; Lewis, Frank L. (2005). "Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach". Automatica. 41 (5): 779–791. doi:10.1016/j.automatica.2004.11.034.
- Al-Tamimi, Asma; Lewis, Frank L.; Abu-Khalaf, Murad (2008). "Discrete-Time Nonlinear HJB Solution Using Approximate Dynamic Programming: Convergence Proof". IEEE Transactions on Systems, Man, and Cybernetics – Part B: Cybernetics. 38 (4): 943–949. doi:10.1109/TSMCB.2008.926614. PMID 18632382.
- Jones, Morgan; Peet, Matthew (2020). "Polynomial Approximation of Value Functions and Nonlinear Controller Design with Performance Bounds". arXiv:2010.06828 [math.OC].