In applied mathematics, Hessian automatic differentiation are techniques based on automatic differentiation (AD)that calculate the second derivative of an
n
When examining a function in a neighborhood of a point, one can discard many complicated global aspects of the function and accurately approximate it with simpler functions. The quadratic approximation is the best-fitting quadratic in the neighborhood of a point, and is frequently used in engineering and science. To calculate the quadratic approximation, one must first calculate its gradient and Hessian matrix.
Let
f:Rn → R
x\inRn
H(x)\inRn
For a given
u\inRn
H(x)u
H(x)ei
i=1,\ldots,n
The method works by first using forward AD to perform
f(x) → uT\nablaf(x)
uT\nablaf(x)
\nabla\left(u ⋅ \nablaf(x)\right)=uTH(x)=(H(x)u)T
An algorithm that calculates the entire Hessian with one forward and one reverse sweep of the computational graph is Edge_Pushing. Edge_Pushing is the result of applying the reverse gradient to the computational graph of the gradient. Naturally, this graph has n output nodes, thus in a sense one has to apply the reverse gradient method to each outgoing node. Edge_Pushing does this by taking into account overlapping calculations.
The algorithm's input is the computational graph of the function. After a preceding forward sweep where all intermediate values in the computational graph are calculated, the algorithm initiates a reverse sweep of the graph. Upon encountering a node that has a corresponding nonlinear elemental function, a new nonlinear edge is created between the node's predecessors indicating there is nonlinear interaction between them. See the example figure on the right. Appended to this nonlinear edge is an edge weight that is the second-order partial derivative of the nonlinear node in relation to its predecessors. This nonlinear edge is subsequently pushed down to further predecessors in such a way that when it reaches the independent nodes, its edge weight is the second-order partial derivative of the two independent nodes it connects.
The graph colouring techniques explore sparsity patterns of the Hessian matrix and cheap Hessian vector products to obtain the entire matrix. Thus these techniques are suited for large, sparse matrices. The general strategy of any such colouring technique is as follows.
H
x\inRn
Steps one and two need only be carried out once, and tend to be costly. When one wants to calculate the Hessian at numerous points (such as in an optimization routine), steps 3 and 4 are repeated.
As an example, the figure on the left shows the sparsity pattern of the Hessian matrix where the columns have been appropriately coloured in such a way to allow columns of the same colour to be merged without incurring in a collision between elements.There are a number of colouring techniques, each with a specific recovery technique. For a comprehensive survey, see. There have been successful numerical results of such methods.
This article is licensed under the GNU Free Documentation License. It uses material from the Wikipedia article "Hessian automatic differentiation".
Except where otherwise indicated, Everything.Explained.Today is © Copyright 2009-2024, A B Cryer, All Rights Reserved. Cookie policy.