Augmented Lagrangian methods are a certain class of algorithms for solving constrained optimization problems. They have similarities to penalty methods in that they replace a constrained optimization problem by a series of unconstrained problems and add a penalty term to the objective, but the augmented Lagrangian method adds yet another term designed to mimic a Lagrange multiplier. The augmented Lagrangian is related to, but not identical with, the method of Lagrange multipliers.
Viewed differently, the unconstrained objective is the Lagrangian of the constrained problem, with an additional penalty term (the augmentation).
The method was originally known as the method of multipliers and was studied in the 1970s and 1980s as a potential alternative to penalty methods. It was first discussed by Magnus Hestenes[1] and then by Michael Powell in 1969.[2] The method was studied by R. Tyrrell Rockafellar in relation to Fenchel duality, particularly in relation to proximal-point methods, Moreau–Yosida regularization, and maximal monotone operators; these methods were used in structural optimization. The method was also studied by Dimitri Bertsekas, notably in his 1982 book,[3] together with extensions involving non-quadratic regularization functions (e.g., entropic regularization). This combined study gives rise to the "exponential method of multipliers" which handles inequality constraints with a twice-differentiable augmented Lagrangian function.
Since the 1970s, sequential quadratic programming (SQP) and interior point methods (IPM) have been given more attention, in part because they more easily use sparse matrix subroutines from numerical software libraries, and in part because IPMs possess proven complexity results via the theory of self-concordant functions. The augmented Lagrangian method was rejuvenated by the optimization systems LANCELOT, ALGENCAN[4] and AMPL, which allowed sparse matrix techniques to be used on seemingly dense but "partially-separable" problems. The method is still useful for some problems.[5]
Around 2007, there was a resurgence of augmented Lagrangian methods in fields such as total variation denoising and compressed sensing. In particular, a variant of the standard augmented Lagrangian method that uses partial updates (similar to the Gauss–Seidel method for solving linear equations) known as the alternating direction method of multipliers or ADMM gained some attention.
Consider solving the following constrained optimization problem:
minf(x)
ci(x)=0~\foralli\inl{E},
where
l{E}
min\Phik(x)=f(x)+\muk~\sumi\in
\muk
The augmented Lagrangian method uses the following unconstrained objective:
min\Phik(x)=f(x)+
\muk | |
2 |
~\sumi\in
\muk
λ
λi\leftarrowλi+\mukci(xk)
xk
xk=argmin\Phik(x)
The variable
λ
\mu → infty
\mu
The method can be extended to handle inequality constraints. For a discussion of practical improvements, see refs.[5]
The alternating direction method of multipliers (ADMM) is a variant of the augmented Lagrangian scheme that uses partial updates for the dual variables. This method is often applied to solve problems such as,
minxf(x)+g(x).
This is equivalent to the constrained problem,
minx,yf(x)+g(y), subjectto x=y.
Though this change may seem trivial, the problem can now be attacked using methods of constrained optimization (in particular, the augmented Lagrangian method), and the objective function is separable in x and y. The dual update requires solving a proximity function in x and y at the same time; the ADMM technique allows this problem to be solved approximately by first solving for x with y fixed and then solving for y with x fixed. Rather than iterate until convergence (like the Jacobi method), the algorithm proceeds directly to updating the dual variable and then repeating the process. This is not equivalent to the exact minimization, but the method still converges to the correct solution under some assumptions. Because of this approximation, the algorithm is distinct from the pure augmented Lagrangian method.
The ADMM can be viewed as an application of the Douglas-Rachford splitting algorithm, and the Douglas-Rachford algorithm is in turn an instance of the Proximal point algorithm; details can be found in ref.[6] There are several modern software packages, including YALL1[7] (2009), SpaRSA[8] (2009) and SALSA[9] (2009), which solve Basis pursuit and variants and use the ADMM. There are also packages which use the ADMM to solve more general problems, some of which can exploit multiple computing cores (e.g., SNAPVX[10] (2015), parADMM[11] (2016)).
Stochastic optimization considers the problem of minimizing a loss function with access to noisy samples of the (gradient of the) function. The goal is to have an estimate of the optimal parameter (minimizer) per new sample. With some modifications, ADMM can be used for stochastic optimization. In a stochastic setting, only noisy samples of a gradient are accessible, so an inexact approximation of the Lagrangian is used:
\hat{l{L}}\rho,k=f1(xk)+\langle\nablaf(xk,\zetak+1),x\rangle+g(y)-zT(Ax+By-c)+
\rho | |
2 |
\VertAx+By-c
| ||||
\Vert |
,
where
ηk+1
ADMM has been applied to solve regularized problems, where the function optimization and regularization can be carried out locally and then coordinated globally via constraints.[13] [14] [15] [16]
Regularized optimization problems are especially relevant in the high-dimensional regime as regularization is a natural mechanism to overcome ill-posedness and to encourage parsimony in the optimal solution (e.g., sparsity and low rank). ADMM's effectiveness for solving regularized problems may mean it could be useful for solving high-dimensional stochastic optimization problems.[17]
Open source and non-free/commercial implementations of the augmented Lagrangian method: