Augmented Lagrangian method explained

Augmented Lagrangian methods are a certain class of algorithms for solving constrained optimization problems. They have similarities to penalty methods in that they replace a constrained optimization problem by a series of unconstrained problems and add a penalty term to the objective, but the augmented Lagrangian method adds yet another term designed to mimic a Lagrange multiplier. The augmented Lagrangian is related to, but not identical with, the method of Lagrange multipliers.

Viewed differently, the unconstrained objective is the Lagrangian of the constrained problem, with an additional penalty term (the augmentation).

The method was originally known as the method of multipliers and was studied in the 1970s and 1980s as a potential alternative to penalty methods. It was first discussed by Magnus Hestenes[1] and then by Michael Powell in 1969.[2] The method was studied by R. Tyrrell Rockafellar in relation to Fenchel duality, particularly in relation to proximal-point methods, Moreau–Yosida regularization, and maximal monotone operators; these methods were used in structural optimization. The method was also studied by Dimitri Bertsekas, notably in his 1982 book,[3] together with extensions involving non-quadratic regularization functions (e.g., entropic regularization). This combined study gives rise to the "exponential method of multipliers" which handles inequality constraints with a twice-differentiable augmented Lagrangian function.

Since the 1970s, sequential quadratic programming (SQP) and interior point methods (IPM) have been given more attention, in part because they more easily use sparse matrix subroutines from numerical software libraries, and in part because IPMs possess proven complexity results via the theory of self-concordant functions. The augmented Lagrangian method was rejuvenated by the optimization systems LANCELOT, ALGENCAN[4] and AMPL, which allowed sparse matrix techniques to be used on seemingly dense but "partially-separable" problems. The method is still useful for some problems.[5]

Around 2007, there was a resurgence of augmented Lagrangian methods in fields such as total variation denoising and compressed sensing. In particular, a variant of the standard augmented Lagrangian method that uses partial updates (similar to the Gauss–Seidel method for solving linear equations) known as the alternating direction method of multipliers or ADMM gained some attention.

General method

Consider solving the following constrained optimization problem:

minf(x)

subject to

ci(x)=0~\foralli\inl{E},

where

l{E}

denotes the indices for equality constraints. This problem can be solved as a series of unconstrained minimization problems. For reference, we first list the kth step of the penalty method approach:

min\Phik(x)=f(x)+\muk~\sumi\in

} ~ c_i(\mathbf)^2. The penalty method solves this problem, then at the next iteration it re-solves the problem using a larger value of

\muk

and using the old solution as the initial guess or "warm start".

The augmented Lagrangian method uses the following unconstrained objective:

min\Phik(x)=f(x)+

\muk
2

~\sumi\in

} ~ c_i(\mathbf)^2 + \sum_ ~ \lambda_i c_i(\mathbf)and after each iteration, in addition to updating

\muk

, the variable

λ

is also updated according to the rule

λi\leftarrowλi+\mukci(xk)

where

xk

is the solution to the unconstrained problem at the kth step (i.e.

xk=argmin\Phik(x)

).

The variable

λ

is an estimate of the Lagrange multiplier, and the accuracy of this estimate improves at every step. The major advantage of the method is that unlike the penalty method, it is not necessary to take

\muinfty

in order to solve the original constrained problem. Because of the presence of the Lagrange multiplier term,

\mu

can stay much smaller, and thus avoiding ill-conditioning.[5] Nevertheless, it is common in practical implementations to project multipliers estimates in a large bounded set (safeguards) which avoids numerical instabilities and leads to strong theoretical convergence.

The method can be extended to handle inequality constraints. For a discussion of practical improvements, see refs.[5]

Alternating direction method of multipliers

The alternating direction method of multipliers (ADMM) is a variant of the augmented Lagrangian scheme that uses partial updates for the dual variables. This method is often applied to solve problems such as,

minxf(x)+g(x).

This is equivalent to the constrained problem,

minx,yf(x)+g(y),subjecttox=y.

Though this change may seem trivial, the problem can now be attacked using methods of constrained optimization (in particular, the augmented Lagrangian method), and the objective function is separable in x and y. The dual update requires solving a proximity function in x and y at the same time; the ADMM technique allows this problem to be solved approximately by first solving for x with y fixed and then solving for y with x fixed. Rather than iterate until convergence (like the Jacobi method), the algorithm proceeds directly to updating the dual variable and then repeating the process. This is not equivalent to the exact minimization, but the method still converges to the correct solution under some assumptions. Because of this approximation, the algorithm is distinct from the pure augmented Lagrangian method.

The ADMM can be viewed as an application of the Douglas-Rachford splitting algorithm, and the Douglas-Rachford algorithm is in turn an instance of the Proximal point algorithm; details can be found in ref.[6] There are several modern software packages, including YALL1[7] (2009), SpaRSA[8] (2009) and SALSA[9] (2009), which solve Basis pursuit and variants and use the ADMM. There are also packages which use the ADMM to solve more general problems, some of which can exploit multiple computing cores (e.g., SNAPVX[10] (2015), parADMM[11] (2016)).

Stochastic optimization

Stochastic optimization considers the problem of minimizing a loss function with access to noisy samples of the (gradient of the) function. The goal is to have an estimate of the optimal parameter (minimizer) per new sample. With some modifications, ADMM can be used for stochastic optimization. In a stochastic setting, only noisy samples of a gradient are accessible, so an inexact approximation of the Lagrangian is used:

\hat{l{L}}\rho,k=f1(xk)+\langle\nablaf(xk,\zetak+1),x\rangle+g(y)-zT(Ax+By-c)+

\rho
2

\VertAx+By-c

2+\Vertx-xk\Vert2
k+1
\Vert

,

where

ηk+1

is a time-varying step size.[12]

ADMM has been applied to solve regularized problems, where the function optimization and regularization can be carried out locally and then coordinated globally via constraints.[13] [14] [15] [16]

Regularized optimization problems are especially relevant in the high-dimensional regime as regularization is a natural mechanism to overcome ill-posedness and to encourage parsimony in the optimal solution (e.g., sparsity and low rank). ADMM's effectiveness for solving regularized problems may mean it could be useful for solving high-dimensional stochastic optimization problems.[17]

Alternative approaches

Software

Open source and non-free/commercial implementations of the augmented Lagrangian method:

See also

References

  1. M. R. . Hestenes . Multiplier and gradient methods . Journal of Optimization Theory and Applications . 4 . 1969 . 5 . 303–320 . 10.1007/BF00927673 . 121584579 .
  2. Book: Powell, M. J. D. . A method for nonlinear constraints in minimization problems . Optimization . R. . Fletcher . Academic Press . New York . 1969 . 283–298 . 0-12-260650-7 .
  3. Book: Bertsekas, Dimitri P. . Constrained optimization and Lagrange multiplier methods . Athena Scientific . 1996 . 1982 .
  4. Andreani . R. . Birgin . E. G. . Martínez . J. M. . Schuverdt . M. L. . 10.1137/060654797 . On Augmented Lagrangian Methods with General Lower-Level Constraints . SIAM Journal on Optimization . 18 . 4 . 1286–1309 . 2007 . 1218538 .
  5. , chapter 17
  6. Eckstein . J. . Bertsekas . D. P. . 10.1007/BF01581204 . On the Douglas—Rachford splitting method and the proximal point algorithm for maximal monotone operators . Mathematical Programming . 55 . 1–3 . 293–318 . 1992 . 10.1.1.141.6246 . 15551627 .
  7. Web site: YALL1: Your ALgorithms for L1 . yall1.blogs.rice.edu.
  8. Web site: SpaRSA . www.lx.it.pt.
  9. Web site: (C)SALSA: A Solver for Convex Optimization Problems in Image Recovery . cascais.lx.it.pt.
  10. Web site: SnapVX. snap.stanford.edu.
  11. Web site: parADMM/engine. February 6, 2021. GitHub.
  12. Ouyang, H.. He, N.. Tran, L.. Gray, A. G. amp. Stochastic alternating direction method of multipliers. Proceedings of the 30th International Conference on Machine Learning. 2013. 80–88.
  13. Boyd, S.. Parikh, N.. Chu, E.. Peleato, B.. Eckstein, J.. amp. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning. 2011. 3. 1. 1–122. 10.1561/2200000016. 10.1.1.360.1664. 51789432 .
  14. Wahlberg, B. . Boyd, S. . Annergren, M. . Wang, Y.. An ADMM algorithm for a class of total variation regularized estimation problems . 1203.1828. 2012. stat.ML .
  15. Esser, E. . Zhang, X. . Chan, T.. A general framework for a class of first order primal-dual algorithms for convex optimization in imaging science. SIAM Journal on Imaging Sciences. 2010. 3. 4. 1015–1046 . 10.1137/09076934X .
  16. Book: Mota, J. FC . Xavier, J. MF . Aguiar, P. MQ . Puschel, M.. 2012 IEEE 51st IEEE Conference on Decision and Control (CDC) . Distributed ADMM for model predictive control and congestion control . 2012. 5110–5115 . 10.1109/CDC.2012.6426141 . 978-1-4673-2066-5 . 12128421 .
  17. Web site: Alternating Direction Method of Multipliers - an overview ScienceDirect Topics . 2023-08-07 . www.sciencedirect.com.
  18. Web site: Bitbucket. bitbucket.org.
  19. Web site: TANGO Project. www.ime.usp.br.
  20. Web site: PyProximal Project. www.github.com/PyLops/pyproximal.