In mathematics, mirror descent is an iterative optimization algorithm for finding a local minimum of a differentiable function.
It generalizes algorithms such as gradient descent and multiplicative weights.
Mirror descent was originally proposed by Nemirovski and Yudin in 1983.[1]
In gradient descent with the sequence of learning rates (\eta_n)_{n \ge 0} applied to a differentiable function F, one starts with a guess x_0 for a local minimum of F and considers the sequence x_0, x_1, x_2, \ldots such that

x_{n+1} = x_n - \eta_n \nabla F(x_n), \quad n \ge 0.
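This update rule can be sketched in a few lines; the quadratic objective F(x) = \|x\|^2/2, whose gradient is simply x, is a hypothetical example chosen for illustration:

```python
import numpy as np

def gradient_descent(grad_F, x0, eta=0.1, n_steps=100):
    """Iterate x_{n+1} = x_n - eta * grad_F(x_n) with a constant learning rate."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        x = x - eta * grad_F(x)
    return x

# Example: F(x) = ||x||^2 / 2 has gradient grad_F(x) = x and minimum at the origin.
x_min = gradient_descent(lambda x: x, x0=[3.0, -2.0])
```

With these settings the iterates contract toward the origin geometrically, since each step multiplies x by (1 - eta).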
This can be reformulated by noting that

x_{n+1} = \arg\min_x \left( F(x_n) + \nabla F(x_n)^T (x - x_n) + \frac{1}{2\eta_n} \|x - x_n\|^2 \right).
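Setting the gradient of the minimized expression with respect to x to zero confirms that this reformulation recovers the gradient-descent step:

```latex
\nabla F(x_n) + \frac{1}{\eta_n}\,(x - x_n) = 0
\quad\Longrightarrow\quad
x = x_n - \eta_n \nabla F(x_n).
```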
In other words, x_{n+1} minimizes the first-order approximation to F at x_n with an added proximity term \|x - x_n\|^2.
This squared Euclidean distance term is a particular example of a Bregman distance. Using other Bregman distances will yield other algorithms, such as Hedge, which may be better suited to optimization over particular geometries.[2][3]
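A Bregman distance D_h(x, y) = h(x) - h(y) - \langle \nabla h(y), x - y \rangle can be computed generically; the following minimal sketch checks the fact used above, that the choice h(x) = \|x\|^2/2 recovers (half) the squared Euclidean distance:

```python
import numpy as np

def bregman(h, grad_h, x, y):
    """Bregman distance D_h(x, y) = h(x) - h(y) - <grad h(y), x - y>."""
    return h(x) - h(y) - np.dot(grad_h(y), x - y)

# With h(x) = ||x||^2 / 2, D_h(x, y) equals ||x - y||^2 / 2.
h = lambda x: 0.5 * np.dot(x, x)
grad_h = lambda x: x
x = np.array([1.0, 2.0])
y = np.array([0.0, -1.0])
d = bregman(h, grad_h, x, y)  # equals 0.5 * ||x - y||^2 = 5.0 here
```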
We are given a convex function f to optimize over a convex set K \subset \mathbb{R}^n, and a norm \|\cdot\| on \mathbb{R}^n.
We are also given a differentiable convex function h \colon \mathbb{R}^n \to \mathbb{R} that is \alpha-strongly convex with respect to the given norm. Its gradient \nabla h \colon \mathbb{R}^n \to \mathbb{R}^n is known as the mirror map.
Starting from an initial x_0 \in K, each iteration of mirror descent performs:
- Map to the dual space: \theta_t \leftarrow \nabla h(x_t).
- Take a gradient step in the dual space: \theta_{t+1} \leftarrow \theta_t - \eta_t \nabla f(x_t).
- Map back to the primal space: x'_{t+1} \leftarrow (\nabla h)^{-1}(\theta_{t+1}).
- Project back onto the feasible set K: x_{t+1} \leftarrow \arg\min_{x \in K} D_h(x \,\|\, x'_{t+1}), where D_h is the Bregman divergence associated with h.
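The four steps above can be sketched concretely on the probability simplex with the negative-entropy distance-generating function h(x) = \sum_i x_i \log x_i, for which the Bregman projection onto the simplex reduces to renormalization (this is the classical setting that yields multiplicative-weights-style updates). The linear objective f(x) = c^T x is a hypothetical example; the dual map uses \log x, which matches \nabla h up to an additive constant that the normalization cancels:

```python
import numpy as np

def mirror_descent_simplex(grad_f, x0, eta=0.1, n_steps=200):
    """Mirror descent on the probability simplex with the entropy mirror map."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        theta = np.log(x)                 # map to the dual space (up to a constant)
        theta = theta - eta * grad_f(x)   # gradient step in the dual space
        x = np.exp(theta)                 # map back to the primal space
        x = x / x.sum()                   # Bregman projection onto the simplex
    return x

# Hypothetical linear costs: the iterates concentrate on the cheapest coordinate.
c = np.array([0.3, 0.1, 0.5])
x_star = mirror_descent_simplex(lambda x: c, np.ones(3) / 3)
```

Because the update multiplies each coordinate by \exp(-\eta c_i) and renormalizes, mass accumulates on the coordinate with the smallest cost.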
Mirror descent in the online optimization setting is known as Online Mirror Descent (OMD).[4]