Mirror descent explained

In mathematics, mirror descent is an iterative optimization algorithm for finding a local minimum of a differentiable function.

It generalizes algorithms such as gradient descent and multiplicative weights.

History

Mirror descent was originally proposed by Nemirovski and Yudin in 1983.[1]

Motivation

In gradient descent with the sequence of learning rates $(\eta_n)_{n \ge 0}$ applied to a differentiable function $F$, one starts with a guess $x_0$ for a local minimum of $F$, and considers the sequence $x_0, x_1, x_2, \ldots$ such that

$x_{n+1} = x_n - \eta_n \nabla F(x_n), \quad n \ge 0.$

This can be reformulated by noting that

$x_{n+1} = \arg\min_x \left( F(x_n) + \nabla F(x_n)^T (x - x_n) + \frac{1}{2\eta_n} \|x - x_n\|^2 \right).$
In other words, $x_{n+1}$ minimizes the first-order approximation to $F$ at $x_n$ with an added proximity term $\|x - x_n\|^2$.

This squared Euclidean distance term is a particular example of a Bregman distance. Using other Bregman distances yields other algorithms, such as Hedge, which may be better suited to optimization over particular geometries.[2][3]
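As a quick numerical check of the reformulation above (the objective, starting point, and step size here are illustrative choices, not from the article), minimizing the first-order model with the squared-Euclidean proximity term recovers the plain gradient step:

```python
import numpy as np

# Illustrative one-dimensional objective and its derivative.
F = lambda x: (x - 2.0) ** 2
dF = lambda x: 2.0 * (x - 2.0)
x_n, eta = 5.0, 0.1

# First-order model: F(x_n) + dF(x_n)*(x - x_n) + (1/(2*eta)) * (x - x_n)^2
model = lambda x: F(x_n) + dF(x_n) * (x - x_n) + (x - x_n) ** 2 / (2 * eta)

grid = np.linspace(0.0, 10.0, 1_000_001)   # brute-force minimizer on a fine grid
x_model = grid[np.argmin(model(grid))]
x_grad = x_n - eta * dF(x_n)               # the plain gradient descent step

print(x_model, x_grad)   # both ≈ 4.4
```

The agreement is no accident: setting the model's gradient to zero gives $\nabla F(x_n) + \frac{1}{\eta_n}(x - x_n) = 0$, whose solution is exactly the gradient step.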

Formulation

We are given a convex function $f$ to optimize over a convex set $K \subset \mathbb{R}^n$, and a norm $\|\cdot\|$ on $\mathbb{R}^n$.

We are also given a differentiable convex function $h \colon \mathbb{R}^n \to \mathbb{R}$ that is $\alpha$-strongly convex with respect to the given norm. This is called the distance-generating function, and its gradient $\nabla h \colon \mathbb{R}^n \to \mathbb{R}^n$ is known as the mirror map.
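A standard concrete choice on the probability simplex is the negative entropy $h(x) = \sum_i x_i \log x_i$, whose Bregman divergence is the KL divergence. The sketch below (illustrative vectors, not from the article) checks this identity numerically:

```python
import numpy as np

def h(x):
    """Negative entropy, a common distance-generating function on the simplex."""
    return np.sum(x * np.log(x))

def grad_h(x):
    """The mirror map: (grad h(x))_i = 1 + log(x_i)."""
    return 1.0 + np.log(x)

def bregman(x, y):
    """Bregman divergence D_h(x || y) = h(x) - h(y) - <grad h(y), x - y>."""
    return h(x) - h(y) - grad_h(y) @ (x - y)

x = np.array([0.2, 0.3, 0.5])
y = np.array([0.4, 0.4, 0.2])
kl = np.sum(x * np.log(x / y))   # KL divergence, for comparison

print(np.isclose(bregman(x, y), kl))   # True
```

For points on the simplex the linear terms cancel, which is why $D_h$ collapses to the KL divergence here.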

Starting from an initial point $x_0 \in K$, each iteration of mirror descent performs:

- Map to the dual space: $\theta_t \leftarrow \nabla h(x_t)$
- Update in the dual space with a gradient step: $\theta_{t+1} \leftarrow \theta_t - \eta_t \nabla f(x_t)$
- Map back to the primal space: $x'_{t+1} \leftarrow (\nabla h)^{-1}(\theta_{t+1})$
- Project back onto the feasible region $K$: $x_{t+1} \leftarrow \arg\min_{x \in K} D_h(x \,\|\, x'_{t+1})$, where $D_h$ is the Bregman divergence.
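The four steps above can be sketched in a few lines. With the negative-entropy mirror map on the probability simplex, the update reduces to the multiplicative-weights / exponentiated-gradient rule $x_{t+1,i} \propto x_{t,i} \exp(-\eta_t \nabla f(x_t)_i)$; the linear objective and step size below are illustrative assumptions, not from the article:

```python
import numpy as np

def mirror_descent_step(x, grad, eta):
    """One mirror-descent iteration with the negative-entropy mirror map."""
    theta = 1.0 + np.log(x)        # map to dual space: grad h(x)
    theta = theta - eta * grad     # gradient step in the dual space
    x_new = np.exp(theta - 1.0)    # map back to primal: (grad h)^{-1}
    return x_new / x_new.sum()     # Bregman (KL) projection onto the simplex

# Minimize f(x) = <c, x> over the simplex; the optimum puts all mass
# on the coordinate with the smallest cost (here, coordinate 1).
c = np.array([0.9, 0.1, 0.5])
x = np.ones(3) / 3
for _ in range(50):
    x = mirror_descent_step(x, c, eta=0.5)

print(np.round(x, 3))   # mass concentrates on coordinate 1
```

For the entropy mirror map, the KL projection onto the simplex is just a renormalization, which is why the projection step is a single division here.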

Extensions

Mirror descent in the online optimization setting is known as Online Mirror Descent (OMD).[4]

Notes and References

  1. Arkadi Nemirovsky and David Yudin. Problem Complexity and Method Efficiency in Optimization. John Wiley & Sons, 1983.
  2. Nemirovski, Arkadi (2012). Tutorial: Mirror descent algorithms for large-scale deterministic and stochastic convex optimization. https://www2.isye.gatech.edu/~nemirovs/COLT2012Tut.pdf
  3. "Mirror descent algorithm". tlienart.github.io. Retrieved 2022-07-10.
  4. Fang, Huang; Harvey, Nicholas J. A.; Portella, Victor S.; Friedlander, Michael P. (2021-09-03). "Online mirror descent and dual averaging: keeping pace in the dynamic case". arXiv:2006.02585 [cs.LG].