Smooth maximum
In mathematics, a smooth maximum of an indexed family x_1, \ldots, x_n of numbers is a smooth approximation to the maximum function \max(x_1, \ldots, x_n), meaning a parametric family of functions m_\alpha(x_1, \ldots, x_n) such that for every \alpha, the function m_\alpha is smooth, and the family converges to the maximum function as \alpha \to \infty. The concept of smooth minimum is similarly defined. In many cases, a single family approximates both: maximum as the parameter goes to positive infinity, minimum as the parameter goes to negative infinity; in symbols, m_\alpha \to \max as \alpha \to +\infty and m_\alpha \to \min as \alpha \to -\infty. The term can also be used loosely for a specific smooth function that behaves similarly to a maximum, without necessarily being part of a parametrized family.
Examples
Boltzmann operator
For large positive values of the parameter \alpha, the following formulation is a smooth, differentiable approximation of the maximum function. For negative values of the parameter that are large in absolute value, it approximates the minimum.

\mathcal{S}_\alpha(x_1, \ldots, x_n) = \frac{\sum_{i=1}^n x_i e^{\alpha x_i}}{\sum_{i=1}^n e^{\alpha x_i}}

\mathcal{S}_\alpha has the following properties:
- \mathcal{S}_\alpha \to \max as \alpha \to \infty
- \mathcal{S}_0 is the arithmetic mean of its inputs
- \mathcal{S}_\alpha \to \min as \alpha \to -\infty
The gradient of \mathcal{S}_\alpha is closely related to softmax and is given by

\nabla_{x_i} \mathcal{S}_\alpha(x_1, \ldots, x_n) = \frac{e^{\alpha x_i}}{\sum_{j=1}^n e^{\alpha x_j}} \left[ 1 + \alpha \left( x_i - \mathcal{S}_\alpha(x_1, \ldots, x_n) \right) \right].
This makes the softmax function useful for optimization techniques that use gradient descent.
This operator is sometimes called the Boltzmann operator,[1] after the Boltzmann distribution.
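As a concrete sketch, here is a minimal NumPy implementation of the operator and its analytic gradient (the names boltzmann_op and boltzmann_grad are ours, not from the cited source, and the sample values of \alpha are illustrative). Shifting the exponents by their maximum avoids overflow; the shift cancels between numerator and denominator.

    import numpy as np

    def boltzmann_op(x, alpha):
        # S_alpha(x) = sum_i x_i exp(alpha x_i) / sum_i exp(alpha x_i),
        # computed with a max-shift in the exponent for stability.
        x = np.asarray(x, dtype=float)
        z = alpha * x
        w = np.exp(z - z.max())
        return np.sum(x * w) / np.sum(w)

    def boltzmann_grad(x, alpha):
        # Gradient formula above: softmax(alpha x)_i * (1 + alpha (x_i - S_alpha)).
        x = np.asarray(x, dtype=float)
        z = alpha * x
        w = np.exp(z - z.max())
        p = w / np.sum(w)  # softmax(alpha * x)
        return p * (1.0 + alpha * (x - boltzmann_op(x, alpha)))

    x = [1.0, 2.0, 3.0]
    print(boltzmann_op(x, 10.0))    # ~3.0: approaches the maximum
    print(boltzmann_op(x, 0.0))     # 2.0: the arithmetic mean
    print(boltzmann_op(x, -10.0))   # ~1.0: approaches the minimum
    print(boltzmann_grad(x, 10.0))  # weight concentrates on the largest input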
LogSumExp
See main article: LogSumExp. Another smooth maximum is LogSumExp:

\mathrm{LSE}_\alpha(x_1, \ldots, x_n) = \frac{1}{\alpha} \log \sum_{i=1}^n \exp(\alpha x_i)
This can also be normalized if the x_i are all non-negative, yielding a function with domain [0, \infty)^n and range [0, \infty):

g(x_1, \ldots, x_n) = \log\left( \sum_{i=1}^n \exp(x_i) - (n - 1) \right)
The (n - 1) term corrects for the fact that \exp(0) = 1 by canceling out all but one zero exponential, and g(x_1, \ldots, x_n) = 0 if all x_i are zero.
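A minimal sketch of both forms, assuming NumPy (the helper names lse and lse_normalized are ours). For \alpha > 0, \mathrm{LSE}_\alpha always lies above the true maximum, by at most \log(n)/\alpha.

    import numpy as np

    def lse(x, alpha=1.0):
        # LSE_alpha(x) = (1/alpha) log sum_i exp(alpha x_i),
        # evaluated with the usual max-shift for numerical stability.
        z = alpha * np.asarray(x, dtype=float)
        m = z.max()
        return (m + np.log(np.sum(np.exp(z - m)))) / alpha

    def lse_normalized(x):
        # g(x) = log(sum_i exp(x_i) - (n - 1)) for non-negative x_i;
        # the n - 1 term cancels the surplus exp(0) = 1 contributions,
        # so g(0, ..., 0) = 0 exactly.
        x = np.asarray(x, dtype=float)
        return np.log(np.sum(np.exp(x)) - (len(x) - 1))

    x = [1.0, 2.0, 3.0]
    print(lse(x, alpha=100.0))         # ~3.0; overshoots max by at most log(n)/alpha
    print(lse_normalized([0.0, 0.0]))  # 0.0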
Mellowmax
The mellowmax operator[1] is defined as follows:
mm_\alpha(x) = \frac{1}{\alpha} \log \frac{1}{n} \sum_{i=1}^n \exp(\alpha x_i)
It is a non-expansive operator. As \alpha \to \infty, it acts like a maximum. As \alpha \to 0, it acts like an arithmetic mean. As \alpha \to -\infty, it acts like a minimum. This operator can be viewed as a particular instantiation of the quasi-arithmetic mean. It can also be derived from information-theoretic principles as a way of regularizing policies with a cost function defined by KL divergence. The operator has previously been utilized in other areas, such as power engineering.[2]
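A minimal sketch, assuming NumPy (the name mellowmax and the sample values of \alpha are ours). Note that mellowmax is just \mathrm{LSE}_\alpha shifted down by \log(n)/\alpha, which is also how the max-shift trick keeps it stable:

    import numpy as np

    def mellowmax(x, alpha):
        # mm_alpha(x) = (1/alpha) log((1/n) sum_i exp(alpha x_i))
        #             = LSE_alpha(x) - log(n) / alpha.
        z = alpha * np.asarray(x, dtype=float)
        m = z.max()  # max-shift for numerical stability
        return (m + np.log(np.mean(np.exp(z - m)))) / alpha

    x = [1.0, 2.0, 3.0]
    print(mellowmax(x, 50.0))   # ~3: acts like a maximum
    print(mellowmax(x, 1e-9))   # ~2: acts like the arithmetic mean
    print(mellowmax(x, -50.0))  # ~1: acts like a minimum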
Notes and References
- Asadi, Kavosh; Littman, Michael L. (2017). "An Alternative Softmax Operator for Reinforcement Learning". Proceedings of Machine Learning Research. 70: 243–252. arXiv:1612.05628. Retrieved January 6, 2023.
- Safak, Aysel (February 1993). "Statistical analysis of the power sum of multiple correlated log-normal components". IEEE Transactions on Vehicular Technology. 42 (1).