Smooth maximum
In mathematics, a smooth maximum of an indexed family x_1, \ldots, x_n of numbers is a smooth approximation to the maximum function \max(x_1, \ldots, x_n), meaning a parametric family of functions m_\alpha(x_1, \ldots, x_n) such that for every \alpha, the function m_\alpha is smooth, and the family converges to the maximum function as \alpha \to \infty. The concept of smooth minimum is similarly defined. In many cases, a single family approximates both: maximum as the parameter goes to positive infinity, minimum as the parameter goes to negative infinity; in symbols, m_\alpha \to \max as \alpha \to +\infty and m_\alpha \to \min as \alpha \to -\infty. The term can also be used loosely for a specific smooth function that behaves similarly to a maximum, without necessarily being part of a parametrized family.
Examples
Boltzmann operator
For large positive values of the parameter \alpha, the following formulation is a smooth, differentiable approximation of the maximum function. For negative values of the parameter that are large in absolute value, it approximates the minimum.

\mathcal{S}_\alpha(x_1, \ldots, x_n) = \frac{\sum_{i=1}^n x_i e^{\alpha x_i}}{\sum_{i=1}^n e^{\alpha x_i}}

\mathcal{S}_\alpha has the following properties:
- \mathcal{S}_\alpha \to \max as \alpha \to \infty
- \mathcal{S}_0 is the arithmetic mean of its inputs
- \mathcal{S}_\alpha \to \min as \alpha \to -\infty
The gradient of \mathcal{S}_\alpha is closely related to softmax and is given by

\nabla_{x_i} \mathcal{S}_\alpha(x_1, \ldots, x_n) = \frac{e^{\alpha x_i}}{\sum_{j=1}^n e^{\alpha x_j}} \left[ 1 + \alpha \left( x_i - \mathcal{S}_\alpha(x_1, \ldots, x_n) \right) \right].
This makes the softmax function useful for optimization techniques that use gradient descent.
This operator is sometimes called the Boltzmann operator,[1] after the Boltzmann distribution.
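As a concrete sketch, here is a minimal NumPy implementation of the operator and its analytic gradient (the names boltzmann_op and boltzmann_grad are ours, not from the cited source, and the sample values of \alpha are illustrative). Shifting the exponents by their maximum avoids overflow; the shift cancels between numerator and denominator.

    import numpy as np

    def boltzmann_op(x, alpha):
        # S_alpha(x) = sum_i x_i exp(alpha x_i) / sum_i exp(alpha x_i),
        # computed with a max-shift in the exponent for stability.
        x = np.asarray(x, dtype=float)
        z = alpha * x
        w = np.exp(z - z.max())
        return np.sum(x * w) / np.sum(w)

    def boltzmann_grad(x, alpha):
        # Gradient formula above: softmax(alpha x)_i * (1 + alpha (x_i - S_alpha)).
        x = np.asarray(x, dtype=float)
        z = alpha * x
        w = np.exp(z - z.max())
        p = w / np.sum(w)  # softmax(alpha * x)
        return p * (1.0 + alpha * (x - boltzmann_op(x, alpha)))

    x = [1.0, 2.0, 3.0]
    print(boltzmann_op(x, 10.0))    # ~3.0: approaches the maximum
    print(boltzmann_op(x, 0.0))     # 2.0: the arithmetic mean
    print(boltzmann_op(x, -10.0))   # ~1.0: approaches the minimum
    print(boltzmann_grad(x, 10.0))  # weight concentrates on the largest input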
LogSumExp
See main article: LogSumExp. Another smooth maximum is LogSumExp:

\mathrm{LSE}_\alpha(x_1, \ldots, x_n) = \frac{1}{\alpha} \log \sum_{i=1}^n \exp(\alpha x_i)
This can also be normalized if the x_i are all non-negative, yielding a function with domain [0, \infty)^n and range [0, \infty):

g(x_1, \ldots, x_n) = \log\left( \sum_{i=1}^n \exp(x_i) - (n - 1) \right)
The (n - 1) term corrects for the fact that \exp(0) = 1 by canceling out all but one zero exponential, and g(x_1, \ldots, x_n) = 0 if all x_i are zero.
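A minimal sketch of both forms, assuming NumPy (the helper names lse and lse_normalized are ours). For \alpha > 0, \mathrm{LSE}_\alpha always lies above the true maximum, by at most \log(n)/\alpha.

    import numpy as np

    def lse(x, alpha=1.0):
        # LSE_alpha(x) = (1/alpha) log sum_i exp(alpha x_i),
        # evaluated with the usual max-shift for numerical stability.
        z = alpha * np.asarray(x, dtype=float)
        m = z.max()
        return (m + np.log(np.sum(np.exp(z - m)))) / alpha

    def lse_normalized(x):
        # g(x) = log(sum_i exp(x_i) - (n - 1)) for non-negative x_i;
        # the n - 1 term cancels the surplus exp(0) = 1 contributions,
        # so g(0, ..., 0) = 0 exactly.
        x = np.asarray(x, dtype=float)
        return np.log(np.sum(np.exp(x)) - (len(x) - 1))

    x = [1.0, 2.0, 3.0]
    print(lse(x, alpha=100.0))         # ~3.0; overshoots max by at most log(n)/alpha
    print(lse_normalized([0.0, 0.0]))  # 0.0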
Mellowmax
The mellowmax operator[1] is defined as follows:
mm_\alpha(x) = \frac{1}{\alpha} \log \frac{1}{n} \sum_{i=1}^n \exp(\alpha x_i)
It is a non-expansive operator. As \alpha \to \infty, it acts like a maximum. As \alpha \to 0, it acts like an arithmetic mean. As \alpha \to -\infty, it acts like a minimum. This operator can be viewed as a particular instantiation of the quasi-arithmetic mean. It can also be derived from information-theoretic principles as a way of regularizing policies with a cost function defined by KL divergence. The operator has previously been utilized in other areas, such as power engineering.[2]
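A minimal sketch, assuming NumPy (the name mellowmax and the sample values of \alpha are ours). Note that mellowmax is just \mathrm{LSE}_\alpha shifted down by \log(n)/\alpha, which is also how the max-shift trick keeps it stable:

    import numpy as np

    def mellowmax(x, alpha):
        # mm_alpha(x) = (1/alpha) log((1/n) sum_i exp(alpha x_i))
        #             = LSE_alpha(x) - log(n) / alpha.
        z = alpha * np.asarray(x, dtype=float)
        m = z.max()  # max-shift for numerical stability
        return (m + np.log(np.mean(np.exp(z - m)))) / alpha

    x = [1.0, 2.0, 3.0]
    print(mellowmax(x, 50.0))   # ~3: acts like a maximum
    print(mellowmax(x, 1e-9))   # ~2: acts like the arithmetic mean
    print(mellowmax(x, -50.0))  # ~1: acts like a minimum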
Notes and References
- Asadi, Kavosh; Littman, Michael L. (2017). "An Alternative Softmax Operator for Reinforcement Learning". Proceedings of Machine Learning Research. 70: 243–252. arXiv:1612.05628. Retrieved January 6, 2023.
- Safak, Aysel (February 1993). "Statistical analysis of the power sum of multiple correlated log-normal components". IEEE Transactions on Vehicular Technology. 42 (1).