Continuous Bernoulli distribution

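The continuous Bernoulli distribution is a family of continuous probability distributions parameterized by a single shape parameter λ ∈ (0, 1), defined on the unit interval x ∈ [0, 1] by: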

p(x \mid \lambda) \propto \lambda^{x}(1-\lambda)^{1-x}.
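The definition above leaves the normalizing constant implicit. As a minimal sketch, the Python below assumes the known closed form C(λ) = 2 artanh(1 - 2λ)/(1 - 2λ) for λ ≠ 1/2, with C(1/2) = 2 as its limit, and checks numerically that the resulting density integrates to 1. The function names (`cb_log_norm_const`, `cb_log_pdf`, `cb_pdf`, `integrate_midpoint`) are illustrative and not taken from any library.

```python
import math


def cb_log_norm_const(lam: float) -> float:
    """Return log C(lam), the log normalizing constant of the continuous Bernoulli.

    Assumes the closed form C(lam) = 2 * artanh(1 - 2*lam) / (1 - 2*lam) for
    lam != 1/2, and uses its limit C(1/2) = 2 inside a tiny window around 1/2.
    """
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must lie in the open interval (0, 1)")
    t = 1.0 - 2.0 * lam
    if abs(t) < 1e-6:
        # The closed form is 0/0 at lam = 1/2; use the limiting value C = 2.
        return math.log(2.0)
    return math.log(2.0 * math.atanh(t) / t)


def cb_log_pdf(x: float, lam: float) -> float:
    """Log-density log p(x | lam); returns -inf outside [0, 1]."""
    if not 0.0 <= x <= 1.0:
        return -math.inf
    # log C(lam) + x * log(lam) + (1 - x) * log(1 - lam)
    return cb_log_norm_const(lam) + x * math.log(lam) + (1.0 - x) * math.log1p(-lam)


def cb_pdf(x: float, lam: float) -> float:
    """Density p(x | lam)."""
    return math.exp(cb_log_pdf(x, lam))


def integrate_midpoint(f, n: int = 100_000) -> float:
    """Approximate the integral of f over [0, 1] with the midpoint rule."""
    h = 1.0 / n
    return h * sum(f((i + 0.5) * h) for i in range(n))


for lam in (0.1, 0.5, 0.9):
    total = integrate_midpoint(lambda x: cb_pdf(x, lam))
    # Each total should be very close to 1.0.
    print(f"lambda = {lam}: integral of p(x | lambda) over [0, 1] = {total:.6f}")
```

Working in log space, with `math.log1p(-lam)` for log(1 - λ), keeps the evaluation accurate for λ near 0; the small window around λ = 1/2 sidesteps the 0/0 form of the closed expression.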

The continuous Bernoulli distribution arises in deep learning and computer vision, specifically in the context of variational autoencoders,^[4] ^[5] where it is used to model the pixel intensities of natural images. In that role, it provides a proper probabilistic counterpart to the commonly used binary cross-entropy loss, which is often applied to continuous, [0, 1]-valued data.^[6] ^[7] ^[8] ^[9] That practice amounts to ignoring the normalizing constant of the continuous Bernoulli distribution, since the binary cross-entropy loss defines a true log-likelihood only for discrete, {0, 1}-valued data.
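To make the relationship with the binary cross-entropy concrete, the sketch below (plain Python, hypothetical helper names) computes both quantities for a pixel value x and a predicted λ. It assumes the standard closed-form normalizer C(λ) = 2 artanh(1 - 2λ)/(1 - 2λ), with C(1/2) = 2, and shows that the continuous Bernoulli log-likelihood equals the negated BCE plus log C(λ): a term that is the same for every x but varies with λ.

```python
import math


def bce(x: float, lam: float) -> float:
    """Binary cross-entropy between a target x in [0, 1] and a prediction lam in (0, 1)."""
    return -(x * math.log(lam) + (1.0 - x) * math.log1p(-lam))


def cb_log_norm_const(lam: float) -> float:
    """log C(lam) for the continuous Bernoulli, with C(1/2) = 2 taken as the limit."""
    t = 1.0 - 2.0 * lam
    if abs(t) < 1e-6:
        return math.log(2.0)
    return math.log(2.0 * math.atanh(t) / t)


def cb_log_likelihood(x: float, lam: float) -> float:
    """Continuous Bernoulli log-likelihood: negated BCE plus the log-normalizer."""
    return -bce(x, lam) + cb_log_norm_const(lam)


pixels = (0.0, 0.25, 0.7, 1.0)
for lam in (0.2, 0.5, 0.8):
    # Gap between the true log-likelihood and -BCE for each pixel value x.
    gaps = [cb_log_likelihood(x, lam) - (-bce(x, lam)) for x in pixels]
    # The gap is identical across x (it equals log C(lam)) but changes with lam.
    print(f"lambda = {lam}: gaps = {[round(g, 6) for g in gaps]}")
```

Because log C(λ) depends on λ, an objective that drops it has a different gradient with respect to the predicted λ, so minimizing the BCE alone does not maximize a proper likelihood for [0, 1]-valued data.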

The continuous Bernoulli also defines an exponential family of distributions. With the natural parameter η = log(λ / (1 - λ)), the density can be rewritten in canonical form:

p(x \mid \eta) \propto \exp(\eta x).
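The canonical form follows from a one-line rewriting of the density. The derivation below also integrates exp(ηx) over [0, 1] to obtain the log-partition function A(η), which is not stated elsewhere in this article; it assumes η ≠ 0 (at η = 0, that is λ = 1/2, the density is uniform and A(0) = 0 by continuity).

```latex
\begin{aligned}
\lambda^{x}(1-\lambda)^{1-x}
  &= (1-\lambda)\left(\frac{\lambda}{1-\lambda}\right)^{x}
   = (1-\lambda)\,\exp(\eta x), \\
\int_0^1 \exp(\eta x)\,dx &= \frac{e^{\eta}-1}{\eta} \qquad (\eta \neq 0), \\
p(x \mid \eta) &= \frac{\eta\,\exp(\eta x)}{e^{\eta}-1}
  = \exp\bigl(\eta x - A(\eta)\bigr),
  \qquad A(\eta) = \log\frac{e^{\eta}-1}{\eta}.
\end{aligned}
```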

Related distributions

Bernoulli distribution

The continuous Bernoulli can be thought of as a continuous relaxation of the Bernoulli distribution, which is defined on the discrete set {0, 1} by the probability mass function:

p(x) = p^{x}(1-p)^{1-x},

where p is a scalar parameter between 0 and 1. Applying the same functional form on the continuous interval [0, 1] yields the continuous Bernoulli probability density function, up to a normalizing constant.
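A small numerical illustration of "up to a normalizing constant": the same expression p^x(1 - p)^(1-x) sums to 1 over the two points {0, 1}, but its integral over [0, 1] is (2p - 1)/log(p/(1 - p)) for p ≠ 1/2 (and 1/2 at p = 1/2), which by the weighted AM-GM inequality never exceeds 1/2. The sketch uses only the Python standard library; the helper names are hypothetical.

```python
import math


def bernoulli_form(x: float, p: float) -> float:
    """The shared expression p**x * (1 - p)**(1 - x)."""
    return p ** x * (1.0 - p) ** (1.0 - x)


def integral_over_unit_interval(p: float) -> float:
    """Closed-form integral of bernoulli_form(x, p) over x in [0, 1].

    Equals (2p - 1) / log(p / (1 - p)) for p != 1/2, and 1/2 at p = 1/2.
    """
    if abs(p - 0.5) < 1e-6:
        return 0.5
    return (2.0 * p - 1.0) / math.log(p / (1.0 - p))


for p in (0.1, 0.5, 0.9):
    # Discrete support {0, 1}: the pmf sums to 1.
    mass_discrete = bernoulli_form(0.0, p) + bernoulli_form(1.0, p)
    # Continuous support [0, 1]: the same expression does not integrate to 1.
    mass_continuous = integral_over_unit_interval(p)
    print(f"p = {p}: sum over {{0, 1}} = {mass_discrete:.4f}, "
          f"integral over [0, 1] = {mass_continuous:.4f}")
```

The relaxed form therefore has to be rescaled by a factor of at least 2, and that factor depends on p.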

Beta distribution

The Beta distribution has the density function:

p(x) \propto x^{\alpha-1}(1-x)^{\beta-1},

which can be rewritten as:

p(x) \propto x_1^{\alpha_1-1}\, x_2^{\alpha_2-1},

where α₁, α₂ are positive scalar parameters, and (x₁, x₂) = (x, 1 - x) represents an arbitrary point inside the 1-simplex Δ¹ = {(x₁, x₂) : x₁ > 0, x₂ > 0, x₁ + x₂ = 1}. Switching the roles of the parameter and the argument in this density function, we obtain:

p(x) \propto \alpha_1^{x_1}\, \alpha_2^{x_2}.

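Because x₁ + x₂ = 1, multiplying both α₁ and α₂ by a common factor c > 0 multiplies the unnormalized density by c^(x₁ + x₂) = c, a constant, so this family is identifiable only up to scale. Imposing the linear constraint α₁ + α₂ = 1 and writing α₁ = λ, α₂ = 1 - λ, we obtain: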

p(x) \propto \lambda^{x_1}(1-\lambda)^{x_2},

which, with x₁ = x and x₂ = 1 - x, corresponds exactly to the continuous Bernoulli density.
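The identifiability argument can be checked numerically. The sketch below (illustrative helper names) evaluates the swapped kernel α₁^(x₁) α₂^(x₂) at (x₁, x₂) = (x, 1 - x) with (α₁, α₂) = (cλ, c(1 - λ)) and divides by the continuous Bernoulli kernel; the ratio is the constant c for every x, so only λ = α₁/(α₁ + α₂) affects the normalized density.

```python
def swapped_beta_kernel(x: float, a1: float, a2: float) -> float:
    """Unnormalized a1**x1 * a2**x2 at the simplex point (x1, x2) = (x, 1 - x)."""
    x1, x2 = x, 1.0 - x
    return a1 ** x1 * a2 ** x2


def cb_kernel(x: float, lam: float) -> float:
    """Unnormalized continuous Bernoulli density lam**x * (1 - lam)**(1 - x)."""
    return lam ** x * (1.0 - lam) ** (1.0 - x)


lam, c = 0.3, 5.0
xs = (0.0, 0.2, 0.5, 0.9, 1.0)
# With (a1, a2) = (c*lam, c*(1 - lam)) the kernel picks up a factor c**(x1 + x2) = c,
# so the ratio is the same constant c at every x.
ratios = [swapped_beta_kernel(x, c * lam, c * (1.0 - lam)) / cb_kernel(x, lam) for x in xs]
print([round(r, 10) for r in ratios])
```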

Exponential distribution

An exponential distribution with rate β > 0, restricted to the unit interval and renormalized, is equivalent to a continuous Bernoulli distribution with natural parameter η = -β, that is, with λ = 1/(1 + e^β).
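A quick numerical check of this equivalence, assuming a rate β > 0 and the mapping λ = 1/(1 + e^β), so that η = log(λ/(1 - λ)) = -β. The continuous Bernoulli density uses the standard closed-form normalizer C(λ) = 2 artanh(1 - 2λ)/(1 - 2λ), which is valid here because β > 0 implies λ < 1/2. Function names are illustrative.

```python
import math


def truncated_exponential_pdf(x: float, rate: float) -> float:
    """Exponential(rate) density restricted to [0, 1] and renormalized (assumes rate > 0)."""
    return rate * math.exp(-rate * x) / (1.0 - math.exp(-rate))


def cb_pdf(x: float, lam: float) -> float:
    """Continuous Bernoulli density for lam != 1/2, via C(lam) = 2*artanh(1-2*lam)/(1-2*lam)."""
    t = 1.0 - 2.0 * lam
    norm_const = 2.0 * math.atanh(t) / t
    return norm_const * lam ** x * (1.0 - lam) ** (1.0 - x)


rate = 2.5                          # assumed rate beta > 0
lam = 1.0 / (1.0 + math.exp(rate))  # gives eta = log(lam / (1 - lam)) = -rate
for x in (0.0, 0.3, 0.75, 1.0):
    print(f"x = {x}: truncated exponential = {truncated_exponential_pdf(x, rate):.6f}, "
          f"continuous Bernoulli = {cb_pdf(x, lam):.6f}")
```

The converse does not hold in general: λ > 1/2 corresponds to η > 0, i.e. a negative "rate", which is not an exponential distribution.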

Continuous categorical distribution

The multivariate generalization of the continuous Bernoulli is called the continuous categorical distribution.^[10]

Notes and References

[1] Loaiza-Ganem, G., & Cunningham, J. P. (2019). The continuous Bernoulli: fixing a pervasive error in variational autoencoders. In Advances in Neural Information Processing Systems (pp. 13266-13276).
[2] PyTorch Distributions. https://pytorch.org/docs/stable/distributions.html#continuousbernoulli
[3] TensorFlow Probability. https://www.tensorflow.org/probability/api_docs/python/tfp/edward2/ContinuousBernoulli
[4] Kingma, D. P., & Welling, M. (2013). Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.
[5] Kingma, D. P., & Welling, M. (2014, April). Stochastic gradient VB and the variational auto-encoder. In Second International Conference on Learning Representations, ICLR (Vol. 19).
[6] Larsen, A. B. L., Sønderby, S. K., Larochelle, H., & Winther, O. (2016, June). Autoencoding beyond pixels using a learned similarity metric. In International Conference on Machine Learning (pp. 1558-1566).
[7] Jiang, Z., Zheng, Y., Tan, H., Tang, B., & Zhou, H. (2017, August). Variational deep embedding: an unsupervised and generative approach to clustering. In Proceedings of the 26th International Joint Conference on Artificial Intelligence (pp. 1965-1972).
[8] PyTorch VAE tutorial. https://github.com/pytorch/examples/tree/master/vae
[9] Keras VAE tutorial. https://blog.keras.io/building-autoencoders-in-keras.html
[10] Gordon-Rodriguez, E., Loaiza-Ganem, G., & Cunningham, J. P. (2020). The continuous categorical: a novel simplex-valued exponential family. In 37th International Conference on Machine Learning, ICML 2020. International Machine Learning Society (IMLS).