The Cauchy distribution, named after Augustin-Louis Cauchy, is a continuous probability distribution. It is also known, especially among physicists, as the Lorentz distribution (after Hendrik Lorentz), Cauchy–Lorentz distribution, Lorentz(ian) function, or Breit–Wigner distribution. The Cauchy distribution f(x; x_0, \gamma) is the distribution of the x-intercept of a ray issuing from (x_0, \gamma) with a uniformly distributed angle.
The Cauchy distribution is often used in statistics as the canonical example of a "pathological" distribution since both its expected value and its variance are undefined (but see below). The Cauchy distribution does not have finite moments of order greater than or equal to one; only fractional absolute moments exist.[1] The Cauchy distribution has no moment generating function.
In mathematics, it is closely related to the Poisson kernel, which is the fundamental solution for the Laplace equation in the upper half-plane.
It is one of the few stable distributions with a probability density function that can be expressed analytically, the others being the normal distribution and the Lévy distribution.
A function with the form of the density function of the Cauchy distribution was studied geometrically by Fermat in 1659, and later was known as the witch of Agnesi, after Agnesi included it as an example in her 1748 calculus textbook. Despite its name, the first explicit analysis of the properties of the Cauchy distribution was published by the French mathematician Poisson in 1824, with Cauchy only becoming associated with it during an academic controversy in 1853.[2] Poisson noted that if the mean of observations following such a distribution were taken, the mean error did not converge to any finite number. As such, Laplace's use of the central limit theorem with such a distribution was inappropriate, as it assumed a finite mean and variance. Despite this, Poisson did not regard the issue as important, in contrast to Bienaymé, who was to engage Cauchy in a long dispute over the matter.
Here are the most important constructions.
If one stands in front of a line and kicks a ball with a direction (more precisely, an angle) uniformly at random towards the line, then the distribution of the point where the ball hits the line is a Cauchy distribution.
More formally, consider a point at (x_0, \gamma) in the x-y plane, and select a line passing through the point with its direction (angle with the x-axis) chosen uniformly at random. The intersection of the line with the x-axis follows a Cauchy distribution with location x_0 and scale \gamma.
This definition gives a simple way to sample from the standard Cauchy distribution. Let u be a sample from a uniform distribution on [0,1]; then

x = \tan\left(\pi\left(u - \tfrac{1}{2}\right)\right)

can be used as a sample from the standard Cauchy distribution.
When U and V are two independent standard normally distributed random variables, the ratio U/V has the standard Cauchy distribution.
More generally, if (U, V) has a rotationally symmetric distribution on the plane, the ratio U/V has the standard Cauchy distribution.
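As an illustration (not part of the original text), the following Python sketch draws standard Cauchy samples using the two constructions just described; the random seed and sample size are arbitrary choices.

```python
# Minimal sketch: sampling the standard Cauchy distribution two ways.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# 1. Inverse-CDF method: x = tan(pi * (u - 1/2)) with u uniform on [0, 1].
u = rng.uniform(0.0, 1.0, size=n)
samples_tan = np.tan(np.pi * (u - 0.5))

# 2. Ratio of two independent standard normal variables.
x, y = rng.standard_normal(n), rng.standard_normal(n)
samples_ratio = x / y

# Both samples should have median ~ 0 and interquartile range ~ 2,
# since the quartiles of the standard Cauchy are -1 and +1.
for name, s in [("tan", samples_tan), ("ratio", samples_ratio)]:
    q1, q2, q3 = np.percentile(s, [25, 50, 75])
    print(f"{name:>5}: median={q2:+.3f}, IQR={q3 - q1:.3f}")
```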
The Cauchy distribution is the probability distribution with the following probability density function (PDF)[1] [3]
f(x; x_0, \gamma) = \frac{1}{\pi\gamma\left[1 + \left(\frac{x - x_0}{\gamma}\right)^2\right]} = \frac{1}{\pi}\left[\frac{\gamma}{(x - x_0)^2 + \gamma^2}\right],
where x_0 is the location parameter, specifying the location of the peak of the distribution, and \gamma is the scale parameter, which specifies the half-width at half-maximum (HWHM); alternatively, 2\gamma is the full width at half maximum (FWHM). \gamma is also equal to half the interquartile range and is sometimes called the probable error.
The maximum value or amplitude of the Cauchy PDF is \frac{1}{\pi\gamma}, located at x = x_0.
It is sometimes convenient to express the PDF in terms of the complex parameter \psi = x_0 + i\gamma:

f(x; \psi) = \frac{1}{\pi}\,\frac{\operatorname{Im}(\psi)}{|x - \psi|^2} = \frac{1}{\pi}\,\operatorname{Im}\left(\frac{1}{x - \psi}\right).
The special case when x_0 = 0 and \gamma = 1 is called the standard Cauchy distribution, with the probability density function

f(x; 0, 1) = \frac{1}{\pi(1 + x^2)}.
In physics, a three-parameter Lorentzian function is often used:

f(x; x_0, \gamma, I) = \frac{I}{1 + \left(\frac{x - x_0}{\gamma}\right)^2} = I\left[\frac{\gamma^2}{(x - x_0)^2 + \gamma^2}\right],

where I is the height of the peak. The three-parameter Lorentzian function indicated is not, in general, a probability density function, since it does not integrate to 1, except in the special case where

I = \frac{1}{\pi\gamma}.
The Cauchy distribution is the probability distribution with the following cumulative distribution function (CDF):
F(x; x_0, \gamma) = \frac{1}{\pi}\arctan\left(\frac{x - x_0}{\gamma}\right) + \frac{1}{2},
and the quantile function (inverse cdf) of the Cauchy distribution is
Q(p; x_0, \gamma) = x_0 + \gamma\tan\left[\pi\left(p - \tfrac{1}{2}\right)\right].
It follows that the first and third quartiles are (x_0 - \gamma, x_0 + \gamma), and hence the interquartile range is 2\gamma. For the standard distribution, the cumulative distribution function simplifies to the arctangent function \arctan(x):

F(x; 0, 1) = \frac{1}{\pi}\arctan(x) + \frac{1}{2}.
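The following Python sketch (illustrative, not from the original text) implements the PDF, CDF and quantile function given above and checks the quantile/CDF round trip and the quartile relation; the parameter values are arbitrary.

```python
# PDF, CDF and quantile function of the Cauchy distribution, with a round-trip check.
import numpy as np

def cauchy_pdf(x, x0=0.0, gamma=1.0):
    return gamma / (np.pi * ((x - x0) ** 2 + gamma ** 2))

def cauchy_cdf(x, x0=0.0, gamma=1.0):
    return np.arctan((x - x0) / gamma) / np.pi + 0.5

def cauchy_quantile(p, x0=0.0, gamma=1.0):
    return x0 + gamma * np.tan(np.pi * (p - 0.5))

p = np.linspace(0.01, 0.99, 99)
x = cauchy_quantile(p, x0=2.0, gamma=3.0)
assert np.allclose(cauchy_cdf(x, x0=2.0, gamma=3.0), p)   # F(Q(p)) == p

# First and third quartiles are x0 -/+ gamma, so the interquartile range is 2*gamma.
print(cauchy_quantile(np.array([0.25, 0.75]), x0=2.0, gamma=3.0))  # [-1.  5.]
```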
The standard Cauchy distribution is the Student's t-distribution with one degree of freedom, and so it may be constructed by any method that constructs the Student's t-distribution.
If \Sigma is a p \times p positive-semidefinite covariance matrix with strictly positive diagonal entries, then for independent and identically distributed X, Y \sim N(0, \Sigma) and any random p-vector w independent of X and Y such that w_1 + \cdots + w_p = 1 and w_i \geq 0, i = 1, \ldots, p (i.e., w lies on the simplex),

\sum_{j=1}^{p} w_j \frac{X_j}{Y_j} \sim \operatorname{Cauchy}(0, 1).
The Cauchy distribution is an example of a distribution which has no mean, variance or higher moments defined. Its mode and median are well defined and are both equal to x_0.
The Cauchy distribution is an infinitely divisible probability distribution. It is also a strictly stable distribution.[7]
Like all stable distributions, the location-scale family to which the Cauchy distribution belongs is closed under linear transformations with real coefficients. In addition, the family of Cauchy-distributed random variables is closed under linear fractional transformations with real coefficients.[8] In this connection, see also McCullagh's parametrization of the Cauchy distributions.
If X_1, X_2, \ldots, X_n are an IID sample from the standard Cauchy distribution, then their sample mean \bar{X} = \frac{1}{n}\sum_i X_i is also standard Cauchy distributed. This can be proved by repeated integration with the PDF or, more conveniently, by using the characteristic function of the standard Cauchy distribution (see below), \varphi_X(t) = e^{-|t|}. With this, we have

\varphi_{\sum_i X_i}(t) = e^{-n|t|},

and hence \bar{X} has characteristic function e^{-|t|}, i.e. the standard Cauchy distribution.
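The following simulation sketch (illustrative only) shows this empirically: the mean of n standard Cauchy draws is spread out just like a single draw, in line with the characteristic-function argument above.

```python
# The sample mean of Cauchy draws does not concentrate: compare its spread
# with that of single observations.
import numpy as np

rng = np.random.default_rng(1)
n, reps = 1_000, 20_000

draws = rng.standard_cauchy(size=(reps, n))
means = draws.mean(axis=1)          # one sample mean per replication
single = rng.standard_cauchy(reps)  # single draws for comparison

# Both should have quartiles near -1 and +1 (IQR ~ 2), despite averaging n = 1000 values.
q25, q75 = np.percentile(means, [25, 75])
print("IQR of sample means :", q75 - q25)
q25, q75 = np.percentile(single, [25, 75])
print("IQR of single draws :", q75 - q25)
```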
More generally, if X_1, X_2, \ldots, X_n are independent Cauchy random variables with location parameters x_1, \ldots, x_n and scale parameters \gamma_1, \ldots, \gamma_n, and a_1, \ldots, a_n are real numbers, then \sum_i a_i X_i is Cauchy distributed with location \sum_i a_i x_i and scale \sum_i |a_i| \gamma_i.
This shows that the condition of finite variance in the central limit theorem cannot be dropped. It is also an example of a more generalized version of the central limit theorem that is characteristic of all stable distributions, of which the Cauchy distribution is a special case.
If X_1, X_2, \ldots are an IID sample with PDF \rho such that

\lim_{c \to \infty} \frac{1}{c} \int_{-c}^{c} x^2 \rho(x)\,dx = \frac{2\gamma}{\pi}

is finite but nonzero, then \frac{1}{n}\sum_{i=1}^{n} X_i converges in distribution to a Cauchy distribution with scale \gamma.[9]
Let X denote a Cauchy distributed random variable. The characteristic function of the Cauchy distribution is given by

\varphi_X(t) = \operatorname{E}\left[e^{iXt}\right] = \int_{-\infty}^{\infty} f(x; x_0, \gamma)\,e^{ixt}\,dx = e^{i x_0 t - \gamma|t|},
which is just the Fourier transform of the probability density. The original probability density may be expressed in terms of the characteristic function, essentially by using the inverse Fourier transform:
f(x; x_0, \gamma) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \varphi_X(t; x_0, \gamma)\,e^{-ixt}\,dt.
The nth moment of a distribution is the nth derivative of the characteristic function evaluated at t = 0. Observe that the characteristic function of the Cauchy distribution is not differentiable at the origin: this reflects the fact that the distribution has no expected value.
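As a quick numerical illustration (not from the original text), the empirical characteristic function of Cauchy samples can be compared with the closed form e^{i x_0 t - \gamma|t|}; parameter values below are arbitrary.

```python
# Compare the empirical characteristic function with exp(i*x0*t - gamma*|t|).
import numpy as np

rng = np.random.default_rng(2)
x0, gamma = 1.5, 0.7
samples = x0 + gamma * rng.standard_cauchy(200_000)

for t in (0.5, 1.0, 2.0):
    empirical = np.exp(1j * t * samples).mean()   # Monte Carlo estimate of E[exp(itX)]
    exact = np.exp(1j * x0 * t - gamma * abs(t))
    print(f"t={t}: empirical={empirical.real:+.3f}{empirical.imag:+.3f}j, "
          f"exact={exact.real:+.3f}{exact.imag:+.3f}j")
```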
The Kullback–Leibler divergence between two Cauchy distributions has the following symmetric closed-form formula:[10]
\mathrm{KL}\left(p_{x_{0,1},\gamma_1} : p_{x_{0,2},\gamma_2}\right) = \log\frac{\left(\gamma_1 + \gamma_2\right)^2 + \left(x_{0,1} - x_{0,2}\right)^2}{4\gamma_1\gamma_2}.
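A short sketch (labels and test values are mine, not from the original text) of this closed-form divergence, together with a check of its symmetry:

```python
# Closed-form Kullback-Leibler divergence between two Cauchy distributions.
import math

def kl_cauchy(x01, g1, x02, g2):
    return math.log(((g1 + g2) ** 2 + (x01 - x02) ** 2) / (4.0 * g1 * g2))

print(kl_cauchy(0.0, 1.0, 3.0, 2.0))   # forward direction
print(kl_cauchy(3.0, 2.0, 0.0, 1.0))   # reverse direction: identical, the divergence is symmetric
print(kl_cauchy(1.0, 2.0, 1.0, 2.0))   # identical distributions -> 0.0
```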
Any f-divergence between two Cauchy distributions is symmetric and can be expressed as a function of the chi-squared divergence.[11] Closed-form expressions for the total variation, Jensen–Shannon divergence, Hellinger distance, etc. are available.
The entropy of the Cauchy distribution is given by:
\begin{align} H(\gamma) &= -\int_{-\infty}^{\infty} f(x; x_0, \gamma)\log(f(x; x_0, \gamma))\,dx \\[6pt] &= \log(4\pi\gamma) \end{align}
The derivative of the quantile function, the quantile density function, for the Cauchy distribution is:
Q'(p; \gamma) = \gamma\pi\,\sec^2\left[\pi\left(p - \tfrac{1}{2}\right)\right].
The differential entropy of a distribution can be defined in terms of its quantile density,[12] specifically:
H(\gamma) = \int_0^1 \log(Q'(p; \gamma))\,dp = \log(4\pi\gamma)
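As a numerical check (illustrative, not part of the original text), the differential entropy -E[\log f(X)] can be estimated by Monte Carlo and compared with \log(4\pi\gamma); the parameter values are arbitrary.

```python
# Monte Carlo check that the differential entropy of Cauchy(x0, gamma) is log(4*pi*gamma).
import numpy as np

rng = np.random.default_rng(3)
x0, gamma = 0.0, 2.5
x = x0 + gamma * rng.standard_cauchy(1_000_000)

# log f(x) = log(gamma/pi) - log((x - x0)^2 + gamma^2)
log_f = np.log(gamma / np.pi) - np.log((x - x0) ** 2 + gamma ** 2)
print("Monte Carlo entropy:", -log_f.mean())
print("log(4*pi*gamma)    :", np.log(4 * np.pi * gamma))
```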
The Cauchy distribution is the maximum entropy probability distribution for a random variate X for which

\operatorname{E}\left[\log\left(1 + (X - x_0)^2/\gamma^2\right)\right] = \log 4.
The Cauchy distribution is usually used as an illustrative counterexample in elementary probability courses, as a distribution with no well-defined (or "indefinite") moments.
If we take an IID sample X_1, X_2, \ldots from the standard Cauchy distribution, then the sequence of sample means S_n = \frac{1}{n}\sum_{i=1}^{n} X_i also has the standard Cauchy distribution, so the sample average never converges, no matter how many terms are taken.
Similarly, the sample variance V_n = \frac{1}{n}\sum_{i=1}^{n}(X_i - S_n)^2 does not converge either. A typical trajectory of the sample means S_1, S_2, \ldots shows long stretches of apparent settling interrupted by large jumps caused by extreme observations, and never approaches a limit; a typical trajectory of the sample variances V_1, V_2, \ldots behaves similarly, with the jumps driving it off toward infinity.
Sample moments of order lower than 1 would converge to zero, while sample moments of order higher than 2 would diverge to infinity even faster than the sample variance.
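A small simulation sketch (illustrative only, not from the original text) makes the jumpy, non-convergent behaviour of the running sample mean and variance visible:

```python
# Running sample means and variances of standard Cauchy draws: occasional huge
# observations keep both from settling down as n grows.
import numpy as np

rng = np.random.default_rng(4)
x = rng.standard_cauchy(100_000)

n = np.arange(1, x.size + 1)
running_mean = np.cumsum(x) / n
running_var = np.cumsum(x ** 2) / n - running_mean ** 2   # biased formula, fine for illustration

for k in (10, 100, 1_000, 10_000, 100_000):
    print(f"n={k:>6}: mean={running_mean[k - 1]:+9.3f}, variance={running_var[k - 1]:14.1f}")
```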
If a probability distribution has a density function f(x), then the mean, if it exists, is given by

\int_{-\infty}^{\infty} x f(x)\,dx.

We may evaluate this two-sided improper integral by computing the sum of two one-sided improper integrals. That is,

\int_{-\infty}^{a} x f(x)\,dx + \int_{a}^{\infty} x f(x)\,dx

for an arbitrary real number a.
For the integral to exist (even as an infinite value), at least one of the terms in this sum should be finite, or both should be infinite and have the same sign. But in the case of the Cauchy distribution, both terms in this sum are infinite and have opposite sign. Hence the integral, and thus the mean, is undefined.[14] When the mean of a probability distribution is undefined, no reliable average can be computed from the experimental data points, regardless of the sample's size.
Note that the Cauchy principal value of the mean of the Cauchy distribution,

\lim_{a \to \infty} \int_{-a}^{a} x f(x)\,dx,

is zero. On the other hand, the related integral

\lim_{a \to \infty} \int_{-2a}^{a} x f(x)\,dx

is not zero, as can be seen by computing the integral. This again shows that the mean cannot exist.
Various results in probability theory about expected values, such as the strong law of large numbers, fail to hold for the Cauchy distribution.[14]
The absolute moments for p \in (-1, 1) are defined. For X \sim \operatorname{Cauchy}(0, \gamma),

\operatorname{E}[|X|^p] = \gamma^p \sec(\pi p/2).
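As a quick Monte Carlo sanity check (illustrative, not part of the original text) of this fractional-moment formula; convergence slows as |p| approaches 1 because |X|^p is itself heavy-tailed:

```python
# Monte Carlo check of E[|X|^p] = gamma^p * sec(pi*p/2) for |p| < 1, X ~ Cauchy(0, gamma).
import numpy as np

rng = np.random.default_rng(5)
gamma = 2.0
x = gamma * rng.standard_cauchy(2_000_000)

for p in (-0.5, 0.25, 0.5):
    mc = np.mean(np.abs(x) ** p)
    exact = gamma ** p / np.cos(np.pi * p / 2)   # sec = 1/cos
    print(f"p={p:+.2f}: Monte Carlo={mc:.4f}, exact={exact:.4f}")
```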
The Cauchy distribution does not have finite moments of any order greater than or equal to one. Some of the higher raw moments do exist and have a value of infinity, for example, the raw second moment:
\begin{align} \operatorname{E}[X^2] &\propto \int_{-\infty}^{\infty} \frac{x^2}{1 + x^2}\,dx = \int_{-\infty}^{\infty} \left(1 - \frac{1}{1 + x^2}\right) dx \\[8pt] &= \int_{-\infty}^{\infty} dx - \int_{-\infty}^{\infty} \frac{1}{1 + x^2}\,dx = \int_{-\infty}^{\infty} dx - \pi = \infty. \end{align}
By re-arranging the formula, one can see that the second moment is essentially the infinite integral of a constant (here 1). Higher even-powered raw moments will also evaluate to infinity. Odd-powered raw moments, however, are undefined, which is distinctly different from existing with the value of infinity. The odd-powered raw moments are undefined because their values are essentially equivalent to \infty - \infty, which is undefined.
The results for higher moments follow from Hölder's inequality, which implies that higher moments (or halves of moments) diverge if lower ones do.
Consider the truncated distribution defined by restricting the standard Cauchy distribution to a wide bounded interval. Such a truncated distribution has all moments (and the central limit theorem applies for i.i.d. observations from it); yet for almost all practical purposes it behaves like a Cauchy distribution.[15]
Because the parameters of the Cauchy distribution do not correspond to a mean and variance, attempting to estimate the parameters of the Cauchy distribution by using a sample mean and a sample variance will not succeed.[16] For example, if an i.i.d. sample of size n is taken from a Cauchy distribution, one may calculate the sample mean as:
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i.
Although the sample values x_i will be concentrated about the central value x_0, the sample mean will become increasingly variable as more observations are taken, because of the increased probability of encountering sample points with a large absolute value. In fact, the distribution of the sample mean is equal to the distribution of the observations themselves; i.e., the sample mean of a large sample is no better (nor worse) an estimator of x_0 than any single observation from the sample.
Therefore, more robust means of estimating the central value x_0 and the scaling parameter \gamma are needed. One simple method is to take the median value of the sample as an estimator of x_0 and half the sample interquartile range as an estimator of \gamma. Other, more precise and robust methods have been developed; for example, the truncated mean of the middle 24% of the sample order statistics produces an estimate of x_0 that is more efficient than either the sample median or the full sample mean.
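The following sketch (illustrative, not from the original text) applies the simple robust estimators just described, the sample median for x_0 and half the interquartile range for \gamma, and contrasts them with the unreliable sample mean; the true parameter values and sample size are arbitrary.

```python
# Robust estimation of the Cauchy location and scale: median and half-IQR.
import numpy as np

rng = np.random.default_rng(6)
x0_true, gamma_true = 5.0, 2.0
sample = x0_true + gamma_true * rng.standard_cauchy(10_000)

x0_hat = np.median(sample)                    # location estimate: sample median
q25, q75 = np.percentile(sample, [25, 75])
gamma_hat = (q75 - q25) / 2.0                 # scale estimate: half the interquartile range

print(f"x0_hat={x0_hat:.3f} (true {x0_true}), gamma_hat={gamma_hat:.3f} (true {gamma_true})")
print(f"sample mean={sample.mean():.3f}  # unreliable, no matter the sample size")
```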
Maximum likelihood can also be used to estimate the parameters x_0 and \gamma. The log-likelihood function for the Cauchy distribution for sample size n is:

\hat\ell(x_1, \ldots, x_n \mid x_0, \gamma) = -n\log(\gamma\pi) - \sum_{i=1}^{n} \log\left(1 + \left(\frac{x_i - x_0}{\gamma}\right)^2\right)
Maximizing the log-likelihood function with respect to x_0 and \gamma by taking the first derivatives produces the following system of equations:

\frac{d\ell}{dx_0} = \sum_{i=1}^{n} \frac{2(x_i - x_0)}{\gamma^2 + (x_i - x_0)^2} = 0

\frac{d\ell}{d\gamma} = \sum_{i=1}^{n} \frac{2(x_i - x_0)^2}{\gamma\left(\gamma^2 + (x_i - x_0)^2\right)} - \frac{n}{\gamma} = 0
Note that

\sum_{i=1}^{n} \frac{(x_i - x_0)^2}{\gamma^2 + (x_i - x_0)^2}

is a monotone function in \gamma and that the solution \gamma must satisfy

\min|x_i - x_0| \le \gamma \le \max|x_i - x_0|.
Solving just for x_0 requires solving a polynomial of degree 2n-1, and solving just for \gamma requires solving a polynomial of degree 2n. Therefore, whether solving for one parameter or for both parameters simultaneously, a numerical solution on a computer is typically required. The benefit of maximum likelihood estimation is asymptotic efficiency; estimating x_0 using the sample median is only about 81% as asymptotically efficient as estimating x_0 by maximum likelihood, and the truncated mean of the middle 24% of the order statistics is a more efficient estimator of x_0 than the sample median while remaining less efficient than the maximum likelihood estimate. When Newton's method is used to find the solution for the maximum likelihood estimate, the middle 24% order statistics can be used as an initial solution for x_0.
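A minimal numerical sketch (not the article's own procedure; it simply minimizes the negative of the log-likelihood written above with a general-purpose optimizer, starting from the robust estimates):

```python
# Numeric maximum-likelihood estimation for the Cauchy distribution.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(7)
x0_true, gamma_true = 5.0, 2.0
data = x0_true + gamma_true * rng.standard_cauchy(5_000)

def neg_log_likelihood(params):
    x0, log_gamma = params
    gamma = np.exp(log_gamma)          # optimize log(gamma) to keep gamma > 0
    z = (data - x0) / gamma
    return data.size * np.log(gamma * np.pi) + np.sum(np.log1p(z ** 2))

# Start from the robust estimates: sample median and half the interquartile range.
q25, q50, q75 = np.percentile(data, [25, 50, 75])
start = np.array([q50, np.log((q75 - q25) / 2.0)])

result = minimize(neg_log_likelihood, start, method="Nelder-Mead")
x0_hat, gamma_hat = result.x[0], np.exp(result.x[1])
print(f"MLE: x0={x0_hat:.3f}, gamma={gamma_hat:.3f}")
```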
The scale parameter can also be estimated using the median of absolute values, since for location-0 Cauchy variables X \sim \operatorname{Cauchy}(0, \gamma), \operatorname{median}(|X|) = \gamma.
A random vector X = (X_1, \ldots, X_k)^T is said to have the multivariate Cauchy distribution if every linear combination of its components Y = a_1 X_1 + \cdots + a_k X_k has a Cauchy distribution. That is, for any constant vector a \in \mathbb{R}^k, the random variable Y = a^T X should have a univariate Cauchy distribution. The characteristic function of a multivariate Cauchy distribution is given by:

\varphi_X(t) = e^{i x_0(t) - \gamma(t)},

where x_0(t) and \gamma(t) are real functions, with x_0(t) a homogeneous function of degree one and \gamma(t) a positive homogeneous function of degree one. More formally:

x_0(at) = a\,x_0(t), \qquad \gamma(at) = |a|\,\gamma(t),

for all t.
An example of a bivariate Cauchy distribution can be given by:[26]
f(x, y; x_0, y_0, \gamma) = \frac{1}{2\pi}\left[\frac{\gamma}{\left((x - x_0)^2 + (y - y_0)^2 + \gamma^2\right)^{3/2}}\right].

Note that in this example, even though the covariance between x and y is 0, x and y are not statistically independent.
This formula can also be written for a complex variable. The probability density function of the complex Cauchy distribution is then:

f(z; z_0, \gamma) = \frac{1}{2\pi}\left[\frac{\gamma}{\left(|z - z_0|^2 + \gamma^2\right)^{3/2}}\right].
Just as the standard Cauchy distribution is the Student's t-distribution with one degree of freedom, the multidimensional Cauchy density is the multivariate Student distribution with one degree of freedom. The density of a k-dimensional Student distribution with one degree of freedom is:

f(\mathbf{x}; \boldsymbol{\mu}, \boldsymbol{\Sigma}, k) = \frac{\Gamma\left(\frac{1+k}{2}\right)}{\Gamma\left(\frac{1}{2}\right)\,\pi^{k/2}\,\left|\boldsymbol{\Sigma}\right|^{1/2}\,\left[1 + (\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1}(\mathbf{x} - \boldsymbol{\mu})\right]^{\frac{1+k}{2}}}.
The properties of multidimensional Cauchy distribution are then special cases of the multivariate Student distribution.
If X \sim \operatorname{Cauchy}(x_0, \gamma), then kX + \ell \sim \operatorname{Cauchy}(x_0 k + \ell, \gamma|k|).
If X \sim \operatorname{Cauchy}(x_0, \gamma_0) and Y \sim \operatorname{Cauchy}(x_1, \gamma_1) are independent, then X + Y \sim \operatorname{Cauchy}(x_0 + x_1, \gamma_0 + \gamma_1) and X - Y \sim \operatorname{Cauchy}(x_0 - x_1, \gamma_0 + \gamma_1).
If X \sim \operatorname{Cauchy}(0, \gamma), then \tfrac{1}{X} \sim \operatorname{Cauchy}(0, \tfrac{1}{\gamma}).
Expressing a Cauchy distribution in terms of one complex parameter \psi = x_0 + i\gamma, define X \sim \operatorname{Cauchy}(\psi) to mean X \sim \operatorname{Cauchy}(x_0, |\gamma|).[27] If X \sim \operatorname{Cauchy}(\psi), then

\frac{aX + b}{cX + d} \sim \operatorname{Cauchy}\left(\frac{a\psi + b}{c\psi + d}\right),

where a, b, c and d are real numbers. Using the same convention, if X \sim \operatorname{Cauchy}(\psi), then

\frac{X - i}{X + i} \sim \operatorname{CCauchy}\left(\frac{\psi - i}{\psi + i}\right),

where \operatorname{CCauchy} denotes the circular Cauchy distribution.
The Cauchy distribution is the stable distribution of index 1. The Lévy–Khintchine representation of such a stable distribution of parameter \gamma is given, for X \sim \operatorname{Stable}(\gamma, 0, 0), by:

\operatorname{E}\left(e^{ixX}\right) = \exp\left(\int \left(e^{ixy} - 1\right)\Pi_\gamma(dy)\right),

where

\Pi_\gamma(dy) = \left(c_{1,\gamma}\,\frac{1}{y^{1+\gamma}}\,1_{\{y > 0\}} + c_{2,\gamma}\,\frac{1}{|y|^{1+\gamma}}\,1_{\{y < 0\}}\right) dy

and c_{1,\gamma}, c_{2,\gamma} are suitable constants. In the case \gamma = 1 one can take c_{1,\gamma} = c_{2,\gamma}, which corresponds to the symmetric Cauchy distribution. This last representation is a consequence of the formula

\pi|x| = \operatorname{PV} \int_{\mathbb{R}} \left(1 - e^{ixy}\right)\frac{dy}{y^2}.
\operatorname{Cauchy}(0, 1) \sim \operatorname{t}(\mathrm{df} = 1): the standard Cauchy distribution is the Student's t-distribution with one degree of freedom.
\operatorname{Cauchy}(\mu, \sigma) \sim \operatorname{t}_{(\mathrm{df}=1)}(\mu, \sigma): the Cauchy distribution is the Student's t-distribution with one degree of freedom with location parameter \mu and scale parameter \sigma.
If X, Y \sim \operatorname{N}(0, 1) are independent, then \tfrac{X}{Y} \sim \operatorname{Cauchy}(0, 1).
If X \sim \operatorname{U}(0, 1), then \tan\left(\pi\left(X - \tfrac{1}{2}\right)\right) \sim \operatorname{Cauchy}(0, 1).
If X \sim \operatorname{Log\text{-}Cauchy}(0, 1), then \ln(X) \sim \operatorname{Cauchy}(0, 1).
If X \sim \operatorname{Cauchy}(x_0, \gamma), then \tfrac{1}{X} \sim \operatorname{Cauchy}\left(\tfrac{x_0}{x_0^2 + \gamma^2}, \tfrac{\gamma}{x_0^2 + \gamma^2}\right).
If X \sim \operatorname{Stable}(1, 0, \gamma, \mu), then X \sim \operatorname{Cauchy}(\mu, \gamma).
If X \sim \operatorname{N}(0, 1) and Z \sim \operatorname{Inverse\text{-}Gamma}(1/2, s^2/2) are independent, then Y = \mu + X\sqrt{Z} \sim \operatorname{Cauchy}(\mu, s).
If X \sim \operatorname{N}(0, 1)\,I\{X \ge 0\} and Y \sim \operatorname{N}(0, 1)\,I\{Y \ge 0\} are independent half-normal variables, then the ratio X/Y follows a half-Cauchy distribution (the standard Cauchy distribution folded at zero).
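As a sanity check (illustrative only, not from the original text), two of the relations above, the reciprocal of a Cauchy variable and the normal/\sqrt{\text{inverse-gamma}} scale mixture, can be verified by Monte Carlo using the median and half-IQR of the simulated samples:

```python
# Monte Carlo check of two Cauchy-related identities.
import numpy as np

rng = np.random.default_rng(8)
n = 1_000_000

# 1. If X ~ Cauchy(x0, gamma), then 1/X ~ Cauchy(x0/(x0^2+gamma^2), gamma/(x0^2+gamma^2)).
x0, gamma = 2.0, 1.0
inv = 1.0 / (x0 + gamma * rng.standard_cauchy(n))
q25, q50, q75 = np.percentile(inv, [25, 50, 75])
d = x0 ** 2 + gamma ** 2
print(f"1/X    : median {q50:.3f} (expect {x0 / d:.3f}), half-IQR {(q75 - q25) / 2:.3f} (expect {gamma / d:.3f})")

# 2. If X ~ N(0,1) and Z ~ Inverse-Gamma(1/2, s^2/2), then mu + X*sqrt(Z) ~ Cauchy(mu, s).
mu, s = 3.0, 0.5
z = 1.0 / rng.gamma(shape=0.5, scale=2.0 / s ** 2, size=n)   # Inverse-Gamma(1/2, s^2/2)
y = mu + rng.standard_normal(n) * np.sqrt(z)
q25, q50, q75 = np.percentile(y, [25, 50, 75])
print(f"mixture: median {q50:.3f} (expect {mu:.3f}), half-IQR {(q75 - q25) / 2:.3f} (expect {s:.3f})")
```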
See main article: Relativistic Breit–Wigner distribution. In nuclear and particle physics, the energy profile of a resonance is described by the relativistic Breit–Wigner distribution, while the Cauchy distribution is the (non-relativistic) Breit–Wigner distribution.
In econometrics, the Cauchy distribution arises as the limiting distribution of the (suitably normalized) least squares estimator \hat{\beta} in the explosive autoregressive model x_{t+1} = \beta x_t + \varepsilon_{t+1}, \beta > 1, with normally distributed errors.