In probability and statistics, the Hellinger distance (closely related to, although different from, the Bhattacharyya distance) is used to quantify the similarity between two probability distributions. It is a type of f-divergence. The Hellinger distance is defined in terms of the Hellinger integral, which was introduced by Ernst Hellinger in 1909.
It is sometimes called the Jeffreys distance.[1][2]
To define the Hellinger distance in terms of measure theory, let $P$ and $Q$ denote two probability measures on a measure space $\mathcal{X}$ that are absolutely continuous with respect to an auxiliary measure $\lambda$. Such a measure always exists, e.g. $\lambda = (P + Q)$. Then the squared Hellinger distance between $P$ and $Q$ is defined as the quantity

$$H^2(P,Q) = \frac{1}{2} \int_{\mathcal{X}} \left( \sqrt{p(x)} - \sqrt{q(x)} \right)^2 \lambda(dx).$$
Here, $P(dx) = p(x)\,\lambda(dx)$ and $Q(dx) = q(x)\,\lambda(dx)$, i.e. $p$ and $q$ are the Radon–Nikodym derivatives of $P$ and $Q$ with respect to $\lambda$. This definition does not depend on $\lambda$: the Hellinger distance between $P$ and $Q$ does not change if $\lambda$ is replaced with a different measure with respect to which both $P$ and $Q$ are absolutely continuous. For compactness, the above formula is often written as

$$H^2(P,Q) = \frac{1}{2} \int_{\mathcal{X}} \left( \sqrt{P(dx)} - \sqrt{Q(dx)} \right)^2.$$
To define the Hellinger distance in terms of elementary probability theory, we take $\lambda$ to be the Lebesgue measure, so that $dP/d\lambda$ and $dQ/d\lambda$ are simply probability density functions. If we denote the densities as $f$ and $g$, respectively, the squared Hellinger distance can be expressed as a standard calculus integral
$$H^2(f,g) = \frac{1}{2} \int \left( \sqrt{f(x)} - \sqrt{g(x)} \right)^2 dx = 1 - \int \sqrt{f(x)\,g(x)}\, dx,$$
where the second form can be obtained by expanding the square and using the fact that the integral of a probability density over its domain equals 1.
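As a quick numerical illustration, the integral form above can be evaluated by quadrature. The following is a minimal Python sketch (assuming NumPy and SciPy are available; the function name `hellinger2` and the choice of test densities are illustrative, not standard library API):

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

def hellinger2(f, g):
    """Squared Hellinger distance between two densities, by numerical quadrature."""
    integrand = lambda x: (np.sqrt(f(x)) - np.sqrt(g(x))) ** 2
    value, _ = quad(integrand, -np.inf, np.inf)
    return 0.5 * value

# Two unit-variance normals whose means are 2 apart.
f = stats.norm(0, 1).pdf
g = stats.norm(2, 1).pdf
print(hellinger2(f, g))  # ~0.3935, i.e. 1 - exp(-1/2)
```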
The Hellinger distance $H(P,Q)$ satisfies the property (derivable from the Cauchy–Schwarz inequality)

$$0 \le H(P,Q) \le 1.$$
For two discrete probability distributions $P = (p_1, \ldots, p_k)$ and $Q = (q_1, \ldots, q_k)$, their Hellinger distance is defined as

$$H(P,Q) = \frac{1}{\sqrt{2}} \sqrt{\sum_{i=1}^{k} \left( \sqrt{p_i} - \sqrt{q_i} \right)^2},$$
which is directly related to the Euclidean norm of the difference of the square root vectors, i.e.
$$H(P,Q) = \frac{1}{\sqrt{2}} \left\| \sqrt{P} - \sqrt{Q} \right\|_2.$$
Also,

$$1 - H^2(P,Q) = \sum_{i=1}^{k} \sqrt{p_i q_i}.$$
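A minimal Python sketch of the discrete formula (the helper name `hellinger_discrete` is illustrative); it also verifies the identity $1 - H^2(P,Q) = \sum_i \sqrt{p_i q_i}$ numerically:

```python
import numpy as np

def hellinger_discrete(p, q):
    """Hellinger distance between two discrete distributions given as vectors."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.linalg.norm(np.sqrt(p) - np.sqrt(q)) / np.sqrt(2)

p = [0.1, 0.4, 0.5]
q = [0.3, 0.3, 0.4]
bc = np.sum(np.sqrt(np.asarray(p) * np.asarray(q)))  # sum of sqrt(p_i q_i)
print(hellinger_discrete(p, q))   # direct formula
print(np.sqrt(1 - bc))            # same value via 1 - H^2 = sum sqrt(p_i q_i)
```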
The Hellinger distance forms a bounded metric on the space of probability distributions over a given probability space.
The maximum distance 1 is achieved when P assigns probability zero to every set to which Q assigns a positive probability, and vice versa.
Sometimes the factor $1/2$ in front of the integral is omitted, in which case the Hellinger distance ranges from zero to the square root of two. The Hellinger distance is related to the Bhattacharyya coefficient $BC(P,Q)$, as it can be defined as

$$H(P,Q) = \sqrt{1 - BC(P,Q)}.$$
Hellinger distances are used in the theory of sequential and asymptotic statistics.[3][4]
The squared Hellinger distance between two normal distributions $P \sim \mathcal{N}(\mu_1, \sigma_1^2)$ and $Q \sim \mathcal{N}(\mu_2, \sigma_2^2)$ is:

$$H^2(P,Q) = 1 - \sqrt{\frac{2\sigma_1\sigma_2}{\sigma_1^2 + \sigma_2^2}}\, e^{-\frac{1}{4} \frac{(\mu_1 - \mu_2)^2}{\sigma_1^2 + \sigma_2^2}}.$$
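A direct transcription of this closed form into Python, as a hedged sketch (`hellinger2_normal` is an illustrative name); it can be cross-checked against the quadrature sketch above:

```python
import numpy as np

def hellinger2_normal(mu1, s1, mu2, s2):
    """Closed-form squared Hellinger distance between N(mu1, s1^2) and N(mu2, s2^2)."""
    var_sum = s1**2 + s2**2
    return 1.0 - np.sqrt(2 * s1 * s2 / var_sum) * np.exp(-0.25 * (mu1 - mu2)**2 / var_sum)

print(hellinger2_normal(0.0, 1.0, 2.0, 1.0))  # ~0.3935, matching the quadrature above
```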
The squared Hellinger distance between two multivariate normal distributions $P \sim \mathcal{N}(\mu_1, \Sigma_1)$ and $Q \sim \mathcal{N}(\mu_2, \Sigma_2)$ is:

$$H^2(P,Q) = 1 - \frac{\det(\Sigma_1)^{1/4} \det(\Sigma_2)^{1/4}}{\det\left( \frac{\Sigma_1 + \Sigma_2}{2} \right)^{1/2}} \exp\left\{ -\frac{1}{8} (\mu_1 - \mu_2)^T \left( \frac{\Sigma_1 + \Sigma_2}{2} \right)^{-1} (\mu_1 - \mu_2) \right\}.$$
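A sketch of the multivariate formula using NumPy (the helper name `hellinger2_mvn` is illustrative). For high-dimensional covariance matrices one would typically work with log-determinants, e.g. via `numpy.linalg.slogdet`, to avoid overflow; plain determinants suffice for this small example:

```python
import numpy as np

def hellinger2_mvn(mu1, S1, mu2, S2):
    """Closed-form squared Hellinger distance between N(mu1, S1) and N(mu2, S2)."""
    S = 0.5 * (S1 + S2)          # average covariance
    d = mu1 - mu2
    coef = (np.linalg.det(S1) ** 0.25 * np.linalg.det(S2) ** 0.25
            / np.sqrt(np.linalg.det(S)))
    return 1.0 - coef * np.exp(-0.125 * d @ np.linalg.solve(S, d))

mu1, S1 = np.zeros(2), np.eye(2)
mu2, S2 = np.array([1.0, 0.0]), np.diag([2.0, 0.5])
print(hellinger2_mvn(mu1, S1, mu2, S2))
```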
The squared Hellinger distance between two exponential distributions $P \sim \mathrm{Exp}(\alpha)$ and $Q \sim \mathrm{Exp}(\beta)$ is:

$$H^2(P,Q) = 1 - \frac{2\sqrt{\alpha\beta}}{\alpha + \beta}.$$
The squared Hellinger distance between two Weibull distributions $P \sim \mathrm{W}(k, \alpha)$ and $Q \sim \mathrm{W}(k, \beta)$ (where $k$ is a common shape parameter and $\alpha, \beta$ are the scale parameters) is:

$$H^2(P,Q) = 1 - \frac{2(\alpha\beta)^{k/2}}{\alpha^k + \beta^k}.$$
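Since the exponential case coincides with the Weibull case at $k = 1$, one small Python sketch covers both closed forms (`hellinger2_weibull` is an illustrative name):

```python
def hellinger2_weibull(k, a, b):
    """Closed-form squared Hellinger distance between W(k, a) and W(k, b):
    shared shape k, scale parameters a and b."""
    return 1.0 - 2.0 * (a * b) ** (k / 2) / (a**k + b**k)

# k = 1 reduces the Weibull formula to the exponential one above.
print(hellinger2_weibull(1.0, 1.0, 3.0))  # 1 - 2*sqrt(3)/4 ~ 0.134
print(hellinger2_weibull(2.0, 1.0, 3.0))
```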
The squared Hellinger distance between two Poisson distributions with rate parameters $\alpha$ and $\beta$, so that $P \sim \mathrm{Poisson}(\alpha)$ and $Q \sim \mathrm{Poisson}(\beta)$, is:

$$H^2(P,Q) = 1 - e^{-\frac{1}{2}\left( \sqrt{\alpha} - \sqrt{\beta} \right)^2}.$$
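A sketch of the Poisson closed form, cross-checked against the discrete-sum definition truncated at a large count (the helper name and the truncation point 200 are illustrative choices):

```python
import numpy as np
from scipy import stats

def hellinger2_poisson(a, b):
    """Closed-form squared Hellinger distance between Poisson(a) and Poisson(b)."""
    return 1.0 - np.exp(-0.5 * (np.sqrt(a) - np.sqrt(b)) ** 2)

# Cross-check against the discrete-sum definition, truncated at n = 200.
a, b = 2.0, 5.0
n = np.arange(200)
bc = np.sum(np.sqrt(stats.poisson.pmf(n, a) * stats.poisson.pmf(n, b)))
print(hellinger2_poisson(a, b), 1.0 - bc)  # the two values agree closely
```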
The squared Hellinger distance between two beta distributions $P \sim \mathrm{Beta}(a_1, b_1)$ and $Q \sim \mathrm{Beta}(a_2, b_2)$ is:

$$H^2(P,Q) = 1 - \frac{B\left( \frac{a_1 + a_2}{2}, \frac{b_1 + b_2}{2} \right)}{\sqrt{B(a_1, b_1)\, B(a_2, b_2)}},$$

where $B$ is the beta function.
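A sketch of the beta-distribution formula; evaluating the beta functions in log space via `scipy.special.betaln` avoids overflow for large parameters (`hellinger2_beta` is an illustrative name):

```python
import numpy as np
from scipy.special import betaln

def hellinger2_beta(a1, b1, a2, b2):
    """Closed-form squared Hellinger distance between Beta(a1, b1) and Beta(a2, b2),
    evaluated in log space for numerical stability."""
    log_bc = (betaln((a1 + a2) / 2, (b1 + b2) / 2)
              - 0.5 * (betaln(a1, b1) + betaln(a2, b2)))
    return 1.0 - np.exp(log_bc)

print(hellinger2_beta(2.0, 5.0, 4.0, 3.0))
```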
The squared Hellinger distance between two gamma distributions $P \sim \mathrm{Gamma}(a_1, b_1)$ and $Q \sim \mathrm{Gamma}(a_2, b_2)$ is:

$$H^2(P,Q) = 1 - \Gamma\left( \frac{a_1 + a_2}{2} \right) \left( \frac{b_1 + b_2}{2} \right)^{-\frac{a_1 + a_2}{2}} \sqrt{\frac{b_1^{a_1} b_2^{a_2}}{\Gamma(a_1)\,\Gamma(a_2)}},$$

where $\Gamma$ is the gamma function.
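A sketch of the gamma-distribution formula, assuming the shape–rate parameterization implied by the formula above, again computed in log space via `scipy.special.gammaln` for stability:

```python
import numpy as np
from scipy.special import gammaln

def hellinger2_gamma(a1, b1, a2, b2):
    """Closed-form squared Hellinger distance between Gamma(a1, rate b1)
    and Gamma(a2, rate b2), evaluated in log space."""
    log_bc = (gammaln((a1 + a2) / 2) - 0.5 * (gammaln(a1) + gammaln(a2))
              - (a1 + a2) / 2 * np.log((b1 + b2) / 2)
              + 0.5 * (a1 * np.log(b1) + a2 * np.log(b2)))
    return 1.0 - np.exp(log_bc)

print(hellinger2_gamma(2.0, 1.0, 3.0, 2.0))
```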
The Hellinger distance $H(P,Q)$ and the total variation distance $\delta(P,Q)$ are related as follows:

$$H^2(P,Q) \leq \delta(P,Q) \leq \sqrt{2}\, H(P,Q).$$
The constants in this inequality may change depending on which renormalization is chosen ($1/2$ or $1/\sqrt{2}$).
These inequalities follow immediately from the inequalities between the 1-norm and the 2-norm.
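A quick numerical check of the inequality chain on random discrete distributions, using the normalizations from this article (the $1/2$ factor inside $H$, and $\delta$ as half the 1-norm of the difference):

```python
import numpy as np

rng = np.random.default_rng(0)
p = rng.dirichlet(np.ones(10))  # a random discrete distribution on 10 points
q = rng.dirichlet(np.ones(10))

h = np.linalg.norm(np.sqrt(p) - np.sqrt(q)) / np.sqrt(2)  # Hellinger distance
tv = 0.5 * np.abs(p - q).sum()                            # total variation distance

# H^2 <= TV <= sqrt(2) * H for these normalizations.
assert h**2 <= tv <= np.sqrt(2) * h
print(h**2, tv, np.sqrt(2) * h)
```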