In statistics, the EM (expectation-maximization) algorithm handles latent variables, while GMM refers to the Gaussian mixture model.
The picture below shows the red blood cell hemoglobin concentration and the red blood cell volume for two groups of people: the Anemia group and the Control group (i.e. people without anemia). As expected, people with anemia have lower red blood cell volume and lower red blood cell hemoglobin concentration than those without anemia.
$x$ is a random vector such as $x := \big(\text{red blood cell volume},\ \text{red blood cell hemoglobin concentration}\big)$, and from medical studies it is known that $x$ is normally distributed in each group, i.e. $x \sim \mathcal{N}(\mu, \Sigma)$.

$z$ denotes the group to which $x$ belongs: $z_i = 0$ if $x_i$ belongs to the Anemia group and $z_i = 1$ if $x_i$ belongs to the Control group. $z$ follows a categorical distribution, $z \sim \operatorname{Categorical}(k, \phi)$, where $k = 2$, $\phi_j \geq 0$, and $\sum_{j=1}^{k} \phi_j = 1$; here $\phi_j$ is the probability that $x$ belongs to the $j$-th group.
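As a concrete illustration, this generative model can be simulated directly. The sketch below is a minimal example with $k = 2$; the mixing proportions, means, and covariances are invented for illustration and are not taken from any medical study:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters for k = 2 groups (illustrative values only):
# group 0 = "Anemia", group 1 = "Control".
phi = np.array([0.4, 0.6])                    # mixing proportions, sum to 1
mu = np.array([[78.0, 31.0],                  # per-group means of
               [92.0, 34.0]])                 # (volume, concentration)
Sigma = np.array([[[16.0, 4.0], [4.0, 2.0]],
                  [[14.0, 3.0], [3.0, 1.5]]]) # per-group covariance matrices

m = 500
z = rng.choice(2, size=m, p=phi)              # z ~ Categorical(k, phi)
x = np.stack([rng.multivariate_normal(mu[j], Sigma[j]) for j in z])  # x | z=j ~ N(mu_j, Sigma_j)
```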
The following procedure can be used to estimate $\phi, \mu, \Sigma$.

Maximum likelihood estimation can be applied:
$$\ell(\phi,\mu,\Sigma)=\sum_{i=1}^{m}\log\big(p(x^{(i)};\phi,\mu,\Sigma)\big)=\sum_{i=1}^{m}\log\sum_{z^{(i)}=1}^{k}p\left(x^{(i)}\mid z^{(i)};\mu,\Sigma\right)p\left(z^{(i)};\phi\right)$$
As the $z_i$ for each $x_i$ is known, the log-likelihood function can be simplified as below:

$$\ell(\phi,\mu,\Sigma)=\sum_{i=1}^{m}\left[\log p\left(x^{(i)}\mid z^{(i)};\mu,\Sigma\right)+\log p\left(z^{(i)};\phi\right)\right]$$
Now the likelihood function can be maximized by taking partial derivatives with respect to $\mu$, $\Sigma$, and $\phi$, obtaining:
$$\phi_j=\frac{1}{m}\sum_{i=1}^{m}1\{z^{(i)}=j\}$$

$$\mu_j=\frac{\sum_{i=1}^{m}1\{z^{(i)}=j\}\,x^{(i)}}{\sum_{i=1}^{m}1\{z^{(i)}=j\}}$$

$$\Sigma_j=\frac{\sum_{i=1}^{m}1\{z^{(i)}=j\}\left(x^{(i)}-\mu_j\right)\left(x^{(i)}-\mu_j\right)^{T}}{\sum_{i=1}^{m}1\{z^{(i)}=j\}}$$
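When the labels are observed, these closed-form estimates are just per-group counts, means, and covariances. A minimal NumPy sketch (the function name `mle_labeled` is an assumption of this sketch; `x` and `z` are as in the sampling example above):

```python
import numpy as np

def mle_labeled(x, z, k):
    """Closed-form MLE of (phi, mu, Sigma) when the labels z are observed."""
    phi = np.array([np.mean(z == j) for j in range(k)])        # phi_j: fraction of points in group j
    mu = np.array([x[z == j].mean(axis=0) for j in range(k)])  # mu_j: mean of group j
    Sigma = np.array([
        (x[z == j] - mu[j]).T @ (x[z == j] - mu[j]) / np.sum(z == j)  # within-group covariance
        for j in range(k)
    ])
    return phi, mu, Sigma
```

Applied to the simulated `x` and `z` above, these estimates recover the hypothetical parameters up to sampling noise.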
If $z_i$ is known, the estimation of the parameters with maximum likelihood estimation turns out to be quite simple, but if $z_i$ is unknown it is much more complicated.

Since $z$ is a latent variable (i.e. not observed), the unlabelled scenario requires the EM algorithm to estimate $z$ as well as the other parameters. Generally, this problem is set as a GMM, since the data in each group are normally distributed.
In machine learning, the latent variable $z$ is considered a hidden pattern lying under the data, which the observer cannot see directly. The $x_i$ are the known data, while $\phi, \mu, \Sigma$ are the parameters of the model. With the EM algorithm, some underlying pattern $z$ in the data $x_i$ can be found, along with estimates of the parameters.
The EM algorithm consists of two steps: the E-step and the M-step. Firstly, the model parameters and the $z^{(i)}$ can be randomly initialized. In the E-step, the algorithm tries to guess the value of $z^{(i)}$ based on the parameters, while in the M-step, the algorithm updates the value of the model parameters based on the guess of $z^{(i)}$ from the E-step. These two steps are repeated until convergence is reached.
The EM algorithm for GMM is:
Repeat until convergence:
1. (E-step) For each $i, j$, set

$$w_j^{(i)}:=p\left(z^{(i)}=j\mid x^{(i)};\phi,\mu,\Sigma\right)$$
2. (M-step) Update the parameters

$$\phi_j:=\frac{1}{m}\sum_{i=1}^{m}w_j^{(i)}$$

$$\mu_j:=\frac{\sum_{i=1}^{m}w_j^{(i)}\,x^{(i)}}{\sum_{i=1}^{m}w_j^{(i)}}$$

$$\Sigma_j:=\frac{\sum_{i=1}^{m}w_j^{(i)}\left(x^{(i)}-\mu_j\right)\left(x^{(i)}-\mu_j\right)^{T}}{\sum_{i=1}^{m}w_j^{(i)}}$$
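The M-step updates are the weighted analogues of the labeled closed-form estimates, with the indicator $1\{z^{(i)}=j\}$ replaced by the responsibility $w_j^{(i)}$. A minimal sketch, assuming `w` is an $m \times k$ array of responsibilities produced by the E-step (the function name `m_step` is an assumption of this sketch):

```python
import numpy as np

def m_step(x, w):
    """Update (phi, mu, Sigma) from data x (m x n) and responsibilities w (m x k)."""
    m, n = x.shape
    k = w.shape[1]
    nk = w.sum(axis=0)                     # effective number of points per component
    phi = nk / m                           # phi_j: average responsibility of component j
    mu = (w.T @ x) / nk[:, None]           # weighted means
    Sigma = np.empty((k, n, n))
    for j in range(k):
        d = x - mu[j]                      # deviations from the component mean
        Sigma[j] = (w[:, j, None] * d).T @ d / nk[j]  # weighted covariance
    return phi, mu, Sigma
```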
With Bayes' rule, the following result is obtained for the E-step:

$$p\left(z^{(i)}=j\mid x^{(i)};\phi,\mu,\Sigma\right)=\frac{p\left(x^{(i)}\mid z^{(i)}=j;\mu,\Sigma\right)p\left(z^{(i)}=j;\phi\right)}{\sum_{l=1}^{k}p\left(x^{(i)}\mid z^{(i)}=l;\mu,\Sigma\right)p\left(z^{(i)}=l;\phi\right)}$$
According to the GMM setting, the following formulas are obtained:
$$p\left(x^{(i)}\mid z^{(i)}=j;\mu,\Sigma\right)=\frac{1}{(2\pi)^{n/2}\left|\Sigma_j\right|^{1/2}}\exp\left(-\frac{1}{2}\left(x^{(i)}-\mu_j\right)^{T}\Sigma_j^{-1}\left(x^{(i)}-\mu_j\right)\right)$$

$$p\left(z^{(i)}=j;\phi\right)=\phi_j$$
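In code, the E-step is a direct transcription of Bayes' rule with the Gaussian density above. A minimal sketch, using `scipy.stats.multivariate_normal` for the density (the function name `e_step` is an assumption of this sketch):

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(x, phi, mu, Sigma):
    """Compute responsibilities w[i, j] = p(z_i = j | x_i; phi, mu, Sigma)."""
    m, k = x.shape[0], phi.shape[0]
    w = np.empty((m, k))
    for j in range(k):
        # Numerator of Bayes' rule: p(x | z = j; mu, Sigma) * p(z = j; phi)
        w[:, j] = multivariate_normal.pdf(x, mean=mu[j], cov=Sigma[j]) * phi[j]
    w /= w.sum(axis=1, keepdims=True)      # normalize: the denominator of Bayes' rule
    return w
```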
In this way, starting from randomly initialized parameters, it is possible to alternate between the E-step and the M-step until the algorithm converges.
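Assuming the `e_step` and `m_step` sketches above, the following loop alternates the two steps from a random initialization for a fixed number of iterations (a convergence check on the log-likelihood could replace the fixed count); the driver name `em_gmm` and the initialization scheme are assumptions of this sketch:

```python
import numpy as np

def em_gmm(x, k, n_iter=100, seed=0):
    """Fit a k-component GMM to x (m x n) by alternating E- and M-steps."""
    rng = np.random.default_rng(seed)
    m, n = x.shape
    phi = np.full(k, 1.0 / k)                          # uniform mixing weights
    mu = x[rng.choice(m, size=k, replace=False)]       # means at random data points
    Sigma = np.stack([np.cov(x.T) for _ in range(k)])  # shared overall covariance
    for _ in range(n_iter):
        w = e_step(x, phi, mu, Sigma)                  # E-step: guess z given the parameters
        phi, mu, Sigma = m_step(x, w)                  # M-step: update parameters given the guess
    return phi, mu, Sigma
```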