In statistics, the EM (expectation-maximization) algorithm handles latent variables, while GMM refers to the Gaussian mixture model.
The picture below shows the red blood cell hemoglobin concentration and the red blood cell volume for two groups of people: the Anemia group and the Control group (i.e. people without anemia). As expected, people with anemia have lower red blood cell volume and lower red blood cell hemoglobin concentration than those without anemia.
$x$ is a random vector such as $x := \big(\text{red blood cell volume},\ \text{red blood cell hemoglobin concentration}\big)$, and from medical studies it is known that $x$ is normally distributed in each group, i.e. $x \sim \mathcal{N}(\mu, \Sigma)$.

$z$ denotes the group to which $x$ belongs: $z_i = 0$ if $x_i$ belongs to the Anemia group and $z_i = 1$ if $x_i$ belongs to the Control group. $z$ follows a categorical distribution, $z \sim \operatorname{Categorical}(k, \phi)$, where $k = 2$, $\phi_j \geq 0$, and $\sum_{j=1}^{k} \phi_j = 1$; here $\phi_j$ is the probability that $x$ belongs to the $j$-th group.
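As a concrete illustration, this generative model can be simulated directly. The sketch below is a minimal example with $k = 2$; the mixing proportions, means, and covariances are invented for illustration and are not taken from any medical study:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters for k = 2 groups (illustrative values only):
# group 0 = "Anemia", group 1 = "Control".
phi = np.array([0.4, 0.6])                    # mixing proportions, sum to 1
mu = np.array([[78.0, 31.0],                  # per-group means of
               [92.0, 34.0]])                 # (volume, concentration)
Sigma = np.array([[[16.0, 4.0], [4.0, 2.0]],
                  [[14.0, 3.0], [3.0, 1.5]]]) # per-group covariance matrices

m = 500
z = rng.choice(2, size=m, p=phi)              # z ~ Categorical(k, phi)
x = np.stack([rng.multivariate_normal(mu[j], Sigma[j]) for j in z])  # x | z=j ~ N(mu_j, Sigma_j)
```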
The following procedure can be used to estimate $\phi, \mu, \Sigma$.

Maximum likelihood estimation can be applied:
$$\ell(\phi,\mu,\Sigma)=\sum_{i=1}^{m}\log\big(p(x^{(i)};\phi,\mu,\Sigma)\big)=\sum_{i=1}^{m}\log\sum_{z^{(i)}=1}^{k}p\left(x^{(i)}\mid z^{(i)};\mu,\Sigma\right)p\left(z^{(i)};\phi\right)$$
As the $z_i$ for each $x_i$ is known, the log-likelihood function can be simplified as below:

$$\ell(\phi,\mu,\Sigma)=\sum_{i=1}^{m}\left[\log p\left(x^{(i)}\mid z^{(i)};\mu,\Sigma\right)+\log p\left(z^{(i)};\phi\right)\right]$$
Now the likelihood function can be maximized by taking partial derivatives with respect to $\mu$, $\Sigma$, and $\phi$, obtaining:
$$\phi_j=\frac{1}{m}\sum_{i=1}^{m}1\{z^{(i)}=j\}$$

$$\mu_j=\frac{\sum_{i=1}^{m}1\{z^{(i)}=j\}\,x^{(i)}}{\sum_{i=1}^{m}1\{z^{(i)}=j\}}$$

$$\Sigma_j=\frac{\sum_{i=1}^{m}1\{z^{(i)}=j\}\left(x^{(i)}-\mu_j\right)\left(x^{(i)}-\mu_j\right)^{T}}{\sum_{i=1}^{m}1\{z^{(i)}=j\}}$$
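When the labels are observed, these closed-form estimates are just per-group counts, means, and covariances. A minimal NumPy sketch (the function name `mle_labeled` is an assumption of this sketch; `x` and `z` are as in the sampling example above):

```python
import numpy as np

def mle_labeled(x, z, k):
    """Closed-form MLE of (phi, mu, Sigma) when the labels z are observed."""
    phi = np.array([np.mean(z == j) for j in range(k)])        # phi_j: fraction of points in group j
    mu = np.array([x[z == j].mean(axis=0) for j in range(k)])  # mu_j: mean of group j
    Sigma = np.array([
        (x[z == j] - mu[j]).T @ (x[z == j] - mu[j]) / np.sum(z == j)  # within-group covariance
        for j in range(k)
    ])
    return phi, mu, Sigma
```

Applied to the simulated `x` and `z` above, these estimates recover the hypothetical parameters up to sampling noise.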
If $z_i$ is known, the estimation of the parameters with maximum likelihood estimation turns out to be quite simple, but if $z_i$ is unknown it is much more complicated.

Since $z$ is a latent variable (i.e. not observed), the unlabelled scenario requires the EM algorithm to estimate $z$ as well as the other parameters. Generally, this problem is set as a GMM, since the data in each group are normally distributed.
In machine learning, the latent variable $z$ is considered a hidden pattern lying under the data, which the observer cannot see directly. The $x_i$ are the known data, while $\phi, \mu, \Sigma$ are the parameters of the model. With the EM algorithm, some underlying pattern $z$ in the data $x_i$ can be found, along with estimates of the parameters.
The EM algorithm consists of two steps: the E-step and the M-step. Firstly, the model parameters and the $z^{(i)}$ can be randomly initialized. In the E-step, the algorithm tries to guess the value of $z^{(i)}$ based on the parameters, while in the M-step, the algorithm updates the value of the model parameters based on the guess of $z^{(i)}$ from the E-step. These two steps are repeated until convergence is reached.
The EM algorithm for GMM is:
Repeat until convergence:
1. (E-step) For each $i, j$, set

$$w_j^{(i)}:=p\left(z^{(i)}=j\mid x^{(i)};\phi,\mu,\Sigma\right)$$
2. (M-step) Update the parameters

$$\phi_j:=\frac{1}{m}\sum_{i=1}^{m}w_j^{(i)}$$

$$\mu_j:=\frac{\sum_{i=1}^{m}w_j^{(i)}\,x^{(i)}}{\sum_{i=1}^{m}w_j^{(i)}}$$

$$\Sigma_j:=\frac{\sum_{i=1}^{m}w_j^{(i)}\left(x^{(i)}-\mu_j\right)\left(x^{(i)}-\mu_j\right)^{T}}{\sum_{i=1}^{m}w_j^{(i)}}$$
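The M-step updates are the weighted analogues of the labeled closed-form estimates, with the indicator $1\{z^{(i)}=j\}$ replaced by the responsibility $w_j^{(i)}$. A minimal sketch, assuming `w` is an $m \times k$ array of responsibilities produced by the E-step (the function name `m_step` is an assumption of this sketch):

```python
import numpy as np

def m_step(x, w):
    """Update (phi, mu, Sigma) from data x (m x n) and responsibilities w (m x k)."""
    m, n = x.shape
    k = w.shape[1]
    nk = w.sum(axis=0)                     # effective number of points per component
    phi = nk / m                           # phi_j: average responsibility of component j
    mu = (w.T @ x) / nk[:, None]           # weighted means
    Sigma = np.empty((k, n, n))
    for j in range(k):
        d = x - mu[j]                      # deviations from the component mean
        Sigma[j] = (w[:, j, None] * d).T @ d / nk[j]  # weighted covariance
    return phi, mu, Sigma
```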
With Bayes' rule, the following result is obtained for the E-step:

$$p\left(z^{(i)}=j\mid x^{(i)};\phi,\mu,\Sigma\right)=\frac{p\left(x^{(i)}\mid z^{(i)}=j;\mu,\Sigma\right)p\left(z^{(i)}=j;\phi\right)}{\sum_{l=1}^{k}p\left(x^{(i)}\mid z^{(i)}=l;\mu,\Sigma\right)p\left(z^{(i)}=l;\phi\right)}$$
According to the GMM setting, the following formulas are obtained:
$$p\left(x^{(i)}\mid z^{(i)}=j;\mu,\Sigma\right)=\frac{1}{(2\pi)^{n/2}\left|\Sigma_j\right|^{1/2}}\exp\left(-\frac{1}{2}\left(x^{(i)}-\mu_j\right)^{T}\Sigma_j^{-1}\left(x^{(i)}-\mu_j\right)\right)$$

$$p\left(z^{(i)}=j;\phi\right)=\phi_j$$
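In code, the E-step is a direct transcription of Bayes' rule with the Gaussian density above. A minimal sketch, using `scipy.stats.multivariate_normal` for the density (the function name `e_step` is an assumption of this sketch):

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(x, phi, mu, Sigma):
    """Compute responsibilities w[i, j] = p(z_i = j | x_i; phi, mu, Sigma)."""
    m, k = x.shape[0], phi.shape[0]
    w = np.empty((m, k))
    for j in range(k):
        # Numerator of Bayes' rule: p(x | z = j; mu, Sigma) * p(z = j; phi)
        w[:, j] = multivariate_normal.pdf(x, mean=mu[j], cov=Sigma[j]) * phi[j]
    w /= w.sum(axis=1, keepdims=True)      # normalize: the denominator of Bayes' rule
    return w
```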
In this way, starting from randomly initialized parameters, it is possible to alternate between the E-step and the M-step until the algorithm converges.
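Assuming the `e_step` and `m_step` sketches above, the following loop alternates the two steps from a random initialization for a fixed number of iterations (a convergence check on the log-likelihood could replace the fixed count); the driver name `em_gmm` and the initialization scheme are assumptions of this sketch:

```python
import numpy as np

def em_gmm(x, k, n_iter=100, seed=0):
    """Fit a k-component GMM to x (m x n) by alternating E- and M-steps."""
    rng = np.random.default_rng(seed)
    m, n = x.shape
    phi = np.full(k, 1.0 / k)                          # uniform mixing weights
    mu = x[rng.choice(m, size=k, replace=False)]       # means at random data points
    Sigma = np.stack([np.cov(x.T) for _ in range(k)])  # shared overall covariance
    for _ in range(n_iter):
        w = e_step(x, phi, mu, Sigma)                  # E-step: guess z given the parameters
        phi, mu, Sigma = m_step(x, w)                  # M-step: update parameters given the guess
    return phi, mu, Sigma
```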