Diffusion maps is a dimensionality reduction or feature extraction algorithm introduced by Coifman and Lafon which computes a family of embeddings of a data set into Euclidean space (often low-dimensional) whose coordinates can be computed from the eigenvectors and eigenvalues of a diffusion operator on the data. The Euclidean distance between points in the embedded space is equal to the "diffusion distance" between probability distributions centered at those points. Unlike linear dimensionality reduction methods such as principal component analysis (PCA), diffusion maps belong to the family of nonlinear dimensionality reduction methods, which focus on discovering the underlying manifold from which the data have been sampled. By integrating local similarities at different scales, diffusion maps give a global description of the data set. Compared with other methods, the diffusion map algorithm is robust to noise perturbation and computationally inexpensive.
Following the original formulation, diffusion maps can be defined in four steps.
Diffusion maps exploit the relationship between heat diffusion and random-walk Markov chains. The basic observation is that if we take a random walk on the data, walking to a nearby data point is more likely than walking to another that is far away. Let $(X, \mathcal{A}, \mu)$ be a measure space, where $X$ is the data set and $\mu$ represents the distribution of the points on $X$.

Based on this, the connectivity $k$ between two data points, $x$ and $y$, can be defined as the probability of walking from $x$ to $y$ in one step of the random walk. Usually, this probability is specified in terms of a kernel function of the two points, $k : X \times X \to \mathbb{R}$. For example, the popular Gaussian kernel:

$$k(x,y) = \exp\left(-\frac{\|x-y\|^2}{\epsilon}\right)$$
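For illustration, here is a minimal sketch of this kernel step, assuming the data set is stored as an $(n, m)$ NumPy array of $n$ points in $\mathbb{R}^m$; the bandwidth argument `eps` stands for the $\epsilon$ above, and the function name is illustrative rather than fixed by the method.

```python
import numpy as np

def gaussian_kernel(X, eps):
    """Return the n x n matrix k(x, y) = exp(-||x - y||^2 / eps)."""
    # Squared Euclidean distances between all pairs of points.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / eps)
```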
More generally, the kernel function has the following properties:

$$k(x,y) = k(y,x)$$

($k$ is symmetric), and

$$k(x,y) \geq 0 \quad \forall x, y$$

($k$ is positivity preserving).
The kernel constitutes the prior definition of the local geometry of the data set. Since a given kernel will capture a specific feature of the data set, its choice should be guided by the application that one has in mind. This is a major difference from methods such as principal component analysis, where correlations between all data points are taken into account at once.
Given $(X, k)$, we can construct a reversible discrete-time Markov chain on $X$ (a process known as the normalized graph Laplacian construction): set

$$d(x) = \int_X k(x,y) \, d\mu(y)$$

and define:

$$p(x,y) = \frac{k(x,y)}{d(x)}$$
Although the new normalized kernel does not inherit the symmetric property, it does inherit the positivity-preserving property and gains a conservation property:
$$\int_X p(x,y) \, d\mu(y) = 1$$
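In the discrete setting the integral becomes a sum over the data points, so the normalization amounts to dividing each row of the kernel matrix by its row sum. A minimal sketch, with illustrative variable names:

```python
import numpy as np

def transition_kernel(K):
    """Row-normalize a kernel matrix K so that each row sums to 1."""
    d = K.sum(axis=1)      # d(x) = sum_y k(x, y)
    P = K / d[:, None]     # p(x, y) = k(x, y) / d(x)
    # Conservation property: every row of P sums to 1.
    assert np.allclose(P.sum(axis=1), 1.0)
    return P
```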
From $p(x,y)$ we can construct a transition matrix of a Markov chain ($M$) on $X$. In other words, $p(x,y)$ represents the one-step transition probability from $x$ to $y$, and $M^t$ gives the $t$-step transition matrix.
We define the diffusion matrix $L$ (it is also a version of the graph Laplacian matrix)

$$L_{i,j} = k(x_i, x_j).$$

We then define the new kernel

$$L^{(\alpha)}_{i,j} = k^{(\alpha)}(x_i, x_j) = \frac{L_{i,j}}{\left(d(x_i) \, d(x_j)\right)^{\alpha}},$$

or equivalently,

$$L^{(\alpha)} = D^{-\alpha} L D^{-\alpha},$$

where $D$ is a diagonal matrix with $D_{i,i} = \sum_j L_{i,j}$.
We apply the graph Laplacian normalization to this new kernel:

$$M = (D^{(\alpha)})^{-1} L^{(\alpha)},$$

where $D^{(\alpha)}$ is a diagonal matrix with $D^{(\alpha)}_{i,i} = \sum_j L^{(\alpha)}_{i,j}$. The entries of the powers of $M$ then give the transition probabilities:

$$p(x_j, t | x_i) = M^t_{i,j}$$
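A minimal sketch of these two normalization steps in matrix form, assuming `L` is the symmetric kernel matrix defined above (function and variable names are illustrative):

```python
import numpy as np

def markov_matrix(L, alpha=0.5):
    """Return M = (D^(alpha))^-1 L^(alpha), with L^(alpha) = D^-alpha L D^-alpha."""
    d = L.sum(axis=1)                                # D_ii = sum_j L_ij
    L_alpha = L / np.outer(d ** alpha, d ** alpha)   # alpha-normalized kernel
    d_alpha = L_alpha.sum(axis=1)                    # D^(alpha)_ii
    return L_alpha / d_alpha[:, None]                # row-stochastic M

# The t-step transition probabilities p(x_j, t | x_i) are the entries of M^t,
# e.g. M_t = np.linalg.matrix_power(markov_matrix(L, 0.5), t).
```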
One of the main ideas of the diffusion framework is that running the chain forward in time (taking larger and larger powers of $M$) reveals the geometric structure of $X$ at larger and larger scales (the diffusion process).
The eigendecomposition of the matrix $M^t$ yields

$$M^t_{i,j} = \sum_l \lambda_l^t \, \psi_l(x_i) \, \phi_l(x_j),$$

where $\{\lambda_l\}$ is the sequence of eigenvalues of $M$, and $\{\psi_l\}$ and $\{\phi_l\}$ are the corresponding biorthogonal right and left eigenvectors.
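Although $M$ is not symmetric, it is conjugate to the symmetric matrix $D^{1/2} M D^{-1/2}$, so one common numerical route, sketched below under that assumption, is to diagonalize the symmetric conjugate with a standard solver and transform its orthonormal eigenvectors back into the right and left eigenvectors of $M$:

```python
import numpy as np

def eigendecomposition(L_alpha):
    """Eigenvalues and right/left eigenvectors of M = (D^(alpha))^-1 L^(alpha)."""
    d = L_alpha.sum(axis=1)
    sqrt_d = np.sqrt(d)
    S = L_alpha / np.outer(sqrt_d, sqrt_d)   # symmetric conjugate of M
    lam, U = np.linalg.eigh(S)               # eigh returns ascending order
    lam, U = lam[::-1], U[:, ::-1]           # sort descending
    psi = U / sqrt_d[:, None]                # right eigenvectors of M
    phi = U * sqrt_d[:, None]                # left eigenvectors of M
    return lam, psi, phi
```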
The reason to introduce the normalization step involving $\alpha$ is to tune the influence of the data-point density on the infinitesimal transition of the diffusion. With $\alpha = 1$, the diffusion operator approximates the Laplace–Beltrami operator, recovering the Riemannian geometry of the data set regardless of the distribution of the points; $\alpha = 0.5$ yields an approximation of the Fokker–Planck diffusion; and $\alpha = 0$ reduces to the classical graph Laplacian normalization.
The diffusion distance at time $t$ between two points measures the similarity of the two points in the observation space through their connectivity. It is given by

$$D_t(x_i, x_j)^2 = \sum_y \frac{\left(p(y, t | x_i) - p(y, t | x_j)\right)^2}{\phi_0(y)},$$

where $\phi_0(y)$ is the stationary distribution of the Markov chain, given by the first left eigenvector of $M$. Explicitly:

$$\phi_0(y) = \frac{d(y)}{\sum_{z \in X} d(z)}$$
Intuitively, $D_t(x_i, x_j)$ is small if there is a large number of short paths connecting $x_i$ and $x_j$. Because $D_t(x_i, x_j)$ sums over all paths of length $t$ between $x_i$ and $x_j$, it is robust to noise perturbation, and $t$ acts as a scale parameter: larger values of $t$ probe the connectivity of the data at coarser scales.
The diffusion distance can be calculated using the eigenvectors by
$$D_t(x_i, x_j)^2 = \sum_l \lambda_l^{2t} \left(\psi_l(x_i) - \psi_l(x_j)\right)^2$$
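The following sketch checks this spectral identity against the direct definition on a small synthetic data set. Note that the identity holds exactly only when the right eigenvectors are normalized to be orthonormal with respect to the stationary distribution $\phi_0$, which the rescaling by $\sqrt{\sum_z d(z)}$ below accomplishes; all names and parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
K = np.exp(-np.sum((X[:, None] - X[None, :]) ** 2, axis=-1) / 1.0)

d = K.sum(axis=1)
sqrt_d = np.sqrt(d)
M = K / d[:, None]                            # Markov matrix (alpha = 0)
S = K / np.outer(sqrt_d, sqrt_d)              # symmetric conjugate of M
lam, U = np.linalg.eigh(S)
lam, U = lam[::-1], U[:, ::-1]                # descending eigenvalues
psi = np.sqrt(d.sum()) * U / sqrt_d[:, None]  # rescaled right eigenvectors

t, i, j = 3, 0, 1
Mt = np.linalg.matrix_power(M, t)             # t-step transition matrix
phi0 = d / d.sum()                            # stationary distribution
direct = np.sum((Mt[i] - Mt[j]) ** 2 / phi0)
spectral = np.sum(lam ** (2 * t) * (psi[i] - psi[j]) ** 2)
print(np.isclose(direct, spectral))           # True, up to round-off
```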
So the eigenvectors can be used as a new set of coordinates for the data. The diffusion map is defined as:
$$\Psi_t(x) = \left(\lambda_1^t \psi_1(x), \lambda_2^t \psi_2(x), \ldots, \lambda_k^t \psi_k(x)\right)$$
Because of the spectrum decay, it is sufficient to use only the first $k$ eigenvectors and eigenvalues. Thus we get the diffusion map from the original data to a $k$-dimensional space which is embedded in the original space.
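A minimal sketch of the embedding itself, assuming eigenpairs sorted in descending order as in the earlier eigendecomposition sketch; the trivial constant eigenvector $\psi_0$ (with eigenvalue $1$) carries no information and is skipped, so the map starts at $\psi_1$:

```python
import numpy as np

def diffusion_map(lam, psi, t, k):
    """Return the n x k matrix whose i-th row is Psi_t(x_i)."""
    # Column l is lam_l^t * psi_l, for l = 1, ..., k (skipping psi_0).
    return lam[1:k + 1] ** t * psi[:, 1:k + 1]
```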
It has been proved that the Euclidean distance in the embedded space approximates the diffusion distance:

$$D_t(x_i, x_j)^2 \approx \left\|\Psi_t(x_i) - \Psi_t(x_j)\right\|^2$$
The basic algorithmic framework of the diffusion map is as follows (see the sketch after the steps):

Step 1. Given the similarity matrix $L$.

Step 2. Normalize the matrix according to parameter $\alpha$: $L^{(\alpha)} = D^{-\alpha} L D^{-\alpha}$.

Step 3. Form the normalized matrix $M = (D^{(\alpha)})^{-1} L^{(\alpha)}$.

Step 4. Compute the $k$ largest eigenvalues of $M^t$ and the corresponding eigenvectors.

Step 5. Use the diffusion map $\Psi_t$ to get the embedding.
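The five steps can be strung together into one self-contained sketch; it is applied here to a noisy circle, whose circular structure the two leading diffusion coordinates should recover. The parameter values (`eps`, `alpha`, `t`, `k`) are illustrative choices, not values prescribed by the algorithm.

```python
import numpy as np

def diffusion_maps(X, eps=1.0, alpha=0.5, t=1, k=2):
    # Step 1: similarity matrix L from a Gaussian kernel.
    L = np.exp(-np.sum((X[:, None] - X[None, :]) ** 2, axis=-1) / eps)
    # Step 2: alpha-normalization L^(alpha) = D^-alpha L D^-alpha.
    d = L.sum(axis=1)
    L_alpha = L / np.outer(d ** alpha, d ** alpha)
    # Steps 3-4: diagonalize M = (D^(alpha))^-1 L^(alpha) via its
    # symmetric conjugate, then map back to right eigenvectors of M.
    d_alpha = L_alpha.sum(axis=1)
    sqrt_d = np.sqrt(d_alpha)
    S = L_alpha / np.outer(sqrt_d, sqrt_d)
    lam, U = np.linalg.eigh(S)
    lam, U = lam[::-1], U[:, ::-1]           # descending eigenvalues
    psi = U / sqrt_d[:, None]
    # Step 5: diffusion-map embedding, skipping the trivial psi_0.
    return lam[1:k + 1] ** t * psi[:, 1:k + 1]

# Usage: embed a noisy circle into 2-D diffusion coordinates.
rng = np.random.default_rng(1)
theta = np.linspace(0, 2 * np.pi, 100, endpoint=False)
X = np.c_[np.cos(theta), np.sin(theta)] + 0.05 * rng.normal(size=(100, 2))
print(diffusion_maps(X, eps=0.5).shape)      # (100, 2)
```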
Nadler et al. showed how to design a kernel that reproduces the diffusion induced by a Fokker–Planck equation. They also explained that, when the data approximate a manifold, one can recover the geometry of this manifold by computing an approximation of the Laplace–Beltrami operator. This computation is completely insensitive to the distribution of the points and therefore provides a separation of the statistics and the geometry of the data. Since diffusion maps give a global description of the data set, they can measure the distances between pairs of sample points in the manifold in which the data is embedded. Applications based on diffusion maps include face recognition,[1] spectral clustering, low-dimensional representation of images, image segmentation, 3D model segmentation, speaker verification[2] and identification, sampling on manifolds, anomaly detection,[3] image inpainting, revealing brain resting-state network organization,[4] and so on.
Furthermore, the diffusion maps framework has been productively extended to complex networks,[5] revealing a functional organisation of networks which differs from the purely topological or structural one.