Maximum Variance Unfolding (MVU), also known as Semidefinite Embedding (SDE), is an algorithm in computer science that uses semidefinite programming to perform non-linear dimensionality reduction of high-dimensional vectorial input data.
It is motivated by the observation that kernel Principal Component Analysis (kPCA) does not reduce the data dimensionality, as it leverages the kernel trick to non-linearly map the original data into an inner-product space.
MVU creates a mapping from the high dimensional input vectors to some low dimensional Euclidean vector space in the following steps:

1. A neighbourhood graph is created. Each input is connected with its k nearest input vectors (according to the Euclidean distance metric) and all k-nearest neighbours are connected with each other. If the data is sampled well enough, the resulting graph is a discrete approximation of the underlying manifold.

2. The neighbourhood graph is "unfolded" with the help of semidefinite programming. Instead of learning the output vectors directly, the semidefinite programming aims to find an inner product matrix that maximizes the pairwise distances between any two inputs that are not connected in the neighbourhood graph, while preserving the nearest neighbours' distances.

3. The low-dimensional embedding is finally obtained by application of multidimensional scaling on the learned inner product matrix.
The steps of applying semidefinite programming followed by a linear dimensionality reduction step to recover a low-dimensional embedding into a Euclidean space were first proposed by Linial, London, and Rabinovich.
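As a rough illustration of step 1, the sketch below builds the binary neighbourhood matrix $\eta_{ij}$ used in the constraints further down. The use of scikit-learn's NearestNeighbors is an assumed convenience, not part of the original algorithm description, and for brevity the sketch omits the extra edges that connect the k nearest neighbours with each other.

import numpy as np
from sklearn.neighbors import NearestNeighbors

def neighbourhood_graph(X, k=5):
    """Return the symmetric 0/1 adjacency matrix eta of the k-NN graph.

    X : (N, d) array of high-dimensional inputs.
    eta[i, j] = 1 iff j is among the k nearest neighbours of i, or vice versa.
    """
    N = X.shape[0]
    # k + 1 because each point is returned as its own nearest neighbour.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)
    eta = np.zeros((N, N), dtype=int)
    for i in range(N):
        eta[i, idx[i, 1:]] = 1       # skip idx[i, 0], which is i itself
    # Symmetrise: treat the neighbourhood relation as undirected.
    return np.maximum(eta, eta.T)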
Let $X$ be the original input and $Y$ be the embedding. If $i$ and $j$ are two neighbours, then the local isometry constraint that needs to hold is

|X_i - X_j|^2 = |Y_i - Y_j|^2.
Let $G$ and $K$ be the Gram matrices of $X$ and $Y$ (i.e. $G_{ij} = X_i \cdot X_j$ and $K_{ij} = Y_i \cdot Y_j$). Expanding the squared norm, $|Y_i - Y_j|^2 = Y_i \cdot Y_i + Y_j \cdot Y_j - Y_i \cdot Y_j - Y_j \cdot Y_i$, and likewise for $X$, so we can express the above constraint for every pair of neighbours $i, j$ in terms of $G$ and $K$:

G_{ii} + G_{jj} - G_{ij} - G_{ji} = K_{ii} + K_{jj} - K_{ij} - K_{ji}.
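To make the identity concrete, here is a small numerical check; the random toy data and numpy usage are illustrative assumptions, not part of the algorithm:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))    # toy high-dimensional inputs
G = X @ X.T                     # Gram matrix: G[i, j] = X_i . X_j

i, j = 2, 7
lhs = G[i, i] + G[j, j] - G[i, j] - G[j, i]
rhs = np.sum((X[i] - X[j]) ** 2)    # squared Euclidean distance
assert np.allclose(lhs, rhs)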
In addition, we also want to constrain the embedding $Y$ to be centered at the origin:

0 = \Bigl|\sum_i Y_i\Bigr|^2 \Leftrightarrow 0 = \Bigl(\sum_i Y_i\Bigr) \cdot \Bigl(\sum_i Y_i\Bigr) \Leftrightarrow 0 = \sum_{i,j} Y_i \cdot Y_j \Leftrightarrow 0 = \sum_{i,j} K_{ij}.
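The same equivalence is easy to confirm numerically: centering a toy embedding makes the entries of its Gram matrix sum to zero (again an illustrative sketch, assuming numpy):

import numpy as np

rng = np.random.default_rng(1)
Y = rng.normal(size=(10, 2))
Y = Y - Y.mean(axis=0)              # center the embedding at the origin
K = Y @ Y.T
assert np.isclose(K.sum(), 0.0)     # sum_ij K_ij = |sum_i Y_i|^2 = 0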
As described above, while the distances between neighbouring points are preserved, the algorithm aims to maximize the pairwise distance of every pair of points. The objective function to be maximized is:
T(Y) = \dfrac{1}{2N} \sum_{i,j} |Y_i - Y_j|^2
Intuitively, maximizing the function above is equivalent to pulling the points as far away from each other as possible, thereby "unfolding" the manifold. Let

\tau = \max_{i,j} \{\eta_{ij} |Y_i - Y_j|^2\}, \qquad \eta_{ij} := \begin{cases} 1 & \text{if } i \text{ is a neighbour of } j \\ 0 & \text{otherwise.} \end{cases}

Because the local isometry constraint fixes the distances between neighbouring points, $\tau$ is determined entirely by the input data, and this prevents the objective function from diverging (going to infinity).
Since the graph has $N$ points, any two of them are joined by a path of at most $N$ edges (assuming the neighbourhood graph is connected), each of squared length at most $\tau$. By the triangle inequality,

|Y_i - Y_j|^2 \leq (N\sqrt{\tau})^2 = N^2 \tau,

so the objective function is bounded:

T(Y) = \dfrac{1}{2N} \sum_{i,j} |Y_i - Y_j|^2 \leq \dfrac{1}{2N} \sum_{i,j} N^2 \tau = \dfrac{N^3 \tau}{2}.
The objective function can be rewritten purely in terms of the Gram matrix:

\begin{align}
T(Y) &= \dfrac{1}{2N} \sum_{i,j} |Y_i - Y_j|^2 \\
&= \dfrac{1}{2N} \sum_{i,j} (Y_i^2 + Y_j^2 - Y_i \cdot Y_j - Y_j \cdot Y_i) \\
&= \dfrac{1}{2N} \Bigl( \sum_{i,j} Y_i^2 + \sum_{i,j} Y_j^2 - \sum_{i,j} Y_i \cdot Y_j - \sum_{i,j} Y_j \cdot Y_i \Bigr) \\
&= \dfrac{1}{2N} \Bigl( \sum_{i,j} Y_i^2 + \sum_{i,j} Y_j^2 - 0 - 0 \Bigr) \\
&= \dfrac{1}{2N} \Bigl( 2N \sum_i Y_i^2 \Bigr) = \operatorname{Tr}(K),
\end{align}

where the two cross terms vanish by the centering constraint $\sum_{i,j} K_{ij} = 0$, and each of the two remaining sums contributes a factor of $N$ from the index it does not depend on.
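A small numerical check of the identity $T(Y) = \operatorname{Tr}(K)$ on a centered toy embedding (illustrative, assuming numpy):

import numpy as np

rng = np.random.default_rng(2)
Y = rng.normal(size=(10, 2))
Y = Y - Y.mean(axis=0)              # the identity needs the centering constraint
K = Y @ Y.T
N = Y.shape[0]

# All pairwise squared distances via broadcasting.
D2 = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
assert np.isclose(D2.sum() / (2 * N), np.trace(K))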
Finally, the optimization can be formulated as:
\begin{align}
&\text{Maximize} && \operatorname{Tr}(K) \\
&\text{subject to} && K \succeq 0, \quad \sum_{i,j} K_{ij} = 0, \\
&\text{and} && G_{ii} + G_{jj} - G_{ij} - G_{ji} = K_{ii} + K_{jj} - K_{ij} - K_{ji} \quad \forall i,j \text{ where } \eta_{ij} = 1.
\end{align}
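This is a standard semidefinite program. As a minimal sketch, one way to pose it with the generic modelling library CVXPY; the library choice, the function name mvu_gram, and the default solver are assumptions of this sketch, not something the source specifies:

import cvxpy as cp
import numpy as np

def mvu_gram(X, eta):
    """Solve the MVU semidefinite program for the Gram matrix K.

    X   : (N, d) array of inputs.
    eta : (N, N) symmetric 0/1 neighbourhood matrix.
    """
    N = X.shape[0]
    G = X @ X.T
    K = cp.Variable((N, N), PSD=True)     # K is symmetric positive semidefinite
    constraints = [cp.sum(K) == 0]        # centering: sum_ij K_ij = 0
    for i in range(N):
        for j in range(i + 1, N):
            if eta[i, j] == 1:            # local isometry on neighbour pairs
                constraints.append(
                    K[i, i] + K[j, j] - K[i, j] - K[j, i]
                    == G[i, i] + G[j, j] - G[i, j] - G[j, i]
                )
    prob = cp.Problem(cp.Maximize(cp.trace(K)), constraints)
    prob.solve()
    return K.value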
After the Gram matrix $K$ is learned by semidefinite programming, the output $Y$ can be obtained via Cholesky decomposition. In particular, the Gram matrix can be written as

K_{ij} = \sum_{\alpha=1}^{N} \lambda_\alpha V_{\alpha i} V_{\alpha j},

where $V_{\alpha i}$ is the $i$-th element of the eigenvector $V_\alpha$ with eigenvalue $\lambda_\alpha$. It follows that the $\alpha$-th element of the output $Y_i$ is $\sqrt{\lambda_\alpha}\, V_{\alpha i}$.
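In practice this eigendecomposition route is the common implementation: keep the top eigenvalue-eigenvector pairs of $K$ and scale. The sketch below assumes numpy and a hypothetical dimensionality cutoff r for illustration; it is not the authors' reference code.

import numpy as np

def embedding_from_gram(K, r=2):
    """Recover an r-dimensional embedding Y from a learned Gram matrix K."""
    # eigh handles the symmetric K; eigenvalues come back in ascending order.
    lam, V = np.linalg.eigh(K)
    lam, V = lam[::-1], V[:, ::-1]    # reorder to descending eigenvalues
    lam = np.clip(lam, 0.0, None)     # clip tiny negative numerical noise
    # Column alpha of the result is sqrt(lambda_alpha) * V_alpha; row i holds Y_i.
    return V[:, :r] * np.sqrt(lam[:r])

Combined with the earlier sketches, Y = embedding_from_gram(mvu_gram(X, neighbourhood_graph(X))) would run the whole pipeline on toy data.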