In data mining and machine learning, -flats algorithm[1] [2] is an iterative method which aims to partition observations into clusters where each cluster is close to a -flat, where is a given integer.
It is a generalization of the -means algorithm. In -means algorithm, clusters are formed in the way that each cluster is close to one point, which is a -flat. -flats algorithm gives better clustering result than -means algorithmfor some data set.
Given a set of observations
(a1,a2,...,am)
ai
A -flat is a subset of
\Realsn
\Realsq
n-1
F=\left\{x\midx\in\Realsn,W'x=\gamma\right\}
W\in\Realsn x
\gamma\in\Reals1 x
Denote a partition of
\{1,2,...,n\}
S=(S1,S2,...,Sk)
P | |
Fi |
(aj)
aj
Fi
\|aj-
P | |
Fi |
(aj)\|=\operatorname{dist}(aj,Fl)
aj
Fl
The algorithm is similar to the k-means algorithm (i.e. Lloyd's algorithm) in that it alternates between cluster assignment and cluster update. In specific, the algorithm starts with an initial set of -flats
(0) | |
F | |
l |
=\left\{x\inRn\mid
(0) | |
\left(W | |
l |
\right)'x=
(0) | |
\gamma | |
l |
\right\},l=1,...,k
l=1,...,k
A(l)\inRm(l) x
ai
(t+1) | |
W | |
l |
(n-q)
A(l)'\left(I-\tfrac{ee'}{m}\right)A(l)
(t+1) | |
\gamma | |
l |
=
| |||||||||
m |
Stop whenever the assignments no longer change.
The cluster assignment step uses the following fact: given a -flat
Fl=\{x\midW'x=\gamma\}
W'W=I
Fl
The key part of this algorithm is how to update the cluster, i.e. given points, how to find a -flat that minimizes the sum of squares of distances of each point to the -flat. Mathematically, this problem is: given
A\inRm x ,
where
A\in\Rm
e=(1,...,1)'\in\Rm
The problem can be solved using Lagrangian multiplier method and the solution is as given in the cluster update step.
It can be shown that the algorithm will terminate in a finite number of iterations (no more than the total number of possible assignments, which is bounded by
km
This convergence result is a consequence of the fact that can be solved exactly.The same convergence result holds for -means algorithm because the cluster update problem can be solved exactly.
-flats algorithm is a generalization of -means algorithm. In fact, -means algorithm is -flats algorithm since a point is a 0-flat. Despite their connection, they should be used in different scenarios. -flats algorithm for the case that data lie in a few low-dimensional spaces. -means algorithm is desirable for the case the clusters are of the ambient dimension. For example, if all observations lie in two lines, -flats algorithm with
q=1
Natural signals lie in a high-dimensional space. For example, the dimension of a 1024-by-1024 image is about 106, which is far too high for most signal processing algorithms. One way to get rid of the high dimensionality is to find a set of basis functions, such that the high-dimensional signal can be represented by only a few basis functions. In other words, the coefficients of the signal representation lies in a low-dimensional space, which is easier to apply signal processing algorithms. In the literature, wavelet transform is usually used in image processing, and fourier transform is usually used in audio processing. The set of basis functions is usually called a dictionary.
However, it is not clear what is the best dictionary to use once given a signal data set. One popular approach is to find a dictionary when given a data set using the idea of Sparse Dictionary Learning. It aims to find a dictionary, such that the signal can be sparsely represented by the dictionary. The optimization problem can be written as follows.
where
Ri
\|v\|0
\|V\|F
The idea of -flats algorithm is similar to sparse dictionary learning in nature. If we restrict the -flat to -dimensional subspace, then the -flats algorithm is simply finding the closed -dimensional subspace to a given signal. Sparse dictionary learning is also doing the same thing, except for an additional constraints on the sparsity of the representation. Mathematically, it is possible to show that -flats algorithm is of the form of sparse dictionary learning with an additional block structure on .
Let
Bk
d x q
Bk
Bkrk
rk
B=[B1, … ,BK]
The block structure of refers the fact that each signal is labeled to only one flat. Comparing the two formulations, -flat is the same as sparse dictionary modeling when
l=K x q
Classification is a procedure that classifies an input signal into different classes. One example is to classify an email into spam or non-spam classes. Classification algorithms usually require a supervised learning stage. In the supervised learning stage, training data for each class is used for the algorithm to learn the characteristics of the class. In the classification stage, a new observation is classified into a class by using the characteristics that were already trained.
-flat algorithm can be used for classification. Suppose there are total of m classes. For each class, flats are trained a priori via training data set. When a new data comes, find the flat that is closest to the new data. Then the new data is associate to class of the closest flat.
However, the classification performance can be further improved if we impose some structure on the flats. One possible choice is to require different flats from different class be sufficiently far apart. Some researchers[4] use this idea and develop a discriminative k q-flat algorithm.
Source:
In -flats algorithm,
\|x-PF(x)\|2
PF(x)
\|x-
2 | |
x | |
c\| |
xc
where is a positive semi-definite matrix.
If is the identity matrix, then the Mahalanobis metric is exactly the same as the error measure used in -means. If is not the identity matrix, then
2 | |
\|x-y\| | |
A |