FLAME clustering explained
Fuzzy clustering by Local Approximation of MEmberships (FLAME) is a data clustering algorithm that defines clusters in the dense parts of a dataset and performs cluster assignment solely based on the neighborhood relationships among objects. The key feature of this algorithm is that the neighborhood relationships among neighboring objects in the feature space are used to constrain the memberships of neighboring objects in the fuzzy membership space.
Description of the FLAME algorithm
The FLAME algorithm is mainly divided into three steps:
- Extraction of the structure information from the dataset:
- Construct a neighborhood graph to connect each object to its K-Nearest Neighbors (KNN);
- Estimate a density for each object based on its proximities to its KNN;
- Objects are classified into 3 types:
- Cluster Supporting Object (CSO): object with density higher than all its neighbors;
- Cluster Outliers: object with density lower than all its neighbors, and lower than a predefined threshold;
- the rest.
- Local/Neighborhood approximation of fuzzy memberships:
- Initialization of fuzzy membership:
- Each CSO is assigned with fixed and full membership to itself to represent one cluster;
- All outliers are assigned with fixed and full membership to the outlier group;
- The rest are assigned with equal memberships to all clusters and the outlier group;
- Then the fuzzy memberships of all type 3 objects are updated by a converging iterative procedure called Local/Neighborhood Approximation of Fuzzy Memberships, in which the fuzzy membership of each object is updated by a linear combination of the fuzzy memberships of its nearest neighbors.
- Cluster construction from fuzzy memberships in two possible ways:
- One-to-one object-cluster assignment, to assign each object to the cluster in which it has the highest membership;
- One-to-multiple object-clusters assignment, to assign each object to the cluster in which it has a membership higher than a threshold.
The optimization problem in FLAME
The Local/Neighborhood Approximation of Fuzzy Memberships is a procedure to minimize the Local/Neighborhood Approximation Error (LAE/NAE) defined as the following:
E(\{\boldsymbol{p}\})=\sum\boldsymbol{x\in\boldsymbol{X}}\|\boldsymbol{p(x)}-\sum(x)}}w\boldsymbol{xy
} \boldsymbol \bigg\|^2
where
is the set of all type 3 objects,
is the fuzzy membership vector of object
,
is the set of nearest neighbors of
, and
} with
\sum\boldsymbol{y\in(x)}}w\boldsymbol{xy
}=1 are the coefficients reflecting the relative proximities of the nearest neighbors.
The NAE can be minimized by solving the following linear equations with unique solution which is the unique global minimum of NAE with value zero:
pk(\boldsymbol{x})-\sum\boldsymbol{y\in(x)}}w}pk(\boldsymbol{y})=0, \forall{\boldsymbol{x}\in\boldsymbol{X}}, k=1,...,M
where
is the number of CSOs plus one (for the outlier group). The following iterative procedure can be used to solve these linear equations:
{\boldsymbol{p}t+1(\boldsymbol{x})}=\sum(x)}}
w\boldsymbol{xy
}
See also
External links