SUBCLU is an algorithm for clustering high-dimensional data by Karin Kailing, Hans-Peter Kriegel and Peer Kröger.[1] It is a subspace clustering algorithm that builds on the density-based clustering algorithm DBSCAN. SUBCLU can find clusters in axis-parallel subspaces, and uses a bottom-up, greedy strategy to remain efficient.
SUBCLU uses a monotonicity criteria: if a cluster is found in a subspace
S
T\subseteqS
C\subseteqDB
S
T\subseteqS
T
C
S
T\subseteqS
This downward-closure property is utilized by SUBCLU in a way similar to the Apriori algorithm: first, all 1-dimensional subspaces are clustered. All clusters in a higher-dimensional subspace will be subsets of the clusters detected in this first clustering. SUBCLU hence recursively produces
k+1
k
k-1
k
k+1
SUBCLU takes two parameters,
\epsilon
MinPts
SUBCLU(DB,eps,MinPts)
S1:=\emptyset
C1:=\emptyset
foreacha\inAttributes
C\{a\
if(C\{a\
S1:=S1\cup\{a\}
C1:=C1\cupC\{a\
endif
endfor
// In a second step,
k+1
k
k:=1
while(Ck ≠ \emptyset)
CandSk+1:=GenerateCandidateSubspaces(Sk)
foreachcand\inCandSk+1
bestSubspace:=
min | |
s\inSk\wedges\subsetcand |
\sum | |
Ci\inCs |
|Ci|
Ccand:=\emptyset
foreachclustercl\inCbestSubspace
Ccand:=Ccand\cupDBSCAN(cl,cand,eps,MinPts)
if(Ccand ≠ \emptyset)
Sk+1:=Sk+1\cupcand
Ck+1:=Ck+1\cupCcand
endif
endfor
endfor
k:=k+1
endwhile
end
The set
Sk
k
Ck
bestSubspace
Candidate subspaces are generated much alike the Apriori algorithm generates the frequent itemset candidates: Pairs of the
k
k+1
k
GenerateCandidateSubspaces(Sk)
CandSk+1:=\emptyset
foreachs1\inSk
foreachs2\inSk
if(s1ands2differinexactlyoneattribute)
CandSk+1:=CandSk+1\cup\{s1\cups2\}
endif
endfor
endfor
// Pruning of irrelevant candidate subspaces
foreachcand\inCandSk+1
foreachktt{-element}s\subsetcand
if(s\not\inSk)
CandSk+1=CandSk+1\setminus\{cand\}
endif
endfor
endfor
end
An example implementation of SUBCLU is available in the ELKI framework.