Achtert, Elke (2007): Hierarchical Subspace Clustering. Dissertation, LMU München: Faculty of Mathematics, Computer Science and Statistics 

PDF
Achtert_Elke.pdf 2375Kb 
Abstract
It is wellknown that traditional clustering methods considering all dimensions of the feature space usually fail in terms of efficiency and effectivity when applied to highdimensional data. This poor behavior is based on the fact that clusters may not be found in the highdimensional feature space, although clusters exist in subspaces of the feature space. To overcome these limitations of traditional clustering methods, several methods for subspace clustering have been proposed recently. Subspace clustering algorithms aim at automatically identifying lower dimensional subspaces of the feature space in which clusters exist. There exist two types of subspace clustering algorithms: Algorithms for detecting clusters in axisparallel subspaces and, as an extension, algorithms for finding clusters in subspaces which are arbitrarily oriented. Generally, the subspace clusters may be hierarchically nested, i.e., several subspace clusters of low dimensionality may form a subspace cluster of higher dimensionality. Since existing subspace clustering methods are not able to detect these complex structures, hierarchical approaches for subspace clustering have to be applied. The goal of this dissertation is to develop new efficient and effective methods for hierarchical subspace clustering by identifying novel challenges for the hierarchical approach and proposing innovative and solid solutions for these challenges. The first Part of this work deals with the analysis of hierarchical subspace clusters in axisparallel subspaces. Two new methods are proposed that search simultaneously for subspace clusters of arbitrary dimensionality in order to detect complex hierarchies of subspace clusters. Furthermore, a new visualization model of the clustering result by means of a graph representation is provided. In the second Part of this work new methods for hierarchical clustering in arbitrarily oriented subspaces of the feature space are discussed. The socalled correlation clustering can be seen as an extension of axisparallel subspace clustering. Correlation clustering aims at grouping the data set into subsets, the socalled correlation clusters, such that the objects in the same correlation cluster show uniform attribute correlations. Two new hierarchical approaches are proposed which combine densitybased clustering with Principal Component Analysis in order to identify hierarchies of correlation clusters. The last Part of this work addresses the analysis and interpretation of the results obtained from correlation clustering algorithms. A general method is introduced to extract quantitative information on the linear dependencies between the objects of given correlation clusters. Furthermore, these quantitative models can be used to predict the probability that an object is created by one of these models. Both, the efficiency and the effectiveness of the presented techniques are thoroughly analyzed. The benefits over traditional approaches are shown by evaluating the new methods on synthetic as well as realworld test data sets.
Item Type:  Thesis (Dissertation, LMU Munich) 

Keywords:  data mining, densitybased clustering, subspace clustering, correlation clustering, hierarchical clustering 
Subjects:  600 Natural sciences and mathematics 600 Natural sciences and mathematics > 510 Mathematics 
Faculties:  Faculty of Mathematics, Computer Science and Statistics 
Language:  English 
Date Accepted:  24. April 2007 
1. Referee:  Böhm, Christian 
Persistent Identifier (URN):  urn:nbn:de:bvb:1968071 
MD5 Checksum of the PDFfile:  c9a9b0720e2a693299f5799345f174df 
Signature of the printed copy:  0001/UMC 16196 
ID Code:  6807 
Deposited On:  30. May 2007 
Last Modified:  16. Oct 2012 08:06 