Hierarchical Subspace Clustering
data mining, density-based clustering, subspace clustering, correlation clustering, hierarchical clustering
Achtert, Elke
2007
English
Universitätsbibliothek der Ludwig-Maximilians-Universität München
Achtert, Elke (2007): Hierarchical Subspace Clustering. Dissertation, LMU München: Fakultät für Mathematik, Informatik und Statistik
PDF: Achtert_Elke.pdf (2MB)

Abstract

It is well known that traditional clustering methods, which consider all dimensions of the feature space, usually fail in terms of efficiency and effectiveness when applied to high-dimensional data. This poor behavior stems from the fact that clusters often cannot be found in the full high-dimensional feature space, although they do exist in subspaces of it. To overcome these limitations of traditional clustering methods, several methods for subspace clustering have recently been proposed. Subspace clustering algorithms aim at automatically identifying lower-dimensional subspaces of the feature space in which clusters exist. There are two types of subspace clustering algorithms: algorithms that detect clusters in axis-parallel subspaces and, as an extension, algorithms that find clusters in arbitrarily oriented subspaces. In general, subspace clusters may be hierarchically nested, i.e., several subspace clusters of low dimensionality may form a subspace cluster of higher dimensionality. Since existing subspace clustering methods are not able to detect these complex structures, hierarchical approaches for subspace clustering have to be applied. The goal of this dissertation is to develop new efficient and effective methods for hierarchical subspace clustering by identifying novel challenges for the hierarchical approach and proposing innovative and solid solutions to these challenges.

The first part of this work deals with the analysis of hierarchical subspace clusters in axis-parallel subspaces. Two new methods are proposed that search simultaneously for subspace clusters of arbitrary dimensionality in order to detect complex hierarchies of subspace clusters. Furthermore, a new visualization model that represents the clustering result as a graph is provided.

In the second part of this work, new methods for hierarchical clustering in arbitrarily oriented subspaces of the feature space are discussed. The so-called correlation clustering can be seen as an extension of axis-parallel subspace clustering. Correlation clustering aims at grouping the data set into subsets, the so-called correlation clusters, such that the objects in the same correlation cluster show uniform attribute correlations. Two new hierarchical approaches are proposed which combine density-based clustering with Principal Component Analysis in order to identify hierarchies of correlation clusters.

The last part of this work addresses the analysis and interpretation of the results obtained from correlation clustering algorithms. A general method is introduced to extract quantitative information on the linear dependencies exhibited by the objects of given correlation clusters. Furthermore, these quantitative models can be used to predict the probability that an object was generated by one of these models. Both the efficiency and the effectiveness of the presented techniques are thoroughly analyzed, and the benefits over traditional approaches are shown by evaluating the new methods on synthetic as well as real-world data sets.
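As a rough illustration of the idea behind the last part (a minimal sketch, not the dissertation's actual algorithm; the function name and parameters below are hypothetical), a quantitative model of a correlation cluster can be derived via PCA: the eigenvectors with negligible eigenvalues are orthogonal to the hyperplane the cluster lies on, and each such "weak" eigenvector v yields one approximate linear equation v·x ≈ v·μ that the cluster's objects satisfy.

import numpy as np

def linear_dependency_model(points, variance_threshold=0.99):
    # points: (n, d) array of objects belonging to one correlation cluster.
    # Returns a list of (normal_vector, offset) pairs, one per weak
    # eigenvector, encoding approximate linear equations normal · x ≈ offset.
    mean = points.mean(axis=0)
    cov = np.cov(points - mean, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(cov)   # ascending eigenvalues
    order = np.argsort(eigenvalues)[::-1]             # strongest direction first
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]

    # Strong eigenvectors explain most of the variance; the remaining weak
    # eigenvectors span the directions in which the cluster has (almost) no
    # extension, i.e. the directions orthogonal to the cluster hyperplane.
    explained = np.cumsum(eigenvalues) / eigenvalues.sum()
    n_strong = int(np.searchsorted(explained, variance_threshold)) + 1
    weak = eigenvectors[:, n_strong:]

    return [(v, float(v @ mean)) for v in weak.T]

# Usage: points scattered around the plane z = 2x - 3y should yield a single
# equation whose normal vector is proportional to (2, -3, -1).
rng = np.random.default_rng(0)
xy = rng.uniform(-1.0, 1.0, size=(200, 2))
z = 2.0 * xy[:, 0] - 3.0 * xy[:, 1] + rng.normal(scale=0.01, size=200)
cluster = np.column_stack([xy, z])
for normal, offset in linear_dependency_model(cluster):
    print(np.round(normal, 2), "· x ≈", round(offset, 2))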