Multi-purpose exploratory mining of complex data

Due to the increasing power of data acquisition and data storage technologies, a large number of data sets with complex structure are being collected in the era of data explosion. Rather than being represented simply by low-dimensional numerical features, such data sources range from high-dimensional feature spaces to graph data describing relationships among objects. Many techniques exist in the literature for mining simple numerical data, but only a few approaches address the growing challenge of mining complex data, such as high-dimensional vectors of non-numerical data type, time series data, graphs, and multi-instance data, where each object is represented by a finite set of feature vectors. In addition, there are many important data mining tasks for high-dimensional data, such as clustering, outlier detection, dimensionality reduction, similarity search, classification, prediction and result interpretation. Many algorithms have been proposed to solve these tasks separately, although in some cases the tasks are closely related. Detecting and exploiting the relationships among them is another important challenge.

This thesis aims to address these challenges in order to gain new knowledge from complex high-dimensional data. We propose several new algorithms that combine different data mining tasks. ROCAT (Relevant Overlapping Subspace Clusters on Categorical Data) automatically detects the most relevant overlapping subspace clusters on categorical data. It integrates clustering, feature selection and pattern mining in an information-theoretic way, without any input parameters. The next algorithm, MSS (Multiple Subspace Selection), finds multiple low-dimensional subspaces for moderately high-dimensional data, each exhibiting an interesting cluster structure. For better interpretation of the results, MSS visualizes the clusters in multiple low-dimensional subspaces in a hierarchical way. SCMiner (Summarization-Compression Miner) focuses on bipartite graph data. On the basis of data compression, it integrates co-clustering, graph summarization, link prediction, and the discovery of the hidden structure of a bipartite graph. Finally, we propose a novel similarity measure for multi-instance data. The Probabilistic Integral Metric (PIM) is based on a probabilistic generative model requiring few assumptions. Experiments demonstrate the effectiveness and efficiency of PIM for similarity search (multi-instance data indexing with M-tree), explorative data analysis and data mining (multi-instance classification).

To sum up, we propose algorithms that combine different data mining tasks for complex data with various data types and data structures, in order to discover novel knowledge hidden in complex data.
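To make the notion of multi-instance data concrete, the following minimal Python sketch represents each object as a finite set ("bag") of feature vectors and compares two bags with the symmetric Hausdorff distance. This is not PIM: the abstract does not specify how PIM is computed, so the Hausdorff distance is used here only as a well-known baseline set distance. The names `euclidean`, `hausdorff`, `bag_x`, `bag_y` and `bag_z` are hypothetical and chosen for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hausdorff(bag_a, bag_b):
    """Symmetric Hausdorff distance between two non-empty bags of vectors.

    Illustrative baseline only; it is NOT the Probabilistic Integral Metric.
    """
    def directed(src, dst):
        # Largest distance from any point in src to its nearest point in dst.
        return max(min(euclidean(p, q) for q in dst) for p in src)

    return max(directed(bag_a, bag_b), directed(bag_b, bag_a))


# Each object is a bag (finite set) of 2-dimensional feature vectors.
# Bags may contain different numbers of instances.
bag_x = [(0.0, 0.0), (1.0, 0.0)]
bag_y = [(0.0, 0.0), (1.0, 0.0), (1.0, 3.0)]
bag_z = [(5.0, 5.0)]

if __name__ == "__main__":
    print(hausdorff(bag_x, bag_y))
    print(hausdorff(bag_x, bag_z))
```

A distance between bags is what similarity search over multi-instance data needs as its building block. Metric index structures such as the M-tree rely on the triangle inequality, which is one reason a metric on bags is of interest for indexing.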

Exploratory Data Mining, Subspace Clustering, Minimum Description Length, Multi-instance Indexing

He, Xiao

05. Nov. 2014

2014

English

Universitätsbibliothek der Ludwig-Maximilians-Universität München

https://nbn-resolving.org/urn:nbn:de:bvb:19-175984

He, Xiao (2014): Multi-purpose exploratory mining of complex data. Dissertation, LMU München: Fakultät für Mathematik, Informatik und Statistik

PDF: He_Xiao.pdf (5 MB)

DOI: 10.5282/edoc.17598

URN: urn:nbn:de:bvb:19-175984

Abstract

Document type:	Dissertations (Dissertation, LMU München)
Keywords:	Exploratory Data Mining, Subspace Clustering, Minimum Description Length, Multi-instance Indexing
Subject areas:	000 General works, computer science, information science; 000 General works, computer science, information science > 004 Computer science
Faculties:	Fakultät für Mathematik, Informatik und Statistik
Language of the thesis:	English
Date of oral examination:	5 November 2014
1st referee:	Böhm, Christian
MD5 checksum of the PDF file:	d3c168f29b690c57bf68f4c1c95ce6e8
Shelf mark of the printed edition:	0001/UMC 22484
ID Code:	17598
Deposited on:	10 Nov. 2014 08:52
Last modified:	23 Oct. 2020 22:51