Feng, Jing (2015): Informationtheoretic graph mining. Dissertation, LMU München: Faculty of Mathematics, Computer Science and Statistics 

PDF
Feng_Jing.pdf 1MB 
Abstract
Real world data from various application domains can be modeled as a graph, e.g. social networks and biomedical networks like protein interaction networks or coactivation networks of brain regions. A graph is a powerful concept to model arbitrary (structural) relationships among objects. In recent years, the prevalence of social networks has made graph mining an important center of attention in the data mining field. There are many important tasks in graph mining, such as graph clustering, outlier detection, and link prediction. Many algorithms have been proposed in the literature to solve these tasks. However, normally these issues are solved separately, although they are closely related. Detecting and exploiting the relationship among them is a new challenge in graph mining. Moreover, with data explosion, more information has already been integrated into graph structure. For example, bipartite graphs contain two types of node and graphs with node attributes offer additional nonstructural information. Therefore, more challenges arise from the increasing graph complexity. This thesis aims to solve these challenges in order to gain new knowledge from graph data. An important paradigm of data mining used in this thesis is the principle of Minimum Description Length (MDL). It follows the assumption: the more knowledge we have learned from the data, the better we are able to compress the data. The MDL principle balances the complexity of the selected model and the goodness of fit between model and data. Thus, it naturally avoids overfitting. This thesis proposes several algorithms based on the MDL principle to acquire knowledge from various types of graphs: Infospot (Automatically Spotting Informationrich Nodes in Graphs) proposes a parameterfree and efficient algorithm for the fully automatic detection of interesting nodes which is a novel outlier notion in graph. Then in contrast to traditional graph mining approaches that focus on discovering dense subgraphs, a novel graph mining technique CXprime (Compressionbased eXploiting Primitives) is proposed. It models the transitivity and the hubness of a graph using structure primitives (all possible threenode substructures). Under the coding scheme of CXprime, clusters with structural information can be discovered, dominating substructures of a graph can be distinguished, and a new link prediction score based on substructures is proposed. The next algorithm SCMiner (SummarizationCompression Miner) integrates tasks such as graph summarization, graph clustering, link prediction, and the discovery of the hidden structure of a bipartite graph on the basis of data compression. Finally, a method for nonredundant graph clustering called IROC (Informationtheoretic nonRedundant Overlapping Clustering) is proposed to smartly combine structural information with nonstructural information based on MDL. IROC is able to detect overlapping communities within subspaces of the attributes. To sum up, algorithms to unify different learning tasks for various types of graphs are proposed. Additionally, these algorithms are based on the MDL principle, which facilitates the unification of different graph learning tasks, the integration of different graph types, and the automatic selection of input parameters that are otherwise difficult to estimate.
Item Type:  Thesis (Dissertation, LMU Munich) 

Keywords:  Graph Mining, Graph Compression, Link Prediction, Minimum Description Length 
Subjects:  000 Computers, Information and General Reference 000 Computers, Information and General Reference > 004 Data processing computer science 
Faculties:  Faculty of Mathematics, Computer Science and Statistics 
Language:  English 
Date Accepted:  11. June 2015 
1. Referee:  Böhm, Christian 
Persistent Identifier (URN):  urn:nbn:de:bvb:19183384 
MD5 Checksum of the PDFfile:  4271fe0ef7bebb9c5a9d977783a186cd 
Signature of the printed copy:  0001/UMC 23030 
ID Code:  18338 
Deposited On:  29. Jun 2015 13:10 
Last Modified:  20. Jul 2016 10:39 