Statistical relational learning with nonparametric Bayesian models

Statistical relational learning analyzes the probabilistic constraints between entities, their attributes and their relationships. It is an area of growing interest in modern data mining, and many leading approaches have been proposed with promising results. However, there is no easily applicable recipe for turning a relational domain (e.g. a database) into a probabilistic model. There are two main reasons. First, structural learning in relational models is even more complex than structural learning in (non-relational) Bayesian networks, because an attribute might depend on exponentially many other attributes. Second, it might be difficult and expensive to obtain reliable prior knowledge for the domains of interest. To remove these constraints, this thesis applies nonparametric Bayesian analysis to relational learning and proposes two compelling models: Dirichlet enhanced relational learning and infinite hidden relational learning.

Dirichlet enhanced relational learning (DERL) extends nonparametric hierarchical Bayesian modeling to relational data. In existing relational models, the model parameters are global, which means that the conditional probability distributions are the same for each entity and the relationships are independent of each other. To address these limitations, we introduce a hierarchical Bayesian (HB) framework to relational learning, such that model parameters can be personalized, i.e. owned by entities or relationships, and are coupled via common prior distributions. Nonparametric HB modeling introduces additional flexibility, so that the learned knowledge can be represented faithfully. For inference, we develop an efficient variational method motivated by the Polya urn representation of the Dirichlet process. DERL is demonstrated in a medical domain, where we form a nonparametric HB model for entities including hospitals, patients, procedures and diagnoses.
The experiments show that the additional flexibility introduced by nonparametric HB modeling yields a more accurate model of the dependencies between different types of relationships and significantly improves the prediction of unknown relationships.

In the infinite hidden relational model (IHRM), we apply nonparametric mixture modeling to relational data. IHRM extends the expressiveness of a relational model by introducing, for each entity, an infinite-dimensional hidden variable as part of a Dirichlet process (DP) mixture model. This has three main advantages. First, it reduces the need for extensive structural learning, which is particularly difficult in relational models because of the huge number of potential probabilistic parents. Second, information can propagate globally in the ground network defined by the relational structure. Third, the model itself optimizes the number of mixture components for each entity class based on the data. IHRM can be applied to entity clustering and relationship/attribute prediction, two important tasks in relational data mining. For inference in IHRM, we develop four algorithms: collapsed Gibbs sampling with the Chinese restaurant process, blocked Gibbs sampling with the truncated stick-breaking construction (SBC), mean-field inference with the truncated SBC, and an empirical approximation. IHRM is evaluated in three domains: a recommendation system based on the MovieLens data set, prediction of the functions of yeast genes/proteins on the KDD Cup 2001 data set, and medical data analysis. The experimental results show that IHRM gives significantly improved estimates of attributes/relationships and highly interpretable entity clusters in complex relational data.
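The Dirichlet process prior behind these models can be sampled in the two ways the abstract names: sequentially through the Chinese restaurant process (the Polya urn scheme) or through a truncated stick-breaking construction. The following Python sketch is not the thesis's inference code. It only draws from the prior and does not condition on any relational data, and the function names, the concentration parameter `alpha` and the truncation level `K` are illustrative assumptions.

```python
import numpy as np


def crp_assignments(n, alpha, rng):
    """Draw cluster labels for n entities from a Chinese restaurant process (CRP).

    Entity i joins existing cluster k with probability n_k / (i + alpha)
    and opens a new cluster with probability alpha / (i + alpha).
    (Hypothetical helper for illustration; prior draws only, no data.)
    """
    labels = []
    counts = []  # counts[k] = number of entities already assigned to cluster k
    for i in range(n):
        # Existing-cluster probabilities followed by the new-cluster probability;
        # they sum to one because sum(counts) == i.
        probs = np.array(counts + [alpha], dtype=float) / (i + alpha)
        k = int(rng.choice(len(probs), p=probs))
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1
        labels.append(k)
    return labels, counts


def truncated_stick_breaking(alpha, K, rng):
    """Mixture weights from a stick-breaking construction truncated at K components.

    The first K-1 stick fractions are v_k ~ Beta(1, alpha); the last fraction is
    set to 1, so the weights w_k = v_k * prod_{j<k} (1 - v_j) sum to one.
    (Hypothetical helper for illustration.)
    """
    v = rng.beta(1.0, alpha, size=K)
    v[-1] = 1.0  # truncation: the last component takes all remaining mass
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels, sizes = crp_assignments(20, alpha=1.0, rng=rng)
    print("CRP labels:", labels, "cluster sizes:", sizes)
    weights = truncated_stick_breaking(alpha=1.0, K=10, rng=rng)
    print("stick-breaking weights sum to", weights.sum())
```

The CRP draw lets the number of clusters grow with the number of entities, which matches the abstract's point that the number of mixture components is determined by the data. The truncated construction instead fixes an upper bound `K`, trading that flexibility for a finite weight vector that blocked Gibbs sampling or mean-field updates can operate on.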

Statistical relational learning, relationship uncertainty, link prediction, entity clustering, nonparametric Bayesian analysis, hierarchical Bayesian models, mixture models, Dirichlet process, variational inference, MCMC sampling

Xu, Zhao

25. Jul. 2007

2007

English

Universitätsbibliothek der Ludwig-Maximilians-Universität München

https://nbn-resolving.org/urn:nbn:de:bvb:19-76196

Xu, Zhao (2007): Statistical relational learning with nonparametric Bayesian models. Dissertation, LMU München: Fakultät für Mathematik, Informatik und Statistik

PDF: xu_zhao.pdf (1MB)

DOI: 10.5282/edoc.7619

URN: urn:nbn:de:bvb:19-76196

Document type:	Dissertations (Dissertation, LMU München)
Subject areas:	500 Natural sciences and mathematics > 510 Mathematics; 500 Natural sciences and mathematics
Faculties:	Fakultät für Mathematik, Informatik und Statistik
Language of the thesis:	English
Date of oral examination:	25 July 2007
First referee:	Kriegel, Hans-Peter
MD5 checksum of the PDF file:	13dff99ddd9b77079cf4124a3216fa60
Call number of the printed edition:	0001/UMC 16607
ID Code:	7619
Deposited on:	05 Nov. 2007
Last modified:	24 Oct. 2020 08:04