Yu, Shipeng (2006): Advanced Probabilistic Models for Clustering and Projection. Dissertation, LMU München: Faculty of Mathematics, Computer Science and Statistics 

PDF
Yu_Shipeng.pdf 10MB 
Abstract
Probabilistic modeling for data mining and machine learning problems is a fundamental research area. The general approach is to assume a generative model underlying the observed data and to estimate the model parameters via likelihood maximization. It rests on probability theory as its mathematical foundation and draws on a large body of methods from statistical learning, sampling theory and Bayesian statistics. In this thesis we study several advanced probabilistic models for data clustering and feature projection, two important unsupervised learning problems.
The goal of clustering is to group similar data points together to uncover the clusters in the data. While numerous methods exist for various clustering tasks, one important question remains open: how to determine the number of clusters automatically. The first part of the thesis answers this question from a mixture modeling perspective. A finite mixture model is first introduced for clustering, in which each mixture component is assumed, for generality, to be an exponential family distribution. The model is then extended to an infinite mixture model, and its strong connection to the Dirichlet process (DP), a nonparametric Bayesian framework, is uncovered. A variational Bayesian algorithm called VBDMA is derived from this insight to learn the number of clusters automatically, and empirical studies on several 2D data sets and an image data set verify the effectiveness of this algorithm.
In feature projection, we are interested in dimensionality reduction and aim to find a low-dimensional feature representation for the data. We first review the well-known principal component analysis (PCA) and its probabilistic interpretation (PPCA), and then generalize PPCA to a novel probabilistic model that can handle the nonlinear projection known as kernel PCA. An expectation-maximization (EM) algorithm is derived for kernel PCA, making it fast and applicable to large data sets.
Then we propose a novel supervised projection method called MORP, which takes the output information into account in a supervised learning context. Empirical studies on various data sets show much better results than unsupervised projection and other supervised projection methods. Finally, we generalize MORP probabilistically to propose SPPCA for supervised projection, and naturally extend the model to S2PPCA, a semi-supervised projection method. This allows us to incorporate both the label information and the unlabeled data into the projection process.
In the third part of the thesis, we introduce a unified probabilistic model that handles data clustering and feature projection jointly. The model can be viewed both as a clustering model with projected features and as a projection model with structured documents. A variational Bayesian learning algorithm is derived, which turns out to iterate clustering operations and projection operations until convergence. Superior performance is obtained for both clustering and projection.
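As a rough illustration of the mixture-modeling view of clustering mentioned in the abstract, the sketch below fits a finite mixture of univariate Gaussians by plain EM, i.e. by likelihood maximization. It is not the VBDMA algorithm of the thesis: the number of components k is fixed in advance, which is exactly the limitation the Dirichlet process extension is meant to remove. The function names (gaussian_pdf, fit_gmm_1d), the quantile-based initialization and the toy data are assumptions made for this example only.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a univariate normal distribution with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, k, n_iter=100):
    """Fit a k-component 1D Gaussian mixture by EM (illustrative sketch only)."""
    n = len(data)
    ordered = sorted(data)
    # Assumed initialization: means at evenly spaced quantiles of the data.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [overall_var] * k
    weights = [1.0 / k] * k

    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for x in data:
            p = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(p) or 1e-300  # guard against numerical underflow
            resp.append([pj / total for pj in p])

        # M-step: re-estimate weights, means and variances from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # keep variances strictly positive

    return weights, means, variances


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data: two well-separated clusters around -5 and +5.
    sample = [rng.gauss(-5.0, 1.0) for _ in range(200)]
    sample += [rng.gauss(5.0, 1.0) for _ in range(200)]
    print(fit_gmm_1d(sample, k=2))
```

On well-separated toy data such as the sample above, the estimated means should land near the two cluster centres. Choosing k is left to the user here, whereas the infinite mixture model described in the abstract infers it from the data.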
Item Type:  Thesis (Dissertation, LMU Munich) 

Keywords:  Probabilistic modeling, Clustering, Projection, Mixture modeling, Dirichlet process, Dimensionality reduction, Feature transformation 
Subjects:  600 Natural sciences and mathematics; 600 Natural sciences and mathematics > 510 Mathematics 
Faculties:  Faculty of Mathematics, Computer Science and Statistics 
Language:  English 
Date Accepted:  29. September 2006 
1. Referee:  Kriegel, Hans-Peter 
Persistent Identifier (URN):  urn:nbn:de:bvb:1958840 
MD5 Checksum of the PDF file:  85e5058373a8b2b128c5230ce64042cb 
Signature of the printed copy:  0001/UMC 15691 
ID Code:  5884 
Deposited On:  12. Oct 2006 
Last Modified:  19. Jul 2016 16:20 