Nickel, Maximilian (2013): Tensor factorization for relational learning. Dissertation, LMU München: Faculty of Mathematics, Computer Science and Statistics 

PDF
Nickel_Maximilian.pdf 2MB 
Abstract
Relational learning is concerned with learning from data where information is primarily represented in form of relations between entities. In recent years, this branch of machine learning has become increasingly important, as relational data is generated in an unprecedented amount and has become ubiquitous in many fields of application such as bioinformatics, artificial intelligence and social network analysis. However, relational learning is a very challenging task, due to the network structure and the high dimensionality of relational data. In this thesis we propose that tensor factorization can be the basis for scalable solutions for learning from relational data and present novel tensor factorization algorithms that are particularly suited for this task. In the first part of the thesis, we present the RESCAL model  a novel tensor factorization for relational learning  and discuss its capabilities for exploiting the idiosyncratic properties of relational data. In particular, we show that, unlike existing tensor factorizations, our proposed method is capable of exploiting contextual information that is more distant in the relational graph. Furthermore, we present an efficient algorithm for computing the factorization. We show that our method achieves better or onpar results on common benchmark data sets, when compared to current stateoftheart relational learning methods, while being significantly faster to compute. In the second part of the thesis, we focus on largescale relational learning and its applications to Linked Data. By exploiting the inherent sparsity of relational data, an efficient computation of RESCAL can scale up to the size of large knowledge bases, consisting of millions of entities, hundreds of relations and billions of known facts. We show this analytically via a thorough analysis of the runtime and memory complexity of the algorithm as well as experimentally via the factorization of the YAGO2 core ontology and the prediction of relationships in this large knowledge base on a single desktop computer. Furthermore, we derive a new procedure to reduce the runtime complexity for regularized factorizations from O(r^5) to O(r^3)  where r denotes the number of latent components of the factorization  by exploiting special properties of the factorization. We also present an efficient method for including attributes of entities in the factorization through a novel coupled tensormatrix factorization. Experimentally, we show that RESCAL allows us to approach several relational learning tasks that are important to Linked Data. In the third part of this thesis, we focus on the theoretical analysis of learning with tensor factorizations. Although tensor factorizations have become increasingly popular for solving machine learning tasks on various forms of structured data, there exist only very few theoretical results on the generalization abilities of these methods. Here, we present the first known generalization error bounds for tensor factorizations. To derive these bounds, we extend known bounds for matrix factorizations to the tensor case. Furthermore, we analyze how these bounds behave for learning on over and understructured representations, for instance, when matrix factorizations are applied to tensor data. In the course of deriving generalization bounds, we also discuss the tensor product as a principled way to represent structured data in vector spaces for machine learning tasks. In addition, we evaluate our theoretical discussion with experiments on synthetic data, which support our analysis.
Item Type:  Thesis (Dissertation, LMU Munich) 

Keywords:  Tensor Factorization, Relational Learning, RESCAL 
Subjects:  000 Computers, Information and General Reference 000 Computers, Information and General Reference > 004 Data processing computer science 
Faculties:  Faculty of Mathematics, Computer Science and Statistics 
Language:  English 
Date of oral examination:  14. August 2013 
1. Referee:  Tresp, Volker 
MD5 Checksum of the PDFfile:  a8443bd9affab90c66c2dee69463fecc 
Signature of the printed copy:  0001/UMC 21567 
ID Code:  16056 
Deposited On:  15. Oct 2013 09:05 
Last Modified:  20. Jul 2016 10:33 