Logo Logo
Hilfe
Kontakt
Switch language to English
Tensor factorization for relational learning
Tensor factorization for relational learning
Relational learning is concerned with learning from data where information is primarily represented in form of relations between entities. In recent years, this branch of machine learning has become increasingly important, as relational data is generated in an unprecedented amount and has become ubiquitous in many fields of application such as bioinformatics, artificial intelligence and social network analysis. However, relational learning is a very challenging task, due to the network structure and the high dimensionality of relational data. In this thesis we propose that tensor factorization can be the basis for scalable solutions for learning from relational data and present novel tensor factorization algorithms that are particularly suited for this task. In the first part of the thesis, we present the RESCAL model -- a novel tensor factorization for relational learning -- and discuss its capabilities for exploiting the idiosyncratic properties of relational data. In particular, we show that, unlike existing tensor factorizations, our proposed method is capable of exploiting contextual information that is more distant in the relational graph. Furthermore, we present an efficient algorithm for computing the factorization. We show that our method achieves better or on-par results on common benchmark data sets, when compared to current state-of-the-art relational learning methods, while being significantly faster to compute. In the second part of the thesis, we focus on large-scale relational learning and its applications to Linked Data. By exploiting the inherent sparsity of relational data, an efficient computation of RESCAL can scale up to the size of large knowledge bases, consisting of millions of entities, hundreds of relations and billions of known facts. We show this analytically via a thorough analysis of the runtime and memory complexity of the algorithm as well as experimentally via the factorization of the YAGO2 core ontology and the prediction of relationships in this large knowledge base on a single desktop computer. Furthermore, we derive a new procedure to reduce the runtime complexity for regularized factorizations from O(r^5) to O(r^3) -- where r denotes the number of latent components of the factorization -- by exploiting special properties of the factorization. We also present an efficient method for including attributes of entities in the factorization through a novel coupled tensor-matrix factorization. Experimentally, we show that RESCAL allows us to approach several relational learning tasks that are important to Linked Data. In the third part of this thesis, we focus on the theoretical analysis of learning with tensor factorizations. Although tensor factorizations have become increasingly popular for solving machine learning tasks on various forms of structured data, there exist only very few theoretical results on the generalization abilities of these methods. Here, we present the first known generalization error bounds for tensor factorizations. To derive these bounds, we extend known bounds for matrix factorizations to the tensor case. Furthermore, we analyze how these bounds behave for learning on over- and understructured representations, for instance, when matrix factorizations are applied to tensor data. In the course of deriving generalization bounds, we also discuss the tensor product as a principled way to represent structured data in vector spaces for machine learning tasks. In addition, we evaluate our theoretical discussion with experiments on synthetic data, which support our analysis.
Tensor Factorization, Relational Learning, RESCAL
Nickel, Maximilian
2013
Englisch
Universitätsbibliothek der Ludwig-Maximilians-Universität München
Nickel, Maximilian (2013): Tensor factorization for relational learning. Dissertation, LMU München: Fakultät für Mathematik, Informatik und Statistik
[thumbnail of Nickel_Maximilian.pdf]
Vorschau
PDF
Nickel_Maximilian.pdf

2MB

Abstract

Relational learning is concerned with learning from data where information is primarily represented in form of relations between entities. In recent years, this branch of machine learning has become increasingly important, as relational data is generated in an unprecedented amount and has become ubiquitous in many fields of application such as bioinformatics, artificial intelligence and social network analysis. However, relational learning is a very challenging task, due to the network structure and the high dimensionality of relational data. In this thesis we propose that tensor factorization can be the basis for scalable solutions for learning from relational data and present novel tensor factorization algorithms that are particularly suited for this task. In the first part of the thesis, we present the RESCAL model -- a novel tensor factorization for relational learning -- and discuss its capabilities for exploiting the idiosyncratic properties of relational data. In particular, we show that, unlike existing tensor factorizations, our proposed method is capable of exploiting contextual information that is more distant in the relational graph. Furthermore, we present an efficient algorithm for computing the factorization. We show that our method achieves better or on-par results on common benchmark data sets, when compared to current state-of-the-art relational learning methods, while being significantly faster to compute. In the second part of the thesis, we focus on large-scale relational learning and its applications to Linked Data. By exploiting the inherent sparsity of relational data, an efficient computation of RESCAL can scale up to the size of large knowledge bases, consisting of millions of entities, hundreds of relations and billions of known facts. We show this analytically via a thorough analysis of the runtime and memory complexity of the algorithm as well as experimentally via the factorization of the YAGO2 core ontology and the prediction of relationships in this large knowledge base on a single desktop computer. Furthermore, we derive a new procedure to reduce the runtime complexity for regularized factorizations from O(r^5) to O(r^3) -- where r denotes the number of latent components of the factorization -- by exploiting special properties of the factorization. We also present an efficient method for including attributes of entities in the factorization through a novel coupled tensor-matrix factorization. Experimentally, we show that RESCAL allows us to approach several relational learning tasks that are important to Linked Data. In the third part of this thesis, we focus on the theoretical analysis of learning with tensor factorizations. Although tensor factorizations have become increasingly popular for solving machine learning tasks on various forms of structured data, there exist only very few theoretical results on the generalization abilities of these methods. Here, we present the first known generalization error bounds for tensor factorizations. To derive these bounds, we extend known bounds for matrix factorizations to the tensor case. Furthermore, we analyze how these bounds behave for learning on over- and understructured representations, for instance, when matrix factorizations are applied to tensor data. In the course of deriving generalization bounds, we also discuss the tensor product as a principled way to represent structured data in vector spaces for machine learning tasks. In addition, we evaluate our theoretical discussion with experiments on synthetic data, which support our analysis.