Federated and continual learning with deep learning methods for natural language text understanding
Federated Learning, Continual Learning, Lifelong Learning, Federated Continual Learning, Transfer Learning, Catastrophic Forgetting, Composite Modeling, Language Modeling, Topic Modeling, Representation Learning, Text Classification, Information Retrieval, Domain Alignment
Chaudhary, Yatin
2024
English
Universitätsbibliothek der Ludwig-Maximilians-Universität München
Chaudhary, Yatin (2024): Federated and continual learning with deep learning methods for natural language text understanding. Dissertation, LMU München: Fakultät für Mathematik, Informatik und Statistik
PDF: Chaudhary_Yatin.pdf (11MB)
Abstract

Natural language text understanding is a sub-domain of natural language processing (NLP) that focuses on developing computational models to parse and extract syntactic and semantic knowledge from text documents in human language. In doing so, these models generate higher-order vector representations of the underlying text, which aligns with the paradigm of representation learning. These representations are then used in focused downstream applications (tasks) to build an overall understanding of the text. In real-world scenarios, text data is often generated as a continuous stream whose distribution can drift over time. Traditional machine learning (ML), however, is trained on isolated, static data and lacks the ability to learn continuously, making it prone to errors over time. Just as humans learn continually over their lifetime, the development of continual learning (CL) approaches for ML models is therefore crucial: they enable models to continually accumulate, adapt and transfer knowledge to enhance future learning without forgetting past knowledge. Similarly, just as humans acquire knowledge by interacting with other humans, an ML model on one client can leverage relevant knowledge captured by ML models on other clients. However, due to data sovereignty and privacy regulations such as the GDPR, knowledge transfer via data sharing is restricted. This has led to the development of federated learning (FL) approaches, which facilitate knowledge transfer across clients by sharing model parameters instead of exchanging raw data, thus preserving privacy.

This dissertation focuses mainly on three NLP tasks: topic modeling, language modeling and text classification. Topic Modeling (TM) is an unsupervised approach that utilizes word co-occurrence patterns to capture latent thematic clusters of words, known as topics, in a large collection of documents. Each topic represents a certain semantic theme and facilitates understanding of the document corpus via a distribution over the word vocabulary. While traditional TMs rely on generative probabilistic modeling, neural topic models (NTMs) leverage the expressive and generalization power of artificial neural networks (ANNs) to capture complex patterns and boost the effectiveness of topic modeling. In doing so, NTMs utilize the extracted topic-word distributions to condense the semantic knowledge of a document into a global document-topic representation vector. This representation can be further leveraged to enhance coarse-grained text understanding in various downstream tasks, including document classification, information retrieval, and text generation.

In addition to Topic Modeling, Language Modeling (LM) is another unsupervised approach, which assigns probabilities to each word in a text sequence given its backward and forward context. Neural language models (NLMs), such as RNNs, perform language modeling to capture syntactic information and accumulate semantic knowledge over the input sequence by modeling the semantic dependencies between words. In doing so, they generate localized word representations that improve fine-grained text understanding in tasks such as named entity recognition (NER), word sense disambiguation (WSD) and text generation. We also use a Convolutional Neural Network (CNN)-based model for text classification, which performs 1D convolutions over word sequences to generate a global text representation.
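To make the notion of a global document-topic representation concrete, below is a minimal, illustrative sketch of a VAE-style NTM in PyTorch. The architecture, dimensions and names (MiniNTM, num_topics, hidden) are assumptions chosen for illustration and are not the models developed in this dissertation.

```python
# A minimal sketch (not the dissertation's architecture) of a VAE-style neural
# topic model: a BoW vector is encoded into a document-topic representation,
# and the decoder's weights play the role of topic-word distributions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MiniNTM(nn.Module):
    def __init__(self, vocab_size: int, num_topics: int, hidden: int = 256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(vocab_size, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, num_topics)        # posterior mean
        self.logvar = nn.Linear(hidden, num_topics)    # posterior log-variance
        self.decoder = nn.Linear(num_topics, vocab_size, bias=False)  # topic-word matrix

    def forward(self, bow):
        h = self.encoder(bow)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization trick
        theta = F.softmax(z, dim=-1)                   # document-topic representation
        recon = F.log_softmax(self.decoder(theta), dim=-1)
        nll = -(bow * recon).sum(-1)                   # BoW reconstruction term of the ELBO
        kld = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1)
        return theta, (nll + kld).mean()

model = MiniNTM(vocab_size=2000, num_topics=50)
bow = torch.randint(0, 3, (8, 2000)).float()           # toy BoW batch
theta, loss = model(bow)
print(theta.shape, loss.item())                        # torch.Size([8, 50])
```

In this sketch, the softmax-normalized latent vector theta is the global document-topic representation that downstream tasks such as document classification or retrieval could consume.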
NTMs, operating on bag-of-words (BoW) representations of documents, have proven efficient at extracting latent topics and document-topic representations. NLMs, in contrast, process documents as ordered word sequences, which makes capturing long-range semantic dependencies in large documents expensive and challenging. Integrating an NTM and an NLM within a composite modeling framework therefore allows the two to complement each other: the NLM can leverage the broad document-level semantic knowledge captured by the NTM, enabling it to capture long-range semantic dependencies in large documents effectively and thereby enhancing its performance while minimizing computational costs.
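As an illustration of this composite idea, the following sketch conditions an LSTM-based language model on the document-topic vector produced by a topic model. The fusion strategy (concatenating theta with the LSTM state) and all names are assumptions made for illustration; they do not reproduce the architectures proposed in Chapters 2 and 4.

```python
# An illustrative sketch of composite modeling: an LSTM language model is
# conditioned on the global document-topic vector theta from a topic model,
# fusing local (word-level) and global (document-level) semantics.
import torch
import torch.nn as nn

class TopicConditionedLM(nn.Module):
    def __init__(self, vocab_size: int, num_topics: int, emb: int = 128, hidden: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.lstm = nn.LSTM(emb, hidden, batch_first=True)
        # Next-word logits are predicted from the LSTM state fused with theta.
        self.out = nn.Linear(hidden + num_topics, vocab_size)

    def forward(self, tokens, theta):
        h, _ = self.lstm(self.embed(tokens))                  # local word-level context
        theta_exp = theta.unsqueeze(1).expand(-1, h.size(1), -1)
        return self.out(torch.cat([h, theta_exp], dim=-1))    # fuse local + global

lm = TopicConditionedLM(vocab_size=2000, num_topics=50)
tokens = torch.randint(0, 2000, (8, 20))                      # toy token batch
theta = torch.softmax(torch.randn(8, 50), dim=-1)             # e.g. from an NTM as sketched above
logits = lm(tokens, theta)
print(logits.shape)                                           # torch.Size([8, 20, 2000])
```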
Although NTMs are powerful in modeling large text corpora, topic modeling remains challenging in sparse (low-resource) data settings where word co-occurrence information is insufficient. Prior knowledge transfer in such settings, via (1) topical semantics (topic-word distributions) from one or more high-resource data sources, and (2) distributed word representations from pre-trained models like GloVe, FastText and BERT, with appropriate domain alignment, can alleviate this issue and yield coherent topics along with meaningful document representations.

In this dissertation, we have investigated the problem statements outlined above and developed novel deep neural architectures. These architectures learn meaningful local and global representations, improving fine-grained and coarse-grained text understanding respectively. Moreover, to tackle the challenges posed by drift in data distributions and by data privacy concerns in real-world scenarios, we have devised novel continual learning and federated learning frameworks for the topic modeling and text classification tasks respectively. Specifically, we make the following contributions in Chapters 2-7:

- Continual Learning: In Chapter 5, we propose a novel neural topic modeling framework to tackle inadequate word co-occurrence statistics under sparse data settings. We demonstrate that transferring relevant knowledge through multi-view embedding spaces (topic-word distributions and word embeddings) from multiple high-resource data sources with domain alignment enhances the quality of extracted topics and document representations for the sparse target data. These contributions can be viewed as the "knowledge transfer" component of a continual learning framework. Building on this, in Chapter 3 we propose a continual learning framework for neural topic modeling that can continually accumulate and transfer knowledge over its lifetime while minimizing catastrophic forgetting. We further extend the scope in Chapter 6, where we perform continual learning for text classification on several clients, each with its own private heterogeneous data stream.
- Federated Learning: In Chapter 6, we investigate techniques for privacy-preserving knowledge transfer across clients under heterogeneous data distributions. We combine the federated and continual learning paradigms into a novel framework that minimizes catastrophic forgetting at each client while maximizing knowledge transfer across clients for the text classification task (a generic parameter-averaging sketch of the federated idea follows below).
- Neural Composite Modeling: In Chapter 2, we propose a novel neural composite modeling framework that exposes an NLM to document-level semantics via an NTM. In doing so, we leverage the latent and explainable topics and model the sentence-level topical discourse in a joint learning fashion. Similarly, in Chapter 4, we propose a neural composite modeling framework that exposes large language models (LLMs), like BERT, to broad document-level semantics via an NTM, boosting performance through complementary learning and reducing the computational costs of fine-tuning for the text classification task.
- Neural Topic Modeling for Sparse Data: Chapter 7 investigates information retrieval (IR), information extraction (IE) and ranking techniques for fine-grained knowledge extraction in the biomedical domain. Here, we introduce word-level attention in a supervised NTM, together with a word-embedding prior, to generate meaningful document representations for the IR task, and then re-rank the retrieved documents using the BM25 algorithm.

In summary, this dissertation makes significant contributions to the improvement of NLP-based systems for tasks such as topic modeling, language modeling and text classification. In particular, novel algorithms and techniques have been developed in the areas of neural composite modeling, continual learning and federated learning, each of which has achieved state-of-the-art results and has overall led to improved machine understanding of textual documents.
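For readers unfamiliar with the parameter-sharing mechanism that underlies federated learning, the following is a generic FedAvg-style sketch: each client trains on its own private data, and only model parameters are exchanged and averaged by the server. It is a simplified illustration under assumed names and a toy setup, not the federated continual learning framework of Chapter 6.

```python
# A generic FedAvg-style sketch: clients share parameters, never raw data.
import copy
import torch

def local_update(model, data_loader, epochs=1, lr=0.01):
    """Train a copy of the global model on one client's private data."""
    local = copy.deepcopy(model)
    opt = torch.optim.SGD(local.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in data_loader:
            opt.zero_grad()
            loss_fn(local(x), y).backward()
            opt.step()
    return local.state_dict()                      # only parameters leave the client

def federated_average(global_model, client_states):
    """Server step: average the clients' parameters into the global model."""
    avg = copy.deepcopy(client_states[0])
    for key in avg:
        for state in client_states[1:]:
            avg[key] += state[key]
        avg[key] /= len(client_states)
    global_model.load_state_dict(avg)
    return global_model

# Toy usage: two clients fine-tune a shared linear classifier on private batches.
clients = [[(torch.randn(16, 10), torch.randint(0, 2, (16,)))] for _ in range(2)]
global_model = torch.nn.Linear(10, 2)
states = [local_update(global_model, loader) for loader in clients]
global_model = federated_average(global_model, states)
```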