Deep knowledge transfer for generalization across tasks and domains under data scarcity: on intersections of anomaly detection, few-shot learning, continual learning, domain generalization and data-free learning
Over the last decade, deep learning approaches have achieved tremendous performance in a wide variety of fields, e.g., computer vision and natural language understanding, and across several sectors such as healthcare, industrial manufacturing, and driverless mobility. Most deep learning successes were achieved in learning scenarios that fulfill the following two requirements. First, large amounts of data are available for training the deep learning model, and there are no access restrictions to the data. Second, the data used for training and testing is independent and identically distributed (i.i.d.). However, many real-world applications violate at least one of these requirements, which results in challenging learning problems. The present thesis comprises four contributions that address four such learning problems. In each contribution, we propose a novel method and empirically demonstrate its effectiveness for the corresponding problem setting.
The first part addresses the underexplored intersection of the few-shot learning and the one-class classification problems. In this learning scenario, the model has to learn a new task using only a few examples, all drawn from the majority class, without overfitting to the few examples or to the majority class. This scenario arises in real-world applications of anomaly detection where data is scarce. We propose an episode sampling technique to adapt meta-learning algorithms designed for class-balanced few-shot classification to the few-shot one-class classification problem. This is done by optimizing for a model initialization tailored to this scenario. In addition, we provide theoretical and empirical analyses to investigate the need for second-order derivatives to learn such parameter initializations. Our experiments on eight image and time-series datasets, including a real-world dataset of industrial sensor readings, demonstrate the effectiveness of our method.
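As an illustration of what an episode sampler for few-shot one-class classification could look like, the following minimal Python sketch draws a support set containing only normal (majority-class) examples and a class-balanced query set. This split is an assumption made for illustration; the abstract does not specify the sampling scheme, and the function name `sample_episode`, its parameters, and the toy data are hypothetical.

```python
import random


def sample_episode(normal, anomalous, k_support, k_query, rng):
    """Sample one meta-training episode (illustrative assumption, not the thesis code).

    Returns (support, query), each a list of (example, label) pairs,
    where label 0 means normal and label 1 means anomalous.
    The support set is one-class (normal only); the query set is balanced.
    """
    if len(normal) < k_support + k_query or len(anomalous) < k_query:
        raise ValueError("not enough examples for the requested episode size")

    # Draw all normal examples at once so support and query never overlap.
    normal_pick = rng.sample(normal, k_support + k_query)
    support = [(x, 0) for x in normal_pick[:k_support]]

    # The query set mixes held-out normal examples with anomalous ones.
    query = [(x, 0) for x in normal_pick[k_support:]]
    query += [(x, 1) for x in rng.sample(anomalous, k_query)]
    rng.shuffle(query)
    return support, query


# Toy usage with placeholder string examples.
rng = random.Random(0)
normal = [f"n{i}" for i in range(20)]
anomalous = [f"a{i}" for i in range(10)]
support, query = sample_episode(normal, anomalous, k_support=5, k_query=3, rng=rng)
```

Under this assumed scheme, a meta-learning algorithm would adapt on `support` and be evaluated on `query`, so that the learned initialization is rewarded for adapting well from one-class data.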
The second part tackles the intersection of the continual learning and the anomaly detection problems, which we are the first to explore, to the best of our knowledge. In this learning scenario, the model is exposed to a stream of anomaly detection tasks, i.e., tasks for which only examples from the normal class are available, and has to learn them sequentially. Such problem settings are encountered in anomaly detection applications where the data distribution continuously changes. We propose a meta-learning approach that learns parameter-specific initializations and learning rates suitable for continual anomaly detection. Our empirical evaluations show that a model trained with our algorithm is able to learn up to 100 anomaly detection tasks sequentially with minimal catastrophic forgetting and minimal overfitting to the majority class.
In the third part, we address the domain generalization problem, in which a model trained on several source domains is expected to generalize well to data from a previously unseen target domain, without any modification or exposure to its data. This challenging learning scenario is present in applications involving domain shift, e.g., different clinical centers using different MRI scanners or data acquisition protocols. We assume that learning to extract a richer set of features improves the transfer to a wider set of unknown domains. Motivated by this, we propose an algorithm that identifies the already learned features and corrupts them, thereby enforcing the discovery of new features. We leverage methods from the explainable machine learning literature to identify the features, and apply the targeted corruption on multiple representation levels, including input data and high-level embeddings. Our extensive empirical evaluation shows that our approach outperforms 18 domain generalization algorithms on multiple benchmark datasets.
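The idea of corrupting already learned features can be pictured with a small Python sketch. It assumes that a per-feature saliency score is already available (for example from an attribution method) and that corruption means overwriting the most salient entries with a constant. Both choices are assumptions for illustration only; the abstract does not state which attribution method or corruption is used, and `corrupt_salient_features` is a hypothetical name.

```python
def corrupt_salient_features(features, saliency, fraction=0.25, fill=0.0):
    """Overwrite the most salient entries of a feature vector (illustrative sketch).

    features: list of floats (an input or an embedding, flattened)
    saliency: list of floats of the same length, higher means more relied upon
    fraction: share of entries to corrupt, between 0 and 1
    fill:     value written into the corrupted positions (assumed: zero)
    """
    if len(features) != len(saliency):
        raise ValueError("features and saliency must have the same length")
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")

    k = int(fraction * len(features))
    # Rank positions from most to least salient and mask the top k.
    ranked = sorted(range(len(features)), key=lambda i: saliency[i], reverse=True)
    masked = set(ranked[:k])
    return [fill if i in masked else v for i, v in enumerate(features)]


# Toy usage: the two most salient of four features are zeroed out.
features = [0.9, 0.1, 0.5, 0.3]
saliency = [0.8, 0.05, 0.1, 0.6]
corrupted = corrupt_salient_features(features, saliency, fraction=0.5)
```

Training on the corrupted vector would then force a model to rely on the remaining entries, which is the intuition behind enforcing new feature discovery.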
The last part of the thesis addresses the intersection of domain generalization and data-free learning methods, which we are the first to explore, to the best of our knowledge. Here, we address the learning scenario where a model robust to domain shift is needed and only models trained on the same task but on different domains are available, instead of the original datasets. This learning scenario is relevant for any domain generalization application where access to the data of the source domains is restricted, e.g., due to concerns about data privacy or intellectual property infringement. We develop an approach that extracts and fuses domain-specific knowledge from the available teacher models into a student model robust to domain shift, by generating synthetic cross-domain data. Our empirical evaluation demonstrates the effectiveness of our method, which outperforms ensemble and data-free knowledge distillation baselines. Most importantly, the proposed approach substantially reduces the gap between the best data-free baseline and the upper-bound baseline that uses the original private data.
meta-learning, anomaly detection, few-shot learning, continual learning, domain generalization, data-free learning
Frikha, Ahmed
2022
English
Universitätsbibliothek der Ludwig-Maximilians-Universität München
Frikha, Ahmed (2022): Deep knowledge transfer for generalization across tasks and domains under data scarcity: on intersections of anomaly detection, few-shot learning, continual learning, domain generalization and data-free learning. Dissertation, LMU München: Fakultät für Mathematik, Informatik und Statistik