Schick, Timo (2022): Few-shot learning with language models: Learning from instructions and contexts. Dissertation, LMU München: Fakultät für Mathematik, Informatik und Statistik
PDF: Schick_Timo.pdf (2 MB)
Abstract
Pretraining deep neural networks to perform language modeling, that is, to reconstruct missing words from incomplete pieces of text, has brought large improvements throughout natural language processing (NLP). However, even pretrained models typically do not achieve satisfactory performance in few-shot settings, where only a limited number of examples are available. This is an important issue not only because the need to annotate thousands of examples is a barrier to the more widespread application of such models, but also because few-shot learning is clearly a hallmark of human language competence, which should be the ultimate goal of NLP. In this work, we therefore investigate how we can leverage advances in language model pretraining to meet two fundamental few-shot challenges: we develop methods that enable models to solve new tasks and to understand new words from only a handful of examples. To enable models to solve new tasks, our approach builds on a simple observation: humans can acquire many new tasks without a single example if they are provided with instructions. We thus investigate ways of allowing pretrained models to process such instructions as well. On a wide range of tasks and datasets, we show that this not only removes the need to annotate thousands of examples but also enables models to acquire new tasks in a more human-like way: by learning from instructions in addition to examples. We demonstrate that this basic idea has the potential to profoundly change how we teach NLP models new skills, as it can be used in an extremely wide range of applications, including downstream tasks such as text classification and generation, controlling the social behavior of language models, and even generating entire datasets from scratch. To enable models to understand new words, we again take inspiration from how humans approach this task. Unlike common approaches that consider only a word's surface form, we additionally leverage all contexts in which it occurs: we teach pretrained language models to infer high-quality representations for novel words by learning from these contexts. We study various approaches for generating word representations from both surface form and contexts; these representations can be seamlessly integrated with existing language models, and we show how they improve the models' understanding of both rare and new words.
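To make the first idea from the abstract concrete, the sketch below shows how a task instruction can be phrased as a cloze question for a pretrained masked language model: the input is embedded in a fill-in-the-blank template, and candidate labels are mapped to single words whose predicted scores are compared. This is a minimal illustration, not the dissertation's exact implementation; the model name, the template, and the label-to-word mapping are assumptions chosen for the example.

```python
# Minimal sketch: a classification task rephrased as a cloze question
# for a masked language model. Template and label words are illustrative.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

def classify_sentiment(review: str) -> str:
    # Pattern: embed the input in an instruction-like template with a blank.
    prompt = f"{review} All in all, it was [MASK]."
    # Verbalizer: map each label to a single word the model can predict.
    verbalizer = {"positive": "great", "negative": "terrible"}

    inputs = tokenizer(prompt, return_tensors="pt")
    mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos]

    # Compare the scores of the verbalizer words and pick the best label.
    scores = {
        label: logits[tokenizer.convert_tokens_to_ids(word)].item()
        for label, word in verbalizer.items()
    }
    return max(scores, key=scores.get)

print(classify_sentiment("A gripping plot and wonderful acting."))
```

The second idea, inferring a representation for a novel word from the contexts in which it occurs, can be sketched in a similarly simplified way. Here we merely average the contextual hidden states of the word's occurrences; the dissertation's actual methods also exploit the word's surface form and learn how to combine the two signals, so this should be read as a baseline illustration under those simplifying assumptions.

```python
# Minimal sketch: estimate an embedding for a novel word by averaging the
# contextual hidden states of its occurrences. Assumes each context
# actually contains the word; surface-form information is ignored here.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def embed_from_contexts(word: str, contexts: list[str]) -> torch.Tensor:
    word_ids = tokenizer(word, add_special_tokens=False).input_ids
    vectors = []
    for ctx in contexts:
        enc = tokenizer(ctx, return_tensors="pt")
        ids = enc.input_ids[0].tolist()
        # Locate the subword span of `word` inside the tokenized context.
        for i in range(len(ids) - len(word_ids) + 1):
            if ids[i : i + len(word_ids)] == word_ids:
                with torch.no_grad():
                    hidden = encoder(**enc).last_hidden_state[0]
                # Average over the word's subwords in this occurrence.
                vectors.append(hidden[i : i + len(word_ids)].mean(dim=0))
                break
    # Average over all occurrences to get one vector for the novel word.
    return torch.stack(vectors).mean(dim=0)

vec = embed_from_contexts(
    "glasperlenspiel",
    ["He spent years mastering the glasperlenspiel.",
     "The glasperlenspiel combines ideas from music and mathematics."],
)
print(vec.shape)
```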
Document type: | Dissertations (Dissertation, LMU München)
---|---
Subject areas: | 000 Generalities, Computer Science, Information Science; 000 Generalities, Computer Science, Information Science > 004 Computer Science
Faculties: | Fakultät für Mathematik, Informatik und Statistik
Language of the thesis: | English
Date of oral examination: | 8 April 2022
First referee: | Schütze, Hinrich
MD5 checksum of the PDF file: | e4b54c4411ddb1fc5005b229e3cbd600
Shelf mark of the printed edition: | 0001/UMC 28795
ID code: | 29867
Deposited on: | 23 May 2022 13:48
Last modified: | 23 May 2022 13:48