Few-shot learning with language models: Learning from instructions and contexts
Author: Schick, Timo
Year: 2022
Language: English
Publisher: Universitätsbibliothek der Ludwig-Maximilians-Universität München
Citation: Schick, Timo (2022): Few-shot learning with language models: Learning from instructions and contexts. Dissertation, LMU München: Fakultät für Mathematik, Informatik und Statistik.
PDF: Schick_Timo.pdf (2MB)

Abstract

Pretraining deep neural networks to perform language modeling, that is, to reconstruct missing words from incomplete pieces of text, has brought large improvements throughout natural language processing (NLP). However, even pretrained models typically do not achieve satisfactory performance in few-shot settings, where only a limited number of examples are available. This is an important issue not only because the need to annotate thousands of examples is a barrier to the more widespread application of such models, but also because few-shot learning is clearly a hallmark of human language competence, which should be the ultimate goal of NLP. In this work, we therefore investigate how we can leverage advances in language model pretraining to meet two fundamental few-shot challenges: we develop methods that enable models to solve new tasks and to understand new words from only a handful of examples.

Our approach to enabling models to solve new tasks is based on a simple observation: humans can acquire many new tasks without requiring even a single example if they are provided with instructions. We therefore investigate ways to allow pretrained models to process such instructions as well. On a wide range of tasks and datasets, we show that this not only removes the need for annotating thousands of examples but also enables models to acquire new tasks in a more human-like way: by learning from instructions in addition to examples. We demonstrate that this basic idea has the potential to profoundly change the way we teach NLP models new skills, as it can be used in an extremely wide range of applications, including downstream tasks such as text classification and generation, controlling the social behavior of language models, and even generating entire datasets from scratch.

To enable models to understand new words, we again take inspiration from how humans approach this task. Unlike common approaches that consider only a word's surface form, we additionally leverage all contexts in which it occurs: we teach pretrained language models to infer high-quality representations for novel words by learning from these contexts. We study various approaches for generating word representations from both surface form and contexts that can seamlessly be integrated with existing language models, and we show how they improve the models' understanding of both rare and new words.
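To make the first idea concrete, here is a minimal sketch of instruction-based few-shot classification in the spirit the abstract describes: a task is rephrased as a cloze question so that a masked language model can solve it by filling in a blank. This is an illustrative simplification, not the dissertation's exact method (which additionally fine-tunes the model on the few available examples); the choice of roberta-large and the hand-written pattern and verbalizer below are assumptions for the sake of the example.

```python
# Hedged sketch: zero-/few-shot sentiment classification by turning the
# task into a cloze question (an "instruction" encoded in the input) and
# letting a masked LM fill in the blank.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-large")
model = AutoModelForMaskedLM.from_pretrained("roberta-large")
model.eval()

# Illustrative pattern and verbalizer (both hypothetical choices).
text = "The plot was predictable, but the acting carried the film."
pattern = f"{text} All in all, the movie was {tokenizer.mask_token}."
verbalizer = {"positive": " great", "negative": " terrible"}

inputs = tokenizer(pattern, return_tensors="pt")
# Locate the mask token's position in the input sequence.
mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]

with torch.no_grad():
    logits = model(**inputs).logits[0, mask_pos]

# Compare the model's scores for the verbalizer words at the masked slot;
# the higher-scoring word determines the predicted label.
for label, word in verbalizer.items():
    token_id = tokenizer.convert_tokens_to_ids(tokenizer.tokenize(word))[0]
    print(label, logits[token_id].item())
```

Because the instruction lives entirely in the input text, the same pretrained model can be pointed at a new task simply by writing a new pattern and verbalizer, which is what makes the approach attractive in few-shot settings.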
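The abstract also mentions generating entire datasets from scratch as an application of instruction following. A hedged sketch of that idea: prompt a generative language model with a label-conditioned instruction and collect its continuations as synthetic training examples. The model choice (gpt2-large) and prompt wording are illustrative assumptions, not the dissertation's actual setup.

```python
# Hedged sketch: instruction-driven dataset generation. The prompt tells
# the model what kind of labeled example to produce; sampled continuations
# become synthetic training data.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2-large")

prompt = 'Task: Write a movie review.\nSentiment: negative\nReview: "'
outputs = generator(prompt, max_new_tokens=40,
                    num_return_sequences=3, do_sample=True)

# Keep only the generated review text (up to the closing quote, if any).
for out in outputs:
    review = out["generated_text"][len(prompt):].split('"')[0]
    print(review)
```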
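For the second challenge, the following is a simplified illustration of combining a word's surface form with the contexts in which it occurs to obtain an embedding for a novel word. The dissertation learns this combination with trained architectures; here the form part is just the average input embedding of the word's subword pieces, the context part is the average of its contextual hidden states across example sentences, and the two are mixed with a fixed weight. The model choice, the made-up word, and the interpolation weight are all assumptions for illustration.

```python
# Hedged sketch: a form + context embedding for a novel word. A learned,
# gated combination (as in the dissertation) is replaced here by a fixed
# interpolation weight alpha for simplicity.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

word = "perfluffle"  # a made-up novel word
contexts = [
    "She served the perfluffle with a side of roasted vegetables.",
    "The perfluffle was too salty for my taste.",
]

# Form part: average the input embeddings of the word's subword pieces.
subword_ids = tokenizer(word, add_special_tokens=False).input_ids
form_emb = model.get_input_embeddings().weight[subword_ids].mean(dim=0).detach()

# Context part: average the word's contextual hidden states per sentence.
context_vecs = []
with torch.no_grad():
    for sent in contexts:
        enc = tokenizer(sent, return_tensors="pt")
        hidden = model(**enc).last_hidden_state[0]
        ids = enc.input_ids[0].tolist()
        positions = [i for i, t in enumerate(ids) if t in subword_ids]
        context_vecs.append(hidden[positions].mean(dim=0))
context_emb = torch.stack(context_vecs).mean(dim=0)

# Fixed interpolation; the dissertation's models learn this gate instead.
alpha = 0.5
new_word_emb = alpha * form_emb + (1 - alpha) * context_emb
print(new_word_emb.shape)  # torch.Size([768])
```

The point of the sketch is the division of labor: the form part generalizes from spelling alone, while the context part exploits every sentence the new word appears in, and a learned gate can decide how much to trust each source per word.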