Schick, Timo (2022): Few-shot learning with language models: Learning from instructions and contexts. Dissertation, LMU München: Fakultät für Mathematik, Informatik und Statistik
PDF: Schick_Timo.pdf (2 MB)
Abstract
Pretraining deep neural networks to perform language modeling, that is, to reconstruct missing words from incomplete pieces of text, has brought large improvements throughout natural language processing (NLP). However, even pretrained models typically do not achieve satisfactory performance in few-shot settings, where only a limited number of examples are available. This is an important issue not only because the need to annotate thousands of examples is a barrier to the more widespread application of such models, but also because few-shot learning is clearly a hallmark of human language competence, which should be the ultimate goal of NLP. In this work, we therefore investigate how we can leverage advances in language model pretraining to meet two fundamental few-shot challenges: we develop methods that enable models to solve new tasks and to understand new words from only a handful of examples. To enable models to solve new tasks, our approach builds on a simple observation: humans can acquire many new tasks without a single example if they are provided with instructions. We thus investigate ways of allowing pretrained models to process such instructions as well. On a wide range of tasks and datasets, we show that this not only removes the need to annotate thousands of examples but also enables models to acquire new tasks in a more human-like way: by learning from instructions in addition to examples. We demonstrate that this basic idea has the potential to profoundly change how we teach NLP models new skills, as it can be used in an extremely wide range of applications, including downstream tasks such as text classification and generation, controlling the social behavior of language models, and even generating entire datasets from scratch. To enable models to understand new words, we again take inspiration from how humans approach this task. Unlike common approaches that consider only a word's surface form, we additionally leverage all contexts in which it occurs: we teach pretrained language models to infer high-quality representations for novel words by learning from these contexts. We study various approaches for generating word representations from both surface form and contexts; these representations can be seamlessly integrated with existing language models, and we show how they improve the models' understanding of both rare and new words.
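To make the first idea from the abstract concrete, the sketch below shows how a task instruction can be phrased as a cloze question for a pretrained masked language model: the input is embedded in a fill-in-the-blank template, and candidate labels are mapped to single words whose predicted scores are compared. This is a minimal illustration, not the dissertation's exact implementation; the model name, the template, and the label-to-word mapping are assumptions chosen for the example.

```python
# Minimal sketch: a classification task rephrased as a cloze question
# for a masked language model. Template and label words are illustrative.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

def classify_sentiment(review: str) -> str:
    # Pattern: embed the input in an instruction-like template with a blank.
    prompt = f"{review} All in all, it was [MASK]."
    # Verbalizer: map each label to a single word the model can predict.
    verbalizer = {"positive": "great", "negative": "terrible"}

    inputs = tokenizer(prompt, return_tensors="pt")
    mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos]

    # Compare the scores of the verbalizer words and pick the best label.
    scores = {
        label: logits[tokenizer.convert_tokens_to_ids(word)].item()
        for label, word in verbalizer.items()
    }
    return max(scores, key=scores.get)

print(classify_sentiment("A gripping plot and wonderful acting."))
```

The second idea, inferring a representation for a novel word from the contexts in which it occurs, can be sketched in a similarly simplified way. Here we merely average the contextual hidden states of the word's occurrences; the dissertation's actual methods also exploit the word's surface form and learn how to combine the two signals, so this should be read as a baseline illustration under those simplifying assumptions.

```python
# Minimal sketch: estimate an embedding for a novel word by averaging the
# contextual hidden states of its occurrences. Assumes each context
# actually contains the word; surface-form information is ignored here.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def embed_from_contexts(word: str, contexts: list[str]) -> torch.Tensor:
    word_ids = tokenizer(word, add_special_tokens=False).input_ids
    vectors = []
    for ctx in contexts:
        enc = tokenizer(ctx, return_tensors="pt")
        ids = enc.input_ids[0].tolist()
        # Locate the subword span of `word` inside the tokenized context.
        for i in range(len(ids) - len(word_ids) + 1):
            if ids[i : i + len(word_ids)] == word_ids:
                with torch.no_grad():
                    hidden = encoder(**enc).last_hidden_state[0]
                # Average over the word's subwords in this occurrence.
                vectors.append(hidden[i : i + len(word_ids)].mean(dim=0))
                break
    # Average over all occurrences to get one vector for the novel word.
    return torch.stack(vectors).mean(dim=0)

vec = embed_from_contexts(
    "glasperlenspiel",
    ["He spent years mastering the glasperlenspiel.",
     "The glasperlenspiel combines ideas from music and mathematics."],
)
print(vec.shape)
```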
Document type: | Dissertations (Dissertation, LMU München)
---|---
Subject areas: | 000 Generalities, Computer Science, Information Science; 000 Generalities, Computer Science, Information Science > 004 Computer Science
Faculties: | Fakultät für Mathematik, Informatik und Statistik
Language of the thesis: | English
Date of oral examination: | 8 April 2022
First referee: | Schütze, Hinrich
MD5 checksum of the PDF file: | e4b54c4411ddb1fc5005b229e3cbd600
Shelf mark of the printed edition: | 0001/UMC 28795
ID code: | 29867
Deposited on: | 23 May 2022 13:48
Last modified: | 23 May 2022 13:48