Modeling contextual information in neural machine translation

Machine translation has achieved impressive translation quality for many language pairs. The improvements over the past few years are largely due to the introduction of neural networks to the field, resulting in the modern sequence-to-sequence neural machine translation (NMT) models. NMT is at the core of many large-scale industrial tools for automatic translation such as Google Translate, Microsoft Translator, Amazon Translate and many others. Current NMT models operate at the sentence level, meaning they are used to translate individual sentences. However, for most practical use cases, a user is interested in translating a document. In these cases, an MT tool splits a document into individual sentences and translates them independently. As a result, any dependencies between the sentences are ignored. This is likely to result in an incoherent document translation, mainly because of inconsistent translation of ambiguous source words or wrong translation of anaphoric pronouns. For example, it is undesirable to translate “bank” as a “financial bank” in one sentence and as a “river bank” in a later one. Furthermore, the translation of, e.g., the English third person pronoun “it” into German depends on the grammatical gender of the English antecedent’s German translation. NMT has impressive modeling capabilities, but sentence-level models are nevertheless unable to capture discourse-level phenomena because they lack access to contextual information. In this work, we study discourse-level phenomena in context-aware NMT. To facilitate the particular studies of interest, we propose several models capable of incorporating contextual information into standard sentence-level NMT models. We focus on several discourse phenomena, namely, coreference (anaphora) resolution, coherence and cohesion.
We discuss these phenomena in terms of how well they can be modeled by context-aware NMT, how the current state of the art can be improved, and the optimal granularity at which they should be modeled. We further investigate domain as a factor in context-aware NMT. Finally, we investigate existing challenge sets for anaphora resolution evaluation and provide a robust alternative. We make the following contributions:
i) We study the importance of coreference (anaphora) resolution and coherence for context-aware NMT by making use of oracle information specific to these phenomena.
ii) We propose a method for improving performance on anaphora resolution based on curriculum learning, which is inspired by the way humans organize learning.
iii) We investigate the use of contextual information for better handling of domain information, in particular when modeling multiple domains at once and when applied to zero-resource domains.
iv) We present several context-aware models that enable us to examine the specific phenomena of interest mentioned above.
v) We study the optimal way of modeling local and global context and present a model theoretically capable of using very large document context.
vi) We study the robustness of challenge sets for evaluation of anaphora resolution in MT by means of adversarial attacks and provide a template test set that robustly evaluates specific steps of an idealized coreference resolution pipeline for MT.
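
The difference between sentence-level and context-aware input described in the abstract can be illustrated with a minimal sketch. It assumes a simple concatenation scheme in which each source sentence is prefixed with its preceding sentences; the function name `build_inputs`, the `<sep>` marker and the `context_size` parameter are hypothetical and not taken from the thesis, whose models may incorporate context in other ways.

```python
from typing import List

SEP = "<sep>"  # hypothetical separator token, an assumption for this sketch


def build_inputs(document: List[str], context_size: int = 0) -> List[str]:
    """Return one model input string per sentence of the document.

    With context_size=0 each sentence stands alone (sentence-level NMT).
    With context_size=k each sentence is prefixed with up to k preceding
    source sentences, joined by SEP (one simple context-aware setup).
    """
    inputs = []
    for i, sentence in enumerate(document):
        # Preceding sentences that serve as context for sentence i.
        context = document[max(0, i - context_size):i]
        inputs.append(f" {SEP} ".join(context + [sentence]))
    return inputs


doc = ["I bought a new table.", "It was very expensive."]
sentence_level = build_inputs(doc)                 # sentences in isolation
context_aware = build_inputs(doc, context_size=1)  # one sentence of context
```

In the sentence-level case, the second input contains "It" without its antecedent, so a model has no basis for choosing the German pronoun that agrees with "der Tisch". In the context-aware case, the antecedent is at least present in the input.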

natural language processing, neural machine translation, discourse, context-aware machine translation

Stojanovski, Dario

30. Jun. 2021

2021

English

Universitätsbibliothek der Ludwig-Maximilians-Universität München

https://nbn-resolving.org/urn:nbn:de:bvb:19-284113

Stojanovski, Dario (2021): Modeling contextual information in neural machine translation. Dissertation, LMU München: Fakultät für Mathematik, Informatik und Statistik

PDF
Stojanovski_Dario.pdf
2MB

DOI: 10.5282/edoc.28411

URN: urn:nbn:de:bvb:19-284113

Document type:	Theses (Dissertation, LMU München)
Keywords:	natural language processing, neural machine translation, discourse, context-aware machine translation
Subject areas:	000 General works, computer science, information science; 000 General works, computer science, information science > 004 Computer science
Faculties:	Fakultät für Mathematik, Informatik und Statistik
Language of the thesis:	English
Date of oral examination:	30 June 2021
First reviewer:	Fraser, Alexander
MD5 checksum of the PDF file:	5db12bf507bee4de6b59575dd6ff9927
ID Code:	28411
Deposited on:	25 Oct. 2021 08:35
Last modified:	25 Oct. 2021 08:35
