Logo Logo
Help
Contact
Switch language to German
Modeling contextual information in neural machine translation
Modeling contextual information in neural machine translation
Machine translation has provided impressive translation quality for many language pairs. The improvements over the past few years are largely due to the introduction of neural networks to the field, resulting in the modern sequence-to-sequence neural machine translation models. NMT is at the core of many largescale industrial tools for automatic translation such as Google Translate, Microsoft Translator, Amazon Translate and many others. Current NMT models work on the sentence-level, meaning they are used to translate individual sentences. However, for most practical use-cases, a user is interested in translating a document. In these cases, an MT tool splits a document into individual sentences and translates them independently. As a result, any dependencies between the sentences are ignored. This is likely to result in an incoherent document translation, mainly because of inconsistent translation of ambiguous source words or wrong translation of anaphoric pronouns. For example, it is undesirable to translate “bank” as a “financial bank” in one sentence and then later as a “river bank”. Furthermore, the translation of, e.g., the English third person pronoun “it” into German depends on the grammatical gender of the English antecedent’s German translation. NMT has shown that it has impressive modeling capabilities, but is nevertheless unable to model discourse-level phenomena as it needs access to contextual information. In this work, we study discourse-level phenomena in context-aware NMT. To facilitate the particular studies of interest, we propose several models capable of incorporating contextual information into standard sentence-level NMT models. We direct our focus on several discourse phenomena, namely, coreference (anaphora) resolution, coherence and cohesion. We discuss these phenomena in terms of how well can they be modeled by context-aware NMT, how can we improve upon current state-of-the-art as well as the optimal granularity at which these phenomena should be modeled. We further investigate domain as a factor in context-aware NMT. Finally, we investigate existing challenge sets for anaphora resolution evaluation and provide a robust alternative. We make the following contributions: i) We study the importance of coreference (anaphora) resolution and coherence for context-aware NMT by making use of oracle information specific to these phenomena. ii) We propose a method for improving performance on anaphora resolution based on curriculum learning which is inspired by the way humans organize learning. iii) We investigate the use of contextual information for better handling of domain information, in particular in the case of modeling multiple domains at once and when applied to zero-resource domains. iv) We present several context-aware models to enable us to examine the specific phenomena of interest we already mentioned. v) We study the optimal way of modeling local and global context and present a model theoretically capable of using very large document context. vi) We study the robustness of challenge sets for evaluation of anaphora resolution in MT by means of adversarial attacks and provide a template test set that robustly evaluates specific steps of an idealized coreference resolution pipeline for MT.
natural language processing, neural machine translation, discourse, context-aware machine translation
Stojanovski, Dario
2021
English
Universitätsbibliothek der Ludwig-Maximilians-Universität München
Stojanovski, Dario (2021): Modeling contextual information in neural machine translation. Dissertation, LMU München: Faculty of Mathematics, Computer Science and Statistics
[img]
Preview
PDF
Stojanovski_Dario.pdf

2MB

Abstract

Machine translation has provided impressive translation quality for many language pairs. The improvements over the past few years are largely due to the introduction of neural networks to the field, resulting in the modern sequence-to-sequence neural machine translation models. NMT is at the core of many largescale industrial tools for automatic translation such as Google Translate, Microsoft Translator, Amazon Translate and many others. Current NMT models work on the sentence-level, meaning they are used to translate individual sentences. However, for most practical use-cases, a user is interested in translating a document. In these cases, an MT tool splits a document into individual sentences and translates them independently. As a result, any dependencies between the sentences are ignored. This is likely to result in an incoherent document translation, mainly because of inconsistent translation of ambiguous source words or wrong translation of anaphoric pronouns. For example, it is undesirable to translate “bank” as a “financial bank” in one sentence and then later as a “river bank”. Furthermore, the translation of, e.g., the English third person pronoun “it” into German depends on the grammatical gender of the English antecedent’s German translation. NMT has shown that it has impressive modeling capabilities, but is nevertheless unable to model discourse-level phenomena as it needs access to contextual information. In this work, we study discourse-level phenomena in context-aware NMT. To facilitate the particular studies of interest, we propose several models capable of incorporating contextual information into standard sentence-level NMT models. We direct our focus on several discourse phenomena, namely, coreference (anaphora) resolution, coherence and cohesion. We discuss these phenomena in terms of how well can they be modeled by context-aware NMT, how can we improve upon current state-of-the-art as well as the optimal granularity at which these phenomena should be modeled. We further investigate domain as a factor in context-aware NMT. Finally, we investigate existing challenge sets for anaphora resolution evaluation and provide a robust alternative. We make the following contributions: i) We study the importance of coreference (anaphora) resolution and coherence for context-aware NMT by making use of oracle information specific to these phenomena. ii) We propose a method for improving performance on anaphora resolution based on curriculum learning which is inspired by the way humans organize learning. iii) We investigate the use of contextual information for better handling of domain information, in particular in the case of modeling multiple domains at once and when applied to zero-resource domains. iv) We present several context-aware models to enable us to examine the specific phenomena of interest we already mentioned. v) We study the optimal way of modeling local and global context and present a model theoretically capable of using very large document context. vi) We study the robustness of challenge sets for evaluation of anaphora resolution in MT by means of adversarial attacks and provide a template test set that robustly evaluates specific steps of an idealized coreference resolution pipeline for MT.