In Neural Machine Translation (and, more generally, conditional language modeling), the generation of a target token is influenced by two types of context: the source sequence and the prefix of the target sequence. While many attempts have been made to understand the internal workings of NMT models, none of them explicitly evaluates the relative source and target contributions to a generation decision. We propose a way to evaluate these relative contributions explicitly, and we analyze the NMT Transformer. By looking at how the contributions change when conditioning on different types of prefixes, we show that models suffering from exposure bias are more prone to over-relying on target history (and hence to hallucinating) than models where the exposure bias is mitigated. Additionally, we analyze how the source and target contributions change when varying the amount of training data, and over the course of training. We find that models trained with more data tend to rely on source information more and to have sharper token contributions, and that the training process is non-monotonic, with several stages of a different nature. If time permits, I'll also talk about our ongoing work that takes a closer look at the phenomena learned during these training stages.
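As a rough illustration of the quantity the talk is about, here is a minimal sketch of how per-token attribution scores could be turned into a single relative source contribution for one generation decision. The function name and the toy numbers are hypothetical; in the actual work the underlying scores would come from an attribution method applied to the model, not be given by hand.

```python
# Hypothetical sketch: converting per-token attribution scores into the
# relative source vs. target-prefix contribution discussed in the talk.
# The scores themselves would come from an attribution method run on the
# NMT model; here they are just plain numbers for illustration.

def relative_source_contribution(source_scores, target_scores):
    """Fraction of total (absolute) attribution assigned to the source.

    A value near 1.0 means the decision relied almost entirely on the
    source sentence; a value near 0.0 means it relied on the target
    prefix, a pattern associated with hallucinations.
    """
    src = sum(abs(s) for s in source_scores)
    tgt = sum(abs(t) for t in target_scores)
    total = src + tgt
    if total == 0.0:
        raise ValueError("all attribution scores are zero")
    return src / total

# Toy example: attributions for a single generated token.
share = relative_source_contribution([0.4, 0.3, 0.1], [0.1, 0.1])
print(round(share, 2))  # 0.8
```

Tracking how this fraction moves, e.g., when the prefix is replaced with different kinds of context, or across training checkpoints, is the kind of analysis the abstract describes.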
Elena (Lena) Voita is a PhD student at the University of Edinburgh and the University of Amsterdam, supervised by Ivan Titov and Rico Sennrich and supported by the Facebook PhD Fellowship. She is mostly interested in understanding what and how neural models learn; she has also worked quite a lot on (mostly document-level) neural machine translation. Previously, Lena spent 4 years in different parts of Yandex, 2.5 of them as a research scientist at Yandex Research, working side by side with the Yandex Translate team.