Developing systems that can execute symbolic, language-like instructions in the physical world is a long-standing challenge for Artificial Intelligence. Previous attempts to replicate human-like grounded language understanding relied on hard-coding linguistic and physical principles, an approach that is notoriously laborious and difficult to scale. Here we show that a simple neural-network-based agent, without any hard-coded knowledge, can exploit general-purpose learning algorithms to infer the meaning of sequential symbolic instructions as they pertain to a simulated 3D world.
Beginning with no prior knowledge, the agent learns the meaning of concrete nouns, adjectives, more abstract relational predicates, and longer, order-dependent sequences of symbols. The agent naturally generalises predicates to unfamiliar objects and can interpret word combinations (phrases) that it has never seen before. Moreover, while its initial learning is slow, the speed at which it acquires new words accelerates as a function of how much it already knows. These observations suggest that the approach may ultimately scale to a wider range of natural language, bringing us closer to machines capable of learning language via interaction with human users in the real world.
Felix is a Research Scientist at DeepMind. He did his PhD at the University of Cambridge with Anna Korhonen, working on unsupervised language and representation learning with neural nets. Beyond Anna, he collaborated with (and learned a lot from) Yoshua Bengio, Kyunghyun Cho and Jason Weston. As well as developing computational models that can understand language, he is interested in using models to better understand how people understand language, and is currently doing both at DeepMind.