Human language is contextual. In addition to understanding the surface meaning of words, a successful language understanding system should also interpret sentences relative to the environment and previous sentences.
The task in the SCONE dataset is to execute a sequence of actions according to the instructions. Each scenario contains a world with several objects (e.g., beakers), each with different properties (e.g., chemical colors and amounts). Given 5 sequential instructions in human language (e.g., "Pour from the first beaker into the yellow beaker" or "Mix it"), the system has to predict the final world state.
The SCONE dataset contains 3 domains, each featuring a context-dependent linguistic phenomenon.
Note: The examples here are mixed and matched to showcase different linguistic phenomena. They are not actual training examples.
In the Alchemy domain, the world contains 7 beakers. Each beaker may contain up to 4 units of colored chemical. The chemical can be poured into another beaker, drained away, or mixed.
Main phenomenon: ellipsis (omitting words)
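The beaker world described above can be sketched as a small state machine. This is only an illustration: the list-of-colors encoding, the function names, and the color that results from mixing (`"brown"` here) are assumptions of the sketch, not the dataset's actual format or semantics.

```python
from typing import List

CAPACITY = 4      # maximum units per beaker, as stated above
NUM_BEAKERS = 7   # number of beakers in the world, as stated above

# Assumed encoding: a beaker is a list of color names, one entry per unit.
Beaker = List[str]
World = List[Beaker]


def _copy(world: World) -> World:
    """Copy the world so that actions never mutate their input."""
    return [list(b) for b in world]


def drain(world: World, i: int, amount: int) -> World:
    """Return a new world with `amount` units removed from beaker i (0-based)."""
    if not 0 <= amount <= len(world[i]):
        raise ValueError("cannot drain more than the beaker holds")
    new = _copy(world)
    new[i] = new[i][: len(new[i]) - amount]
    return new


def pour(world: World, src: int, dst: int) -> World:
    """Return a new world with all of beaker src poured into beaker dst."""
    if src == dst:
        raise ValueError("source and destination must differ")
    if len(world[src]) + len(world[dst]) > CAPACITY:
        raise ValueError("destination beaker would overflow")
    new = _copy(world)
    new[dst] = new[dst] + new[src]
    new[src] = []
    return new


def mix(world: World, i: int, mixed_color: str = "brown") -> World:
    """Return a new world where beaker i holds one uniform color.

    The resulting color is an assumption of this sketch.
    """
    new = _copy(world)
    new[i] = [mixed_color] * len(new[i])
    return new


# Hypothetical initial state with 7 beakers.
world: World = [["green"] * 3, [], [], [], ["orange"], ["orange"] * 3, ["green"] * 4]
after_pour = pour(world, 4, 5)         # fifth beaker into the sixth
after_drain = drain(after_pour, 0, 2)  # drain 2 units from the first beaker
```

Returning a fresh world from every action keeps each intermediate state available, which is convenient when an instruction such as "Mix it" has to be resolved against what happened in earlier steps.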
In the Tangrams domain, the world contains tangram pieces with different shapes. The pieces can be swapped, removed, or brought back.
Main phenomenon: action coreference
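As a rough sketch of the three actions just listed, the tangram world can be modeled as an ordered row of pieces plus a pool of removed pieces. The class name, the string labels for pieces, and the index-based arguments are hypothetical choices for illustration, not the dataset's own representation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TangramWorld:
    # Assumed encoding: each piece is a label, ordered left to right.
    board: List[str]
    removed: List[str] = field(default_factory=list)

    def swap(self, i: int, j: int) -> None:
        """Exchange the pieces at positions i and j (0-based)."""
        self.board[i], self.board[j] = self.board[j], self.board[i]

    def remove(self, i: int) -> str:
        """Take the piece at position i off the board and remember it."""
        piece = self.board.pop(i)
        self.removed.append(piece)
        return piece

    def bring_back(self, piece: str, position: int) -> None:
        """Re-insert a previously removed piece at the given position."""
        self.removed.remove(piece)  # raises ValueError if it was never removed
        self.board.insert(position, piece)


tangrams = TangramWorld(["A", "B", "C", "D", "E"])
tangrams.swap(0, 2)          # board: C B A D E
taken = tangrams.remove(1)   # removes "B"; board: C A D E
tangrams.bring_back(taken, 0)  # board: B C A D E
```

Tracking removed pieces separately is what makes "brought back" well defined: only a piece that was taken off earlier can return.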
In the Scene domain, the world is a scene containing people. Each person wears a colored shirt and optionally a colored hat. People can enter the scene, leave the scene, move around, or exchange hats.
Main phenomenon: object coreference
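The scene world can be sketched in the same spirit. Here a scene is assumed to be a row of positions, each either empty or holding one person; the number of positions (5), the `Person` class, and the function names are illustrative assumptions rather than the dataset's format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Person:
    shirt: str
    hat: Optional[str] = None  # the hat is optional, as described above


# Assumed encoding: one slot per position, None when the position is empty.
Scene = List[Optional[Person]]


def enter(scene: Scene, position: int, person: Person) -> None:
    """Place a new person at an empty position."""
    if scene[position] is not None:
        raise ValueError("position is already occupied")
    scene[position] = person


def leave(scene: Scene, position: int) -> Person:
    """Remove and return the person at the given position."""
    person = scene[position]
    if person is None:
        raise ValueError("nobody is at this position")
    scene[position] = None
    return person


def move(scene: Scene, src: int, dst: int) -> None:
    """Move the person at src to the empty position dst."""
    if scene[dst] is not None:
        raise ValueError("destination is already occupied")
    enter(scene, dst, leave(scene, src))


def exchange_hats(scene: Scene, a: int, b: int) -> None:
    """Swap the hats (possibly None) of the people at positions a and b."""
    pa, pb = scene[a], scene[b]
    if pa is None or pb is None:
        raise ValueError("both positions must be occupied")
    pa.hat, pb.hat = pb.hat, pa.hat


scene: Scene = [None] * 5  # 5 positions is an arbitrary choice for this sketch
enter(scene, 0, Person("red", "yellow"))
enter(scene, 3, Person("blue"))
exchange_hats(scene, 0, 3)  # the person in blue now wears the yellow hat
move(scene, 0, 1)
gone = leave(scene, 3)
```

Because people are identified only by their clothing and position, a phrase like "he" or "the person in the yellow hat" must be resolved against the current state, which is where object coreference comes in.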
The dataset does not include semantic annotations (programs or logical forms) of the sentences; only the world states are given. As a result, the training data is ambiguous. For example, suppose the sentence "drain X beaker" makes the chemical disappear from the second beaker.
The unknown word X might mean "second", but it could also describe the color of the beaker, or even be a pronoun referring to a previously used beaker. A system has to learn to associate words like X with the correct meaning instead of overfitting to spurious interpretations.
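The ambiguity can be made concrete with a toy check. The sketch below enumerates a few candidate meanings for X and keeps those that reproduce the observed state change; the world encoding, the hypothesis names, and the helper functions are all hypothetical and serve only to illustrate the argument.

```python
from typing import Callable, Dict, List, Optional

World = List[List[str]]  # assumed encoding: one list of color units per beaker


def drain_all(world: World, i: Optional[int]) -> World:
    """Return a new world with beaker i emptied (no change if i is None)."""
    return [[] if k == i else list(b) for k, b in enumerate(world)]


def beaker_with_color(world: World, color: str) -> Optional[int]:
    """Index of the only beaker containing `color`, or None if not unique."""
    matches = [i for i, b in enumerate(world) if color in b]
    return matches[0] if len(matches) == 1 else None


# Observed training example: only the before and after states are given.
before: World = [["green"] * 3, ["red"] * 2, [], ["orange"], [], [], []]
after: World = [["green"] * 3, [], [], ["orange"], [], [], []]
last_used = 1  # assume the previous instruction involved the second beaker

# Candidate meanings of X; each maps (world, last_used) to a beaker index.
Selector = Callable[[World, int], Optional[int]]
hypotheses: Dict[str, Selector] = {
    "X = 'second' (ordinal)": lambda w, last: 1,
    "X = 'red' (color)": lambda w, last: beaker_with_color(w, "red"),
    "X = 'that' (previously used beaker)": lambda w, last: last,
    "X = 'first' (ordinal)": lambda w, last: 0,
}

# Keep every hypothesis whose execution reproduces the observed final state.
consistent = [
    name
    for name, select in hypotheses.items()
    if drain_all(before, select(before, last_used)) == after
]
```

In this toy example three of the four hypotheses explain the observation equally well, so a single example cannot tell them apart; only evidence from other examples can rule out the spurious ones.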
Main paper: (Long et al., ACL 2016) Simpler Context-Dependent Logical Forms via Model Projections [CodaLab]
(Guu et al., ACL 2017) From Language to Programs: Bridging Reinforcement Learning and Maximum Marginal Likelihood [CodaLab]