SEMPRE: Semantic Parsing with Execution

SEMPRE is a toolkit for training semantic parsers, which map natural language utterances to denotations (answers) via intermediate logical forms. Here's an example for querying databases:

Utterance: Which college did Obama go to?
Logical form: (and (Type University) (Education BarackObama))
Denotation: Occidental College, Columbia University

Here's another example for programming via natural language:

Utterance: Compute three plus four.
Logical form: (call + 3 4)
Denotation: 7
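
To make the utterance, logical form, denotation pipeline concrete, here is a minimal sketch of executing the logical form above. It is a toy evaluator written for illustration, not SEMPRE's executor: the function names (tokenize, parse, execute) and the operator table are assumptions, and only integer literals and binary (call op a b) forms are handled.

```python
import operator

# Hypothetical operator table; SEMPRE's own executors are far richer.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def tokenize(text):
    """Split an s-expression string into parentheses and atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens):
    """Recursively build nested lists from a token list (consumed in place)."""
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens[0] != ")":
            items.append(parse(tokens))
        tokens.pop(0)  # discard the closing ")"
        return items
    return token

def execute(form):
    """Evaluate a parsed form: atoms are integers, (call op a b) applies op."""
    if isinstance(form, str):
        return int(form)
    head, op, *args = form
    if head != "call":
        raise ValueError(f"unsupported form: {head}")
    left, right = (execute(arg) for arg in args)
    return OPS[op](left, right)

logical_form = "(call + 3 4)"
denotation = execute(parse(tokenize(logical_form)))
print(denotation)  # 7
```

Because execute recurses on its arguments, nested forms such as (call * (call + 3 4) 2) work under the same assumptions.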

SEMPRE has the following functionality:

It supports many types of logical forms (e.g., lambda calculus, lambda DCS, Java expressions, etc.), so you can choose whichever one suits your task.
It is agnostic to the procedure used to construct logical forms, which can be Combinatory Categorial Grammar (CCG) or something simpler. You just specify the combination rules in a domain-specific language. Here's a toy subset of CCG.
It supports various online learning algorithms that discriminatively train a classifier to maximize denotation accuracy.
It comes with a full copy of Freebase (41M entities, 19K properties, 596M assertions), which has been indexed by the Virtuoso SPARQL engine. This allows you to immediately start executing logical forms on Freebase.
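
The learning objective mentioned above, denotation accuracy, can be sketched as follows. This is one simple reading of the term, assuming a prediction counts as correct only when its answer set exactly matches the gold answer set; the function name and the sample data are hypothetical, and the official evaluation for a given dataset may use a different measure.

```python
def denotation_accuracy(predicted, gold):
    """Fraction of examples whose predicted answer set equals the gold set."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    # Order of answers within an example does not matter, so compare as sets.
    correct = sum(set(p) == set(g) for p, g in zip(predicted, gold))
    return correct / len(gold)

# Illustrative data built from the two examples on this page.
gold = [["Occidental College", "Columbia University"], ["7"]]
predicted = [["Columbia University", "Occidental College"], ["8"]]
print(denotation_accuracy(predicted, gold))  # 0.5
```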

You can download all the code and documentation for SEMPRE from GitHub. To learn more about the system, walk through our tutorial.

In our EMNLP 2013 paper, we created a new dataset, WebQuestions, which is released under the CC BY 4.0 license. Here are the train and test splits. You can also see the leaderboard, upload your predictions, and evaluate your system in this CodaLab worksheet.

In addition, we preprocessed the Free917 dataset (Cai & Yates, 2013) to work with our system. Here are the train and test splits.

Both datasets are provided in JSON format. WebQuestions contains 3,778 training examples and 2,032 test examples. Free917 contains 641 training examples and 276 test examples.

On WebQuestions, each example contains three fields:

utterance: natural language utterance.
targetValue: The answer provided by AMT workers, given as a list of descriptions.
url: Freebase page where AMT workers found the answer.
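
As a rough sketch of reading these fields, the snippet below parses one record with Python's json module. The three field names come from the description above; everything else is an assumption made for illustration: the file is taken to be a JSON array of objects, the values are invented from the example at the top of this page, the url is a placeholder, and targetValue is assumed to be a string of the form (list (description "...") ...).

```python
import json
import re

# Illustrative record only: the values, the placeholder url, and the
# "(list (description ...))" encoding of targetValue are assumptions.
SAMPLE = '''
[
  {
    "utterance": "Which college did Obama go to?",
    "targetValue": "(list (description \\"Occidental College\\") (description \\"Columbia University\\"))",
    "url": "http://example.org/placeholder"
  }
]
'''

def answers(target_value):
    """Pull the quoted descriptions out of an assumed targetValue string."""
    return re.findall(r'\(description "([^"]*)"\)', target_value)

examples = json.loads(SAMPLE)
for ex in examples:
    print(ex["utterance"], "->", answers(ex["targetValue"]))
```

The helper only handles descriptions wrapped in double quotes; a real loader would need to match whatever encoding the released files actually use.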

On Free917, each example contains two fields:

utterance: natural language utterance.
targetFormula: Logical form for the utterance (see paper and tutorial).

SEMPRE was used in the following papers:

Jonathan Berant, Andrew Chou, Roy Frostig, Percy Liang. Semantic Parsing on Freebase from Question-Answer Pairs. Empirical Methods in Natural Language Processing (EMNLP), 2013.

Jonathan Berant, Percy Liang. Semantic Parsing via Paraphrasing. Association for Computational Linguistics (ACL), 2014.

Yushi Wang, Jonathan Berant, Percy Liang. Building a Semantic Parser Overnight. Association for Computational Linguistics (ACL), 2015.

Panupong Pasupat, Percy Liang. Compositional Semantic Parsing on Semi-Structured Tables. Association for Computational Linguistics (ACL), 2015. [Project Page]

Jonathan Berant, Percy Liang. Imitation Learning of Agenda-based Semantic Parsers. Transactions of ACL (TACL), 2015.

Panupong Pasupat, Percy Liang. Inferring Logical Forms From Denotations. Association for Computational Linguistics (ACL), 2016.

Reginald Long, Panupong Pasupat, Percy Liang. Simpler Context-Dependent Logical Forms via Model Projections. Association for Computational Linguistics (ACL), 2016.

Sida Wang, Percy Liang, Christopher Manning. Learning Language Games through Interaction. Association for Computational Linguistics (ACL), 2016.

Yuchen Zhang, Panupong Pasupat, Percy Liang. Macro Grammars and Holistic Triggering for Efficient Semantic Parsing. Empirical Methods in Natural Language Processing (EMNLP), 2017.

SEMPRE supports lambda DCS logical forms, which are the default for querying Freebase:

Percy Liang. Lambda Dependency-Based Compositional Semantics. arXiv report.
