Knowledge Base Population (KBP)


Overview

Knowledge Base Population is the task of taking an incomplete knowledge base (e.g., Freebase, or the structured information in Wikipedia infoboxes), and a large corpus of text (e.g., Wikipedia), and completing the incomplete elements of the knowledge base. That is, the computer has to "read" the text and get information out of it. Stanford has focused on two aspects of this task:

  1. Slotfilling

    In slotfilling, the task is to complete all known information about a given query entity. For instance, given the query "Barack Obama", the system's goal is to collect Barack Obama's birthplace, birthdate, occupation, spouse, etc. This can be thought of as "filling" the Wikipedia infoboxes from having read a large corpus of text from blogs, newswire, and of course Wikipedia itself. A key aspect of this is relation extraction: classifying a sentence, together with two entities mentioned in it, into a relation of interest. For example, from the sentence "Barack Obama was born in Hawaii", the system should extract the relation born_in(Barack Obama, Hawaii).

  2. Entity Linking

    Often, entities are ambiguous when described in text. For example, "George Bush" may refer to either George Bush Sr. or George Bush Jr. Similarly, the acronym ACL may refer to the Association for Computational Linguistics or to the ACL music festival in Austin. Entity linking aims to take these ambiguous mentions and "link" them to concrete entities in the knowledge base. This is in the same spirit as coreference resolution, but (a) it spans multiple documents, and (b) it links mentions to concrete entities (when possible), rather than simply clustering them.
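
To make the two tasks above concrete, here is a minimal Python sketch. It is a toy illustration only and not the Stanford KBP system: the regular-expression pattern, the tiny KB and ALIASES dictionaries, and the function names extract_born_in and link are all hypothetical, and a real system would use a trained classifier rather than a hand-written pattern or word overlap.

```python
import re
from typing import Optional, Tuple

# Hypothetical toy pattern for one relation:
# "<subject> was born in <object>" -> born_in(subject, object).
BORN_IN = re.compile(r"^(?P<subj>.+?) was born in (?P<obj>.+?)\.?$")


def extract_born_in(sentence: str) -> Optional[Tuple[str, str, str]]:
    """Return a (relation, subject, object) triple, or None if no match."""
    match = BORN_IN.match(sentence.strip())
    if match is None:
        return None
    return ("born_in", match.group("subj"), match.group("obj"))


# Hypothetical miniature knowledge base: entity name -> typical context words.
KB = {
    "Association for Computational Linguistics": {
        "linguistics", "conference", "papers", "nlp",
    },
    "ACL music festival": {"music", "festival", "austin", "bands"},
}

# Hypothetical alias table: surface mention -> candidate entities in KB.
ALIASES = {
    "ACL": ["Association for Computational Linguistics", "ACL music festival"],
}


def link(mention: str, context: str) -> Optional[str]:
    """Pick the candidate whose context words overlap most with the text.

    Returns None when the mention is unknown or no candidate shares any
    word with the context, i.e. the mention is left unlinked.
    """
    words = set(re.findall(r"[a-z]+", context.lower()))
    best, best_score = None, 0
    for candidate in ALIASES.get(mention, []):
        score = len(KB[candidate] & words)
        if score > best_score:
            best, best_score = candidate, score
    return best


if __name__ == "__main__":
    print(extract_born_in("Barack Obama was born in Hawaii."))
    print(link("ACL", "She presented two papers at the ACL conference."))
    print(link("ACL", "We saw three bands at ACL in Austin."))
```

The slotfilling half returns a relation triple that could fill a "birthplace" slot, while the linking half resolves the same surface string "ACL" to different knowledge base entries depending on the surrounding words, and leaves it unlinked when the context gives no evidence.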

System performance

The KBP workshop papers in the "papers" section describe our official results for each year. Current performance on our KBP development sets can be found on our results dump page. Note, however, that these numbers may be out of date, and that they are produced by the most recent development version of the code.

Available software

The workhorse of the Stanford KBP system is the Multi-Instance Multi-Label relation extractor, which is available for download.

Papers

Slotfilling

Mihai Surdeanu, Julie Tibshirani, Ramesh Nallapati, and Christopher D. Manning. 2012. Multi-instance Multi-label Learning for Relation Extraction. In Proceedings of the Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL). [pdf, bib]

Mihai Surdeanu, Sonal Gupta, John Bauer, David McClosky, Angel X. Chang, Valentin I. Spitkovsky, and Christopher D. Manning. 2011. Stanford's Distantly-Supervised Slot-Filling System. In Proceedings of the Fourth Text Analysis Conference (TAC 2011). [pdf, bib; data]

Mihai Surdeanu, David McClosky, Julie Tibshirani, John Bauer, Angel X. Chang, Valentin I. Spitkovsky, and Christopher D. Manning. 2010. A Simple Distant Supervision Approach for the TAC-KBP Slot Filling Task. In Proceedings of the Third Text Analysis Conference (TAC 2010). [pdf, bib; slides, data]

Eneko Agirre, Angel X. Chang, Daniel S. Jurafsky, Christopher D. Manning, Valentin I. Spitkovsky, and Eric Yeh. 2009. Stanford-UBC at TAC-KBP. In Proceedings of the Second Text Analysis Conference (TAC 2009). [pdf, bib; slides]

Entity Linking

Angel X. Chang, Valentin I. Spitkovsky, Eneko Agirre, and Christopher D. Manning. 2011. Stanford-UBC Entity Linking at TAC-KBP, Again. In Proceedings of the Fourth Text Analysis Conference (TAC 2011). [pdf, bib]

Angel X. Chang, Valentin I. Spitkovsky, Eric Yeh, Eneko Agirre, and Christopher D. Manning. 2010. Stanford-UBC Entity Linking at TAC-KBP. In Proceedings of the Third Text Analysis Conference (TAC 2010). [pdf, bib; poster]