In a distantly supervised information extraction system, training texts are labeled automatically (and noisily) by leveraging an existing database of known facts. While this approach has typically been applied to the extraction of binary relations, this project explores the use of distant supervision for template-based event extraction.
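As a rough illustration of the automatic labeling step described above, the sketch below tags candidate mentions with a slot name whenever they exactly match a known value for the event. It is a minimal sketch, not this project's actual pipeline: the knowledge base contents, the slot names, the `NIL` label, and the `label_mentions` helper are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical knowledge base: one record per event, mapping slot name -> known value.
# The event, slots, and values here are invented for illustration only.
KNOWN_EVENTS: Dict[str, Dict[str, str]] = {
    "Flight 123": {
        "Operator": "Example Air",
        "CrashSite": "Springfield",
        "Fatalities": "12",
    },
}


def label_mentions(
    event_id: str,
    mentions: List[str],
    kb: Dict[str, Dict[str, str]],
    default: str = "NIL",
) -> List[Tuple[str, str]]:
    """Label each candidate mention with the slot whose known value it matches.

    Matching is a case-insensitive exact string comparison; mentions that
    match no known value receive the default label.
    """
    record = kb.get(event_id, {})
    value_to_slot = {value.lower(): slot for slot, value in record.items()}
    return [(m, value_to_slot.get(m.lower(), default)) for m in mentions]


if __name__ == "__main__":
    candidates = ["Example Air", "Springfield", "Tuesday", "12"]
    for mention, label in label_mentions("Flight 123", candidates, KNOWN_EVENTS):
        print(f"{mention}\t{label}")
```

Exact matching of this kind is one source of label noise: any occurrence of "12" in a text about the event is tagged as `Fatalities`, whether or not it refers to a death toll.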
This work emphasizes joint extraction models, in which sentence-level and entity-level decisions are made jointly in a unified probabilistic framework. In particular, we explore Search-based Structured Prediction (Searn) and Conditional Random Fields (CRF).
Our study was conducted on a plane crash knowledge base derived from Wikipedia infoboxes. Links to the dataset and presentations of this work are given below.
Plane Crash Dataset:    plane_crash_dataset.zip
Slide Deck:     LREC-2014
For any comments or questions, please e-mail kreschke@cs.stanford.edu.