We are working on extracting process structures from a biology textbook. Our goal is to extract structures that help answer questions that simple bag-of-words models cannot answer.
This is the code and data used in our EMNLP 2013 paper:
Learning Biological Processes with Global Constraints, by Aju Thalappillil Scaria, Jonathan Berant, Mengqiu Wang, Peter Clark, Justin Lewis, Brittany Harding and Christopher D. Manning.
The dataset contains full annotation of 148 process descriptions, split into a training set and a test set. Annotation was performed using the brat annotation tool, so the easiest way to inspect the annotation is to import the dataset files into brat.
The software is open source, distributed under the BSD 3-clause license. You can download: