Marti Hearst
Towards Semi-Supervised Algorithms for Semantic Relation Detection in BioScience Text
Abstract
A crucial step toward the goal of automatically extracting propositional information from natural language text is identifying the semantic relations between constituents in sentences. In the bioscience text domain, we have developed a simple ontology-based algorithm for determining which semantic relation holds between the terms in a noun compound, and a supervised learning algorithm for discovering relations between entities. In this talk, I will first briefly describe these results.
A major bottleneck for semantic labeling work is the development of labeled training data. To remedy this, we propose a new approach for creating semantically labeled data that makes use of what we call *citances*: the text of the sentences surrounding citations to research articles. Citances provide us with differently worded statements of approximately the same semantic information; by looking at the way different authors talk about the same facts, we obtain paraphrases nearly for free. We have just begun to assess how well citances work for creating labeled training data for the problem of detecting protein-protein interaction relations. We also hypothesize that citances will be useful for synonym creation, document summarization, and database curation.
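To make the idea of citances concrete, the Python sketch below groups citing sentences by the article they cite, so that each group becomes a candidate set of paraphrases of the same fact. It is an illustration under stated assumptions, not the method used in this work: it assumes citation markers have already been resolved to a shared identifier written as `[PMID:nnn]` (a hypothetical format), uses a naive regex sentence splitter, and interprets "surrounding" sentences as a hypothetical `window` of neighboring sentences on either side of the citing sentence.

```python
import re
from collections import defaultdict

# Hypothetical marker format: citations already resolved to a shared ID, e.g. [PMID:12345].
CITATION = re.compile(r"\[PMID:(\d+)\]")


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def collect_citances(documents, window=0):
    """Map each cited article ID to the citance texts that refer to it.

    window: number of neighboring sentences (on each side) to include
    around the citing sentence; 0 keeps only the citing sentence itself.
    """
    citances = defaultdict(list)
    for text in documents:
        sentences = split_sentences(text)
        for i, sentence in enumerate(sentences):
            refs = {m.group(1) for m in CITATION.finditer(sentence)}
            if not refs:
                continue
            # Clip the context window to the document boundaries.
            lo, hi = max(0, i - window), min(len(sentences), i + window + 1)
            context = " ".join(sentences[lo:hi])
            for ref in sorted(refs):
                citances[ref].append(context)
    return dict(citances)


if __name__ == "__main__":
    docs = [
        "Protein A binds protein B [PMID:111]. Unrelated text.",
        "Earlier work showed that B interacts with A [PMID:111]. See also [PMID:222].",
    ]
    for ref, texts in collect_citances(docs).items():
        print(ref, texts)
```

Grouping by the cited article is what turns independent citing sentences into paraphrase sets: both sentences citing `111` state roughly the same interaction in different words. Real citances would also need robust sentence segmentation and resolution of each paper's local reference numbers to shared identifiers.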