Named Entity Recognition (NER) and Information Extraction (IE)
We have worked on a wide range of NER and IE tasks over the past several years. We entered the 2003 CoNLL NER shared task using a character-based Maximum Entropy Markov Model (MEMM). In late 2003 we entered the BioCreative shared task, which targeted NER in biomedical papers. This task required identifying genes and proteins, but not distinguishing between the two. We used a model similar to the one from the CoNLL shared task, but tuned to the domain and with some additional features; we had the best-performing system. Then, in 2004, we entered the BioNLP shared task at Coling, which also looked at biomedical papers but required identifying five different classes: DNA, RNA, cell line, cell type, and protein. We once again used an MEMM, but added much richer features, including features from parse trees, the web, and how entities were labeled elsewhere on a previous run. We also entered the PASCAL IE shared task, which involved extracting information from workshop announcements. There we attempted to use a relational model in addition to the MEMM to allow the use of top-down information. We have also studied the use of Gibbs sampling for inference in a Conditional Random Field (CRF), so as to incorporate longer-distance information. There has also been work on adapting sequence classifiers to new, unseen domains.
Details of our CMM and CRF systems' performance on CoNLL 2002 and 2003 NER data are available.
You can download our CRF-based NER system.
Jenny Rose Finkel, Trond Grenager, and Christopher
Manning. 2005. Incorporating Non-local Information into Information
Extraction Systems by Gibbs Sampling. Proceedings of the 43rd Annual
Meeting of the Association for Computational Linguistics (ACL 2005),
pp. 363-370.

Shipra Dingare, Malvina Nissim, Jenny Finkel, Claire Grover, and
Christopher D. Manning. 2004. A System For Identifying Named Entities in
Biomedical Text: How Results From Two Evaluations Reflect on Both
the System and the Evaluations. Comparative and Functional
Genomics 6:77-85.

Shipra Dingare, Jenny Finkel, Malvina Nissim, Christopher Manning,
and Claire Grover. 2004. A System For Identifying Named Entities in
Biomedical Text: How Results From Two Evaluations Reflect on Both the
System and the Evaluations. In The 2004 BioLink meeting: Linking
Literature, Information and Knowledge for Biology at ISMB 2004.

Jenny Finkel, Shipra Dingare, Huy Nguyen, Malvina Nissim, Christopher
Manning, and Gail Sinclair. 2004. Exploiting Context for Biomedical
Entity Recognition: From Syntax to the Web. Joint Workshop on Natural
Language Processing in Biomedicine and its Applications at Coling 2004.

Jenny Finkel, Shipra Dingare, Christopher Manning, Malvina Nissim,
Beatrice Alex, and Claire Grover. In press.
Exploring the Boundaries: Gene and Protein Identification in
Biomedical Text. Accepted for publication in BMC Bioinformatics.

Shipra Dingare, Jenny Finkel, Christopher Manning, Malvina Nissim,
and Beatrice Alex. 2004. Exploring the Boundaries: Gene and
Protein Identification in Biomedical Text.
Proceedings of the BioCreative Workshop, Granada.