Self-trained biomedical parsing
Note: If you're looking for our biomedical event extraction software, please see this page instead.
Using the above trees, I repeated the self-training experiments from our ACL 2008 paper with GENIA 1.0 trees as the labeled data, which also allowed me to create a GENIA reranker. The results (on the development set from my division) are quite dramatic, rising from 76.8 for the WSJ baseline to 87.6 for the self-trained GENIA model with a GENIA reranker:
|WSJ + WSJ reranker|76.8|
|WSJ + PubMed (parsed by WSJ) + WSJ reranker|80.7|
|GENIA + WSJ reranker|84.5|
|GENIA + GENIA reranker|85.7|
|GENIA + PubMed (parsed by GENIA) + GENIA reranker|87.6|
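To put the differences in the table in perspective, the following sketch computes each configuration's absolute gain over the WSJ baseline and the corresponding relative error reduction. It assumes the scores are percentages with 100 as a perfect score; the `error_reduction` helper is hypothetical and introduced only for illustration.

```python
# Scores copied from the table above (assumed to be percentages out of 100).
scores = {
    "WSJ + WSJ reranker": 76.8,
    "WSJ + PubMed (parsed by WSJ) + WSJ reranker": 80.7,
    "GENIA + WSJ reranker": 84.5,
    "GENIA + GENIA reranker": 85.7,
    "GENIA + PubMed (parsed by GENIA) + GENIA reranker": 87.6,
}


def error_reduction(baseline, improved):
    """Relative reduction in error, treating (100 - score) as the error."""
    return (improved - baseline) / (100.0 - baseline)


baseline = scores["WSJ + WSJ reranker"]
for name, score in scores.items():
    gain = score - baseline
    reduction = 100 * error_reduction(baseline, score)
    print(f"{name}: {gain:+.1f} points, {reduction:.1f}% error reduction")
```

Under these assumptions, the best configuration sits 10.8 points above the WSJ baseline, which corresponds to removing roughly 47% of the baseline's error.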
- Original self-trained biomedical parsing model
- Improved self-trained biomedical parsing model (please see my thesis)
Improved self-trained biomedical parsing model
Available here. Please cite my thesis if you use this model:
- David McClosky. 2010. Any Domain Parsing: Automatic Domain Adaptation for Natural Language Parsing. Ph.D. thesis, Department of Computer Science, Brown University. [PDF] [thesis defense slides]
Original self-trained biomedical parsing model
Available here. This model is deprecated and is kept only for historical purposes.
The DATA/ directory is an alternate data directory for the parser. Its model was trained on WSJ and 266,664 randomly collected biomedical abstracts from PubMed. Using the standard WSJ-trained reranker (included with the BLLIP reranking parser), this model achieves an f-score of 84.3% on the GENIA treebank beta 2 test set. For more details, please see:
- David McClosky and Eugene Charniak. 2008. Self-Training for Biomedical Parsing. Proceedings of the Association for Computational Linguistics (ACL 2008, short papers), Columbus, Ohio. [PDF]
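For readers unfamiliar with the f-score quoted above, here is a minimal sketch of a labeled-bracket f-score: the harmonic mean of bracket precision and recall. It assumes the reported figure is this standard parsing measure; real scoring tools apply further conventions (for example around punctuation) that are not modeled here. The `bracket_f_score` function and the example brackets are hypothetical.

```python
from collections import Counter


def bracket_f_score(gold, predicted):
    """Return (precision, recall, f-score) over labeled brackets.

    Each bracket is a (label, start, end) tuple. Counters are used so a
    repeated bracket is matched at most as often as it occurs in both lists.
    """
    matched = sum((Counter(gold) & Counter(predicted)).values())
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical brackets for a five-word sentence (not taken from GENIA).
gold = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)]
predicted = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("PP", 3, 5)]

precision, recall, f_score = bracket_f_score(gold, predicted)
print(f"P={precision:.3f} R={recall:.3f} F={f_score:.3f}")
```

In this toy case three of the four brackets match on both sides, so precision, recall, and f-score all come out to 0.75.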
More information about self-training can be found in these papers:
- David McClosky, Eugene Charniak, and Mark Johnson. 2006. Effective Self-Training for Parsing. Proceedings of the Conference on Human Language Technology and North American chapter of the Association for Computational Linguistics (HLT-NAACL 2006), Brooklyn, New York. [PDF] [slides]
- David McClosky, Eugene Charniak, and Mark Johnson. 2006. Reranking and Self-Training for Parser Adaptation. Proceedings of the Association for Computational Linguistics (COLING-ACL 2006), Sydney, Australia. [PDF] [slides]
- David McClosky, Eugene Charniak, and Mark Johnson. 2008. When is Self-Training Effective for Parsing? Proceedings of the International Conference on Computational Linguistics (COLING 2008), Manchester, UK. [PDF] [slides]
Make sure you have a sufficiently recent release of the BLLIP reranking parser (available from here); older releases will not be able to handle the larger vocabulary.