Stanford NLP Named Entity Recognition Results

Prec/Rec/F1 are exact-match phrase scores as computed by conlleval.

Corpus | Test set | Train tokens | Test tokens | Entity types | Entity instances | Features Φ(X) | λ/f(X,Y) | Prec | Rec | F1 | Classifier | Properties file/flags | Notes
CoNLL 2002 Dutch news | testa (devset) | 218737 | 37761 | 4 | 2616 | 838524 | 4192620 | 78.99% | 77.33% | 78.15% | pure CMM | -goodCoNLL | 1, 3, 5, 7
CoNLL 2002 Dutch news | testb | 218737 | 68994 | 4 | 3941 | 838559 | 4192795 | 80.48% | 78.96% | 79.71% | pure CMM | -goodCoNLL | 1, 3, 5, 7
CoNLL 2002 Spanish news | testa (devset) | 273037 | 52923 | 4 | 4352 | 776511 | 3882555 | 78.01% | 76.19% | 77.09% | pure CMM | -goodCoNLL | 1, 3, 5, 7
CoNLL 2002 Spanish news | testb | 273037 | 51533 | 4 | 3559 | 776444 | 3882220 | 81.24% | 81.03% | 81.14% | pure CMM | -goodCoNLL | 1, 3, 5, 7
CoNLL 2003 English news | testa (devset) | 219553 | 51578 | 4 | 5942 | 738378 | 3691890 | 91.37% | 91.22% | 91.29% | pure CMM | -goodCoNLL | 1, 5, b
CoNLL 2003 English news | testa (devset) | 219554 | 51578 | 4 | 5942 | | | 92.15% | 92.39% | 92.27% | postprocessed CMM | | 1, 2, 4
CoNLL 2003 English news | testb | 219553 | 46666 | 4 | 5648 | 738378 | 3691890 | 85.65% | 85.41% | 85.53% | pure CMM | -goodCoNLL | 1, 5, b
CoNLL 2003 English news | testb | 219554 | 46666 | 4 | 5648 | | | 86.12% | 86.49% | 86.31% | postprocessed CMM | | 1, 2, 4
CoNLL 2003 German news | testa (devset) | 220189 | 51645 | 4 | 4833 | 1079044 | 5395220 | 77.12% | 61.37% | 68.35% | pure CMM | -goodCoNLL | 1, 3, 5, 6, 7, a
CoNLL 2003 German news | testa (devset) | 220189 | 51645 | 4 | 4833 | | | 75.36% | 60.36% | 67.03% | postprocessed CMM | | 1, 2, 3, 4
CoNLL 2003 German news | testb | 220189 | 52098 | 4 | 3673 | 1079037 | 5395185 | 79.23% | 63.65% | 70.59% | pure CMM | -goodCoNLL | 1, 3, 5, 6, 7, a
CoNLL 2003 German news | testb | 220189 | 52098 | 4 | 3673 | | | 80.38% | 65.04% | 71.90% | postprocessed CMM | | 1, 2, 3, 4
CoNLL 2003 English news | testa (devset) | 219553 | 51578 | 4 | 5942 | 616918 | 11532202 | 91.64% | 90.93% | 91.28% | CRF (closed task) | conll.crf.chris2009.prop iob2 | 1, 5, c
CoNLL 2003 English news | testa (devset) | 219553 | 51578 | 4 | 5942 | 633786 | 12285708 | 93.28% | 92.71% | 92.99% | CRF (with distsim) | conll.crf.chris2009.prop iob2 distsim | 1, 5, c
CoNLL 2003 English news | testb | 219553 | 46666 | 4 | 5648 | 633786 | 12285708 | 88.21% | 87.68% | 87.94% | CRF (with distsim) | conll.crf.chris2009.prop iob2 distsim | 1, 5, c
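The exact-match (conlleval) scoring used for the Prec/Rec/F1 columns counts an entity as correct only when both its boundaries and its type match a gold span exactly. A minimal Python sketch of that metric over IOB2-tagged sequences follows; the function names are illustrative and not part of Stanford NER or the conlleval script itself:

```python
def spans(tags):
    """Extract (start, end, type) entity spans from an IOB2 tag sequence.
    Simplification: an ill-formed leading I- tag is ignored."""
    out, start, etype = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last span
        if tag == "O" or tag.startswith("B-"):
            if start is not None:
                out.append((start, i, etype))
                start, etype = None, None
            if tag.startswith("B-"):
                start, etype = i, tag[2:]
        # an "I-" tag just continues the current span
    return set(out)

def exact_match_prf(gold_tags, pred_tags):
    """conlleval-style exact-match precision, recall, and F1 (as fractions)."""
    gold, pred = spans(gold_tags), spans(pred_tags)
    tp = len(gold & pred)                       # spans right in both place and type
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```

F1 is the harmonic mean of precision and recall: for the first English testa row, 2 · 91.37 · 91.22 / (91.37 + 91.22) ≈ 91.29, matching the listed score.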

Other results that should be on this page

BioCreative, JNLPBA, MUC, and all3.

Notes

1. Test token counts exclude boundary tokens (blank lines), although these boundary tokens are included in the sequence model.

2. Postprocessing Perl scripts improved the handling of names and datelines. See the Klein et al. (2003) CoNLL paper for the features used.

3. This model was not separately optimized on a per-dataset or even per-language basis; it simply uses the feature set that had been found effective for English.

4. Official score of system submitted for listed competition.

5. Score of current in-house version.

6. This model adds current, previous, and next word-lemma features to the German model (word lemmas are present in the provided CoNLL data, but we did not use them at the time of the official competition run).
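The lemma features described in note 6 amount to three extra predicates per token. A minimal sketch of such a feature template follows; the function and feature-name prefixes here are hypothetical, not the actual Stanford feature factory:

```python
def lemma_features(lemmas, i):
    """Current, previous, and next word-lemma features for position i
    (illustrative only; the real model's feature templates are far richer)."""
    feats = ["LEM-" + lemmas[i]]                 # current word's lemma
    if i > 0:
        feats.append("PLEM-" + lemmas[i - 1])    # previous word's lemma
    if i < len(lemmas) - 1:
        feats.append("NLEM-" + lemmas[i + 1])    # next word's lemma
    return feats
```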

7. Feature counts differ slightly even with the same training data because the unknown-word model does a tiny amount of transductive learning: unknown-word features include whether a capitalized word has also been seen all lowercase, and the test set is included in the dictionary used for this purpose.
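The transductive wrinkle in note 7 can be illustrated as follows: because the lowercase dictionary is built over train and test tokens together, whether a capitalized test word "has been seen lowercase" can depend on the test set itself. This is a sketch under that description, with hypothetical names:

```python
def build_lowercase_dict(train_tokens, test_tokens):
    """Collect every token that occurs all-lowercase in train OR test data;
    including test tokens is the 'tiny amount of transductive learning'."""
    return {w for w in list(train_tokens) + list(test_tokens) if w.islower()}

def seen_lowercase_feature(word, lc_dict):
    """Fires for a capitalized word whose lowercase form is in the dictionary,
    e.g. sentence-initial 'Apple' when 'apple' occurs elsewhere in the data."""
    return word[:1].isupper() and word.lower() in lc_dict
```

A word like "Banana" triggers the feature if "banana" occurs only in the test set, which is why feature counts shift slightly across test sets even with identical training data.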

a. Date: 2005/09/14.

b. Date: 2006/08/28.

c. Date: 2009/07/15.