
Stanford Deterministic Coreference Resolution System


News

December 9, 2015: The deterministic coreference resolution system is still supported in StanfordCoreNLP via the annotator dcoref. However, we have since added new, better-performing statistical and neural coreference systems written by Kevin Clark, which are used by default or can be invoked explicitly with the annotator coref. See the CorefAnnotator documentation.

May 7, 2013: Recent improvements to the Stanford Deterministic Coreference Resolution System (Recasens et al., below) won the best short paper award at NAACL 2013.

June 30, 2011: This system was the top ranked system at the CoNLL-2011 shared task.

About

This system implements the multi-pass sieve coreference resolution (or anaphora resolution) system described in Lee et al. (CoNLL Shared Task 2011) and Raghunathan et al. (EMNLP 2010).

The scores obtained are higher than those reported in the EMNLP 2010 paper because of additional sieves and better rules (see Lee et al. 2011 for details). Mention detection is included in the package (see Usage for instructions).

The Computational Linguistics paper includes more details and additional experimental results.

The papers to cite for this system are as follows:

Marta Recasens, Marie-Catherine de Marneffe, and Christopher Potts.
The Life and Death of Discourse Entities: Identifying Singleton Mentions.
In Proceedings of NAACL 2013.
Heeyoung Lee, Angel Chang, Yves Peirsman, Nathanael Chambers, Mihai Surdeanu, and Dan Jurafsky.
Deterministic coreference resolution based on entity-centric, precision-ranked rules.
Computational Linguistics 39(4), 2013.
Heeyoung Lee, Yves Peirsman, Angel Chang, Nathanael Chambers, Mihai Surdeanu, and Dan Jurafsky.
Stanford's Multi-Pass Sieve Coreference Resolution System at the CoNLL-2011 Shared Task.
In Proceedings of the CoNLL-2011 Shared Task, 2011.
Karthik Raghunathan, Heeyoung Lee, Sudarshan Rangarajan, Nathanael Chambers, Mihai Surdeanu, Dan Jurafsky, and Christopher Manning.
A Multi-Pass Sieve for Coreference Resolution.
In Proceedings of EMNLP 2010, Boston, USA, 2010.

Current Evaluation Results

The scores of the dcoref code in v3.6.0 (CoNLL 2011 shared task winner descendant) on the CoNLL 2011 Shared Task dev data set, measured on 2016/02/07 using the v4 scorer (used for the 2011 evaluation).

-----------------------------------------------------------------------------------------------------------------------------------------
                            MUC               B cubed              CEAF (M)            CEAF (E)            BLANC        | 
                       P     R     F1      P     R     F1      P     R     F1      P     R     F1      P     R     F1   | Avg F1
-----------------------------------------------------------------------------------------------------------------------------------------
conllst2011 dev   |   62.1  59.3  60.7  | 74.2  67.7  70.8  | 59.4  59.4  59.4  | 46.1  48.9  47.5  | 79.6  72.4  75.4  |  59.56 
-----------------------------------------------------------------------------------------------------------------------------------------
* Automatic mention detection used. Avg F1 = (MUC + B cubed + CEAFE)/3.

The scores of the dcoref code in v3.6.0 (CoNLL 2011 shared task winner descendant) on the CoNLL 2011/2012 Shared Task dev data sets, measured on 2016/02/07 using the v8.01 scorer (current in 2016).

-----------------------------------------------------------------------------------------------------------------------------------------
                            MUC               B cubed              CEAF (M)            CEAF (E)            BLANC        | 
                       P     R     F1      P     R     F1      P     R     F1      P     R     F1      P     R     F1   | Avg F1
-----------------------------------------------------------------------------------------------------------------------------------------
conllst2011 dev   |   62.1  59.3  60.7  | 56.2  48.6  52.1  | 58.0  57.5  57.8  | 48.9  53.5  51.1  | 54.1  47.2  50.1  |  54.62
conllst2012 dev   |   65.9  64.1  65.0  | 58.7  50.9  54.5  | 59.2  59.6  59.4  | 48.6  54.3  51.3  | 59.5  53.7  56.1  |  56.92
-----------------------------------------------------------------------------------------------------------------------------------------
* Automatic mention detection used. Avg F1 = (MUC + B cubed + CEAFE)/3.

Download

The coreference resolution system is integrated in the Stanford suite of NLP tools, StanfordCoreNLP. Please download the entire suite from this page.


Usage

Running coreference resolution on raw text

This software is now fully incorporated into StanfordCoreNLP, so all you have to do is add the dcoref annotator to the "annotators" property. For example, add "dcoref" to the end of the list of annotators:

annotators = tokenize, ssplit, pos, lemma, ner, parse, dcoref

The properties you can set for the dcoref system itself are the following:

dcoref.demonym                   // The path for a file that includes a list of demonyms 
dcoref.animate                   // The list of animate/inanimate mentions (Ji and Lin, 2009)
dcoref.inanimate 
dcoref.male                      // The list of male/neutral/female mentions (Bergsma and Lin, 2006) 
dcoref.neutral                   // Neutral means a mention that is usually referred to by 'it'
dcoref.female 
dcoref.plural                    // The list of plural/singular mentions (Bergsma and Lin, 2006)
dcoref.singular

// The above 8 options do not have to be set; the default models in the StanfordCoreNLP package will be used if unspecified.

dcoref.score = false             // Scoring the output of the system
dcoref.postprocessing = false    // Do post processing
dcoref.maxdist = -1              // Maximum sentence distance between two mentions for resolution (-1: no constraint on the distance)
dcoref.use.big.gender.number = false // Load a big list of gender and number information
dcoref.replicate.conll = false   // Turn this on to replicate the CoNLL shared task results

// If the above 5 options are omitted, the default values (as shown above) are used.

sievePasses                      // Sieve passes - each class is defined in dcoref/sievepasses/
                                 // If omitted, the default sieves will be used (recommended).
See StanfordCoreNLP for more details.
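
If you are calling the pipeline from Java rather than only through a properties file, a minimal sketch is below (the class name DcorefExample and the example sentence are purely illustrative; the chains are read back through CorefCoreAnnotations.CorefChainAnnotation as a map from cluster id to CorefChain):

import java.util.Map;
import java.util.Properties;

import edu.stanford.nlp.dcoref.CorefChain;
import edu.stanford.nlp.dcoref.CorefCoreAnnotations.CorefChainAnnotation;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

public class DcorefExample {
  public static void main(String[] args) {
    // Build a pipeline whose last stage is the deterministic coreference annotator.
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
    // Any of the dcoref.* options listed above can be set the same way, e.g.:
    // props.setProperty("dcoref.maxdist", "5");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    // Annotate some raw text (the sentence here is just an example).
    Annotation document =
        new Annotation("John bought a new car. He drives it to work every day.");
    pipeline.annotate(document);

    // Coreference chains are keyed by an integer cluster id.
    Map<Integer, CorefChain> chains = document.get(CorefChainAnnotation.class);
    for (CorefChain chain : chains.values()) {
      System.out.println(chain);
    }
  }
}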


How to replicate the results in our CoNLL Shared Task 2011 paper

To replicate the results in the paper run:

java -cp <jars_in_corenlp> -Xmx8g edu.stanford.nlp.dcoref.SieveCoreferenceSystem -props <properties file>

A sample properties file (coref.properties) is included in the dcoref package. The properties file includes the following:

# annotators needed for coreference resolution
annotators = pos, lemma, ner, parse    

# Score the output of the system.
# Scores in the log file differ from the CoNLL scorer output because they are computed before post-processing.
dcoref.score = true

# Do post processing
dcoref.postprocessing = true           
# Maximum sentence distance between two mentions for resolution (-1: no constraint on the distance)
dcoref.maxdist = -1                    
# Load a big list of gender and number information
dcoref.use.big.gender.number = true    
# Older CoreNLP versions loaded huge text file; newer versions load serialized map
# dcoref.big.gender.number = edu/stanford/nlp/models/dcoref/gender.data.gz
dcoref.big.gender.number = edu/stanford/nlp/models/dcoref/gender.map.ser.gz

# Turn this on to replicate the CoNLL shared task results
dcoref.replicate.conll = true
# Path to the official CoNLL 2011 scorer script. If omitted, no scoring is done.
dcoref.conll.scorer = /PATH/FOR/SCORER

# Path for log file for coref system evaluation 
dcoref.logFile = /PATH/FOR/LOGS

# For scoring on other corpora, one of the following options can be set:
# dcoref.conll2011: path for the directory containing conllst files
# dcoref.ace2004: path for the directory containing ACE2004 files
# dcoref.mucfile: path for the MUC file
dcoref.conll2011 = /PATH/FOR/CORPUS
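
With the paths filled in, the replication run looks something like the following (the jar names are placeholders; use whichever CoreNLP and model jars are actually on your classpath):

java -cp stanford-corenlp.jar:stanford-corenlp-models.jar -Xmx8g \
  edu.stanford.nlp.dcoref.SieveCoreferenceSystem -props coref.properties
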
This system can process ACE2004, MUC6, and CoNLL Shared Task 2011 corpora in their original formats. Examples from the corpora are given here:

CoNLLst 2011:

nw/wsj/00/wsj_0020          0          0        The         DT (TOP_(S_(NP_*          -          -          -          -          *          *     (ARG0*          *          *          *        (11
nw/wsj/00/wsj_0020          0          1       U.S.        NNP         *)          -          -          -          -      (GPE)          *         *)          *          *          *        11)     
nw/wsj/00/wsj_0020          0          2          ,          ,          *          -          -          -          -          *          *          *          *          *          *          -
nw/wsj/00/wsj_0020          0          3   claiming        VBG   (S_(VP_*      claim         01          2          -          *       (V*) (ARGM-ADV*          *          *          *          -

MUC6:

...
<s> By/IN proposing/VBG <COREF ID="13" TYPE="IDENT" REF="6" MIN="date"> a/DT meeting/NN date/NN</COREF> ,/, <COREF ID="14" TYPE="IDENT" REF="0">
<ORGANIZATION> Eastern/NNP</ORGANIZATION></COREF> moved/VBD one/CD step/NN closer/JJR toward/IN reopening/VBG current/JJ high-cost/JJ contract/NN agreements/NNS with/IN <COREF ID="15" TYPE="IDENT" REF="8" MIN="unions"><COREF ID="16" TYPE="IDENT" REF="14"> its/PRP$</COREF> unions/NNS</COREF> ./. </s>
...
ACE2004:
...
<document DOCID="20001115_AFP_ARB.0212.eng">
<entity ID="20001115_AFP_ARB.0212.eng-E1" TYPE="ORG" SUBTYPE="Educational" CLASS="SPC">
  <entity_mention ID="1-47" TYPE="NAM" LDCTYPE="NAM">
    <extent>
      <charseq START="475" END="506">the Globalization Studies Center</charseq>
    </extent>
    <head>
      <charseq START="479" END="506">Globalization Studies Center</charseq>
    </head>
  </entity_mention>
...

If you have issues getting this to work, you may need to follow a few steps:

  • Use the latest version of the evaluation software
  • There are some document naming inconsistencies between the test SET and the test KEY. The following should help. In the /tc/ part of the data, run
    sed -i s/ch_0001/ch_0009/g res
    sed -i s/ch_0002/ch_0019/g res
    sed -i s/ch_0003/ch_0029/g res
    sed -i s/ch_0004/ch_0039/g res
    sed -i s/ch_0005/ch_0049/g res
    
    e.g. ch_0005 from the test set is named ch_0049 in the test key.
  • We used v4 of the coref scorer to get the numbers cited in the paper.

How to run Chinese Coreference

As of CoreNLP version 3.5.2, we have added support for Chinese coreference.

  • Running on raw text:
    String text = ...;  // raw Chinese text to annotate
    String[] args = new String[]{
      "-props", "edu/stanford/nlp/hcoref/properties/zh-dcoref-default.properties"
    };

    // Build the pipeline from the bundled Chinese properties and annotate the text.
    Annotation document = new Annotation(text);
    Properties props = StringUtils.argsToProperties(args);
    StanfordCoreNLP corenlp = new StanfordCoreNLP(props);
    corenlp.annotate(document);

    // Run the hybrid coreference annotator on the pipeline output.
    HybridCorefAnnotator hcoref = new HybridCorefAnnotator(props);
    hcoref.annotate(document);

    // The resulting chains are stored on the document as a map from cluster id to chain.
    Map<Integer, CorefChain> corefChain = document.get(CorefChainAnnotation.class);
    System.out.println(corefChain);
    
  • Running on conll data:
    
    // Note that you have to replace the following properties file with your own.
    // To do so, copy the following file, replace the # Evaluation section with
    // your own paths and refer to it in args.
    String[] args = new String[]{
      "-props", "edu/stanford/nlp/hcoref/properties/zh-dcoref-conll.properties"
    };
    edu.stanford.nlp.hcoref.CorefSystem.main(args);
    

Questions

Questions, feedback, and bug reports/fixes can be sent to our mailing lists.

Mailing Lists

We have 3 mailing lists for the Stanford Coreference Resolution System, all of which are shared with other JavaNLP tools (with the exception of the parser). Each address is at @lists.stanford.edu:

  1. java-nlp-user This is the best list to post to in order to ask questions, make announcements, or for discussion among JavaNLP users. You have to subscribe to be able to use it. Join the list via this webpage or by emailing java-nlp-user-join@lists.stanford.edu. (Leave the subject and message body empty.) You can also look at the list archives.
  2. java-nlp-announce This list will be used only to announce new versions of Stanford JavaNLP tools. So it will be very low volume (expect 1-3 messages a year). Join the list via this webpage or by emailing java-nlp-announce-join@lists.stanford.edu. (Leave the subject and message body empty.)
  3. java-nlp-support This list goes only to the software maintainers. It's a good address for licensing questions, etc. For general use and support questions, you're better off joining and using java-nlp-user. You cannot join java-nlp-support, but you can mail questions to java-nlp-support@lists.stanford.edu.

Release History

Version 3.6.0 - February 7, 2016

The scores of the dcoref code in v3.6.0 (CoNLL 2011 shared task winner descendant) on the CoNLL 2011 Shared Task dev data set, measured on 2016/02/07 using the v4 scorer (used for the 2011 evaluation).

-----------------------------------------------------------------------------------------------------------------------------------------
                            MUC               B cubed              CEAF (M)            CEAF (E)            BLANC        | 
                       P     R     F1      P     R     F1      P     R     F1      P     R     F1      P     R     F1   | Avg F1
-----------------------------------------------------------------------------------------------------------------------------------------
conllst2011 dev   |   62.1  59.3  60.7  | 74.2  67.7  70.8  | 59.4  59.4  59.4  | 46.1  48.9  47.5  | 79.6  72.4  75.4  |  59.56 
-----------------------------------------------------------------------------------------------------------------------------------------
* Automatic mention detection used. Avg F1 = (MUC + B cubed + CEAFE)/3.

The scores of the dcoref code in v3.6.0 (CoNLL 2011 shared task winner descendant) on the CoNLL 2011/2012 Shared Task dev data sets, measured on 2016/02/07 using the v8.01 scorer (current in 2016).

-----------------------------------------------------------------------------------------------------------------------------------------
                            MUC               B cubed              CEAF (M)            CEAF (E)            BLANC        | 
                       P     R     F1      P     R     F1      P     R     F1      P     R     F1      P     R     F1   | Avg F1
-----------------------------------------------------------------------------------------------------------------------------------------
conllst2011 dev   |   62.1  59.3  60.7  | 56.2  48.6  52.1  | 58.0  57.5  57.8  | 48.9  53.5  51.1  | 54.1  47.2  50.1  |  54.62
conllst2012 dev   |   65.9  64.1  65.0  | 58.7  50.9  54.5  | 59.2  59.6  59.4  | 48.6  54.3  51.3  | 59.5  53.7  56.1  |  56.92
-----------------------------------------------------------------------------------------------------------------------------------------
* Automatic mention detection used. Avg F1 = (MUC + B cubed + CEAFE)/3.

July 9, 2013

Singleton mention detection (Recasens et al. 2013) is integrated. Scores may differ slightly from earlier releases because of changes in the parser and NER models.

-----------------------------------------------------------------------------------------------------------------------------------------
                            MUC               B cubed              CEAF (M)            CEAF (E)            BLANC        | 
                       P     R     F1      P     R     F1      P     R     F1      P     R     F1      P     R     F1   | Avg F1
-----------------------------------------------------------------------------------------------------------------------------------------
conllst2011 dev   |   62.4  59.3  60.8  | 74.2  67.6  70.8  | 59.3  59.3  59.3  | 45.5  48.6  47.0  | 79.1  72.5  75.3  |  59.5  
-----------------------------------------------------------------------------------------------------------------------------------------
* Automatic mention detection used. Avg F1 = (MUC + B cubed + CEAFE)/3.

June 6, 2011

This release contains the code used for the CoNLL Shared Task 2011. Scores may differ slightly because of changes in the parser and NER models.

-----------------------------------------------------------------------------------------------------------------------------------------
                   conllst         MUC               B cubed              CEAF (M)            CEAF (E)            BLANC        | 
                    track     P     R     F1      P     R     F1      P     R     F1      P     R     F1      P     R     F1   | Avg F1
-----------------------------------------------------------------------------------------------------------------------------------------
conllst2011 dev   | close |  59.1  57.5  58.3  | 69.2  71.0  70.1  | 58.6  58.6  58.6  | 46.5  48.1  47.3  | 72.2  78.1  74.8  |  58.6  
conllst2011 dev   | open  |  60.1  59.5  59.8  | 69.5  71.9  70.7  | 59.0  59.0  59.0  | 46.5  47.1  46.8  | 73.8  78.6  76.0  |  59.1
conllst2011 test  | close |  57.5  61.8  59.6  | 68.2  68.4  68.3  | 56.4  56.4  56.4  | 47.8  43.4  45.5  | 76.2  70.6  73.0  |  57.8 
conllst2011 test  | open  |  59.3  62.8  61.0  | 69.0  68.9  68.9  | 56.7  56.7  56.7  | 46.8  43.3  45.0  | 76.6  71.9  74.0  |  58.3
-----------------------------------------------------------------------------------------------------------------------------------------
* Automatic mention detection used. Avg F1 = (MUC + B cubed + CEAFE)/3.

----------------------------------------------------------------------------
                      MUC               B cubed             Pairwise
                 P     R     F1      P     R     F1      P     R     F1
----------------------------------------------------------------------------
ACE2004 dev   | 86.0  75.5  80.4  | 89.3  76.5  82.4  | 81.7  55.2  65.9 
ACE2004 test  | 82.7  70.2  75.9  | 88.7  74.5  81.0  | 77.2  44.6  56.6 
ACE2004 nwire | 84.6  75.1  79.6  | 87.3  74.1  80.2  | 79.4  50.1  61.4
MUC6 test     | 90.6  69.1  78.4  | 90.6  63.1  74.4  | 89.7  57.0  69.7
----------------------------------------------------------------------------
* Gold mentions are used. 

August 26, 2010

This release is generally similar to the code used for EMNLP 2010, with one additional sieve: relaxed exact string match.

----------------------------------------------------------------------------
                      MUC               B cubed             Pairwise
                 P     R     F1      P     R     F1      P     R     F1
----------------------------------------------------------------------------
ACE2004 dev   | 84.1  73.9  78.7  | 88.3  74.2  80.7  | 80.0  51.0  62.3
ACE2004 test  | 80.5  72.4  76.2  | 85.4  75.9  80.4  | 68.7  47.9  56.4 
ACE2004 nwire | 83.8  72.8  77.9  | 87.5  72.1  79.0  | 79.3  47.6  59.5
MUC6 test     | 90.3  68.9  78.2  | 90.5  62.3  73.8  | 89.4  55.5  68.5
----------------------------------------------------------------------------