
Package edu.stanford.nlp.dcoref

Multi-pass Sieve Coreference Resolution System

This package implements the multi-pass sieve coreference resolution system described by Raghunathan et al. (EMNLP 2010).

Note that the current code in this package does not implement mention detection. All results reported here use gold mentions (just as in the paper). However, the DeterministicCorefAnnotator in StanfordCoreNLP implements a simple mention detection component, so this code can be used to perform coreference resolution on raw text.

Note that this code already differs from the system reported in the paper: two additional sieves were added after the EMNLP paper, and the current code scores slightly higher than the paper reports.

Authors

Current Results

 ----------------------------------------------------------------------------
               |       MUC         |      B cubed      |     Pairwise
               | P     R     F1    | P     R     F1    | P     R     F1
 ----------------------------------------------------------------------------
 ACE2004 dev   | 84.5  75.7  79.8  | 88.0  75.8  81.4  | 78.6  53.8  63.9
 ACE2004 test  | 80.4  72.9  76.4  | 85.1  76.4  80.5  | 68.7  48.9  57.1
 ACE2004 nwire | 83.8  74.3  78.8  | 86.9  73.7  79.7  | 78.1  51.7  62.2
 MUC6 test     | 90.5  69.0  78.3  | 90.5  62.5  73.9  | 89.3  56.1  68.9
 ----------------------------------------------------------------------------
 

Changes

August 26, 2010

This release is generally similar to the code used for EMNLP 2010, with one additional sieve: relaxed exact string match.
Scores may also differ because of changes in the parser or the NER component.

Results:

 ----------------------------------------------------------------------------
               |       MUC         |      B cubed      |     Pairwise
               | P     R     F1    | P     R     F1    | P     R     F1
 ----------------------------------------------------------------------------
 ACE2004 dev   | 84.1  73.9  78.7  | 88.3  74.2  80.7  | 80.0  51.0  62.3
 ACE2004 test  | 80.5  72.3  76.2  | 85.4  75.9  80.4  | 68.7  47.8  56.4
 ACE2004 nwire | 83.8  72.8  77.9  | 87.5  72.1  79.0  | 79.3  47.6  59.5
 MUC6 test     | 90.3  68.9  78.2  | 90.5  62.3  73.8  | 89.4  55.5  68.5
 ----------------------------------------------------------------------------
 

Usage

Running coreference resolution on raw text

This package is now fully incorporated into StanfordCoreNLP. To use it, add the dcoref annotator to the "annotators" property of StanfordCoreNLP. For example:
 annotators = tokenize, ssplit, pos, lemma, ner, parse, dcoref
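
The same setting can also be built in code. The following is a minimal sketch using only java.util.Properties; the class name DcorefPipelineProps and its build() method are hypothetical and not part of this package. With CoreNLP on the classpath, the resulting Properties object would typically be passed to the StanfordCoreNLP constructor.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

/** Hypothetical helper for illustration; not part of the dcoref package. */
final class DcorefPipelineProps {
    /** Annotators from the example above, in the order given there. */
    static final List<String> ANNOTATORS = Arrays.asList(
        "tokenize", "ssplit", "pos", "lemma", "ner", "parse", "dcoref");

    /** Builds a Properties object whose "annotators" value matches the example. */
    static Properties build() {
        Properties props = new Properties();
        props.setProperty("annotators", String.join(", ", ANNOTATORS));
        return props;
    }

    public static void main(String[] args) {
        Properties props = build();
        System.out.println("annotators = " + props.getProperty("annotators"));
        // Assumption: with CoreNLP available, these properties are handed to
        // the pipeline, e.g. new StanfordCoreNLP(props).
    }
}
```

The example above lists dcoref last, after the annotators that precede it, and the sketch preserves that order.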
 
The required properties for dcoref are the following:
 dcoref.demonym
 dcoref.animate
 dcoref.inanimate
 dcoref.male
 dcoref.neutral
 dcoref.female
 dcoref.plural
 dcoref.singular
 sievePasses         // If omitted, default value will be used.
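
Before starting the pipeline, it can help to check that all of the required properties listed above are set. The sketch below is one way to do that with the standard library only; the class DcorefPropsCheck and the placeholder path in main are hypothetical. sievePasses is deliberately not checked, since a default is used when it is omitted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

/** Hypothetical helper for illustration; not part of the dcoref package. */
final class DcorefPropsCheck {
    /** The required dcoref property names listed above (sievePasses has a default). */
    static final List<String> REQUIRED = Arrays.asList(
        "dcoref.demonym", "dcoref.animate", "dcoref.inanimate", "dcoref.male",
        "dcoref.neutral", "dcoref.female", "dcoref.plural", "dcoref.singular");

    /** Returns the required keys that are absent or blank in props. */
    static List<String> missing(Properties props) {
        List<String> result = new ArrayList<>();
        for (String key : REQUIRED) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                result.add(key);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder path, for illustration only.
        props.setProperty("dcoref.demonym", "/path/to/demonyms");
        System.out.println("Missing: " + missing(props));
    }
}
```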
 

See StanfordCoreNLP for more details.

How to replicate the results in our EMNLP2010 paper

To replicate the results in the paper, run:
 java -Xmx8g edu.stanford.nlp.dcoref.SieveCoreferenceSystem -props <properties file>
 
A sample properties file (coref.properties) is included in the dcoref package. The properties file includes the following:
 annotators = pos, lemma, ner    // annotators needed for coreference resolution
 pos.model                       // For POS model
 ner.model.3class
 ner.model.7class                // For NER
 ner.model.MISCclass
 parser.model                    // For parser
 parser.maxlen = 100
 dcoref.demonym                  // The path for a file that includes a list of demonyms
 dcoref.animate                  // The list of animate/inanimate mentions (Ji and Lin, 2009)
 dcoref.inanimate
 dcoref.male                     // The list of male/neutral/female mentions (Bergsma and Lin, 2006)
 dcoref.neutral                  // Neutral means a mention that is usually referred to as 'it'
 dcoref.female
 dcoref.plural                   // The list of plural/singular mentions (Bergsma and Lin, 2006)
 dcoref.singular
 sievePasses                     // Sieve passes - each class is defined in dcoref/sievepasses/
 logFile                         // Path for log file for coref system evaluation
 ace2004 or mucfile              // Use either ace2004 or mucfile (not both)
                                 // ace2004: path for the directory containing ACE2004 files
                                 // mucfile: path for the MUC file
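
The "either ace2004 or mucfile (not both)" rule can be checked up front. The following is a minimal sketch of such a check, not the logic used by SieveCoreferenceSystem itself; the class CorpusChoice and its Corpus enum are hypothetical names.

```java
import java.util.Properties;

/** Hypothetical helper for illustration; not part of the dcoref package. */
final class CorpusChoice {
    enum Corpus { ACE2004, MUC }

    /** Returns the configured corpus, requiring exactly one of ace2004 or mucfile. */
    static Corpus select(Properties props) {
        boolean hasAce = isSet(props.getProperty("ace2004"));
        boolean hasMuc = isSet(props.getProperty("mucfile"));
        if (hasAce == hasMuc) {
            // Either both are set or neither is set.
            throw new IllegalArgumentException("Set exactly one of ace2004 or mucfile");
        }
        return hasAce ? Corpus.ACE2004 : Corpus.MUC;
    }

    private static boolean isSet(String value) {
        return value != null && !value.trim().isEmpty();
    }
}
```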
 
This system can process both the ACE2004 and MUC6 corpora in their original formats. An example of each format is given below.

MUC6:
 ...
 <s> By/IN proposing/VBG <COREF ID="13" TYPE="IDENT" REF="6" MIN="date"> a/DT meeting/NN date/NN</COREF> ,/, <COREF ID="14" TYPE="IDENT" REF="0">
 <ORGANIZATION> Eastern/NNP</ORGANIZATION></COREF> moved/VBD one/CD step/NN closer/JJR toward/IN reopening/VBG current/JJ high-cost/JJ contract/NN agreements/NNS with/IN <COREF ID="15" TYPE="IDENT" REF="8" MIN="unions"><COREF ID="16" TYPE="IDENT" REF="14"> its/PRP$</COREF> unions/NNS</COREF> ./. </s>
 ...
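
To make the annotation in this sample concrete, the sketch below pulls the attributes (ID, REF, MIN and so on) out of each opening COREF tag and strips the part-of-speech suffix from a word/TAG token. It is a regex-based illustration only, not the reader this package uses; the class MucCorefTags and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of the dcoref package. */
final class MucCorefTags {
    private static final Pattern OPEN_TAG = Pattern.compile("<COREF\\s+([^>]*)>");
    private static final Pattern ATTRIBUTE = Pattern.compile("(\\w+)=\"([^\"]*)\"");

    /** Returns one attribute map per opening COREF tag, in document order. */
    static List<Map<String, String>> mentions(String text) {
        List<Map<String, String>> result = new ArrayList<>();
        Matcher tag = OPEN_TAG.matcher(text);
        while (tag.find()) {
            Map<String, String> attrs = new LinkedHashMap<>();
            Matcher attr = ATTRIBUTE.matcher(tag.group(1));
            while (attr.find()) {
                attrs.put(attr.group(1), attr.group(2));
            }
            result.add(attrs);
        }
        return result;
    }

    /** Drops the tag from a "word/TAG" token by cutting at the last slash. */
    static String word(String token) {
        int slash = token.lastIndexOf('/');
        return slash < 0 ? token : token.substring(0, slash);
    }

    public static void main(String[] args) {
        // Nested mentions taken from the sample above.
        String sample = "<COREF ID=\"15\" TYPE=\"IDENT\" REF=\"8\" MIN=\"unions\">"
            + "<COREF ID=\"16\" TYPE=\"IDENT\" REF=\"14\"> its/PRP$</COREF> unions/NNS</COREF>";
        for (Map<String, String> m : mentions(sample)) {
            System.out.println(m.get("ID") + " -> " + m.get("REF"));
        }
    }
}
```

Because only opening tags are matched, nested mentions such as "its" inside "its unions" are each reported once, with the outer mention first.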
 
ACE2004:
 ...
 <document DOCID="20001115_AFP_ARB.0212.eng">
 <entity ID="20001115_AFP_ARB.0212.eng-E1" TYPE="ORG" SUBTYPE="Educational" CLASS="SPC">
 <entity_mention ID="1-47" TYPE="NAM" LDCTYPE="NAM">
 <extent>
 <charseq START="475" END="506">the Globalization Studies Center</charseq>
 </extent>
 <head>
 <charseq START="479" END="506">Globalization Studies Center</charseq>
 </head>
 </entity_mention>
 ...
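
In this sample, the charseq offsets look like inclusive character positions: END - START + 1 equals the length of the enclosed text for both the extent (506 - 475 + 1 = 32) and the head (506 - 479 + 1 = 28). The sketch below checks that relationship for a single charseq line. It is a regex-based illustration under that assumption, not the ACE reader used by this package; the class AceCharseq is hypothetical, and the pattern assumes the text contains no nested markup.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of the dcoref package. */
final class AceCharseq {
    private static final Pattern CHARSEQ = Pattern.compile(
        "<charseq START=\"(\\d+)\" END=\"(\\d+)\">([^<]*)</charseq>");

    final int start;
    final int end;
    final String text;

    private AceCharseq(int start, int end, String text) {
        this.start = start;
        this.end = end;
        this.text = text;
    }

    /** Parses one charseq element, or returns null if the line does not contain one. */
    static AceCharseq parse(String line) {
        Matcher m = CHARSEQ.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new AceCharseq(
            Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)), m.group(3));
    }

    /** True when END - START + 1 equals the text length, i.e. END is inclusive. */
    boolean endIsInclusive() {
        return end - start + 1 == text.length();
    }

    public static void main(String[] args) {
        AceCharseq extent = parse(
            "<charseq START=\"475\" END=\"506\">the Globalization Studies Center</charseq>");
        System.out.println(extent.text + " inclusive=" + extent.endIsInclusive());
    }
}
```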
 

Stanford NLP Group