Interface | Description |
---|---|
CorefMentionFinder | Interface for finding coref mentions in a document. |
MentionMatcher | Determines whether two mentions are compatible. |

Class | Description |
---|---|
ACEMentionExtractor | Extracts <COREF> mentions from a file annotated in ACE format (ACE2004, ACE2005). |
CoNLL2011DocumentReader | Reads the _conll file format from CoNLL2011. |
CoNLL2011DocumentReader.CorefMentionAnnotation | |
CoNLL2011DocumentReader.CorpusStats | |
CoNLL2011DocumentReader.Document | |
CoNLL2011DocumentReader.NamedEntityAnnotation | |
CoNLL2011DocumentReader.Options | Flags. |
CoNLLMentionExtractor | Extracts coref mentions from CoNLL2011 data files. |
Constants | |
CorefChain | Output of the (deterministic) coref system. |
CorefChain.CorefMention | Mention for coref output. |
CorefChain.CorefMentionComparator | |
CorefChain.MentionComparator | |
CorefCluster | One cluster for the SieveCoreferenceSystem. |
CorefCoreAnnotations | Similar to CoreAnnotations, but this class contains annotations made specifically for storing Coref data. |
CorefCoreAnnotations.CorefAnnotation | The standard key for the coref label. |
CorefCoreAnnotations.CorefChainAnnotation | Map from CorefChainID to CorefChain. |
CorefCoreAnnotations.CorefClusterAnnotation | Deprecated. This was an original dcoref annotation. |
CorefCoreAnnotations.CorefClusterIdAnnotation | An integer representing a document-level unique cluster of coreferent entities. |
CorefCoreAnnotations.CorefDestAnnotation | Destination of the coreference link for this word (if any). |
CorefCoreAnnotations.CorefGraphAnnotation | Deprecated |
CoreferenceSystem | Abstract class for a coreference resolution system. |
CorefScorer | Wrapper for a coreference resolution score: MUC, B cubed, Pairwise. |
Dictionaries | Provides accessors for various grammatical, semantic, and world knowledge lexicons and word lists primarily used by the Sieve coreference system, but sometimes also drawn on from other code. |
Document | |
Mention | One mention for the SieveCoreferenceSystem. |
MentionExtractor | Generic mention extractor from a corpus. |
MUCMentionExtractor | Extracts <COREF> mentions from a file annotated in MUC format. |
RuleBasedCorefMentionFinder | |
Rules | Rules for the coref system (mention detection, entity coref, event coref). Method names start with detection for mention detection, with entity for entity coref, and with event for event coref. |
ScorerBCubed | B^3 scorer. |
ScorerMUC | |
ScorerPairwise | |
Semantics | Semantic knowledge: currently WordNet is available. |
SieveCoreferenceSystem | Multi-pass Sieve coreference resolution system (see EMNLP 2010 paper). |
SieveOptions | |
SingletonPredictor | Trains the singleton predictor using a logistic regression classifier as described in Recasens, de Marneffe and Potts, NAACL 2013. Label 0 = Singleton mention. |
SpeakerInfo | Information about a speaker. |

Enum | Description |
---|---|
Dictionaries.Animacy | |
Dictionaries.Gender | |
Dictionaries.MentionType | |
Dictionaries.Number | |
Dictionaries.Person | |
Document.DocType | |
ScorerBCubed.BCubedType | |

This system implements the multi-pass sieve coreference resolution system of Raghunathan et al. at EMNLP 2010.
(This is an older coreference system; you might also want to look at the systems in the coref
package.)
Note that all the results reported here use gold mentions (just as in the paper). However, the DeterministicCorefAnnotator in StanfordCoreNLP implements a simple mention detection component, so this code can be used to perform coreference resolution on raw text.
Note that this code is already different from the system reported in the paper. After the EMNLP paper, two additional sieves were included. The current code gives slightly better scores than those in the paper.
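
The multi-pass idea can be illustrated with a toy sketch: every mention starts in its own cluster, and each pass, applied in order, merges clusters whose mentions satisfy that pass's test, so later passes build on the output of earlier ones. The class `ToySieve` and both passes below (`EXACT`, `SAME_LAST_TOKEN`) are hypothetical stand-ins chosen for illustration; they are not the sieves defined in dcoref/sievepasses/ and make no claim about how those are implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

// Toy illustration of multi-pass sieve clustering (not the dcoref implementation).
class ToySieve {

    // Hypothetical pass 1: case-insensitive exact string match.
    static final BiPredicate<String, String> EXACT =
        (a, b) -> a.equalsIgnoreCase(b);

    // Hypothetical pass 2: the last whitespace-separated tokens match.
    static final BiPredicate<String, String> SAME_LAST_TOKEN =
        (a, b) -> lastToken(a).equalsIgnoreCase(lastToken(b));

    static String lastToken(String s) {
        String[] tokens = s.trim().split("\\s+");
        return tokens[tokens.length - 1];
    }

    // True if any mention of cluster a matches any mention of cluster b.
    static boolean anyMatch(List<String> a, List<String> b,
                            BiPredicate<String, String> pass) {
        for (String x : a) {
            for (String y : b) {
                if (pass.test(x, y)) return true;
            }
        }
        return false;
    }

    // Applies the passes in order; each pass merges clusters left by the previous one.
    static List<List<String>> resolve(List<String> mentions,
                                      List<BiPredicate<String, String>> passes) {
        List<List<String>> clusters = new ArrayList<>();
        for (String m : mentions) {
            clusters.add(new ArrayList<>(Arrays.asList(m))); // singleton clusters
        }
        for (BiPredicate<String, String> pass : passes) {
            for (int i = 0; i < clusters.size(); i++) {
                for (int j = i + 1; j < clusters.size(); j++) {
                    if (anyMatch(clusters.get(i), clusters.get(j), pass)) {
                        clusters.get(i).addAll(clusters.remove(j));
                        j = i; // restart the scan for the enlarged cluster
                    }
                }
            }
        }
        return clusters;
    }

    public static void main(String[] args) {
        List<String> mentions =
            Arrays.asList("Barack Obama", "barack obama", "Obama", "the senator");
        System.out.println(resolve(mentions, Arrays.asList(EXACT, SAME_LAST_TOKEN)));
        // prints [[Barack Obama, barack obama, Obama], [the senator]]
    }
}
```

Ordering the stricter test first means the looser test only ever extends clusters that the stricter one has already formed.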

Corpus | MUC P | MUC R | MUC F1 | B cubed P | B cubed R | B cubed F1 | Pairwise P | Pairwise R | Pairwise F1 |
---|---|---|---|---|---|---|---|---|---|
ACE2004 dev | 84.5 | 75.7 | 79.8 | 88.0 | 75.8 | 81.4 | 78.6 | 53.8 | 63.9 |
ACE2004 test | 80.4 | 72.9 | 76.4 | 85.1 | 76.4 | 80.5 | 68.7 | 48.9 | 57.1 |
ACE2004 nwire | 83.8 | 74.3 | 78.8 | 86.9 | 73.7 | 79.7 | 78.1 | 51.7 | 62.2 |
MUC6 test | 90.5 | 69.0 | 78.3 | 90.5 | 62.5 | 73.9 | 89.3 | 56.1 | 68.9 |

This release is generally similar to the code used for EMNLP 2010,
with one additional sieve: relaxed exact string match.
The scores may also differ because of changes in the parser or NER.
Results:

Corpus | MUC P | MUC R | MUC F1 | B cubed P | B cubed R | B cubed F1 | Pairwise P | Pairwise R | Pairwise F1 |
---|---|---|---|---|---|---|---|---|---|
ACE2004 dev | 84.1 | 73.9 | 78.7 | 88.3 | 74.2 | 80.7 | 80.0 | 51.0 | 62.3 |
ACE2004 test | 80.5 | 72.3 | 76.2 | 85.4 | 75.9 | 80.4 | 68.7 | 47.8 | 56.4 |
ACE2004 nwire | 83.8 | 72.8 | 77.9 | 87.5 | 72.1 | 79.0 | 79.3 | 47.6 | 59.5 |
MUC6 test | 90.3 | 68.9 | 78.2 | 90.5 | 62.3 | 73.8 | 89.4 | 55.5 | 68.5 |

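
For readers unfamiliar with the metric columns, here is a minimal sketch of how pairwise and B cubed precision, recall, and F1 are commonly computed when gold and predicted clusterings cover the same mentions (as with gold mentions). The class `ToyCorefScores` and its array encoding are assumptions made for this example; it is not the code in ScorerPairwise or ScorerBCubed, it shows only one common B cubed variant, and MUC scoring is omitted.

```java
// Toy scorers: gold[i] and pred[i] are the cluster ids assigned to mention i.
class ToyCorefScores {

    // Pairwise: a "link" is any pair of mentions placed in the same cluster.
    static double[] pairwise(int[] gold, int[] pred) {
        int both = 0, predPairs = 0, goldPairs = 0;
        for (int i = 0; i < gold.length; i++) {
            for (int j = i + 1; j < gold.length; j++) {
                boolean g = gold[i] == gold[j];
                boolean p = pred[i] == pred[j];
                if (g) goldPairs++;
                if (p) predPairs++;
                if (g && p) both++;
            }
        }
        return prf(ratio(both, predPairs), ratio(both, goldPairs));
    }

    // B cubed: per-mention overlap of predicted and gold clusters, averaged over mentions.
    static double[] bCubed(int[] gold, int[] pred) {
        int n = gold.length;
        if (n == 0) return new double[] {0.0, 0.0, 0.0};
        double pSum = 0.0, rSum = 0.0;
        for (int i = 0; i < n; i++) {
            int overlap = 0, predSize = 0, goldSize = 0;
            for (int j = 0; j < n; j++) { // j == i is included, so sizes are >= 1
                boolean g = gold[i] == gold[j];
                boolean p = pred[i] == pred[j];
                if (g) goldSize++;
                if (p) predSize++;
                if (g && p) overlap++;
            }
            pSum += (double) overlap / predSize;
            rSum += (double) overlap / goldSize;
        }
        return prf(pSum / n, rSum / n);
    }

    static double ratio(int num, int den) {
        return den == 0 ? 0.0 : (double) num / den;
    }

    // Returns {precision, recall, F1}, where F1 is the harmonic mean of P and R.
    static double[] prf(double p, double r) {
        double f1 = (p + r == 0.0) ? 0.0 : 2 * p * r / (p + r);
        return new double[] {p, r, f1};
    }

    public static void main(String[] args) {
        int[] gold = {0, 0, 0, 1, 1};
        int[] pred = {0, 0, 1, 1, 1};
        double[] pw = pairwise(gold, pred);
        double[] b3 = bCubed(gold, pred);
        System.out.printf("Pairwise P=%.3f R=%.3f F1=%.3f%n", pw[0], pw[1], pw[2]);
        System.out.printf("B cubed  P=%.3f R=%.3f F1=%.3f%n", b3[0], b3[1], b3[2]);
    }
}
```

Note that F1 values recomputed from the rounded P and R entries in a results table can differ in the last digit from the reported F1.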
To run dcoref as part of a StanfordCoreNLP pipeline, set the annotators property:

    annotators = tokenize, ssplit, pos, lemma, ner, parse, dcoref

The required properties for dcoref are the following:

    dcoref.demonym
    dcoref.animate
    dcoref.inanimate
    dcoref.male
    dcoref.neutral
    dcoref.female
    dcoref.plural
    dcoref.singular
    sievePasses        // If omitted, default value will be used.

See StanfordCoreNLP for more details.
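
As a rough illustration, the properties listed above could be assembled with `java.util.Properties`. The class `DcorefProps`, the directory argument, and the `<key>.txt` file names are hypothetical placeholders, not the names of the resources shipped with the system; `sievePasses` is left out so that the default is used.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.util.Properties;

// Sketch only: builds the dcoref-related properties with placeholder paths.
class DcorefProps {

    static Properties build(String dictDir) {
        Properties props = new Properties();
        props.setProperty("annotators",
            "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
        String[] keys = {"demonym", "animate", "inanimate", "male",
                         "neutral", "female", "plural", "singular"};
        for (String key : keys) {
            // Hypothetical file naming; point each key at your actual word list.
            props.setProperty("dcoref." + key, dictDir + "/" + key + ".txt");
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        Properties props = build("/path/to/dcoref/dictionaries"); // placeholder path
        StringWriter out = new StringWriter();
        props.store(out, "dcoref settings (paths are placeholders)");
        System.out.print(out);
        // With CoreNLP on the classpath, these properties would be passed to
        // the StanfordCoreNLP constructor: new StanfordCoreNLP(props).
    }
}
```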

    java -Xmx8g edu.stanford.nlp.dcoref.SieveCoreferenceSystem -props <properties file>

A sample properties file (coref.properties) is included in the dcoref package. The properties file includes the following:

    annotators = pos, lemma, ner    // annotators needed for coreference resolution
    pos.model                       // For POS model
    ner.model.3class
    ner.model.7class                // For NER
    ner.model.MISCclass
    parser.model                    // For parser
    parser.maxlen = 100
    dcoref.demonym                  // The path for a file that includes a list of demonyms
    dcoref.animate                  // The list of animate/inanimate mentions (Ji and Lin, 2009)
    dcoref.inanimate
    dcoref.male                     // The list of male/neutral/female mentions (Bergsma and Lin, 2006)
    dcoref.neutral                  // Neutral means a mention that is usually referred by 'it'
    dcoref.female
    dcoref.plural                   // The list of plural/singular mentions (Bergsma and Lin, 2006)
    dcoref.singular
    sievePasses                     // Sieve passes - each class is defined in dcoref/sievepasses/
    logFile                         // Path for log file for coref system evaluation
    ace2004 or mucfile              // Use either ace2004 or mucfile (not both)
                                    // ace2004: path for the directory containing ACE2004 files
                                    // mucfile: path for the MUC file

This system can process both ACE2004 and MUC6 corpora in their original formats. Examples of each corpus format are given below.

MUC6:

    ...
    <s> By/IN proposing/VBG <COREF ID="13" TYPE="IDENT" REF="6" MIN="date"> a/DT meeting/NN date/NN</COREF> ,/,
    <COREF ID="14" TYPE="IDENT" REF="0"> <ORGANIZATION> Eastern/NNP</ORGANIZATION></COREF> moved/VBD one/CD step/NN closer/JJR
    toward/IN reopening/VBG current/JJ high-cost/JJ contract/NN agreements/NNS with/IN
    <COREF ID="15" TYPE="IDENT" REF="8" MIN="unions"><COREF ID="16" TYPE="IDENT" REF="14"> its/PRP$</COREF> unions/NNS</COREF> ./. </s>
    ...

ACE2004:

    ...
    <document DOCID="20001115_AFP_ARB.0212.eng">
      <entity ID="20001115_AFP_ARB.0212.eng-E1" TYPE="ORG" SUBTYPE="Educational" CLASS="SPC">
        <entity_mention ID="1-47" TYPE="NAM" LDCTYPE="NAM">
          <extent>
            <charseq START="475" END="506">the Globalization Studies Center</charseq>
          </extent>
          <head>
            <charseq START="479" END="506">Globalization Studies Center</charseq>
          </head>
        </entity_mention>
    ...
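
To make the ACE2004 layout above concrete, the following sketch reads the head span of each `entity_mention` with the JDK's DOM parser. It is not how ACEMentionExtractor works; the class `AceMentionSketch` is hypothetical, and the sample string adds the closing `</entity>` and `</document>` tags that the excerpt elides so that it parses as well-formed XML.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Sketch: list "mentionId <TAB> start-end <TAB> head text" for each entity_mention.
class AceMentionSketch {

    // The excerpt from above, with closing tags added for well-formedness.
    static final String SAMPLE =
        "<document DOCID=\"20001115_AFP_ARB.0212.eng\">"
        + "<entity ID=\"20001115_AFP_ARB.0212.eng-E1\" TYPE=\"ORG\""
        + " SUBTYPE=\"Educational\" CLASS=\"SPC\">"
        + "<entity_mention ID=\"1-47\" TYPE=\"NAM\" LDCTYPE=\"NAM\">"
        + "<extent><charseq START=\"475\" END=\"506\">"
        + "the Globalization Studies Center</charseq></extent>"
        + "<head><charseq START=\"479\" END=\"506\">"
        + "Globalization Studies Center</charseq></head>"
        + "</entity_mention>"
        + "</entity></document>";

    static List<String> heads(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<String> out = new ArrayList<>();
            NodeList mentions = doc.getElementsByTagName("entity_mention");
            for (int i = 0; i < mentions.getLength(); i++) {
                Element mention = (Element) mentions.item(i);
                // Assumes every mention has a <head> containing one <charseq>.
                Element head = (Element) mention.getElementsByTagName("head").item(0);
                Element seq = (Element) head.getElementsByTagName("charseq").item(0);
                out.add(mention.getAttribute("ID") + "\t"
                    + seq.getAttribute("START") + "-" + seq.getAttribute("END") + "\t"
                    + seq.getTextContent().trim());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse ACE-style XML", e);
        }
    }

    public static void main(String[] args) {
        for (String line : heads(SAMPLE)) {
            System.out.println(line);
        }
    }
}
```

In this sample the character offsets are inclusive: the head text is 28 characters long and spans offsets 479 through 506.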