By default, CoreNLP uses UTF-8. You can change the encoding used when reading files by setting the encoding property or by supplying the command-line flag -encoding FOO, where FOO is the name of the encoding you want.
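As a minimal sketch of the property route, the snippet below builds a java.util.Properties object with the encoding property set. The class EncodingConfig and its method withEncoding are hypothetical names used only for illustration and are not part of CoreNLP. The up-front check with Charset.isSupported is an added convenience, not something the text above requires.

```java
import java.nio.charset.Charset;
import java.util.Properties;

/** Hypothetical helper (not part of CoreNLP) that prepares pipeline properties. */
final class EncodingConfig {

    /**
     * Returns properties with the "encoding" property set to the given charset name.
     * Rejects names the running JVM does not support, so a typo fails early.
     */
    static Properties withEncoding(String charsetName) {
        if (!Charset.isSupported(charsetName)) {
            throw new IllegalArgumentException("Unsupported encoding: " + charsetName);
        }
        Properties props = new Properties();
        props.setProperty("encoding", charsetName);
        return props;
    }
}
```

The returned object would then typically be handed to the pipeline constructor, for example new StanfordCoreNLP(EncodingConfig.withEncoding("ISO-8859-1")), assuming CoreNLP is on the classpath.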
The coreference section of the XML output has the following form:
<coreference>
  <coreference>
    <mention representative="true">
      <sentence>1</sentence>
      <start>1</start>
      <end>3</end>
      <head>2</head>
    </mention>
    <mention>
      <sentence>2</sentence>
      <start>1</start>
      <end>2</end>
      <head>1</head>
    </mention>
  </coreference>
</coreference>
The entire coref section is delimited by the outer <coreference> element. Each individual chain is then delimited by another, nested <coreference> element. (This is perhaps an unfortunate naming, but at this point there are no plans to change it.) Inside the <coreference> section for each chain is a block describing each of the mentions. One mention will be labeled as the representative mention. Each mention has the fields sentence, indexed from 1; the range of words, from start (inclusive) to end, also indexed from 1; and head, the index in the sentence of the head word of this mention.
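The structure described above can be read with the XML parser that ships with the JDK. The following is a sketch only: CorefXmlReader, Mention, and readChains are hypothetical names, not CoreNLP classes. Because the outer wrapper and each chain share the tag name coreference, the code treats only those coreference elements that directly contain mention children as chains. It stores the indices exactly as they appear in the XML and does not interpret the word range.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical helper (not part of CoreNLP) that reads coref chains from the XML output. */
final class CorefXmlReader {

    /** One <mention> entry; indices are kept as written in the XML (1-based). */
    static final class Mention {
        final boolean representative;
        final int sentence, start, end, head;

        Mention(boolean representative, int sentence, int start, int end, int head) {
            this.representative = representative;
            this.sentence = sentence;
            this.start = start;
            this.end = end;
            this.head = head;
        }
    }

    /** Returns one list of mentions per chain, i.e. per inner <coreference> element. */
    static List<List<Mention>> readChains(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        List<List<Mention>> chains = new ArrayList<>();
        NodeList all = doc.getElementsByTagName("coreference");
        for (int i = 0; i < all.getLength(); i++) {
            // Collect only direct <mention> children; the outer wrapper has none
            // and is therefore skipped.
            List<Mention> mentions = new ArrayList<>();
            for (Node child = all.item(i).getFirstChild(); child != null;
                    child = child.getNextSibling()) {
                if (child instanceof Element && "mention".equals(child.getNodeName())) {
                    mentions.add(toMention((Element) child));
                }
            }
            if (!mentions.isEmpty()) {
                chains.add(mentions);
            }
        }
        return chains;
    }

    private static Mention toMention(Element m) {
        // getAttribute returns "" when the attribute is absent, so only the
        // representative mention yields true here.
        boolean representative = "true".equals(m.getAttribute("representative"));
        return new Mention(representative, intField(m, "sentence"),
                intField(m, "start"), intField(m, "end"), intField(m, "head"));
    }

    private static int intField(Element m, String tag) {
        return Integer.parseInt(m.getElementsByTagName(tag).item(0).getTextContent().trim());
    }
}
```

Calling CorefXmlReader.readChains on the sample shown earlier should give a single chain with two mentions, the first of which is flagged as representative.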
Either add more memory, use fewer annotators, or give CoreNLP smaller documents. Nearly all of our annotators load large model files, which use a lot of memory, and running the full CoreNLP pipeline requires the sum of all these memory requirements. Additionally, the coreference module operates over an entire document, so as the document size increases, its processing time and space requirements increase without bound.
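One way to give CoreNLP smaller documents is to split a long text at paragraph boundaries before annotating it. The sketch below is an illustration under stated assumptions, not a CoreNLP feature: DocumentChunker and split are hypothetical names, paragraphs are assumed to be separated by blank lines, and the size budget is measured in characters.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of CoreNLP) that splits text into smaller documents. */
final class DocumentChunker {

    /**
     * Groups blank-line-separated paragraphs into chunks of at most maxChars characters.
     * A single paragraph longer than maxChars is kept whole as its own oversized chunk.
     */
    static List<String> split(String text, int maxChars) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String paragraph : text.split("\\n\\s*\\n")) {
            String p = paragraph.trim();
            if (p.isEmpty()) {
                continue;
            }
            // Start a new chunk when adding this paragraph (plus the two-character
            // separator) would exceed the budget.
            if (current.length() > 0 && current.length() + 2 + p.length() > maxChars) {
                chunks.add(current.toString());
                current.setLength(0);
            }
            if (current.length() > 0) {
                current.append("\n\n");
            }
            current.append(p);
        }
        if (current.length() > 0) {
            chunks.add(current.toString());
        }
        return chunks;
    }
}
```

Each chunk can then be annotated as a separate document. The trade-off is that coreference chains will not cross chunk boundaries, since the coreference module only sees one document at a time. If the goal is instead to add memory, the JVM's standard -Xmx option (for example -Xmx4g) raises the maximum heap size.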
Site design by Bill MacCartney