By default, CoreNLP reads files as UTF-8. You can change the encoding used when reading files by setting the encoding property or by supplying the command-line flag -encoding FOO, where FOO is the name of the encoding.
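As a rough illustration, the encoding property could be set programmatically along the following lines. This is only a sketch under stated assumptions: the class name EncodingProps and the helper withEncoding are invented for the example, and ISO-8859-1 merely stands in for FOO. It uses nothing but the Java standard library, so it runs without CoreNLP on the classpath.

```java
import java.nio.charset.Charset;
import java.util.Properties;

class EncodingProps {
    // Hypothetical helper: builds a Properties object carrying the
    // "encoding" property described above.
    static Properties withEncoding(String charsetName) {
        // Fail fast if this JVM does not recognize the charset name.
        Charset.forName(charsetName);
        Properties props = new Properties();
        props.setProperty("encoding", charsetName);
        return props;
    }

    public static void main(String[] args) {
        Properties props = withEncoding("ISO-8859-1");
        System.out.println("encoding = " + props.getProperty("encoding"));
        // With CoreNLP on the classpath, these properties would be passed
        // to the pipeline, e.g. new StanfordCoreNLP(props).
    }
}
```

Validating the name with Charset.forName before building the properties surfaces a misspelled encoding immediately rather than when the first file is read.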
Here are the steps:
customAnnotatorClass.FOO=BAR
customAnnotatorClass.filter=com.foo.FilterAnnotator
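To make the naming convention concrete, the sketch below reads property lines of this form and collects the annotator names together with their classes. It is not how CoreNLP itself loads custom annotators; the class CustomAnnotatorProps and the method customAnnotators are hypothetical names chosen for this example, and only java.util.Properties is used.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

class CustomAnnotatorProps {
    static final String PREFIX = "customAnnotatorClass.";

    // Hypothetical helper: maps each custom annotator name (the FOO part)
    // to its class name (the BAR part).
    static Map<String, String> customAnnotators(String propertyText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertyText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> result = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith(PREFIX)) {
                result.put(key.substring(PREFIX.length()), props.getProperty(key));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "customAnnotatorClass.filter=com.foo.FilterAnnotator\n";
        // prints {filter=com.foo.FilterAnnotator}
        System.out.println(customAnnotators(text));
    }
}
```

Here the name filter plays the role of FOO and com.foo.FilterAnnotator the role of BAR.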
<coreference>
  <coreference>
    <mention representative="true">
      <sentence>1</sentence>
      <start>1</start>
      <end>3</end>
      <head>2</head>
    </mention>
    <mention>
      <sentence>2</sentence>
      <start>1</start>
      <end>2</end>
      <head>1</head>
    </mention>
  </coreference>
</coreference>
The entire coref section is delimited by
a <coreference> element. Each individual chain is
then delimited by another, nested <coreference> element. (This
naming is perhaps unfortunate, but at this point there are no plans to
change it.)
Inside the <coreference> element for each chain is
a block describing each of the mentions. One mention is labeled
the representative mention. Each mention has fields
for sentence, indexed from 1; the range of words,
from start (inclusive) to end (not
inclusive), also indexed from 1; and head, the index in
the sentence of the head word of this mention.
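The layout described here can be read with the DOM parser in the Java standard library. The following is a minimal sketch, assuming the coref section is available as a string; the class CorefXmlReader, its Mention type, and the readMentions method are hypothetical names for this example and are not part of CoreNLP. The embedded SAMPLE string reproduces the two-mention chain from the XML sample.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class CorefXmlReader {
    // The two-mention chain from the XML sample, without whitespace.
    static final String SAMPLE =
        "<coreference><coreference>"
        + "<mention representative=\"true\"><sentence>1</sentence>"
        + "<start>1</start><end>3</end><head>2</head></mention>"
        + "<mention><sentence>2</sentence>"
        + "<start>1</start><end>2</end><head>1</head></mention>"
        + "</coreference></coreference>";

    /** One mention; indices stay 1-based as in the XML, and end is exclusive. */
    static final class Mention {
        final int chain;
        final boolean representative;
        final int sentence, start, end, head;

        Mention(int chain, Element e) {
            this.chain = chain;
            this.representative = "true".equals(e.getAttribute("representative"));
            this.sentence = intField(e, "sentence");
            this.start = intField(e, "start");
            this.end = intField(e, "end");
            this.head = intField(e, "head");
        }

        // Number of words covered: end is exclusive, so no +1 is needed.
        int length() { return end - start; }
    }

    static int intField(Element mention, String tag) {
        return Integer.parseInt(
            mention.getElementsByTagName(tag).item(0).getTextContent().trim());
    }

    static List<Mention> readMentions(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse coref XML", e);
        }
        List<Mention> mentions = new ArrayList<>();
        NodeList sections = doc.getElementsByTagName("coreference");
        int chain = 0;
        for (int i = 0; i < sections.getLength(); i++) {
            // Only <coreference> elements with direct <mention> children are
            // chains; the outer wrapper has none and is skipped.
            List<Mention> inChain = new ArrayList<>();
            for (Node c = sections.item(i).getFirstChild(); c != null; c = c.getNextSibling()) {
                if (c instanceof Element && "mention".equals(c.getNodeName())) {
                    inChain.add(new Mention(chain, (Element) c));
                }
            }
            if (!inChain.isEmpty()) {
                mentions.addAll(inChain);
                chain++;
            }
        }
        return mentions;
    }

    public static void main(String[] args) {
        for (Mention m : readMentions(SAMPLE)) {
            System.out.println("chain " + m.chain + ", sentence " + m.sentence
                + ", words " + m.start + ".." + (m.end - 1) + ", head " + m.head
                + (m.representative ? " (representative)" : ""));
        }
    }
}
```

Because end is not inclusive, the first mention in the sample (start 1, end 3) covers words 1 and 2 of sentence 1, and the second (start 1, end 2) covers only word 1 of sentence 2.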
Either add more memory, use fewer annotators, or give CoreNLP smaller documents. Nearly all our annotators load large model files that use a lot of memory, and running the full CoreNLP pipeline requires the sum of all these memory requirements. Additionally, the coreference module operates over an entire document, so its processing time and memory use keep growing, without bound, as the document size increases.
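Before adding memory, it can help to check how much heap the JVM was actually given. The sketch below reports the maximum heap using only the standard Runtime API; the class name HeapCheck is made up for this example. The heap ceiling itself is raised with the standard JVM option -Xmx (for example -Xmx4g), where the right value depends on the annotators you load and is not specified here.

```java
class HeapCheck {
    // Maximum heap this JVM will attempt to use, in megabytes.
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap available to this JVM: "
            + maxHeapMegabytes() + " MB");
    }
}
```

Running this with the same JVM options as the pipeline shows whether an -Xmx setting was actually picked up.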
Site design by Bill MacCartney