public class Document
extends java.lang.Object
Modifier and Type | Field and Description |
---|---|
protected CoreNLPProtos.Document.Builder | impl - The protocol buffer representing this document. |
protected java.util.List<Sentence> | sentences - The list of sentences associated with this document. |
protected ProtobufAnnotationSerializer | serializer - A serializer to assist in serializing and deserializing from Protocol buffers. |
Constructor and Description |
---|
Document(Annotation ann) |
Document(CoreNLPProtos.Document proto) |
Document(java.util.Properties props, Annotation ann) - Convert a CoreNLP Annotation object to a Document. |
Document(java.util.Properties props, CoreNLPProtos.Document proto) - Create a Document object from a read Protocol Buffer. |
Document(java.util.Properties props, java.lang.String text) - Create a new document from the passed in text and the given properties. |
Document(java.lang.String text) - Create a new document from the passed in text. |
Modifier and Type | Method and Description |
---|---|
Annotation | asAnnotation() - Return this Document as an Annotation object. |
Document | cased() - Make this document case sensitive. |
Document | caseless() - Make this document caseless. |
java.util.Map<java.lang.Integer,CorefChain> | coref() |
java.util.Map<java.lang.Integer,CorefChain> | coref(java.util.Properties props) - Returns the coref chains in the document. |
static Document | deserialize(java.io.InputStream in) - Read a document from an input stream. |
java.util.Optional<java.lang.String> | docid() - Returns the document id of the document, if one was found. |
boolean | equals(java.lang.Object o) |
int | hashCode() |
java.lang.String | json(java.util.function.Function<Sentence,java.lang.Object>... functions) - Write this annotation as a JSON string. |
java.lang.String | jsonMinified(java.util.function.Function<Sentence,java.lang.Object>... functions) - Like json(Function...), but with minified JSON more suitable for sending over the wire. |
Sentence | sentence(int sentenceIndex) |
Sentence | sentence(int sentenceIndex, java.util.Properties props) |
java.util.List<Sentence> | sentences() |
java.util.List<Sentence> | sentences(java.util.Properties props) - Get the sentences in this document, as a list. |
protected java.util.List<Sentence> | sentences(java.util.Properties props, Annotator tokenizer) - Get the sentences in this document, as a list. |
CoreNLPProtos.Document | serialize() - Serialize this Document as a Protocol Buffer. |
void | serialize(java.io.OutputStream out) - Write this document to an output stream. |
static void | setBackend(AnnotatorImplementations backend) - Set the backend implementations for our CoreNLP pipeline. |
Document | setDocid(java.lang.String docid) - Sets the document id of the document, returning this. |
java.lang.String | text() - Get the raw text of the document, as input by, e.g., Document(String). |
java.lang.String | toString() |
static void | useServer(java.lang.String host, int port) - Use the CoreNLP Server (StanfordCoreNLPServer) for the heavyweight backend annotation job. |
static void | useServer(java.lang.String host, int port, java.lang.String apiKey, java.lang.String apiSecret, boolean lazy) - Use the CoreNLP Server (StanfordCoreNLPServer) for the heavyweight backend annotation job, authenticating with the given credentials. |
static void | useServer(java.lang.String host, java.lang.String apiKey, java.lang.String apiSecret) |
static void | useServer(java.lang.String host, java.lang.String apiKey, java.lang.String apiSecret, boolean lazy) |
java.lang.String | xml(java.util.function.Function<Sentence,java.lang.Object>... functions) - Write this annotation as an XML string. |
java.lang.String | xmlMinified(java.util.function.Function<Sentence,java.lang.Object>... functions) - Like xml(Function...), but with minified XML more suitable for sending over the wire. |
protected final CoreNLPProtos.Document.Builder impl
protected java.util.List<Sentence> sentences
protected final ProtobufAnnotationSerializer serializer
public Document(java.util.Properties props, java.lang.String text)
Create a new document from the passed in text and the given properties.
text - The text of the document.
public Document(java.lang.String text)
Create a new document from the passed in text.
text - The text of the document.
public Document(java.util.Properties props, Annotation ann)
Convert a CoreNLP Annotation object to a Document.
ann - The CoreNLP Annotation object.
public Document(Annotation ann)
See also: Document(Properties, Annotation)
public Document(java.util.Properties props, CoreNLPProtos.Document proto)
Create a Document object from a read Protocol Buffer.
proto - The protocol buffer representing this document.
See also: serialize()
public Document(CoreNLPProtos.Document proto)
public static void setBackend(AnnotatorImplementations backend)
Set the backend implementations for our CoreNLP pipeline (for example, ServerAnnotatorImplementations).
backend - The backend to use from now on for annotating documents.
public static void useServer(java.lang.String host, int port)
Use the CoreNLP Server (StanfordCoreNLPServer) for the heavyweight backend annotation job.
host - The hostname of the server.
port - The port the server is running on.
public static void useServer(java.lang.String host, int port, java.lang.String apiKey, java.lang.String apiSecret, boolean lazy)
Use the CoreNLP Server (StanfordCoreNLPServer) for the heavyweight backend annotation job, authenticating with the given credentials.
host - The hostname of the server.
port - The port the server is running on.
apiKey - The API key to use as the username for authentication.
apiSecret - The API secret to use as the password for authentication.
lazy - Only run the annotations that are required at this time. If this is false, we will also run a bunch of standard annotations, to cut down on the expected number of round-trips.
public static void useServer(java.lang.String host, java.lang.String apiKey, java.lang.String apiSecret, boolean lazy)
public static void useServer(java.lang.String host, java.lang.String apiKey, java.lang.String apiSecret)
public Document caseless()
Make this document caseless.
public Document cased()
Make this document case sensitive; see Sentence.caseless().
public CoreNLPProtos.Document serialize()
Serialize this Document as a Protocol Buffer. The result can be read back with the constructor Document(edu.stanford.nlp.pipeline.CoreNLPProtos.Document).
public void serialize(java.io.OutputStream out) throws java.io.IOException
Write this document to an output stream.
out - The output stream to write to. The stream is not closed after the method returns.
Throws: java.io.IOException - Thrown from the underlying write() implementation.
See also: deserialize(InputStream)
public static Document deserialize(java.io.InputStream in) throws java.io.IOException
Read a document from an input stream.
in - The input stream to deserialize from.
Throws: java.io.IOException - Thrown by the underlying parse() implementation.
See also: serialize(java.io.OutputStream)
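One consequence of serialize(OutputStream) leaving the stream open is that the caller can keep writing to it. The sketch below illustrates that contract using only JDK classes. ToyDoc and roundTrip are hypothetical stand-ins, and DataOutputStream.writeUTF takes the place of the Protocol Buffer encoding; this is not CoreNLP's implementation, and it makes no claim about how CoreNLP delimits records.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class StreamRoundTripSketch {

    /** Hypothetical stand-in for a document; it only stores its raw text. */
    static final class ToyDoc {
        final String text;

        ToyDoc(String text) { this.text = text; }

        /** Writes one length-prefixed record and leaves the caller's stream open. */
        void serialize(OutputStream out) throws IOException {
            DataOutputStream data = new DataOutputStream(out);
            data.writeUTF(text);
            data.flush(); // flush only; closing is the caller's responsibility
        }

        /** Reads exactly one record, so the next record stays unread in the stream. */
        static ToyDoc deserialize(InputStream in) throws IOException {
            return new ToyDoc(new DataInputStream(in).readUTF());
        }
    }

    /** Writes all texts to one buffer, then reads them back in the same order. */
    static List<String> roundTrip(String... texts) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            for (String t : texts) {
                new ToyDoc(t).serialize(buffer);
            }
            InputStream in = new ByteArrayInputStream(buffer.toByteArray());
            List<String> restored = new ArrayList<>();
            for (int i = 0; i < texts.length; i++) {
                restored.add(ToyDoc.deserialize(in).text);
            }
            return restored;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The toy format is length-prefixed, which is what lets consecutive records share one stream in this sketch.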
@SafeVarargs public final java.lang.String json(java.util.function.Function<Sentence,java.lang.Object>... functions)
Write this annotation as a JSON string. Optionally, you can also specify a number of operations to call on the document before dumping it to JSON. This allows the user to ensure that certain annotations have been computed before the document is dumped. For example:
String json = new Document("Lucy in the sky with diamonds").json(Sentence::parse, Sentence::ner);
will create a JSON dump of the document, ensuring that at least the parse tree and ner tags are populated.
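The "populate before dumping" behaviour can be pictured with a toy model. In the sketch below, ToySentence, ToyDocument and dump are hypothetical names invented for illustration; they mimic the shape of a varargs Function<Sentence,Object> parameter but are not CoreNLP's classes or its implementation. Each function passed in is applied to every sentence first, which forces the lazily computed value into a cache, and only cached values appear in the dump.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

final class AnnotateBeforeDumpSketch {

    /** Toy sentence whose annotations are computed on first use and then cached. */
    static final class ToySentence {
        private final String text;
        private final Map<String, Object> cache = new TreeMap<>();

        ToySentence(String text) { this.text = text; }

        /** Splits the text on single spaces the first time it is called. */
        Object words() {
            return cache.computeIfAbsent("words", k -> Arrays.asList(text.split(" ")));
        }

        /** Upper-cases the text the first time it is called. */
        Object upper() {
            return cache.computeIfAbsent("upper", k -> text.toUpperCase());
        }

        /** Only annotations that have been requested so far are present here. */
        Map<String, Object> computed() { return cache; }
    }

    /** Toy document: a list of toy sentences plus a dump method. */
    static final class ToyDocument {
        private final List<ToySentence> sentences = new ArrayList<>();

        ToyDocument(String... texts) {
            for (String t : texts) {
                sentences.add(new ToySentence(t));
            }
        }

        /** Applies every function to every sentence, then dumps what is cached. */
        @SafeVarargs
        final String dump(Function<ToySentence, Object>... functions) {
            for (Function<ToySentence, Object> f : functions) {
                for (ToySentence s : sentences) {
                    f.apply(s); // result is ignored; the call populates the cache
                }
            }
            List<String> parts = new ArrayList<>();
            for (ToySentence s : sentences) {
                parts.add(s.computed().toString());
            }
            return String.join(";", parts);
        }
    }
}
```

Calling dump() with no arguments on a fresh ToyDocument yields empty entries, while dump(ToySentence::words, ToySentence::upper) yields both annotations for every sentence.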
functions - The (possibly empty) list of annotations to populate on the document before dumping it to JSON.
@SafeVarargs public final java.lang.String jsonMinified(java.util.function.Function<Sentence,java.lang.Object>... functions)
Like json(Function...), but with minified JSON more suitable for sending over the wire.
functions - The (possibly empty) list of annotations to populate on the document before dumping it to JSON.
@SafeVarargs public final java.lang.String xml(java.util.function.Function<Sentence,java.lang.Object>... functions)
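Write this annotation as an XML string. Optionally, you can also specify a number of operations to call on the document before dumping it to XML. For example:
String xml = new Document("Lucy in the sky with diamonds").xml(Sentence::parse, Sentence::ner);
will create an XML dump of the document, ensuring that at least the parse tree and ner tags are populated.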
functions - The (possibly empty) list of annotations to populate on the document before dumping it to XML.
@SafeVarargs public final java.lang.String xmlMinified(java.util.function.Function<Sentence,java.lang.Object>... functions)
Like xml(Function...), but with minified XML more suitable for sending over the wire.
functions - The (possibly empty) list of annotations to populate on the document before dumping it to XML.
public java.util.List<Sentence> sentences(java.util.Properties props)
Get the sentences in this document, as a list.
props - The properties to use in the WordsToSentencesAnnotator.
protected java.util.List<Sentence> sentences(java.util.Properties props, Annotator tokenizer)
Get the sentences in this document, as a list.
props - The properties to use in the WordsToSentencesAnnotator.
public java.util.List<Sentence> sentences()
See also: sentences(java.util.Properties)
public Sentence sentence(int sentenceIndex, java.util.Properties props)
See also: sentences(java.util.Properties)
public Sentence sentence(int sentenceIndex)
See also: sentences(java.util.Properties)
public java.lang.String text()
Get the raw text of the document, as input by, e.g., Document(String).
public java.util.Map<java.lang.Integer,CorefChain> coref(java.util.Properties props)
Returns the coref chains in the document.
props - The properties to use in the DeterministicCorefAnnotator.
public java.util.Map<java.lang.Integer,CorefChain> coref()
See also: coref(java.util.Properties)
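public java.util.Optional<java.lang.String> docid()
Returns the document id of the document, if one was found.
public Document setDocid(java.lang.String docid)
Sets the document id of the document, returning this.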
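The two signatures above combine an Optional-returning getter with a setter that returns the document itself, so calls can be chained. The sketch below shows how a caller might work with that shape. ToyDoc and label are hypothetical names, and storing the id in a nullable field is an assumption made for illustration, not a description of CoreNLP's internals.

```java
import java.util.Optional;

final class DocIdSketch {

    /** Hypothetical stand-in exposing the same two method shapes. */
    static final class ToyDoc {
        private String docid; // null until an id is set

        /** Empty when no id has been set. */
        Optional<String> docid() {
            return Optional.ofNullable(docid);
        }

        /** Stores the id and returns this, so calls can be chained. */
        ToyDoc setDocid(String docid) {
            this.docid = docid;
            return this;
        }
    }

    /** Builds a display label, falling back to a placeholder when the id is absent. */
    static String label(ToyDoc doc) {
        return doc.docid().map(id -> "doc:" + id).orElse("doc:<unnamed>");
    }
}
```

Handling the absent case through map and orElse avoids an explicit null check at the call site.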
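public Annotation asAnnotation()
Return this Document as an Annotation object. Note that only the annotations that have already been computed on this document are populated in the returned Annotation. Therefore, this method is generally NOT recommended.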
public boolean equals(java.lang.Object o)
Overrides: equals in class java.lang.Object
public int hashCode()
Overrides: hashCode in class java.lang.Object
public java.lang.String toString()
Overrides: toString in class java.lang.Object