ProtobufAnnotationSerializer (Stanford JavaNLP API)

java.lang.Object
- edu.stanford.nlp.pipeline.AnnotationSerializer
- - edu.stanford.nlp.pipeline.ProtobufAnnotationSerializer

```
public class ProtobufAnnotationSerializer
extends AnnotationSerializer
```
A serializer using Google's protocol buffer format. The files produced by this serializer, in addition to being language-independent, are a little over 10% of the size and 4x faster to read and write than the default Java serialization (see GenericAnnotationSerializer), when both files are compressed with gzip.

Note that this handles only a subset of the possible annotations that can be attached to a sentence. Nonetheless, it is guaranteed to be lossless with the default set of named annotators you can create from a StanfordCoreNLP pipeline, with default properties defined for each annotator. Note that the serializer does not gzip automatically -- this must be done manually, by passing in a GZIPOutputStream when writing and a GZIPInputStream when reading. For most Annotations, gzipping provides a notable decrease in size (~2.5x) due to most of the data being raw Strings.
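
The manual gzip wrapping described above can be sketched with the standard library alone. This is a minimal illustration, not CoreNLP code: the class name GzipWrappingSketch and its methods are hypothetical, and a plain byte array stands in for the bytes the serializer would write. With CoreNLP, the GZIPOutputStream would instead be the stream passed to write(Annotation, OutputStream).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper: the caller, not the serializer, adds the gzip layer. */
final class GzipWrappingSketch {

  /** Compress bytes by writing them through a GZIPOutputStream. */
  public static byte[] gzip(byte[] raw) {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    // Closing the GZIPOutputStream finishes the stream and writes the gzip trailer.
    try (GZIPOutputStream gz = new GZIPOutputStream(sink)) {
      gz.write(raw); // stand-in for the serializer writing into the wrapped stream
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return sink.toByteArray();
  }

  /** Decompress bytes by reading them back through a GZIPInputStream. */
  public static byte[] gunzip(byte[] compressed) {
    try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
      return gz.readAllBytes();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    byte[] raw = "stand-in for serialized annotation bytes".getBytes(StandardCharsets.UTF_8);
    byte[] restored = gunzip(gzip(raw));
    System.out.println(new String(restored, StandardCharsets.UTF_8));
  }
}
```

Note that the gzip stream has to be closed (or finished) before the compressed bytes are complete, which is why the sketch uses try-with-resources.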

To allow lossy serialization, use ProtobufAnnotationSerializer(boolean). Otherwise, an exception is thrown if an unknown key appears in the annotation which would not be saved to the protocol buffer. If such keys exist, and are a part of the standard CoreNLP pipeline, please let us know! If you would like to serialize keys in addition to those serialized by default (e.g., you are attaching your own annotations), then you should do the following:
1. Create a .proto file which extends one or more of Document, Sentence, or Token. Each of these has fields 100-255 left open for user extensions. An example of such an extension is:
```
       package edu.stanford.nlp.pipeline;

       option java_package = "com.example.my.awesome.nlp.app";
       option java_outer_classname = "MyAppProtos";

       import "CoreNLP.proto";

       extend Sentence {
         optional uint32 myNewField    = 101;
       }
     
```
2. Compile your .proto file with protoc. For example (from CORENLP_HOME):
```
        protoc -I=src/edu/stanford/nlp/pipeline/:/path/to/folder/containing/your/proto/file --java_out=/path/to/output/src/folder/  /path/to/proto/file
     
```
3. Extend ProtobufAnnotationSerializer to serialize and deserialize your field. Generally, this entails overriding two functions -- one to write the proto and one to read it. In both cases, you usually want to call the superclass' implementation of the function, and add on to it from there. In our running example, adding a field to the CoreNLPProtos.Sentence proto, you would override:
  - toProtoBuilder(edu.stanford.nlp.util.CoreMap, java.util.Set)
  - fromProtoNoTokens(edu.stanford.nlp.pipeline.CoreNLPProtos.Sentence)
  Note, importantly, that for the serializer to be able to check for lossless serialization, all annotations added to the proto must be registered as added by being removed from the set passed to toProtoBuilder(edu.stanford.nlp.util.CoreMap, java.util.Set) (and the analogous functions for documents and tokens).
  
  Lastly, the new annotations must be registered in the original .proto file; this can be achieved by including a static block in the subclass:
```
       static {
         ExtensionRegistry registry = ExtensionRegistry.newInstance();
         registry.add(MyAppProtos.myNewField);
         CoreNLPProtos.registerAllExtensions(registry);
       }
     
```
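The key-tracking contract from step 3 can be illustrated without CoreNLP. The sketch below is a stand-in under stated assumptions: plain maps replace annotations and protos, string keys replace annotation classes, and every name in it (LosslessCheckSketch, baseSerialize, extendedSerialize) is hypothetical. It only shows the mechanism: each serialization step removes the keys it saved from the tracking set, and anything left over means the output would be lossy.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for the "remove what you saved" bookkeeping. */
final class LosslessCheckSketch {

  /** Plays the role of the superclass: saves the keys it knows, ticking each off. */
  public static Map<String, Object> baseSerialize(Map<String, Object> ann, Set<String> keysToSerialize) {
    Map<String, Object> out = new LinkedHashMap<>();
    for (String known : new String[] {"text", "tokens"}) {
      if (ann.containsKey(known)) {
        out.put(known, ann.get(known));
        keysToSerialize.remove(known); // register the key as saved
      }
    }
    return out;
  }

  /** Plays the role of the subclass: calls the base version, then adds its own field. */
  public static Map<String, Object> extendedSerialize(Map<String, Object> ann, Set<String> keysToSerialize) {
    Map<String, Object> out = baseSerialize(ann, keysToSerialize);
    if (ann.containsKey("myNewField")) {
      out.put("myNewField", ann.get("myNewField"));
      keysToSerialize.remove("myNewField"); // without this, the lossless check fails
    }
    return out;
  }

  /** Serializes, then fails if enforcement is on and any key was left unsaved. */
  public static Map<String, Object> serialize(Map<String, Object> ann, boolean extended, boolean enforceLossless) {
    Set<String> remaining = new HashSet<>(ann.keySet());
    Map<String, Object> out = extended ? extendedSerialize(ann, remaining) : baseSerialize(ann, remaining);
    if (enforceLossless && !remaining.isEmpty()) {
      throw new IllegalStateException("Serialization would be lossy; unsaved keys: " + remaining);
    }
    return out;
  }

  public static void main(String[] args) {
    Map<String, Object> ann = new LinkedHashMap<>();
    ann.put("text", "Dogs bark.");
    ann.put("myNewField", 7);
    System.out.println(serialize(ann, true, true));
  }
}
```

In the real serializer the set holds annotation key classes and the failure is a ProtobufAnnotationSerializer.LossySerializationException; the IllegalStateException here is only a placeholder.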
TODOs
- In CoreNLP, the leaves of a tree are == to the tokens in a sentence. This is not the case for a deserialized proto.
Author:

Gabor Angeli

Nested Class Summary

Nested Classes
Modifier and Type Class and Description

static class ProtobufAnnotationSerializer.LossySerializationException
An exception to denote that the serialization would be lossy.
- Nested classes/interfaces inherited from class edu.stanford.nlp.pipeline.AnnotationSerializer
  AnnotationSerializer.IntermediateEdge, AnnotationSerializer.IntermediateNode, AnnotationSerializer.IntermediateSemanticGraph

Field Summary

Fields
Modifier and Type	Field and Description
`boolean`	`enforceLosslessSerialization` If true, serialization is guaranteed to be lossless or else a runtime exception is thrown at serialization time.

Constructor Summary

Constructors
Constructor and Description
`ProtobufAnnotationSerializer()` Create a new Annotation serializer outputting to a protocol buffer format.
`ProtobufAnnotationSerializer(boolean enforceLosslessSerialization)` Create a new Annotation serializer outputting to a protocol buffer format.

Method Summary

Modifier and Type	Method and Description
`CoreNLPProtos.IndexedWord`	`createIndexedWordProtoFromCL(CoreLabel cl)`
`CoreNLPProtos.IndexedWord`	`createIndexedWordProtoFromIW(IndexedWord iw)`
`static SemanticGraph`	`fromProto(CoreNLPProtos.DependencyGraph proto, java.util.List<CoreLabel> sentence, java.lang.String docid)` Voodoo magic to convert a serialized dependency graph into a `SemanticGraph`.
`Annotation`	`fromProto(CoreNLPProtos.Document proto)` Returns a complete document, intended to mimic a document passed as input to `toProto(Annotation)` as closely as possible.
`static Tree`	`fromProto(CoreNLPProtos.FlattenedParseTree proto)` Retrieve a Tree object from a flattened tree protobuf.
`static Language`	`fromProto(CoreNLPProtos.Language lang)` Return a CoreNLP language from a Protobuf language
`static java.util.HashMap<java.lang.Integer,java.lang.String>`	`fromProto(CoreNLPProtos.MapIntString proto)` Convert a serialized Map back into a Java Map.
`static java.util.HashMap<java.lang.String,java.lang.String>`	`fromProto(CoreNLPProtos.MapStringString proto)` Convert a serialized Map back into a Java Map.
`static OperatorSpec`	`fromProto(CoreNLPProtos.Operator operator)` Return a CoreNLP Operator (Natural Logic operator) from a Protobuf operator
`static Tree`	`fromProto(CoreNLPProtos.ParseTree proto)` Retrieve a Tree object from a saved protobuf.
`static Tree`	`fromProto(CoreNLPProtos.ParseTree proto, java.util.List<CoreLabel> tokens)` Retrieve a Tree object and then attach the tokens passed in.
`static Polarity`	`fromProto(CoreNLPProtos.Polarity polarity)` Return a CoreNLP Polarity (Natural Logic polarity) from a Protobuf polarity
`static RelationTriple`	`fromProto(CoreNLPProtos.RelationTriple proto, Annotation doc, int sentenceIndex)` Return a `RelationTriple` object from the serialized representation.
`CoreMap`	`fromProto(CoreNLPProtos.Sentence proto)` Deprecated.
`static SentenceFragment`	`fromProto(CoreNLPProtos.SentenceFragment fragment, SemanticGraph tree)` Returns a sentence fragment from a given protocol buffer, and an associated parse tree.
`CoreLabel`	`fromProto(CoreNLPProtos.Token proto)` Create a CoreLabel from its serialized counterpart.
`protected CoreMap`	`fromProtoNoTokens(CoreNLPProtos.Sentence proto)` Create a CoreMap representing a sentence from this protocol buffer.
`protected void`	`loadSentenceMentions(CoreNLPProtos.Sentence proto, CoreMap sentence)`
`Pair<Annotation,java.io.InputStream>`	`read(java.io.InputStream is)` Read a single object from this stream.
`Annotation`	`readUndelimited(java.io.File in)` Read a single protocol buffer, which constitutes the entire stream.
`protected java.lang.String`	`recoverOriginalText(java.util.List<CoreLabel> tokens, CoreNLPProtos.Sentence sentence)` Recover the `CoreAnnotations.TextAnnotation` field of a sentence from the tokens.
`protected void`	`setSentenceTokenAnnotations(CoreMap sentence, CoreNLPProtos.Sentence protoSentence, java.util.List<CoreLabel> sentenceTokens, java.lang.String docid)` On a partially finished deserialized sentence, set some annotations which should reuse the same token objects as the parent sentence
`static CoreNLPProtos.FlattenedParseTree`	`toFlattenedTree(Tree tree)` Turn the given tree into a FlattenedParseTree object from the proto. The new structure is useful because the ParseTree object can't represent trees past a certain depth.
`static void`	`toFlattenedTree(Tree tree, CoreNLPProtos.FlattenedParseTree.Builder treeBuilder)`
`static CoreNLPProtos.MapIntString`	`toMapIntStringProto(java.util.Map<java.lang.Integer,java.lang.String> map)` Serialize a Map (from Integers to Strings) to a proto.
`static CoreNLPProtos.MapStringString`	`toMapStringStringProto(java.util.Map<java.lang.String,java.lang.String> map)` Serialize a Map (from Strings to Strings) to a proto.
`CoreNLPProtos.Document`	`toProto(Annotation doc)` Create a Document proto from a CoreMap instance.
`CoreNLPProtos.CorefChain`	`toProto(CorefChain chain)` Create a CorefChain protocol buffer from the given coref chain.
`CoreNLPProtos.Token`	`toProto(CoreLabel coreLabel)` Create a CoreLabel proto from a CoreLabel instance.
`CoreNLPProtos.Sentence`	`toProto(CoreMap sentence)` Create a Sentence proto from a CoreMap instance.
`CoreNLPProtos.Entity`	`toProto(EntityMention ent)` Serialize the given entity mention to the corresponding protocol buffer.
`static CoreNLPProtos.Language`	`toProto(Language lang)` Serialize a CoreNLP Language to a Protobuf Language.
`CoreNLPProtos.Mention`	`toProto(Mention mention)`
`static CoreNLPProtos.Operator`	`toProto(OperatorSpec op)` Return a Protobuf operator from an OperatorSpec (Natural Logic).
`static CoreNLPProtos.Polarity`	`toProto(Polarity pol)` Return a Protobuf polarity from a CoreNLP Polarity (Natural Logic).
`CoreNLPProtos.Relation`	`toProto(RelationMention rel)` Serialize the given relation mention to the corresponding protocol buffer.
`CoreNLPProtos.RelationTriple`	`toProto(RelationTriple triple)` Return a Protobuf RelationTriple from a RelationTriple.
`CoreNLPProtos.DependencyGraph`	`toProto(SemanticGraph graph)` Create a compact representation of the semantic graph for this dependency parse.
`CoreNLPProtos.DependencyGraph`	`toProto(SemanticGraph graph, boolean storeTokens)` Create a compact representation of the semantic graph for this dependency parse.
`static CoreNLPProtos.SentenceFragment`	`toProto(SentenceFragment fragment)` Return a Protobuf SentenceFragment from a SentenceFragment.
`CoreNLPProtos.SpeakerInfo`	`toProto(SpeakerInfo speakerInfo)`
`CoreNLPProtos.Timex`	`toProto(Timex timex)` Convert the given Timex object to a protocol buffer.
`static CoreNLPProtos.ParseTree`	`toProto(Tree parseTree)` Create a ParseTree proto from a Tree.
`CoreNLPProtos.Document.Builder`	`toProtoBuilder(Annotation doc)` Create a protobuf builder, rather than a compiled protobuf.
`protected CoreNLPProtos.Document.Builder`	`toProtoBuilder(Annotation doc, java.util.Set<java.lang.Class<?>> keysToSerialize)` The method to extend by subclasses of the Protobuf Annotator if custom additions are added to Documents.
`protected CoreNLPProtos.Token.Builder`	`toProtoBuilder(CoreLabel coreLabel, java.util.Set<java.lang.Class<?>> keysToSerialize)` The method to extend by subclasses of the Protobuf Annotator if custom additions are added to Tokens.
`CoreNLPProtos.Sentence.Builder`	`toProtoBuilder(CoreMap sentence)` Create a protobuf builder, rather than a compiled protobuf.
`protected CoreNLPProtos.Sentence.Builder`	`toProtoBuilder(CoreMap sentence, java.util.Set<java.lang.Class<?>> keysToSerialize)` The method to extend by subclasses of the Protobuf Annotator if custom additions are added to Sentences.
`CoreNLPProtos.NERMention`	`toProtoMention(CoreMap mention)` Convert a mention object to a protocol buffer.
`CoreNLPProtos.Quote`	`toProtoQuote(CoreMap quote)` Convert a quote object to a protocol buffer.
`CoreNLPProtos.Section`	`toProtoSection(CoreMap section)` Create a Section CoreMap protocol buffer from the given Section CoreMap
`java.io.OutputStream`	`write(Annotation corpus, java.io.OutputStream os)` Append a single object to this stream.

Methods inherited from class edu.stanford.nlp.pipeline.AnnotationSerializer
readCoreDocument, writeCoreDocument

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - enforceLosslessSerialization
```
public final boolean enforceLosslessSerialization
```
    If true, serialization is guaranteed to be lossless or else a runtime exception is thrown at serialization time.
- Constructor Detail
  - ProtobufAnnotationSerializer
```
public ProtobufAnnotationSerializer()
```
    Create a new Annotation serializer outputting to a protocol buffer format. This is guaranteed to either be a lossless compression, or throw an exception at serialization time.
  - ProtobufAnnotationSerializer
```
public ProtobufAnnotationSerializer(boolean enforceLosslessSerialization)
```
    Create a new Annotation serializer outputting to a protocol buffer format.
    
    Parameters:
    
    enforceLosslessSerialization - If set to true, a ProtobufAnnotationSerializer.LossySerializationException is thrown at serialization time if the serialization would be lossy. If set to false, these exceptions are ignored.
- Method Detail
  - write
```
public java.io.OutputStream write(Annotation corpus,
                                  java.io.OutputStream os)
                           throws java.io.IOException
```
    Append a single object to this stream. Subsequent calls to append on the same stream must supply the returned output stream; furthermore, implementations of this function must be prepared to handle the same output stream being passed in as it returned on the previous write.
    
    Specified by:
    
    write in class AnnotationSerializer
    
    Parameters:
    
    corpus - The document to serialize to the stream.
    
    os - The output stream to serialize to.
    
    Returns:
    
    The output stream which should be closed when done writing, and which should be passed into subsequent calls to write() on this serializer.
    
    Throws:
    
    java.io.IOException - Thrown if the underlying output stream throws the exception.
  - read
```
public Pair<Annotation,java.io.InputStream> read(java.io.InputStream is)
                                          throws java.io.IOException,
                                                 java.lang.ClassNotFoundException,
                                                 java.lang.ClassCastException
```
    Read a single object from this stream. Subsequent calls to read on the same input stream must supply the returned input stream; furthermore, implementations of this function must be prepared to handle the same input stream being passed to it as it returned on the previous read.
    
    Specified by:
    
    read in class AnnotationSerializer
    
    Parameters:
    
    is - The input stream to read a document from.
    
    Returns:
    
    A pair of the read document, and the implementation-specific input stream which it was actually read from. This stream should be passed to subsequent calls to read on the same stream, and should be closed when reading completes.
    
    Throws:
    
    java.io.IOException - Thrown if the underlying stream throws the exception.
    
    java.lang.ClassNotFoundException - Thrown if an object was read that does not exist in the classpath.
    
    java.lang.ClassCastException - Thrown if the signature of a class changed in way that was incompatible with the serialized document.
  - readUndelimited
```
public Annotation readUndelimited(java.io.File in)
                           throws java.io.IOException
```
    Read a single protocol buffer, which constitutes the entire stream. This is in contrast to the default, where multiple buffers may come out of the stream, and therefore each one is preceded by the length of the buffer to follow.
    
    Parameters:
    
    in - The file to read.
    
    Returns:
    
    A parsed Annotation.
    
    Throws:
    
    java.io.IOException - In case the stream cannot be read from.
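    The difference between delimited and undelimited streams can be sketched as follows. This is an illustration only: it uses a fixed 4-byte length prefix from DataOutputStream, which is an assumption made for simplicity and not the length encoding protocol buffers actually use, and the class name LengthDelimitedSketch is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: several messages in one stream, each preceded by its length. */
final class LengthDelimitedSketch {

  /** Write each message as [4-byte length][payload]. */
  public static byte[] writeDelimited(List<byte[]> messages) {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(sink)) {
      for (byte[] message : messages) {
        out.writeInt(message.length); // the prefix says how many bytes follow
        out.write(message);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return sink.toByteArray();
  }

  /** Read messages back until the stream is exhausted. */
  public static List<byte[]> readDelimited(byte[] stream) {
    List<byte[]> messages = new ArrayList<>();
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(stream))) {
      while (in.available() > 0) {
        byte[] message = new byte[in.readInt()];
        in.readFully(message); // consume exactly one message
        messages.add(message);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return messages;
  }

  public static void main(String[] args) {
    byte[] stream = writeDelimited(Arrays.asList(
        "doc one".getBytes(StandardCharsets.UTF_8),
        "doc two".getBytes(StandardCharsets.UTF_8)));
    for (byte[] message : readDelimited(stream)) {
      System.out.println(new String(message, StandardCharsets.UTF_8));
    }
  }
}
```

    An undelimited file, by contrast, has no prefix at all: the whole file is one message, so a reader that expects a length prefix would misinterpret its first bytes.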
  - toProto
```
public CoreNLPProtos.Token toProto(CoreLabel coreLabel)
```
    Create a CoreLabel proto from a CoreLabel instance. This is not static, as it optionally throws an exception if the serialization is lossy.
    
    Parameters:
    
    coreLabel - The CoreLabel to convert
    
    Returns:
    
    A protocol buffer message corresponding to this CoreLabel
  - toProtoBuilder
```
protected CoreNLPProtos.Token.Builder toProtoBuilder(CoreLabel coreLabel,
                                                     java.util.Set<java.lang.Class<?>> keysToSerialize)
```
    The method to extend by subclasses of the Protobuf Annotator if custom additions are added to Tokens. In contrast to toProto(edu.stanford.nlp.ling.CoreLabel), this function returns a builder that can be extended.
    
    Parameters:
    
    coreLabel - The token to save to a protocol buffer
    
    keysToSerialize - A set tracking which keys have been saved. It's important to remove any keys added to the proto from this set, as the code tracks annotations to ensure lossless serialization.
  - toProtoBuilder
```
public CoreNLPProtos.Sentence.Builder toProtoBuilder(CoreMap sentence)
```
    Create a protobuf builder, rather than a compiled protobuf. Useful for, e.g., the simple CoreNLP interface.
    
    Parameters:
    
    sentence - The sentence to serialize.
    
    Returns:
    
    A Sentence builder.
  - toProto
```
public CoreNLPProtos.Sentence toProto(CoreMap sentence)
```
    Create a Sentence proto from a CoreMap instance. This is not static, as it optionally throws an exception if the serialization is lossy.
    
    Parameters:
    
    sentence - The CoreMap to convert. Note that it should not be a CoreLabel or an Annotation, and should represent a sentence.
    
    Returns:
    
    A protocol buffer message corresponding to this sentence
    
    Throws:
    
    java.lang.IllegalArgumentException - If the sentence is not a valid sentence (e.g., is a document or a word).
  - toProtoBuilder
```
protected CoreNLPProtos.Sentence.Builder toProtoBuilder(CoreMap sentence,
                                                        java.util.Set<java.lang.Class<?>> keysToSerialize)
```
    The method for subclasses to override if custom additions are added to Sentences. In contrast to toProto(edu.stanford.nlp.util.CoreMap), this function returns a builder that can be extended.
    
    Parameters:
    
    sentence - The sentence to save to a protocol buffer
    
    keysToSerialize - A set tracking which keys have been saved. It's important to remove any keys added to the proto from this set, as the code tracks annotations to ensure lossless serialization.
  - toProto
```
public CoreNLPProtos.Document toProto(Annotation doc)
```
    Create a Document proto from a CoreMap instance. This is not static, as it optionally throws an exception if the serialization is lossy.
    
    Parameters:
    
    doc - The Annotation to convert.
    
    Returns:
    
    A protocol buffer message corresponding to this document
  - toProtoBuilder
```
public CoreNLPProtos.Document.Builder toProtoBuilder(Annotation doc)
```
    Create a protobuf builder, rather than a compiled protobuf. Useful for, e.g., the simple CoreNLP interface.
    
    Parameters:
    
    doc - The document to serialize.
    
    Returns:
    
    A Document builder.
  - toProtoBuilder
```
protected CoreNLPProtos.Document.Builder toProtoBuilder(Annotation doc,
                                                        java.util.Set<java.lang.Class<?>> keysToSerialize)
```
    The method for subclasses to override if custom additions are added to Documents. In contrast to toProto(edu.stanford.nlp.pipeline.Annotation), this function returns a builder that can be extended.
    
    Parameters:
    
    doc - The document to save to a protocol buffer
    
    keysToSerialize - A set tracking which keys have been saved. It's important to remove any keys added to the proto from this set, as the code tracks annotations to ensure lossless serialization.
  - toProto
```
public static CoreNLPProtos.ParseTree toProto(Tree parseTree)
```
    Create a ParseTree proto from a Tree. If the Tree is a scored tree, the scores will be preserved.
    
    Parameters:
    
    parseTree - The parse tree to convert.
    
    Returns:
    
    A protocol buffer message corresponding to this tree.
  - toProto
```
public CoreNLPProtos.DependencyGraph toProto(SemanticGraph graph)
```
    Create a compact representation of the semantic graph for this dependency parse.
    
    Parameters:
    
    graph - The dependency graph to save.
    
    Returns:
    
    A protocol buffer message corresponding to this parse.
  - toProto
```
public CoreNLPProtos.DependencyGraph toProto(SemanticGraph graph,
                                             boolean storeTokens)
```
    Create a compact representation of the semantic graph for this dependency parse.
    
    Parameters:
    
    graph - The dependency graph to save.
    
    Returns:
    
    A protocol buffer message corresponding to this parse.
  - toProto
```
public CoreNLPProtos.CorefChain toProto(CorefChain chain)
```
    Create a CorefChain protocol buffer from the given coref chain.
    
    Parameters:
    
    chain - The coref chain to convert.
    
    Returns:
    
    A protocol buffer message corresponding to this chain.
  - toProtoSection
```
public CoreNLPProtos.Section toProtoSection(CoreMap section)
```
    Create a Section CoreMap protocol buffer from the given Section CoreMap
    
    Parameters:
    
    section - The CoreMap representing the section to serialize to a proto.
    
    Returns:
    
    The protocol buffer version of the section
  - createIndexedWordProtoFromIW
```
public CoreNLPProtos.IndexedWord createIndexedWordProtoFromIW(IndexedWord iw)
```
  - createIndexedWordProtoFromCL
```
public CoreNLPProtos.IndexedWord createIndexedWordProtoFromCL(CoreLabel cl)
```
  - toProto
```
public CoreNLPProtos.Mention toProto(Mention mention)
```
  - toProto
```
public CoreNLPProtos.SpeakerInfo toProto(SpeakerInfo speakerInfo)
```
  - toProto
```
public CoreNLPProtos.Timex toProto(Timex timex)
```
    Convert the given Timex object to a protocol buffer.
    
    Parameters:
    
    timex - The timex to convert.
    
    Returns:
    
    A protocol buffer corresponding to this Timex object.
  - toProto
```
public CoreNLPProtos.Entity toProto(EntityMention ent)
```
    Serialize the given entity mention to the corresponding protocol buffer.
    
    Parameters:
    
    ent - The entity mention to serialize.
    
    Returns:
    
    A protocol buffer corresponding to the serialized entity mention.
  - toProto
```
public CoreNLPProtos.Relation toProto(RelationMention rel)
```
    Serialize the given relation mention to the corresponding protocol buffer.
    
    Parameters:
    
    rel - The relation mention to serialize.
    
    Returns:
    
    A protocol buffer corresponding to the serialized relation mention.
  - toProto
```
public static CoreNLPProtos.Language toProto(Language lang)
```
    Serialize a CoreNLP Language to a Protobuf Language.
    
    Parameters:
    
    lang - The language to serialize.
    
    Returns:
    
    The language in a Protobuf enum.
  - toProto
```
public static CoreNLPProtos.Operator toProto(OperatorSpec op)
```
    Return a Protobuf operator from an OperatorSpec (Natural Logic).
  - toProto
```
public static CoreNLPProtos.Polarity toProto(Polarity pol)
```
    Return a Protobuf polarity from a CoreNLP Polarity (Natural Logic).
  - toProto
```
public static CoreNLPProtos.SentenceFragment toProto(SentenceFragment fragment)
```
    Return a Protobuf SentenceFragment from a SentenceFragment.
  - toProto
```
public CoreNLPProtos.RelationTriple toProto(RelationTriple triple)
```
    Return a Protobuf RelationTriple from a RelationTriple.
  - toMapStringStringProto
```
public static CoreNLPProtos.MapStringString toMapStringStringProto(java.util.Map<java.lang.String,java.lang.String> map)
```
    Serialize a Map (from Strings to Strings) to a proto.
    
    Parameters:
    
    map - The map to serialize.
    
    Returns:
    
    A proto representation of the map.
  - toMapIntStringProto
```
public static CoreNLPProtos.MapIntString toMapIntStringProto(java.util.Map<java.lang.Integer,java.lang.String> map)
```
    Serialize a Map (from Integers to Strings) to a proto.
    
    Parameters:
    
    map - The map to serialize.
    
    Returns:
    
    A proto representation of the map.
  - toProtoQuote
```
public CoreNLPProtos.Quote toProtoQuote(CoreMap quote)
```
    Convert a quote object to a protocol buffer.
  - toProtoMention
```
public CoreNLPProtos.NERMention toProtoMention(CoreMap mention)
```
    Convert a mention object to a protocol buffer.
  - fromProto
```
public CoreLabel fromProto(CoreNLPProtos.Token proto)
```
    Create a CoreLabel from its serialized counterpart. Note that this is, by itself, a lossy operation. Fields like the docid (sentence index, etc.) are only known from the enclosing document, and are not tracked in the protobuf.
    
    Parameters:
    
    proto - The serialized protobuf to read the CoreLabel from.
    
    Returns:
    
    A CoreLabel, missing the fields that are not stored in the CoreLabel protobuf.
  - fromProto
```
@Deprecated
public CoreMap fromProto(CoreNLPProtos.Sentence proto)
```
    Deprecated.
    
    Create a CoreMap representing a sentence from this protocol buffer. This should not be used if you are reading a whole document, as it populates the tokens independent of the document tokens, which is not the behavior an Annotation expects.
    
    Parameters:
    
    proto - The protocol buffer to read from.
    
    Returns:
    
    A CoreMap representing the sentence.
  - fromProtoNoTokens
```
protected CoreMap fromProtoNoTokens(CoreNLPProtos.Sentence proto)
```
    Create a CoreMap representing a sentence from this protocol buffer. Note that the sentence is very lossy -- most glaringly, the tokens are missing, awaiting a document to be filled in from.
    
    Parameters:
    
    proto - The serialized protobuf to read the sentence from.
    
    Returns:
    
    A CoreMap, representing a sentence as stored in the protocol buffer (and therefore missing some fields)
  - setSentenceTokenAnnotations
```
protected void setSentenceTokenAnnotations(CoreMap sentence,
                                           CoreNLPProtos.Sentence protoSentence,
                                           java.util.List<CoreLabel> sentenceTokens,
                                           java.lang.String docid)
```
    On a partially finished deserialized sentence, set some annotations which should reuse the same token objects as the parent sentence
  - loadSentenceMentions
```
protected void loadSentenceMentions(CoreNLPProtos.Sentence proto,
                                    CoreMap sentence)
```
  - fromProto
```
public Annotation fromProto(CoreNLPProtos.Document proto)
```
    Returns a complete document, intended to mimic a document passed as input to toProto(Annotation) as closely as possible. That is, most common fields are serialized, but there is no guarantee that custom additions will be saved and retrieved.
    
    Parameters:
    
    proto - The protocol buffer to read the document from.
    
    Returns:
    
    An Annotation corresponding to the read protobuf.
  - toFlattenedTree
```
public static void toFlattenedTree(Tree tree,
                                   CoreNLPProtos.FlattenedParseTree.Builder treeBuilder)
```
  - toFlattenedTree
```
public static CoreNLPProtos.FlattenedParseTree toFlattenedTree(Tree tree)
```
    Turn the given tree into a FlattenedParseTree object from the proto.
    The new structure is useful because the ParseTree object can't represent trees past a certain depth. Unfortunately, we can't just replace ParseTree with this, as there are existing serializations with the old version.
    This works by recursively calling the toFlattenedTree helper method. That recursion could be eliminated with an explicit stack, but presumably the tree won't be so deep that it overflows the Java stack.
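    The idea of flattening a nested tree into a flat sequence can be sketched as below. This is a generic illustration, not the CoreNLP implementation: the Node class, the "(" and ")" markers, and the method names are all hypothetical, and the actual layout of FlattenedParseTree may differ. Flattening is recursive, as in the description above, while rebuilding uses an explicit stack.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch: a nested tree stored as a flat list of open/label/close items. */
final class FlattenedTreeSketch {

  static final class Node {
    final String label;
    final List<Node> children = new ArrayList<>();

    Node(String label, Node... kids) {
      this.label = label;
      for (Node kid : kids) {
        children.add(kid);
      }
    }
  }

  static final String OPEN = "(";
  static final String CLOSE = ")";

  /** Flatten recursively: open marker, label, flattened children, close marker. */
  public static List<String> flatten(Node root) {
    List<String> out = new ArrayList<>();
    flatten(root, out);
    return out;
  }

  private static void flatten(Node node, List<String> out) {
    out.add(OPEN);
    out.add(node.label);
    for (Node child : node.children) {
      flatten(child, out);
    }
    out.add(CLOSE);
  }

  /** Rebuild with an explicit stack, so nesting depth never touches the call stack. */
  public static Node unflatten(List<String> flat) {
    Deque<Node> stack = new ArrayDeque<>();
    Node root = null;
    for (int i = 0; i < flat.size(); i++) {
      String item = flat.get(i);
      if (item.equals(OPEN)) {
        Node node = new Node(flat.get(++i)); // the label always follows an open marker
        if (!stack.isEmpty()) {
          stack.peek().children.add(node);
        }
        stack.push(node);
      } else if (item.equals(CLOSE)) {
        root = stack.pop(); // the last node popped is the root
      }
    }
    return root;
  }

  public static void main(String[] args) {
    Node tree = new Node("S", new Node("NP", new Node("dogs")), new Node("VP", new Node("bark")));
    System.out.println(flatten(tree));
  }
}
```

    Because the flat form is a sequence rather than a nested message, its depth is bounded only by list length, which is the property that motivates the flattened representation.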
  - fromProto
```
public static Tree fromProto(CoreNLPProtos.FlattenedParseTree proto)
```
    Retrieve a Tree object from a flattened tree protobuf.
    
    Parameters:
    
    proto - The serialized tree.
    
    Returns:
    
    A Tree object corresponding to the saved tree. This will always be a LabeledScoredTreeNode.
  - fromProto
```
public static Tree fromProto(CoreNLPProtos.ParseTree proto,
                             java.util.List<CoreLabel> tokens)
```
    Retrieve a Tree object and then attach the tokens passed in. Useful for keeping the tokens in the tree synchronized with the tokens in a sentence.
  - fromProto
```
public static Tree fromProto(CoreNLPProtos.ParseTree proto)
```
    Retrieve a Tree object from a saved protobuf. This is not intended to be used on its own, but it is safe (lossless) to do so and therefore it is left visible.
    
    Parameters:
    
    proto - The serialized tree.
    
    Returns:
    
    A Tree object corresponding to the saved tree. This will always be a LabeledScoredTreeNode.
  - fromProto
```
public static Language fromProto(CoreNLPProtos.Language lang)
```
    Return a CoreNLP language from a Protobuf language
  - fromProto
```
public static OperatorSpec fromProto(CoreNLPProtos.Operator operator)
```
    Return a CoreNLP Operator (Natural Logic operator) from a Protobuf operator
  - fromProto
```
public static Polarity fromProto(CoreNLPProtos.Polarity polarity)
```
    Return a CoreNLP Polarity (Natural Logic polarity) from a Protobuf polarity
  - fromProto
```
public static SemanticGraph fromProto(CoreNLPProtos.DependencyGraph proto,
                                      java.util.List<CoreLabel> sentence,
                                      java.lang.String docid)
```
    Voodoo magic to convert a serialized dependency graph into a SemanticGraph.
    This method needs the words from the sentence, such as we have when converting an entire document in the fromProto(CoreNLPProtos.Document) method.
    
    Parameters:
    
    proto - The serialized representation of the graph. This relies heavily on indexing into the original document.
    
    sentence - The raw sentence that this graph was saved from must be provided, as it is not saved in the serialized representation.
    
    docid - A docid must be supplied, as it is not saved by the serialized representation.
    
    Returns:
    
    A semantic graph corresponding to the saved object, on the provided sentence.
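    Why the sentence must be supplied can be seen in a small sketch. It assumes, for illustration only, that a saved edge holds just two token indices (1-based here) and a relation name; the Edge class and the resolve method are hypothetical and do not mirror the DependencyGraph proto field by field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: edges store token indices, so words come from the sentence. */
final class IndexedGraphSketch {

  static final class Edge {
    final int governor;  // 1-based token index (an assumption of this sketch)
    final int dependent; // 1-based token index
    final String relation;

    Edge(int governor, int dependent, String relation) {
      this.governor = governor;
      this.dependent = dependent;
      this.relation = relation;
    }
  }

  /** Resolve each index-only edge against the provided sentence tokens. */
  public static List<String> resolve(List<Edge> edges, List<String> sentence) {
    List<String> resolved = new ArrayList<>();
    for (Edge edge : edges) {
      resolved.add(edge.relation + "("
          + word(sentence, edge.governor) + "-" + edge.governor + ", "
          + word(sentence, edge.dependent) + "-" + edge.dependent + ")");
    }
    return resolved;
  }

  private static String word(List<String> sentence, int index) {
    if (index < 1 || index > sentence.size()) {
      throw new IllegalArgumentException("Edge refers to token " + index
          + " but the sentence has " + sentence.size() + " tokens");
    }
    return sentence.get(index - 1);
  }

  public static void main(String[] args) {
    List<String> sentence = Arrays.asList("Dogs", "bark");
    List<Edge> edges = Arrays.asList(new Edge(2, 1, "nsubj"));
    System.out.println(resolve(edges, sentence));
  }
}
```

    If the wrong sentence is passed in, the indices either point at the wrong words or fall out of range, which is why the graph is only meaningful together with the tokens it was saved from.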
  - fromProto
```
public static RelationTriple fromProto(CoreNLPProtos.RelationTriple proto,
                                       Annotation doc,
                                       int sentenceIndex)
```
    Return a RelationTriple object from the serialized representation. This requires a sentence index and a document so that (1) we have a docid with which the dependency tree can be accurately rebuilt, and (2) we have references to the tokens to include in the relation triple.
    
    Parameters:
    
    proto - The serialized relation triples.
    
    doc - The document we are deserializing. This document should already have a docid annotation set, if there is one.
    
    sentenceIndex - The index of the sentence this extraction should be attached to.
    
    Returns:
    
    A relation triple as a Java object, corresponding to the serialized proto.
  - fromProto
```
public static SentenceFragment fromProto(CoreNLPProtos.SentenceFragment fragment,
                                         SemanticGraph tree)
```
    Returns a sentence fragment from a given protocol buffer, and an associated parse tree.
    
    Parameters:
    
    fragment - The saved sentence fragment.
    
    tree - The parse tree for the whole sentence.
    
    Returns:
    
    A SentenceFragment object corresponding to the saved proto.
  - fromProto
```
public static java.util.HashMap<java.lang.String,java.lang.String> fromProto(CoreNLPProtos.MapStringString proto)
```
    Convert a serialized Map back into a Java Map.
    
    Parameters:
    
    proto - The serialized map.
    
    Returns:
    
    A Java Map corresponding to the serialized map.
  - fromProto
```
public static java.util.HashMap<java.lang.Integer,java.lang.String> fromProto(CoreNLPProtos.MapIntString proto)
```
    Convert a serialized Map back into a Java Map.
    
    Parameters:
    
    proto - The serialized map.
    
    Returns:
    
    A Java Map corresponding to the serialized map.
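    A map round trip of this kind can be sketched with plain Java. The sketch assumes the serialized form keeps keys and values in two parallel lists; that layout, the StringMapMessage class, and the method names are assumptions for illustration and not a description of the actual MapStringString or MapIntString messages.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: a map stored as two parallel lists and rebuilt as a HashMap. */
final class MapMessageSketch {

  static final class StringMapMessage {
    final List<String> keys = new ArrayList<>();
    final List<String> values = new ArrayList<>();
  }

  /** Store each entry at the same position in both lists. */
  public static StringMapMessage toMessage(Map<String, String> map) {
    StringMapMessage message = new StringMapMessage();
    for (Map.Entry<String, String> entry : map.entrySet()) {
      message.keys.add(entry.getKey());
      message.values.add(entry.getValue());
    }
    return message;
  }

  /** Pair the lists back up by position. */
  public static HashMap<String, String> fromMessage(StringMapMessage message) {
    if (message.keys.size() != message.values.size()) {
      throw new IllegalArgumentException("Key and value lists differ in length");
    }
    HashMap<String, String> map = new HashMap<>();
    for (int i = 0; i < message.keys.size(); i++) {
      map.put(message.keys.get(i), message.values.get(i));
    }
    return map;
  }

  public static void main(String[] args) {
    Map<String, String> original = new HashMap<>();
    original.put("lang", "en");
    original.put("source", "newswire");
    System.out.println(fromMessage(toMessage(original)).equals(original));
  }
}
```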
  - recoverOriginalText
```
protected java.lang.String recoverOriginalText(java.util.List<CoreLabel> tokens,
                                               CoreNLPProtos.Sentence sentence)
```
    Recover the CoreAnnotations.TextAnnotation field of a sentence from the tokens. This is useful if the text was not set in the protocol buffer, and therefore needs to be reconstructed from tokens.
    
    Parameters:
    
    tokens - The list of tokens representing this sentence.
    
    Returns:
    
    The original text of the sentence.
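    One way to rebuild sentence text from tokens is sketched below. It assumes each token knows its word and its starting character offset, and it pads gaps with single spaces; this is an approximation for illustration, not necessarily how recoverOriginalText works, and the Token class and recover method are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: rebuild sentence text from token words and character offsets. */
final class RecoverTextSketch {

  static final class Token {
    final String word;
    final int begin; // character offset of the token within the document

    Token(String word, int begin) {
      this.word = word;
      this.begin = begin;
    }
  }

  /** Concatenate the words, padding with spaces until each token's offset is reached. */
  public static String recover(List<Token> tokens) {
    if (tokens.isEmpty()) {
      return "";
    }
    StringBuilder text = new StringBuilder();
    int sentenceStart = tokens.get(0).begin;
    for (Token token : tokens) {
      // The original whitespace is unknown here, so gaps are filled with spaces.
      while (sentenceStart + text.length() < token.begin) {
        text.append(' ');
      }
      text.append(token.word);
    }
    return text.toString();
  }

  public static void main(String[] args) {
    List<Token> tokens = Arrays.asList(new Token("Dogs", 0), new Token("bark", 5), new Token(".", 9));
    System.out.println(recover(tokens));
  }
}
```

    Tabs or newlines in the original text would come back as spaces under this scheme, so the result matches the original only up to whitespace.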
