public class StanfordCoreNLP extends AnnotationPipeline
This class is designed to apply multiple Annotators
to an Annotation. The idea is that you first
build up the pipeline by adding Annotators, and then
you take the objects you wish to annotate and pass
them in and get in return a fully annotated object.
At the command-line level you can, e.g., tokenize text with StanfordCoreNLP with a command like:
java edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit -file document.txt
The main entry point for the API is StanfordCoreNLP.process().
Implementation note: There are other annotation pipelines, but they don't extend this one. Look for classes that implement Annotator and which have "Pipeline" in their name.
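The build-then-annotate pattern described above can be illustrated without the library on the classpath. The sketch below uses stand-in types only: `ToyAnnotation`, `ToyAnnotator`, and `ToyPipeline` are hypothetical names invented for this example, not CoreNLP classes, and the whitespace tokenizer is deliberately naive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ToyPipelineSketch {

    /** Stand-in for an annotation: named layers attached to a piece of text. */
    static class ToyAnnotation {
        final String text;
        final Map<String, Object> layers = new LinkedHashMap<>();

        ToyAnnotation(String text) {
            this.text = text;
        }
    }

    /** Stand-in for an annotator: reads the annotation and adds a layer to it. */
    interface ToyAnnotator {
        void annotate(ToyAnnotation annotation);
    }

    /** Stand-in for a pipeline: runs its annotators in the order they were added. */
    static class ToyPipeline implements ToyAnnotator {
        private final List<ToyAnnotator> annotators = new ArrayList<>();

        void addAnnotator(ToyAnnotator annotator) {
            annotators.add(annotator);
        }

        @Override
        public void annotate(ToyAnnotation annotation) {
            for (ToyAnnotator annotator : annotators) {
                annotator.annotate(annotation);
            }
        }

        /** Convenience wrapper: wrap raw text, annotate it, return the result. */
        ToyAnnotation process(String text) {
            ToyAnnotation annotation = new ToyAnnotation(text);
            annotate(annotation);
            return annotation;
        }
    }

    static ToyPipeline buildPipeline() {
        ToyPipeline pipeline = new ToyPipeline();
        // Naive whitespace tokenizer (illustrative only).
        pipeline.addAnnotator(a ->
            a.layers.put("tokens", Arrays.asList(a.text.trim().split("\\s+"))));
        // Reads the "tokens" layer, so it must be added after the tokenizer.
        pipeline.addAnnotator(a ->
            a.layers.put("tokenCount", ((List<?>) a.layers.get("tokens")).size()));
        return pipeline;
    }

    public static void main(String[] args) {
        ToyAnnotation result = buildPipeline().process("Stanford is in California .");
        System.out.println(result.layers);
    }
}
```

With the real library, the corresponding steps would be to construct a StanfordCoreNLP from a Properties object and call process(String), both of which appear in the summaries on this page. The order of annotators matters in the same way: an annotator that reads another's output has to run after it.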
Nested classes/interfaces inherited from interface Annotator: Annotator.Requirement
Modifier and Type | Field and Description |
---|---|
static String | CUSTOM_ANNOTATOR_PREFIX |
static String | DEFAULT_NEWLINE_IS_SENTENCE_BREAK |
static String | DEFAULT_OUTPUT_FORMAT |
static String | NEWLINE_IS_SENTENCE_BREAK_PROPERTY |
static String | NEWLINE_SPLITTER_PROPERTY |
protected static AnnotatorPool | pool - Maintains the shared pool of annotators |
Fields inherited from class AnnotationPipeline: TIME
Fields inherited from interface Annotator: BINARIZED_TREES_REQUIREMENT, CLEAN_XML_REQUIREMENT, COLUMN_DATA_CLASSIFIER, DETERMINISTIC_COREF_REQUIREMENT, GENDER_REQUIREMENT, GUTIME_REQUIREMENT, HEIDELTIME_REQUIREMENT, LEMMA_REQUIREMENT, NER_REQUIREMENT, NUMBER_REQUIREMENT, PARSE_AND_TAG, PARSE_REQUIREMENT, PARSE_TAG_BINARIZED_TREES, POS_REQUIREMENT, QUANTIFIABLE_ENTITY_NORMALIZATION_REQUIREMENT, RELATION_EXTRACTOR_REQUIREMENT, SSPLIT_REQUIREMENT, STANFORD_CLEAN_XML, STANFORD_COLUMN_DATA_CLASSIFIER, STANFORD_DEPENDENCIES, STANFORD_DETERMINISTIC_COREF, STANFORD_GENDER, STANFORD_LEMMA, STANFORD_NER, STANFORD_PARSE, STANFORD_POS, STANFORD_REGEXNER, STANFORD_RELATION, STANFORD_SENTIMENT, STANFORD_SSPLIT, STANFORD_TOKENIZE, STANFORD_TRUECASE, STEM_REQUIREMENT, SUTIME_REQUIREMENT, TIME_WORDS_REQUIREMENT, TOKENIZE_AND_SSPLIT, TOKENIZE_REQUIREMENT, TOKENIZE_SSPLIT_NER, TOKENIZE_SSPLIT_PARSE, TOKENIZE_SSPLIT_PARSE_NER, TOKENIZE_SSPLIT_POS, TOKENIZE_SSPLIT_POS_LEMMA, TRUECASE_REQUIREMENT
Constructor and Description |
---|
StanfordCoreNLP() - Constructs a pipeline using as properties the properties file found in the classpath |
StanfordCoreNLP(Properties props) - Construct a basic pipeline. |
StanfordCoreNLP(Properties props, boolean enforceRequirements) |
StanfordCoreNLP(String propsFileNamePrefix) - Constructs a pipeline with the properties read from this file, which must be found in the classpath |
StanfordCoreNLP(String propsFileNamePrefix, boolean enforceRequirements) |
Modifier and Type | Method and Description |
---|---|
void | annotate(Annotation annotation) - Run the pipeline on an input annotation. |
static void | clearAnnotatorPool() - Call this if you are no longer using StanfordCoreNLP and want to release the memory associated with the annotators. |
void | conllPrint(Annotation annotation, Writer w) - Displays the output of many annotators in CoNLL format. |
protected AnnotatorImplementations | getAnnotatorImplementations() - Get the implementation of each relevant annotator in the pipeline. |
double | getBeamPrintingOption() |
TreePrint | getConstituentTreePrinter() |
protected AnnotatorPool | getDefaultAnnotatorPool(Properties inputProps, AnnotatorImplementations annotatorImplementation) - Construct the default annotator pool from the passed properties, and overwriting annotations which have changed since the last |
TreePrint | getDependencyTreePrinter() |
String | getEncoding() |
static Annotator | getExistingAnnotator(String name) |
boolean | getPrintSingletons() |
Properties | getProperties() - Fetches the Properties object used to construct this Annotator |
static boolean | isXMLOutputPresent() |
void | jsonPrint(Annotation annotation, Writer w) - Displays the output of all annotators in JSON format. |
static void | main(String[] args) - This can be used just for testing or for command-line text processing. |
void | prettyPrint(Annotation annotation, OutputStream os) - Displays the output of all annotators in a format easily readable by people. |
void | prettyPrint(Annotation annotation, PrintWriter os) - Displays the output of all annotators in a format easily readable by people. |
protected static void | printHelp(PrintStream os, String helpTopic) - Prints the list of properties required to run the pipeline |
Annotation | process(String text) - Runs the entire pipeline on the content of the given text passed in. |
void | processFiles(Collection<File> files) |
void | processFiles(Collection<File> files, int numThreads) |
void | processFiles(String base, Collection<File> files, int numThreads) |
void | run() |
String | timingInformation() - Return a String that gives detailed human-readable information about how much time was spent by each annotator and by the entire annotation pipeline. |
static boolean | usesBinaryTrees(Properties props) - Determines whether the parser annotator should default to producing binary trees. |
void | xmlPrint(Annotation annotation, OutputStream os) - Displays the output of all annotators in XML format. |
void | xmlPrint(Annotation annotation, Writer w) - Wrapper around xmlPrint(Annotation, OutputStream). |
Methods inherited from class AnnotationPipeline: addAnnotator, annotate, annotate, annotate, annotate, getTotalTime, requirementsSatisfied, requires
public static final String CUSTOM_ANNOTATOR_PREFIX
public static final String NEWLINE_SPLITTER_PROPERTY
public static final String NEWLINE_IS_SENTENCE_BREAK_PROPERTY
public static final String DEFAULT_NEWLINE_IS_SENTENCE_BREAK
public static final String DEFAULT_OUTPUT_FORMAT
protected static AnnotatorPool pool
public StanfordCoreNLP()
public StanfordCoreNLP(Properties props)
public StanfordCoreNLP(Properties props, boolean enforceRequirements)
public StanfordCoreNLP(String propsFileNamePrefix)
propsFileNamePrefix -
public StanfordCoreNLP(String propsFileNamePrefix, boolean enforceRequirements)
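The Properties-based constructors take an ordinary java.util.Properties object. As a rough sketch of preparing one, the example below assumes that an `annotators` key mirrors the `-annotators` command-line flag shown earlier; that key name is an assumption, since this page does not spell it out. `PipelinePropsSketch` and its helper methods are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class PipelinePropsSketch {

    /** Loads properties from text written in the standard .properties format. */
    static Properties load(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            // Reading from an in-memory string should not fail; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Splits the comma-separated "annotators" value (assumed key) into trimmed names. */
    static List<String> annotatorNames(Properties props) {
        List<String> names = new ArrayList<>();
        for (String name : props.getProperty("annotators", "").split(",")) {
            if (!name.trim().isEmpty()) {
                names.add(name.trim());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Properties props = load("annotators = tokenize, ssplit\n");
        System.out.println(annotatorNames(props));
        // With CoreNLP on the classpath, props would then be passed to
        // new StanfordCoreNLP(props). That call is left out so the sketch
        // compiles with the JDK alone.
    }
}
```

The same text could live in a properties file on the classpath for the String-based constructors, which read their properties from a named file.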
protected AnnotatorImplementations getAnnotatorImplementations()
Get the implementation of each relevant annotator in the pipeline. The primary use of this method is to be overwritten by subclasses of StanfordCoreNLP to call different annotators that obey the exact same contract as the default annotator.
The canonical use case for this is as an implementation of the Curator server, where the annotators make server calls rather than calling each annotator locally.
AnnotatorImplementations.
public Properties getProperties()
public TreePrint getConstituentTreePrinter()
public TreePrint getDependencyTreePrinter()
public double getBeamPrintingOption()
public String getEncoding()
public boolean getPrintSingletons()
public static boolean isXMLOutputPresent()
public static void clearAnnotatorPool()
protected AnnotatorPool getDefaultAnnotatorPool(Properties inputProps, AnnotatorImplementations annotatorImplementation)
Parameters:
inputProps -
annotatorImplementation -
public void annotate(Annotation annotation)
Description copied from class: AnnotationPipeline
Specified by:
annotate in interface Annotator
Overrides:
annotate in class AnnotationPipeline
Parameters:
annotation - The input annotation, usually a raw document
public static boolean usesBinaryTrees(Properties props)
public Annotation process(String text)
Parameters:
text - The text to process
public void prettyPrint(Annotation annotation, OutputStream os)
Parameters:
annotation - Contains the output of all annotators
os - The output stream
public void prettyPrint(Annotation annotation, PrintWriter os)
Parameters:
annotation - Contains the output of all annotators
os - The output stream
public void xmlPrint(Annotation annotation, Writer w) throws IOException
Parameters:
annotation -
w - The Writer to send the output to
Throws:
IOException
public void jsonPrint(Annotation annotation, Writer w) throws IOException
Parameters:
annotation - Contains the output of all annotators
w - The Writer to send the output to
Throws:
IOException
public void conllPrint(Annotation annotation, Writer w) throws IOException
Parameters:
annotation - Contains the output of all annotators
w - The Writer to send the output to
Throws:
IOException
public void xmlPrint(Annotation annotation, OutputStream os) throws IOException
Parameters:
annotation - Contains the output of all annotators
os - The output stream
Throws:
IOException
protected static void printHelp(PrintStream os, String helpTopic)
Parameters:
os - PrintStream to print usage to
helpTopic - a topic to print help about (or null for general options)
public String timingInformation()
println().
Overrides:
timingInformation in class AnnotationPipeline
public void processFiles(String base, Collection<File> files, int numThreads) throws IOException
Throws:
IOException
public void processFiles(Collection<File> files, int numThreads) throws IOException
Throws:
IOException
public void processFiles(Collection<File> files) throws IOException
Throws:
IOException
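The numThreads parameter suggests that files can be processed concurrently, although this page does not describe how. The following is a generic sketch of fanning a collection of files out over a fixed-size thread pool using only the JDK. It is not CoreNLP's implementation; `ParallelFilesSketch` and the `perFile` callback are hypothetical names introduced for the example.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ParallelFilesSketch {

    /** Applies perFile to every file on a pool of numThreads workers. */
    static <R> List<R> processFiles(Collection<File> files, int numThreads,
                                    Function<File, R> perFile) {
        ExecutorService executor = Executors.newFixedThreadPool(Math.max(1, numThreads));
        try {
            // Submit one task per file; the pool limits how many run at once.
            List<Future<R>> futures = new ArrayList<>();
            for (File file : files) {
                Callable<R> task = () -> perFile.apply(file);
                futures.add(executor.submit(task));
            }
            // Collect results in submission order, waiting for each task in turn.
            List<R> results = new ArrayList<>();
            for (Future<R> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while processing files", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A file failed to process", e.getCause());
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        List<File> files = Arrays.asList(new File("a.txt"), new File("b.txt"));
        // The callback only inspects the file name, so no files need to exist.
        System.out.println(processFiles(files, 2, File::getName));
    }
}
```

In a real pipeline the callback would read the file, annotate its text, and write the output. Whether a single pipeline instance can be shared safely across threads depends on the annotators involved, so that part is left open here.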
public void run() throws IOException
Throws:
IOException
public static void main(String[] args) throws IOException, ClassNotFoundException
Example usage:
java -mx6g edu.stanford.nlp.pipeline.StanfordCoreNLP properties
Parameters:
args - List of required properties
Throws:
IOException - If IO problem
ClassNotFoundException - If class loading problem