public class WordsToSentencesAnnotator extends Object implements Annotator
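This class assumes that there is a List<? extends CoreLabel> under the TokensAnnotation field, runs it through WordToSentenceProcessor, and puts the new List<List<? extends CoreLabel>> under the SentencesAnnotation field.
Nested classes/interfaces inherited from interface Annotator: Annotator.Requirement
Fields inherited from interface Annotator: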
BINARIZED_TREES_REQUIREMENT, CLEAN_XML_REQUIREMENT, COLUMN_DATA_CLASSIFIER, DETERMINISTIC_COREF_REQUIREMENT, GENDER_REQUIREMENT, GUTIME_REQUIREMENT, HEIDELTIME_REQUIREMENT, LEMMA_REQUIREMENT, NER_REQUIREMENT, NUMBER_REQUIREMENT, PARSE_AND_TAG, PARSE_REQUIREMENT, PARSE_TAG_BINARIZED_TREES, POS_REQUIREMENT, QUANTIFIABLE_ENTITY_NORMALIZATION_REQUIREMENT, RELATION_EXTRACTOR_REQUIREMENT, SSPLIT_REQUIREMENT, STANFORD_CLEAN_XML, STANFORD_COLUMN_DATA_CLASSIFIER, STANFORD_DEPENDENCIES, STANFORD_DETERMINISTIC_COREF, STANFORD_GENDER, STANFORD_LEMMA, STANFORD_NER, STANFORD_PARSE, STANFORD_POS, STANFORD_REGEXNER, STANFORD_RELATION, STANFORD_SENTIMENT, STANFORD_SSPLIT, STANFORD_TOKENIZE, STANFORD_TRUECASE, STEM_REQUIREMENT, SUTIME_REQUIREMENT, TIME_WORDS_REQUIREMENT, TOKENIZE_AND_SSPLIT, TOKENIZE_REQUIREMENT, TOKENIZE_SSPLIT_NER, TOKENIZE_SSPLIT_PARSE, TOKENIZE_SSPLIT_PARSE_NER, TOKENIZE_SSPLIT_POS, TOKENIZE_SSPLIT_POS_LEMMA, TRUECASE_REQUIREMENT
Constructor and Description |
---|
WordsToSentencesAnnotator() |
WordsToSentencesAnnotator(boolean verbose) |
WordsToSentencesAnnotator(boolean verbose, String boundaryTokenRegex, Set<String> boundaryToDiscard, Set<String> htmlElementsToDiscard, String newlineIsSentenceBreak) |
WordsToSentencesAnnotator(boolean verbose, String boundaryTokenRegex, Set<String> boundaryToDiscard, Set<String> htmlElementsToDiscard, String newlineIsSentenceBreak, String boundaryMultiTokenRegex, Set<String> tokenRegexesToDiscard) |
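The parameter names boundaryTokenRegex and boundaryToDiscard suggest two kinds of sentence boundary. The following is a minimal, library-free sketch of that idea, not the WordToSentenceProcessor implementation. It assumes that a token matching boundaryTokenRegex ends a sentence and is kept, and that a token in boundaryToDiscard ends a sentence and is dropped. The class BoundarySplitSketch and its split method are hypothetical names, and the remaining constructor parameters are not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical illustration only; plain strings stand in for CoreLabel tokens.
final class BoundarySplitSketch {

    /**
     * Splits a flat token list into sentences.
     * Assumption: tokens matching boundaryTokenRegex close a sentence and are kept,
     * tokens contained in boundaryToDiscard close a sentence and are dropped.
     */
    static List<List<String>> split(List<String> tokens,
                                    String boundaryTokenRegex,
                                    Set<String> boundaryToDiscard) {
        Pattern boundary = Pattern.compile(boundaryTokenRegex);
        List<List<String>> sentences = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String token : tokens) {
            if (boundaryToDiscard.contains(token)) {
                // Discarded boundary: close any open sentence, drop the token itself.
                if (!current.isEmpty()) {
                    sentences.add(current);
                    current = new ArrayList<>();
                }
            } else {
                current.add(token);
                if (boundary.matcher(token).matches()) {
                    // Kept boundary: the token is the last one of its sentence.
                    sentences.add(current);
                    current = new ArrayList<>();
                }
            }
        }
        // Trailing tokens without a final boundary still form a sentence.
        if (!current.isEmpty()) {
            sentences.add(current);
        }
        return sentences;
    }

    public static void main(String[] args) {
        List<String> tokens =
            Arrays.asList("Hello", "world", ".", "\n", "How", "are", "you", "?");
        System.out.println(split(tokens, "[.!?]", Collections.singleton("\n")).size());
    }
}
```

In this sketch the sentence-final punctuation stays attached to its sentence, while the newline token only acts as a separator.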
Modifier and Type | Method and Description |
---|---|
void | annotate(Annotation annotation): If setCountLineNumbers is set to true, line numbers are counted by telling the underlying splitter to return empty lists of tokens and then treating those empty lists as empty lines. |
static WordsToSentencesAnnotator | newlineSplitter(boolean verbose, String... nlToken): Returns a WordsToSentencesAnnotator that splits only on newlines, which are then deleted. |
static WordsToSentencesAnnotator | nonSplitter(boolean verbose): Returns a WordsToSentencesAnnotator that never splits the token stream. |
Set<Annotator.Requirement> | requirementsSatisfied(): Returns the set of requirements that this annotator can satisfy. |
Set<Annotator.Requirement> | requires(): Returns the set of requirements that this annotator needs in order to run. |
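The line-counting behavior described for annotate can be illustrated without the library. The sketch below assumes the splitter's output is a list of token lists, one per line, where an empty list stands for an empty line. Line numbers are taken to be 1-based here, which is an assumption. LineNumberSketch and numberLines are hypothetical names, not part of the API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; plain strings stand in for CoreLabel tokens.
final class LineNumberSketch {

    /**
     * Maps each non-empty sentence to the line it came from.
     * Every list advances the line counter; empty lists are counted
     * as blank lines but produce no sentence.
     */
    static Map<Integer, List<String>> numberLines(List<List<String>> splitterOutput) {
        Map<Integer, List<String>> sentenceByLine = new LinkedHashMap<>();
        int lineNumber = 0;
        for (List<String> sentence : splitterOutput) {
            lineNumber++;              // assumption: 1-based line numbers
            if (sentence.isEmpty()) {
                continue;              // an empty list is just an empty line
            }
            sentenceByLine.put(lineNumber, sentence);
        }
        return sentenceByLine;
    }

    public static void main(String[] args) {
        List<List<String>> output = new ArrayList<>();
        output.add(Arrays.asList("first", "line"));
        output.add(new ArrayList<>());   // blank line
        output.add(Arrays.asList("third", "line"));
        System.out.println(numberLines(output).keySet());
    }
}
```

The point of keeping the empty lists is that the counter still advances over blank lines, so later sentences keep their true line position.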
public WordsToSentencesAnnotator()
public WordsToSentencesAnnotator(boolean verbose)
public WordsToSentencesAnnotator(boolean verbose, String boundaryTokenRegex, Set<String> boundaryToDiscard, Set<String> htmlElementsToDiscard, String newlineIsSentenceBreak)
public static WordsToSentencesAnnotator newlineSplitter(boolean verbose, String... nlToken)
verbose - Whether it is verbose.
nlToken - Zero or more newline tokens, which might be a \n or the fake newline tokens returned from the tokenizer.
public static WordsToSentencesAnnotator nonSplitter(boolean verbose)
verbose - Whether it is verbose.
public void annotate(Annotation annotation)
public Set<Annotator.Requirement> requires()
Specified by: requires in interface Annotator
public Set<Annotator.Requirement> requirementsSatisfied()
Specified by: requirementsSatisfied in interface Annotator
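The difference between newlineSplitter and nonSplitter can be sketched on plain token lists. This is a library-free illustration under stated assumptions, not the class's code: newline tokens separate sentences and are deleted, consecutive newline tokens yield no empty sentence, and the non-splitting mode returns all tokens as one sentence. SplitterModesSketch, splitOnNewlines, and noSplit are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; plain strings stand in for CoreLabel tokens.
final class SplitterModesSketch {

    /** Splits only at the given newline tokens and drops those tokens. */
    static List<List<String>> splitOnNewlines(List<String> tokens, String... nlTokens) {
        Set<String> newlineTokens = new HashSet<>(Arrays.asList(nlTokens));
        List<List<String>> sentences = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String token : tokens) {
            if (newlineTokens.contains(token)) {
                // Assumption: empty sentences are not emitted.
                if (!current.isEmpty()) {
                    sentences.add(current);
                    current = new ArrayList<>();
                }
            } else {
                current.add(token);
            }
        }
        if (!current.isEmpty()) {
            sentences.add(current);
        }
        return sentences;
    }

    /** Never splits: the whole token stream becomes a single sentence. */
    static List<List<String>> noSplit(List<String> tokens) {
        return Collections.singletonList(new ArrayList<>(tokens));
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("a", "\n", "b", "c", "\n");
        System.out.println(splitOnNewlines(tokens, "\n").size());
        System.out.println(noSplit(tokens).size());
    }
}
```

Because nlTokens is a varargs parameter, passing no newline tokens at all makes splitOnNewlines behave like noSplit for non-empty input in this sketch.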