TokensRegexAnnotator is a customizable annotator for the StanfordCoreNLP pipeline. It is part of TokensRegex, a framework for defining patterns over text and mapping the matched text to semantic objects represented as Java objects.
By using the TokensRegexAnnotator, you can customize annotations based on regular expressions over sequences of tokens. It uses TokensRegex rules to define what patterns to match and what to annotate.
If you only want to use TokensRegex to recognize named entities with regular expressions, use the TokensRegexNERAnnotator instead.
customAnnotatorClass.[name]=edu.stanford.nlp.pipeline.TokensRegexAnnotator
[name].rules = [path to rules file]

Example:

customAnnotatorClass.color=edu.stanford.nlp.pipeline.TokensRegexAnnotator
color.rules = color.rules.txt
java -cp stanford-corenlp-2012-05-22.jar:stanford-corenlp-2012-05-22-models.jar:xom.jar:joda-time.jar -Xmx3g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,color -properties color.properties -file color.input.txt
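The same settings can also be assembled in Java rather than read from a properties file. The following is a minimal sketch that uses only java.util.Properties, so it runs without CoreNLP on the classpath. The class name TokensRegexProps and the method build are hypothetical helpers introduced for illustration, not part of CoreNLP, and the sketch assumes that setting the "annotators" property has the same effect as the -annotators command-line flag.

```java
import java.util.Properties;

// Hypothetical helper (not part of CoreNLP): assembles the properties
// that register a TokensRegexAnnotator under a custom annotator name.
class TokensRegexProps {

    static Properties build(String name, String rulesPath) {
        Properties props = new Properties();
        // Register the custom annotator class under the chosen name.
        props.setProperty("customAnnotatorClass." + name,
                "edu.stanford.nlp.pipeline.TokensRegexAnnotator");
        // Point the annotator at its TokensRegex rules file.
        props.setProperty(name + ".rules", rulesPath);
        // Run the custom annotator after tokenization and sentence splitting.
        props.setProperty("annotators", "tokenize,ssplit," + name);
        return props;
    }

    public static void main(String[] args) {
        Properties props = build("color", "color.rules.txt");
        props.forEach((key, value) -> System.out.println(key + " = " + value));
        // With the CoreNLP jars on the classpath, the pipeline would then be
        // created by passing these properties to the constructor:
        // StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    }
}
```

Building the properties in one place keeps the annotator name consistent across the three keys, which all have to agree for the custom annotator to be found.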