edu.stanford.nlp.ling.tokensregex (Stanford JavaNLP API)

Interface Summary
Interface	Description
Env.Binder	Interface for performing custom binding of values to the environment
MultiPatternMatcher.NodePatternTrigger<T>	A function which returns a collection of patterns that may match when given a single node from a larger sequence.
MultiPatternMatcher.SequencePatternTrigger<T>	A function which returns a collection of patterns that may match when given a sequence of nodes.
NodePatternTransformer<T1,T2>	Interface to transform a node pattern from a `NodePattern<T1>` into a `NodePattern<T2>`.
PhraseTable.WordList
SequenceMatchAction<T>	Performs action on a sequence
SequenceMatcher.MatchReplacement<T>	Interface that specifies what to replace a matched pattern with.
SequenceMatchResult<T>	The result of a match against a sequence.
SequenceMatchRules.ExtractRule<I,O>	Interface for a rule that extracts a list of matched items from an input.
SequenceMatchRules.Rule	A sequence match rule.
SequencePattern.NodesMatchChecker<T>
SequencePattern.Parser<T>

Class Summary
Class	Description
BasicSequenceMatchResult<T>	Basic results for a Sequence Match
BasicSequenceMatchResult.MatchedGroup
ComplexNodePattern<M,K>	Pattern for matching a complex data structure
ComplexNodePattern.AbstractStringAnnotationPattern
ComplexNodePattern.AttributesEqualMatchChecker<K>
ComplexNodePattern.IntegerAnnotationPattern
ComplexNodePattern.NilAnnotationPattern
ComplexNodePattern.NotNilAnnotationPattern
ComplexNodePattern.NumericAnnotationPattern
ComplexNodePattern.SequenceRegexPattern<T>
ComplexNodePattern.StringAnnotationPattern
ComplexNodePattern.StringAnnotationRegexPattern
ComplexNodePattern.StringInSetAnnotationPattern
CoreMapExpressionExtractor<T extends MatchedExpression>	Represents a list of assignment and extraction rules over sequence patterns.
CoreMapExpressionExtractor.Stage<T>	Describes one stage of extraction.
CoreMapExpressionNodePattern	Pattern for matching a CoreMap using a generic expression
CoreMapNodePattern	Pattern for matching a CoreMap
CoreMapNodePattern.AttributesEqualMatchChecker<K>
CoreMapNodePatternTrigger	Trigger for CoreMap Node Patterns.
CoreMapSequenceMatchAction<T extends CoreMap>	Performs an action on a matched sequence
CoreMapSequenceMatchAction.AnnotateAction<T extends CoreMap>
CoreMapSequenceMatchAction.MergeAction
CoreMapSequenceMatcher<T extends CoreMap>	CoreMap Sequence Matcher for regular expressions for sequences over CoreMaps.
CoreMapSequenceMatcher.BasicCoreMapSequenceMatcher
Env	Holds environment variables to be used for compiling string into a pattern.
EnvLookup	Provides lookup functions using an Env
MapNodePattern<M extends java.util.Map<K,java.lang.Object>,K>	Pattern for matching a Map from keys K to objects
MatchedExpression	Matched Expression represents a chunk of text that was matched from an original segment of text.
MatchedExpression.SingleAnnotationExtractor	Function that takes a CoreMap and applies an extraction function to it to get a value.
MultiCoreMapNodePattern	Pattern for matching across multiple core maps.
MultiCoreMapNodePattern.StringSequenceAnnotationPattern
MultiNodePattern<T>	Matches potentially multiple nodes (i.e., can match across multiple tokens)
MultiNodePattern.IntersectMultiNodePattern<T>
MultiNodePattern.UnionMultiNodePattern<T>
MultiPatternMatcher<T>	Matcher that takes in multiple patterns.
MultiPatternMatcher.BasicSequencePatternTrigger<T>	Simple SequencePatternTrigger that looks at each node, and identifies which patterns may potentially match each node, and then aggregates (union) all these patterns together.
MultiWordStringMatcher	Finds multi-word strings in a piece of text
MultiWordStringMatcher.LongestStringComparator
NodePattern<T>	Matches a node (i.e., a token).
NodePattern.AnyNodePattern<T>	Matches any node
NodePattern.ConjNodePattern<T>	Given a list of patterns p1,...,pn, matches if all patterns p1,...,pn match
NodePattern.DisjNodePattern<T>	Given a list of patterns p1,...,pn, matches if one of the patterns p1,...,pn matches
NodePattern.EqualsNodePattern<T>	Matches a constant value of type T using equals()
NodePattern.NegateNodePattern<T>	Given a node pattern p, a node x matches if p does not match x
PhraseTable	Table used to look up multi-word phrases.
PhraseTable.Phrase	A phrase is a multiword expression
PhraseTable.PhraseMatch	Represents a matched phrase
PhraseTable.PhraseStringCollection
PhraseTable.StringList
PhraseTable.TokenList
ProcessTokensRegexRequest	This class contains static methods for processing tokensregex requests on a Document.
SequenceMatchAction.BoundAction<T>
SequenceMatchAction.BranchAction<T>
SequenceMatchAction.NextMatchAction<T>
SequenceMatchAction.SeriesAction<T>
SequenceMatchAction.StartMatchAction<T>
SequenceMatcher<T>	A generic sequence matcher.
SequenceMatcher.BasicMatchReplacement<T>	Replacement item is a sequence of items.
SequenceMatcher.GroupMatchReplacement<T>	Replacement item is a matched group specified with a group id.
SequenceMatcher.NamedGroupMatchReplacement<T>	Replacement item is a matched group specified with a group name.
SequenceMatchResult.GroupToIntervalFunc<MR extends java.util.regex.MatchResult>
SequenceMatchResult.MatchedGroupInfo<T>	Information about a matched group.
SequenceMatchRules	Rules for matching sequences using regular expressions.
SequenceMatchRules.AnnotationExtractRule<S,T extends MatchedExpression>	Rule that specifies how to extract sequence of MatchedExpression from an annotation (CoreMap).
SequenceMatchRules.AnnotationExtractRuleCreator
SequenceMatchRules.AnnotationMatchedFilter
SequenceMatchRules.AssignmentRule	Rule that specifies what value to assign to a variable.
SequenceMatchRules.BasicSequenceExtractRule	Extraction rule.
SequenceMatchRules.CompositeExtractRuleCreator
SequenceMatchRules.CoreMapExtractRule<T,O>	Extraction rule to apply an extraction rule on a particular CoreMap field.
SequenceMatchRules.CoreMapFunctionApplier<T,O>
SequenceMatchRules.CoreMapToListExtractRule<O>	Extraction rule that treats a single CoreMap as a list/sequence of CoreMaps.
SequenceMatchRules.CoreMapToListFunctionApplier<O>
SequenceMatchRules.FilterExtractRule<I,O>	Extraction rule that filters the input before passing it on to the next extractor.
SequenceMatchRules.ListExtractRule<I,O>	Extraction rule that applies a list of rules in sequence and aggregates all matches found.
SequenceMatchRules.MultiSequencePatternExtractRule<T,O>
SequenceMatchRules.MultiTokenPatternExtractRuleCreator
SequenceMatchRules.SequenceMatchedExpressionExtractor
SequenceMatchRules.SequenceMatchResultExtractor<T>
SequenceMatchRules.SequencePatternExtractRule<T,O>
SequenceMatchRules.StringMatchedExpressionExtractor
SequenceMatchRules.StringMatchResultExtractor
SequenceMatchRules.StringPatternExtractRule<O>
SequenceMatchRules.TextPatternExtractRuleCreator
SequenceMatchRules.TokenPatternExtractRuleCreator
SequencePattern<T>	Generic Sequence Pattern for regular expressions.
SequencePattern.AndPatternExpr
SequencePattern.BackRefPatternExpr
SequencePattern.GroupPatternExpr	Expression that represents a group.
SequencePattern.MultiNodePatternExpr	Represents a pattern that can match multiple nodes.
SequencePattern.NodePatternExpr	Represents one element to be matched.
SequencePattern.OrPatternExpr	Expression that represents a disjunction.
SequencePattern.PatternExpr	Represents a sequence pattern expression (before translating into NFA).
SequencePattern.RepeatPatternExpr	Expression that represents a pattern that repeats a number of times.
SequencePattern.SequenceEndPatternExpr
SequencePattern.SequencePatternExpr	Represents a sequence of patterns to be matched.
SequencePattern.SequenceStartPatternExpr
SequencePattern.SpecialNodePatternExpr	Represents one element to be matched.
SequencePattern.ValuePatternExpr
TokenSequenceMatcher	Token Sequence Matcher for regular expressions over sequences of tokens.
TokenSequencePattern	Token Sequence Pattern for regular expressions over sequences of tokens (each represented as a `CoreMap`).

Enum Summary
Enum	Description
MultiWordStringMatcher.MatchType	`EXCT`: match exact string. `EXCTWS`: match exact string, except that whitespace can match multiple whitespace characters. `LWS`: match case-insensitive string, except that whitespace can match multiple whitespace characters. `LNRM`: disregard punctuation and do a case-insensitive match. `REGEX`: interpret the string as a regex already.
SequenceMatcher.FindType	Type of search to perform. FIND_NONOVERLAPPING: find nonoverlapping matches (default). FIND_ALL: find all potential matches; greedy/reluctant quantifiers are not enforced (perhaps should add syntax where some of them are enforced...).

Package edu.stanford.nlp.ling.tokensregex Description

This package contains a library, TokensRegex, for matching regular expressions over tokens. TokensRegex is incorporated into the TokensRegexAnnotator, the TokensRegexNERAnnotator, and the SUTime functionality in NERCombinerAnnotator.

Rules for extracting expressions using TokensRegex

TokensRegex provides a language for specifying rules to extract expressions over a token sequence.

CoreMapExpressionExtractor and SequenceMatchRules describe the language and how the extraction rules are created.

Core classes for token sequence matching using TokensRegex

At the core of TokensRegex are the TokenSequenceMatcher and TokenSequencePattern classes, which can be used to match patterns over sequences of tokens. The usage follows the paradigm of the Java regular expression library java.util.regex, except that matching is done over a List<CoreMap> instead of a String.

Example:

  List<CoreLabel> tokens = ...;
  TokenSequencePattern pattern = TokenSequencePattern.compile(...);
  TokenSequenceMatcher matcher = pattern.getMatcher(tokens);

The classes SequenceMatcher and SequencePattern can be used to build classes for recognizing regular expressions over sequences of arbitrary types.
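
To give a rough idea of what matching over arbitrary element types means, here is a deliberately simplified, standard-library-only sketch. It is not the SequencePattern or SequenceMatcher API: each java.util.function.Predicate stands in for a node pattern, there are no quantifiers or groups, and the names NodeSequenceSketch and findFirst are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class NodeSequenceSketch {
    // Returns the start index of the first window in which every element
    // satisfies the corresponding node predicate, or -1 if there is none.
    static <T> int findFirst(List<T> nodes, List<Predicate<T>> nodePatterns) {
        for (int start = 0; start + nodePatterns.size() <= nodes.size(); start++) {
            boolean allMatch = true;
            for (int i = 0; i < nodePatterns.size(); i++) {
                if (!nodePatterns.get(i).test(nodes.get(start + i))) {
                    allMatch = false;
                    break;
                }
            }
            if (allMatch) {
                return start;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<Integer> nodes = Arrays.asList(3, 8, 2, 4, 7);
        Predicate<Integer> even = n -> n % 2 == 0;
        Predicate<Integer> odd = even.negate(); // analogous in spirit to a negated node pattern
        // Look for "even, even, odd" anywhere in the sequence.
        System.out.println(findFirst(nodes, Arrays.asList(even, even, odd)));
        // prints 2
    }
}
```

The element type here is Integer, but the same loop works for any T, which is the property the generic classes provide for full regular expressions.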

Utility classes

TokensRegex also offers a group of utility classes.

MultiPatternMatcher provides utility functions for finding expressions with multiple patterns. For instance, using MultiPatternMatcher.findNonOverlapping(java.util.List<? extends T>) you can find all nonoverlapping subsequences for a given set of patterns.
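
The idea of reducing candidate matches to a nonoverlapping set can be illustrated with a small standard-library sketch. The preference order used here (longer spans first, then earlier start) is an assumption made for illustration and may differ from how MultiPatternMatcher ranks matches; the names NonOverlappingSketch and selectNonOverlapping are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class NonOverlappingSketch {
    // Greedily keeps [begin, end) spans that do not overlap a span already
    // kept, preferring longer spans and, among equal lengths, earlier ones.
    static List<int[]> selectNonOverlapping(List<int[]> candidates) {
        List<int[]> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator
                .comparingInt((int[] s) -> s[0] - s[1])      // negative length: longest first
                .thenComparingInt((int[] s) -> s[0]));       // then earliest start
        List<int[]> chosen = new ArrayList<>();
        for (int[] span : sorted) {
            boolean overlaps = false;
            for (int[] kept : chosen) {
                if (span[0] < kept[1] && kept[0] < span[1]) {
                    overlaps = true;
                    break;
                }
            }
            if (!overlaps) {
                chosen.add(span);
            }
        }
        chosen.sort(Comparator.comparingInt((int[] s) -> s[0])); // report in sequence order
        return chosen;
    }

    public static void main(String[] args) {
        // Hypothetical candidate spans produced by several patterns.
        List<int[]> candidates = Arrays.asList(
                new int[] {0, 2}, new int[] {1, 4}, new int[] {4, 5}, new int[] {3, 5});
        for (int[] span : selectNonOverlapping(candidates)) {
            System.out.println(Arrays.toString(span));
        }
    }
}
```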

To find character offsets of multi-word expressions in a String, you can also use MultiWordStringMatcher.findTargetStringOffsets(java.lang.String, java.lang.String).
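
As a rough illustration of what finding character offsets of a multi-word expression involves, the sketch below uses only java.util.regex. It approximates the whitespace-tolerant behavior that the MatchType summary describes for `EXCTWS`; it is not the MultiWordStringMatcher implementation, and the names PhraseOffsetSketch and findOffsets are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PhraseOffsetSketch {
    // Returns [begin, end) character offsets of every occurrence of the phrase,
    // letting each whitespace gap in the phrase match one or more whitespace characters.
    static List<int[]> findOffsets(String text, String phrase) {
        String[] words = phrase.trim().split("\\s+");
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < words.length; i++) {
            if (i > 0) {
                regex.append("\\s+");
            }
            regex.append(Pattern.quote(words[i])); // treat each word literally
        }
        Matcher matcher = Pattern.compile(regex.toString()).matcher(text);
        List<int[]> offsets = new ArrayList<>();
        while (matcher.find()) {
            offsets.add(new int[] {matcher.start(), matcher.end()});
        }
        return offsets;
    }

    public static void main(String[] args) {
        // The second occurrence separates the words with a space and a tab.
        String text = "New York is big. New \tYork is busy.";
        for (int[] span : findOffsets(text, "New York")) {
            System.out.println(span[0] + "-" + span[1]);
        }
    }
}
```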

Author: Angel Chang (angelx@stanford.edu)