public class TokenSequencePattern extends SequencePattern<CoreMap>
A pattern for matching regular expressions over sequences of tokens (each token represented as a CoreMap).
Sequences over tokens can be matched like strings.
To use:
TokenSequencePattern p = TokenSequencePattern.compile("....");
TokenSequenceMatcher m = p.getMatcher(tokens);
while (m.find()) ....
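Supports the following:
- Concatenation: X Y
- Or: X | Y
- And: X & Y
- Groups:
  - capturing: (X) (with numeric group id)
  - capturing: (?$var X) (with group name "$var")
  - non-capturing: (?:X)
  Capturing groups can be retrieved by group id or by group variable, either as the matched string (m.group()) or as a list of tokens (m.groupNodes()).
  - To retrieve a group by id: m.group(id) or m.groupNodes(id)
  - To retrieve a group by bound variable name: m.group("$var") or m.groupNodes("$var")
  See SequenceMatchResult for more accessor functions to retrieve matches.
- Greedy quantifiers: X+, X?, X*, X{n,m}, X{n}, X{n,}
- Reluctant quantifiers: X+?, X??, X*?, X{n,m}?, X{n}?, X{n,}?
- Back references: \captureid
- Value binding for groupings: [pattern] => [value]. The value for a matched expression can be accessed using m.groupValue().
  Example: ( one => 1 | two => 2 | three => 3 | ...)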
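The greedy and reluctant quantifiers and the back references listed above share their notation with java.util.regex. As a rough analogy only, the sketch below uses plain string regular expressions (not TokensRegex) to show the difference between the two quantifier families and how a back reference behaves. The class QuantifierAnalogy and its helper firstMatch are hypothetical names, and it is an assumption that token-level matching in TokenSequencePattern behaves analogously.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierAnalogy {
    // Returns the first match of regex in input, or null if there is none.
    public static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "a b b b c";
        // Greedy: X+ takes as many repetitions as possible.
        System.out.println(firstMatch("a( b)+", text));
        // Reluctant: X+? takes as few repetitions as possible.
        System.out.println(firstMatch("a( b)+?", text));
        // Back reference: \1 must repeat exactly what group 1 captured.
        System.out.println(firstMatch("(\\w+) \\1", "it was was late"));
    }
}
```

In a token pattern the repeated unit would be a token expression rather than a character, but the choice between the longest and the shortest possible match is the same idea.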
Individual tokens are marked by "[" TOKEN_EXPR "]"
Possible TOKEN_EXPR:
- All specified token attributes match:
  - For strings: { lemma:/.../; tag:"NNP" } = attributes that need to all match. If there is only one attribute, the {} can be dropped.
    See AnnotationLookup for a list of predefined token attribute names.
    NOTE: /.../ is used for regular expressions, "..." for exact string matches.
  - For numbers: { word>=2 }
    NOTE: the relation can be ">=", "<=", ">", "<", or "==".
  - Others: { word::IS_NUM }, { word::IS_NIL } or { word::NOT_EXISTS }, { word::NOT_NIL } or { word::EXISTS }
- Shorthand for a plain word/text match: /.../ or "..."
- Negation: !{...}
- Conjunction or disjunction: {...} & {...} or {...} | {...}
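To make the "all attributes must match" rule concrete, here is a minimal conceptual model in plain Java. It is not the library's implementation: tokens are modeled as a Map from attribute name to string value, and TokenAttributeSketch, exact, regex, and matchesAll are hypothetical names. The sketch also assumes that a /.../ test must match the whole attribute value.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;
import java.util.regex.Pattern;

public class TokenAttributeSketch {
    // Exact string test, modeling "..." in a TOKEN_EXPR.
    public static Predicate<String> exact(String s) {
        return s::equals;
    }

    // Regular-expression test, modeling /.../ (assumed to require a full match).
    public static Predicate<String> regex(String re) {
        Pattern p = Pattern.compile(re);
        return v -> p.matcher(v).matches();
    }

    // A token matches only if every listed attribute is present and passes its test.
    public static boolean matchesAll(Map<String, String> token,
                                     Map<String, Predicate<String>> tests) {
        for (Map.Entry<String, Predicate<String>> e : tests.entrySet()) {
            String value = token.get(e.getKey());
            if (value == null || !e.getValue().test(value)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Models { lemma:/stan.*/; tag:"NNP" }
        Map<String, Predicate<String>> expr = new LinkedHashMap<>();
        expr.put("lemma", regex("stan.*"));
        expr.put("tag", exact("NNP"));

        Map<String, String> token = new LinkedHashMap<>();
        token.put("lemma", "stanford");
        token.put("tag", "NNP");
        System.out.println(matchesAll(token, expr));
    }
}
```

Negation, conjunction, and disjunction of token expressions would correspond to Predicate.negate(), and(), and or() in this model.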
Special tokens:
Any token: []
String pattern match across multiple tokens:
(?m){min,max} /pattern/
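One way to picture the multi-token form is as a sliding window: starting at a given token, every span of between min and max tokens is turned into one string and tested against the pattern. The sketch below models that idea in plain Java. It is a conceptual illustration with hypothetical names (MultiTokenMatchSketch, matchEnds), and it assumes that the tokens of a span are joined with single spaces and that the regular expression must match the whole joined string; the library's actual text aggregation may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class MultiTokenMatchSketch {
    // Returns every end index (exclusive) such that tokens[start, end) spans
    // between min and max tokens and its space-joined text fully matches regex.
    public static List<Integer> matchEnds(List<String> tokens, int start,
                                          int min, int max, String regex) {
        Pattern p = Pattern.compile(regex);
        List<Integer> ends = new ArrayList<>();
        int maxEnd = Math.min(tokens.size(), start + max);
        for (int end = start + min; end <= maxEnd; end++) {
            String text = String.join(" ", tokens.subList(start, end));
            if (p.matcher(text).matches()) {
                ends.add(end);
            }
        }
        return ends;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("New", "York", "City", "is", "big");
        // Models (?m){1,3} /New York( City)?/ starting at token 0.
        System.out.println(matchEnds(tokens, 0, 1, 3, "New York( City)?"));
    }
}
```

Here the two-token span "New York" and the three-token span "New York City" both satisfy the pattern, so both end positions are reported.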
Special expressions: indicated by double braces: {{ expr }}
See Expressions for syntax.
Binding of variables for use in compiling patterns:
- Use Env env = TokenSequencePattern.getNewEnv() to create a new environment for binding.
- Bind a name to an attribute key (an annotation class):
    env.bind("numtype", CoreAnnotations.NumericTypeAnnotation.class);
- Bind patterns or strings for use when compiling patterns:
    // Bind string for later compilation using: compile("/it/ /was/ $RELDAY");
    env.bind("$RELDAY", "/today|yesterday|tomorrow|tonight|tonite/");
    // Bind pre-compiled pattern for later compilation using: compile("/it/ /was/ $RELDAY");
    env.bind("$RELDAY", TokenSequencePattern.compile(env, "/today|yesterday|tomorrow|tonight|tonite/"));
- Bind custom node patterns:
    // Bind node pattern so we can do patterns like: compile("... temporal::IS_TIMEX_DATE ...");
    // (TimexTypeMatchNodePattern is a NodePattern that implements some custom logic)
    env.bind("::IS_TIMEX_DATE", new TimexTypeMatchNodePattern(SUTime.TimexType.DATE));
Actions (partially implemented)
- Syntax: pattern ==> action
- Example action: &annotate( { ner="DATE" } )
- To apply the action associated with a pattern, call pattern.getAction().apply(match, groupid)
See Also: TokenSequenceMatcher, Serialized Form

Nested classes/interfaces inherited from class SequencePattern: SequencePattern.AndPatternExpr, SequencePattern.BackRefPatternExpr, SequencePattern.GroupPatternExpr, SequencePattern.MultiNodePatternExpr, SequencePattern.NodePatternExpr, SequencePattern.NodesMatchChecker<T>, SequencePattern.OrPatternExpr, SequencePattern.Parser<T>, SequencePattern.PatternExpr, SequencePattern.RepeatPatternExpr, SequencePattern.SequenceEndPatternExpr, SequencePattern.SequencePatternExpr, SequencePattern.SequenceStartPatternExpr, SequencePattern.SpecialNodePatternExpr, SequencePattern.ValuePatternExpr

Modifier and Type | Field and Description |
---|---|
static TokenSequencePattern | ANY_NODE_PATTERN |

Fields inherited from class SequencePattern: ANY_NODE_PATTERN_EXPR, MATCH_STATE, NODES_EQUAL_CHECKER, SEQ_BEGIN_PATTERN_EXPR, SEQ_END_PATTERN_EXPR

Constructor and Description |
---|
TokenSequencePattern(java.lang.String patternStr, SequencePattern.PatternExpr nodeSequencePattern) |
TokenSequencePattern(java.lang.String patternStr, SequencePattern.PatternExpr nodeSequencePattern, SequenceMatchAction<CoreMap> action) |

Modifier and Type | Method and Description |
---|---|
static TokenSequencePattern | compile(Env env, java.lang.String... strings): Compiles a sequence of regular expressions into a TokenSequencePattern using the specified environment. |
static TokenSequencePattern | compile(Env env, java.lang.String string): Compiles a regular expression over tokens into a TokenSequencePattern using the specified environment. |
static TokenSequencePattern | compile(SequencePattern.PatternExpr nodeSequencePattern): Compiles a PatternExpr into a TokenSequencePattern. |
static TokenSequencePattern | compile(java.lang.String... strings): Compiles a sequence of regular expressions into a TokenSequencePattern using the default environment. |
static TokenSequencePattern | compile(java.lang.String string): Compiles a regular expression over tokens into a TokenSequencePattern using the default environment. |
TokenSequenceMatcher | getMatcher(java.util.List<? extends CoreMap> tokens): Returns a TokenSequenceMatcher that can be used to match this pattern against the specified list of tokens. |
static MultiPatternMatcher<CoreMap> | getMultiPatternMatcher(java.util.Collection<TokenSequencePattern> patterns): Creates a multi-pattern matcher for matching across multiple TokensRegex patterns. |
static MultiPatternMatcher<CoreMap> | getMultiPatternMatcher(java.lang.String... patterns): Creates a multi-pattern matcher for matching across multiple TokensRegex patterns given as Strings. |
static MultiPatternMatcher<CoreMap> | getMultiPatternMatcher(TokenSequencePattern... patterns): Creates a multi-pattern matcher for matching across multiple TokensRegex patterns. |
static Env | getNewEnv() |
TokenSequenceMatcher | matcher(java.util.List<? extends CoreMap> tokens): Returns a TokenSequenceMatcher that can be used to match this pattern against the specified list of tokens. |
java.lang.String | toString(): Returns a String representation of the TokenSequencePattern. |

Methods inherited from class SequencePattern: findNodePattern, findNodePatterns, getAction, getPatternExpr, getPriority, getTotalGroups, getWeight, pattern, setAction, setPriority, setWeight, transform

public static final TokenSequencePattern ANY_NODE_PATTERN

public TokenSequencePattern(java.lang.String patternStr, SequencePattern.PatternExpr nodeSequencePattern)

public TokenSequencePattern(java.lang.String patternStr, SequencePattern.PatternExpr nodeSequencePattern, SequenceMatchAction<CoreMap> action)

public static Env getNewEnv()

public static TokenSequencePattern compile(java.lang.String string)
Parameters:
  string - Regular expression to be compiled

public static TokenSequencePattern compile(Env env, java.lang.String string)
Parameters:
  env - Environment to use
  string - Regular expression to be compiled

public static TokenSequencePattern compile(java.lang.String... strings)
Parameters:
  strings - List of regular expressions to be compiled

public static TokenSequencePattern compile(Env env, java.lang.String... strings)
Parameters:
  env - Environment to use
  strings - List of regular expressions to be compiled

public static TokenSequencePattern compile(SequencePattern.PatternExpr nodeSequencePattern)
Parameters:
  nodeSequencePattern - A sequence pattern expression (before translation into an NFA)

public TokenSequenceMatcher getMatcher(java.util.List<? extends CoreMap> tokens)
Overrides:
  getMatcher in class SequencePattern<CoreMap>
Parameters:
  tokens - List of tokens to match against

public TokenSequenceMatcher matcher(java.util.List<? extends CoreMap> tokens)
Parameters:
  tokens - List of tokens to match against

public java.lang.String toString()
Overrides:
  toString in class SequencePattern<CoreMap>

public static MultiPatternMatcher<CoreMap> getMultiPatternMatcher(java.util.Collection<TokenSequencePattern> patterns)
Parameters:
  patterns - Collection of input patterns

public static MultiPatternMatcher<CoreMap> getMultiPatternMatcher(TokenSequencePattern... patterns)
Parameters:
  patterns - Input patterns

public static MultiPatternMatcher<CoreMap> getMultiPatternMatcher(java.lang.String... patterns)
Parameters:
  patterns - Input patterns in String format