public class SequenceMatchRules
extends java.lang.Object
There are 2 types of rules: assignment rules and extraction rules.
# or // can be used to indicate one-line comments.
Assignment Rules are used to assign values to variables.
The basic format is: variable = value.
Variable Names: variables to be used inside regular expressions start with $.
Value Types:
Type | Format | Example | Description |
---|---|---|---|
BOOLEAN | TRUE \| FALSE | TRUE | |
STRING | "..." | "red" | |
INTEGER | [+-]\d+ | 1500 | |
LONG | [+-]\d+L | 1500000000000L | |
DOUBLE | [+-]\d*\.\d+ | 6.98 | |
REGEX | /.../ | /[Aa]pril/ | String regular expression Pattern |
TOKENS_REGEX | ( [...] [...] ... ) | ( /up/ /to/ /4/ /months/ ) | Tokens regular expression TokenSequencePattern |
LIST | ( [item1] , [item2], ... ) | ("red", "blue", "yellow" ) | |
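The Format column can be read as a set of recognizers for literal values. The following is a minimal sketch of that reading using only java.util.regex; it is not the parser CoreNLP uses. The class name ValueTypeSketch and the method classify are hypothetical, the sign is assumed to be optional (the table's own examples carry none), and telling LIST from TOKENS_REGEX by the presence of a comma is a deliberate simplification.

```java
import java.util.regex.Pattern;

/** Illustrative only: classifies a literal by the Format column; not CoreNLP's parser. */
final class ValueTypeSketch {
    // Assumption: the leading sign is optional, as in the table's examples.
    private static final Pattern LONG = Pattern.compile("[+-]?\\d+L");
    private static final Pattern INTEGER = Pattern.compile("[+-]?\\d+");
    private static final Pattern DOUBLE = Pattern.compile("[+-]?\\d*\\.\\d+");

    /** Returns the name of the value type whose format the literal fits, or "UNKNOWN". */
    static String classify(String literal) {
        String s = literal.trim();
        if (s.equals("TRUE") || s.equals("FALSE")) return "BOOLEAN";
        if (s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"")) return "STRING";
        if (LONG.matcher(s).matches()) return "LONG";
        if (INTEGER.matcher(s).matches()) return "INTEGER";
        if (DOUBLE.matcher(s).matches()) return "DOUBLE";
        if (s.length() >= 2 && s.startsWith("/") && s.endsWith("/")) return "REGEX";
        if (s.startsWith("(") && s.endsWith(")")) {
            // Simplification: a comma inside the parentheses is taken to mean a LIST.
            return s.contains(",") ? "LIST" : "TOKENS_REGEX";
        }
        return "UNKNOWN";
    }
}
```

Under these assumptions, classify("1500") yields INTEGER, classify("1500000000000L") yields LONG, and classify("( /up/ /to/ /4/ /months/ )") yields TOKENS_REGEX.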
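Some typical uses and examples for assignment rules include:
Binding a text key to an annotation key (as a Class).
tokens = { type: "CLASS", value: "edu.stanford.nlp.ling.CoreAnnotations$TokensAnnotation" }
Defining regular expression macros to be embedded in other regular expressions.
$SEASON = "/spring|summer|fall|autumn|winter/"
$NUM = ( [ { numcomptype:NUMBER } ] )
Setting options. Rules are applied with respect to an environment (Env), which can be accessed using the variable ENV.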
Members of the Environment can be set as needed.
# Set default parameters to be used when reading rules
ENV.defaults["ruleType"] = "tokens"
# Set default string pattern flags (to case-insensitive)
ENV.defaultStringPatternFlags = 2
# Specifies that the result should go into the tokens key (as defined above).
ENV.defaultResultAnnotationKey = tokens
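The value 2 used for ENV.defaultStringPatternFlags equals the constant java.util.regex.Pattern.CASE_INSENSITIVE. The sketch below shows the effect of that flag on a plain string pattern, assuming the environment passes the flags through to Pattern.compile unchanged. The class PatternFlagsSketch and its matches helper are hypothetical names used only for illustration.

```java
import java.util.regex.Pattern;

/** Illustrative only: shows what the flag value 2 means for java.util.regex. */
final class PatternFlagsSketch {
    /** Compiles the regex with the given flag bits and tests for a full match. */
    static boolean matches(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    /** The numeric value of the case-insensitive flag. */
    static int caseInsensitiveFlag() {
        return Pattern.CASE_INSENSITIVE;
    }
}
```

With these definitions, matches("[Aa]pril", 0, "APRIL") is false, while matches("[Aa]pril", Pattern.CASE_INSENSITIVE, "APRIL") is true. Several flags can be combined with a bitwise OR.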
Predefined values are:
Variable | Type | Description |
---|---|---|
ENV | Env | The environment with respect to which the rules are applied. |
TRUE | BOOLEAN | The Boolean value true. |
FALSE | BOOLEAN | The Boolean value false. |
NIL | | The null value. |
tags | Class | The annotation key Tags.TagsAnnotation. |
Extraction Rules specify how regular expression patterns are to be matched against text.
See CoreMapExpressionExtractor for more information on the types of rules and the sequence in which they are applied.
A basic rule can be specified using the following template:
{
  # Type of the rule
  ruleType: "tokens" | "text" | "composite" | "filter",
  # Pattern to match against
  pattern: ( <TokenSequencePattern> ) | /<TextPattern>/,
  # Resulting value to go into the resulting annotation
  result: ...
  # More fields following...
}
Example:
{ ruleType: "tokens", pattern: ( /one/ ), result: 1 }
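To make the idea of a "tokens" rule concrete, the sketch below matches a sequence of per-token regular expressions against a list of tokens and reports where non-overlapping matches start. It is a simplified stand-in, not TokenSequencePattern: each pattern element must match exactly one whole token, and there are no quantifiers, groups, or annotation lookups. The names TokenRuleSketch, findNonOverlapping, and matchesAt are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Illustrative only: one regex per token, matched left to right. */
final class TokenRuleSketch {
    /** Returns the start indexes of non-overlapping matches of the pattern in the tokens. */
    static List<Integer> findNonOverlapping(List<String> tokens, List<Pattern> pattern) {
        List<Integer> starts = new ArrayList<>();
        if (pattern.isEmpty()) {
            return starts;
        }
        int i = 0;
        while (i + pattern.size() <= tokens.size()) {
            if (matchesAt(tokens, pattern, i)) {
                starts.add(i);
                i += pattern.size();  // jump past the match so matches cannot overlap
            } else {
                i++;
            }
        }
        return starts;
    }

    /** True if every pattern element fully matches the token at the corresponding offset. */
    private static boolean matchesAt(List<String> tokens, List<Pattern> pattern, int start) {
        for (int k = 0; k < pattern.size(); k++) {
            if (!pattern.get(k).matcher(tokens.get(start + k)).matches()) {
                return false;
            }
        }
        return true;
    }
}
```

For the example rule above, the pattern would be the single element /one/. Applied to the tokens one, plus, one, the sketch reports matches at indexes 0 and 2, and a rule with result: 1 would attach the value 1 to each of them.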
Extraction rule fields (most fields are optional):
Field | Values | Example | Description |
---|---|---|---|
ruleType | "tokens" \| "text" \| "composite" \| "filter" | tokens | Type of the rule (required). |
pattern | <Token Sequence Pattern> = (...) \| <Text Pattern> = /.../ | ( /winter/ /of/ $YEAR ) | Pattern to match against. See TokenSequencePattern and Pattern for how to specify patterns over tokens and strings (required). |
action | <Action List> = (...) | ( Annotate($0, ner, "DATE") ) | List of actions to apply when the pattern is triggered. Each action is a TokensRegex Expression. |
result | <Expression> | | Resulting value to go into the resulting annotation. See Expressions for how to specify the result. |
name | STRING | | Name to identify the extraction rule. |
stage | INTEGER | | Stage at which the rule is to be applied. Rules are grouped in stages, which are applied from lowest to highest. |
active | BOOLEAN | | Whether this rule is enabled (active) or not (default true). |
priority | DOUBLE | | Priority of the rule. Within a stage, matches from higher-priority rules are preferred. |
weight | DOUBLE | | Weight of the rule (not currently used). |
over | CLASS | | Annotation field to check the pattern against. |
matchFindType | FIND_NONOVERLAPPING \| FIND_ALL | | Whether to find all matched expressions or just the nonoverlapping ones (default FIND_NONOVERLAPPING). |
matchWithResults | BOOLEAN | | Whether results of the matches should be returned (default false). Set to true to access captured groups of embedded regular expressions. |
matchedExpressionGroup | INTEGER | 2 | Which group should be treated as the matched expression group (default 0). |
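The stage and priority fields together imply an ordering: lower stages come first, and within a stage higher priority is preferred. The sketch below expresses just that ordering with a comparator. It is a simplification, not the extractor's algorithm: in the library, priority decides which of several competing matches is kept, whereas here it only sorts rule descriptors. RuleOrderSketch and RuleInfo are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrative only: orders rule descriptors by stage, then by descending priority. */
final class RuleOrderSketch {
    /** Hypothetical stand-in for the name, stage, and priority fields of a rule. */
    static final class RuleInfo {
        final String name;
        final int stage;
        final double priority;

        RuleInfo(String name, int stage, double priority) {
            this.name = name;
            this.stage = stage;
            this.priority = priority;
        }
    }

    /** Returns a sorted copy: lowest stage first, highest priority first within a stage. */
    static List<RuleInfo> order(List<RuleInfo> rules) {
        List<RuleInfo> sorted = new ArrayList<>(rules);
        sorted.sort(Comparator.comparingInt((RuleInfo r) -> r.stage)
                .thenComparing(Comparator.comparingDouble((RuleInfo r) -> r.priority).reversed()));
        return sorted;
    }
}
```

Given rules A (stage 2, priority 1.0), B (stage 1, priority 0.5), and C (stage 1, priority 2.0), order returns C, B, A.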
See also: CoreMapExpressionExtractor, TokenSequencePattern
Modifier and Type | Field and Description |
---|---|
static SequenceMatchRules.CompositeExtractRuleCreator | COMPOSITE_EXTRACT_RULE_CREATOR |
static java.lang.String | COMPOSITE_RULE_TYPE |
static SequenceMatchRules.AnnotationExtractRuleCreator | DEFAULT_EXTRACT_RULE_CREATOR |
static java.lang.String | FILTER_RULE_TYPE |
static SequenceMatchRules.MultiTokenPatternExtractRuleCreator | MULTI_TOKEN_PATTERN_EXTRACT_RULE_CREATOR |
static SequenceMatchRules.TextPatternExtractRuleCreator | TEXT_PATTERN_EXTRACT_RULE_CREATOR |
static java.lang.String | TEXT_PATTERN_RULE_TYPE |
static SequenceMatchRules.TokenPatternExtractRuleCreator | TOKEN_PATTERN_EXTRACT_RULE_CREATOR |
static java.lang.String | TOKEN_PATTERN_RULE_TYPE |
public static final java.lang.String COMPOSITE_RULE_TYPE
public static final java.lang.String TOKEN_PATTERN_RULE_TYPE
public static final java.lang.String TEXT_PATTERN_RULE_TYPE
public static final java.lang.String FILTER_RULE_TYPE
public static final SequenceMatchRules.TokenPatternExtractRuleCreator TOKEN_PATTERN_EXTRACT_RULE_CREATOR
public static final SequenceMatchRules.CompositeExtractRuleCreator COMPOSITE_EXTRACT_RULE_CREATOR
public static final SequenceMatchRules.TextPatternExtractRuleCreator TEXT_PATTERN_EXTRACT_RULE_CREATOR
public static final SequenceMatchRules.MultiTokenPatternExtractRuleCreator MULTI_TOKEN_PATTERN_EXTRACT_RULE_CREATOR
public static final SequenceMatchRules.AnnotationExtractRuleCreator DEFAULT_EXTRACT_RULE_CREATOR
public static SequenceMatchRules.AssignmentRule createAssignmentRule(Env env, AssignableExpression var, Expression result)
public static SequenceMatchRules.Rule createRule(Env env, Expressions.CompositeValue cv)
protected static SequenceMatchRules.AnnotationExtractRule createExtractionRule(Env env, java.util.Map<java.lang.String,java.lang.Object> attributes)
public static SequenceMatchRules.AnnotationExtractRule createExtractionRule(Env env, java.lang.String ruleType, java.lang.Object pattern, Expression result)
public static SequenceMatchRules.AnnotationExtractRule createTokenPatternRule(Env env, SequencePattern.PatternExpr expr, Expression result)
public static SequenceMatchRules.AnnotationExtractRule createTextPatternRule(Env env, java.lang.String expr, Expression result)
public static SequenceMatchRules.AnnotationExtractRule createMultiTokenPatternRule(Env env, SequenceMatchRules.AnnotationExtractRule template, java.util.List<TokenSequencePattern> patterns)
public static MatchedExpression.SingleAnnotationExtractor createAnnotationExtractor(Env env, SequenceMatchRules.AnnotationExtractRule r)