SequenceMatchRules (Stanford JavaNLP API)

java.lang.Object
- edu.stanford.nlp.ling.tokensregex.SequenceMatchRules

public class SequenceMatchRules
extends java.lang.Object

Rules for matching sequences using regular expressions.

There are two types of rules:

Assignment rules, which assign a value to a variable for later use.
Extraction rules, which specify how regular expression patterns are matched against text, which matched text expressions are extracted, and what value is assigned to each matched expression.

NOTE: # or // can be used to indicate one-line comments.
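
A rule-file reader has to skip these comment lines before parsing. The following is a hypothetical Java sketch (RuleCommentStripper is not part of TokensRegex). It assumes comments occupy whole lines. Trailing comments are left untouched, since # and // may also appear inside strings and /.../ regular expressions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the TokensRegex API: drops whole-line
// comments that start with '#' or '//' (after leading whitespace).
// Comments after rule text on the same line are deliberately kept, because
// '#' and '//' can legitimately occur inside strings and /.../ regexes.
class RuleCommentStripper {
  static List<String> stripCommentLines(List<String> lines) {
    List<String> kept = new ArrayList<>();
    for (String line : lines) {
      String t = line.trim();
      if (t.startsWith("#") || t.startsWith("//")) {
        continue; // whole-line comment: skip it
      }
      kept.add(line);
    }
    return kept;
  }

  public static void main(String[] args) {
    List<String> rules = List.of(
        "# Set default parameters",
        "// another comment",
        "ENV.defaults[\"ruleType\"] = \"tokens\"");
    System.out.println(stripCommentLines(rules));
  }
}
```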

Assignment Rules are used to assign values to variables. The basic format is: variable = value.

Variable Names:

Variable names should follow the pattern [A-Za-z_][A-Za-z0-9_]*
Variable names for use in regular expressions (to be expanded later) must start with $
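
The two naming rules above can be checked with java.util.regex. VariableNames is a hypothetical helper for illustration only. It assumes that, after the leading $, a regex variable follows the same pattern as a plain name.

```java
import java.util.regex.Pattern;

// Hypothetical validator (not a TokensRegex class) for the naming rules above.
class VariableNames {
  // Plain variable names: [A-Za-z_][A-Za-z0-9_]*
  private static final Pattern NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

  static boolean isValidName(String s) {
    return NAME.matcher(s).matches();
  }

  // Variables to be expanded inside regular expressions start with '$';
  // this sketch assumes the remainder follows the plain-name pattern.
  static boolean isValidRegexVariable(String s) {
    return s.startsWith("$") && isValidName(s.substring(1));
  }

  public static void main(String[] args) {
    System.out.println(isValidName("tokens"));           // true
    System.out.println(isValidName("2tokens"));          // false
    System.out.println(isValidRegexVariable("$SEASON")); // true
    System.out.println(isValidRegexVariable("SEASON"));  // false
  }
}
```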

Value Types:

Type	Format	Example	Description
`BOOLEAN`	`TRUE | FALSE`	`TRUE`	Boolean value
`STRING`	`"..."`	`"red"`	String literal
`INTEGER`	`[+-]?\d+`	`1500`	Integer value
`LONG`	`[+-]?\d+L`	`1500000000000L`	Long integer value (suffix `L`)
`DOUBLE`	`[+-]?\d*\.\d+`	`6.98`	Floating-point value
`REGEX`	`/.../`	`/[Aa]pril/`	String regular expression (`Pattern`)
`TOKENS_REGEX`	`( [...] [...] ... )`	`( /up/ /to/ /4/ /months/ )`	Tokens regular expression (`TokenSequencePattern`)
`LIST`	`( [item1], [item2], ... )`	`( "red", "blue", "yellow" )`	List of values
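
As a rough illustration of the Format column, the hypothetical LiteralTypes class below classifies scalar literals with java.util.regex. It treats the leading sign as optional (the examples carry none), assumes strings contain no escaped quotes, and reports TOKENS_REGEX and LIST literals, which need a real parser, as OTHER. It is not the parser TokensRegex uses.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical classifier for scalar literals, built from the Format column
// of the value-types table. Signs are optional; STRING assumes no escaped quotes.
class LiteralTypes {
  private static final Map<String, Pattern> FORMATS = new LinkedHashMap<>();
  static {
    FORMATS.put("BOOLEAN", Pattern.compile("TRUE|FALSE"));
    FORMATS.put("STRING", Pattern.compile("\"[^\"]*\""));
    FORMATS.put("INTEGER", Pattern.compile("[+-]?\\d+"));
    FORMATS.put("LONG", Pattern.compile("[+-]?\\d+L"));
    FORMATS.put("DOUBLE", Pattern.compile("[+-]?\\d*\\.\\d+"));
    FORMATS.put("REGEX", Pattern.compile("/.*/"));
  }

  static String classify(String literal) {
    String s = literal.trim();
    for (Map.Entry<String, Pattern> e : FORMATS.entrySet()) {
      if (e.getValue().matcher(s).matches()) {
        return e.getKey();
      }
    }
    return "OTHER"; // e.g. TOKENS_REGEX or LIST, which need a real parser
  }

  public static void main(String[] args) {
    String[] examples = {"TRUE", "\"red\"", "1500", "1500000000000L", "6.98", "/[Aa]pril/"};
    for (String lit : examples) {
      System.out.println(lit + " -> " + classify(lit));
    }
  }
}
```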

Some typical uses and examples for assignment rules include:

Assigning values to variables for use in later rules

Binding a text key to an annotation key (as a Class):

      tokens = { type: "CLASS", value: "edu.stanford.nlp.ling.CoreAnnotations$TokensAnnotation" }
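
The value above is a binary class name: $ separates CoreAnnotations from its nested TokensAnnotation class, which is the form java.lang.Class.forName accepts. The sketch below shows this with java.util.Map$Entry standing in for the CoreNLP class, so it runs without CoreNLP on the classpath. ClassBinding is a hypothetical helper, and whether TokensRegex resolves CLASS values exactly this way is an assumption.

```java
// Hypothetical helper: resolves a binary class name, or returns null.
// java.util.Map$Entry stands in for CoreAnnotations$TokensAnnotation.
class ClassBinding {
  static Class<?> resolveOrNull(String binaryName) {
    try {
      return Class.forName(binaryName);
    } catch (ClassNotFoundException e) {
      return null; // not a loadable binary name
    }
  }

  public static void main(String[] args) {
    // '$' form: nested class Entry inside Map
    System.out.println(resolveOrNull("java.util.Map$Entry").getName());
    // Dotted source-code form is not a binary name, so lookup fails
    System.out.println(resolveOrNull("java.util.Map.Entry"));
  }
}
```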

Defining regular expression macros to be embedded in other regular expressions:

      $SEASON = "/spring|summer|fall|autumn|winter/"
      $NUM = ( [ { numcomptype:NUMBER } ] )
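
To illustrate what embedding a macro means, the hypothetical MacroExpansion class below performs a single-pass textual substitution of $-variables. The real expansion is done by TokensRegex itself, so treat this only as a model of the idea. The sample pattern string is illustrative, and unknown variables such as $YEAR are left as-is.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical single-pass textual expansion of $-variables; only a model of
// how a macro such as $SEASON ends up embedded in a larger pattern.
class MacroExpansion {
  private static final Pattern VAR = Pattern.compile("\\$[A-Za-z_][A-Za-z0-9_]*");

  static String expand(String pattern, Map<String, String> macros) {
    Matcher m = VAR.matcher(pattern);
    StringBuilder out = new StringBuilder();
    while (m.find()) {
      // Unknown variables are kept verbatim instead of failing.
      String value = macros.getOrDefault(m.group(), m.group());
      m.appendReplacement(out, Matcher.quoteReplacement(value));
    }
    m.appendTail(out);
    return out.toString();
  }

  public static void main(String[] args) {
    Map<String, String> macros = Map.of("$SEASON", "/spring|summer|fall|autumn|winter/");
    System.out.println(expand("( [ { word:$SEASON } ] /of/ $YEAR )", macros));
    // ( [ { word:/spring|summer|fall|autumn|winter/ } ] /of/ $YEAR )
  }
}
```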

Setting default environment variables. Rules are applied with respect to an environment (Env), which can be accessed through the variable ENV. Members of the environment can be set as needed:

      # Set default parameters to be used when reading rules
      ENV.defaults["ruleType"] = "tokens"
      # Set default string pattern flags (to case-insensitive)
      ENV.defaultStringPatternFlags = 2
      # Specify that results go into the tokens key (as defined above).
      ENV.defaultResultAnnotationKey = tokens
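
The value 2 used for defaultStringPatternFlags is the value of java.util.regex.Pattern.CASE_INSENSITIVE. Assuming these flags are standard Java regex flags, the sketch below shows their effect on a plain string match. PatternFlags is a hypothetical helper.

```java
import java.util.regex.Pattern;

// Hypothetical helper showing that flag value 2 is Pattern.CASE_INSENSITIVE.
class PatternFlags {
  static boolean matchesWithFlags(String regex, int flags, String text) {
    return Pattern.compile(regex, flags).matcher(text).matches();
  }

  public static void main(String[] args) {
    System.out.println(Pattern.CASE_INSENSITIVE);                // 2
    System.out.println(matchesWithFlags("winter", 0, "Winter")); // false
    System.out.println(matchesWithFlags("winter", 2, "Winter")); // true
  }
}
```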

Defining options

Predefined values are:

Variable	Type	Description
`ENV`	`Env`	The environment with respect to which the rules are applied.
`TRUE`	`BOOLEAN`	The `Boolean` value `true`.
`FALSE`	`BOOLEAN`	The `Boolean` value `false`.
`NIL`		The `null` value.
`tags`	`Class`	The annotation key `Tags.TagsAnnotation`.

Extraction rules specify how regular expression patterns are matched against text. See CoreMapExpressionExtractor for more information on the types of rules and the order in which they are applied. A basic rule can be specified using the following template:

   {
     # Type of the rule
     ruleType: "tokens" | "text" | "composite" | "filter",
     # Pattern to match against
     pattern: ( <TokenSequencePattern> ) | /<TextPattern>/,
     # Resulting value to go into the resulting annotation
     result: ...

     # More fields following...
   }

Example:

   {
     ruleType: "tokens",
     pattern: ( /one/ ),
     result: 1
   }
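
The same pattern/result pairing can be modeled on a plain string. SimpleTextRule below is a hypothetical class, not CoreMapExpressionExtractor: it pairs a java.util.regex pattern with a result value and reports each nonoverlapping match. The word boundaries (\b) roughly mimic the token-level rule above, which matches only the whole token one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model of a rule's pattern and result fields on plain text;
// it does not reproduce how TokensRegex applies rules to annotations.
class SimpleTextRule {
  final Pattern pattern;
  final Object result;

  SimpleTextRule(String regex, Object result) {
    this.pattern = Pattern.compile(regex);
    this.result = result;
  }

  // Returns "matchedText=result" for each nonoverlapping match.
  List<String> apply(String text) {
    List<String> out = new ArrayList<>();
    Matcher m = pattern.matcher(text);
    while (m.find()) {
      out.add(m.group() + "=" + result);
    }
    return out;
  }

  public static void main(String[] args) {
    SimpleTextRule one = new SimpleTextRule("\\bone\\b", 1);
    System.out.println(one.apply("one apple, one pear, someone else"));
    // [one=1, one=1]
  }
}
```

The \b anchors keep "someone" from matching, in the same way that a token pattern ( /one/ ) would not match the token someone.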

Extraction rule fields (most fields are optional):

Field	Values	Example	Description
`ruleType`	`"tokens" | "text" | "composite" | "filter"`	`tokens`	Type of the rule (required).
`pattern`	`<Token Sequence Pattern> = (...) | <Text Pattern> = /.../`	`( /winter/ /of/ $YEAR )`	Pattern to match against. See `TokenSequencePattern` and `Pattern` for how to specify patterns over tokens and strings (required).
`action`	`<Action List> = (...)`	`( Annotate($0, ner, "DATE") )`	List of actions to apply when the pattern is triggered. Each action is a `TokensRegex Expression`
`result`	`<Expression>`		Resulting value to go into the resulting annotation. See `Expressions` for how to specify the result.
`name`	`STRING`		Name to identify the extraction rule.
`stage`	`INTEGER`		Stage at which the rule is to be applied. Rules are grouped in stages, which are applied from lowest to highest.
`active`	`Boolean`		Whether this rule is enabled (active) or not (default true).
`priority`	`DOUBLE`		Priority of rule. Within a stage, matches from higher priority rules are preferred.
`weight`	`DOUBLE`		Weight of rule (not currently used).
`over`	`CLASS`		Annotation field to check pattern against.
`matchFindType`	`FIND_NONOVERLAPPING | FIND_ALL`		Whether to find all matched expressions or only the nonoverlapping ones (default `FIND_NONOVERLAPPING`).
`matchWithResults`	`Boolean`		Whether results of the matches should be returned (default false). Set to true to access captured groups of embedded regular expressions.
`matchedExpressionGroup`	`Integer`	`2`	Which capture group is treated as the matched expression (default 0).
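
The difference between FIND_NONOVERLAPPING and FIND_ALL, and the effect of matchedExpressionGroup, can be approximated on plain strings. This hypothetical sketch assumes FIND_ALL admits overlapping matches and models it as at most one match per start offset. The real matchers work over token sequences, so FindTypes is illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical string-level analogue of matchFindType and matchedExpressionGroup.
class FindTypes {
  // Analogue of FIND_NONOVERLAPPING: each search resumes after the previous match.
  static List<String> findNonOverlapping(Pattern p, String text, int group) {
    List<String> out = new ArrayList<>();
    Matcher m = p.matcher(text);
    while (m.find()) {
      out.add(m.group(group));
    }
    return out;
  }

  // Assumed analogue of FIND_ALL: restart one character past each match start,
  // so overlapping matches are reported (one per start offset).
  static List<String> findAll(Pattern p, String text, int group) {
    List<String> out = new ArrayList<>();
    Matcher m = p.matcher(text);
    int from = 0;
    while (from <= text.length() && m.find(from)) {
      out.add(m.group(group));
      from = m.start() + 1;
    }
    return out;
  }

  public static void main(String[] args) {
    Pattern p = Pattern.compile("\\b(\\w+) (\\w+)\\b");
    String text = "winter of 2010";
    System.out.println(findNonOverlapping(p, text, 0)); // [winter of]
    System.out.println(findAll(p, text, 0));            // [winter of, of 2010]
    System.out.println(findAll(p, text, 2));            // [of, 2010]
  }
}
```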

Author: Angel Chang
See Also: CoreMapExpressionExtractor, TokenSequencePattern

Nested Class Summary

Nested Classes
Modifier and Type	Class and Description
`static class`	`SequenceMatchRules.AnnotationExtractRule<S,T extends MatchedExpression>` Rule that specifies how to extract a sequence of MatchedExpressions from an annotation (CoreMap).
`static class`	`SequenceMatchRules.AnnotationExtractRuleCreator`
`static class`	`SequenceMatchRules.AnnotationMatchedFilter`
`static class`	`SequenceMatchRules.AssignmentRule` Rule that specifies what value to assign to a variable.
`static class`	`SequenceMatchRules.BasicSequenceExtractRule` Extraction rule.
`static class`	`SequenceMatchRules.CompositeExtractRuleCreator`
`static class`	`SequenceMatchRules.CoreMapExtractRule<T,O>` Extraction rule to apply an extraction rule on a particular CoreMap field.
`static class`	`SequenceMatchRules.CoreMapFunctionApplier<T,O>`
`static class`	`SequenceMatchRules.CoreMapToListExtractRule<O>` Extraction rule that treats a single CoreMap as a list/sequence of CoreMaps.
`static class`	`SequenceMatchRules.CoreMapToListFunctionApplier<O>`
`static interface`	`SequenceMatchRules.ExtractRule<I,O>` Interface for a rule that extracts a list of matched items from an input.
`static class`	`SequenceMatchRules.FilterExtractRule<I,O>` Extraction rule that filters the input before passing it on to the next extractor.
`static class`	`SequenceMatchRules.ListExtractRule<I,O>` Extraction rule that applies a list of rules in sequence and aggregates all matches found.
`static class`	`SequenceMatchRules.MultiSequencePatternExtractRule<T,O>`
`static class`	`SequenceMatchRules.MultiTokenPatternExtractRuleCreator`
`static interface`	`SequenceMatchRules.Rule` A sequence match rule.
`static class`	`SequenceMatchRules.SequenceMatchedExpressionExtractor`
`static class`	`SequenceMatchRules.SequenceMatchResultExtractor<T>`
`static class`	`SequenceMatchRules.SequencePatternExtractRule<T,O>`
`static class`	`SequenceMatchRules.StringMatchedExpressionExtractor`
`static class`	`SequenceMatchRules.StringMatchResultExtractor`
`static class`	`SequenceMatchRules.StringPatternExtractRule<O>`
`static class`	`SequenceMatchRules.TextPatternExtractRuleCreator`
`static class`	`SequenceMatchRules.TokenPatternExtractRuleCreator`
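
The nested ExtractRule, ListExtractRule and FilterExtractRule classes describe a composable design: a rule maps an input to a list of matches, a list rule aggregates the matches of several rules, and a filter rule gates the input before delegating. The sketch below models that design with hypothetical names and signatures (ExtractComposition, Extractor, listOf, filtered); it does not reproduce the real class APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model of rule composition; names and signatures are invented.
class ExtractComposition {
  interface Extractor<I, O> {
    List<O> extract(I input);
  }

  // Applies each rule in sequence and aggregates every match found.
  static <I, O> Extractor<I, O> listOf(List<Extractor<I, O>> rules) {
    return input -> {
      List<O> all = new ArrayList<>();
      for (Extractor<I, O> r : rules) {
        all.addAll(r.extract(input));
      }
      return all;
    };
  }

  // Passes the input on to the next extractor only if it satisfies the filter.
  static <I, O> Extractor<I, O> filtered(Predicate<I> filter, Extractor<I, O> next) {
    return input -> {
      if (!filter.test(input)) {
        return new ArrayList<O>();
      }
      return next.extract(input);
    };
  }

  public static void main(String[] args) {
    Extractor<String, String> words = s -> List.of(s.split(" "));
    Extractor<String, String> upper = s -> List.of(s.toUpperCase());
    Extractor<String, String> both = listOf(List.of(words, upper));
    Extractor<String, String> nonEmpty = filtered(s -> !s.isEmpty(), both);
    System.out.println(nonEmpty.extract("up to")); // [up, to, UP TO]
    System.out.println(nonEmpty.extract(""));      // []
  }
}
```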

Field Summary

Fields
Modifier and Type	Field and Description
`static SequenceMatchRules.CompositeExtractRuleCreator`	`COMPOSITE_EXTRACT_RULE_CREATOR`
`static java.lang.String`	`COMPOSITE_RULE_TYPE`
`static SequenceMatchRules.AnnotationExtractRuleCreator`	`DEFAULT_EXTRACT_RULE_CREATOR`
`static java.lang.String`	`FILTER_RULE_TYPE`
`static SequenceMatchRules.MultiTokenPatternExtractRuleCreator`	`MULTI_TOKEN_PATTERN_EXTRACT_RULE_CREATOR`
`static SequenceMatchRules.TextPatternExtractRuleCreator`	`TEXT_PATTERN_EXTRACT_RULE_CREATOR`
`static java.lang.String`	`TEXT_PATTERN_RULE_TYPE`
`static SequenceMatchRules.TokenPatternExtractRuleCreator`	`TOKEN_PATTERN_EXTRACT_RULE_CREATOR`
`static java.lang.String`	`TOKEN_PATTERN_RULE_TYPE`

Method Summary

Modifier and Type	Method and Description
`static MatchedExpression.SingleAnnotationExtractor`	`createAnnotationExtractor(Env env, SequenceMatchRules.AnnotationExtractRule r)`
`static SequenceMatchRules.AssignmentRule`	`createAssignmentRule(Env env, AssignableExpression var, Expression result)`
`protected static SequenceMatchRules.AnnotationExtractRule`	`createExtractionRule(Env env, java.util.Map<java.lang.String,java.lang.Object> attributes)`
`static SequenceMatchRules.AnnotationExtractRule`	`createExtractionRule(Env env, java.lang.String ruleType, java.lang.Object pattern, Expression result)`
`static SequenceMatchRules.AnnotationExtractRule`	`createMultiTokenPatternRule(Env env, SequenceMatchRules.AnnotationExtractRule template, java.util.List<TokenSequencePattern> patterns)`
`static SequenceMatchRules.Rule`	`createRule(Env env, Expressions.CompositeValue cv)`
`static SequenceMatchRules.AnnotationExtractRule`	`createTextPatternRule(Env env, java.lang.String expr, Expression result)`
`static SequenceMatchRules.AnnotationExtractRule`	`createTokenPatternRule(Env env, SequencePattern.PatternExpr expr, Expression result)`

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail

COMPOSITE_RULE_TYPE

public static final java.lang.String COMPOSITE_RULE_TYPE

See Also: Constant Field Values

TOKEN_PATTERN_RULE_TYPE

public static final java.lang.String TOKEN_PATTERN_RULE_TYPE

See Also: Constant Field Values

TEXT_PATTERN_RULE_TYPE

public static final java.lang.String TEXT_PATTERN_RULE_TYPE

See Also: Constant Field Values

FILTER_RULE_TYPE

public static final java.lang.String FILTER_RULE_TYPE

See Also: Constant Field Values

TOKEN_PATTERN_EXTRACT_RULE_CREATOR

public static final SequenceMatchRules.TokenPatternExtractRuleCreator TOKEN_PATTERN_EXTRACT_RULE_CREATOR

COMPOSITE_EXTRACT_RULE_CREATOR

public static final SequenceMatchRules.CompositeExtractRuleCreator COMPOSITE_EXTRACT_RULE_CREATOR

TEXT_PATTERN_EXTRACT_RULE_CREATOR

public static final SequenceMatchRules.TextPatternExtractRuleCreator TEXT_PATTERN_EXTRACT_RULE_CREATOR

MULTI_TOKEN_PATTERN_EXTRACT_RULE_CREATOR

public static final SequenceMatchRules.MultiTokenPatternExtractRuleCreator MULTI_TOKEN_PATTERN_EXTRACT_RULE_CREATOR

DEFAULT_EXTRACT_RULE_CREATOR

public static final SequenceMatchRules.AnnotationExtractRuleCreator DEFAULT_EXTRACT_RULE_CREATOR

Method Detail

createAssignmentRule

public static SequenceMatchRules.AssignmentRule createAssignmentRule(Env env,
                                                                     AssignableExpression var,
                                                                     Expression result)

createRule

public static SequenceMatchRules.Rule createRule(Env env,
                                                 Expressions.CompositeValue cv)

createExtractionRule

protected static SequenceMatchRules.AnnotationExtractRule createExtractionRule(Env env,
                                                                               java.util.Map<java.lang.String,java.lang.Object> attributes)

createExtractionRule

public static SequenceMatchRules.AnnotationExtractRule createExtractionRule(Env env,
                                                                            java.lang.String ruleType,
                                                                            java.lang.Object pattern,
                                                                            Expression result)

createTokenPatternRule

public static SequenceMatchRules.AnnotationExtractRule createTokenPatternRule(Env env,
                                                                              SequencePattern.PatternExpr expr,
                                                                              Expression result)

createTextPatternRule

public static SequenceMatchRules.AnnotationExtractRule createTextPatternRule(Env env,
                                                                             java.lang.String expr,
                                                                             Expression result)

createMultiTokenPatternRule

public static SequenceMatchRules.AnnotationExtractRule createMultiTokenPatternRule(Env env,
                                                                                   SequenceMatchRules.AnnotationExtractRule template,
                                                                                   java.util.List<TokenSequencePattern> patterns)

createAnnotationExtractor

public static MatchedExpression.SingleAnnotationExtractor createAnnotationExtractor(Env env,
                                                                                    SequenceMatchRules.AnnotationExtractRule r)
