public class TrieMapMatcher<K,V> extends Object
TrieMapMatcher provides functions to match against a trie.
It can be used to:
- Find matches in a document (findAllMatches and findNonOverlapping)
- Find approximate matches in a document (findClosestMatches)
- Segment a sequence based on entries in the trie (segment)
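To make the first use case concrete, here is a minimal, self-contained sketch of exact matching over a token trie. It does not use the CoreNLP classes: `TrieSketch`, `Node`, and `Span` are hypothetical stand-ins for illustration only, and the real `TrieMap` and `Match` types may behave differently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a trie keyed by token sequences; not the CoreNLP TrieMap.
final class TrieSketch {
    // One trie node: children keyed by the next token, plus an optional value.
    static final class Node {
        final Map<String, Node> children = new HashMap<>();
        String value; // non-null only if a stored entry ends at this node
    }

    // A match covering tokens [begin, end) together with the value stored in the trie.
    record Span(int begin, int end, String value) {}

    final Node root = new Node();

    // Stores a token sequence with its value.
    void put(List<String> key, String value) {
        Node node = root;
        for (String token : key) {
            node = node.children.computeIfAbsent(token, t -> new Node());
        }
        node.value = value;
    }

    // Returns every sub-sequence of tokens that is an entry in the trie, overlapping or not.
    List<Span> findAllMatches(List<String> tokens) {
        List<Span> matches = new ArrayList<>();
        for (int begin = 0; begin < tokens.size(); begin++) {
            Node node = root;
            for (int end = begin; end < tokens.size(); end++) {
                node = node.children.get(tokens.get(end));
                if (node == null) {
                    break; // no stored entry continues with this token
                }
                if (node.value != null) {
                    matches.add(new Span(begin, end + 1, node.value));
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        TrieSketch trie = new TrieSketch();
        trie.put(List.of("new", "york"), "CITY");
        trie.put(List.of("new", "york", "times"), "NEWSPAPER");
        System.out.println(trie.findAllMatches(List.of("the", "new", "york", "times")));
    }
}
```

Because every start position is tried independently, overlapping entries such as "new york" and "new york times" are both reported; choosing among them is a separate step.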
TODO: Have TrieMapMatcher implement a matcher interface
Modifier and Type | Field and Description |
---|---|
static Comparator<Match> |
MATCH_LENGTH_ENDPOINTS_COMPARATOR |
static java.util.function.Function<Match,Double> |
MATCH_LENGTH_SCORER |
Constructor and Description |
---|
TrieMapMatcher(TrieMap<K,V> root) |
TrieMapMatcher(TrieMap<K,V> root,
List<K> multimatchDelimiter) |
Modifier and Type | Method and Description |
---|---|
static <K,V> MatchCostFunction<K,V> |
defaultCost() |
List<Match<K,V>> |
findAllMatches(K... list)
Given a sequence to search through (e.g.
|
List<Match<K,V>> |
findAllMatches(List<K> list)
Given a sequence to search through (e.g.
|
List<Match<K,V>> |
findAllMatches(List<K> list,
int start,
int end)
Given a sequence to search through (e.g.
|
List<ApproxMatch<K,V>> |
findClosestMatches(K[] target,
int n)
Given a target sequence, returns the n closest matches (or sequences of matches) from the trie.
|
List<ApproxMatch<K,V>> |
findClosestMatches(K[] target,
int n,
boolean multimatch,
boolean keepAlignments)
Given a target sequence, returns the n closest matches (or sequences of matches) from the trie.
|
List<ApproxMatch<K,V>> |
findClosestMatches(K[] target,
MatchCostFunction<K,V> costFunction,
Double maxCost,
int n,
boolean multimatch,
boolean keepAlignments)
Given a target sequence, returns the n closest matches (or sequences of matches) from the trie
based on the cost function (a lower cost means a better match).
|
List<ApproxMatch<K,V>> |
findClosestMatches(List<K> target,
int n)
Given a target sequence, returns the n closest matches (or sequences of matches) from the trie.
|
List<ApproxMatch<K,V>> |
findClosestMatches(List<K> target,
int n,
boolean multimatch,
boolean keepAlignments)
Given a target sequence, returns the n closest matches (or sequences of matches) from the trie.
|
List<ApproxMatch<K,V>> |
findClosestMatches(List<K> target,
MatchCostFunction<K,V> costFunction,
double maxCost,
int n,
boolean multimatch,
boolean keepAlignments)
Given a target sequence, returns the n closest matches (or sequences of matches) from the trie
based on the cost function (a lower cost means a better match).
|
List<Match<K,V>> |
findNonOverlapping(K... list)
Given a sequence to search through (e.g.
|
List<Match<K,V>> |
findNonOverlapping(List<K> list)
Given a sequence to search through (e.g.
|
List<Match<K,V>> |
findNonOverlapping(List<K> list,
int start,
int end)
Given a sequence to search through (e.g.
|
List<Match<K,V>> |
findNonOverlapping(List<K> list,
int start,
int end,
Comparator<? super Match<K,V>> compareFunc)
Given a sequence to search through (e.g.
|
List<Match<K,V>> |
findNonOverlapping(List<K> list,
int start,
int end,
java.util.function.Function<? super Match<K,V>,Double> scoreFunc)
Given a sequence to search through (e.g.
|
List<Match<K,V>> |
getNonOverlapping(List<Match<K,V>> allMatches)
Given a list of matches, returns all non-overlapping matches.
|
List<Match<K,V>> |
getNonOverlapping(List<Match<K,V>> allMatches,
Comparator<? super Match<K,V>> compareFunc)
Given a list of matches, returns all non-overlapping matches.
|
List<Match<K,V>> |
getNonOverlapping(List<Match<K,V>> allMatches,
java.util.function.Function<? super Match<K,V>,Double> scoreFunc) |
static <K,V> Comparator<edu.stanford.nlp.ling.tokensregex.matcher.TrieMapMatcher.PartialApproxMatch<K,V>> |
partialMatchComparator() |
List<Match<K,V>> |
segment(K... list)
Segments a sequence into a sequence of sub-sequences by attempting to find the longest non-overlapping
sub-sequences.
|
List<Match<K,V>> |
segment(List<K> list)
Segments a sequence into a sequence of sub-sequences by attempting to find the longest non-overlapping
sub-sequences.
|
List<Match<K,V>> |
segment(List<K> list,
java.util.function.Function<? super Match<K,V>,Double> scoreFunc) |
List<Match<K,V>> |
segment(List<K> list,
int start,
int end)
Segments a sequence into a sequence of sub-sequences by attempting to find the longest non-overlapping
sub-sequences.
|
List<Match<K,V>> |
segment(List<K> list,
int start,
int end,
Comparator<? super Match<K,V>> compareFunc)
Segments a sequence into a sequence of sub-sequences by attempting to find the non-overlapping
sub-sequences that come earliest according to compareFunc.
|
List<Match<K,V>> |
segment(List<K> list,
int start,
int end,
java.util.function.Function<? super Match<K,V>,Double> scoreFunc)
Segments a sequence into a sequence of sub-sequences by attempting to maximize the total score.
Non-matched parts will be included as a match with a null value.
|
protected void |
updateAllMatches(TrieMap<K,V> trie,
List<Match<K,V>> matches,
List<K> matched,
List<K> list,
int start,
int end) |
protected void |
updateAllMatchesWithStart(TrieMap<K,V> trie,
List<Match<K,V>> matches,
List<K> matched,
List<K> list,
int start,
int end) |
public static final Comparator<Match> MATCH_LENGTH_ENDPOINTS_COMPARATOR
public static final java.util.function.Function<Match,Double> MATCH_LENGTH_SCORER
public List<ApproxMatch<K,V>> findClosestMatches(K[] target, int n)
target
- Target sequence to match
n
- Number of matches to return. The actual number of matches may be less.
public List<ApproxMatch<K,V>> findClosestMatches(K[] target, int n, boolean multimatch, boolean keepAlignments)
target
- Target sequence to match
n
- Number of matches to return. The actual number of matches may be less.
multimatch
- If true, attempt to return matches with sequences of elements from the trie.
Otherwise, each match will contain only one element from the trie.
keepAlignments
- If true, alignment information is returned
public List<ApproxMatch<K,V>> findClosestMatches(K[] target, MatchCostFunction<K,V> costFunction, Double maxCost, int n, boolean multimatch, boolean keepAlignments)
target
- Target sequence to match
costFunction
- Cost function to use
maxCost
- Matches with a cost higher than this are discarded
n
- Number of matches to return. The actual number of matches may be less.
multimatch
- If true, attempt to return matches with sequences of elements from the trie.
Otherwise, each match will contain only one element from the trie.
keepAlignments
- If true, alignment information is returned
public List<ApproxMatch<K,V>> findClosestMatches(List<K> target, int n)
target
- Target sequence to match
n
- Number of matches to return. The actual number of matches may be less.
public List<ApproxMatch<K,V>> findClosestMatches(List<K> target, int n, boolean multimatch, boolean keepAlignments)
target
- Target sequence to match
n
- Number of matches to return. The actual number of matches may be less.
multimatch
- If true, attempt to return matches with sequences of elements from the trie.
Otherwise, each match will contain only one element from the trie.
keepAlignments
- If true, alignment information is returned
public List<ApproxMatch<K,V>> findClosestMatches(List<K> target, MatchCostFunction<K,V> costFunction, double maxCost, int n, boolean multimatch, boolean keepAlignments)
target
- Target sequence to match
costFunction
- Cost function to use
maxCost
- Matches with a cost higher than this are discarded
n
- Number of matches to return. The actual number of matches may be less.
multimatch
- If true, attempt to return matches with sequences of elements from the trie.
Otherwise, each match will contain only one element from the trie.
keepAlignments
- If true, alignment information is returned
public List<Match<K,V>> findAllMatches(K... list)
list
- Sequence to search through
public List<Match<K,V>> findAllMatches(List<K> list)
list
- Sequence to search through
public List<Match<K,V>> findAllMatches(List<K> list, int start, int end)
list
- Sequence to search through
start
- Index at which to start the search
end
- Index (exclusive) at which to end the search
public List<Match<K,V>> findNonOverlapping(K... list)
list
- Sequence to search through
public List<Match<K,V>> findNonOverlapping(List<K> list)
list
- Sequence to search through
public List<Match<K,V>> findNonOverlapping(List<K> list, int start, int end)
list
- Sequence to search through
start
- Index at which to start the search
end
- Index (exclusive) at which to end the search
public List<Match<K,V>> findNonOverlapping(List<K> list, int start, int end, Comparator<? super Match<K,V>> compareFunc)
list
- Sequence to search through
start
- Index at which to start the search
end
- Index (exclusive) at which to end the search
compareFunc
- Comparison function to use for evaluating which overlapping sub-sequence to keep.
Earlier sub-sequences based on the comparison function are favored.
public List<Match<K,V>> findNonOverlapping(List<K> list, int start, int end, java.util.function.Function<? super Match<K,V>,Double> scoreFunc)
list
- Sequence to search through
start
- Index at which to start the search
end
- Index (exclusive) at which to end the search
scoreFunc
- Scoring function indicating how good the match is
public List<Match<K,V>> segment(K... list)
list
- Sequence to search through
public List<Match<K,V>> segment(List<K> list)
list
- Sequence to search through
public List<Match<K,V>> segment(List<K> list, int start, int end)
list
- Sequence to search through
start
- Index at which to start the search
end
- Index (exclusive) at which to end the search
public List<Match<K,V>> segment(List<K> list, int start, int end, Comparator<? super Match<K,V>> compareFunc)
list
- Sequence to search through
start
- Index at which to start the search
end
- Index (exclusive) at which to end the search
compareFunc
- Comparison function to use for evaluating which overlapping sub-sequence to keep.
Earlier sub-sequences based on the comparison function are favored.
public List<Match<K,V>> segment(List<K> list, int start, int end, java.util.function.Function<? super Match<K,V>,Double> scoreFunc)
list
- Sequence to search through
start
- Index at which to start the search
end
- Index (exclusive) at which to end the search
scoreFunc
- Scoring function indicating how good the match is
public List<Match<K,V>> segment(List<K> list, java.util.function.Function<? super Match<K,V>,Double> scoreFunc)
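The idea behind the segment methods can be sketched without the library. The example below is a greedy, left-to-right approximation that takes the longest dictionary entry at each position and emits unmatched stretches with a null value. `SegmentSketch` and `Segment` are hypothetical names, a plain `Map` stands in for the trie, and merging consecutive unmatched tokens into one segment is an assumption of this sketch rather than documented behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of dictionary-based segmentation; not the CoreNLP implementation.
final class SegmentSketch {
    // A segment covering tokens [begin, end); value is null for unmatched stretches.
    record Segment(int begin, int end, String value) {}

    // Greedy segmentation: at each position take the longest dictionary entry,
    // and collect tokens that start no entry into a null-valued segment.
    static List<Segment> segment(List<String> tokens, Map<List<String>, String> dictionary) {
        List<Segment> segments = new ArrayList<>();
        int unmatchedStart = -1; // start of the current unmatched stretch, or -1 if none
        int i = 0;
        while (i < tokens.size()) {
            int bestEnd = -1;
            // Try the longest candidate first so longer entries win.
            for (int end = tokens.size(); end > i; end--) {
                if (dictionary.containsKey(tokens.subList(i, end))) {
                    bestEnd = end;
                    break;
                }
            }
            if (bestEnd < 0) {
                if (unmatchedStart < 0) {
                    unmatchedStart = i;
                }
                i++;
                continue;
            }
            if (unmatchedStart >= 0) {
                // Close the pending unmatched stretch before the matched segment.
                segments.add(new Segment(unmatchedStart, i, null));
                unmatchedStart = -1;
            }
            segments.add(new Segment(i, bestEnd, dictionary.get(tokens.subList(i, bestEnd))));
            i = bestEnd;
        }
        if (unmatchedStart >= 0) {
            segments.add(new Segment(unmatchedStart, tokens.size(), null));
        }
        return segments;
    }

    public static void main(String[] args) {
        Map<List<String>, String> dictionary = Map.of(
            List.of("new", "york"), "CITY",
            List.of("new", "york", "city"), "CITY_FULL");
        List<String> tokens = List.of("i", "love", "new", "york", "city", "a", "lot");
        segment(tokens, dictionary).forEach(System.out::println);
    }
}
```

Every token ends up in exactly one segment, which is what distinguishes segmentation from plain non-overlapping matching.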
public List<Match<K,V>> getNonOverlapping(List<Match<K,V>> allMatches)
allMatches
- List of matches
public List<Match<K,V>> getNonOverlapping(List<Match<K,V>> allMatches, Comparator<? super Match<K,V>> compareFunc)
allMatches
- List of matches
compareFunc
- Comparison function to use for evaluating which overlapping sub-sequence to keep.
Earlier sub-sequences based on the comparison function are favored.
public List<Match<K,V>> getNonOverlapping(List<Match<K,V>> allMatches, java.util.function.Function<? super Match<K,V>,Double> scoreFunc)
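One way to read "earlier sub-sequences based on the comparison function are favored" is as a greedy selection: sort candidates by the comparator, then keep each candidate that does not overlap anything already kept. The sketch below illustrates that reading with hypothetical `NonOverlappingSketch` and `Span` types. The ordering used here (longer first, then earlier endpoints) is an assumption suggested by the name `MATCH_LENGTH_ENDPOINTS_COMPARATOR`, not a confirmed description of it.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of comparator-driven overlap resolution; not the CoreNLP implementation.
final class NonOverlappingSketch {
    // A match covering tokens [begin, end).
    record Span(int begin, int end, String value) {
        int length() {
            return end - begin;
        }

        // Half-open intervals overlap when each starts before the other ends.
        boolean overlaps(Span other) {
            return begin < other.end && other.begin < end;
        }
    }

    // Assumed preference order: longer spans first, then earlier begin, then earlier end.
    static final Comparator<Span> LENGTH_THEN_ENDPOINTS =
        Comparator.comparingInt(Span::length).reversed()
            .thenComparingInt(Span::begin)
            .thenComparingInt(Span::end);

    // Keeps the most preferred spans that do not overlap, returned in document order.
    static List<Span> getNonOverlapping(List<Span> all, Comparator<? super Span> preference) {
        List<Span> sorted = new ArrayList<>(all);
        sorted.sort(preference);
        List<Span> kept = new ArrayList<>();
        for (Span candidate : sorted) {
            boolean clash = kept.stream().anyMatch(candidate::overlaps);
            if (!clash) {
                kept.add(candidate);
            }
        }
        kept.sort(Comparator.comparingInt(Span::begin));
        return kept;
    }

    public static void main(String[] args) {
        List<Span> all = List.of(
            new Span(0, 2, "a"),
            new Span(1, 4, "b"),
            new Span(4, 5, "c"),
            new Span(3, 5, "d"));
        System.out.println(getNonOverlapping(all, LENGTH_THEN_ENDPOINTS));
    }
}
```

Passing a different comparator changes which of two overlapping candidates survives, which is the role compareFunc plays in the signatures above.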
protected void updateAllMatches(TrieMap<K,V> trie, List<Match<K,V>> matches, List<K> matched, List<K> list, int start, int end)
protected void updateAllMatchesWithStart(TrieMap<K,V> trie, List<Match<K,V>> matches, List<K> matched, List<K> list, int start, int end)
public static <K,V> MatchCostFunction<K,V> defaultCost()
public static <K,V> Comparator<edu.stanford.nlp.ling.tokensregex.matcher.TrieMapMatcher.PartialApproxMatch<K,V>> partialMatchComparator()
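As a rough illustration of what the findClosestMatches parameters `maxCost` and `n` control, the sketch below ranks dictionary entries by a token-level edit distance, discards entries whose cost exceeds `maxCost`, and returns at most `n` results. Everything here is hypothetical: `ClosestMatchSketch`, `Scored`, and the unit-cost edit distance are assumptions for illustration, and this page does not say what `defaultCost()` computes or how the trie is searched.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of cost-ranked approximate matching; not the CoreNLP implementation.
final class ClosestMatchSketch {
    // A dictionary entry paired with its cost against the target (lower is better).
    record Scored(List<String> entry, double cost) {}

    // Token-level Levenshtein distance with unit costs (an assumed cost function).
    static double editDistance(List<String> a, List<String> b) {
        double[][] d = new double[a.size() + 1][b.size() + 1];
        for (int i = 0; i <= a.size(); i++) {
            d[i][0] = i; // delete all of a's first i tokens
        }
        for (int j = 0; j <= b.size(); j++) {
            d[0][j] = j; // insert all of b's first j tokens
        }
        for (int i = 1; i <= a.size(); i++) {
            for (int j = 1; j <= b.size(); j++) {
                double substitute = d[i - 1][j - 1] + (a.get(i - 1).equals(b.get(j - 1)) ? 0 : 1);
                double delete = d[i - 1][j] + 1;
                double insert = d[i][j - 1] + 1;
                d[i][j] = Math.min(substitute, Math.min(delete, insert));
            }
        }
        return d[a.size()][b.size()];
    }

    // Returns up to n entries with cost <= maxCost, cheapest first.
    static List<Scored> findClosest(List<String> target, List<List<String>> entries,
                                    double maxCost, int n) {
        List<Scored> scored = new ArrayList<>();
        for (List<String> entry : entries) {
            double cost = editDistance(target, entry);
            if (cost <= maxCost) {
                scored.add(new Scored(entry, cost));
            }
        }
        scored.sort(Comparator.comparingDouble(Scored::cost));
        return new ArrayList<>(scored.subList(0, Math.min(n, scored.size())));
    }

    public static void main(String[] args) {
        List<List<String>> entries = List.of(
            List.of("new", "york"),
            List.of("new", "jersey"),
            List.of("york", "city"));
        System.out.println(findClosest(List.of("new", "york", "city"), entries, 2.0, 2));
    }
}
```

Scoring every entry independently is the simplest possible strategy; a trie-based search can share work across entries with common prefixes, which this sketch does not attempt.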