public class MaxMatchSegmenter extends java.lang.Object implements WordSegmenter

MaxMatchSegmenter contains a greedy version of this algorithm.
Note that the output segmentation may need postprocessing to handle non-Chinese characters (e.g., punctuation, foreign names).

| Modifier and Type | Class and Description |
|---|---|
| static class | MaxMatchSegmenter.MatchHeuristic |

| Constructor and Description |
|---|
| MaxMatchSegmenter() |

| Modifier and Type | Method and Description |
|---|---|
| void | finishTraining() |
| java.util.ArrayList<Word> | greedilySegmentWords(java.lang.String s): Returns a lexicon-based segmentation. |
| void | initializeTraining(double numTrees) |
| void | loadSegmenter(java.lang.String filename) |
| static void | main(java.lang.String[] args) |
| java.util.ArrayList<Word> | maxMatchSegmentation(): Returns the lexicon-based segmentation that minimizes the number of words. |
| java.util.List<HasWord> | segment(java.lang.String s) |
| java.util.ArrayList<Word> | segmentWords(MaxMatchSegmenter.MatchHeuristic h): Returns the lexicon-based segmentation following heuristic h. |
| void | train(java.util.Collection<Tree> trees) |
| void | train(java.util.List<TaggedWord> sentence) |
| void | train(Tree tree) |

public void initializeTraining(double numTrees)
Specified by: initializeTraining in interface WordSegmenter

public void train(java.util.Collection<Tree> trees)
Specified by: train in interface WordSegmenter

public void train(Tree tree)
Specified by: train in interface WordSegmenter

public void train(java.util.List<TaggedWord> sentence)
Specified by: train in interface WordSegmenter

public void finishTraining()
Specified by: finishTraining in interface WordSegmenter

public void loadSegmenter(java.lang.String filename)
Specified by: loadSegmenter in interface WordSegmenter

public java.util.List<HasWord> segment(java.lang.String s)
Specified by: segment in interface WordSegmenter

public java.util.ArrayList<Word> maxMatchSegmentation()
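
The summary for maxMatchSegmentation() says it returns the lexicon-based segmentation that minimizes the number of words. The sketch below illustrates that idea with a plain dynamic program over string positions. It is not this class's implementation: the class name MinWordsSegmentationSketch, the method minWordSegment, the maxWordLength parameter, the single-character fallback for text not covered by the lexicon, and the toy Latin-alphabet lexicon are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-alone sketch; not part of MaxMatchSegmenter.
class MinWordsSegmentationSketch {

    // Returns a segmentation of s with the fewest words, where a word is
    // either a lexicon entry or (assumed fallback) a single character.
    // Assumes maxWordLength >= 1. Indexing is by char, so supplementary
    // code points are not handled.
    static List<String> minWordSegment(String s, Set<String> lexicon, int maxWordLength) {
        int n = s.length();
        int[] cost = new int[n + 1]; // cost[i]: fewest words covering s[0, i)
        int[] back = new int[n + 1]; // back[i]: start of the last word in that cover
        for (int i = 1; i <= n; i++) {
            cost[i] = Integer.MAX_VALUE;
            for (int j = Math.max(0, i - maxWordLength); j < i; j++) {
                boolean allowed = (i - j == 1) || lexicon.contains(s.substring(j, i));
                if (allowed && cost[j] + 1 < cost[i]) {
                    cost[i] = cost[j] + 1;
                    back[i] = j;
                }
            }
        }
        // Follow the back-pointers from the end, then restore left-to-right order.
        List<String> words = new ArrayList<>();
        for (int i = n; i > 0; i = back[i]) {
            words.add(s.substring(back[i], i));
        }
        Collections.reverse(words);
        return words;
    }

    public static void main(String[] args) {
        Set<String> lexicon = new HashSet<>(Arrays.asList("abc", "ab", "cde"));
        // A greedy longest-match pass would take "abc" first and need three words.
        System.out.println(minWordSegment("abcde", lexicon, 3)); // [ab, cde]
    }
}
```

The page also references buildSegmentationLattice(java.lang.String), which suggests the real class works over a lattice of candidate words; the array-based recurrence here is only a compact stand-in for a shortest path through such a lattice.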

public java.util.ArrayList<Word> segmentWords(MaxMatchSegmenter.MatchHeuristic h) throws java.lang.UnsupportedOperationException
Parameters: h - Heuristic to use for segmentation.
Throws: java.lang.UnsupportedOperationException
See Also: buildSegmentationLattice(java.lang.String)

public java.util.ArrayList<Word> greedilySegmentWords(java.lang.String s)
Parameters: s - Input (unsegmented) string.

public static void main(java.lang.String[] args)
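
For readers unfamiliar with greedy lexicon-based segmentation, which greedilySegmentWords(java.lang.String s) is described as providing, the following is a minimal sketch of the classic forward maximum-matching strategy: at each position take the longest lexicon entry, and fall back to a single character when nothing matches. Whether the class uses exactly this strategy is an assumption. The class GreedyMaxMatchSketch, the method greedySegment, the maxWordLength parameter, and the toy Latin-alphabet lexicon (standing in for Chinese text) are hypothetical and not part of this API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-alone sketch; not part of MaxMatchSegmenter.
class GreedyMaxMatchSketch {

    // Greedy forward maximum matching: repeatedly take the longest prefix
    // (up to maxWordLength chars) found in the lexicon, or a single
    // character if no prefix matches. Indexing is by char, so supplementary
    // code points are not handled.
    static List<String> greedySegment(String s, Set<String> lexicon, int maxWordLength) {
        List<String> words = new ArrayList<>();
        int start = 0;
        while (start < s.length()) {
            int end = Math.min(s.length(), start + maxWordLength);
            // Shrink the candidate until it is a lexicon entry or one character long.
            while (end > start + 1 && !lexicon.contains(s.substring(start, end))) {
                end--;
            }
            words.add(s.substring(start, end));
            start = end;
        }
        return words;
    }

    public static void main(String[] args) {
        Set<String> lexicon = new HashSet<>(
                Arrays.asList("the", "theta", "table", "down", "there", "bled", "own"));
        // Greedy matching commits to "theta" and never recovers "the table down there".
        System.out.println(greedySegment("thetabledownthere", lexicon, 5));
        // [theta, bled, own, there]
    }
}
```

The example shows why a greedy pass is cheap but can commit to an early long match that a globally optimized segmentation would avoid.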