public class MaxMatchSegmenter extends java.lang.Object implements WordSegmenter
MaxMatchSegmenter contains a greedy version of this algorithm.
Note that the output segmentation may need postprocessing for the segmentation of non-Chinese characters (e.g., punctuation, foreign names).

Modifier and Type | Class and Description |
---|---|
static class | MaxMatchSegmenter.MatchHeuristic |

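The class note says the output may need postprocessing for non-Chinese characters, but it does not say what that step looks like. One possible pass is sketched here. It assumes, without documentation to support it, that an unknown Latin-script name or number comes back split into single-character tokens, and it merges adjacent tokens made only of non-Han letters or digits. `NonHanMerger` and `mergeNonHanRuns` are hypothetical names and are not part of the library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical post-processing helper (not part of MaxMatchSegmenter): joins
// adjacent tokens made only of non-Han letters or digits, so a name split into
// single characters ("I", "B", "M") is restored as one token ("IBM").
// Punctuation and Han tokens are passed through unchanged.
final class NonHanMerger {

    // True if the token is non-empty and every code point is a letter or digit
    // outside the Han script (e.g. Latin letters, ASCII or full-width digits).
    static boolean isNonHanWordPiece(String token) {
        return !token.isEmpty() && token.codePoints().allMatch(cp ->
                Character.isLetterOrDigit(cp)
                        && Character.UnicodeScript.of(cp) != Character.UnicodeScript.HAN);
    }

    static List<String> mergeNonHanRuns(List<String> tokens) {
        List<String> out = new ArrayList<>();
        StringBuilder run = new StringBuilder(); // current run of mergeable tokens
        for (String tok : tokens) {
            if (isNonHanWordPiece(tok)) {
                run.append(tok);
            } else {
                if (run.length() > 0) { // flush the pending run before this token
                    out.add(run.toString());
                    run.setLength(0);
                }
                out.add(tok);
            }
        }
        if (run.length() > 0) out.add(run.toString());
        return out;
    }
}
```

For example, the tokens 我 / 在 / I / B / M / 工作 / 。 become 我 / 在 / IBM / 工作 / 。. The full stop is left alone because it is not a letter or digit.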
Constructor and Description |
---|
MaxMatchSegmenter() |

Modifier and Type | Method and Description |
---|---|
void | finishTraining() |
java.util.ArrayList<Word> | greedilySegmentWords(java.lang.String s): Returns a lexicon-based segmentation. |
void | initializeTraining(double numTrees) |
void | loadSegmenter(java.lang.String filename) |
static void | main(java.lang.String[] args) |
java.util.ArrayList<Word> | maxMatchSegmentation(): Returns the lexicon-based segmentation that minimizes the number of words. |
java.util.List<HasWord> | segment(java.lang.String s) |
java.util.ArrayList<Word> | segmentWords(MaxMatchSegmenter.MatchHeuristic h): Returns the lexicon-based segmentation following heuristic h. |
void | train(java.util.Collection<Tree> trees) |
void | train(java.util.List<TaggedWord> sentence) |
void | train(Tree tree) |

public void initializeTraining(double numTrees)
Specified by: initializeTraining in interface WordSegmenter

public void train(java.util.Collection<Tree> trees)
Specified by: train in interface WordSegmenter

public void train(Tree tree)
Specified by: train in interface WordSegmenter

public void train(java.util.List<TaggedWord> sentence)
Specified by: train in interface WordSegmenter

public void finishTraining()
Specified by: finishTraining in interface WordSegmenter
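The initializeTraining / train / finishTraining entries carry no description. The sketch that follows is a hedged illustration only. It assumes that training amounts to collecting word forms into a lexicon, which is not documented here; the real class may do more or something different. `LexiconTrainer` is a hypothetical name. Plain `List<String>` sentences stand in for `List<TaggedWord>` and `Tree`, and the `numTrees` argument is omitted.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of an initialize -> train -> finish lifecycle for a
// lexicon-based segmenter. Plain strings stand in for TaggedWord/Tree.
final class LexiconTrainer {
    private Set<String> pending;              // words collected during training
    private Set<String> lexicon = Set.of();   // frozen lexicon after finishTraining()

    void initializeTraining() {
        pending = new HashSet<>();
    }

    // Analogous to train(List<TaggedWord>): record every word form of one sentence.
    void train(List<String> sentence) {
        if (pending == null) throw new IllegalStateException("call initializeTraining() first");
        pending.addAll(sentence);
    }

    // Freeze the collected words into an immutable lexicon.
    void finishTraining() {
        if (pending == null) throw new IllegalStateException("call initializeTraining() first");
        lexicon = Set.copyOf(pending);
        pending = null;
    }

    boolean contains(String word) { return lexicon.contains(word); }

    int size() { return lexicon.size(); }
}
```

In this sketch a train(Collection<Tree>) overload would loop over the collection and call train once per tree, and train(Tree) would pass along the tree's leaf words.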

public void loadSegmenter(java.lang.String filename)
Specified by: loadSegmenter in interface WordSegmenter

public java.util.List<HasWord> segment(java.lang.String s)
Specified by: segment in interface WordSegmenter

public java.util.ArrayList<Word> maxMatchSegmentation()
Returns the lexicon-based segmentation that minimizes the number of words.
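maxMatchSegmentation() is documented as returning the segmentation that minimizes the number of words. The sketch that follows uses dynamic programming to meet that objective, under assumptions of its own: the lexicon is a plain `Set<String>`, and any single character may stand as a word when no lexicon entry covers it. The real method takes no arguments, and its tie-breaking and out-of-vocabulary handling are not described here. `MinWordsSegmenter` and `maxWordLen` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: dynamic programming that finds a segmentation with the
// fewest words. Any lexicon word (up to maxWordLen chars, maxWordLen >= 1) may be
// used, and any single character may be emitted as a one-character word, so a
// segmentation always exists. Indices are UTF-16 char positions.
final class MinWordsSegmenter {

    static List<String> segment(String s, Set<String> lexicon, int maxWordLen) {
        int n = s.length();
        int[] best = new int[n + 1]; // best[i] = fewest words covering s[0, i)
        int[] back = new int[n + 1]; // back[i] = start index of the last word
        Arrays.fill(best, Integer.MAX_VALUE);
        best[0] = 0;
        for (int end = 1; end <= n; end++) {
            for (int start = Math.max(0, end - maxWordLen); start < end; start++) {
                boolean allowed = end - start == 1 || lexicon.contains(s.substring(start, end));
                if (allowed && best[start] != Integer.MAX_VALUE && best[start] + 1 < best[end]) {
                    best[end] = best[start] + 1;
                    back[end] = start;
                }
            }
        }
        // Follow back-pointers from the end of the string to recover the words.
        List<String> words = new ArrayList<>();
        for (int end = n; end > 0; end = back[end]) {
            words.add(0, s.substring(back[end], end));
        }
        return words;
    }
}
```

With the lexicon {北京, 大学, 北京大学}, the input 北京大学 yields the single word 北京大学 rather than 北京 / 大学. The work is proportional to the string length times `maxWordLen` lexicon lookups.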

public java.util.ArrayList<Word> segmentWords(MaxMatchSegmenter.MatchHeuristic h) throws java.lang.UnsupportedOperationException
Returns the lexicon-based segmentation following heuristic h.
Parameters: h - Heuristic to use for segmentation.
Throws: java.lang.UnsupportedOperationException
See Also: buildSegmentationLattice(java.lang.String)
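segmentWords(h) selects a segmentation according to a MatchHeuristic, may throw UnsupportedOperationException, and refers to buildSegmentationLattice(java.lang.String). The sketch below imitates that general shape under its own assumptions. It builds a lattice of the candidate words at each position, with a single-character fallback, enumerates every path through the lattice, and lets an enum choose one path. The constants `FEWEST_WORDS`, `MOST_WORDS` and `CUSTOM` are hypothetical and are not the real MatchHeuristic values. A heuristic the sketch does not handle throws UnsupportedOperationException, as the documented throws clause does.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: build a segmentation lattice, enumerate its paths, and let a
// heuristic pick one. The Heuristic constants are illustrative only.
final class LatticeSegmenter {

    enum Heuristic { FEWEST_WORDS, MOST_WORDS, CUSTOM }

    // lattice.get(i) holds every end index j such that s[i, j) is a usable word:
    // always the single character at i, plus longer lexicon matches.
    static List<List<Integer>> buildLattice(String s, Set<String> lexicon, int maxWordLen) {
        List<List<Integer>> lattice = new ArrayList<>();
        for (int i = 0; i < s.length(); i++) {
            List<Integer> ends = new ArrayList<>();
            ends.add(i + 1); // single-character fallback edge
            for (int j = i + 2; j <= Math.min(s.length(), i + maxWordLen); j++) {
                if (lexicon.contains(s.substring(i, j))) ends.add(j);
            }
            lattice.add(ends);
        }
        return lattice;
    }

    // Depth-first enumeration of every path from pos to the end of s
    // (exponential in general; acceptable only for short illustrative inputs).
    static void paths(String s, List<List<Integer>> lattice, int pos,
                      List<String> prefix, List<List<String>> out) {
        if (pos == s.length()) { out.add(new ArrayList<>(prefix)); return; }
        for (int end : lattice.get(pos)) {
            prefix.add(s.substring(pos, end));
            paths(s, lattice, end, prefix, out);
            prefix.remove(prefix.size() - 1);
        }
    }

    static List<String> segment(String s, Set<String> lexicon, int maxWordLen, Heuristic h) {
        Comparator<List<String>> byCount = Comparator.comparingInt(List::size);
        Comparator<List<String>> order;
        switch (h) {
            case FEWEST_WORDS: order = byCount; break;
            case MOST_WORDS:   order = byCount.reversed(); break;
            default: throw new UnsupportedOperationException("Unsupported heuristic: " + h);
        }
        List<List<String>> all = new ArrayList<>();
        paths(s, buildLattice(s, lexicon, maxWordLen), 0, new ArrayList<>(), all);
        return all.stream().min(order).orElseThrow(); // ties are not resolved deliberately here
    }
}
```

Keeping the lattice separate from the selection step means a new heuristic only needs a new ordering over complete paths. For real inputs, a dynamic program over the lattice would replace the exhaustive enumeration.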

public java.util.ArrayList<Word> greedilySegmentWords(java.lang.String s)
Returns a lexicon-based segmentation.
Parameters: s - Input (unsegmented) string.

public static void main(java.lang.String[] args)
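greedilySegmentWords(s) is described only as returning a lexicon-based segmentation, and the class note says MaxMatchSegmenter contains a greedy version of the algorithm. The sketch that follows shows greedy forward maximum matching, a common greedy strategy, and is not confirmed to be what this method does. At each position it takes the longest lexicon word of at most `maxWordLen` characters, or a single character if nothing matches. `GreedyMaxMatch` and `maxWordLen` are hypothetical names, and the lexicon is assumed to be a plain `Set<String>`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of greedy forward maximum matching: at each position take the
// longest lexicon word (up to maxWordLen chars); if none matches, emit one character.
final class GreedyMaxMatch {

    static List<String> segment(String s, Set<String> lexicon, int maxWordLen) {
        List<String> words = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            int end = Math.min(s.length(), i + maxWordLen);
            // Shrink the candidate until it is in the lexicon or one character remains.
            while (end > i + 1 && !lexicon.contains(s.substring(i, end))) {
                end--;
            }
            words.add(s.substring(i, end));
            i = end; // commit to this word and never revisit it
        }
        return words;
    }
}
```

Take the lexicon {研究, 研究生, 生命, 起源} and the input 研究生命起源. The greedy choice of 研究生 leaves 命 as a stray single character, so the result is 研究生 / 命 / 起源, even though 研究 / 生命 / 起源 uses only lexicon words. That early commitment is the usual cost of greedy matching.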