public class ArabicSegmenter extends Object implements WordSegmenter, Serializable, ThreadsafeProcessor<String,String>
This package includes a JFlex-based orthographic normalizer that runs on the input before it is processed by the CRF-based segmentation model. The normalization options are configurable, but they must be consistent between training and test data.
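The consistency requirement can be made concrete with a toy example. The sketch below is not this package's JFlex normalizer: `NormalizationConsistency` and its single `normalizeAlef` switch are hypothetical. It uses alef normalization (mapping hamzated and madda alef forms to bare alef) because that is a common Arabic orthographic normalization. The point it shows is that a word seen in normalized training data stops matching, even with identical spelling, once the test side is normalized differently.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Toy illustration, not this package's JFlex normalizer: a vocabulary collected from
 * normalized training text only matches test text normalized with the same setting.
 */
class NormalizationConsistency {

  /** Hypothetical option: map hamzated and madda alef forms to bare alef (U+0627). */
  static String normalize(String s, boolean normalizeAlef) {
    if (!normalizeAlef) {
      return s;
    }
    return s.replace('\u0623', '\u0627')   // ALEF WITH HAMZA ABOVE
            .replace('\u0625', '\u0627')   // ALEF WITH HAMZA BELOW
            .replace('\u0622', '\u0627');  // ALEF WITH MADDA ABOVE
  }

  /** Collects the training words after normalizing them with the given setting. */
  static Set<String> buildVocabulary(List<String> trainingWords, boolean normalizeAlef) {
    Set<String> vocab = new HashSet<>();
    for (String w : trainingWords) {
      vocab.add(normalize(w, normalizeAlef));
    }
    return vocab;
  }

  /** Returns true if the test word, normalized with the given setting, is in the vocabulary. */
  static boolean seen(Set<String> vocab, String testWord, boolean normalizeAlef) {
    return vocab.contains(normalize(testWord, normalizeAlef));
  }

  public static void main(String[] args) {
    String ahmad = "\u0623\u062D\u0645\u062F"; // spelled with ALEF WITH HAMZA ABOVE
    Set<String> vocab = buildVocabulary(List.of(ahmad), true); // training side normalizes
    System.out.println(seen(vocab, ahmad, true));  // true: same setting at test time
    System.out.println(seen(vocab, ahmad, false)); // false: test side skipped normalization
  }
}
```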
Constructor | Description |
---|---|
ArabicSegmenter(ArabicSegmenter other) | Copy constructor. |
ArabicSegmenter(Properties props) | Make an Arabic Segmenter. |
Modifier and Type | Method | Description |
---|---|---|
void | finishTraining() | |
void | initializeTraining(double numTrees) | |
void | loadSegmenter(String filename) | |
void | loadSegmenter(String filename, Properties p) | |
static void | main(String[] args) | |
ThreadsafeProcessor<String,String> | newInstance() | Return a new threadsafe instance. |
String | process(String nextInput) | Set the input item that will be processed when a thread is allocated to this processor. |
long | segment(BufferedReader br, PrintWriter pwOut) | Segment all strings from an input. |
List<HasWord> | segment(String line) | |
String | segmentString(String line) | |
void | serializeSegmenter(String filename) | |
void | train() | Train a segmenter from raw text. |
void | train(Collection<Tree> trees) | |
void | train(List<TaggedWord> sentence) | |
void | train(Tree tree) | |
public ArabicSegmenter(Properties props)
Make an Arabic Segmenter.
Parameters:
props - Options for how to tokenize. See the main method of ArabicTokenizer for details.

public ArabicSegmenter(ArabicSegmenter other)
Copy constructor.
Parameters:
other - the segmenter to copy

public void initializeTraining(double numTrees)
Specified by:
initializeTraining in interface WordSegmenter

public void train(Collection<Tree> trees)
Specified by:
train in interface WordSegmenter

public void train(Tree tree)
Specified by:
train in interface WordSegmenter

public void train(List<TaggedWord> sentence)
Specified by:
train in interface WordSegmenter

public void finishTraining()
Specified by:
finishTraining in interface WordSegmenter
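This page does not document the order in which the WordSegmenter training methods are called. Their names suggest one `initializeTraining` call, any number of `train` calls, and a closing `finishTraining` call. The sketch below encodes that assumed lifecycle with a hypothetical `TrainableSegmenter` interface that takes token lists instead of `Tree` objects, so it runs without CoreNLP on the classpath; treating `numTrees` as a size hint for the training set is also an assumption.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

class TrainingLifecycleSketch {

  /** Hypothetical stand-in for the WordSegmenter training methods; a token list replaces Tree. */
  interface TrainableSegmenter {
    void initializeTraining(double numTrees);
    void train(List<String> sentence);
    void finishTraining();
  }

  /** Toy segmenter that buffers sentences and only marks itself trained at the end. */
  static class BufferingSegmenter implements TrainableSegmenter {
    private final List<List<String>> buffer = new ArrayList<>();
    private double expectedSize = 0;
    private boolean collecting = false;
    private boolean trained = false;

    @Override
    public void initializeTraining(double numTrees) {
      // Recorded as a size hint (assumption); the toy does not use it further.
      expectedSize = numTrees;
      buffer.clear();
      collecting = true;
      trained = false;
    }

    @Override
    public void train(List<String> sentence) {
      if (!collecting) {
        throw new IllegalStateException("call initializeTraining() first");
      }
      buffer.add(sentence);
    }

    @Override
    public void finishTraining() {
      if (!collecting) {
        throw new IllegalStateException("call initializeTraining() first");
      }
      // A real segmenter would estimate its model from the buffered data here.
      collecting = false;
      trained = true;
    }

    int sentencesSeen() { return buffer.size(); }
    double expectedSize() { return expectedSize; }
    boolean isTrained() { return trained; }
  }

  /** Drives the assumed lifecycle: initialize once, feed every sentence, then finish. */
  static void trainAll(TrainableSegmenter segmenter, Collection<List<String>> corpus) {
    segmenter.initializeTraining(corpus.size());
    for (List<String> sentence : corpus) {
      segmenter.train(sentence);
    }
    segmenter.finishTraining();
  }
}
```

The state checks make the assumed ordering explicit: calling `train` before `initializeTraining` fails fast instead of silently training on a partial setup.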
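
public String process(String nextInput)
Description copied from interface: ThreadsafeProcessor
Set the input item that will be processed when a thread is allocated to this processor.
Specified by:
process in interface ThreadsafeProcessor<String,String>
Parameters:
nextInput - the object to be processed

public ThreadsafeProcessor<String,String> newInstance()
Description copied from interface: ThreadsafeProcessor
Return a new threadsafe instance.
Specified by:
newInstance in interface ThreadsafeProcessor<String,String>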
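The `process`/`newInstance` pair suggests the usual pattern for a threadsafe processor: keep one prototype, give each worker thread its own copy from `newInstance()`, and call `process()` only from the thread that owns that copy. The sketch below assumes that pattern. `Processor` is a local mirror of the two methods, not the CoreNLP type, and `SpacingProcessor` is a hypothetical toy whose scratch buffer would corrupt results if one instance were shared across threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ThreadsafeProcessorSketch {

  /** Local mirror of the two ThreadsafeProcessor methods; not the CoreNLP interface. */
  interface Processor<I, O> {
    O process(I input);
    Processor<I, O> newInstance();
  }

  /** Toy processor whose scratch buffer makes a single instance unsafe to share. */
  static class SpacingProcessor implements Processor<String, String> {
    private final StringBuilder scratch = new StringBuilder();

    @Override
    public String process(String input) {
      // Toy "segmentation": put a space between every character.
      scratch.setLength(0);
      for (int i = 0; i < input.length(); i++) {
        if (i > 0) {
          scratch.append(' ');
        }
        scratch.append(input.charAt(i));
      }
      return scratch.toString();
    }

    @Override
    public Processor<String, String> newInstance() {
      return new SpacingProcessor(); // fresh scratch state for each worker
    }
  }

  /** Runs inputs on a pool; each worker thread gets its own instance from newInstance(). */
  static List<String> processAll(Processor<String, String> prototype, List<String> inputs, int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    ThreadLocal<Processor<String, String>> perThread = ThreadLocal.withInitial(prototype::newInstance);
    try {
      List<Future<String>> futures = new ArrayList<>();
      for (String input : inputs) {
        futures.add(pool.submit(() -> perThread.get().process(input)));
      }
      List<String> results = new ArrayList<>(); // collected in input order
      for (Future<String> f : futures) {
        results.add(f.get());
      }
      return results;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e.getCause());
    } finally {
      pool.shutdown();
    }
  }
}
```

Collecting the futures in submission order keeps the output aligned with the input even though the work finishes in arbitrary order.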

public List<HasWord> segment(String line)
Specified by:
segment in interface WordSegmenter

public long segment(BufferedReader br, PrintWriter pwOut)
Segment all strings from an input.
Parameters:
br - input stream to segment
pwOut - output stream to write the segmented text to

public void train()
Train a segmenter from raw text.
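`segment(BufferedReader br, PrintWriter pwOut)` reads an input stream and writes segmented text to an output stream. The sketch below shows one plausible shape for such a loop, assuming line-at-a-time processing and a count of processed lines as the `long` return value; neither detail is stated on this page. The per-line function is passed in as a parameter, and `markConjunction` is a hypothetical toy over Buckwalter-transliterated text, not the CRF model.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.function.UnaryOperator;

class StreamSegmentSketch {

  /**
   * Reads br line by line, writes segmentLine.apply(line) to pwOut, and returns the
   * number of lines processed (an assumption; the real return value is undocumented here).
   */
  static long segment(BufferedReader br, PrintWriter pwOut, UnaryOperator<String> segmentLine) {
    long lines = 0;
    try {
      for (String line = br.readLine(); line != null; line = br.readLine()) {
        pwOut.println(segmentLine.apply(line));
        lines++;
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    pwOut.flush();
    return lines;
  }

  /** Hypothetical toy segmenter: split a leading "w" clitic off any token of 3+ characters. */
  static String markConjunction(String line) {
    return line.replaceAll("(^|\\s)w(\\S{2,})", "$1w+ $2");
  }

  public static void main(String[] args) {
    BufferedReader in = new BufferedReader(new StringReader("wqAl Alwld\nktb"));
    StringWriter sink = new StringWriter();
    long n = segment(in, new PrintWriter(sink), StreamSegmentSketch::markConjunction);
    System.out.print(sink);  // "w+ qAl Alwld" and "ktb", one per line
    System.out.println(n);   // 2
  }
}
```

The toy rule would also wrongly split a stem that happens to begin with w, such as wld into w+ ld; telling a conjunction from a stem-initial letter is exactly the decision a trained segmentation model has to make.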

public void serializeSegmenter(String filename)

public void loadSegmenter(String filename, Properties p)

public void loadSegmenter(String filename)
Specified by:
loadSegmenter in interface WordSegmenter
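ArabicSegmenter implements `Serializable`, which suggests that `serializeSegmenter` and `loadSegmenter` persist a trained model to a file and read it back. Whether they use plain Java object serialization, compression, or another format is not stated on this page. The sketch below shows only a standard `ObjectOutputStream`/`ObjectInputStream` round trip on a hypothetical `ToyModel`; its field and the file name are illustrative.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

class SerializationSketch {

  /** Hypothetical stand-in for a trained model; the field is illustrative only. */
  static class ToyModel implements Serializable {
    private static final long serialVersionUID = 1L;
    final Map<String, Double> weights = new HashMap<>();
  }

  /** Writes the model to file with standard Java object serialization. */
  static void save(ToyModel model, Path file) {
    try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
      out.writeObject(model);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Reads a model previously written by save(). */
  static ToyModel load(Path file) {
    try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
      return (ToyModel) in.readObject();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) throws IOException {
    ToyModel model = new ToyModel();
    model.weights.put("w+", 0.75);
    Path file = Files.createTempFile("toy-segmenter", ".ser");
    save(model, file);
    System.out.println(load(file).weights); // {w+=0.75}
    Files.delete(file);
  }
}
```

The `serialVersionUID` pins the class version, so a model saved by one build can still be read after recompiling an unchanged class.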

public static void main(String[] args)
Parameters:
args - command-line arguments