
Package edu.stanford.nlp.wordseg

A package for doing Chinese word segmentation.


Package edu.stanford.nlp.wordseg Description


This package uses the CRFClassifier class (edu.stanford.nlp.ie.crf.CRFClassifier, a conditional random field sequence classifier) to perform Chinese word segmentation.
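A sequence classifier segments text by treating segmentation as character labeling: each character gets a tag, and word boundaries are read off from the tags. The sketch below shows only that last decoding step. It assumes a simple binary scheme in which each character is tagged as either beginning a new word or continuing the current one; the class name TagsToWords and the boolean tag encoding are hypothetical and are not this package's internal label set or API.

```java
import java.util.ArrayList;
import java.util.List;

public class TagsToWords {
    // Hypothetical tag encoding: beginsWord[i] == true means the i-th character
    // (Unicode code point) starts a new word; false means it continues the previous word.
    static List<String> toWords(String text, boolean[] beginsWord) {
        if (text.codePointCount(0, text.length()) != beginsWord.length) {
            throw new IllegalArgumentException("need exactly one tag per character");
        }
        List<String> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        for (int offset = 0; offset < text.length(); i++) {
            int cp = text.codePointAt(offset);
            // A "begins word" tag closes the word built so far (if any).
            if (beginsWord[i] && current.length() > 0) {
                words.add(current.toString());
                current.setLength(0);
            }
            current.appendCodePoint(cp);
            offset += Character.charCount(cp);
        }
        if (current.length() > 0) {
            words.add(current.toString());
        }
        return words;
    }

    public static void main(String[] args) {
        // Tags a classifier might assign to 我喜欢北京 ("I like Beijing").
        boolean[] tags = {true, true, false, true, false};
        System.out.println(toWords("我喜欢北京", tags)); // prints [我, 喜欢, 北京]
    }
}
```

Iterating by code point rather than by char keeps characters outside the Basic Multilingual Plane intact as single units.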

On the Stanford NLP machines, usable properties files can be found at: /u/nlp/data/chinese-segmenter/Sighan2005/prop

Usage: For simplified Chinese:

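java -mx200m edu.stanford.nlp.ie.crf.CRFClassifier -sighanCorporaDict $CH_SEG/data -NormalizationTable $CH_SEG/data/norm.simp.utf8 -normTableEncoding UTF-8 -loadClassifier $CH_SEG/data/ctb.gz -testFile $file -inputEncoding $enc

Here $CH_SEG is the directory whose data subdirectory holds the dictionaries, the normalization table, and the ctb.gz model; $file is the file to segment; and $enc is that file's character encoding.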
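Because usable properties files exist for these settings, it can help to see the command-line flags for simplified Chinese written as properties instead. The sketch below assumes that each "-key value" flag corresponds to a property named "key" (the usual Stanford NLP convention) and that a saved file could then be passed to CRFClassifier with -prop; verify both against the version of the library you use. The class name SegmenterProps and the example paths are hypothetical; chSeg, testFile, and encoding stand in for $CH_SEG, $file, and $enc. Only java.util.Properties is used, so nothing from the segmenter itself is called.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.util.Properties;

public class SegmenterProps {
    // Mirrors the simplified Chinese command line, one property per flag.
    // chSeg, testFile and encoding are placeholders for $CH_SEG, $file and $enc.
    static Properties simplifiedChineseProps(String chSeg, String testFile, String encoding) {
        Properties props = new Properties();
        props.setProperty("sighanCorporaDict", chSeg + "/data");
        props.setProperty("NormalizationTable", chSeg + "/data/norm.simp.utf8");
        props.setProperty("normTableEncoding", "UTF-8");
        props.setProperty("loadClassifier", chSeg + "/data/ctb.gz");
        props.setProperty("testFile", testFile);
        props.setProperty("inputEncoding", encoding);
        return props;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical paths for illustration only.
        Properties props = simplifiedChineseProps("/path/to/segmenter", "input.txt", "UTF-8");
        StringWriter out = new StringWriter();
        // Writes key=value lines (in no guaranteed order) plus a comment header.
        props.store(out, "Simplified Chinese segmentation settings");
        System.out.print(out);
    }
}
```

Keeping the settings in a file rather than on the command line makes it easier to reuse one configuration across many input files, which is presumably why properties files are kept alongside the data.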
Author:
Pi-Chuan Chang, Huihsin Tseng, Galen Andrew

Stanford NLP Group