edu.stanford.nlp.tagger.maxent
Class Train
java.lang.Object
edu.stanford.nlp.tagger.maxent.Train
public class Train
extends java.lang.Object
This class is used to train a POS tagger from the command line.
Options are specified via a properties file and command line
arguments.
Simple usage:
java edu.stanford.nlp.tagger.maxent.Train -file <inputfile> -model <model prefix>
This generates a set of files with the prefix <model prefix>,
corresponding to a model trained on <inputfile>.
There are many options for training. While they can be specified on
the command line, the easiest way to deal with them is via a
properties file, which is passed in with the -props argument.
First, generate a default properties file with "-genprops":
java edu.stanford.nlp.tagger.maxent.Train -genprops > <properties file>
Edit the file. Comments within provide documentation.
Now to start the training procedure:
java edu.stanford.nlp.tagger.maxent.Train -props <properties file> -file <inputfile> -model <model prefix>
Any parameter from the properties file can be overridden on the
command line. The final configuration will be stored as
<model prefix>.props.
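If training is driven from a script or build tool rather than typed by hand, the invocation above can be assembled programmatically. The following is only a minimal sketch: the class name TrainArgsSketch, the helper buildArgs, and the file names are hypothetical, and only the -props, -file and -model flags come from the usage shown above.

```java
import java.util.ArrayList;
import java.util.List;

public class TrainArgsSketch {

    /**
     * Builds the argument array for the documented invocation:
     * [-props <properties file>] -file <inputfile> -model <model prefix>.
     * The -props pair is omitted when propsFile is null.
     */
    static String[] buildArgs(String propsFile, String inputFile, String modelPrefix) {
        List<String> args = new ArrayList<>();
        if (propsFile != null) {
            args.add("-props");
            args.add(propsFile);
        }
        args.add("-file");
        args.add(inputFile);
        args.add("-model");
        args.add(modelPrefix);
        return args.toArray(new String[0]);
    }

    public static void main(String[] unused) {
        // Hypothetical file names, used for illustration only.
        String[] args = buildArgs("my.props", "train.txt", "mymodel");
        System.out.println(String.join(" ", args));
        // With the tagger on the classpath, this array could be passed to
        // edu.stanford.nlp.tagger.maxent.Train.main(args), which is declared
        // to throw java.lang.Exception.
    }
}
```

Keeping the flags in one helper makes it harder for the -file and -model values to be swapped by accident when the command is generated.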
The training file should be in the following format: one word and
one tag per line separated by a space or a tab. Each sentence
should end in an EOS word-tag pair. (Actually, I'm not entirely
sure that is still the case, but it probably won't hurt. -wmorgan)
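To make the format concrete, here is a small sketch that renders one tagged sentence as training-file lines. It rests on assumptions: the class name TrainingFileSketch and the helper formatSentence are hypothetical, the tags are illustrative, and the literal "EOS" used for both the end-of-sentence word and its tag is a guess, since the exact pair is not spelled out here.

```java
public class TrainingFileSketch {

    // Assumption: the end-of-sentence word and tag are both the literal "EOS".
    // Adjust these if the tagger expects a different pair.
    static final String EOS_WORD = "EOS";
    static final String EOS_TAG = "EOS";

    /**
     * Formats one sentence as training lines: one "word<TAB>tag" line per
     * token, followed by the end-of-sentence pair.
     * Each element of taggedWords is a two-element array {word, tag}.
     */
    static String formatSentence(String[][] taggedWords) {
        StringBuilder sb = new StringBuilder();
        for (String[] wordAndTag : taggedWords) {
            sb.append(wordAndTag[0]).append('\t').append(wordAndTag[1]).append('\n');
        }
        sb.append(EOS_WORD).append('\t').append(EOS_TAG).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        // Illustrative tokens and tags only.
        String[][] sentence = {
            {"The", "DT"},
            {"dog", "NN"},
            {"barks", "VBZ"}
        };
        System.out.print(formatSentence(sentence));
    }
}
```

A tab is used as the separator here; according to the description above, a single space should work as well.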
If you need to add a list of closed-class tags for a new language,
do so in TTags (and update the documentation in TaggerConfig).
Once the model is trained, you can test its performance with Test
and tag some data with MaxentTagger.
Method Summary
static void main(java.lang.String[] args)
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
main
public static void main(java.lang.String[] args)
                 throws java.lang.Exception
Throws:
java.lang.Exception