edu.stanford.nlp.tagger.maxent
Class TaggerConfig

java.lang.Object
  extended by java.util.Dictionary<K,V>
      extended by java.util.Hashtable<java.lang.Object,java.lang.Object>
          extended by java.util.Properties
              extended by edu.stanford.nlp.tagger.maxent.TaggerConfig
All Implemented Interfaces:
java.io.Serializable, java.lang.Cloneable, java.util.Map<java.lang.Object,java.lang.Object>

public class TaggerConfig
extends java.util.Properties

Reads and stores configuration information for a POS tagger. Implementation note: To add a new parameter: (1) define a default String value, (2) add it to the defaultValues hash, (3) add a line to the constructor, (4) add a getter method, (5) add it to the dump() method, (6) add it to the printGenProps() method, and (7) add it to the class javadoc of MaxentTagger.
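
The first five steps of the implementation note can be sketched on a small stand-in class. This is an illustration only, not the TaggerConfig source: the class name SketchConfig, the property key "beamSize", and its default "5" are invented for the example, and steps (6) and (7) are omitted because printGenProps() and the MaxentTagger javadoc are not shown on this page.

```java
import java.io.PrintStream;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical stand-in for illustration; not the real TaggerConfig.
class SketchConfig extends Properties {
  private static final long serialVersionUID = 1L;

  // (1) Default value, kept as a String like every other property value.
  //     The key "beamSize" and the value "5" are invented for this example.
  static final String BEAM_SIZE = "5";

  // (2) Register the default in a shared table of defaults.
  private static final Map<String, String> defaultValues = new HashMap<>();
  static {
    defaultValues.put("beamSize", BEAM_SIZE);
  }

  SketchConfig(Properties props) {
    // (3) One line per parameter: use the caller's value, else the default.
    setProperty("beamSize",
        props.getProperty("beamSize", defaultValues.get("beamSize")));
  }

  // (4) Typed getter that parses the stored String.
  int getBeamSize() {
    return Integer.parseInt(getProperty("beamSize"));
  }

  // (5) Include the parameter in the human-readable dump.
  void dump(PrintStream stream) {
    stream.println("beamSize = " + getProperty("beamSize"));
  }
}
```

Keeping every value as a String and parsing it in the getter matches the Properties superclass, which stores String keys and values.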

Author:
William Morgan, Anna Rafferty, Michel Galley
See Also:
Serialized Form

Nested Class Summary
static class TaggerConfig.Mode
           
 
Field Summary
static java.lang.String APPROXIMATE
           
static java.lang.String ARCH
           
static java.lang.String CLOSED_CLASS_THRESHOLD
           
static java.lang.String CUR_WORD_MIN_FEATURE_THRESH
           
static java.lang.String DEBUG
           
static java.lang.String DEFAULT_REG_L1
           
static java.lang.String ENCODING
           
static java.lang.String ENCODING_PROPERTY
           
static java.lang.String INIT_FROM_TREES
           
static java.lang.String ITERATIONS
           
static java.lang.String LANG
           
static java.lang.String LEARN_CLOSED_CLASS
           
static java.lang.String MIN_FEATURE_THRESH
           
static java.lang.String OCCURING_TAGS_ONLY
           
static java.lang.String OUTPUT_FILE
           
static java.lang.String OUTPUT_FORMAT
           
static java.lang.String OUTPUT_FORMAT_OPTIONS
           
static java.lang.String POSSIBLE_TAGS_ONLY
           
static java.lang.String RARE_WORD_MIN_FEATURE_THRESH
           
static java.lang.String RARE_WORD_THRESH
           
static java.lang.String SEARCH
           
static java.lang.String SGML
           
static java.lang.String SIGMA_SQUARED
           
static java.lang.String TAG_INSIDE
           
static java.lang.String TAG_SEPARATOR
           
static java.lang.String TAG_SEPARATOR_PROPERTY
           
static java.lang.String TOKENIZE
           
static java.lang.String TOKENIZER_FACTORY
           
static java.lang.String TOKENIZER_OPTIONS
           
static java.lang.String TREE_NORMALIZER
           
static java.lang.String TREE_RANGE
           
static java.lang.String TREE_TRANSFORMER
           
static java.lang.String VERBOSE
           
static java.lang.String VERBOSE_RESULTS
           
static java.lang.String VERY_COMMON_WORD_THRESH
           
static java.lang.String XML_INPUT
           
 
Fields inherited from class java.util.Properties
defaults
 
Constructor Summary
TaggerConfig(java.util.Properties props)
           
TaggerConfig(java.lang.String... args)
           
TaggerConfig(TaggerConfig old)
          This constructor requires a TaggerConfig rather than one of its superclasses (such as java.util.Properties), so that argument error checking is known to have already occurred.
 
Method Summary
 void dump()
           
 void dump(java.io.PrintStream stream)
           
 java.lang.String getArch()
           
 java.lang.String[] getClosedClassTags()
           
 int getClosedTagThreshold()
           
 int getCurWordMinFeatureThresh()
           
 boolean getDebug()
           
 java.lang.String getDebugPrefix()
           
 double getDefaultScore()
          Returns a default score to be used for each tag that is incompatible with the current word (e.g., the tag CC for the word "apple").
static java.lang.String getDefaultTagSeparator()
           
 java.lang.String getEncoding()
           
 java.lang.String getFile()
           
 int getIterations()
           
 java.lang.String getJarModel()
           
 java.lang.String getLang()
           
 boolean getLearnClosedClassTags()
           
 int getMinFeatureThresh()
           
 TaggerConfig.Mode getMode()
           
 java.lang.String getModel()
           
 java.lang.String[] getOpenClassTags()
           
 java.lang.String getOutputFile()
           
 java.lang.String getOutputFormat()
           
 boolean getOutputLemmas()
           
 java.lang.String[] getOutputOptions()
           
 boolean getOutputOptionsContains(java.lang.String sought)
           
 boolean getOutputVerbosity()
           
 int getRareWordMinFeatureThresh()
           
 int getRareWordThresh()
           
 double getRegL1()
           
 java.lang.String getSearch()
           
 java.lang.String getSentenceDelimiter()
          This returns the sentence delimiter used when tokenizing text using the tokenizer requested in this config.
 boolean getSGML()
           
 double getSigmaSquared()
           
 java.lang.String getTagInside()
          Return a regex of XML elements to tag inside of.
 java.lang.String getTagSeparator()
           
 boolean getTokenize()
           
 java.lang.String getTokenizerFactory()
           
 boolean getTokenizerInvertible()
           
 java.lang.String getTokenizerOptions()
           
 boolean getVerbose()
           
 boolean getVerboseResults()
           
 int getVeryCommonWordThresh()
           
 java.lang.String[] getXMLInput()
           
 boolean occuringTagsOnly()
           
 boolean possibleTagsOnly()
           
static TaggerConfig readConfig(java.io.DataInputStream stream)
          Read in a TaggerConfig.
 void saveConfig(java.io.OutputStream os)
          Serialize the TaggerConfig.
 java.lang.String toString()
           
 boolean useStdin()
          Returns whether or not we should use stdin for reading when tagging data.
 
Methods inherited from class java.util.Properties
getProperty, getProperty, list, list, load, load, loadFromXML, propertyNames, save, setProperty, store, store, storeToXML, storeToXML, stringPropertyNames
 
Methods inherited from class java.util.Hashtable
clear, clone, contains, containsKey, containsValue, elements, entrySet, equals, get, hashCode, isEmpty, keys, keySet, put, putAll, rehash, remove, size, values
 
Methods inherited from class java.lang.Object
finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

SEARCH

public static final java.lang.String SEARCH
See Also:
Constant Field Values

TAG_SEPARATOR

public static final java.lang.String TAG_SEPARATOR
See Also:
Constant Field Values

TOKENIZE

public static final java.lang.String TOKENIZE
See Also:
Constant Field Values

DEBUG

public static final java.lang.String DEBUG
See Also:
Constant Field Values

ITERATIONS

public static final java.lang.String ITERATIONS
See Also:
Constant Field Values

ARCH

public static final java.lang.String ARCH
See Also:
Constant Field Values

RARE_WORD_THRESH

public static final java.lang.String RARE_WORD_THRESH

MIN_FEATURE_THRESH

public static final java.lang.String MIN_FEATURE_THRESH

CUR_WORD_MIN_FEATURE_THRESH

public static final java.lang.String CUR_WORD_MIN_FEATURE_THRESH

RARE_WORD_MIN_FEATURE_THRESH

public static final java.lang.String RARE_WORD_MIN_FEATURE_THRESH

VERY_COMMON_WORD_THRESH

public static final java.lang.String VERY_COMMON_WORD_THRESH

OCCURING_TAGS_ONLY

public static final java.lang.String OCCURING_TAGS_ONLY

POSSIBLE_TAGS_ONLY

public static final java.lang.String POSSIBLE_TAGS_ONLY

SIGMA_SQUARED

public static final java.lang.String SIGMA_SQUARED

ENCODING

public static final java.lang.String ENCODING
See Also:
Constant Field Values

LEARN_CLOSED_CLASS

public static final java.lang.String LEARN_CLOSED_CLASS
See Also:
Constant Field Values

CLOSED_CLASS_THRESHOLD

public static final java.lang.String CLOSED_CLASS_THRESHOLD

VERBOSE

public static final java.lang.String VERBOSE
See Also:
Constant Field Values

VERBOSE_RESULTS

public static final java.lang.String VERBOSE_RESULTS
See Also:
Constant Field Values

SGML

public static final java.lang.String SGML
See Also:
Constant Field Values

INIT_FROM_TREES

public static final java.lang.String INIT_FROM_TREES
See Also:
Constant Field Values

LANG

public static final java.lang.String LANG
See Also:
Constant Field Values

TOKENIZER_FACTORY

public static final java.lang.String TOKENIZER_FACTORY
See Also:
Constant Field Values

XML_INPUT

public static final java.lang.String XML_INPUT
See Also:
Constant Field Values

TREE_TRANSFORMER

public static final java.lang.String TREE_TRANSFORMER
See Also:
Constant Field Values

TREE_NORMALIZER

public static final java.lang.String TREE_NORMALIZER
See Also:
Constant Field Values

TREE_RANGE

public static final java.lang.String TREE_RANGE
See Also:
Constant Field Values

TAG_INSIDE

public static final java.lang.String TAG_INSIDE
See Also:
Constant Field Values

APPROXIMATE

public static final java.lang.String APPROXIMATE
See Also:
Constant Field Values

TOKENIZER_OPTIONS

public static final java.lang.String TOKENIZER_OPTIONS
See Also:
Constant Field Values

DEFAULT_REG_L1

public static final java.lang.String DEFAULT_REG_L1
See Also:
Constant Field Values

OUTPUT_FILE

public static final java.lang.String OUTPUT_FILE
See Also:
Constant Field Values

OUTPUT_FORMAT

public static final java.lang.String OUTPUT_FORMAT
See Also:
Constant Field Values

OUTPUT_FORMAT_OPTIONS

public static final java.lang.String OUTPUT_FORMAT_OPTIONS
See Also:
Constant Field Values

ENCODING_PROPERTY

public static final java.lang.String ENCODING_PROPERTY
See Also:
Constant Field Values

TAG_SEPARATOR_PROPERTY

public static final java.lang.String TAG_SEPARATOR_PROPERTY
See Also:
Constant Field Values
Constructor Detail

TaggerConfig

public TaggerConfig(TaggerConfig old)
This constructor requires a TaggerConfig rather than one of its superclasses (such as java.util.Properties), so that argument error checking is known to have already occurred.


TaggerConfig

public TaggerConfig(java.lang.String... args)

TaggerConfig

public TaggerConfig(java.util.Properties props)
Method Detail

getModel

public java.lang.String getModel()

getJarModel

public java.lang.String getJarModel()

getFile

public java.lang.String getFile()

getOutputFile

public java.lang.String getOutputFile()

getOutputFormat

public java.lang.String getOutputFormat()

getOutputOptions

public java.lang.String[] getOutputOptions()

getOutputVerbosity

public boolean getOutputVerbosity()

getOutputLemmas

public boolean getOutputLemmas()

getOutputOptionsContains

public boolean getOutputOptionsContains(java.lang.String sought)

getSearch

public java.lang.String getSearch()

getSigmaSquared

public double getSigmaSquared()

getIterations

public int getIterations()

getRareWordThresh

public int getRareWordThresh()

getMinFeatureThresh

public int getMinFeatureThresh()

getCurWordMinFeatureThresh

public int getCurWordMinFeatureThresh()

getRareWordMinFeatureThresh

public int getRareWordMinFeatureThresh()

getVeryCommonWordThresh

public int getVeryCommonWordThresh()

occuringTagsOnly

public boolean occuringTagsOnly()

possibleTagsOnly

public boolean possibleTagsOnly()

getLang

public java.lang.String getLang()

getOpenClassTags

public java.lang.String[] getOpenClassTags()

getClosedClassTags

public java.lang.String[] getClosedClassTags()

getLearnClosedClassTags

public boolean getLearnClosedClassTags()

getClosedTagThreshold

public int getClosedTagThreshold()

getArch

public java.lang.String getArch()

getDebug

public boolean getDebug()

getDebugPrefix

public java.lang.String getDebugPrefix()

getTokenizerFactory

public java.lang.String getTokenizerFactory()

getDefaultTagSeparator

public static java.lang.String getDefaultTagSeparator()

getTagSeparator

public final java.lang.String getTagSeparator()

getTokenize

public boolean getTokenize()

getEncoding

public java.lang.String getEncoding()

getRegL1

public double getRegL1()

getXMLInput

public java.lang.String[] getXMLInput()

getVerbose

public boolean getVerbose()

getVerboseResults

public boolean getVerboseResults()

getSGML

public boolean getSGML()

getTagInside

public java.lang.String getTagInside()
Return a regex of XML elements to tag inside of. This may return an empty String, but never null.

Returns:
A regex of XML elements to tag inside of
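
As a rough illustration of how a regex over element names might select the text to tag, the sketch below collects the text inside elements whose name matches the regex. Everything in it is an assumption: the class TagInsideSketch and method textToTag are hypothetical, the element matching is a simplified regular expression rather than a real XML parser, and treating an empty regex as "no restriction" is a guess at one plausible behavior, not documented tagger behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not part of the tagger API.
class TagInsideSketch {
  // Simplified <name ...>text</name> matcher. Not a real XML parser.
  private static final Pattern ELEMENT =
      Pattern.compile("<(\\w+)(?:\\s[^>]*)?>(.*?)</\\1>", Pattern.DOTALL);

  // Returns the text inside elements whose name matches tagInside.
  // An empty tagInside is treated as "no restriction" (an assumption).
  static List<String> textToTag(String xml, String tagInside) {
    List<String> spans = new ArrayList<>();
    if (tagInside.isEmpty()) {
      spans.add(xml);
      return spans;
    }
    Pattern namePattern = Pattern.compile(tagInside);
    Matcher m = ELEMENT.matcher(xml);
    int pos = 0;
    while (m.find(pos)) {
      if (namePattern.matcher(m.group(1)).matches()) {
        spans.add(m.group(2));
        pos = m.end();        // continue after the selected element
      } else {
        pos = m.start() + 1;  // look inside the non-matching element
      }
    }
    return spans;
  }
}
```

Because the documented return value is never null, a caller can test for the empty String directly, as the sketch does, without a null check.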

getTokenizerOptions

public java.lang.String getTokenizerOptions()

getTokenizerInvertible

public boolean getTokenizerInvertible()

getDefaultScore

public double getDefaultScore()
Returns a default score to be used for each tag that is incompatible with the current word (e.g., the tag CC for the word "apple"). Using a default score may slightly decrease tagging accuracy for some languages (e.g., Chinese and German), but allows the tagger to run considerably faster, since the computation of the normalization term Z requires much less feature extraction. This approximation does not decrease accuracy in English (on the WSJ). If this method returns 0.0, the tagger will compute exact scores.

Returns:
default score
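
The speed-up described above can be sketched as follows. This is a minimal illustration, not the tagger's implementation: it assumes scores are on a log scale so that Z is a sum of exponentiated scores, and the names DefaultScoreSketch, logZ, compatibleTags, and exactScore are all hypothetical. The expensive scorer is called only for tags compatible with the word, and every other tag contributes the same constant.

```java
import java.util.Set;
import java.util.function.ToDoubleFunction;

// Hypothetical sketch; scores are assumed to be log-scale.
class DefaultScoreSketch {
  // Computes log Z = log(sum over all tags of exp(score(tag))).
  // exactScore stands in for the expensive feature-based scorer.
  static double logZ(Set<String> allTags, Set<String> compatibleTags,
                     ToDoubleFunction<String> exactScore, double defaultScore) {
    double sum = 0.0;
    for (String tag : allTags) {
      // A default score of 0.0 means "compute exact scores for every tag".
      boolean exact = defaultScore == 0.0 || compatibleTags.contains(tag);
      // Incompatible tags skip feature extraction and reuse one constant.
      sum += Math.exp(exact ? exactScore.applyAsDouble(tag) : defaultScore);
    }
    return Math.log(sum);
  }
}
```

When incompatible tags have very low exact scores, replacing them with a small constant changes Z only slightly, which is consistent with the small accuracy effect the description reports.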

dump

public void dump()

dump

public void dump(java.io.PrintStream stream)

toString

public java.lang.String toString()
Overrides:
toString in class java.util.Hashtable<java.lang.Object,java.lang.Object>

getSentenceDelimiter

public java.lang.String getSentenceDelimiter()
Returns the sentence delimiter used when tokenizing text with the tokenizer requested in this config. In general, the tokenizer is assumed not to need a sentence delimiter. With the whitespace tokenizer, however, a newline breaks sentences.
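
A small sketch of the whitespace-tokenizer case: text is first split into sentences on the delimiter and each sentence is then split on whitespace. The class SentenceDelimiterSketch and its split method are hypothetical, and using null to stand for "no delimiter needed" is an assumption made for the example, not a documented return value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch; not the tagger's tokenization code.
class SentenceDelimiterSketch {
  // A null delimiter models "no sentence delimiter needed" (an assumption).
  static List<List<String>> split(String text, String delimiter) {
    List<List<String>> sentences = new ArrayList<>();
    String[] chunks = (delimiter == null)
        ? new String[] {text}
        : text.split(Pattern.quote(delimiter));
    for (String chunk : chunks) {
      String trimmed = chunk.trim();
      if (!trimmed.isEmpty()) {
        // Whitespace tokenization: every run of whitespace separates tokens.
        sentences.add(Arrays.asList(trimmed.split("\\s+")));
      }
    }
    return sentences;
  }
}
```

With "\n" as the delimiter, one line of input becomes one sentence. Without a delimiter, the same whitespace split would merge all lines into a single token sequence.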


useStdin

public boolean useStdin()
Returns whether stdin should be used for reading when tagging data. For now, this returns true if and only if the filename given was "stdin". (TODO: kind of ugly)
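
The convention described here, where the special filename "stdin" selects standard input, might be used by a caller roughly as below. The class StdinSketch and its methods are hypothetical, and the exact, case-sensitive comparison is an assumption; the page does not say how the real method compares the name.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical sketch of the "stdin" filename convention.
class StdinSketch {
  // Exact, case-sensitive match is an assumption made for this example.
  static boolean useStdin(String fileName) {
    return "stdin".equals(fileName);
  }

  // Opens standard input or the named file, decoding with the given charset.
  static BufferedReader open(String fileName, String encoding) throws IOException {
    Charset charset = Charset.forName(encoding);
    if (useStdin(fileName)) {
      return new BufferedReader(new InputStreamReader(System.in, charset));
    }
    return Files.newBufferedReader(Paths.get(fileName), charset);
  }
}
```

One consequence of this convention is that a real file named "stdin" cannot be selected by that name alone, which may be what the TODO alludes to.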


getMode

public TaggerConfig.Mode getMode()

saveConfig

public void saveConfig(java.io.OutputStream os)
                throws java.io.IOException
Serialize the TaggerConfig.

Parameters:
os - Where to write this TaggerConfig
Throws:
java.io.IOException - If an I/O error occurs while writing

readConfig

public static TaggerConfig readConfig(java.io.DataInputStream stream)
                               throws java.io.IOException,
                                      java.lang.ClassNotFoundException
Read in a TaggerConfig.

Parameters:
stream - Where to read from
Returns:
The TaggerConfig
Throws:
java.io.IOException - If an I/O error occurs while reading
java.lang.ClassNotFoundException - If a class required to deserialize the TaggerConfig cannot be found
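
The signatures of saveConfig and readConfig suggest a save-then-read round trip. The sketch below shows one on a plain java.util.Properties object using standard Java object serialization. That choice is an inference from the Serializable interface and the declared ClassNotFoundException, not a description of the actual stored format, and the class ConfigRoundTripSketch and its methods are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.util.Properties;

// Hypothetical sketch; the real stored format is not documented on this page.
class ConfigRoundTripSketch {
  // Mirrors the shape of saveConfig(OutputStream).
  static void save(Properties config, OutputStream os) throws IOException {
    ObjectOutputStream out = new ObjectOutputStream(os);
    out.writeObject(config);
    out.flush();
  }

  // Mirrors the shape of readConfig(DataInputStream).
  static Properties read(DataInputStream stream)
      throws IOException, ClassNotFoundException {
    ObjectInputStream in = new ObjectInputStream(stream);
    return (Properties) in.readObject();
  }

  // Saves to an in-memory buffer and reads the copy back.
  static Properties roundTrip(Properties config) {
    try {
      ByteArrayOutputStream buffer = new ByteArrayOutputStream();
      save(config, buffer);
      return read(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException("round trip failed", e);
    }
  }
}
```

Both checked exceptions surface on the read side: IOException for stream problems and ClassNotFoundException when the serialized class is not on the classpath.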


Stanford NLP Group