edu.stanford.nlp.tagger.maxent
Class TaggerConfig

java.lang.Object
  extended by java.util.Dictionary<K,V>
      extended by java.util.Hashtable<Object,Object>
          extended by java.util.Properties
              extended by edu.stanford.nlp.tagger.maxent.TaggerConfig
All Implemented Interfaces:
Serializable, Cloneable, Map<Object,Object>

public class TaggerConfig
extends Properties

Reads and stores configuration information for a POS tagger. Implementation note: to add a new parameter, (1) define a default String value, (2) add it to the defaultValues hash, (3) add a line to the constructor, (4) add a getter method, (5) add it to the dump() method, (6) add it to the printGenProps() method, and (7) document it in the class javadoc of MaxentTagger.
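The checklist above can be illustrated with a deliberately small, hypothetical class. It is a sketch under stated assumptions, not the real TaggerConfig source: `SketchConfig`, the `beamSize` option and `DEFAULT_VALUES` are invented names. It covers steps (1) through (5) only. Steps (6) and (7) concern printGenProps() and the MaxentTagger javadoc, which are not modeled.

```java
import java.io.PrintStream;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical, stripped-down config illustrating the "add a new parameter"
// checklist. This is NOT the actual TaggerConfig implementation.
class SketchConfig extends Properties {
  // (1) a default String value for a hypothetical "beamSize" option
  static final String BEAM_SIZE_KEY = "beamSize";
  static final String DEFAULT_BEAM_SIZE = "5";

  // (2) a table of default values keyed by option name
  private static final Map<String, String> DEFAULT_VALUES = new HashMap<>();
  static {
    DEFAULT_VALUES.put(BEAM_SIZE_KEY, DEFAULT_BEAM_SIZE);
  }

  // (3) the constructor copies each known option, falling back to its default
  SketchConfig(Properties props) {
    for (Map.Entry<String, String> e : DEFAULT_VALUES.entrySet()) {
      setProperty(e.getKey(), props.getProperty(e.getKey(), e.getValue()));
    }
  }

  // (4) a typed getter for the new option
  int getBeamSize() {
    return Integer.parseInt(getProperty(BEAM_SIZE_KEY));
  }

  // (5) dump() prints every known option with its current value
  void dump(PrintStream out) {
    for (String key : DEFAULT_VALUES.keySet()) {
      out.println(key + " = " + getProperty(key));
    }
  }
}
```

Keeping the defaults in one table means the constructor and dump() iterate over the same key set, so a new option cannot be read but forgotten in the dump.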

Author:
William Morgan, Anna Rafferty, Michel Galley
See Also:
Serialized Form

Nested Class Summary
static class TaggerConfig.Mode
           
 
Field Summary
static String APPROXIMATE
           
static String ARCH
           
static String CLOSED_CLASS_THRESHOLD
           
static String CUR_WORD_MIN_FEATURE_THRESH
           
static String DEBUG
           
static String DEFAULT_REG_L1
           
static String ENCODING
           
static String ENCODING_PROPERTY
           
static String INIT_FROM_TREES
           
static String ITERATIONS
           
static String LANG
           
static String LEARN_CLOSED_CLASS
           
static String MIN_FEATURE_THRESH
           
static String OCCURING_TAGS_ONLY
           
static String OUTPUT_FILE
           
static String OUTPUT_FORMAT
           
static String OUTPUT_FORMAT_OPTIONS
           
static String POSSIBLE_TAGS_ONLY
           
static String RARE_WORD_MIN_FEATURE_THRESH
           
static String RARE_WORD_THRESH
           
static String SEARCH
           
static String SGML
           
static String SIGMA_SQUARED
           
static String TAG_INSIDE
           
static String TAG_SEPARATOR
           
static String TAG_SEPARATOR_PROPERTY
           
static String TOKENIZE
           
static String TOKENIZER_FACTORY
           
static String TOKENIZER_OPTIONS
           
static String TREE_NORMALIZER
           
static String TREE_RANGE
           
static String TREE_TRANSFORMER
           
static String VERBOSE
           
static String VERBOSE_RESULTS
           
static String VERY_COMMON_WORD_THRESH
           
static String WORD_FUNCTION
           
static String XML_INPUT
           
 
Fields inherited from class java.util.Properties
defaults
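The inherited `defaults` field is the standard java.util.Properties fallback table. Whether TaggerConfig itself populates it is not stated on this page. The sketch below shows only the generic Properties behavior, using hypothetical keys `encoding` and `verbose`.

```java
import java.util.Properties;

// Standard java.util.Properties behavior: getProperty() consults the
// `defaults` table when a key is missing from the main table.
final class DefaultsDemo {
  private DefaultsDemo() {}

  static Properties withDefaults() {
    Properties defaults = new Properties();
    defaults.setProperty("encoding", "UTF-8"); // hypothetical key/value
    Properties props = new Properties(defaults); // stored in the `defaults` field
    props.setProperty("verbose", "true");        // hypothetical key/value
    return props;
  }
}
```

Note that Hashtable methods such as `get()` and `containsKey()` do not look at `defaults`; only `getProperty()` (and `stringPropertyNames()`/`propertyNames()`) do.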
 
Constructor Summary
TaggerConfig(Properties props)
           
TaggerConfig(String... args)
           
TaggerConfig(TaggerConfig old)
          Copy constructor. It requires a TaggerConfig rather than any superclass (such as Properties), which guarantees that argument error checking has already occurred.
 
Method Summary
 void dump()
           
 void dump(PrintStream stream)
           
 String getArch()
           
 String[] getClosedClassTags()
           
 int getClosedTagThreshold()
           
 int getCurWordMinFeatureThresh()
           
 boolean getDebug()
           
 String getDebugPrefix()
           
 double getDefaultScore()
          Returns a default score to be used for each tag that is incompatible with the current word (e.g., the tag CC for the word "apple").
static String getDefaultTagSeparator()
           
 String getEncoding()
           
 String getFile()
           
 int getIterations()
           
 String getJarModel()
           
 String getLang()
           
 boolean getLearnClosedClassTags()
           
 int getMinFeatureThresh()
           
 TaggerConfig.Mode getMode()
           
 String getModel()
           
 String[] getOpenClassTags()
           
 String getOutputFile()
           
 String getOutputFormat()
           
 boolean getOutputLemmas()
           
 String[] getOutputOptions()
           
 boolean getOutputOptionsContains(String sought)
           
 boolean getOutputVerbosity()
           
 int getRareWordMinFeatureThresh()
           
 int getRareWordThresh()
           
 double getRegL1()
           
 String getSearch()
           
 String getSentenceDelimiter()
          Returns the sentence delimiter used when tokenizing text with the tokenizer requested in this config.
 boolean getSGML()
           
 double getSigmaSquared()
           
 String getTagInside()
          Returns a regex matching the XML elements whose contents should be tagged.
 String getTagSeparator()
           
 boolean getTokenize()
           
 String getTokenizerFactory()
           
 boolean getTokenizerInvertible()
           
 String getTokenizerOptions()
           
 boolean getVerbose()
           
 boolean getVerboseResults()
           
 int getVeryCommonWordThresh()
           
 String getWordFunction()
           
 String[] getXMLInput()
           
 boolean occuringTagsOnly()
           
 boolean possibleTagsOnly()
           
static TaggerConfig readConfig(DataInputStream stream)
          Read in a TaggerConfig.
 void saveConfig(OutputStream os)
          Serialize the TaggerConfig.
 String toString()
           
 boolean useStdin()
          Returns whether or not we should use stdin for reading when tagging data.
 
Methods inherited from class java.util.Properties
getProperty, getProperty, list, list, load, load, loadFromXML, propertyNames, save, setProperty, store, store, storeToXML, storeToXML, stringPropertyNames
 
Methods inherited from class java.util.Hashtable
clear, clone, contains, containsKey, containsValue, elements, entrySet, equals, get, hashCode, isEmpty, keys, keySet, put, putAll, rehash, remove, size, values
 
Methods inherited from class java.lang.Object
finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

SEARCH

public static final String SEARCH
See Also:
Constant Field Values

TAG_SEPARATOR

public static final String TAG_SEPARATOR
See Also:
Constant Field Values

TOKENIZE

public static final String TOKENIZE
See Also:
Constant Field Values

DEBUG

public static final String DEBUG
See Also:
Constant Field Values

ITERATIONS

public static final String ITERATIONS
See Also:
Constant Field Values

ARCH

public static final String ARCH
See Also:
Constant Field Values

WORD_FUNCTION

public static final String WORD_FUNCTION
See Also:
Constant Field Values

RARE_WORD_THRESH

public static final String RARE_WORD_THRESH

MIN_FEATURE_THRESH

public static final String MIN_FEATURE_THRESH

CUR_WORD_MIN_FEATURE_THRESH

public static final String CUR_WORD_MIN_FEATURE_THRESH

RARE_WORD_MIN_FEATURE_THRESH

public static final String RARE_WORD_MIN_FEATURE_THRESH

VERY_COMMON_WORD_THRESH

public static final String VERY_COMMON_WORD_THRESH

OCCURING_TAGS_ONLY

public static final String OCCURING_TAGS_ONLY

POSSIBLE_TAGS_ONLY

public static final String POSSIBLE_TAGS_ONLY

SIGMA_SQUARED

public static final String SIGMA_SQUARED

ENCODING

public static final String ENCODING
See Also:
Constant Field Values

LEARN_CLOSED_CLASS

public static final String LEARN_CLOSED_CLASS
See Also:
Constant Field Values

CLOSED_CLASS_THRESHOLD

public static final String CLOSED_CLASS_THRESHOLD

VERBOSE

public static final String VERBOSE
See Also:
Constant Field Values

VERBOSE_RESULTS

public static final String VERBOSE_RESULTS
See Also:
Constant Field Values

SGML

public static final String SGML
See Also:
Constant Field Values

INIT_FROM_TREES

public static final String INIT_FROM_TREES
See Also:
Constant Field Values

LANG

public static final String LANG
See Also:
Constant Field Values

TOKENIZER_FACTORY

public static final String TOKENIZER_FACTORY
See Also:
Constant Field Values

XML_INPUT

public static final String XML_INPUT
See Also:
Constant Field Values

TREE_TRANSFORMER

public static final String TREE_TRANSFORMER
See Also:
Constant Field Values

TREE_NORMALIZER

public static final String TREE_NORMALIZER
See Also:
Constant Field Values

TREE_RANGE

public static final String TREE_RANGE
See Also:
Constant Field Values

TAG_INSIDE

public static final String TAG_INSIDE
See Also:
Constant Field Values

APPROXIMATE

public static final String APPROXIMATE
See Also:
Constant Field Values

TOKENIZER_OPTIONS

public static final String TOKENIZER_OPTIONS
See Also:
Constant Field Values

DEFAULT_REG_L1

public static final String DEFAULT_REG_L1
See Also:
Constant Field Values

OUTPUT_FILE

public static final String OUTPUT_FILE
See Also:
Constant Field Values

OUTPUT_FORMAT

public static final String OUTPUT_FORMAT
See Also:
Constant Field Values

OUTPUT_FORMAT_OPTIONS

public static final String OUTPUT_FORMAT_OPTIONS
See Also:
Constant Field Values

ENCODING_PROPERTY

public static final String ENCODING_PROPERTY
See Also:
Constant Field Values

TAG_SEPARATOR_PROPERTY

public static final String TAG_SEPARATOR_PROPERTY
See Also:
Constant Field Values

Constructor Detail

TaggerConfig

public TaggerConfig(TaggerConfig old)
Copy constructor. It requires a TaggerConfig rather than any superclass (such as Properties), which guarantees that argument error checking has already occurred.
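The design idea behind this copy constructor can be sketched with a hypothetical class: validation runs once in the Properties-based constructor, and the copy constructor's parameter type proves the source object already passed it. `CheckedConfig` and its `iterations` key are invented for illustration and do not reflect TaggerConfig's actual checks.

```java
import java.util.Properties;

// Hypothetical class illustrating "validate once, copy without re-checking".
class CheckedConfig extends Properties {
  // Validating constructor: bad input fails fast here.
  CheckedConfig(Properties raw) {
    String iters = raw.getProperty("iterations", "100"); // hypothetical key/default
    if (Integer.parseInt(iters) <= 0) {
      throw new IllegalArgumentException("iterations must be positive: " + iters);
    }
    setProperty("iterations", iters);
  }

  // Copy constructor: accepting CheckedConfig (not Properties) means the type
  // system guarantees `old` already went through the validating constructor.
  CheckedConfig(CheckedConfig old) {
    super();
    putAll(old);
  }

  int getIterations() {
    return Integer.parseInt(getProperty("iterations"));
  }
}
```

Java overload resolution picks the copy constructor whenever the argument's static type is CheckedConfig, so callers holding a plain Properties always go through validation.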


TaggerConfig

public TaggerConfig(String... args)
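This page does not document how the `String...` arguments are parsed. The sketch below assumes a common `-key value` convention, with a bare `-flag` meaning `true`. `ArgsSketch.argsToProperties` is a hypothetical helper, not TaggerConfig's parser.

```java
import java.util.Properties;

// Hypothetical "-key value" argument parser; the real rules may differ.
final class ArgsSketch {
  private ArgsSketch() {}

  static Properties argsToProperties(String... args) {
    Properties props = new Properties();
    for (int i = 0; i < args.length; i++) {
      String arg = args[i];
      if (!arg.startsWith("-") || arg.length() == 1) {
        throw new IllegalArgumentException("expected -key, got: " + arg);
      }
      String key = arg.substring(1);
      // A flag followed by another flag (or by nothing) is treated as "true".
      // Limitation of this sketch: a negative number value such as "-1"
      // would be mistaken for a flag.
      if (i + 1 < args.length && !args[i + 1].startsWith("-")) {
        props.setProperty(key, args[++i]);
      } else {
        props.setProperty(key, "true");
      }
    }
    return props;
  }
}
```

The resulting Properties object could then be handed to a Properties-based constructor, which keeps validation in a single place.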

TaggerConfig

public TaggerConfig(Properties props)

Method Detail

getModel

public String getModel()

getJarModel

public String getJarModel()

getFile

public String getFile()

getOutputFile

public String getOutputFile()

getOutputFormat

public String getOutputFormat()

getOutputOptions

public String[] getOutputOptions()

getOutputVerbosity

public boolean getOutputVerbosity()

getOutputLemmas

public boolean getOutputLemmas()

getOutputOptionsContains

public boolean getOutputOptionsContains(String sought)
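A membership check like getOutputOptionsContains can be sketched as follows, assuming the output options are stored as one comma-separated string. That storage format, the helper names, and the option names used in the example are assumptions, not documented behavior.

```java
import java.util.Arrays;

// Sketch assuming output options arrive as a single comma-separated string.
final class OutputOptionsSketch {
  private OutputOptionsSketch() {}

  static String[] parseOptions(String raw) {
    if (raw == null || raw.trim().isEmpty()) {
      return new String[0];
    }
    return Arrays.stream(raw.split(","))
        .map(String::trim)          // tolerate spaces after commas
        .filter(s -> !s.isEmpty())  // drop empty entries such as "a,,b"
        .toArray(String[]::new);
  }

  static boolean optionsContain(String raw, String sought) {
    return Arrays.asList(parseOptions(raw)).contains(sought);
  }
}
```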

getSearch

public String getSearch()

getSigmaSquared

public double getSigmaSquared()

getIterations

public int getIterations()

getRareWordThresh

public int getRareWordThresh()

getMinFeatureThresh

public int getMinFeatureThresh()

getCurWordMinFeatureThresh

public int getCurWordMinFeatureThresh()

getRareWordMinFeatureThresh

public int getRareWordMinFeatureThresh()

getVeryCommonWordThresh

public int getVeryCommonWordThresh()

occuringTagsOnly

public boolean occuringTagsOnly()

possibleTagsOnly

public boolean possibleTagsOnly()

getLang

public String getLang()

getOpenClassTags

public String[] getOpenClassTags()

getClosedClassTags

public String[] getClosedClassTags()

getLearnClosedClassTags

public boolean getLearnClosedClassTags()

getClosedTagThreshold

public int getClosedTagThreshold()

getArch

public String getArch()

getWordFunction

public String getWordFunction()

getDebug

public boolean getDebug()

getDebugPrefix

public String getDebugPrefix()

getTokenizerFactory

public String getTokenizerFactory()

getDefaultTagSeparator

public static String getDefaultTagSeparator()

getTagSeparator

public final String getTagSeparator()
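A tag separator is typically used to split tokens in pre-tagged text into a word and its tag. The sketch below is a hypothetical helper that takes the separator as a parameter, since the default value is not given on this page. It splits at the last occurrence so that words containing the separator survive intact.

```java
// Hypothetical helper: split "word<sep>tag" at the LAST separator.
final class TagSplitSketch {
  private TagSplitSketch() {}

  /** Returns {word, tag}; the tag is empty if the separator is absent. */
  static String[] splitWordTag(String token, String separator) {
    int idx = token.lastIndexOf(separator);
    if (idx < 0) {
      return new String[] {token, ""};
    }
    return new String[] {
        token.substring(0, idx), token.substring(idx + separator.length())};
  }
}
```

For example, with separator `_` the token `New_York_NNP` yields the word `New_York` and the tag `NNP`.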

getTokenize

public boolean getTokenize()

getEncoding

public String getEncoding()

getRegL1

public double getRegL1()

getXMLInput

public String[] getXMLInput()

getVerbose

public boolean getVerbose()

getVerboseResults

public boolean getVerboseResults()

getSGML

public boolean getSGML()

getTagInside

public String getTagInside()
Returns a regex matching the XML elements whose contents should be tagged. This may return an empty String, but never null.

Returns:
A regex matching the XML elements whose contents should be tagged
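To show what a value returned by getTagInside() can select, here is a simplified regex-based sketch. It is not the tagger's XML handling. It ignores nested same-named elements and CDATA, and simply returns no spans for an empty regex. How the real tagger treats an empty value is not stated here. `TagInsideSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: collect the text inside XML elements whose names match tagInside.
final class TagInsideSketch {
  private TagInsideSketch() {}

  static List<String> textToTag(String xml, String tagInside) {
    List<String> out = new ArrayList<>();
    if (tagInside.isEmpty()) {
      return out; // this sketch selects nothing for an empty regex
    }
    // Named groups keep numbering stable even if tagInside has its own groups.
    Pattern p = Pattern.compile(
        "<(?<name>" + tagInside + ")(\\s[^>]*)?>(?<body>.*?)</\\k<name>>",
        Pattern.DOTALL);
    Matcher m = p.matcher(xml);
    while (m.find()) {
      out.add(m.group("body"));
    }
    return out;
  }
}
```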

getTokenizerOptions

public String getTokenizerOptions()

getTokenizerInvertible

public boolean getTokenizerInvertible()

getDefaultScore

public double getDefaultScore()
Returns a default score to be used for each tag that is incompatible with the current word (e.g., the tag CC for the word "apple"). Using a default score may slightly decrease tagging accuracy for some languages (e.g., Chinese and German), but it lets the tagger run considerably faster, because computing the normalization term Z requires much less feature extraction. This approximation does not decrease accuracy for English (on the WSJ). If this function returns 0.0, the tagger computes exact scores.

Returns:
default score
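The speed-up described above comes from how the normalizer Z is computed. The sketch below compares an exact Z over all tags with an approximation in which every incompatible tag shares one default score. Scores are made-up log-linear values, feature extraction is not modeled, and the 0.0 "compute exact scores" switch is left to the caller. `DefaultScoreSketch` is a hypothetical name.

```java
// Sketch: exact vs. default-score approximation of the normalizer Z.
final class DefaultScoreSketch {
  private DefaultScoreSketch() {}

  /** Exact: Z = sum over all tags of exp(score); every score must be computed. */
  static double exactZ(double[] scores) {
    double z = 0.0;
    for (double s : scores) {
      z += Math.exp(s);
    }
    return z;
  }

  /**
   * Approximate: only compatible tags get real scores; the numIncompatible
   * remaining tags contribute numIncompatible * exp(defaultScore) in one step,
   * with no per-tag feature extraction.
   */
  static double approxZ(double[] compatibleScores, int numIncompatible, double defaultScore) {
    double z = numIncompatible * Math.exp(defaultScore);
    for (double s : compatibleScores) {
      z += Math.exp(s);
    }
    return z;
  }
}
```

When the incompatible tags' true scores happen to equal the default, both methods agree. Otherwise the approximation trades a small error in Z for skipping their feature extraction.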

dump

public void dump()

dump

public void dump(PrintStream stream)

toString

public String toString()
Overrides:
toString in class Hashtable<Object,Object>

getSentenceDelimiter

public String getSentenceDelimiter()
Returns the sentence delimiter used when tokenizing text with the tokenizer requested in this config. In general, the tokenizer is assumed not to need a sentence delimiter; if you use the whitespace tokenizer, however, a newline breaks sentences.
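The whitespace-tokenizer case can be sketched as follows: text is first cut into sentences at the delimiter (here a newline), then each sentence is split on whitespace. `WhitespaceSentenceSketch` is a hypothetical helper, not the tagger's tokenizer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Sketch: whitespace tokenization where the delimiter ends a sentence.
final class WhitespaceSentenceSketch {
  private WhitespaceSentenceSketch() {}

  static List<List<String>> split(String text, String sentenceDelimiter) {
    List<List<String>> sentences = new ArrayList<>();
    // Pattern.quote treats the delimiter literally rather than as a regex.
    for (String line : text.split(Pattern.quote(sentenceDelimiter))) {
      String trimmed = line.trim();
      if (!trimmed.isEmpty()) {
        sentences.add(Arrays.asList(trimmed.split("\\s+")));
      }
    }
    return sentences;
  }
}
```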


useStdin

public boolean useStdin()
Returns whether stdin should be used for reading when tagging data. Currently this returns true if and only if the file name given is "stdin". (TODO: this convention is somewhat ugly.)
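The "stdin" file-name convention can be sketched as a small input-selection helper. `InputSketch` and its methods are hypothetical. The encoding argument stands in for a value such as the one returned by getEncoding().

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;

// Sketch: the literal file name "stdin" selects standard input.
final class InputSketch {
  private InputSketch() {}

  static boolean useStdin(String fileName) {
    return "stdin".equals(fileName);
  }

  static BufferedReader open(String fileName, String encoding) throws IOException {
    Charset cs = Charset.forName(encoding);
    if (useStdin(fileName)) {
      return new BufferedReader(new InputStreamReader(System.in, cs));
    }
    return Files.newBufferedReader(Paths.get(fileName), cs);
  }
}
```

A consequence of this convention is that a real file literally named `stdin` cannot be read by name. Passing a path such as `./stdin` avoids the clash.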


getMode

public TaggerConfig.Mode getMode()

saveConfig

public void saveConfig(OutputStream os)
                throws IOException
Serialize the TaggerConfig.

Parameters:
os - Where to write this TaggerConfig
Throws:
IOException - If an I/O error occurs

readConfig

public static TaggerConfig readConfig(DataInputStream stream)
                               throws IOException,
                                      ClassNotFoundException
Read in a TaggerConfig.

Parameters:
stream - Where to read from
Returns:
The TaggerConfig
Throws:
IOException - If an I/O error occurs
ClassNotFoundException - If the class of a serialized object cannot be found
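A saveConfig/readConfig round trip can be sketched with Java object serialization. That TaggerConfig uses exactly this format is an assumption, suggested only by readConfig's ClassNotFoundException. `ConfigIoSketch` is a hypothetical name, and a plain Properties object stands in for the config.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.util.Properties;

// Round-trip sketch using Java object serialization (assumed format).
final class ConfigIoSketch {
  private ConfigIoSketch() {}

  static void save(Properties config, OutputStream os) throws IOException {
    ObjectOutputStream out = new ObjectOutputStream(os);
    out.writeObject(config);
    out.flush(); // flush but do not close: the caller owns `os`
  }

  static Properties read(DataInputStream stream) throws IOException, ClassNotFoundException {
    ObjectInputStream in = new ObjectInputStream(stream);
    return (Properties) in.readObject(); // may throw ClassNotFoundException
  }
}
```

Leaving the stream open in save() lets a caller write further data (for example, model weights) after the config in the same file.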


Stanford NLP Group