edu.stanford.nlp.tagger.maxent
Class TaggerConfig

java.lang.Object
  extended by java.util.Dictionary<K,V>
      extended by java.util.Hashtable<Object,Object>
          extended by java.util.Properties
              extended by edu.stanford.nlp.tagger.maxent.TaggerConfig
All Implemented Interfaces:
Serializable, Cloneable, Map<Object,Object>

public class TaggerConfig
extends Properties

Reads and stores configuration information for a POS tagger.

Implementation note: To add a new parameter:
(1) define a default String value,
(2) add it to the defaultValues hash,
(3) add a line to the constructor,
(4) add a getter method,
(5) add it to the dump() method,
(6) add it to the printGenProps() method,
(7) add it to the class javadoc of MaxentTagger.
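The first five steps of the implementation note can be sketched on a stand-in class. This is only an illustration of the pattern, not the TaggerConfig source: the class SketchConfig, the parameter name myParam, and its default value are hypothetical, and steps (6) and (7) touch code and documentation that are not reproduced here.

```java
import java.io.PrintStream;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical stand-in for TaggerConfig; every name here is illustrative.
final class SketchConfig extends Properties {
  private static final long serialVersionUID = 1L;

  // (1) Define a default String value for the new parameter.
  static final String MY_PARAM = "10";

  // (2) Add it to the table of default values.
  static final Map<String, String> DEFAULT_VALUES = new HashMap<>();
  static {
    DEFAULT_VALUES.put("myParam", MY_PARAM);
  }

  // (3) Add a line to the constructor: use the override if present, else the default.
  SketchConfig(Properties overrides) {
    setProperty("myParam", overrides.getProperty("myParam", DEFAULT_VALUES.get("myParam")));
  }

  // (4) Add a typed getter that parses the stored String.
  int getMyParam() {
    return Integer.parseInt(getProperty("myParam"));
  }

  // (5) Include the parameter in dump().
  void dump(PrintStream stream) {
    stream.println("myParam = " + getProperty("myParam"));
  }

  public static void main(String[] args) {
    Properties overrides = new Properties();
    overrides.setProperty("myParam", "3");
    new SketchConfig(new Properties()).dump(System.out); // default value
    new SketchConfig(overrides).dump(System.out);        // overridden value
  }
}
```

Storing every value as a String and parsing it in the getter keeps the class compatible with the Properties methods it inherits, such as load and store.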

Author:
William Morgan, Anna Rafferty, Michel Galley
See Also:
Serialized Form

Nested Class Summary
static class TaggerConfig.Mode
           
 
Field Summary
static String JAR_TAGGER_PATH
          The directory in a jar file in which to find a tagger resource specified by jar:file
 
Fields inherited from class java.util.Properties
defaults
 
Constructor Summary
TaggerConfig(String... args)
           
 
Method Summary
 void dump()
           
 void dump(PrintStream stream)
           
 String getArch()
           
 String[] getClosedClassTags()
           
 int getClosedTagThreshold()
           
 int getCurWordMinFeatureThresh()
           
 boolean getDebug()
           
 String getDebugPrefix()
           
 double getDefaultScore()
          Returns a default score to be used for each tag that is incompatible with the current word (e.g., the tag CC for the word "apple").
static String getDefaultTagSeparator()
           
 String getEncoding()
           
 String getFile()
           
 boolean getInitFromTrees()
           
 int getIterations()
           
 String getJarModel()
           
 String getLang()
           
 boolean getLearnClosedClassTags()
           
 int getMinFeatureThresh()
           
 TaggerConfig.Mode getMode()
           
 String getModel()
           
 String[] getOpenClassTags()
           
 String getOutputFile()
           
 String getOutputFormat()
           
 String[] getOutputOptions()
           
 int getRareWordMinFeatureThresh()
           
 int getRareWordThresh()
           
 double getRegL1()
           
 String getSearch()
           
 String getSentenceDelimiter()
          This returns the sentence delimiter used when tokenizing text using the tokenizer requested in this config.
 boolean getSGML()
           
 double getSigmaSquared()
           
 String getTagInside()
          Return a regex of XML elements to tag inside of.
 String getTagSeparator()
           
 boolean getTokenize()
           
 String getTokenizerFactory()
           
 String getTokenizerOptions()
           
 TreeNormalizer getTreeNormalizer()
           
 String getTreeRange()
           
 TreeTransformer getTreeTransformer()
           
 boolean getVerbose()
           
 boolean getVerboseResults()
           
 int getVeryCommonWordThresh()
           
 String[] getXMLInput()
           
 boolean occuringTagsOnly()
           
 boolean possibleTagsOnly()
           
static TaggerConfig readConfig(DataInputStream stream)
          Read in a TaggerConfig.
 void saveConfig(OutputStream os)
          Serialize the TaggerConfig.
 String toString()
           
 boolean useStdin()
          Returns whether or not we should use stdin for reading when tagging data.
 
Methods inherited from class java.util.Properties
getProperty, getProperty, list, list, load, load, loadFromXML, propertyNames, save, setProperty, store, store, storeToXML, storeToXML, stringPropertyNames
 
Methods inherited from class java.util.Hashtable
clear, clone, contains, containsKey, containsValue, elements, entrySet, equals, get, hashCode, isEmpty, keys, keySet, put, putAll, rehash, remove, size, values
 
Methods inherited from class java.lang.Object
finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

JAR_TAGGER_PATH

public static final String JAR_TAGGER_PATH
The directory in a jar file in which to find a tagger resource specified by jar:file

See Also:
Constant Field Values
Constructor Detail

TaggerConfig

public TaggerConfig(String... args)
Method Detail

getModel

public String getModel()

getJarModel

public String getJarModel()

getFile

public String getFile()

getOutputFile

public String getOutputFile()

getOutputFormat

public String getOutputFormat()

getOutputOptions

public String[] getOutputOptions()

getSearch

public String getSearch()

getSigmaSquared

public double getSigmaSquared()

getIterations

public int getIterations()

getRareWordThresh

public int getRareWordThresh()

getMinFeatureThresh

public int getMinFeatureThresh()

getCurWordMinFeatureThresh

public int getCurWordMinFeatureThresh()

getRareWordMinFeatureThresh

public int getRareWordMinFeatureThresh()

getVeryCommonWordThresh

public int getVeryCommonWordThresh()

occuringTagsOnly

public boolean occuringTagsOnly()

possibleTagsOnly

public boolean possibleTagsOnly()

getLang

public String getLang()

getOpenClassTags

public String[] getOpenClassTags()

getClosedClassTags

public String[] getClosedClassTags()

getLearnClosedClassTags

public boolean getLearnClosedClassTags()

getClosedTagThreshold

public int getClosedTagThreshold()

getArch

public String getArch()

getDebug

public boolean getDebug()

getDebugPrefix

public String getDebugPrefix()

getTokenizerFactory

public String getTokenizerFactory()

getDefaultTagSeparator

public static final String getDefaultTagSeparator()

getTagSeparator

public final String getTagSeparator()

getTokenize

public boolean getTokenize()

getEncoding

public String getEncoding()

getRegL1

public double getRegL1()

getXMLInput

public String[] getXMLInput()

getInitFromTrees

public boolean getInitFromTrees()

getTreeRange

public String getTreeRange()

getVerbose

public boolean getVerbose()

getVerboseResults

public boolean getVerboseResults()

getSGML

public boolean getSGML()

getTagInside

public String getTagInside()
Return a regex of XML elements to tag inside of. This may return an empty String, but never null.

Returns:
A regex of XML elements to tag inside of
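
Because the return value is never null, a caller can test it for emptiness without a null check. A minimal sketch of such a caller follows. The class TagInsideSketch and the method shouldTagInside are hypothetical, and treating an empty regex as "select no elements" is an assumption, not documented behavior.

```java
import java.util.regex.Pattern;

// Hypothetical caller-side helper; not part of the tagger API.
final class TagInsideSketch {
  // Assumption: an empty regex selects no XML elements.
  static boolean shouldTagInside(String tagInsideRegex, String elementName) {
    if (tagInsideRegex.isEmpty()) {
      return false;
    }
    // The whole element name must match the regex.
    return Pattern.matches(tagInsideRegex, elementName);
  }

  public static void main(String[] args) {
    System.out.println(shouldTagInside("p|title", "title"));
    System.out.println(shouldTagInside("p|title", "div"));
    System.out.println(shouldTagInside("", "p"));
  }
}
```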

getTokenizerOptions

public String getTokenizerOptions()

getDefaultScore

public double getDefaultScore()
Returns a default score to be used for each tag that is incompatible with the current word (e.g., the tag CC for the word "apple"). Using a default score may slightly decrease tagging accuracy for some languages (e.g., Chinese and German), but allows the tagger to run considerably faster, since computing the normalization term Z then requires much less feature extraction. This approximation does not decrease accuracy in English (on the WSJ). If this function returns 0.0, the tagger will compute exact scores.

Returns:
default score
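
The speed-up described above can be illustrated with a small sketch. It is not the tagger's implementation. The class DefaultScoreSketch, the TagScorer interface, the compatible array, and the assumption that scores are unnormalized log-scores are all hypothetical. The idea is that the expensive scorer runs only for compatible tags, while every other tag contributes the default score to Z.

```java
// Hypothetical illustration of approximating the normalization term Z.
final class DefaultScoreSketch {
  /** Stand-in for the expensive, feature-based scoring of one tag. */
  interface TagScorer {
    double score(int tag);
  }

  static double[] tagProbabilities(int numTags, boolean[] compatible,
                                   TagScorer scorer, double defaultScore) {
    double[] scores = new double[numTags];
    double max = Double.NEGATIVE_INFINITY;
    for (int t = 0; t < numTags; t++) {
      // A default score of 0.0 means "compute exact scores for every tag".
      boolean exact = defaultScore == 0.0 || compatible[t];
      scores[t] = exact ? scorer.score(t) : defaultScore;
      max = Math.max(max, scores[t]);
    }
    // Z = sum over tags of exp(score), shifted by max for numerical stability.
    double z = 0.0;
    for (double s : scores) {
      z += Math.exp(s - max);
    }
    double[] probs = new double[numTags];
    for (int t = 0; t < numTags; t++) {
      probs[t] = Math.exp(scores[t] - max) / z;
    }
    return probs;
  }

  public static void main(String[] args) {
    boolean[] compatible = {true, false, true};
    double[] probs = tagProbabilities(3, compatible, t -> 1.0 + t, -10.0);
    for (double p : probs) {
      System.out.println(p);
    }
  }
}
```

With a non-zero default score, the scorer is called once per compatible tag rather than once per tag in the tag set.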

getTreeTransformer

public TreeTransformer getTreeTransformer()

getTreeNormalizer

public TreeNormalizer getTreeNormalizer()

dump

public void dump()

dump

public void dump(PrintStream stream)

toString

public String toString()
Overrides:
toString in class Hashtable<Object,Object>

getSentenceDelimiter

public String getSentenceDelimiter()
This returns the sentence delimiter used when tokenizing text using the tokenizer requested in this config. In general, the tokenizer is assumed not to need a sentence delimiter. If you use the whitespace tokenizer, however, a newline breaks sentences.
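
A minimal sketch of how a caller might apply the delimiter to whitespace-tokenized input. The class SentenceDelimiterSketch is hypothetical, and using null to mean "no delimiter needed" is an assumption, since this page does not say what is returned in that case.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical caller-side helper; not part of the tagger API.
final class SentenceDelimiterSketch {
  // Assumption: a null delimiter means the tokenizer finds sentence breaks itself.
  static List<String> splitSentences(String text, String delimiter) {
    if (delimiter == null) {
      return Collections.singletonList(text);
    }
    // Quote the delimiter so it is matched literally, not as a regex.
    return Arrays.asList(text.split(Pattern.quote(delimiter)));
  }

  public static void main(String[] args) {
    // With the whitespace tokenizer, a newline separates sentences.
    System.out.println(splitSentences("The cat sat .\nIt purred .", "\n"));
  }
}
```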


useStdin

public boolean useStdin()
Returns whether stdin should be used for reading when tagging data. For now, this returns true if and only if the filename given was "stdin". (TODO: kind of ugly)
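
The check described above, and how a caller might act on it, can be sketched as follows. The class StdinSketch and the method openInput are hypothetical. The exact comparison (case-sensitive, no trimming) is an assumption, as the page only says the filename must be "stdin".

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

// Hypothetical illustration; not the TaggerConfig source.
final class StdinSketch {
  // Assumption: an exact match against the literal filename "stdin".
  static boolean useStdin(String file) {
    return "stdin".equals(file);
  }

  // Opens standard input or the named file, decoding with the given encoding.
  static BufferedReader openInput(String file, String encoding) throws IOException {
    InputStream in = useStdin(file) ? System.in : new FileInputStream(file);
    return new BufferedReader(new InputStreamReader(in, encoding));
  }

  public static void main(String[] args) {
    System.out.println(useStdin("stdin"));
    System.out.println(useStdin("input.txt"));
  }
}
```

One consequence of this convention is that a file literally named stdin cannot be selected by that bare name.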


getMode

public TaggerConfig.Mode getMode()

saveConfig

public void saveConfig(OutputStream os)
                throws IOException
Serialize the TaggerConfig.

Parameters:
os - Where to write this TaggerConfig
Throws:
IOException - If an I/O error occurs while writing

readConfig

public static TaggerConfig readConfig(DataInputStream stream)
                               throws IOException,
                                      ClassNotFoundException
Read in a TaggerConfig.

Parameters:
stream - Where to read from
Returns:
The TaggerConfig
Throws:
IOException - If an I/O error occurs while reading
ClassNotFoundException - If a class needed to deserialize the TaggerConfig cannot be found
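
A save-and-read round trip can be sketched with a plain java.util.Properties object standing in for TaggerConfig. The use of Java object serialization is an inference from the "Serialized Form" link and the ClassNotFoundException in the signature, not something this page states. The class ConfigRoundTripSketch, its methods, and the property key tagSeparator are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.util.Properties;

// Hypothetical stand-in for saveConfig/readConfig using plain Properties.
final class ConfigRoundTripSketch {
  // Mirrors the shape of saveConfig(OutputStream).
  static void save(Properties config, OutputStream os) throws IOException {
    ObjectOutputStream out = new ObjectOutputStream(os);
    out.writeObject(config);
    out.flush();
  }

  // Mirrors the shape of readConfig(DataInputStream).
  static Properties read(DataInputStream stream) throws IOException, ClassNotFoundException {
    ObjectInputStream in = new ObjectInputStream(stream);
    return (Properties) in.readObject();
  }

  // Saves to an in-memory buffer and reads the copy back.
  static Properties roundTrip(Properties config) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      save(config, bytes);
      return read(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException("round trip failed", e);
    }
  }

  public static void main(String[] args) {
    Properties config = new Properties();
    config.setProperty("tagSeparator", "_"); // illustrative key only
    System.out.println(roundTrip(config).getProperty("tagSeparator"));
  }
}
```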


Stanford NLP Group