edu.stanford.nlp.tagger.maxent
Class TaggerConfig
java.lang.Object
java.util.Dictionary<K,V>
java.util.Hashtable<Object,Object>
java.util.Properties
edu.stanford.nlp.tagger.maxent.TaggerConfig
- All Implemented Interfaces:
- Serializable, Cloneable, Map<Object,Object>
public class TaggerConfig
- extends Properties
Reads and stores configuration information for a POS tagger.
Implementation note: To add a new parameter: (1) define a default
String value, (2) add it to the defaultValues hash, (3) add a line to the constructor,
(4) add a getter method, (5) add it to the dump() method, (6) add it to the printGenProps()
method, and (7) add it to the class javadoc of MaxentTagger.
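The steps in the implementation note can be pictured with a small stand-alone sketch. It is not taken from TaggerConfig: the class MiniConfig, the parameter name "smoothing", and its default value are hypothetical, and only steps (1) through (5) are shown, since printGenProps() and the MaxentTagger javadoc are not reproduced here.

```java
import java.io.PrintStream;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical stand-in for a Properties-based config class; not the real TaggerConfig.
class MiniConfig extends Properties {
    private static final long serialVersionUID = 1L;

    // (1) Default String value for the new parameter (name and value are made up).
    static final String SMOOTHING = "0.5";

    // (2) Table of defaults, keyed by property name.
    private static final Map<String, String> defaultValues = new HashMap<>();
    static {
        defaultValues.put("smoothing", SMOOTHING);
    }

    MiniConfig(Properties overrides) {
        // (3) Constructor line: use the caller's value if present, else the default.
        setProperty("smoothing",
                overrides.getProperty("smoothing", defaultValues.get("smoothing")));
    }

    // (4) Typed getter that parses the stored String.
    double getSmoothing() {
        return Double.parseDouble(getProperty("smoothing"));
    }

    // (5) Include the parameter when dumping the configuration.
    void dump(PrintStream out) {
        out.println("smoothing = " + getProperty("smoothing"));
    }
}
```

Storing every value as a String and parsing it in the getter keeps the object a plain Properties table, so the inherited load, store, and serialization methods continue to work on it.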
- Author:
- William Morgan, Anna Rafferty, Michel Galley
- See Also:
- Serialized Form
Field Summary
static String JAR_TAGGER_PATH
- The directory in a jar file in which to find a tagger resource specified by jar:file
Methods inherited from class java.util.Properties
getProperty, getProperty, list, list, load, load, loadFromXML, propertyNames, save, setProperty, store, store, storeToXML, storeToXML, stringPropertyNames
Methods inherited from class java.util.Hashtable
clear, clone, contains, containsKey, containsValue, elements, entrySet, equals, get, hashCode, isEmpty, keys, keySet, put, putAll, rehash, remove, size, values
JAR_TAGGER_PATH
public static final String JAR_TAGGER_PATH
- The directory in a jar file in which to find a tagger resource specified by jar:file
- See Also:
- Constant Field Values
TaggerConfig
public TaggerConfig(String... args)
getModel
public String getModel()
getJarModel
public String getJarModel()
getFile
public String getFile()
getOutputFile
public String getOutputFile()
getOutputFormat
public String getOutputFormat()
getOutputOptions
public String[] getOutputOptions()
getSearch
public String getSearch()
getSigmaSquared
public double getSigmaSquared()
getIterations
public int getIterations()
getRareWordThresh
public int getRareWordThresh()
getMinFeatureThresh
public int getMinFeatureThresh()
getCurWordMinFeatureThresh
public int getCurWordMinFeatureThresh()
getRareWordMinFeatureThresh
public int getRareWordMinFeatureThresh()
getVeryCommonWordThresh
public int getVeryCommonWordThresh()
occuringTagsOnly
public boolean occuringTagsOnly()
possibleTagsOnly
public boolean possibleTagsOnly()
getLang
public String getLang()
getOpenClassTags
public String[] getOpenClassTags()
getClosedClassTags
public String[] getClosedClassTags()
getLearnClosedClassTags
public boolean getLearnClosedClassTags()
getClosedTagThreshold
public int getClosedTagThreshold()
getArch
public String getArch()
getDebug
public boolean getDebug()
getDebugPrefix
public String getDebugPrefix()
getTokenizerFactory
public String getTokenizerFactory()
getDefaultTagSeparator
public static final String getDefaultTagSeparator()
getTagSeparator
public final String getTagSeparator()
getTokenize
public boolean getTokenize()
getEncoding
public String getEncoding()
getRegL1
public double getRegL1()
getXMLInput
public String[] getXMLInput()
getInitFromTrees
public boolean getInitFromTrees()
getTreeRange
public String getTreeRange()
getVerbose
public boolean getVerbose()
getVerboseResults
public boolean getVerboseResults()
getSGML
public boolean getSGML()
getTagInside
public String getTagInside()
- Return a regex of XML elements to tag inside of. This may return an
empty String, but never null.
- Returns:
- A regex of XML elements to tag inside of
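Because the returned value may be empty but is never null, a caller can test it with isEmpty() without a null check. The sketch below illustrates that with a hypothetical helper, shouldTagInside. Treating an empty regex as "no element selected" and requiring the regex to match the whole element name are both assumptions of this sketch, not documented behavior of the tagger.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the real tagger's handling of the regex may differ.
class TagInsideSketch {
    /**
     * Decides whether text inside the given XML element should be tagged.
     * tagInside is expected to be non-null, as getTagInside() promises.
     */
    static boolean shouldTagInside(String tagInside, String elementName) {
        if (tagInside.isEmpty()) {
            return false; // assumption: an empty regex selects no element
        }
        // Assumption: the regex must match the entire element name.
        return Pattern.matches(tagInside, elementName);
    }
}
```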
getTokenizerOptions
public String getTokenizerOptions()
getDefaultScore
public double getDefaultScore()
- Returns a default score to be used for each tag that is incompatible with
the current word (e.g., the tag CC for the word "apple"). Using a default
score may slightly decrease tagging accuracy for some languages (e.g., Chinese and
German), but allows the tagger to run considerably faster, since computing
the normalization term Z then requires much less feature extraction. This approximation
does not decrease accuracy in English (on the WSJ). If this method returns
0.0, the tagger will compute exact scores.
- Returns:
- default score
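To make the trade-off concrete, here is a minimal sketch of how a default score can shorten the computation of the normalization term Z. It is an illustration under stated assumptions, not the tagger's code: the class DefaultScoreSketch, the method normalizer, and the scorer callback are hypothetical, and the sketch assumes the default score is added to Z directly as an unnormalized, non-log value.

```java
import java.util.function.IntToDoubleFunction;

// Hypothetical illustration of exact versus approximate normalization.
class DefaultScoreSketch {
    /**
     * Computes Z, the sum of unnormalized scores over all tags.
     *
     * compatible[t] says whether tag t is compatible with the current word.
     * logScore stands for the expensive step (feature extraction and scoring).
     * A defaultScore of 0.0 requests exact scores for every tag.
     */
    static double normalizer(boolean[] compatible,
                             IntToDoubleFunction logScore,
                             double defaultScore) {
        boolean exact = (defaultScore == 0.0);
        double z = 0.0;
        for (int tag = 0; tag < compatible.length; tag++) {
            if (exact || compatible[tag]) {
                z += Math.exp(logScore.applyAsDouble(tag)); // expensive path
            } else {
                z += defaultScore; // cheap path: no feature extraction
            }
        }
        return z;
    }

    public static void main(String[] args) {
        boolean[] compatible = {true, false, false, true};
        int[] calls = {0};
        // Toy scorer that counts how often the expensive path runs.
        IntToDoubleFunction scorer = tag -> { calls[0]++; return -tag; };

        double exactZ = normalizer(compatible, scorer, 0.0);
        System.out.println("exact Z = " + exactZ + ", scorer calls = " + calls[0]);

        calls[0] = 0;
        double approxZ = normalizer(compatible, scorer, 0.01);
        System.out.println("approx Z = " + approxZ + ", scorer calls = " + calls[0]);
    }
}
```

In this toy setup the approximate run invokes the scorer only for the compatible tags, which is the source of the speedup described above. The price is that Z is no longer exact.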
getTreeTransformer
public TreeTransformer getTreeTransformer()
getTreeNormalizer
public TreeNormalizer getTreeNormalizer()
dump
public void dump()
dump
public void dump(PrintStream stream)
toString
public String toString()
- Overrides:
toString in class Hashtable<Object,Object>
getSentenceDelimiter
public String getSentenceDelimiter()
- Returns the sentence delimiter used when tokenizing text
with the tokenizer requested in this config. In general, the
tokenizer is assumed not to need a sentence delimiter. With the
whitespace tokenizer, however, a newline breaks sentences.
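The whitespace case can be sketched as follows. This is a simplified illustration rather than the tagger's tokenizer: the class DelimiterSketch and the method whitespaceSentences are hypothetical, and the delimiter is passed in explicitly (for example "\n").

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration of delimiter-based sentence splitting.
class DelimiterSketch {
    /**
     * Splits text into sentences at the delimiter, then splits each
     * sentence into tokens on runs of whitespace. Blank lines are skipped.
     */
    static List<List<String>> whitespaceSentences(String text, String delimiter) {
        List<List<String>> sentences = new ArrayList<>();
        // Pattern.quote makes the delimiter match literally, not as a regex.
        for (String line : text.split(Pattern.quote(delimiter))) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                sentences.add(Arrays.asList(trimmed.split("\\s+")));
            }
        }
        return sentences;
    }
}
```

With a newline delimiter, each input line becomes one sentence, so pre-tokenized text with one sentence per line keeps its sentence boundaries.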
useStdin
public boolean useStdin()
- Returns whether or not we should use stdin for reading when
tagging data. For now, this returns true iff the filename given
was "stdin".
(TODO: kind of ugly)
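A caller might use such a flag to choose its input source, as in the sketch below. The class StdinSketch and both of its methods are hypothetical. The exact, case-sensitive comparison with "stdin" is an assumption, as the description does not say how case or surrounding whitespace is handled.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

// Hypothetical illustration of choosing between stdin and a named file.
class StdinSketch {
    /** True iff the given filename is exactly "stdin" (assumed comparison). */
    static boolean useStdin(String file) {
        return "stdin".equals(file);
    }

    /** Opens stdin or the named file as a reader in the given encoding. */
    static BufferedReader open(String file, String encoding) throws IOException {
        InputStream in = useStdin(file) ? System.in : new FileInputStream(file);
        return new BufferedReader(new InputStreamReader(in, encoding));
    }
}
```

One consequence of this convention is that a real file named "stdin" in the working directory cannot be selected by that bare name.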
getMode
public TaggerConfig.Mode getMode()
saveConfig
public void saveConfig(OutputStream os)
throws IOException
- Serialize the TaggerConfig.
- Parameters:
os
- Where to write this TaggerConfig
- Throws:
IOException
- If an I/O error occurs while writing
readConfig
public static TaggerConfig readConfig(DataInputStream stream)
throws IOException,
ClassNotFoundException
- Read in a TaggerConfig.
- Parameters:
stream
- Where to read from
- Returns:
- The TaggerConfig
- Throws:
IOException
- If an I/O error occurs while reading
ClassNotFoundException
- If the class of the serialized object cannot be found
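A save and read round trip might look like the sketch below. It uses a plain java.util.Properties object in place of TaggerConfig and assumes standard Java object serialization. That assumption fits the class being Serializable and readConfig declaring ClassNotFoundException, but it is not confirmed here. The class ConfigRoundTrip and its methods are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.util.Properties;

// Hypothetical round trip; Properties stands in for TaggerConfig.
class ConfigRoundTrip {
    /** Writes the config to the stream using Java object serialization. */
    static void save(Properties config, OutputStream os) throws IOException {
        ObjectOutputStream out = new ObjectOutputStream(os);
        out.writeObject(config);
        out.flush(); // leave the underlying stream open for the caller
    }

    /** Reads a config previously written by save(). */
    static Properties read(DataInputStream stream)
            throws IOException, ClassNotFoundException {
        ObjectInputStream in = new ObjectInputStream(stream);
        return (Properties) in.readObject();
    }

    public static void main(String[] args) throws Exception {
        Properties config = new Properties();
        config.setProperty("lang", "english");

        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        save(config, buffer);

        Properties copy = read(new DataInputStream(
                new ByteArrayInputStream(buffer.toByteArray())));
        System.out.println(copy.getProperty("lang"));
    }
}
```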
Stanford NLP Group