edu.stanford.nlp.tagger.maxent
Class TaggerConfig
java.lang.Object
java.util.Dictionary<K,V>
java.util.Hashtable<Object,Object>
java.util.Properties
edu.stanford.nlp.tagger.maxent.TaggerConfig
- All Implemented Interfaces:
- Serializable, Cloneable, Map<Object,Object>
public class TaggerConfig
- extends Properties
Reads and stores configuration information for a POS tagger.
Implementation note: To add a new parameter: (1) define a default
String value, (2) add it to the defaultValues hash, (3) add a line to the constructor,
(4) add a getter method, (5) add it to the dump() method, (6) add it to the printGenProps()
method, and (7) add it to the class javadoc of MaxentTagger.
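The first four registration steps above can be sketched with plain java.util.Properties. This is a hypothetical illustration, not TaggerConfig's source: the class ToyConfig, the property name "iterations", and the default value "100" are assumptions chosen for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical stand-in for TaggerConfig; names and values are assumptions.
class ToyConfig extends Properties {
  private static final long serialVersionUID = 1L;

  // (1) Default String value for the new parameter (assumed value).
  static final String ITERATIONS = "100";

  // (2) Table of defaults, keyed by property name (assumed key).
  static final Map<String, String> DEFAULT_VALUES = new LinkedHashMap<>();
  static {
    DEFAULT_VALUES.put("iterations", ITERATIONS);
  }

  // (3) Constructor: copy each known key from the user's settings, else use its default.
  ToyConfig(Properties user) {
    for (Map.Entry<String, String> e : DEFAULT_VALUES.entrySet()) {
      setProperty(e.getKey(), user.getProperty(e.getKey(), e.getValue()));
    }
  }

  // (4) Typed getter that parses the stored String.
  int getIterations() {
    return Integer.parseInt(getProperty("iterations"));
  }
}
```

With this pattern, any key the user leaves unset reads back as its registered default.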
- Author:
- William Morgan, Anna Rafferty, Michel Galley
- See Also:
- Serialized Form
Constructor Summary
TaggerConfig(String... args)
TaggerConfig(TaggerConfig old)
Takes a TaggerConfig rather than any of its superclasses, so that the argument error checking is known to have already occurred.
Methods inherited from class java.util.Properties
getProperty, getProperty, list, list, load, load, loadFromXML, propertyNames, save, setProperty, store, store, storeToXML, storeToXML, stringPropertyNames
Methods inherited from class java.util.Hashtable
clear, clone, contains, containsKey, containsValue, elements, entrySet, equals, get, hashCode, isEmpty, keys, keySet, put, putAll, rehash, remove, size, values
SEARCH
public static final String SEARCH
- See Also:
- Constant Field Values
TAG_SEPARATOR
public static final String TAG_SEPARATOR
- See Also:
- Constant Field Values
TOKENIZE
public static final String TOKENIZE
- See Also:
- Constant Field Values
DEBUG
public static final String DEBUG
- See Also:
- Constant Field Values
ITERATIONS
public static final String ITERATIONS
- See Also:
- Constant Field Values
ARCH
public static final String ARCH
- See Also:
- Constant Field Values
RARE_WORD_THRESH
public static final String RARE_WORD_THRESH
MIN_FEATURE_THRESH
public static final String MIN_FEATURE_THRESH
CUR_WORD_MIN_FEATURE_THRESH
public static final String CUR_WORD_MIN_FEATURE_THRESH
RARE_WORD_MIN_FEATURE_THRESH
public static final String RARE_WORD_MIN_FEATURE_THRESH
VERY_COMMON_WORD_THRESH
public static final String VERY_COMMON_WORD_THRESH
OCCURING_TAGS_ONLY
public static final String OCCURING_TAGS_ONLY
POSSIBLE_TAGS_ONLY
public static final String POSSIBLE_TAGS_ONLY
SIGMA_SQUARED
public static final String SIGMA_SQUARED
ENCODING
public static final String ENCODING
- See Also:
- Constant Field Values
LEARN_CLOSED_CLASS
public static final String LEARN_CLOSED_CLASS
- See Also:
- Constant Field Values
CLOSED_CLASS_THRESHOLD
public static final String CLOSED_CLASS_THRESHOLD
VERBOSE
public static final String VERBOSE
- See Also:
- Constant Field Values
VERBOSE_RESULTS
public static final String VERBOSE_RESULTS
- See Also:
- Constant Field Values
SGML
public static final String SGML
- See Also:
- Constant Field Values
INIT_FROM_TREES
public static final String INIT_FROM_TREES
- See Also:
- Constant Field Values
LANG
public static final String LANG
- See Also:
- Constant Field Values
TOKENIZER_FACTORY
public static final String TOKENIZER_FACTORY
- See Also:
- Constant Field Values
XML_INPUT
public static final String XML_INPUT
- See Also:
- Constant Field Values
TREE_TRANSFORMER
public static final String TREE_TRANSFORMER
- See Also:
- Constant Field Values
TREE_NORMALIZER
public static final String TREE_NORMALIZER
- See Also:
- Constant Field Values
TREE_RANGE
public static final String TREE_RANGE
- See Also:
- Constant Field Values
TAG_INSIDE
public static final String TAG_INSIDE
- See Also:
- Constant Field Values
APPROXIMATE
public static final String APPROXIMATE
- See Also:
- Constant Field Values
TOKENIZER_OPTIONS
public static final String TOKENIZER_OPTIONS
- See Also:
- Constant Field Values
DEFAULT_REG_L1
public static final String DEFAULT_REG_L1
- See Also:
- Constant Field Values
OUTPUT_FILE
public static final String OUTPUT_FILE
- See Also:
- Constant Field Values
OUTPUT_FORMAT
public static final String OUTPUT_FORMAT
- See Also:
- Constant Field Values
OUTPUT_FORMAT_OPTIONS
public static final String OUTPUT_FORMAT_OPTIONS
- See Also:
- Constant Field Values
ENCODING_PROPERTY
public static final String ENCODING_PROPERTY
- See Also:
- Constant Field Values
TAG_SEPARATOR_PROPERTY
public static final String TAG_SEPARATOR_PROPERTY
- See Also:
- Constant Field Values
JAR_TAGGER_PATH
public static final String JAR_TAGGER_PATH
- The directory in a jar file in which to find a tagger resource specified by jar:file
- See Also:
- Constant Field Values
TaggerConfig
public TaggerConfig(TaggerConfig old)
- Copy constructor. It takes a TaggerConfig rather than any of its superclasses,
so that the argument error checking is known to have already occurred.
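The design choice described above can be illustrated with a small, hypothetical class: because the copy constructor's parameter type is the validated subclass, code holding only a raw Properties object cannot bypass validation. ValidatedConfig and its "model is required" check are assumptions made for the sketch, not CoreNLP code.

```java
import java.util.Properties;

// Hypothetical illustration of the pattern; not CoreNLP code.
class ValidatedConfig extends Properties {
  private static final long serialVersionUID = 1L;

  // Primary constructor: the only place validation happens (assumed check).
  ValidatedConfig(Properties raw) {
    if (raw.getProperty("model") == null) {
      throw new IllegalArgumentException("model is required");
    }
    putAll(raw);
  }

  // Copy constructor: accepts only an already-validated instance,
  // so no re-checking is needed.
  ValidatedConfig(ValidatedConfig old) {
    putAll(old);
  }
}
```

Passing a ValidatedConfig selects the more specific copy constructor; passing a plain Properties always goes through validation.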
TaggerConfig
public TaggerConfig(String... args)
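This constructor is not described further on this page. As a hypothetical sketch only, varargs such as "-model", "english.tagger" could be folded into properties as below. The "-key value" convention, the rule that a bare flag becomes "true", and the class name ArgsToProps are all assumptions; the real constructor's parsing and error checking may differ.

```java
import java.util.Properties;

class ArgsToProps {
  // Turns {"-key", "value", "-flag"} into key=value, flag=true (assumed convention).
  static Properties parse(String... args) {
    Properties props = new Properties();
    for (int i = 0; i < args.length; i++) {
      if (!args[i].startsWith("-")) {
        throw new IllegalArgumentException("Expected -key, got: " + args[i]);
      }
      String key = args[i].substring(1);
      // The value is the next token unless it is missing or itself starts with "-"
      // (so this sketch cannot accept negative numbers as values).
      if (i + 1 < args.length && !args[i + 1].startsWith("-")) {
        props.setProperty(key, args[++i]);
      } else {
        props.setProperty(key, "true");
      }
    }
    return props;
  }
}
```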
getModel
public String getModel()
getJarModel
public String getJarModel()
getFile
public String getFile()
getOutputFile
public String getOutputFile()
getOutputFormat
public String getOutputFormat()
getOutputOptions
public String[] getOutputOptions()
getOutputVerbosity
public boolean getOutputVerbosity()
getOutputLemmas
public boolean getOutputLemmas()
getOutputOptionsContains
public boolean getOutputOptionsContains(String sought)
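getOutputOptions and getOutputOptionsContains carry no description here. A plausible reading, stated as an assumption, is that the output-format options are stored as one comma-separated string that is split into an array and then searched for an exact option name. The class OutputOptions and the option names "lemmatize" and "verbose" in the usage are illustrative.

```java
import java.util.Arrays;

class OutputOptions {
  // Split a comma-separated option string, tolerating spaces around commas (assumed format).
  static String[] split(String raw) {
    if (raw == null || raw.trim().isEmpty()) {
      return new String[0];
    }
    return raw.trim().split("\\s*,\\s*");
  }

  // True if the exact option name appears among the parsed options.
  static boolean contains(String raw, String sought) {
    return Arrays.asList(split(raw)).contains(sought);
  }
}
```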
getSearch
public String getSearch()
getSigmaSquared
public double getSigmaSquared()
getIterations
public int getIterations()
getRareWordThresh
public int getRareWordThresh()
getMinFeatureThresh
public int getMinFeatureThresh()
getCurWordMinFeatureThresh
public int getCurWordMinFeatureThresh()
getRareWordMinFeatureThresh
public int getRareWordMinFeatureThresh()
getVeryCommonWordThresh
public int getVeryCommonWordThresh()
occuringTagsOnly
public boolean occuringTagsOnly()
possibleTagsOnly
public boolean possibleTagsOnly()
getLang
public String getLang()
getOpenClassTags
public String[] getOpenClassTags()
getClosedClassTags
public String[] getClosedClassTags()
getLearnClosedClassTags
public boolean getLearnClosedClassTags()
getClosedTagThreshold
public int getClosedTagThreshold()
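getLearnClosedClassTags and getClosedTagThreshold suggest that closed-class tags can be induced from training data using a threshold. The sketch below assumes the rule "a tag is closed-class if it was seen with fewer than threshold distinct word types"; that rule and the class ClosedClassInduction are assumptions, not a statement of the tagger's exact criterion.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class ClosedClassInduction {
  // Returns the tags observed with fewer than `threshold` distinct word types
  // (assumed rule for treating a tag as closed-class).
  static Set<String> closedTags(String[][] taggedPairs, int threshold) {
    Map<String, Set<String>> wordsPerTag = new HashMap<>();
    for (String[] pair : taggedPairs) {
      // pair[0] is the word, pair[1] its tag
      wordsPerTag.computeIfAbsent(pair[1], k -> new HashSet<>()).add(pair[0]);
    }
    Set<String> closed = new TreeSet<>();
    for (Map.Entry<String, Set<String>> e : wordsPerTag.entrySet()) {
      if (e.getValue().size() < threshold) {
        closed.add(e.getKey());
      }
    }
    return closed;
  }
}
```

Counting distinct word types rather than tokens keeps a frequent determiner like "the" from making its tag look open.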
getArch
public String getArch()
getDebug
public boolean getDebug()
getDebugPrefix
public String getDebugPrefix()
getTokenizerFactory
public String getTokenizerFactory()
getDefaultTagSeparator
public static String getDefaultTagSeparator()
getTagSeparator
public final String getTagSeparator()
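getTagSeparator gives the string that joins a word and its tag. A minimal sketch of splitting a token such as "dog_NN" follows; splitting at the last occurrence is an assumption made so that words containing the separator (such as "1/2") stay intact. The separator is passed in rather than hard-coded because this page does not state its default, and the class TaggedToken is hypothetical.

```java
class TaggedToken {
  // Returns {word, tag}, splitting at the LAST occurrence of the separator.
  static String[] split(String token, String separator) {
    int idx = token.lastIndexOf(separator);
    if (idx < 0) {
      throw new IllegalArgumentException("No tag separator in: " + token);
    }
    return new String[] {token.substring(0, idx), token.substring(idx + separator.length())};
  }
}
```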
getTokenize
public boolean getTokenize()
getEncoding
public String getEncoding()
getRegL1
public double getRegL1()
getXMLInput
public String[] getXMLInput()
getVerbose
public boolean getVerbose()
getVerboseResults
public boolean getVerboseResults()
getSGML
public boolean getSGML()
getTagInside
public String getTagInside()
- Returns a regex of XML elements to tag inside of. This may return an
empty String, but never null.
- Returns:
- A regex of XML elements to tag inside of
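Because getTagInside never returns null, callers can test for an empty string instead of guarding against null. The sketch below assumes that an empty regex means "no restriction" (tag everything) and that a non-empty regex must match the whole element name; both are assumptions about how the value is consumed, and TagInsideFilter is a hypothetical name.

```java
import java.util.regex.Pattern;

class TagInsideFilter {
  // Decide whether text inside the given XML element should be tagged.
  static boolean shouldTag(String tagInsideRegex, String elementName) {
    if (tagInsideRegex.isEmpty()) {
      return true; // assumed: an empty regex places no restriction
    }
    // Pattern.matches requires the regex to match the entire element name.
    return Pattern.matches(tagInsideRegex, elementName);
  }
}
```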
getTokenizerOptions
public String getTokenizerOptions()
getTokenizerInvertible
public boolean getTokenizerInvertible()
getDefaultScore
public double getDefaultScore()
- Returns a default score to be used for each tag that is incompatible with
the current word (e.g., the tag CC for the word "apple"). Using a default
score may slightly decrease accuracy for some languages (e.g., Chinese and
German), but lets the tagger run considerably faster, because computing the
normalization term Z then requires much less feature extraction. This approximation
does not decrease accuracy for English (on the WSJ). If this function returns
0.0, the tagger computes exact scores.
- Returns:
- default score
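The speed-up described above comes from the normalization term. In a log-linear model the probability of tag t is exp(s(t)) / Z, where Z is the sum of exp(s(t')) over all tags t'. The simplified sketch below (an illustration, not the tagger's code) computes Z exactly from every tag's score, and approximately by scoring only the compatible tags and letting each incompatible tag contribute exp(defaultScore). The class Normalizer and the example scores are assumptions.

```java
import java.util.Map;

class Normalizer {
  // Exact Z: the score of every tag must be computed before summing.
  static double exactZ(Map<String, Double> allScores) {
    double z = 0.0;
    for (double s : allScores.values()) {
      z += Math.exp(s);
    }
    return z;
  }

  // Approximate Z: only the tags compatible with the word are scored;
  // each remaining tag contributes exp(defaultScore) with no feature extraction.
  static double approxZ(Map<String, Double> compatibleScores, int numIncompatible, double defaultScore) {
    double z = numIncompatible * Math.exp(defaultScore);
    for (double s : compatibleScores.values()) {
      z += Math.exp(s);
    }
    return z;
  }
}
```

When the default score is close to the true scores of the incompatible tags, the two values of Z agree closely, which is why the approximation can cost little accuracy.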
dump
public void dump()
dump
public void dump(PrintStream stream)
toString
public String toString()
- Overrides:
toString in class Hashtable<Object,Object>
getSentenceDelimiter
public String getSentenceDelimiter()
- Returns the sentence delimiter used when tokenizing text with the
tokenizer requested in this config. In general, the tokenizer is
assumed not to need a sentence delimiter; with the whitespace
tokenizer, however, a newline marks a sentence boundary.
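For the whitespace tokenizer, a newline ends a sentence. A hypothetical sketch of that behavior splits the input on the delimiter and then on whitespace within each sentence; the class WhitespaceSentences and the choice to skip blank lines are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class WhitespaceSentences {
  // Split text into sentences at each occurrence of `delimiter` (treated literally),
  // then into tokens on runs of whitespace; blank sentences are skipped.
  static List<List<String>> split(String text, String delimiter) {
    List<List<String>> sentences = new ArrayList<>();
    for (String line : text.split(Pattern.quote(delimiter))) {
      String trimmed = line.trim();
      if (!trimmed.isEmpty()) {
        sentences.add(Arrays.asList(trimmed.split("\\s+")));
      }
    }
    return sentences;
  }
}
```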
useStdin
public boolean useStdin()
- Returns whether standard input (stdin) should be read when
tagging data. Currently this returns true if and only if the file name
given was "stdin".
(TODO: kind of ugly)
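The documented rule can be mirrored in a small, hypothetical helper that opens either standard input or the named file. The class InputSource and the pairing with a caller-supplied encoding name (for example, the value of getEncoding) are assumptions for illustration.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

class InputSource {
  // Mirrors the documented rule: true iff the file name is exactly "stdin".
  static boolean useStdin(String file) {
    return "stdin".equals(file);
  }

  // Open either standard input or the named file, decoding with the given encoding.
  static BufferedReader open(String file, String encoding) throws IOException {
    InputStream in = useStdin(file) ? System.in : new FileInputStream(file);
    return new BufferedReader(new InputStreamReader(in, Charset.forName(encoding)));
  }
}
```

Comparing with "stdin".equals(file) also handles a null file name without throwing.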
getMode
public TaggerConfig.Mode getMode()
saveConfig
public void saveConfig(OutputStream os)
throws IOException
- Serialize the TaggerConfig.
- Parameters:
os - Where to write this TaggerConfig
- Throws:
IOException - If an I/O error occurs
readConfig
public static TaggerConfig readConfig(DataInputStream stream)
throws IOException,
ClassNotFoundException
- Read in a TaggerConfig.
- Parameters:
stream - Where to read from
- Returns:
- The TaggerConfig
- Throws:
IOException - If an I/O error occurs
ClassNotFoundException - If the class of a serialized object cannot be found
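saveConfig and readConfig declare IOException and, on the read side, ClassNotFoundException, which matches the signature pattern of Java object serialization. Assuming that is the mechanism (this page does not confirm it), a save/read round trip for a Properties subclass could look like this; ToySerializableConfig is a hypothetical class.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.util.Properties;

class ToySerializableConfig extends Properties {
  private static final long serialVersionUID = 1L;

  // Write this object with Java serialization (assumed mechanism).
  void saveConfig(OutputStream os) throws IOException {
    ObjectOutputStream out = new ObjectOutputStream(os);
    out.writeObject(this);
    out.flush();
  }

  // Read an object back and cast it to the expected type.
  static ToySerializableConfig readConfig(DataInputStream stream)
      throws IOException, ClassNotFoundException {
    ObjectInputStream in = new ObjectInputStream(stream);
    return (ToySerializableConfig) in.readObject();
  }
}
```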
Stanford NLP Group