edu.stanford.nlp.parser.lexparser
Class Options.LexOptions

java.lang.Object
  extended by edu.stanford.nlp.parser.lexparser.Options.LexOptions
All Implemented Interfaces:
java.io.Serializable
Enclosing class:
Options

public static class Options.LexOptions
extends java.lang.Object
implements java.io.Serializable

See Also:
Serialized Form

Field Summary
static java.lang.String DEFAULT_WORD_VECTOR_FILE
          RS: file for Turian's word vectors. The default value is an example of size-25 word vectors on the NLP machines.
 boolean flexiTag
           
 int numHid
          Number of hidden units in the word vectors.
 boolean smartMutation
          Smarter smoothing for rare words.
 int smoothInUnknownsThreshold
          Words more common than this are tagged with MLE P(t|w).
 int unknownPrefixSize
          For certain Lexicons, a certain number of word-initial letters are used to subclassify the unknown token.
 int unknownSuffixSize
          For certain Lexicons, a certain number of word-final letters are used to subclassify the unknown token.
 boolean useSignatureForKnownSmoothing
          Whether to use the word's signature, rather than just its being unknown, as the prior in known-word smoothing.
 boolean useUnicodeType
          Make use of Unicode code point types in smoothing.
 int useUnknownWordSignatures
          Whether to use suffix and capitalization information for unknowns.
 java.lang.String uwModelTrainer
          Model for unknown words that the lexicon should use.
 java.lang.String wordClassesFile
          A file of word class data which may be used for smoothing, normally instead of hand-specified signatures.
 java.lang.String wordVectorFile
           
 
Constructor Summary
Options.LexOptions()
           
 
Method Summary
 void readData(java.io.BufferedReader in)
           
 java.lang.String toString()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

useUnknownWordSignatures

public int useUnknownWordSignatures
Whether to use suffix and capitalization information for unknowns. Within the BaseLexicon model, the options have the following meanings. 0 means a single unknown token. 1 uses suffix and capitalization. 2 uses a variant (richer) form of signature; this is the recommended setting, since the richer signatures in options 3 and 4 seem to have very marginal or no positive value. 3 uses a richer form of signature that mimics the NER word type patterns. 4 is a variant of 2. 5 is another variant with more English-specific morphology (good for English unknowns). 6-9 are options for Arabic; 9 codes some patterns for numbers and derivational morphology, but also supports unknownPrefixSize and unknownSuffixSize. For German, 0 means a single unknown token, and any non-zero value means to use the capitalization of the first letter and a suffix of length unknownSuffixSize.
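
As a rough illustration of the simplest non-zero behaviour described above (capitalization of the first letter plus a suffix of a given length), the following minimal sketch builds a signature string for an unknown word. The class name UnknownSignatureSketch, the method signature(), and the "UNK-CAP-..." output format are hypothetical; they are not taken from the parser's source and only show the general idea.

```java
import java.util.Locale;

/** Hypothetical sketch; not the Stanford parser's actual signature code. */
class UnknownSignatureSketch {

    /**
     * Builds a coarse signature for an unknown word from the capitalization
     * of its first letter and its last suffixSize characters.
     * A suffixSize of 0 yields a signature with no suffix part.
     */
    static String signature(String word, int suffixSize) {
        StringBuilder sb = new StringBuilder("UNK");
        if (!word.isEmpty() && Character.isUpperCase(word.charAt(0))) {
            sb.append("-CAP");
        }
        if (suffixSize > 0 && !word.isEmpty()) {
            // Short words contribute the whole word as their suffix.
            int start = Math.max(0, word.length() - suffixSize);
            sb.append('-').append(word.substring(start).toLowerCase(Locale.ROOT));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(signature("Wanderung", 3));
        System.out.println(signature("xyzzy", 0));
    }
}
```

Words that share a signature would then share statistics, which is what lets a lexicon say something useful about a word it has never seen.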


DEFAULT_WORD_VECTOR_FILE

public static final java.lang.String DEFAULT_WORD_VECTOR_FILE
RS: file for Turian's word vectors. The default value is an example of size-25 word vectors on the NLP machines.

See Also:
Constant Field Values

wordVectorFile

public java.lang.String wordVectorFile

numHid

public int numHid
Number of hidden units in the word vectors. A setting of 0 will make it try to extract the size from the data file.
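
The "0 means infer the size" convention can be sketched as follows. This is only an illustration: the class VectorSizeSketch, the method resolveNumHid(), and the assumed file layout (one word per line followed by whitespace-separated values) are hypothetical, and the real word vector file may be laid out differently.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical sketch; the real file format and loader may differ. */
class VectorSizeSketch {

    /**
     * Returns numHid unchanged when it is positive; otherwise infers the
     * vector size from the first non-blank line of the data.
     * Assumed line format: a word followed by whitespace-separated values.
     */
    static int resolveNumHid(int numHid, BufferedReader in) {
        if (numHid > 0) {
            return numHid;
        }
        try {
            String line;
            while ((line = in.readLine()) != null) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    // Every field after the leading word is one vector component.
                    return trimmed.split("\\s+").length - 1;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        throw new IllegalArgumentException("no word vectors found");
    }

    public static void main(String[] args) {
        BufferedReader in = new BufferedReader(new StringReader("the 0.1 -0.2 0.3\n"));
        System.out.println(resolveNumHid(0, in));
    }
}
```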


smoothInUnknownsThreshold

public int smoothInUnknownsThreshold
Words more common than this are tagged with MLE P(t|w). Default 100. The smoothing is sufficiently slight that changing this has little effect. Set this to 0 to use the parser as a vanilla PCFG with no smoothing, which is useful for exposition or debugging rather than as a practical parser.
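
To make the threshold concrete, here is a minimal sketch of how such a cutoff could switch between the plain maximum likelihood estimate and a smoothed estimate. The class TagSmoothingSketch, its parameter names, and the particular smoothing formula (adding a weighted prior P(t|unknown) to the counts) are assumptions made for illustration; the documentation above does not specify the formula the lexicon uses.

```java
/** Hypothetical sketch of threshold-controlled smoothing of P(t|w). */
class TagSmoothingSketch {

    /**
     * @param cWordTag         count of the word with this tag
     * @param cWord            total count of the word (must be positive)
     * @param pTagGivenUnknown prior probability of the tag for unknown words
     * @param threshold        plays the role of smoothInUnknownsThreshold
     * @param weight           assumed strength of the prior
     */
    static double pTagGivenWord(double cWordTag, double cWord,
                                double pTagGivenUnknown, int threshold, double weight) {
        if (cWord > threshold) {
            // Common word: plain maximum likelihood estimate.
            return cWordTag / cWord;
        }
        // Rare word: mix the observed counts with the unknown-word prior.
        return (cWordTag + weight * pTagGivenUnknown) / (cWord + weight);
    }

    public static void main(String[] args) {
        System.out.println(pTagGivenWord(150, 200, 0.1, 100, 1.0));
        System.out.println(pTagGivenWord(1, 4, 0.5, 100, 2.0));
        System.out.println(pTagGivenWord(1, 4, 0.5, 0, 2.0));
    }
}
```

Under this sketch, a threshold of 0 sends every observed word down the unsmoothed branch, which matches the "vanilla PCFG" use described above.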


smartMutation

public boolean smartMutation
Smarter smoothing for rare words.


useUnicodeType

public boolean useUnicodeType
Make use of Unicode code point types in smoothing.
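
For readers unfamiliar with code point types, the standard library exposes them through Character.getType(int). The sketch below shows one way a type could feed into a word class; the class UnicodeTypeSketch, the method typeSignature(), and the "UNK-U" label format are hypothetical and do not reflect how the lexicon actually combines this information.

```java
/** Hypothetical sketch; only Character.getType is a real library call. */
class UnicodeTypeSketch {

    /**
     * Labels a word by the Unicode general category of its first code point,
     * as reported by Character.getType (for example LOWERCASE_LETTER).
     */
    static String typeSignature(String word) {
        if (word.isEmpty()) {
            return "UNK";
        }
        // codePointAt handles characters outside the Basic Multilingual Plane.
        int codePoint = word.codePointAt(0);
        return "UNK-U" + Character.getType(codePoint);
    }

    public static void main(String[] args) {
        System.out.println(typeSignature("word"));
        System.out.println(typeSignature("2024"));
        System.out.println(typeSignature("\u4e2d\u6587"));
    }
}
```

Because the category is script-independent, a label like this can separate digits, punctuation, and letters in languages where capitalization carries no information.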


unknownSuffixSize

public int unknownSuffixSize
For certain Lexicons, a certain number of word-final letters are used to subclassify the unknown token. This gives the number of letters.


unknownPrefixSize

public int unknownPrefixSize
For certain Lexicons, a certain number of word-initial letters are used to subclassify the unknown token. This gives the number of letters.


uwModelTrainer

public java.lang.String uwModelTrainer
Model for unknown words that the lexicon should use. This is the name of a class.
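
Since the value is a class name, the model is presumably instantiated reflectively. A minimal sketch of that pattern, using only standard reflection calls, might look like the following. The Trainer interface, SimpleTrainer class, and newTrainer() method are hypothetical stand-ins; the actual trainer type and loading code in the parser are not shown in this documentation.

```java
/** Hypothetical sketch; the interface and class names are illustrative only. */
class ReflectiveTrainerSketch {

    /** Stand-in for whatever trainer type the lexicon expects. */
    interface Trainer {
        String describe();
    }

    /** A trivial implementation with a no-argument constructor. */
    static class SimpleTrainer implements Trainer {
        @Override
        public String describe() {
            return "simple";
        }
    }

    /** Loads the named class and creates an instance via its no-arg constructor. */
    static Trainer newTrainer(String className) {
        try {
            return Class.forName(className)
                    .asSubclass(Trainer.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        Trainer trainer = newTrainer(SimpleTrainer.class.getName());
        System.out.println(trainer.describe());
    }
}
```

Configuring the model by name keeps the options object serializable as plain strings while still allowing the implementation to be swapped.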


flexiTag

public boolean flexiTag

useSignatureForKnownSmoothing

public boolean useSignatureForKnownSmoothing
Whether to use the word's signature, rather than just its being unknown, as the prior in known-word smoothing. Currently only works if turned on for English.


wordClassesFile

public java.lang.String wordClassesFile
A file of word class data which may be used for smoothing, normally instead of hand-specified signatures.

Constructor Detail

Options.LexOptions

public Options.LexOptions()
Method Detail

toString

public java.lang.String toString()
Overrides:
toString in class java.lang.Object

readData

public void readData(java.io.BufferedReader in)
              throws java.io.IOException
Throws:
java.io.IOException


Stanford NLP Group