edu.stanford.nlp.parser.lexparser
Class TestOptions

java.lang.Object
  extended by edu.stanford.nlp.parser.lexparser.TestOptions
All Implemented Interfaces:
Serializable

public class TestOptions
extends Object
implements Serializable

Options to the parser which affect performance only at testing (parsing) time.
The Options class stores its TestOptions as a transient field. This means that whatever options are set at creation time are forgotten when the parser is serialized. If you want an option to be remembered when the parser is reloaded, put it in either TrainOptions or Options itself.
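The following minimal sketch illustrates the serialization behavior with hypothetical stand-in classes. SketchOptions and SketchTestOptions are invented for illustration and are not the real Options and TestOptions. Plain Java serialization skips a transient field entirely, so after deserialization that field would be null. The sketch assumes the owning class recreates a default instance in readObject, which is one way a transient options object ends up "forgotten" rather than missing.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for TestOptions: one test-time flag with a default.
class SketchTestOptions implements Serializable {
  private static final long serialVersionUID = 1L;
  boolean verbose = false;
}

// Hypothetical stand-in for Options: one persisted field, one transient field.
class SketchOptions implements Serializable {
  private static final long serialVersionUID = 1L;
  int trainSetting = 0;                                              // written to the stream
  transient SketchTestOptions testOptions = new SketchTestOptions(); // skipped

  // Plain Java would leave a transient field null after reading;
  // this sketch assumes a fresh default instance is created instead.
  private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
    in.defaultReadObject();
    testOptions = new SketchTestOptions();
  }
}

class TransientDemo {
  // Serialize and deserialize in memory, as saving and reloading a parser would.
  static SketchOptions roundTrip(SketchOptions op) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(op);
      }
      try (ObjectInputStream in =
               new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (SketchOptions) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new RuntimeException(e);
    }
  }

  public static void main(String[] args) {
    SketchOptions op = new SketchOptions();
    op.trainSetting = 7;
    op.testOptions.verbose = true;
    SketchOptions reloaded = roundTrip(op);
    System.out.println(reloaded.trainSetting);        // persisted value
    System.out.println(reloaded.testOptions.verbose); // back to the default
  }
}
```

Running main prints 7 and then false. The non-transient setting survives the round trip, while the test-time flag reverts to its default.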

Author:
Dan Klein
See Also:
Serialized Form

Field Summary
 boolean addMissingFinalPunctuation
          If a token list does not have sentence final punctuation near the end, then automatically add the default one.
 double depWeight
          Weighting on dependency log probs.
 boolean doRecovery
          If true, then failure of the PCFG factor to parse a sentence will trigger parse recovery mode.
 boolean evalb
          Write EvalB-readable output files.
 Properties evals
          What evaluations to report and how to report them (using LexicalizedParser).
 boolean exhaustiveTest
           
 int fastFactoredCandidateAddend
          To find k good factored parses, this variable says how many additional best PCFG parses should be examined.
 int fastFactoredCandidateMultiplier
          To find k good fast factored parses, this variable says how many times k of the best PCFG parses should be examined.
 boolean forceTagBeginnings
           
 boolean forceTags
          Parse using only tags given by the correct answer or by the POS tagger.
 boolean increasingLength
          Parse trees in test treebank in order of increasing length.
 boolean iterativeCKY
          If true, use faster iterative deepening CKY algorithm.
 boolean lengthNormalization
          Turns on normalizing scores for sentence length.
 int MAX_ITEMS
          The maximum number of edges and hooks combined that the factored parser will build before giving up.
 int maxLength
          The maximum sentence length (including punctuation, etc.) to parse.
 int maxSpanForTags
          The largest span to consider for word-hood.
 boolean noFunctionalForcing
          Only valid with forceTags: strips away functional annotations when forcing the tags, meaning that tags have to start appropriately but the parser will assign the functional part.
 boolean noRecoveryTagging
          If false, then failure of the PCFG parser to parse a sentence will trigger allowing all tags for words in parse recovery mode, with a log probability of -1000.
 String outputFilesDirectory
          If the writeOutputFiles option is true, then output files appear in this directory.
 String outputFilesExtension
          If the writeOutputFiles option is true, then output files appear with this extension.
 String outputFilesPrefix
          If the writeOutputFiles option is true, then output files appear with this prefix.
 String outputFormat
          Determines the format of output trees: choose among penn, oneline
 String outputFormatOptions
           
 String outputkBestEquivocation
          If this option is not null, output the k-best equivocation.
 boolean pcfgThreshold
          If this variable is true, and the sum of the inside and outside scores for a constituent is worse than the best known score for the sentence by more than pcfgThresholdValue, then -Inf is returned as the outside score by oScore() (otherwise the true outside score is returned).
 double pcfgThresholdValue
           
 boolean preTag
          Tag the sentences first, then parse given those (coarse) tags.
 boolean printAllBestParses
          Print out all best PCFG parses.
 int printFactoredKGood
          Print k good parses from the factored parser, when k > 0.
 int printPCFGkBest
          Print the k best parses from the PCFG, when k > 0.
 boolean prunePunc
           
 boolean sample
          Used when you want to generate sample parses instead of finding the best parse.
 String taggerSerializedFile
          POS tagger model used when preTag is enabled.
 double unseenSmooth
          The amount of smoothing put in (as an m-estimate) for unknown words.
 boolean useFastFactored
          If true, use the approximate factored algorithm, which just rescores the k best PCFG parses, rather than the exact factored algorithm.
 boolean useLexiconToScoreDependencyPwGt
          If this is true, the Lexicon is used to score P(w|t) in the backoff inside the dependency grammar.
 boolean useN5
          If true, the n^4 "speed-up" is not used with the Factored Parser.
 boolean useNonProjectiveDependencyParser
          If this is true, perform non-projective dependency parsing.
 boolean verbose
          Print a lot of extra output as you parse.
 boolean writeOutputFiles
          If true, write the parses of each input file to a new file with the same name plus an added ".stp" extension.
 
Constructor Summary
TestOptions()
           
 
Method Summary
 void display()
           
 TreePrint treePrint(TreebankLangParserParams tlpParams)
          Determines the method for printing trees on output.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

noRecoveryTagging

public boolean noRecoveryTagging
If false, then failure of the PCFG parser to parse a sentence will trigger allowing all tags for words in parse recovery mode, with a log probability of -1000. If true, these extra taggings are not added. It is false by default. Use option -noRecoveryTagging to set to true.
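The recovery-tagging behavior can be sketched as follows. This is a hypothetical illustration: RecoveryTagging and its map-based interface are invented here. Only the -1000 log probability and the meaning of the flag come from the description above.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of recovery tagging; not the parser's real API.
class RecoveryTagging {
  static final double RECOVERY_LOG_PROB = -1000.0; // penalty from the documentation

  /** Returns tag -> log probability for one word. */
  static Map<String, Double> tagsForWord(Map<String, Double> lexiconTags,
                                         List<String> allTags,
                                         boolean inRecoveryMode,
                                         boolean noRecoveryTagging) {
    Map<String, Double> result = new HashMap<>(lexiconTags);
    if (inRecoveryMode && !noRecoveryTagging) {
      for (String tag : allTags) {
        // Tags the lexicon already allows keep their scores; every other tag
        // becomes possible, but only at a heavy penalty.
        result.putIfAbsent(tag, RECOVERY_LOG_PROB);
      }
    }
    return result;
  }
}
```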


doRecovery

public boolean doRecovery
If true, then failure of the PCFG factor to parse a sentence will trigger parse recovery mode.


useN5

public boolean useN5
If true, the n^4 "speed-up" is not used with the Factored Parser.


useFastFactored

public boolean useFastFactored
If true, use the approximate factored algorithm, which just rescores the k best PCFG parses, rather than the exact factored algorithm. This algorithm requires the dependency grammar to exist for rescoring, but does not require the dependency grammar to be run. Hence, code that is needed only for exact A* factored parsing should now be guarded by if (op.doPCFG && op.doDep && ! Test.useFastFactored).
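Here is a hypothetical helper that expresses the guard condition quoted above. The three parameters stand in for op.doPCFG, op.doDep and Test.useFastFactored, and the method name is invented.

```java
// Hypothetical helper; parameters mirror op.doPCFG, op.doDep and Test.useFastFactored.
class FactoredGuard {
  static boolean needsExactFactored(boolean doPCFG, boolean doDep, boolean useFastFactored) {
    // Exact A* factored parsing needs both grammars and is skipped when the
    // approximate algorithm (rescoring PCFG k-best lists) is selected.
    return doPCFG && doDep && !useFastFactored;
  }

  public static void main(String[] args) {
    System.out.println(needsExactFactored(true, true, false)); // exact factored code runs
    System.out.println(needsExactFactored(true, true, true));  // fast factored: skip it
  }
}
```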


iterativeCKY

public boolean iterativeCKY
If true, use faster iterative deepening CKY algorithm.


maxLength

public int maxLength
The maximum sentence length (including punctuation, etc.) to parse.


MAX_ITEMS

public int MAX_ITEMS
The maximum number of edges and hooks combined that the factored parser will build before giving up. This number should probably be relative to the length of the sentence being parsed. In general, though, if the parser cannot parse a sentence after this much work, then there is no good parse consistent between the PCFG and dependency parsers. (Normally, depending on other flags, the parser will then just return the best PCFG parse.)
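The give-up behavior can be illustrated with a generic budgeted search. This sketch does not reproduce the factored parser's agenda. Items are plain strings, a "goal" is any item starting with GOAL, and the class and method names are hypothetical. The sketch only shows the control flow: count the items built against a MAX_ITEMS-style limit, then fall back to the best PCFG parse.

```java
import java.util.Iterator;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical sketch of a work budget with a fallback; not the real parser.
class ItemBudget {
  static <T> Optional<T> searchWithBudget(Iterator<T> agenda, Predicate<T> isGoal, int maxItems) {
    int built = 0;
    while (agenda.hasNext()) {
      if (built >= maxItems) {
        return Optional.empty(); // too much work: give up
      }
      T item = agenda.next();    // stands in for building one edge or hook
      built++;
      if (isGoal.test(item)) {
        return Optional.of(item);
      }
    }
    return Optional.empty();
  }

  /** Returns a goal item if found within budget, otherwise the PCFG fallback. */
  static String parse(Iterator<String> agenda, int maxItems, String bestPcfgParse) {
    return searchWithBudget(agenda, s -> s.startsWith("GOAL"), maxItems).orElse(bestPcfgParse);
  }
}
```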


unseenSmooth

public double unseenSmooth
The amount of smoothing put in (as an m-estimate) for unknown words. If negative, set by the code in the lexicon class.
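For reference, a textbook m-estimate blends an observed relative frequency with a prior and treats the prior as worth m extra observations: (count + m * prior) / (total + m). The sketch below shows only that formula. The document does not specify how the lexicon chooses the prior or what value it substitutes when unseenSmooth is negative, so this method simply rejects a negative m. The names are hypothetical.

```java
// Textbook m-estimate: (count + m * prior) / (total + m).
// Class and method names are hypothetical.
class MEstimate {
  static double mEstimate(double count, double total, double prior, double m) {
    if (m < 0) {
      // The documentation says a negative unseenSmooth is replaced by the
      // lexicon's own value; this sketch does not model that.
      throw new IllegalArgumentException("m must be non-negative");
    }
    return (count + m * prior) / (total + m);
  }
}
```

Suppose a word has zero observations among 10 events, the prior is 0.5, and m = 2. The estimate is then (0 + 1) / 12, about 0.083, rather than 0.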


increasingLength

public boolean increasingLength
Parse trees in test treebank in order of increasing length.
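The following minimal sketch orders test sentences by length. It assumes each sentence is a list of token strings, and IncreasingLength is a hypothetical name. List.sort is stable, so sentences of equal length keep their treebank order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: order sentences (token lists) by increasing length.
class IncreasingLength {
  static List<List<String>> sortByLength(List<List<String>> sentences) {
    List<List<String>> sorted = new ArrayList<>(sentences); // leave the input untouched
    sorted.sort(Comparator.comparingInt((List<String> s) -> s.size())); // stable sort
    return sorted;
  }
}
```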


preTag

public boolean preTag
Tag the sentences first, then parse given those (coarse) tags.


forceTags

public boolean forceTags
Parse using only tags given by the correct answer or by the POS tagger.


forceTagBeginnings

public boolean forceTagBeginnings

taggerSerializedFile

public String taggerSerializedFile
POS tagger model used when preTag is enabled.


noFunctionalForcing

public boolean noFunctionalForcing
Only valid with forceTags: strips away functional annotations when forcing the tags, meaning that tags have to start appropriately but the parser will assign the functional part.
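The difference between ordinary forcing and noFunctionalForcing can be sketched as a tag-compatibility check. This sketch assumes the functional part begins at the first '-' or '=' after the first character, as in Penn Treebank tags such as NP-SBJ. That rule and the class and method names are illustrative assumptions.

```java
// Hypothetical sketch of tag forcing with and without functional parts.
class FunctionalForcing {
  // Assumption: the functional part starts at the first '-' or '=' after position 0.
  static String basicCategory(String tag) {
    if (tag.startsWith("-")) {
      return tag; // bracket/empty tags such as -LRB- or -NONE- keep their dashes
    }
    for (int i = 1; i < tag.length(); i++) {
      char c = tag.charAt(i);
      if (c == '-' || c == '=') {
        return tag.substring(0, i);
      }
    }
    return tag;
  }

  static boolean allowed(String forcedTag, String candidateTag, boolean noFunctionalForcing) {
    if (!noFunctionalForcing) {
      return candidateTag.equals(forcedTag); // exact match required
    }
    // Only the beginning must match; the parser chooses any functional suffix.
    // Note: a plain prefix test also accepts e.g. NNS for NN.
    return candidateTag.startsWith(basicCategory(forcedTag));
  }
}
```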


evalb

public boolean evalb
Write EvalB-readable output files.


verbose

public boolean verbose
Print a lot of extra output as you parse.


exhaustiveTest

public final boolean exhaustiveTest
See Also:
Constant Field Values

pcfgThreshold

public final boolean pcfgThreshold
If this variable is true, and the sum of the inside and outside scores for a constituent is worse than the best known score for the sentence by more than pcfgThresholdValue, then -Inf is returned as the outside score by oScore() (otherwise the true outside score is returned).

See Also:
Constant Field Values
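The thresholding rule described above can be written out as a small function. Scores are treated as log probabilities, so larger is better. The function and parameter names are hypothetical, and pcfgThresholdValue plays the role of the constant field.

```java
// Hypothetical sketch of the pcfgThreshold rule; scores are log probabilities.
class PcfgThreshold {
  static double thresholdedOScore(double iScore, double oScore, double bestScore,
                                  boolean pcfgThreshold, double pcfgThresholdValue) {
    // Prune a constituent whose inside + outside score falls more than
    // pcfgThresholdValue below the best known score for the sentence.
    if (pcfgThreshold && iScore + oScore < bestScore - pcfgThresholdValue) {
      return Double.NEGATIVE_INFINITY;
    }
    return oScore; // otherwise the true outside score
  }
}
```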

pcfgThresholdValue

public final double pcfgThresholdValue
See Also:
Constant Field Values

printAllBestParses

public boolean printAllBestParses
Print out all best PCFG parses.


depWeight

public double depWeight
Weighting on dependency log probs. The dependency grammar negative log probability scores are simply multiplied by this number.
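The following sketch shows how a dependency weight can change which parse wins. It assumes the factored score is the PCFG log probability plus the weighted dependency log probability. That additive combination and the names are assumptions; the document only states that dependency scores are multiplied by depWeight.

```java
// Hypothetical sketch; assumes score = PCFG log prob + depWeight * dependency log prob.
class DepWeighting {
  static double factoredScore(double pcfgLogProb, double depLogProb, double depWeight) {
    return pcfgLogProb + depWeight * depLogProb;
  }

  /** True if parse A scores at least as well as parse B under the given weight. */
  static boolean aWins(double pcfgA, double depA, double pcfgB, double depB, double depWeight) {
    return factoredScore(pcfgA, depA, depWeight) >= factoredScore(pcfgB, depB, depWeight);
  }
}
```

Take parse A at (-10, -8) and parse B at (-12, -4). With a weight of 1.0, B is preferred (-16 vs. -18). With a weight of 0.25, A is preferred (-12 vs. -13).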


prunePunc

public boolean prunePunc

addMissingFinalPunctuation

public boolean addMissingFinalPunctuation
If a token list does not have sentence-final punctuation near the end, then automatically add the default one. This might help parsing if the treebank is consistently punctuated. This is not done when reading a treebank.
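Here is a hypothetical sketch of the punctuation check. The set of final punctuation marks, the three-token window that stands in for "near the end", and the default mark passed in by the caller are all illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of adding missing sentence-final punctuation.
class FinalPunctuation {
  static final Set<String> FINAL_PUNCT = Set.of(".", "?", "!"); // assumed set
  static final int WINDOW = 3; // assumed meaning of "near the end"

  static List<String> addMissing(List<String> tokens, String defaultPunct) {
    int start = Math.max(0, tokens.size() - WINDOW);
    for (int i = start; i < tokens.size(); i++) {
      if (FINAL_PUNCT.contains(tokens.get(i))) {
        return tokens; // already punctuated near the end (e.g. before a closing quote)
      }
    }
    List<String> result = new ArrayList<>(tokens);
    result.add(defaultPunct);
    return result;
  }
}
```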


outputFormat

public String outputFormat
Determines the format of output trees: choose among penn, oneline


outputFormatOptions

public String outputFormatOptions

writeOutputFiles

public boolean writeOutputFiles
If true, write the parses of each input file to a new file with the same name plus an added ".stp" extension.


outputFilesDirectory

public String outputFilesDirectory
If the writeOutputFiles option is true, then output files appear in this directory. An unset value (null) means to use the directory of the source files. Use "" or "." for the current directory.


outputFilesExtension

public String outputFilesExtension
If the writeOutputFiles option is true, then output files appear with this extension. Use "" for no extension.


outputFilesPrefix

public String outputFilesPrefix
If the writeOutputFiles option is true, then output files appear with this prefix.
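The three output-file options can be combined into a path as sketched below. The directory rules follow the descriptions above. Two details are assumptions made for this illustration: the prefix goes before the source file name, and the extension is stored without its leading dot. The OutputFileNames class is also hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch of building an output path from the output-file options.
class OutputFileNames {
  static Path outputPath(String sourceFile, String outputDir, String prefix, String extension) {
    Path source = Paths.get(sourceFile);
    Path dir;
    if (outputDir == null) {
      // Unset: same directory as the source file.
      dir = source.getParent() == null ? Paths.get("") : source.getParent();
    } else {
      dir = Paths.get(outputDir); // "" and "." both mean the current directory
    }
    String name = (prefix == null ? "" : prefix) + source.getFileName();
    if (extension != null && !extension.isEmpty()) {
      name = name + "." + extension; // appended, not replacing the original extension
    }
    return dir.resolve(name);
  }
}
```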


outputkBestEquivocation

public String outputkBestEquivocation
If this option is not null, output the k-best equivocation. Must be specified with printPCFGkBest.


maxSpanForTags

public int maxSpanForTags
The largest span to consider for word-hood. Used for parsing unsegmented Chinese text and parsing lattices. Keep it at 1 unless you know what you're doing.


lengthNormalization

public boolean lengthNormalization
Turns on normalizing scores for sentence length. Makes no difference (except decreased efficiency) unless maxSpanForTags is greater than one. Works only for PCFG (so far).
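Normalization matters only when words can span more than one character, because only then can two analyses of the same input have different word counts. The sketch below compares two such segmentations. It assumes normalization divides the log probability by the number of words, which is one simple choice; the document does not say which formula the parser uses, and the names are hypothetical.

```java
// Hypothetical sketch: per-word normalization of a log probability.
class LengthNormalization {
  static double score(double logProb, int numWords, boolean lengthNormalization) {
    if (numWords <= 0) {
      throw new IllegalArgumentException("numWords must be positive");
    }
    return lengthNormalization ? logProb / numWords : logProb;
  }
}
```

Consider a two-word segmentation with log probability -8 and a four-word segmentation at -12. Without normalization, the two-word segmentation wins. With normalization, the scores become -4 and -3, so the four-word segmentation wins.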


sample

public boolean sample
Used when you want to generate sample parses instead of finding the best parse. (NOT YET USED.)


printPCFGkBest

public int printPCFGkBest
Print the k best parses from the PCFG, when k > 0.


printFactoredKGood

public int printFactoredKGood
Print k good parses from the factored parser, when k > 0.


evals

public Properties evals
What evaluations to report and how to report them (using LexicalizedParser). Known evaluations are: pcfgLB, pcfgCB, pcfgDA, pcfgTA, pcfgLL, pcfgRUO, pcfgCUO, pcfgCatE, depDA, depTA, depLL, factLB, factCB, factDA, factTA, factLL. The default is pcfgLB,depDA,factLB,factTA. To turn off any of these defaults, negate it explicitly (e.g., -evals "depDA=false"). LB = ParseEval labeled bracketing, CB = crossing brackets and zero crossing bracket rate, DA = dependency accuracy, TA = tagging accuracy, LL = log likelihood score, RUO/CUO = rules/categories under and over proposed, CatE = evaluation by phrasal category. Known styles are: runningAverages, summary, tsv. The default style is summary; negate it explicitly if you don't want it. Invalid names in the argument to this option are not reported!
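Here is a hypothetical parser for an -evals argument that shows how the defaults and explicit negation interact. The syntax it accepts (comma-separated name=value pairs, where a bare name means true) is inferred from the example above and is an assumption. EvalsOption is not part of the library. Like the real option, this sketch does not report invalid names.

```java
import java.util.Properties;

// Hypothetical sketch of combining default evaluations with an -evals argument.
class EvalsOption {
  // Defaults as listed in the documentation (evaluations plus the summary style).
  static Properties defaults() {
    Properties p = new Properties();
    for (String k : new String[] {"pcfgLB", "depDA", "factLB", "factTA", "summary"}) {
      p.setProperty(k, "true");
    }
    return p;
  }

  // Parses e.g. "depDA=false,tsv": explicit values override defaults.
  static Properties parse(String arg) {
    Properties p = defaults();
    for (String item : arg.split(",")) {
      String s = item.trim();
      if (s.isEmpty()) {
        continue;
      }
      int eq = s.indexOf('=');
      if (eq < 0) {
        p.setProperty(s, "true"); // a bare name turns an evaluation or style on
      } else {
        p.setProperty(s.substring(0, eq).trim(), s.substring(eq + 1).trim());
      }
    }
    return p;
  }

  static boolean enabled(Properties p, String name) {
    return Boolean.parseBoolean(p.getProperty(name, "false"));
  }
}
```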


fastFactoredCandidateMultiplier

public int fastFactoredCandidateMultiplier
To find k good fast factored parses, this variable says how many times k of the best PCFG parses should be examined.


fastFactoredCandidateAddend

public int fastFactoredCandidateAddend
To find k good factored parses, this variable says how many additional best PCFG parses should be examined.
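Taken together, the two fields suggest that about k * fastFactoredCandidateMultiplier + fastFactoredCandidateAddend PCFG parses are rescored. The sketch below assumes that linear combination. It also assumes the rescoring score is the PCFG score plus the dependency score. Both assumptions and all names are illustrative.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleFunction;

// Hypothetical sketch of fast factored rescoring over a PCFG k-best list.
class FastFactoredCandidates {
  static int numCandidates(int k, int multiplier, int addend) {
    return k * multiplier + addend;
  }

  /** Rescores the top PCFG candidates with a dependency score and keeps the k best. */
  static <T> List<T> rescore(List<T> pcfgKBest,            // assumed sorted best-first
                             ToDoubleFunction<T> pcfgScore,
                             ToDoubleFunction<T> depScore,
                             int k, int multiplier, int addend) {
    int n = Math.min(pcfgKBest.size(), numCandidates(k, multiplier, addend));
    List<T> candidates = new ArrayList<>(pcfgKBest.subList(0, n));
    // Highest combined score first.
    candidates.sort(Comparator.comparingDouble(
        (T t) -> pcfgScore.applyAsDouble(t) + depScore.applyAsDouble(t)).reversed());
    return new ArrayList<>(candidates.subList(0, Math.min(k, candidates.size())));
  }
}
```

The candidate cutoff matters because a parse ranked low by the PCFG can only win if it is among the parses examined. In the test data, "c" has the best combined score but is considered only when the addend is large enough.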


useLexiconToScoreDependencyPwGt

public boolean useLexiconToScoreDependencyPwGt
If this is true, the Lexicon is used to score P(w|t) in the backoff inside the dependency grammar. (Otherwise, an MLE is used if w is seen, and a constant if w is unseen.)
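The two scoring modes can be sketched as below. Word/tag counts are keyed by a hypothetical "word/tag" string, the unseen-word constant is an invented value, and the lexicon is represented by an arbitrary function. For readability, the sketch works with plain probabilities rather than log probabilities.

```java
import java.util.Map;
import java.util.Set;
import java.util.function.ToDoubleBiFunction;

// Hypothetical sketch of scoring P(w|t) in the dependency backoff.
class DependencyWordGivenTag {
  static final double UNSEEN_CONSTANT = 1e-4; // invented value for unseen words

  static double pWordGivenTag(String word, String tag,
                              Map<String, Integer> wordTagCounts, // keyed "word/tag"
                              Map<String, Integer> tagCounts,
                              Set<String> seenWords,
                              boolean useLexiconToScoreDependencyPwGt,
                              ToDoubleBiFunction<String, String> lexicon) {
    if (useLexiconToScoreDependencyPwGt) {
      return lexicon.applyAsDouble(word, tag); // delegate to the lexicon model
    }
    if (!seenWords.contains(word)) {
      return UNSEEN_CONSTANT;                  // unseen word: constant
    }
    // Seen word: maximum likelihood estimate count(w, t) / count(t).
    int joint = wordTagCounts.getOrDefault(word + "/" + tag, 0);
    int tagTotal = tagCounts.getOrDefault(tag, 0);
    return tagTotal == 0 ? 0.0 : (double) joint / tagTotal;
  }
}
```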


useNonProjectiveDependencyParser

public boolean useNonProjectiveDependencyParser
If this is true, perform non-projective dependency parsing.

Constructor Detail

TestOptions

public TestOptions()
Method Detail

treePrint

public TreePrint treePrint(TreebankLangParserParams tlpParams)
Determines the method for printing trees on output.

Parameters:
tlpParams - The treebank parser params
Returns:
A suitable tree printing object

display

public void display()


Stanford NLP Group