edu.stanford.nlp.parser.lexparser
Class Test

java.lang.Object
  extended by edu.stanford.nlp.parser.lexparser.Test

public class Test
extends Object

Options to the parser which affect performance only at testing (parsing) time.

Author:
Dan Klein

Nested Class Summary
static class Test.Constraint
          A Constraint represents a restriction on possible parse trees to consider.
 
Field Summary
static boolean addMissingFinalPunctuation
          If a token list does not have sentence-final punctuation near the end, then automatically add the default one.
static List<Test.Constraint> constraints
          Forces the parser to parse a particular subsequence into a particular state.
static double depWeight
          Weighting on dependency log probs.
static boolean doRecovery
          If true, then failure of the PCFG factor to parse a sentence will trigger parse recovery mode.
static boolean evalb
          Write EvalB-readable output files.
static Properties evals
          What evaluations to report and how to report them (using LexicalizedParser).
static boolean exhaustiveTest
           
static int fastFactoredCandidateAddend
          When finding k good fast factored parses, how many best PCFG parses to examine in addition to those set by fastFactoredCandidateMultiplier.
static int fastFactoredCandidateMultiplier
          When finding k good fast factored parses, examine this many times k of the best PCFG parses.
static boolean forceTagBeginnings
           
static boolean forceTags
          Parse using only tags given by the correct answer or the POS tagger.
static boolean increasingLength
          Parse trees in test treebank in order of increasing length.
static boolean iterativeCKY
          If true, use faster iterative deepening CKY algorithm.
static boolean lengthNormalization
          Turns on normalizing scores for sentence length.
static int MAX_ITEMS
          The maximum number of edges and hooks combined that the factored parser will build before giving up.
static int maxLength
          The maximum sentence length (including punctuation, etc.) to parse.
static int maxSpanForTags
          The largest span to consider for word-hood.
static boolean noFunctionalForcing
          Only valid with forceTags: strips away functional tags when forcing the tags, meaning tags have to start appropriately but the parser will assign the functional part.
static boolean noRecoveryTagging
          If false, then failure of the PCFG parser to parse a sentence will trigger allowing all tags for words in parse recovery mode, with a log probability of -1000.
static String outputFilesDirectory
          If the writeOutputFiles option is true, then output files appear in this directory.
static String outputFilesExtension
          If the writeOutputFiles option is true, then output files appear with this extension.
static String outputFormat
          Determines the format of output trees: choose among penn and oneline.
static String outputFormatOptions
           
static boolean pcfgThreshold
          If this variable is true, and the sum of the inside and outside scores for a constituent is worse than the best known score for a sentence by more than pcfgThresholdValue, then -Inf is returned as the outside score by oScore() (while otherwise the true outside score is returned).
static double pcfgThresholdValue
           
static boolean preTag
          Tag the sentences first, then parse given those (coarse) tags.
static boolean printAllBestParses
          Print out all best PCFG parses.
static int printFactoredKGood
          Print k good parses from the factored parser, when k > 0.
static int printPCFGkBest
          Print the k best PCFG parses, when k > 0.
static boolean prunePunc
           
static boolean sample
          Used when you want to generate sample parses instead of finding the best parse.
static double unseenSmooth
          The amount of smoothing put in (as an m-estimate) for unknown words.
static boolean useFastFactored
          If true, use the approximate factored algorithm, which just rescores the k best PCFG parses, rather than the exact factored algorithm.
static boolean useLexiconToScoreDependencyPwGt
          If this is true, the Lexicon is used to score P(w|t) in the backoff inside the dependency grammar.
static boolean useN5
          If true, the n^4 "speed-up" is not used with the Factored Parser.
static boolean verbose
          Print a lot of extra output as you parse.
static boolean writeOutputFiles
          If true, write the parses of each input file to a new file with the same name plus an added ".stp" extension.
 
Method Summary
static void display()
           
static TreePrint treePrint(TreebankLangParserParams tlpParams)
          Determines the method for printing trees on output.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

noRecoveryTagging

public static boolean noRecoveryTagging
If false, then failure of the PCFG parser to parse a sentence will trigger allowing all tags for words in parse recovery mode, with a log probability of -1000. If true, these extra taggings are not added. It is false by default. Use option -noRecoveryTagging to set to true.


doRecovery

public static boolean doRecovery
If true, then failure of the PCFG factor to parse a sentence will trigger parse recovery mode.
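The recovery-tagging behavior described for noRecoveryTagging amounts to widening each word's tag set once recovery mode is triggered. The following is a hypothetical sketch: the map of normal tag scores and the set of all tags stand in for the parser's lexicon and tag index, which this page does not describe, and the class and method names are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of recovery-mode tagging; not the parser's actual code. */
final class RecoveryTagging {
    private RecoveryTagging() {}

    static final double RECOVERY_LOG_PROB = -1000.0;

    /**
     * Returns a word's tag log-scores. In recovery mode, unless noRecoveryTagging
     * is set, every tag without a normal score is added at RECOVERY_LOG_PROB.
     */
    static Map<String, Double> tagScores(Map<String, Double> normalScores, Set<String> allTags,
                                         boolean recoveryMode, boolean noRecoveryTagging) {
        Map<String, Double> out = new HashMap<>(normalScores);
        if (recoveryMode && !noRecoveryTagging) {
            for (String tag : allTags) {
                out.putIfAbsent(tag, RECOVERY_LOG_PROB); // existing scores are kept
            }
        }
        return out;
    }
}
```

Because the extra tags score -1000, they are only chosen when no parse is possible with the normal tags.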


useN5

public static boolean useN5
If true, the n^4 "speed-up" is not used with the Factored Parser.


useFastFactored

public static boolean useFastFactored
If true, use the approximate factored algorithm, which just rescores the k best PCFG parses, rather than the exact factored algorithm. This algorithm requires the dependency grammar to exist for rescoring, but not for the dependency grammar to be run. Hence, code that is only required for exact A* factored parsing should now be guarded with if (op.doPCFG && op.doDep && !Test.useFastFactored).
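The guard described above can be written as a small predicate. This is a sketch only: op.doPCFG and op.doDep come from the parser's options object, which is not documented on this page, so they are passed in here as plain booleans, and the class and method names are hypothetical.

```java
/** Hypothetical helper illustrating the documented guard; not part of the parser API. */
final class FactoredGuard {
    private FactoredGuard() {}

    /**
     * True only when code for exact A* factored parsing is needed: both the PCFG
     * and the dependency grammar are enabled, and the fast (k-best rescoring)
     * approximation is switched off.
     */
    static boolean needsExactFactored(boolean doPCFG, boolean doDep, boolean useFastFactored) {
        return doPCFG && doDep && !useFastFactored;
    }
}
```

With useFastFactored set, the dependency grammar is still needed for rescoring, so only the exact-search code is skipped.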


iterativeCKY

public static boolean iterativeCKY
If true, use faster iterative deepening CKY algorithm.


maxLength

public static int maxLength
The maximum sentence length (including punctuation, etc.) to parse.


MAX_ITEMS

public static int MAX_ITEMS
The maximum number of edges and hooks combined that the factored parser will build before giving up. This number should probably be relative to the length of the sentence being parsed. In general, though, if the parser cannot parse a sentence after this much work, then there is no good parse consistent between the PCFG and dependency parsers. (Normally, depending on other flags, the parser will then just return the best PCFG parse.)
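The give-up-and-fall-back behavior can be sketched as a simple work budget. Everything here is hypothetical: the factored parser's agenda, edges, and hooks are not shown on this page, so a counter stands in for them, the search is simulated as needing a fixed number of items, and the fallback is unconditional rather than depending on other flags.

```java
/** Hypothetical work budget in the spirit of MAX_ITEMS; not the parser's actual code. */
final class ItemBudget {
    private final int maxItems;
    private int built = 0;

    ItemBudget(int maxItems) {
        this.maxItems = maxItems;
    }

    /** Records one new edge or hook; returns false once the budget is used up. */
    boolean tryBuild() {
        if (built >= maxItems) {
            return false;
        }
        built++;
        return true;
    }

    /**
     * Simulates a factored search that must build itemsNeeded items before it
     * finds a parse. If the budget runs out first, return the PCFG fallback.
     */
    static String parseWithFallback(int itemsNeeded, int maxItems, String bestPcfgParse) {
        ItemBudget budget = new ItemBudget(maxItems);
        for (int i = 0; i < itemsNeeded; i++) {
            if (!budget.tryBuild()) {
                return bestPcfgParse; // gave up: no factored parse within MAX_ITEMS
            }
        }
        return "factored parse";
    }
}
```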


unseenSmooth

public static double unseenSmooth
The amount of smoothing put in (as an m-estimate) for unknown words. If negative, the value is set by the code in the lexicon class.
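For reference, a generic m-estimate blends an observed relative frequency with a prior, giving the prior the weight of m pseudo-observations. The sketch below assumes that textbook form; this page does not say which prior the lexicon uses, so the prior is a parameter, and the "negative means the lexicon decides" rule is modeled with a hypothetical lexiconDefault argument.

```java
/** Hypothetical illustration of m-estimate smoothing; not the lexicon's actual code. */
final class MEstimate {
    private MEstimate() {}

    /** (count + m * prior) / (total + m); for an unseen event, count is 0. */
    static double estimate(double count, double total, double prior, double m) {
        if (m < 0) {
            throw new IllegalArgumentException("m must be non-negative");
        }
        return (count + m * prior) / (total + m);
    }

    /** A negative unseenSmooth means "let the lexicon choose"; lexiconDefault stands in for that. */
    static double resolveM(double unseenSmooth, double lexiconDefault) {
        return unseenSmooth < 0 ? lexiconDefault : unseenSmooth;
    }
}
```

For example, with m = 10 and a prior of 0.5, an unseen event out of 10 observations gets (0 + 10 * 0.5) / (10 + 10) = 0.25.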


increasingLength

public static boolean increasingLength
Parse trees in test treebank in order of increasing length.


preTag

public static boolean preTag
Tag the sentences first, then parse given those (coarse) tags.


forceTags

public static boolean forceTags
Parse using only tags given by the correct answer or the POS tagger.


forceTagBeginnings

public static boolean forceTagBeginnings

noFunctionalForcing

public static boolean noFunctionalForcing
Only valid with forceTags: strips away functional tags when forcing the tags, meaning tags have to start appropriately but the parser will assign the functional part.
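One reading of noFunctionalForcing is that only the base category of a forced tag is fixed, while the parser is free to choose the functional suffix. The sketch below assumes Penn Treebank-style tags where the functional part follows the first hyphen (e.g., NN-TMP); the class and method names are hypothetical, and it compares whole base categories, so NN does not license NNS.

```java
/** Hypothetical check for tag forcing; not the parser's actual implementation. */
final class TagForcing {
    private TagForcing() {}

    /** Assumes the functional part follows the first '-' (e.g., "NN-TMP" becomes "NN"). */
    static String stripFunctional(String tag) {
        int dash = tag.indexOf('-');
        // dash > 0 keeps tags such as "-NONE-" that begin with a hyphen intact
        return dash > 0 ? tag.substring(0, dash) : tag;
    }

    /** May the parser use candidateTag at a position whose forced tag is forcedTag? */
    static boolean allowed(String forcedTag, String candidateTag, boolean noFunctionalForcing) {
        if (!noFunctionalForcing) {
            return candidateTag.equals(forcedTag); // the whole tag is forced
        }
        // only the base category is forced; the functional part is left to the parser
        return stripFunctional(candidateTag).equals(stripFunctional(forcedTag));
    }
}
```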


evalb

public static boolean evalb
Write EvalB-readable output files.


verbose

public static boolean verbose
Print a lot of extra output as you parse.


exhaustiveTest

public static final boolean exhaustiveTest
See Also:
Constant Field Values

pcfgThreshold

public static final boolean pcfgThreshold
If this variable is true, and the sum of the inside and outside scores for a constituent is worse than the best known score for a sentence by more than pcfgThresholdValue, then -Inf is returned as the outside score by oScore() (while otherwise the true outside score is returned).

See Also:
Constant Field Values
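The thresholding rule for pcfgThreshold can be written directly. Scores are log probabilities (higher is better); the method name echoes oScore() from the description, but this signature is hypothetical and is not the parser's actual method.

```java
/** Hypothetical sketch of the pcfgThreshold rule; not the parser's actual oScore(). */
final class OutsidePruning {
    private OutsidePruning() {}

    /**
     * Returns the outside score, or negative infinity when inside + outside is worse
     * than bestSentenceScore by more than thresholdValue. All scores are log probabilities.
     */
    static double oScore(double inside, double outside, double bestSentenceScore,
                         boolean pcfgThreshold, double thresholdValue) {
        if (pcfgThreshold && inside + outside < bestSentenceScore - thresholdValue) {
            return Double.NEGATIVE_INFINITY; // treated as pruned
        }
        return outside;
    }
}
```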

pcfgThresholdValue

public static final double pcfgThresholdValue
See Also:
Constant Field Values

printAllBestParses

public static boolean printAllBestParses
Print out all best PCFG parses.


depWeight

public static double depWeight
Weighting on dependency log probs. The dependency grammar's negative log probability scores are simply multiplied by this number.
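Assuming, as in a factored model, that the PCFG and dependency scores are combined by adding their log probabilities, depWeight simply scales the dependency term. Multiplying the negative log probability by depWeight is equivalent to multiplying the log probability by it. The class and method names below are hypothetical.

```java
/** Hypothetical combination of factored scores (log probabilities); not the parser's code. */
final class FactoredScore {
    private FactoredScore() {}

    /** pcfgLogProb + depWeight * depLogProb; depWeight = 1 weights both factors equally. */
    static double combined(double pcfgLogProb, double depLogProb, double depWeight) {
        return pcfgLogProb + depWeight * depLogProb;
    }
}
```

With depWeight below 1, the dependency grammar has less influence on which parse scores best.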


prunePunc

public static boolean prunePunc

addMissingFinalPunctuation

public static boolean addMissingFinalPunctuation
If a token list does not have sentence-final punctuation near the end, then automatically add the default one. This might help parsing if every sentence in the treebank is punctuated. This is not done when reading a treebank.
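A minimal sketch of the idea, assuming a hypothetical set of sentence-final tokens, a look-back window standing in for "near the end", and "." as the default mark. The real values come from the treebank language pack and are not given on this page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of adding missing final punctuation; not the parser's actual code. */
final class FinalPunctuation {
    private FinalPunctuation() {}

    // Assumed sentence-final tokens; a real language pack would supply these.
    static final Set<String> FINAL_PUNCT = Set.of(".", "?", "!");

    /**
     * Returns the tokens unchanged if any of the last `window` tokens is
     * sentence-final punctuation; otherwise returns a copy with defaultPunct appended.
     */
    static List<String> addMissing(List<String> tokens, int window, String defaultPunct) {
        int from = Math.max(0, tokens.size() - window);
        for (int i = from; i < tokens.size(); i++) {
            if (FINAL_PUNCT.contains(tokens.get(i))) {
                return tokens; // already punctuated near the end (e.g., before a closing quote)
            }
        }
        List<String> out = new ArrayList<>(tokens);
        out.add(defaultPunct);
        return out;
    }
}
```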


outputFormat

public static String outputFormat
Determines the format of output trees: choose among penn and oneline.


outputFormatOptions

public static String outputFormatOptions

writeOutputFiles

public static boolean writeOutputFiles
If true, write the parses of each input file to a new file with the same name plus an added ".stp" extension.


outputFilesDirectory

public static String outputFilesDirectory
If the writeOutputFiles option is true, then output files appear in this directory. An unset value (null) means to use the directory of the source files. Use "" or "." for the current directory.


outputFilesExtension

public static String outputFilesExtension
If the writeOutputFiles option is true, then output files appear with this extension. An unset value (null) means to use the default of "stp". Use "" for no extension.
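The directory and extension rules for output files can be combined into one path-resolution helper. This is a hypothetical sketch: it assumes the extension is appended to the full input file name (as "added" in the writeOutputFiles description suggests) rather than replacing an existing extension, and the class and method names are illustrative.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical output-path resolver; not the parser's actual code. */
final class OutputPaths {
    private OutputPaths() {}

    static Path outputFor(Path input, String outputFilesDirectory, String outputFilesExtension) {
        // null directory: write next to the source file; "" or "." means the current directory
        Path dir;
        if (outputFilesDirectory == null) {
            dir = (input.getParent() != null) ? input.getParent() : Paths.get("");
        } else {
            dir = Paths.get(outputFilesDirectory);
        }
        // null extension: default "stp"; "" means no extension is added
        String ext = (outputFilesExtension == null) ? "stp" : outputFilesExtension;
        String name = input.getFileName().toString();
        if (!ext.isEmpty()) {
            name = name + "." + ext;
        }
        return dir.resolve(name);
    }
}
```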


maxSpanForTags

public static int maxSpanForTags
The largest span to consider for word-hood. Used for parsing unsegmented Chinese text and parsing lattices. Keep it at 1 unless you know what you're doing.
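To see why values above 1 add work, consider unsegmented text: every character sequence of length 1 through maxSpanForTags becomes a candidate word, so each position contributes up to maxSpanForTags candidates instead of one. This hypothetical sketch just enumerates those candidates; the parser's actual lattice construction is not shown on this page.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical enumeration of candidate words; not the parser's lattice code. */
final class WordSpans {
    private WordSpans() {}

    /** Lists every substring of length 1..maxSpanForTags, in order of start position. */
    static List<String> candidates(String text, int maxSpanForTags) {
        List<String> out = new ArrayList<>();
        // substring works on UTF-16 code units, which suits BMP characters such as common Chinese characters
        for (int start = 0; start < text.length(); start++) {
            for (int len = 1; len <= maxSpanForTags && start + len <= text.length(); len++) {
                out.add(text.substring(start, start + len));
            }
        }
        return out;
    }
}
```

With maxSpanForTags = 1, each character is its own candidate, which is why 1 is the safe default.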


lengthNormalization

public static boolean lengthNormalization
Turns on normalizing scores for sentence length. Makes no difference (except decreased efficiency) unless maxSpanForTags is greater than one. Works only for PCFG (so far).


constraints

public static List<Test.Constraint> constraints
Forces the parser to parse a particular subsequence into a particular state. Parses will only be made where there is a constituent over the given span whose state matches (as a regular expression) the given state Pattern.
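The constraint check can be sketched with spans and regular expressions. The record types below are hypothetical stand-ins (this page does not list the fields of Test.Constraint), spans are assumed half-open [start, end), and the state is assumed to match the whole label via Matcher.matches().

```java
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch of span constraints; the types are stand-ins, not Test.Constraint. */
final class SpanConstraints {
    private SpanConstraints() {}

    /** A required constituent over the half-open span [start, end). */
    record Constraint(int start, int end, Pattern state) {}

    /** One constituent of a candidate parse over [start, end). */
    record Constituent(int start, int end, String label) {}

    /** True if some constituent covers exactly the span and its label fully matches the pattern. */
    static boolean satisfied(Constraint c, List<Constituent> parse) {
        for (Constituent k : parse) {
            if (k.start() == c.start() && k.end() == c.end()
                    && c.state().matcher(k.label()).matches()) {
                return true;
            }
        }
        return false;
    }

    /** A parse is admissible only if it satisfies every constraint. */
    static boolean allSatisfied(List<Constraint> constraints, List<Constituent> parse) {
        return constraints.stream().allMatch(c -> satisfied(c, parse));
    }
}
```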


sample

public static boolean sample
Used when you want to generate sample parses instead of finding the best parse. (NOT YET USED.)


printPCFGkBest

public static int printPCFGkBest
Print the k best PCFG parses, when k > 0.


printFactoredKGood

public static int printFactoredKGood
Print k good parses from the factored parser, when k > 0.


evals

public static Properties evals
What evaluations to report and how to report them (using LexicalizedParser). Known evaluations are: pcfgLB, pcfgCB, pcfgDA, pcfgTA, pcfgLL, pcfgRUO, pcfgCUO, pcfgCatE, depDA, depTA, depLL, factLB, factCB, factDA, factTA, factLL. The default is pcfgLB,depDA,factLB,factTA; you need to negate these explicitly (e.g., -evals "depDA=false") if you don't want them.
Abbreviations: LB = ParseEval labeled bracketing, CB = crossing brackets and zero crossing bracket rate, DA = dependency accuracy, TA = tagging accuracy, LL = log likelihood score, RUO/CUO = rules/categories under and over proposed, CatE = evaluation by phrasal category.
Known styles are: runningAverages, summary, tsv. The default style is summary; you need to negate it explicitly if you don't want it. Invalid names in the argument to this option are not reported!
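The option syntax can be illustrated with a small parser that starts from the documented defaults and applies a comma-separated argument such as "depDA=false,tsv". This is a hypothetical sketch, not LexicalizedParser's code; it assumes a bare name means true and, as the description warns, silently accepts misspelled names.

```java
import java.util.Properties;

/** Hypothetical parser for an -evals style argument; not LexicalizedParser's actual code. */
final class EvalOptions {
    private EvalOptions() {}

    /** Documented defaults: pcfgLB, depDA, factLB, factTA, plus the summary style. */
    static Properties defaults() {
        Properties p = new Properties();
        for (String name : new String[] {"pcfgLB", "depDA", "factLB", "factTA", "summary"}) {
            p.setProperty(name, "true");
        }
        return p;
    }

    /** Applies "name=value" or bare "name" (meaning true) items, separated by commas. */
    static Properties apply(Properties base, String arg) {
        Properties p = new Properties();
        p.putAll(base);
        for (String raw : arg.split(",")) {
            String item = raw.trim();
            if (item.isEmpty()) {
                continue;
            }
            int eq = item.indexOf('=');
            if (eq < 0) {
                p.setProperty(item, "true");
            } else {
                p.setProperty(item.substring(0, eq).trim(), item.substring(eq + 1).trim());
            }
            // Unknown names are stored without complaint, matching "not reported" above.
        }
        return p;
    }

    static boolean enabled(Properties p, String name) {
        return Boolean.parseBoolean(p.getProperty(name, "false"));
    }
}
```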


fastFactoredCandidateMultiplier

public static int fastFactoredCandidateMultiplier
When finding k good fast factored parses, examine this many times k of the best PCFG parses.


fastFactoredCandidateAddend

public static int fastFactoredCandidateAddend
When finding k good fast factored parses, how many best PCFG parses to examine in addition to those set by fastFactoredCandidateMultiplier.
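Read together, the two settings suggest that for k good factored parses, fastFactoredCandidateMultiplier * k + fastFactoredCandidateAddend of the best PCFG parses are rescored. The sketch below assumes that reading and assumes the combined score is the plain sum of PCFG and dependency log probabilities; the record and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of fast factored rescoring; not the parser's actual code. */
final class FastFactored {
    private FastFactored() {}

    /** A PCFG k-best parse with its PCFG and dependency log probabilities. */
    record Candidate(String tree, double pcfgScore, double depScore) {
        double combined() {
            return pcfgScore + depScore; // assumed: factors combine by adding log probs
        }
    }

    /** Assumed reading: multiplier * k + addend PCFG parses are examined. */
    static int numCandidates(int k, int multiplier, int addend) {
        return multiplier * k + addend;
    }

    /** pcfgKBest must be sorted best-first by PCFG score; returns up to k rescored parses. */
    static List<Candidate> kGood(List<Candidate> pcfgKBest, int k, int multiplier, int addend) {
        int n = Math.max(0, Math.min(pcfgKBest.size(), numCandidates(k, multiplier, addend)));
        List<Candidate> pool = new ArrayList<>(pcfgKBest.subList(0, n));
        pool.sort(Comparator.comparingDouble((Candidate c) -> c.combined()).reversed());
        return new ArrayList<>(pool.subList(0, Math.min(k, pool.size())));
    }
}
```

Larger multipliers or addends give the rescoring more chances to find the parse the exact factored search would prefer, at the cost of examining more PCFG parses.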


useLexiconToScoreDependencyPwGt

public static boolean useLexiconToScoreDependencyPwGt
If this is true, the Lexicon is used to score P(w|t) in the backoff inside the dependency grammar. (Otherwise, an MLE is used if w is seen, and a constant if w is unseen.)
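The two scoring paths can be sketched as follows. This is hypothetical: counts are plain maps keyed by strings chosen for the example, the lexicon is a caller-supplied function, and all values are probabilities rather than the log probabilities the parser more likely uses internally.

```java
import java.util.Map;
import java.util.Set;
import java.util.function.ToDoubleBiFunction;

/** Hypothetical sketch of P(w|t) scoring in a dependency backoff; not the parser's code. */
final class WordGivenTag {
    private WordGivenTag() {}

    static double score(String word, String tag,
                        Map<String, Integer> wordTagCounts, // key: word + "/" + tag
                        Map<String, Integer> tagCounts,
                        Set<String> seenWords,
                        boolean useLexiconToScoreDependencyPwGt,
                        ToDoubleBiFunction<String, String> lexicon,
                        double unseenConstant) {
        if (useLexiconToScoreDependencyPwGt) {
            return lexicon.applyAsDouble(word, tag); // delegate to the lexicon
        }
        if (!seenWords.contains(word)) {
            return unseenConstant; // constant for unseen words
        }
        int tagCount = tagCounts.getOrDefault(tag, 0);
        if (tagCount == 0) {
            return 0.0; // tag never observed: MLE is undefined, treat as zero
        }
        // maximum-likelihood estimate: count(w, t) / count(t)
        return wordTagCounts.getOrDefault(word + "/" + tag, 0) / (double) tagCount;
    }
}
```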

Method Detail

treePrint

public static TreePrint treePrint(TreebankLangParserParams tlpParams)
Determines the method for printing trees on output.

Parameters:
tlpParams - The treebank parser params
Returns:
A suitable tree printing object

display

public static void display()


Stanford NLP Group