edu.stanford.nlp.parser.lexparser
Class Train

java.lang.Object
  extended by edu.stanford.nlp.parser.lexparser.Train

public class Train
extends Object

Non-language-specific options for training a grammar from a treebank.

Author:
Dan Klein, Christopher Manning

Field Summary
static boolean cheatPCFG
           
static boolean collinsPunc
          Promote/delete punctuation like Collins.
static int compactGrammar
           
static Set<String> deleteSplitters
           
static double fractionBeforeUnseenCounting
          Start aggregating signature-tag pairs only for words that did not appear in this initial fraction of the data.
static boolean gPA
          This variable controls doing 2 levels of parent annotation.
static int HSEL_CUT
           
static boolean hSelSplit
           
static boolean leaveItAll
          If true, leave all PTB (functional tag) annotations in place (bad).
static boolean leftRec
          Left edge is left-recursive (X << X) Bad.
static boolean leftToRight
           
static boolean markFinalStates
           
static boolean markovFactor
           
static int markovOrder
           
static boolean markUnary
          Mark all unary nodes specially.
static boolean markUnary2
           
static boolean markUnaryTags
           
static boolean noTagSplit
           
static int openClassTypesThreshold
          A POS tag has to have been attributed to more than this number of word types before it is regarded as an open-class tag.
static boolean PA
          This variable controls doing parent annotation of phrasal nodes.
static boolean postGPA
           
static boolean postPA
           
static Set postSplitters
           
static boolean postSplitWithBaseCategory
          Whether, in post-splitting of categories, nodes are annotated with the (grand)parent's base category or with its complete subcategorized category.
static PrintWriter printAnnotatedPW
           
static PrintWriter printBinarizedPW
           
static boolean printStates
           
static boolean printTreeTransformations
          For debugging only: check that your tree transforms, or states, work correctly.
static boolean rightRec
          Right edge is right-recursive (X << X) Bad.
static double ruleDiscount
          Discounts the count of BinaryRules (only, apparently) in training data.
static boolean selectivePostSplit
           
static double selectivePostSplitCutOff
           
static boolean selectiveSplit
          Only split the "common high KL divergence" parent categories....
static double selectiveSplitCutOff
           
static boolean sisterAnnotate
          Selective Sister annotation.
static Set sisterSplitters
           
static boolean smoothedBound
           
static boolean smoothing
          Change this field, or smoothedBound or ruleDiscount, at your own risk.
static boolean splitPrePreT
          Mark all pre-preterminals (also does splitBaseNP: don't need both)
static Set splitters
          Set the splitter strings.
static boolean tagPA
          Parent annotation on tags.
static boolean tagSelectivePostSplit
           
static double tagSelectivePostSplitCutOff
           
static boolean tagSelectiveSplit
          Do parent annotation on tags selectively.
static double tagSelectiveSplitCutOff
           
static boolean xOverX
          X over X is marked (subsumes baseNP marking) Bad.
 
Method Summary
static int compactGrammar()
           
static void display()
           
static boolean outsideFactor()
          If true, declare early -- leave this on except maybe with markov on.
static void printTrainTree(PrintWriter pw, String message, Tree t)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

leaveItAll

public static boolean leaveItAll
If true, leave all PTB (functional tag) annotations in place (bad).


cheatPCFG

public static boolean cheatPCFG

markovFactor

public static boolean markovFactor

markovOrder

public static int markovOrder

hSelSplit

public static boolean hSelSplit

HSEL_CUT

public static int HSEL_CUT

markFinalStates

public static boolean markFinalStates

openClassTypesThreshold

public static int openClassTypesThreshold
A POS tag must have been assigned to more than this number of word types before it is regarded as an open-class tag. Unknown words can only be tagged with open-class tags (unless flexiTag is on). If flexiTag is on, unknown words can be tagged with any POS for which the unseenMap has a nonzero count (that is, the tag was seen for a new word after unseen signature counting was started).
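
The threshold test described above can be illustrated with a minimal sketch. It is not the parser's own code: the class name OpenClassTagSketch, the method openClassTags, and the representation of training data as {word, tag} pairs are all assumptions made for illustration.

```java
import java.util.*;

class OpenClassTagSketch {
    /**
     * Returns the tags seen with MORE than `threshold` distinct word types.
     * Each element of taggedWords is a hypothetical {word, tag} pair.
     */
    static Set<String> openClassTags(List<String[]> taggedWords, int threshold) {
        Map<String, Set<String>> typesPerTag = new HashMap<>();
        for (String[] pair : taggedWords) {
            String word = pair[0], tag = pair[1];
            // A set per tag, so repeated tokens of one word count as a single type.
            typesPerTag.computeIfAbsent(tag, k -> new HashSet<>()).add(word);
        }
        Set<String> open = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : typesPerTag.entrySet()) {
            if (e.getValue().size() > threshold) {
                open.add(e.getKey());
            }
        }
        return open;
    }

    public static void main(String[] args) {
        List<String[]> data = Arrays.asList(
            new String[]{"dog", "NN"}, new String[]{"cat", "NN"}, new String[]{"fish", "NN"},
            new String[]{"the", "DT"}, new String[]{"a", "DT"}, new String[]{"the", "DT"});
        System.out.println(openClassTags(data, 2));
    }
}
```

In this toy data NN has three word types and DT has two, so with a threshold of 2 only NN passes the "more than" test.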


fractionBeforeUnseenCounting

public static double fractionBeforeUnseenCounting
Start aggregating signature-tag pairs only for words that did not appear in this initial fraction of the data.
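
One way to read this setting is sketched below. The signature function is a hypothetical stand-in for the parser's real word-signature logic, and the class and method names are assumptions. The sketch also assumes non-empty words, and that words are added to the seen set throughout the pass.

```java
import java.util.*;

class UnseenCountingSketch {
    /** Hypothetical coarse word signature; the parser's real signatures are richer. */
    static String signature(String word) {
        if (word.chars().anyMatch(Character::isDigit)) return "UNK-NUM";
        if (Character.isUpperCase(word.charAt(0))) return "UNK-CAP";
        return "UNK";
    }

    /**
     * Counts signature|tag pairs, but only once the first `fraction` of the
     * data has been read, and only for words not encountered earlier.
     */
    static Map<String, Integer> countUnseen(List<String[]> taggedWords, double fraction) {
        Set<String> seen = new HashSet<>();
        Map<String, Integer> counts = new TreeMap<>();
        int start = (int) (taggedWords.size() * fraction);
        for (int i = 0; i < taggedWords.size(); i++) {
            String word = taggedWords.get(i)[0];
            String tag = taggedWords.get(i)[1];
            if (i >= start && !seen.contains(word)) {
                counts.merge(signature(word) + "|" + tag, 1, Integer::sum);
            }
            seen.add(word);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String[]> data = Arrays.asList(
            new String[]{"the", "DT"}, new String[]{"dog", "NN"},
            new String[]{"Paris", "NNP"}, new String[]{"dog", "NN"});
        System.out.println(countUnseen(data, 0.5));
    }
}
```

With a fraction of 0.5 on these four tokens, counting starts at the third token, so "Paris" contributes a pair while the second "dog" does not, because it was already seen.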


PA

public static boolean PA
This variable controls doing parent annotation of phrasal nodes. Good.


gPA

public static boolean gPA
This variable controls doing 2 levels of parent annotation. Bad.
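
Parent annotation (PA) and two-level annotation (gPA) can be pictured with a small sketch on a toy tree. The Node class, the annotate method, and the "^" and "~" separators are illustrative assumptions, not the parser's Tree API. Words and POS tags are left untouched here, since PA is described as applying to phrasal nodes.

```java
import java.util.*;

class ParentAnnotationSketch {
    /** Minimal immutable tree node used only for this illustration. */
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }

        boolean isLeaf() { return children.isEmpty(); }

        boolean isPreterminal() { return children.size() == 1 && children.get(0).isLeaf(); }

        @Override public String toString() {
            if (isLeaf()) return label;
            StringBuilder sb = new StringBuilder("(" + label);
            for (Node c : children) sb.append(' ').append(c);
            return sb.append(')').toString();
        }
    }

    /**
     * levels = 1 appends the parent label to each phrasal node;
     * levels = 2 also appends the grandparent label.
     */
    static Node annotate(Node n, String parent, String grandparent, int levels) {
        if (n.isLeaf() || n.isPreterminal()) return n;
        String label = n.label;
        if (levels >= 1 && parent != null) label += "^" + parent;
        if (levels >= 2 && grandparent != null) label += "~" + grandparent;
        Node[] kids = new Node[n.children.size()];
        for (int i = 0; i < kids.length; i++) {
            // Children see the unannotated label of this node as their parent.
            kids[i] = annotate(n.children.get(i), n.label, parent, levels);
        }
        return new Node(label, kids);
    }

    public static void main(String[] args) {
        Node tree = new Node("S",
            new Node("VP",
                new Node("VBZ", new Node("sees")),
                new Node("NP", new Node("NN", new Node("cats")))));
        System.out.println(annotate(tree, null, null, 1));
        System.out.println(annotate(tree, null, null, 2));
    }
}
```

With one level the NP becomes NP^VP; with two levels it becomes NP^VP~S, which shows how quickly the category set grows under grandparent annotation.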


postPA

public static boolean postPA

postGPA

public static boolean postGPA

selectiveSplit

public static boolean selectiveSplit
Only split the "common high KL divergence" parent categories.... Good.
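
A rough sketch of a KL-divergence split test follows. It assumes the criterion compares the distribution of expansions of a category under one parent against the distribution for the category overall, and splits when the divergence exceeds a cutoff such as selectiveSplitCutOff. That reading, the class and method names, and the omission of any frequency ("common") requirement are assumptions rather than documented behavior.

```java
import java.util.*;

class SelectiveSplitSketch {
    /**
     * KL(p || q) in nats, computed from raw expansion counts.
     * Assumes every expansion in pCounts also occurs in qCounts.
     */
    static double klDivergence(Map<String, Integer> pCounts, Map<String, Integer> qCounts) {
        double pTotal = 0, qTotal = 0;
        for (int c : pCounts.values()) pTotal += c;
        for (int c : qCounts.values()) qTotal += c;
        double kl = 0.0;
        for (Map.Entry<String, Integer> e : pCounts.entrySet()) {
            double p = e.getValue() / pTotal;
            double q = qCounts.get(e.getKey()) / qTotal;
            if (p > 0) kl += p * Math.log(p / q);
        }
        return kl;
    }

    /** Split the parent-annotated category only if it diverges enough from the overall one. */
    static boolean shouldSplit(Map<String, Integer> underParent,
                               Map<String, Integer> overall,
                               double cutoff) {
        return klDivergence(underParent, overall) > cutoff;
    }

    public static void main(String[] args) {
        Map<String, Integer> npUnderS = new HashMap<>();
        npUnderS.put("DT NN", 9);
        npUnderS.put("PRP", 1);
        Map<String, Integer> npOverall = new HashMap<>();
        npOverall.put("DT NN", 10);
        npOverall.put("PRP", 10);
        System.out.println(shouldSplit(npUnderS, npOverall, 0.1));
    }
}
```

The design point is that a category whose expansions look the same under every parent gains nothing from being split, so only divergent contexts pay the cost of extra states.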


selectiveSplitCutOff

public static double selectiveSplitCutOff

selectivePostSplit

public static boolean selectivePostSplit

selectivePostSplitCutOff

public static double selectivePostSplitCutOff

postSplitWithBaseCategory

public static boolean postSplitWithBaseCategory
Whether, in post-splitting of categories, nodes are annotated with the (grand)parent's base category or with its complete subcategorized category.
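
The difference between the two choices can be shown with a short sketch. It assumes that annotations are appended after "^" or "~" and that the base category is everything before the first such character; the helper names are hypothetical.

```java
class PostSplitSketch {
    /** Strips any annotation that follows the first '^' or '~' (an assumed convention). */
    static String baseCategory(String label) {
        for (int i = 0; i < label.length(); i++) {
            char c = label.charAt(i);
            if (c == '^' || c == '~') return label.substring(0, i);
        }
        return label;
    }

    /** Annotates cat with either the parent's base category or its full label. */
    static String postSplit(String cat, String parentLabel, boolean withBaseCategory) {
        String p = withBaseCategory ? baseCategory(parentLabel) : parentLabel;
        return cat + "^" + p;
    }

    public static void main(String[] args) {
        System.out.println(postSplit("NP", "VP^S", true));
        System.out.println(postSplit("NP", "VP^S", false));
    }
}
```

Under these assumptions the first call yields a label built from the bare parent category, while the second carries the parent's own annotation along and so produces a finer-grained category.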


sisterAnnotate

public static boolean sisterAnnotate
Selective Sister annotation.


sisterSplitters

public static Set sisterSplitters

markUnary

public static boolean markUnary
Mark all unary nodes specially. Good for just PCFG. Bad for factored. (1 better than 2 in combos)


markUnary2

public static boolean markUnary2

markUnaryTags

public static boolean markUnaryTags

splitPrePreT

public static boolean splitPrePreT
Mark all pre-preterminals (also does splitBaseNP: don't need both)


tagPA

public static boolean tagPA
Parent annotation on tags. Good (for PCFG?)


tagSelectiveSplit

public static boolean tagSelectiveSplit
Do parent annotation on tags selectively. Neutral, but fewer splits.


tagSelectiveSplitCutOff

public static double tagSelectiveSplitCutOff

tagSelectivePostSplit

public static boolean tagSelectivePostSplit

tagSelectivePostSplitCutOff

public static double tagSelectivePostSplitCutOff

rightRec

public static boolean rightRec
Right edge is right-recursive (X << X) Bad. (NP only is good)


leftRec

public static boolean leftRec
Left edge is left-recursive (X << X) Bad.


xOverX

public static boolean xOverX
X over X is marked (subsumes baseNP marking) Bad.


collinsPunc

public static boolean collinsPunc
Promote/delete punctuation like Collins. Bad (!)


splitters

public static Set splitters
Set the splitter strings. These are a set of parent and/or grandparent annotated categories which should be split off.
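
As an illustration of how such a set might be consulted, the sketch below annotates a category only when its annotated form is listed. The "NP^S" string format, the class name, and the method name are assumptions; the real splitter strings may be written differently.

```java
import java.util.*;

class SplitterSetSketch {
    /** Returns cat^parent if that annotated form is in splitters, otherwise cat unchanged. */
    static String maybeSplit(String cat, String parent, Set<String> splitters) {
        String annotated = cat + "^" + parent;
        return splitters.contains(annotated) ? annotated : cat;
    }

    public static void main(String[] args) {
        Set<String> splitters = new HashSet<>(Arrays.asList("NP^S", "VP^S"));
        System.out.println(maybeSplit("NP", "S", splitters));
        System.out.println(maybeSplit("NP", "VP", splitters));
    }
}
```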


postSplitters

public static Set postSplitters

deleteSplitters

public static Set<String> deleteSplitters

printTreeTransformations

public static boolean printTreeTransformations
For debugging only: check that your tree transforms, or states, work correctly.


printAnnotatedPW

public static PrintWriter printAnnotatedPW

printBinarizedPW

public static PrintWriter printBinarizedPW

printStates

public static boolean printStates

compactGrammar

public static int compactGrammar

leftToRight

public static boolean leftToRight

noTagSplit

public static boolean noTagSplit

smoothing

public static boolean smoothing
Change this field, or any field documented after it (smoothedBound, ruleDiscount), at your own risk.


smoothedBound

public static boolean smoothedBound

ruleDiscount

public static double ruleDiscount
Discounts the count of BinaryRules (only, apparently) in training data.
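
One plausible reading of this discount is absolute discounting, sketched below: a fixed amount is subtracted from each binary rule count before it is normalized by the parent count, and rules whose count does not exceed the discount are dropped. This is an assumption about the behavior, not a statement of what the parser does, and the names are hypothetical.

```java
class RuleDiscountSketch {
    /**
     * Log probability of a rule after subtracting `discount` from its count.
     * Returns negative infinity when the count does not exceed the discount,
     * which stands in for dropping the rule.
     */
    static double discountedLogProb(double ruleCount, double parentCount, double discount) {
        if (ruleCount <= discount) return Double.NEGATIVE_INFINITY;
        return Math.log((ruleCount - discount) / parentCount);
    }

    public static void main(String[] args) {
        System.out.println(discountedLogProb(3.0, 10.0, 0.5));
        System.out.println(discountedLogProb(0.5, 10.0, 0.5));
    }
}
```

The probability mass removed this way is not redistributed in the sketch; whether and how the parser reassigns it is not stated in this page.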

Method Detail

outsideFactor

public static boolean outsideFactor()
If true, declare early -- leave this on except maybe with markov on.


compactGrammar

public static int compactGrammar()

display

public static void display()

printTrainTree

public static void printTrainTree(PrintWriter pw,
                                  String message,
                                  Tree t)


Stanford NLP Group