-
trainTreeFile
java.lang.String trainTreeFile
-
trainLengthLimit
int trainLengthLimit
-
cheatPCFG
boolean cheatPCFG
Add all test set trees to training data for PCFG.
(Currently only supported in FactoredParser main.)
-
markovFactor
boolean markovFactor
Whether to do "horizontal Markovization" (as in the ACL 2003 paper).
False means regular PCFG expansions.
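
As a rough illustration of what horizontal Markovization changes, the sketch below lists the intermediate labels created while a rule is expanded one child at a time. It is not the parser's binarizer: the class name, the `@parent|history` label format, and the way `markovFactor` and `markovOrder` interact here are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: not the parser's actual binarizer or label format. */
final class MarkovizationSketch {

    /**
     * Lists the intermediate labels produced while expanding
     * parent -> children one child at a time, left to right.
     * With markovFactor off, each label records every child generated so far;
     * with it on, only the last markovOrder children are remembered.
     */
    static List<String> intermediateLabels(String parent, List<String> children,
                                           boolean markovFactor, int markovOrder) {
        List<String> labels = new ArrayList<>();
        int keep = Math.max(0, markovOrder); // treat negative orders as 0
        for (int generated = 1; generated < children.size(); generated++) {
            int from = markovFactor ? Math.max(0, generated - keep) : 0;
            String history = String.join("_", children.subList(from, generated));
            labels.add("@" + parent + "|" + history);
        }
        return labels;
    }

    public static void main(String[] args) {
        List<String> kids = Arrays.asList("DT", "JJ", "JJ", "NN");
        System.out.println(intermediateLabels("NP", kids, false, 0));
        System.out.println(intermediateLabels("NP", kids, true, 1));
    }
}
```

With an order of 1, the labels after the second and third child coincide, so the two positions share statistics. That sharing is the point of limiting the sibling history.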
-
markovOrder
int markovOrder
-
hSelSplit
boolean hSelSplit
-
HSEL_CUT
int HSEL_CUT
-
markFinalStates
boolean markFinalStates
Whether or not to mark final states in binarized grammar.
This must be off to get most value out of grammar compaction.
-
openClassTypesThreshold
int openClassTypesThreshold
A POS tag has to have been attributed to more than this number of word
types before it is regarded as an open-class tag. Unknown words can
only be tagged with open-class tags (unless flexiTag is on).
If flexiTag is on, unknown words can be tagged with any POS for
which the unseenMap has a nonzero count (that is, the tag was seen for
a new word after unseen signature counting was started).
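
The threshold test described above can be sketched as follows. This is a minimal illustration, not the lexicon's code: the class and method names are hypothetical, and tagged words are represented as plain `{word, tag}` arrays for brevity.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative only: a stand-alone version of the open-class test. */
final class OpenClassSketch {

    /**
     * Returns the tags seen with strictly more than `threshold`
     * distinct word types. Each element of taggedWords is {word, tag}.
     */
    static Set<String> openClassTags(List<String[]> taggedWords, int threshold) {
        Map<String, Set<String>> typesByTag = new HashMap<>();
        for (String[] wordAndTag : taggedWords) {
            typesByTag.computeIfAbsent(wordAndTag[1], k -> new HashSet<>())
                      .add(wordAndTag[0]);
        }
        Set<String> open = new TreeSet<>();
        for (Map.Entry<String, Set<String>> entry : typesByTag.entrySet()) {
            // "more than this number of word types", so the comparison is strict
            if (entry.getValue().size() > threshold) {
                open.add(entry.getKey());
            }
        }
        return open;
    }
}
```

Repeated tokens of the same word count once, because the test is over word types rather than token frequency.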
-
fractionBeforeUnseenCounting
double fractionBeforeUnseenCounting
Start aggregating signature-tag pairs only for words that were not seen
in the first fractionBeforeUnseenCounting fraction of the data.
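
One literal reading of this description is sketched below: words from the first fraction of the data form a "seen" set, and only later tokens outside that set contribute signature-tag counts. The class, the `{word, tag}` representation, and the crude `signature` function are all hypothetical; the parser's real unknown-word models compute richer signatures and may track seen words differently.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Illustrative only: one literal reading of the field's description. */
final class UnseenCountingSketch {

    /** Hypothetical, deliberately crude word signature. */
    static String signature(String word) {
        boolean caps = !word.isEmpty() && Character.isUpperCase(word.charAt(0));
        return caps ? "UNK-CAPS" : "UNK";
    }

    /** Each element of taggedWords is {word, tag}; keys are "signature/tag". */
    static Map<String, Integer> signatureTagCounts(List<String[]> taggedWords,
                                                   double fraction) {
        int size = taggedWords.size();
        int cutoff = Math.min(size, Math.max(0, (int) (size * fraction)));

        // Words occurring before the cutoff are treated as known.
        Set<String> seenEarly = new HashSet<>();
        for (int i = 0; i < cutoff; i++) {
            seenEarly.add(taggedWords.get(i)[0]);
        }

        // After the cutoff, count signature-tag pairs for the other words.
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = cutoff; i < size; i++) {
            String[] wordAndTag = taggedWords.get(i);
            if (!seenEarly.contains(wordAndTag[0])) {
                counts.merge(signature(wordAndTag[0]) + "/" + wordAndTag[1],
                             1, Integer::sum);
            }
        }
        return counts;
    }
}
```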
-
PA
boolean PA
Whether to do parent annotation of phrasal nodes. Good.
-
gPA
boolean gPA
Whether to do 2 levels of parent annotation. Bad.
-
postPA
boolean postPA
-
postGPA
boolean postGPA
-
selectiveSplit
boolean selectiveSplit
Only split the "common high KL divergence" parent categories. Good.
-
selectiveSplitCutOff
double selectiveSplitCutOff
-
selectivePostSplit
boolean selectivePostSplit
-
selectivePostSplitCutOff
double selectivePostSplitCutOff
-
postSplitWithBaseCategory
boolean postSplitWithBaseCategory
Whether, in post-splitting of categories, nodes are annotated with the
(grand)parent's base category or with its complete subcategorized
category.
-
sisterAnnotate
boolean sisterAnnotate
Selective Sister annotation.
-
sisterSplitters
java.util.Set<E> sisterSplitters
-
markUnary
int markUnary
Mark all unary nodes specially. Good for just PCFG. Bad for factored.
markUnary affects phrasal nodes. A value of 0 means to do nothing;
a value of 1 means to mark the parent (higher) node of a unary rewrite.
A value of 2 means to mark the child (lower) node of a unary rewrite.
Values of 1 and 2 only apply if the child (lower) node is phrasal.
(A value of 1 is better than 2 in combos.) A value of 1 corresponds
to the old boolean -unary flag.
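
The three markUnary values can be summarized in a small sketch. It is illustrative only: the class name and the `-U` suffix are assumptions for the example, not a statement of how the parser writes its annotations.

```java
/** Illustrative only: which node of a unary rewrite gets marked. */
final class MarkUnarySketch {

    /** Hypothetical marker appended to an annotated label. */
    static final String MARK = "-U";

    /**
     * Returns {parentLabel, childLabel} for the unary rewrite parent -> child
     * after applying the given markUnary setting.
     */
    static String[] annotate(String parent, String child,
                             boolean childIsPhrasal, int markUnary) {
        if (markUnary < 0 || markUnary > 2) {
            throw new IllegalArgumentException("markUnary must be 0, 1, or 2");
        }
        // Values 1 and 2 only apply if the child (lower) node is phrasal.
        if (markUnary == 0 || !childIsPhrasal) {
            return new String[] {parent, child};
        }
        return markUnary == 1
            ? new String[] {parent + MARK, child}   // mark the higher node
            : new String[] {parent, child + MARK};  // mark the lower node
    }
}
```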
-
markUnaryTags
boolean markUnaryTags
Mark POS tags which are the sole member of their phrasal constituent.
This is like markUnary=2, applied to POS tags.
-
splitPrePreT
boolean splitPrePreT
Mark all pre-preterminals. (This also does splitBaseNP, so you don't need both.)
-
tagPA
boolean tagPA
Parent annotation on tags. Good (for PCFG?)
-
tagSelectiveSplit
boolean tagSelectiveSplit
Do parent annotation on tags selectively. Neutral, but fewer splits.
-
tagSelectiveSplitCutOff
double tagSelectiveSplitCutOff
-
tagSelectivePostSplit
boolean tagSelectivePostSplit
-
tagSelectivePostSplitCutOff
double tagSelectivePostSplitCutOff
-
rightRec
boolean rightRec
Right edge is right-recursive (X << X). Bad. (NP only is good.)
-
leftRec
boolean leftRec
Left edge is left-recursive (X << X). Bad.
-
collinsPunc
boolean collinsPunc
Promote/delete punctuation like Collins. Bad (!)
-
splitters
java.util.Set<E> splitters
Set the splitter strings. These are a set of parent and/or grandparent
annotated categories which should be split off.
-
postSplitters
java.util.Set<E> postSplitters
-
deleteSplitters
java.util.Set<E> deleteSplitters
-
printTreeTransformations
int printTreeTransformations
Just for debugging: check that your tree transforms work correctly. This
will print the transformations of the first printTreeTransformations trees.
-
printAnnotatedPW
java.io.PrintWriter printAnnotatedPW
-
printBinarizedPW
java.io.PrintWriter printBinarizedPW
-
printStates
boolean printStates
-
compactGrammar
int compactGrammar
How to compact grammars as FSMs.
0 = no compaction [uses makeSyntheticLabel1],
1 = no compaction but use label names that wrap from right to left in binarization [uses makeSyntheticLabel2],
2 = wrapping labels and materialize unary at top rewriting passive to active,
3 = ExactGrammarCompactor,
4 = LossyGrammarCompactor,
5 = CategoryMergingGrammarCompactor.
(May 2007 CDM note: options 4 and 5 don't seem to be functioning sensibly. 0, 1, and 3
seem to be the 'good' options. 2 is only useful as input to 3. There seems to be
no reason not to use 0, despite the default.)
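
For quick reference, the option values listed above can be captured in a small lookup. This is only a restatement of the list and the May 2007 note in code form; the class and method names are hypothetical and nothing here calls into the parser.

```java
/** Illustrative only: a lookup table for the compactGrammar values. */
final class CompactGrammarSketch {

    /** Returns a short description of a compactGrammar setting. */
    static String describe(int compactGrammar) {
        switch (compactGrammar) {
            case 0: return "no compaction (makeSyntheticLabel1)";
            case 1: return "no compaction, right-to-left wrapping labels (makeSyntheticLabel2)";
            case 2: return "wrapping labels, unary materialized at top";
            case 3: return "ExactGrammarCompactor";
            case 4: return "LossyGrammarCompactor";
            case 5: return "CategoryMergingGrammarCompactor";
            default:
                throw new IllegalArgumentException(
                    "compactGrammar must be between 0 and 5: " + compactGrammar);
        }
    }

    /** The May 2007 note singles out 0, 1, and 3 as the 'good' options. */
    static boolean notedAsGood(int compactGrammar) {
        return compactGrammar == 0 || compactGrammar == 1 || compactGrammar == 3;
    }
}
```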
-
leftToRight
boolean leftToRight
-
noTagSplit
boolean noTagSplit
-
ruleSmoothing
boolean ruleSmoothing
Enables linear rule smoothing during grammar extraction
but before grammar compaction. The alpha term is the same
as that described in Petrov et al. (2006), and has range [0,1].
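
A minimal sketch of linear smoothing with an alpha in [0,1] follows. It assumes the interpolation form in which each probability is mixed with the mean over a group of related probabilities; how the parser chooses that group, and whether its implementation matches this form exactly, is not stated here, so treat the class and method as hypothetical.

```java
/** Illustrative only: linear interpolation toward the group mean. */
final class RuleSmoothingSketch {

    /**
     * Returns (1 - alpha) * p[i] + alpha * mean(p) for each entry.
     * alpha = 0 leaves the values unchanged; alpha = 1 makes them all equal.
     */
    static double[] smooth(double[] probs, double alpha) {
        if (alpha < 0.0 || alpha > 1.0) {
            throw new IllegalArgumentException("alpha must be in [0,1]");
        }
        if (probs.length == 0) {
            return new double[0];
        }
        double sum = 0.0;
        for (double p : probs) {
            sum += p;
        }
        double mean = sum / probs.length;

        double[] smoothed = new double[probs.length];
        for (int i = 0; i < probs.length; i++) {
            smoothed[i] = (1.0 - alpha) * probs[i] + alpha * mean;
        }
        return smoothed;
    }
}
```

Because every entry moves toward the same mean, the total of the values is preserved for any alpha.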
-
ruleSmoothingAlpha
double ruleSmoothingAlpha
-
smoothing
boolean smoothing
TODO wsg2011: This is the old grammar smoothing parameter that no
longer does anything in the parser. It should be removed.
-
ruleDiscount
double ruleDiscount
Discounts the counts of BinaryRules (only, apparently) in the training data.
-
printAnnotatedRuleCounts
boolean printAnnotatedRuleCounts
-
printAnnotatedStateCounts
boolean printAnnotatedStateCounts
-
basicCategoryTagsInDependencyGrammar
boolean basicCategoryTagsInDependencyGrammar
Whether to use the basic or split tags in the dependency grammar.
-
preTransformer
TreeTransformer preTransformer
A transformer to use on the training data before any other
processing step. This is specified by using the -preTransformer
flag when training the parser. A comma separated list of classes
will be turned into a CompositeTransformer. This can be used to
strip subcategories, to run a tsurgeon pattern, or any number of
other useful operations.
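
The idea of turning a comma-separated list into one composite step can be sketched without the parser's classes. In the example below, trees are plain strings, transformers are `UnaryOperator<String>` values looked up by name in a registry, and every name is hypothetical; the real option takes class names and produces a CompositeTransformer over trees.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Illustrative only: composing named transformers in list order. */
final class CompositeSketch {

    /** Builds one operator that applies each named step in order. */
    static UnaryOperator<String> compose(String spec,
                                         Map<String, UnaryOperator<String>> registry) {
        List<UnaryOperator<String>> steps = new ArrayList<>();
        for (String name : spec.split(",")) {
            UnaryOperator<String> step = registry.get(name.trim());
            if (step == null) {
                throw new IllegalArgumentException("Unknown transformer: " + name.trim());
            }
            steps.add(step);
        }
        return tree -> {
            String result = tree;
            for (UnaryOperator<String> step : steps) {
                result = step.apply(result);
            }
            return result;
        };
    }

    /** Small demonstration with two made-up transformers. */
    static String demo() {
        Map<String, UnaryOperator<String>> registry = new HashMap<>();
        // Strips suffixes such as "-SBJ" from labels (a stand-in for subcategory stripping).
        registry.put("stripFunctionTags", s -> s.replaceAll("-[A-Z]+", ""));
        registry.put("trim", String::trim);
        return compose("stripFunctionTags, trim", registry)
            .apply("  (NP-SBJ (NN dog)) ");
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```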
-
taggedFiles
java.lang.String taggedFiles
A set of files to use as extra information in the lexicon. This
can provide tagged words which are not part of trees.
-
predictSplits
boolean predictSplits
Use the method reported by Berkeley for splitting and recombining
states. This is an experimental reimplementation of that work
and is still in development.
-
splitCount
int splitCount
If we are predicting splits, we loop this many times
-
splitRecombineRate
double splitRecombineRate
If we are predicting splits, we recombine states at this rate every loop
-
simpleBinarizedLabels
boolean simpleBinarizedLabels
When binarizing trees, don't annotate the labels with anything
-
noRebinarization
boolean noRebinarization
When binarizing trees, don't binarize trees with two children.
Only applies when using inside Markov binarization for now.
-
trainingThreads
int trainingThreads
If the training algorithm allows for parallelization, how many
threads to use
-
dvKBest
int dvKBest
-
trainingIterations
int trainingIterations
-
batchSize
int batchSize
-
regCost
double regCost
-
qnIterationsPerBatch
int qnIterationsPerBatch
-
qnEstimates
int qnEstimates
When training the DV parsing method, how many estimates to keep
for the QN (quasi-Newton) approximation.
-
qnTolerance
double qnTolerance
When training the DV parsing method, the tolerance to use if we
want to stop QN early.
-
debugOutputFrequency
int debugOutputFrequency
If larger than 0, the parser may choose to output debug information
every X seconds, X iterations, or some other similar metric
-
randomSeed
long randomSeed
-
learningRate
double learningRate
How fast to learn (can mean different things for different algorithms)
-
deltaMargin
double deltaMargin
How much to penalize the wrong trees for how different they are
from the gold tree when training
-
unknownNumberVector
boolean unknownNumberVector
Whether or not to build an unknown word vector specifically for numbers
-
unknownDashedWordVectors
boolean unknownDashedWordVectors
Whether or not to handle unknown dashed words by taking the last part
-
unknownCapsVector
boolean unknownCapsVector
Whether or not to build an unknown word vector for words with caps in them
-
dvSimplifiedModel
boolean dvSimplifiedModel
Make the dv model as simple as possible
-
unknownChineseYearVector
boolean unknownChineseYearVector
Whether or not to build an unknown word vector to match Chinese years
-
unknownChineseNumberVector
boolean unknownChineseNumberVector
Whether or not to build an unknown word vector to match Chinese numbers
-
unknownChinesePercentVector
boolean unknownChinesePercentVector
Whether or not to build an unknown word vector to match Chinese percentages
-
scalingForInit
double scalingForInit
How much to scale certain parameters when initializing models.
For example, the DVParser uses this to rescale its initial
matrices.
-
maxTrainTimeSeconds
int maxTrainTimeSeconds
-
unkWord
java.lang.String unkWord
Some models will use external data sources which contain
information about unknown words. This variable is a way to
provide the name of the unknown word in the external data source.
-
lowercaseWordVectors
boolean lowercaseWordVectors
Whether or not to lowercase word vectors
-
transformMatrixType
TrainOptions.TransformMatrixType transformMatrixType
-
useContextWords
boolean useContextWords
Specifically for the DVModel, uses words on either side of a
context when combining constituents. Gives perhaps a microscopic
improvement in performance but causes a large slowdown.
-
trainWordVectors
boolean trainWordVectors
Do we want a model that uses word vectors (such as the DVParser)
to train those word vectors when training the model?
Note: models prior to 2014-02-13 may have incorrect values in
this field, as it was originally a compile-time constant.
-
stalledIterationLimit
int stalledIterationLimit
How many iterations to allow training to stall before taking the
best model, if training in an iterative manner
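
A stall limit of this kind is commonly implemented as an early-stopping counter. The sketch below is a generic version under stated assumptions: higher scores are better, training stops once the number of consecutive non-improving iterations reaches the limit, and the class and method names are hypothetical rather than taken from the parser.

```java
/** Illustrative only: generic early stopping on a stall counter. */
final class StalledTrainingSketch {

    /**
     * Walks through per-iteration scores and returns the index of the best
     * iteration seen before training would stop, or -1 if there are none.
     * Assumes the limit is at least 1.
     */
    static int bestIteration(double[] scores, int stalledIterationLimit) {
        int best = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        int stalled = 0;
        for (int i = 0; i < scores.length; i++) {
            if (scores[i] > bestScore) {
                best = i;
                bestScore = scores[i];
                stalled = 0;              // progress resets the counter
            } else if (++stalled >= stalledIterationLimit) {
                break;                    // stalled too long: keep the best model
            }
        }
        return best;
    }
}
```

With a small limit, a late improvement can be missed; a larger limit trades extra training time for the chance to recover from a plateau.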
-
markStrahler
boolean markStrahler
Horton-Strahler number/dimension (Maximilian Schlund)