edu.stanford.nlp.parser.lexparser
Class EnglishTreebankParserParams.EnglishTrain

java.lang.Object
  extended by edu.stanford.nlp.parser.lexparser.EnglishTreebankParserParams.EnglishTrain
All Implemented Interfaces:
Serializable
Enclosing class:
EnglishTreebankParserParams

public static class EnglishTreebankParserParams.EnglishTrain
extends Object
implements Serializable

See Also:
Serialized Form

Field Summary
 boolean correctTags
          'Correct' tags to produce verbs in VPs, etc.
 boolean dominatesC
          Verbal distance -- mark whether symbol dominates a conjunction (CC)
 boolean dominatesI
          Verbal distance -- mark whether symbol dominates a preposition (IN)
 int dominatesV
          Verbal distance -- mark whether symbol dominates a verb (V*, MD).
 boolean gpaRootVP
          Grand-parent annotate (root mark) VP below ROOT.
 boolean joinJJ
          Join comparative and superlative adjectives with positive adjectives.
 boolean joinNounTags
          Join proper nouns with common nouns.
 boolean joinPound
          Join pound with dollar.
 int leaveItAll
          If enabled, leave all PTB (functional tag) annotations (bad).
 int makePPTOintoIN
          Change TO inside PP to IN.
 int markCC
          Mark phrases which are conjunctions.
 boolean markContainedVP
           
 int markDitransV
          Attempt to record ditransitive verbs.
 boolean markReflexivePRP
          Mark reflexive PRP words.
 boolean rightPhrasal
          Right edge has a phrasal node.
 int sisterSplitLevel
          Set the support * KL cutoff level (1-4) for sister splitting; as far as we can tell so far, don't use it.
 int splitAux
          Make special tags for forms of BE and HAVE (and maybe DO/HELP, etc.).
 int splitBaseNP
          Mark base NPs.
 int splitCC
          Provide annotation of conjunctions.
 int splitIN
          Annotate prepositions into subcategories.
 boolean splitJJCOMP
          Put a special tag on 'adjectives with complements'.
 boolean splitMoreLess
          Specially mark the comparative/superlative words: less, least, more, most
 int splitNNP
          Mark NNP words by position in phrase (single, left, right, inside), or subcategorize NNP(S) as initials or as initial/final in NP.
 boolean splitNOT
          Annotates forms of "not" specially as tag "NOT".
 int splitNPADV
          Retain NP-ADV annotation.
 int splitNPNNP
          Mark NP-NNP.
 int splitNPpercent
          Mark phrases that are headed by %.
 boolean splitNPPRP
           
 boolean splitNumNP
          Mark "numeric NPs".
 boolean splitPercent
          Mark the nouns that are percent signs.
 int splitPoss
          Give a special tag to NPs which are possessive NPs (end in 's).
 boolean splitPPJJ
          A special test for "such" mainly ("such as Fred").
 boolean splitQuotes
          Mark quote marks for single vs. double so that mismatched pairs aren't produced.
 boolean splitRB
          Split modifier (NP, ADJP) adverbs from others.
 int splitSbar
          Split SBAR nodes.
 boolean splitSFP
          Separate out sentence-final punctuation.
 int splitSGapped
          Specially mark S nodes with a "gapped" subject (control, raising).
 int splitSTag
          Mark S/SINV/SQ nodes according to verbal tag.
 int splitTMP
          Retain NP-TMP (or maybe PP-TMP) annotation.
 boolean splitTRJJ
          Put a special tag on 'transitive adjectives' with NP complement, like 'due May 15'; it also catches 'such' in 'such as NP', which may be good.
 int splitVP
          Add (head) tags to VPs.
 boolean splitVPNPAgr
          Put enough marking on VP and NP to permit "agreement".
 boolean tagRBGPA
          Grandparent annotate RB to try to distinguish sentential ones from ones in places like NP postmodifiers (words like 'very' are already distinguished since their parent is ADJP).
 boolean unaryDT
          Mark "Intransitive" DT.
 boolean unaryIN
          Mark "Intransitive" IN.
 boolean unaryPRP
          Mark "Intransitive" PRP.
 boolean unaryRB
          Mark "Intransitive" RB.
 boolean vpSubCat
          Pitiful attempt at marking V* preterms with their surface subcat frames.
 
Method Summary
 void display()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

leaveItAll

public int leaveItAll
If enabled, leave all PTB (functional tag) annotations (bad).


splitIN

public int splitIN
Annotate prepositions into subcategories. Values:
0 = no annotation.
1 = IN with a ^S.* parent (putative subordinating conjunctions) marked differently from others (real prepositions). OK.
2 = annotate IN prepositions 3 ways: ^S.* parent, ^N.* parent, or rest (generally predicative ADJP, VP). Better than splitIN = 1. Good.
3 = annotate prepositions 6 ways: real feature engineering. Great.
4 = refinement of 3: allows -SC under SINV, WHADVP for -T, and no -SCC if the parent is an NP.
5 = like 4, but maps TO to IN in a "nominal" (N*, P*, A*) context.
6 = like 4, but marks V/A complements and leaves noun ones unmarked instead.
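A minimal sketch of levels 0-2, taking the description literally: "^S.* parent" is read as "the parent label starts with S", and likewise for N. The class, method, and the suffixes "-SC", "-N" and "-R" are hypothetical illustrations, not the parser's own annotation code, which may consult more context.

```java
// Hypothetical sketch of splitIN levels 0-2. The suffixes "-SC", "-N" and "-R"
// are illustrative assumptions, not the parser's actual annotation strings.
final class SplitInSketch {
    private SplitInSketch() {}

    /** Returns the tag for an IN preterminal, given the label of its parent node. */
    static String annotateIN(String parentLabel, int splitIN) {
        boolean sParent = parentLabel.startsWith("S"); // parent matches ^S.*
        boolean nParent = parentLabel.startsWith("N"); // parent matches ^N.*
        switch (splitIN) {
            case 0:
                return "IN"; // no annotation
            case 1:
                // Putative subordinating conjunctions vs. real prepositions.
                return sParent ? "IN-SC" : "IN";
            case 2:
                if (sParent) {
                    return "IN-SC";
                }
                if (nParent) {
                    return "IN-N";
                }
                return "IN-R"; // the rest: generally predicative ADJP, VP
            default:
                throw new IllegalArgumentException("this sketch covers levels 0-2 only");
        }
    }

    public static void main(String[] args) {
        System.out.println(annotateIN("SBAR", 2)); // IN-SC
        System.out.println(annotateIN("PP", 2));   // IN-R
    }
}
```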


splitQuotes

public boolean splitQuotes
Mark quote marks for single vs. double so that mismatched pairs aren't produced.
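A minimal sketch of the idea, assuming Penn Treebank conventions in which opening quotes are tagged `` and closing quotes '' whether the mark is single or double. The class name and the "-S"/"-D" suffixes are hypothetical.

```java
// Hypothetical sketch of splitQuotes. It assumes Penn Treebank conventions, where
// opening quotes are tagged `` and closing quotes '' for both single and double
// marks; the "-S"/"-D" suffixes are illustrative assumptions.
final class SplitQuotesSketch {
    private SplitQuotesSketch() {}

    /** Appends "-D" to double quote marks and "-S" to single ones; other tags pass through. */
    static String annotateQuote(String tag, String word) {
        if (!tag.equals("``") && !tag.equals("''")) {
            return tag; // not a quotation-mark preterminal
        }
        boolean isDouble = word.equals("``") || word.equals("''") || word.equals("\"");
        return tag + (isDouble ? "-D" : "-S");
    }

    public static void main(String[] args) {
        System.out.println(annotateQuote("``", "``")); // ``-D
        System.out.println(annotateQuote("''", "'"));  // ''-S
    }
}
```

With the marks split this way, a single opening quote's tag can no longer be confused with a double closing quote's tag.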


splitSFP

public boolean splitSFP
Separate out sentence-final punctuation (. ! ?). Doesn't help.


splitPercent

public boolean splitPercent
Mark the nouns that are percent signs. Slightly good.


splitNPpercent

public int splitNPpercent
Mark phrases that are headed by %. A value of 0 = do nothing, 1 = only NP, 2 = NP and ADJP, 3 = NP, ADJP and QP, 4 = any phrase.
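The five levels amount to a growing set of phrase categories eligible for marking. A minimal sketch, assuming "headed by %" means the phrase's head word is the token "%"; the class and method names are hypothetical.

```java
// Hypothetical sketch of which phrases splitNPpercent marks. "Headed by %" is
// modeled as the phrase's head word being the token "%".
final class SplitNPPercentSketch {
    private SplitNPPercentSketch() {}

    static boolean shouldMark(String category, String headWord, int level) {
        if (level < 0 || level > 4) {
            throw new IllegalArgumentException("level must be 0-4");
        }
        if (!"%".equals(headWord)) {
            return false; // only %-headed phrases are candidates
        }
        switch (level) {
            case 1:
                return category.equals("NP");
            case 2:
                return category.equals("NP") || category.equals("ADJP");
            case 3:
                return category.equals("NP") || category.equals("ADJP") || category.equals("QP");
            case 4:
                return true; // any phrase
            default:
                return false; // level 0: do nothing
        }
    }
}
```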


tagRBGPA

public boolean tagRBGPA
Grandparent annotate RB to try to distinguish sentential ones from ones in places like NP postmodifiers (words like 'very' are already distinguished since their parent is ADJP).


splitNNP

public int splitNNP
Mark NNP words by position in phrase (single, left, right, inside), or subcategorize NNP(S) as initials or as initial/final in NP.


joinPound

public boolean joinPound
Join pound with dollar.


joinJJ

public boolean joinJJ
Join comparative and superlative adjectives with positive adjectives.


joinNounTags

public boolean joinNounTags
Join proper nouns with common nouns. This isn't to improve performance, but because Genia doesn't use proper noun tags in general.
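The three joining options (joinPound, joinJJ, joinNounTags) can be read as a simple tag rewrite. A minimal sketch, assuming the Penn Treebank tags # (pound sign), $ (dollar sign), JJ/JJR/JJS and NN/NNS/NNP/NNPS. The direction of each mapping (for example, # becoming $) and the class name are assumptions.

```java
// Hypothetical sketch of the three tag-joining options as a tag rewrite. The
// direction of each mapping (e.g. # becomes $) is an assumption.
final class JoinTagsSketch {
    private JoinTagsSketch() {}

    static String joinTag(String tag, boolean joinPound, boolean joinJJ, boolean joinNounTags) {
        if (joinPound && tag.equals("#")) {
            return "$"; // pound sign joined with dollar sign
        }
        if (joinJJ && (tag.equals("JJR") || tag.equals("JJS"))) {
            return "JJ"; // comparative/superlative joined with positive
        }
        if (joinNounTags) {
            if (tag.equals("NNP")) {
                return "NN"; // proper singular -> common singular
            }
            if (tag.equals("NNPS")) {
                return "NNS"; // proper plural -> common plural
            }
        }
        return tag;
    }
}
```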


splitPPJJ

public boolean splitPPJJ
Mainly a special test for "such" ("such as Fred"). A wash, so omit it.


splitTRJJ

public boolean splitTRJJ
Put a special tag on 'transitive adjectives' with NP complement, like 'due May 15'; it also catches 'such' in 'such as NP', which may be good. Matches 658 times in the sections 2-21 training corpus. Wash.


splitJJCOMP

public boolean splitJJCOMP
Put a special tag on 'adjectives with complements'. This acts as a general subcat feature for adjectives.


splitMoreLess

public boolean splitMoreLess
Specially mark the comparative/superlative words: less, least, more, most


unaryDT

public boolean unaryDT
Mark "Intransitive" DT. Good.


unaryRB

public boolean unaryRB
Mark "Intransitive" RB. Good.
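A minimal sketch of unaryDT and unaryRB, assuming that "intransitive" means the DT or RB is the only child of its parent phrase (as in an NP consisting of just "that"). That reading, the "-U" suffix, and the class name are hypothetical.

```java
// Hypothetical sketch of unaryDT/unaryRB. It assumes "intransitive" means the
// tag is the only child of its parent phrase (e.g. NP -> DT "that"); the "-U"
// suffix is an illustrative assumption.
final class UnaryTagSketch {
    private UnaryTagSketch() {}

    static String markIntransitive(String tag, int parentChildCount, boolean unaryDT, boolean unaryRB) {
        boolean onlyChild = parentChildCount == 1;
        if (onlyChild && ((unaryDT && tag.equals("DT")) || (unaryRB && tag.equals("RB")))) {
            return tag + "-U";
        }
        return tag;
    }
}
```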


unaryPRP

public boolean unaryPRP
Mark "Intransitive" PRP. A wash; basically a no-op.


markReflexivePRP

public boolean markReflexivePRP
Mark reflexive PRP words.


unaryIN

public boolean unaryIN
Mark "Intransitive" IN. Minutely negative.


splitCC

public int splitCC
Provide annotation of conjunctions. Gives modest gains (numbers show the F1 increase relative to goodPCFG in June 2005). A value of 1 annotates both "and" and "or" as "CC-C" (+0.29%); 2 annotates "but" and "&" separately (+0.17%); 3 annotates just "and" (equalsIgnoreCase) (+0.11%); 0 annotates nothing (+0.00%).
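A minimal sketch of levels 0, 1 and 3 as a function from the conjunction word to its tag. Level 2 is left out because the labels it gives "but" and "&" are not stated here. Case-insensitive matching at level 1 and reuse of the "CC-C" label at level 3 are assumptions, and the class name is hypothetical.

```java
// Hypothetical sketch of splitCC levels 0, 1 and 3. Level 2 is omitted because the
// annotation strings it uses for "but" and "&" are not documented here; using the
// "CC-C" label at level 3 and case-insensitive matching at level 1 are assumptions.
final class SplitCCSketch {
    private SplitCCSketch() {}

    static String annotateCC(String word, int splitCC) {
        switch (splitCC) {
            case 0:
                return "CC"; // annotate nothing
            case 1:
                // "and" and "or" share one annotated tag.
                return (word.equalsIgnoreCase("and") || word.equalsIgnoreCase("or")) ? "CC-C" : "CC";
            case 3:
                return word.equalsIgnoreCase("and") ? "CC-C" : "CC"; // just "and"
            default:
                throw new IllegalArgumentException("this sketch covers levels 0, 1 and 3 only");
        }
    }
}
```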


splitNOT

public boolean splitNOT
Annotates forms of "not" specially as tag "NOT". BAD


splitRB

public boolean splitRB
Split modifier (NP, ADJP) adverbs from others. This does nothing if you're already doing tagPA.


splitAux

public int splitAux
Make special tags for forms of BE and HAVE (and maybe DO/HELP, etc.). Values:
0 = do nothing.
1 = the basic form. Positive PCFG effect, but neutral to negative in Factored, and impossible if you use gPA.
2 = adds in "s" = "'s" and delves further to disambiguate "'s" as BE or HAVE. Theoretically good, but no practical gains.
3 = adds DO.
4 = adds HELP (which also takes a VB form complement) as DO.
5 = adds LET (which also takes a VB form complement) as DO.
6 = adds MAKE (which also takes a VB form complement) as DO.
7 = adds WATCH, SEE (which also take a VB form complement) as DO.
8 = adds come and go (which colloquially can take a VB form complement), but not their inflections, as DO.
9 = adds GET as BE.
Differences are small. You get about 0.3 F1 by doing something; the best appear to be 2 or 3 for exact sentence match and 7 or 8 for LP/LR F1.
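A minimal sketch of the basic BE/HAVE split (level 1) plus DO (level 3), treating the levels as cumulative. Level 2's handling of "'s" and the verbs added at levels 4-9 are not modeled, so level 2 behaves like 1 and levels 4-9 behave like 3 here. The word lists, the "-BE"/"-HV"/"-DO" suffixes, and the class name are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of splitAux levels 1 and 3. The word lists and suffixes are
// illustrative assumptions; "'s" (level 2) and HELP/LET/etc. (levels 4-9) are not modeled.
final class SplitAuxSketch {
    private static final Set<String> BE_FORMS = new HashSet<>(Arrays.asList(
        "be", "being", "been", "am", "is", "are", "was", "were", "'m", "'re"));
    private static final Set<String> HAVE_FORMS = new HashSet<>(Arrays.asList(
        "have", "has", "had", "having", "'ve"));
    private static final Set<String> DO_FORMS = new HashSet<>(Arrays.asList(
        "do", "does", "did"));

    private SplitAuxSketch() {}

    /** Appends an auxiliary class to a verb tag when the word is a known auxiliary form. */
    static String annotateVerb(String tag, String word, int splitAux) {
        String w = word.toLowerCase(Locale.ROOT);
        if (splitAux >= 1) {
            if (BE_FORMS.contains(w)) {
                return tag + "-BE";
            }
            if (HAVE_FORMS.contains(w)) {
                return tag + "-HV";
            }
        }
        if (splitAux >= 3 && DO_FORMS.contains(w)) {
            return tag + "-DO"; // DO is only added from level 3 upward
        }
        return tag;
    }
}
```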


vpSubCat

public boolean vpSubCat
Pitiful attempt at marking V* preterms with their surface subcat frames. Bad so far.


markDitransV

public int markDitransV
Attempt to record ditransitive verbs. The value 0 means do nothing; 1 records two or more NP or S* arguments; 2 records only two or more NP arguments (that aren't NP-TMP). 1 gave neutral to bad results.


splitVP

public int splitVP
Add (head) tags to VPs. Values:
0 = no head-subcategorization of VPs.
1 = add head tags (anything, as given by HeadFinder).
2 = add head tags, but collapse finite verb tags (VBP, VBD, VBZ, MD) together.
3 = only annotate verbal tags, and collapse finite verb tags (annotation is VBF, TO, VBG, VBN, VB, or zero).
4 = only split on categories of VBF, TO, VBG, VBN, VB, and map cases that are not headed by a verbal category to an appropriate category based on word suffix (ing, d, t, s, to) or to VB otherwise.
We usually use a value of 3; 2 or 3 is much better than 0. See also splitVPNPAgr: if it is true, its effects override any value set for this parameter.
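A minimal sketch of levels 0-3, given the POS tag of the VP's head (as a HeadFinder would supply it). Level 4's suffix-based mapping is not modeled. The "-" separator between VP and the annotation, and the class name, are assumptions.

```java
// Hypothetical sketch of splitVP levels 0-3. The "VP-" prefix format is an
// assumption; level 4's word-suffix mapping is not modeled.
final class SplitVPSketch {
    private SplitVPSketch() {}

    private static boolean isFinite(String tag) {
        return tag.equals("VBP") || tag.equals("VBD") || tag.equals("VBZ") || tag.equals("MD");
    }

    private static boolean isNonFiniteVerbal(String tag) {
        return tag.equals("TO") || tag.equals("VBG") || tag.equals("VBN") || tag.equals("VB");
    }

    /** Returns the VP label with its head annotation, or plain "VP" when nothing is added. */
    static String annotateVP(String headTag, int splitVP) {
        switch (splitVP) {
            case 0:
                return "VP"; // no head-subcategorization
            case 1:
                return "VP-" + headTag; // any head tag
            case 2:
                return "VP-" + (isFinite(headTag) ? "VBF" : headTag);
            case 3:
                if (isFinite(headTag)) {
                    return "VP-VBF";
                }
                if (isNonFiniteVerbal(headTag)) {
                    return "VP-" + headTag;
                }
                return "VP"; // "zero": non-verbal heads get no annotation
            default:
                throw new IllegalArgumentException("this sketch covers levels 0-3 only");
        }
    }
}
```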


splitVPNPAgr

public boolean splitVPNPAgr
Put enough marking on VP and NP to permit "agreement".


splitSTag

public int splitSTag
Mark S/SINV/SQ nodes according to verbal tag. Values:
0 = no subcategorization.
1 = mark with head tag.
2 = mark -VBF only if the tag is VBZ/VBD/VBP/MD.
3 = as 2, and mark -VBNF if TO/VBG/VBN/VB.
4 = as 2, but only mark S, not SINV/SQ.
5 = as 3, but only mark S, not SINV/SQ.
Previously seen as bad. Option 4 might be promising now.
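Because the six documented values are fully spelled out, they map directly onto a small function of the node category and its head tag. A minimal sketch; the "-" separator used at level 1 and the class name are assumptions.

```java
// Hypothetical sketch of splitSTag levels 0-5, following the documented values;
// the "-" separator before the head tag at level 1 is an assumption.
final class SplitSTagSketch {
    private SplitSTagSketch() {}

    private static boolean isFinite(String tag) {
        return tag.equals("VBZ") || tag.equals("VBD") || tag.equals("VBP") || tag.equals("MD");
    }

    private static boolean isNonFinite(String tag) {
        return tag.equals("TO") || tag.equals("VBG") || tag.equals("VBN") || tag.equals("VB");
    }

    static String annotateS(String category, String headTag, int splitSTag) {
        if (splitSTag < 0 || splitSTag > 5) {
            throw new IllegalArgumentException("splitSTag must be 0-5");
        }
        boolean sLike = category.equals("S") || category.equals("SINV") || category.equals("SQ");
        boolean onlyPlainS = splitSTag == 4 || splitSTag == 5;
        if (!sLike || (onlyPlainS && !category.equals("S"))) {
            return category; // not a node this level annotates
        }
        switch (splitSTag) {
            case 1:
                return category + "-" + headTag;
            case 2:
            case 4:
                return isFinite(headTag) ? category + "-VBF" : category;
            case 3:
            case 5:
                if (isFinite(headTag)) {
                    return category + "-VBF";
                }
                return isNonFinite(headTag) ? category + "-VBNF" : category;
            default:
                return category; // level 0: no subcategorization
        }
    }
}
```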


markContainedVP

public boolean markContainedVP

splitNPPRP

public boolean splitNPPRP

dominatesV

public int dominatesV
Verbal distance -- mark whether symbol dominates a verb (V*, MD). Very good.
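The core of this feature is a recursive "does this subtree contain a verb tag?" test. A minimal sketch with a hypothetical tree type; it models only the yes/no test described here (the field is an int, and any other values it accepts are not documented on this page). The "-v" suffix is an assumption.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the dominatesV test on a minimal tree type.
// The "-v" suffix is an illustrative assumption.
final class DominatesVSketch {
    private DominatesVSketch() {}

    /** Minimal tree node: a label plus children (no children for words). */
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }

        boolean isLeaf() {
            return children.isEmpty();
        }

        boolean isPreterminal() {
            return children.size() == 1 && children.get(0).isLeaf();
        }
    }

    /** True if the node is, or dominates, a preterminal tagged V* or MD. */
    static boolean dominatesVerb(Node node) {
        if (node.isPreterminal()) {
            return node.label.startsWith("V") || node.label.equals("MD");
        }
        for (Node child : node.children) {
            if (dominatesVerb(child)) {
                return true;
            }
        }
        return false;
    }

    static String annotate(Node phrase) {
        return dominatesVerb(phrase) ? phrase.label + "-v" : phrase.label;
    }
}
```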


dominatesI

public boolean dominatesI
Verbal distance -- mark whether symbol dominates a preposition (IN)


dominatesC

public boolean dominatesC
Verbal distance -- mark whether symbol dominates a conjunction (CC)


markCC

public int markCC
Mark phrases which are conjunctions. Values:
0 = no marking.
1 = any phrase with a CC daughter that isn't first or last. Possibly marginally positive.
2 = as 1, but also a non-marginal CONJP daughter. In principle good, but no gains.
3 = more like Charniak: an NP or VP with two or more NP/VP children, a comma, CC or CONJP, and nothing else. Not yet implemented; it needs to annotate _before_ annotating children!
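A minimal sketch of levels 0-2 working on the list of daughter labels, where "non-marginal" means neither the first nor the last daughter. It assumes that value 2 extends value 1 by also accepting a CONJP daughter. The "-CC" suffix and the class name are hypothetical.

```java
import java.util.List;

// Hypothetical sketch of markCC levels 0-2. It assumes level 2 extends level 1
// with CONJP daughters; the "-CC" suffix is an illustrative assumption.
final class MarkCCSketch {
    private MarkCCSketch() {}

    /** True if a daughter with this label occurs somewhere other than first or last. */
    static boolean hasNonMarginal(List<String> children, String label) {
        for (int i = 1; i < children.size() - 1; i++) {
            if (children.get(i).equals(label)) {
                return true;
            }
        }
        return false;
    }

    static String annotate(String category, List<String> children, int markCC) {
        boolean mark;
        switch (markCC) {
            case 0:
                mark = false;
                break;
            case 1:
                mark = hasNonMarginal(children, "CC");
                break;
            case 2:
                mark = hasNonMarginal(children, "CC") || hasNonMarginal(children, "CONJP");
                break;
            default:
                throw new IllegalArgumentException("this sketch covers levels 0-2 only");
        }
        return mark ? category + "-CC" : category;
    }
}
```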


splitSGapped

public int splitSGapped
Specially mark S nodes with a "gapped" subject (control, raising). 1 is the basic version. 2 is a better version of marking S nodes with a "gapped" subject. 3 seems best on a small training set, but all of these are too similar; 4 can't be differentiated. 5 is done on the tree before empty splitting. (Bad!?)


splitNumNP

public boolean splitNumNP
Mark "numeric NPs". Probably bad?


splitPoss

public int splitPoss
Give a special tag to NPs which are possessive NPs (end in 's). A value of 0 means do nothing; 1 means tag possessive NPs with "-P"; 2 means restructure possessive NPs so that they introduce a POSSP node that takes as children the POS and a regularly structured NP, i.e., recover standard good linguistic practice circa 1985. This seems a good idea, but is almost a no-op (modulo fine points of markovization), since the previous NP-P phrase already uniquely captured what is now a POSSP.
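A minimal sketch of both non-zero values on a single node: relabeling a possessive NP as NP-P (value 1), or replacing it with a POSSP whose children are a new NP over everything before the POS, followed by the POS (value 2). The tree type and method names are hypothetical, and unlike a real annotator this sketch does not recurse through the tree.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of splitPoss on one node of a minimal tree type.
final class SplitPossSketch {
    private SplitPossSketch() {}

    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, List<Node> children) {
            this.label = label;
            this.children = children;
        }

        static Node of(String label, Node... kids) {
            return new Node(label, new ArrayList<>(Arrays.asList(kids)));
        }

        /** Penn Treebank-style bracketing, e.g. (NP (NNP John) (POS 's)). */
        @Override
        public String toString() {
            if (children.isEmpty()) {
                return label;
            }
            StringBuilder sb = new StringBuilder("(").append(label);
            for (Node c : children) {
                sb.append(' ').append(c);
            }
            return sb.append(')').toString();
        }
    }

    /** An NP whose last daughter is a POS preterminal. */
    static boolean isPossessiveNP(Node n) {
        return n.label.equals("NP") && !n.children.isEmpty()
            && n.children.get(n.children.size() - 1).label.equals("POS");
    }

    static Node transform(Node n, int splitPoss) {
        if (splitPoss < 0 || splitPoss > 2) {
            throw new IllegalArgumentException("splitPoss must be 0-2");
        }
        if (splitPoss == 0 || !isPossessiveNP(n)) {
            return n;
        }
        if (splitPoss == 1) {
            return new Node("NP-P", n.children); // tag possessive NPs with "-P"
        }
        // splitPoss == 2: POSSP -> NP(everything before POS) POS
        int last = n.children.size() - 1;
        List<Node> kids = new ArrayList<>();
        kids.add(new Node("NP", new ArrayList<>(n.children.subList(0, last))));
        kids.add(n.children.get(last));
        return new Node("POSSP", kids);
    }
}
```

For example, (NP (NNP John) (POS 's)) becomes (POSSP (NP (NNP John)) (POS 's)) at value 2.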


splitBaseNP

public int splitBaseNP
Mark base NPs. A value of 0 = no marking, 1 = marking baseNP (ones which rewrite just as preterminals), and 2 = doing Collins-style marking, where an extra NP node is inserted above a baseNP, if it isn't already in an NP over NP construction, as in Collins 1999. This option shouldn't really be in EnglishTrain since it's needed at parsing time. But we don't currently use it.... A value of 1 is good.
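A minimal sketch of value 1, where a base NP is an NP all of whose daughters are preterminals. The Collins-style insertion of value 2 is not modeled (any value other than 1 leaves the label unchanged here). The tree type, the "-B" suffix, and the class name are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of splitBaseNP = 1 on a minimal tree type.
// The "-B" suffix is an illustrative assumption.
final class SplitBaseNPSketch {
    private SplitBaseNPSketch() {}

    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }

        boolean isLeaf() {
            return children.isEmpty();
        }

        boolean isPreterminal() {
            return children.size() == 1 && children.get(0).isLeaf();
        }
    }

    /** An NP that rewrites only as preterminals. */
    static boolean isBaseNP(Node n) {
        if (!n.label.equals("NP") || n.children.isEmpty()) {
            return false;
        }
        for (Node c : n.children) {
            if (!c.isPreterminal()) {
                return false;
            }
        }
        return true;
    }

    static String annotate(Node n, int splitBaseNP) {
        return (splitBaseNP == 1 && isBaseNP(n)) ? n.label + "-B" : n.label;
    }
}
```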


splitTMP

public int splitTMP
Retain NP-TMP (or maybe PP-TMP) annotation. Good. The values for this parameter are defined in NPTmpRetainingTreeNormalizer.


splitSbar

public int splitSbar
Split SBAR nodes. 1 = mark 'in order to' purpose clauses; this is actually a small and inconsistent part of what is marked SBAR-PRP in the treebank, which is mainly 'because' reason clauses. 2 = mark all infinitive SBAR. 3 = do 1 and 2. A value of 1 seems minutely positive; 2 and 3 seem negative. The marking also picks up 'in case Sfin', 'In order to', and on one occasion 'in order that'.


splitNPADV

public int splitNPADV
Retain NP-ADV annotation. 0 means strip the "-ADV" annotation. 1 means retain it and percolate it down to a head tag, provided it can do so through a path of only NP nodes.


splitNPNNP

public int splitNPNNP
Mark NP-NNP. 0 is nothing; 1 is only NNP head, 2 is NNP and NNPS head; 3 is NNP or NNPS anywhere in local NP. All bad!


correctTags

public boolean correctTags
'Correct' tags to produce verbs in VPs, etc. where possible


rightPhrasal

public boolean rightPhrasal
Right edge has a phrasal node. Bad?


sisterSplitLevel

public int sisterSplitLevel
Set the support * KL cutoff level (1-4) for sister splitting; as far as we can tell so far, don't use it.


gpaRootVP

public boolean gpaRootVP
Grand-parent annotate (root mark) VP below ROOT. Seems negative.


makePPTOintoIN

public int makePPTOintoIN
Change TO inside PP to IN.

Method Detail

display

public void display()


Stanford NLP Group