edu.stanford.nlp.parser.lexparser
Class ChineseTreebankParserParams

java.lang.Object
  extended by edu.stanford.nlp.parser.lexparser.AbstractTreebankParserParams
      extended by edu.stanford.nlp.parser.lexparser.ChineseTreebankParserParams
All Implemented Interfaces:
Serializable, TreebankLangParserParams

public class ChineseTreebankParserParams
extends AbstractTreebankParserParams
implements TreebankLangParserParams

Parameter file for parsing the Penn Chinese Treebank. Includes category enrichments specific to the Penn Chinese Treebank.

Author:
Roger Levy
See Also:
Serialized Form

Field Summary
static boolean chineseSelectiveTagPA
           
static boolean chineseSplitDouHao
          Chinese: Split the dou hao (a punctuation mark separating members of a list) from other punctuation.
static boolean chineseSplitPunct
          Chinese: split Chinese punctuation several ways, along the lines of English punctuation plus another category for the dou hao.
static boolean chineseSplitPunctLR
          Chinese: split left and right parentheses/quotes (if chineseSplitPunct is also true).
static boolean chineseSplitVP3
          Chinese: split VPs into VP-COMP, VP-CRD, VP-ADJ.
static boolean chineseVerySelectiveTagPA
           
static boolean gpaAD
          Grandparent annotate all AD.
static boolean markADgrandchildOfIP
          Chinese: mark ADs that are grandchild of IP.
static boolean markIPadjsubj
           
static boolean markIPconj
          Chinese: mark IPs that are conjuncts.
static boolean markIPsisDEC
          Chinese: mark IPs that are part of prenominal modifiers.
static boolean markIPsisterBA
          Chinese: mark IPs that are sister of BA.
static boolean markIPsisterVVorP
          Chinese: mark IPs that are sister of VV or P.
static boolean markModifiedNP
          Chinese: mark left-modified NPs (rightmost NPs with a left-side mod).
static boolean markMultiNtag
          Chinese: mark nominal tags that are part of multi-nominal rewrites.
static boolean markNPconj
          Chinese: mark NPs that are conjuncts.
static boolean markNPmodNP
          Chinese: mark NP modifiers of NPs.
static boolean markPostverbalP
          Chinese: mark P with a left aunt VV, and PP with a left sister VV.
static boolean markPostverbalPP
           
static boolean markPsisterIP
          Chinese: mark Ps that are sister of IP.
static boolean markVPadjunct
          Chinese: mark phrases that are adjuncts of VP (these tend to be locatives/temporals, and have a specific distribution).
static boolean markVVsisterIP
          Chinese: mark VVs that are sister of IP (communication & small-clause-taking verbs).
static boolean mergeNNVV
          Chinese: merge NN and VV.
static boolean paRootDtr
          Chinese: parent annotate daughter of root.
static int selectiveSplitLevel
          How selectively to split.
static boolean splitBaseNP
          Mark base NPs.
static boolean tagWordSize
          Annotate tags for number of characters contained.
static boolean unaryCP
           
static boolean unaryIP
          Chinese: unary category marking
 
Fields inherited from class edu.stanford.nlp.parser.lexparser.AbstractTreebankParserParams
inputEncoding, outputEncoding, tlp
 
Constructor Summary
ChineseTreebankParserParams()
           
 
Method Summary
 TreeTransformer collinizer()
          Returns a ChineseCollinizer
 TreeTransformer collinizerEvalb()
          Returns a ChineseCollinizer that doesn't delete punctuation
 void display()
          Display language-specific settings.
 HeadFinder headFinder()
          Returns a ChineseHeadFinder
 Lexicon lex()
          Returns a ChineseLexicon
static void main(String[] args)
          Testing: loads a treebank and prints the first tree.
 MemoryTreebank memoryTreebank()
          Uses a memoryTreebank with a CHTBTokenizer and a BobChrisTreeNormalizer
 int setOptionFlag(String[] args, int i)
          Set language-specific options according to flags.
 String[] sisterSplitters()
          Returns the splitting strings used for selective splits.
 String[] splitters()
          Returns the splitting strings used for selective splits.
 edu.stanford.nlp.parser.lexparser.TreeHeadPair transformTree(Tree t, Tree root, edu.stanford.nlp.parser.lexparser.TreeHeadPair thp)
          transformTree does all language-specific tree transformations.
 
Methods inherited from class edu.stanford.nlp.parser.lexparser.AbstractTreebankParserParams
pw, pw, setInputEncoding, setOutputEncoding, testMemoryTreebank, treebankLanguagePack
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface edu.stanford.nlp.parser.lexparser.TreebankLangParserParams
pw, pw, setInputEncoding, setOutputEncoding, testMemoryTreebank, treebankLanguagePack
 

Field Detail

chineseSplitDouHao

public static boolean chineseSplitDouHao
Chinese: Split the dou hao (a punctuation mark separating members of a list) from other punctuation. Good, but also included in chineseSplitPunct.


chineseSplitPunct

public static boolean chineseSplitPunct
Chinese: split Chinese punctuation several ways, along the lines of English punctuation plus another category for the dou hao. Good.


chineseSplitPunctLR

public static boolean chineseSplitPunctLR
Chinese: split left and right parentheses/quotes (if chineseSplitPunct is also true). Only very marginal gains, but seems positive.
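
The three punctuation flags above can be pictured as successively finer refinements of the Penn Chinese Treebank punctuation tag PU. The following is a minimal sketch, not the parser's implementation: the tag suffixes (-DOU, -COMMA, -STOP, -PAREN, -LEFT, -RIGHT, -OTHER), the method name splitTag and the character sets are all hypothetical, and the dou hao is assumed to be the enumeration comma 、.

```java
import java.util.Set;

/** Hypothetical sketch of punctuation tag splitting; not the parser's actual code. */
final class PunctSplitSketch {
    // Assumed character sets; the parser's real sets and suffixes may differ.
    private static final Set<String> DOU_HAO = Set.of("、");
    private static final Set<String> COMMAS = Set.of("，", ",");
    private static final Set<String> STOPS = Set.of("。", "！", "？");
    private static final Set<String> LEFT = Set.of("（", "“", "《");
    private static final Set<String> RIGHT = Set.of("）", "”", "》");

    /** Returns a refined tag for a PU token; all other tags pass through unchanged. */
    static String splitTag(String tag, String word,
                           boolean splitDouHao, boolean splitPunct, boolean splitPunctLR) {
        if (!tag.equals("PU")) {
            return tag;
        }
        // The dou hao category is available on its own or as part of chineseSplitPunct.
        if ((splitDouHao || splitPunct) && DOU_HAO.contains(word)) {
            return tag + "-DOU";
        }
        if (!splitPunct) {
            return tag;
        }
        if (COMMAS.contains(word)) {
            return tag + "-COMMA";
        }
        if (STOPS.contains(word)) {
            return tag + "-STOP";
        }
        boolean left = LEFT.contains(word);
        if (left || RIGHT.contains(word)) {
            // The left/right distinction applies only when chineseSplitPunctLR is also on.
            if (splitPunctLR) {
                return tag + (left ? "-LEFT" : "-RIGHT");
            }
            return tag + "-PAREN";
        }
        return tag + "-OTHER";
    }
}
```

Note that splitPunctLR has no effect unless splitPunct is set, mirroring the dependency stated for chineseSplitPunctLR.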


markVVsisterIP

public static boolean markVVsisterIP
Chinese: mark VVs that are sister of IP (communication & small-clause-taking verbs). Good: gives a 0.5% gain.
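
A minimal sketch of sister-based marking such as markVVsisterIP. It assumes a hypothetical mutable Node class in place of the parser's Tree and a made-up -IPsis suffix; it only shows the idea of checking a node's sisters before relabeling it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of sister-based marking; not the parser's actual code. */
final class VVSisterIPSketch {
    /** Minimal mutable tree node standing in for the parser's Tree class. */
    static final class Node {
        String label;
        final List<Node> kids = new ArrayList<>();

        Node(String label, Node... kids) {
            this.label = label;
            this.kids.addAll(List.of(kids));
        }
    }

    /** Appends a hypothetical "-IPsis" suffix to every VV that has an IP sister. */
    static void markVVsisterIP(Node node) {
        boolean hasIPkid = false;
        for (Node k : node.kids) {
            if (k.label.equals("IP")) {
                hasIPkid = true;
                break;
            }
        }
        for (Node k : node.kids) {
            if (hasIPkid && k.label.equals("VV")) {
                k.label = "VV-IPsis";
            }
            markVVsisterIP(k); // recurse into the whole subtree
        }
    }
}
```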


markPsisterIP

public static boolean markPsisterIP
Chinese: mark Ps that are sister of IP. Negative effect.


markIPsisterVVorP

public static boolean markIPsisterVVorP
Chinese: mark IPs that are sister of VV or P. These rarely have punctuation. Small positive effect.


markADgrandchildOfIP

public static boolean markADgrandchildOfIP
Chinese: mark ADs that are grandchild of IP.


gpaAD

public static boolean gpaAD
Grandparent annotate all AD. Seems slightly negative.
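
gpaAD and markADgrandchildOfIP both look two levels up from an AD tag. The sketch below assumes a hypothetical Node class and invents the annotation formats AD~G and AD-gcIP; the parser's own label syntax may differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of AD annotation; not the parser's actual code. */
final class ADAnnotationSketch {
    /** Minimal mutable tree node standing in for the parser's Tree class. */
    static final class Node {
        String label;
        final List<Node> kids = new ArrayList<>();

        Node(String label, Node... kids) {
            this.label = label;
            this.kids.addAll(List.of(kids));
        }
    }

    /**
     * With gpaAD, rewrites AD as "AD~G", where G is the grandparent's label.
     * Otherwise, with markADgrandchildOfIP, marks only ADs whose grandparent is IP.
     */
    static void annotateAD(Node node, Node parent, Node grandparent,
                           boolean gpaAD, boolean markADgrandchildOfIP) {
        if (node.label.equals("AD") && grandparent != null) {
            if (gpaAD) {
                node.label = "AD~" + grandparent.label;
            } else if (markADgrandchildOfIP && grandparent.label.equals("IP")) {
                node.label = "AD-gcIP";
            }
        }
        for (Node k : node.kids) {
            // The current node becomes the child's parent, the parent its grandparent.
            annotateAD(k, node, parent, gpaAD, markADgrandchildOfIP);
        }
    }
}
```

A call starts at the root with null parent and grandparent, for example annotateAD(root, null, null, true, false).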


chineseVerySelectiveTagPA

public static boolean chineseVerySelectiveTagPA

chineseSelectiveTagPA

public static boolean chineseSelectiveTagPA

markIPsisterBA

public static boolean markIPsisterBA
Chinese: mark IPs that are sister of BA. These always have overt NP. Very slightly positive.


markVPadjunct

public static boolean markVPadjunct
Chinese: mark phrases that are adjuncts of VP (these tend to be locatives/temporals, and have a specific distribution). Necessary even with chineseSplitVP3 and parent annotation because parent annotation happens with unsplit parent categories. Slightly positive.


markNPmodNP

public static boolean markNPmodNP
Chinese: mark NP modifiers of NPs. Quite positive (0.5%).


markModifiedNP

public static boolean markModifiedNP
Chinese: mark left-modified NPs (rightmost NPs with a left-side mod). Slightly positive.
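
markNPmodNP and markModifiedNP split the two halves of an NP -> NP NP rewrite. This sketch assumes that the rightmost NP child is the modified head (Chinese NPs are head-final) and uses hypothetical NP-MOD and NP-MODIFIED labels and a hypothetical Node class. It works bottom-up so that each node's own label is inspected before its parent relabels it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of NP modifier marking; not the parser's actual code. */
final class NPMarkSketch {
    /** Minimal mutable tree node standing in for the parser's Tree class. */
    static final class Node {
        String label;
        final List<Node> kids = new ArrayList<>();

        Node(String label, Node... kids) {
            this.label = label;
            this.kids.addAll(List.of(kids));
        }
    }

    static void markNPs(Node node, boolean markNPmodNP, boolean markModifiedNP) {
        // Process children first: a node is only relabeled by its parent, afterwards.
        for (Node k : node.kids) {
            markNPs(k, markNPmodNP, markModifiedNP);
        }
        int n = node.kids.size();
        if (!node.label.equals("NP") || n < 2 || !node.kids.get(n - 1).label.equals("NP")) {
            return;
        }
        // Assumed head-final: the rightmost NP has at least one left-side modifier.
        if (markModifiedNP) {
            node.kids.get(n - 1).label = "NP-MODIFIED";
        }
        if (markNPmodNP) {
            for (int j = 0; j < n - 1; j++) {
                Node k = node.kids.get(j);
                if (k.label.equals("NP")) {
                    k.label = "NP-MOD";
                }
            }
        }
    }
}
```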


markNPconj

public static boolean markNPconj
Chinese: mark NPs that are conjuncts. Negative on a small data set.


markMultiNtag

public static boolean markMultiNtag
Chinese: mark nominal tags that are part of multi-nominal rewrites. Does not seem to help.


markIPsisDEC

public static boolean markIPsisDEC
Chinese: mark IPs that are part of prenominal modifiers. Negative.


markIPconj

public static boolean markIPconj
Chinese: mark IPs that are conjuncts, or those that have adjuncts or subjects.


markIPadjsubj

public static boolean markIPadjsubj

chineseSplitVP3

public static boolean chineseSplitVP3
Chinese: split VPs into VP-COMP, VP-CRD, VP-ADJ. Negative effect.


mergeNNVV

public static boolean mergeNNVV
Chinese: merge NN and VV. A lark.


unaryIP

public static boolean unaryIP
Chinese: unary category marking


unaryCP

public static boolean unaryCP

paRootDtr

public static boolean paRootDtr
Chinese: parent annotate daughter of root. Meant only for selectivesplit=false.


markPostverbalP

public static boolean markPostverbalP
Chinese: mark P with a left aunt VV, and PP with a left sister VV. Note that it's necessary to mark both to thread the context-marking. Used to identify post-verbal Ps, which are rare.
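
A minimal sketch of the threaded marking described for markPostverbalP and markPostverbalPP: when a PP follows a VV among the same sisters, both the PP and its P child are relabeled. The Node class, the method name and the -postVV suffix are hypothetical, and the sketch treats any VV to the left as qualifying, whereas the parser may require the VV to be adjacent.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of post-verbal P/PP marking; not the parser's actual code. */
final class PostverbalSketch {
    /** Minimal mutable tree node standing in for the parser's Tree class. */
    static final class Node {
        String label;
        final List<Node> kids = new ArrayList<>();

        Node(String label, Node... kids) {
            this.label = label;
            this.kids.addAll(List.of(kids));
        }
    }

    static void markPostverbal(Node node) {
        boolean vvToLeft = false;
        for (Node k : node.kids) {
            if (vvToLeft && k.label.equals("PP")) {
                k.label = "PP-postVV";            // PP with a left sister VV
                for (Node g : k.kids) {
                    if (g.label.equals("P")) {
                        g.label = "P-postVV";     // P whose aunt VV lies to the left
                    }
                }
            }
            if (k.label.equals("VV")) {
                vvToLeft = true;
            }
            markPostverbal(k);
        }
    }
}
```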


markPostverbalPP

public static boolean markPostverbalPP

selectiveSplitLevel

public static int selectiveSplitLevel
How selectively to split.


splitBaseNP

public static boolean splitBaseNP
Mark base NPs. Good.


tagWordSize

public static boolean tagWordSize
Annotate tags for number of characters contained.
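
A minimal sketch of tagWordSize, assuming the count is appended directly to the tag as -N. The suffix format and the helper names are hypothetical, and the parser may bucket lengths rather than use the raw count.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of word-size tag annotation; not the parser's actual code. */
final class TagWordSizeSketch {
    /** Appends the word's character count to its tag, e.g. NN + 桌子 gives "NN-2". */
    static String annotate(String tag, String word) {
        // Count code points rather than UTF-16 units so rare characters count once.
        int chars = word.codePointCount(0, word.length());
        return tag + "-" + chars;
    }

    /** Annotates a whole tagged sentence; tags and words must align one-to-one. */
    static List<String> annotateAll(List<String> tags, List<String> words) {
        if (tags.size() != words.size()) {
            throw new IllegalArgumentException("tags and words differ in length");
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tags.size(); i++) {
            out.add(annotate(tags.get(i), words.get(i)));
        }
        return out;
    }
}
```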

Constructor Detail

ChineseTreebankParserParams

public ChineseTreebankParserParams()

Method Detail

headFinder

public HeadFinder headFinder()
Returns a ChineseHeadFinder

Specified by:
headFinder in interface TreebankLangParserParams
Specified by:
headFinder in class AbstractTreebankParserParams

lex

public Lexicon lex()
Returns a ChineseLexicon

Specified by:
lex in interface TreebankLangParserParams
Specified by:
lex in class AbstractTreebankParserParams

memoryTreebank

public MemoryTreebank memoryTreebank()
Uses a memoryTreebank with a CHTBTokenizer and a BobChrisTreeNormalizer

Specified by:
memoryTreebank in interface TreebankLangParserParams
Specified by:
memoryTreebank in class AbstractTreebankParserParams

collinizer

public TreeTransformer collinizer()
Returns a ChineseCollinizer

Specified by:
collinizer in interface TreebankLangParserParams
Specified by:
collinizer in class AbstractTreebankParserParams

collinizerEvalb

public TreeTransformer collinizerEvalb()
Returns a ChineseCollinizer that doesn't delete punctuation

Specified by:
collinizerEvalb in interface TreebankLangParserParams
Specified by:
collinizerEvalb in class AbstractTreebankParserParams
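
The difference between collinizer and collinizerEvalb can be illustrated with a toy normalizer that optionally removes punctuation. The sketch assumes punctuation appears as PU preterminals (the Penn Chinese Treebank punctuation tag) and uses a hypothetical Node class; the real ChineseCollinizer may perform other normalizations as well.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of punctuation-deleting normalization; not ChineseCollinizer itself. */
final class PunctDeletionSketch {
    /** Minimal mutable tree node standing in for the parser's Tree class. */
    static final class Node {
        String label;
        final List<Node> kids = new ArrayList<>();

        Node(String label, Node... kids) {
            this.label = label;
            this.kids.addAll(List.of(kids));
        }
    }

    /** Returns a copy of t; if deletePunct is set, PU preterminals are dropped. */
    static Node normalize(Node t, boolean deletePunct) {
        if (t.kids.isEmpty()) {
            return new Node(t.label);                 // a word
        }
        if (deletePunct && t.label.equals("PU")) {
            return null;                              // drop the punctuation preterminal
        }
        Node copy = new Node(t.label);
        for (Node k : t.kids) {
            Node nk = normalize(k, deletePunct);
            if (nk != null) {
                copy.kids.add(nk);
            }
        }
        return copy.kids.isEmpty() ? null : copy;     // prune phrases left empty
    }

    /** Prints a tree in Penn Treebank bracket notation. */
    static String toBracket(Node n) {
        if (n.kids.isEmpty()) {
            return n.label;
        }
        StringBuilder sb = new StringBuilder("(").append(n.label);
        for (Node k : n.kids) {
            sb.append(' ').append(toBracket(k));
        }
        return sb.append(')').toString();
    }
}
```

In this picture, collinizer corresponds to normalize(tree, true) and collinizerEvalb to normalize(tree, false).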

splitters

public String[] splitters()
Description copied from interface: TreebankLangParserParams
Returns the splitting strings used for selective splits.

Specified by:
splitters in interface TreebankLangParserParams
Specified by:
splitters in class AbstractTreebankParserParams
Returns:
An array containing ancestor-annotated Strings: categories should be split according to these ancestor annotations.
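
Selective splitting means that an annotation is kept only for the category/context pairs listed by splitters(). The sketch assumes a hypothetical label^parent string format and a helper named annotate; it is not the parser's actual annotation code.

```java
import java.util.Set;

/** Hypothetical sketch of selective parent annotation; not the parser's actual code. */
final class SelectiveSplitSketch {
    /**
     * Parent-annotates a category only when "label^parent" appears among the
     * splitters; every other category keeps its plain label.
     */
    static String annotate(String label, String parentLabel, Set<String> splitters) {
        String candidate = label + "^" + parentLabel;
        return splitters.contains(candidate) ? candidate : label;
    }
}
```

For example, with the splitters {"NP^IP", "VP^VP"}, an NP under IP becomes NP^IP while an NP under PP stays NP.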

sisterSplitters

public String[] sisterSplitters()
Description copied from interface: TreebankLangParserParams
Returns the splitting strings used for selective splits.

Specified by:
sisterSplitters in interface TreebankLangParserParams
Specified by:
sisterSplitters in class AbstractTreebankParserParams
Returns:
An array containing ancestor-annotated Strings: categories should be split according to these ancestor annotations.

transformTree

public edu.stanford.nlp.parser.lexparser.TreeHeadPair transformTree(Tree t,
                                                                    Tree root,
                                                                    edu.stanford.nlp.parser.lexparser.TreeHeadPair thp)
transformTree does all language-specific tree transformations. Any parameterizations should be inside the specific TreebankLangParserParams class.

Specified by:
transformTree in interface TreebankLangParserParams
Specified by:
transformTree in class AbstractTreebankParserParams

display

public void display()
Description copied from interface: TreebankLangParserParams
Display language-specific settings.

Specified by:
display in interface TreebankLangParserParams
Specified by:
display in class AbstractTreebankParserParams

setOptionFlag

public int setOptionFlag(String[] args,
                         int i)
Set language-specific options according to flags. This routine should process the option starting in args[i] (which might potentially be several arguments long if it takes arguments). It should return the index after the last index it consumed in processing. In particular, if it cannot process the current option, the return value should be i.

Specified by:
setOptionFlag in interface TreebankLangParserParams
Specified by:
setOptionFlag in class AbstractTreebankParserParams
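
A minimal sketch of the setOptionFlag contract: consume the option at args[i] plus any arguments it takes, return the next unread index, and return i unchanged for an unrecognized option. The flags -exampleFlag and -exampleLevel and the class OptionFlagSketch are hypothetical and are not options of ChineseTreebankParserParams.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the setOptionFlag contract; the flags are invented. */
final class OptionFlagSketch {
    static boolean exampleFlag = false;
    static int exampleLevel = 0;

    /** Returns the index after the last consumed argument, or i if nothing was consumed. */
    static int setOptionFlag(String[] args, int i) {
        if (args[i].equalsIgnoreCase("-exampleFlag")) {
            exampleFlag = true;                       // a flag with no arguments
            i += 1;
        } else if (args[i].equalsIgnoreCase("-exampleLevel") && i + 1 < args.length) {
            exampleLevel = Integer.parseInt(args[i + 1]);
            i += 2;                                   // flag plus one argument
        }
        return i;
    }

    /** A caller loop that collects options this handler does not recognize. */
    static List<String> parseAll(String[] args) {
        List<String> unknown = new ArrayList<>();
        int i = 0;
        while (i < args.length) {
            int next = setOptionFlag(args, i);
            if (next == i) {
                unknown.add(args[i]);                 // not ours: skip it
                next = i + 1;
            }
            i = next;
        }
        return unknown;
    }
}
```

Because an unrecognized option returns i, a caller can detect it with next == i and hand it to another handler instead of looping forever.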

main

public static void main(String[] args)
Testing: loads a treebank and prints the first tree.



Stanford NLP Group