public class ChineseTreebankParserParams extends AbstractTreebankParserParams
AbstractTreebankParserParams.AnnotatePunctuationFunction, AbstractTreebankParserParams.RemoveGFSubcategoryStripper, AbstractTreebankParserParams.SubcategoryStripper
Modifier and Type | Field and Description |
---|---|
boolean |
bikelHeadFinder |
boolean |
charTags |
boolean |
chineseSelectiveTagPA |
boolean |
chineseSplitDouHao
Chinese: Split the dou hao (a punctuation mark separating
members of a list) from other punctuation.
|
boolean |
chineseSplitPunct
Chinese: split Chinese punctuation several ways, along the lines
of English punctuation plus another category for the dou hao.
|
boolean |
chineseSplitPunctLR
Chinese: split left/right paren and quote punctuation (only if
chineseSplitPunct is also true).
|
int |
chineseSplitVP
Chinese VP splitting.
|
boolean |
chineseVerySelectiveTagPA |
static boolean |
DEFAULT_USE_GOOD_TURNING_UNKNOWN_WORD_MODEL
Parameters specific for creating a ChineseLexicon
|
boolean |
discardFrags |
boolean |
dominatesV
Verbal distance -- mark whether symbol dominates a verb (V*).
|
boolean |
gpaAD
Grandparent annotate all AD.
|
double |
lengthPenalty
Parameters for a ChineseCharacterBasedLexicon
|
boolean |
markADgrandchildOfIP
Chinese: mark ADs that are grandchild of IP.
|
boolean |
markCC
Mark phrases which are conjunctions.
|
boolean |
markIPadjsubj |
boolean |
markIPconj
Chinese: mark IPs that are conjuncts.
|
boolean |
markIPsisDEC
Chinese: mark IPs that are part of prenominal modifiers.
|
boolean |
markIPsisterBA
Chinese: mark IPs that are sister of BA.
|
boolean |
markIPsisterVVorP
Chinese: mark IPs that are sister of VV or P.
|
boolean |
markModifiedNP
Chinese: mark left-modified NPs (rightmost NPs with a left-side
mod).
|
boolean |
markMultiNtag
Chinese: mark nominal tags that are part of multi-nominal
rewrites.
|
boolean |
markNPconj
Chinese: mark NPs that are conjuncts.
|
boolean |
markNPmodNP
Chinese: mark NP modifiers of NPs.
|
boolean |
markPostverbalP
Chinese: mark P with a left aunt VV, and PP with a left sister
VV.
|
boolean |
markPostverbalPP |
boolean |
markPsisterIP
Chinese: mark Ps that are sister of IP.
|
boolean |
markVPadjunct
Chinese: mark phrases that are adjuncts of VP (these tend to be
locatives/temporals, and have a specific distribution).
|
boolean |
markVVsisterIP
Chinese: mark VVs that are sister of IP (communication &
small-clause-taking verbs).
|
boolean |
mergeNNVV
Chinese: merge NN and VV.
|
boolean |
paRootDtr
Chinese: parent annotate daughter of root.
|
int |
penaltyType
penaltyType should be set as follows:
0: no length penalty
1: quadratic length penalty
2: penalty for continuation chars only
TODO: make this an enum
|
boolean |
segment |
java.lang.String |
segmenterClass |
boolean |
segmentMarkov |
boolean |
splitBaseNP
Mark base NPs.
|
boolean |
splitNPTMP
Whether to retain the -TMP functional tag on various phrasal
categories.
|
boolean |
splitPPTMP |
boolean |
splitXPTMP |
boolean |
sunJurafskyHeadFinder |
boolean |
tagWordSize
Annotate tags for number of characters contained.
|
boolean |
unaryCP |
boolean |
unaryIP
Chinese: unary category marking
|
boolean |
useCharacterBasedLexicon |
boolean |
useCharBasedUnknownWordModel |
boolean |
useGoodTuringUnknownWordModel |
boolean |
useMaxentDepGrammar |
boolean |
useMaxentLexicon |
boolean |
useSimilarWordMap |
boolean |
useUnknownCharacterModel |
evalGF, generateOriginalDependencies, inputEncoding, outputEncoding, tlp
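To make the annotation flags in the field summary more concrete, here is a minimal, self-contained sketch of what a flag like dominatesV describes: checking whether a phrasal node dominates a verb tag (V*) and marking its category accordingly. The SketchNode and DominatesVSketch classes, the "-v" suffix, and the romanized example words are all assumptions made for illustration; the real parser operates on its own Tree class and may mark nodes differently.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical minimal tree node; NOT the parser's own Tree class. */
final class SketchNode {
    final String label;
    final List<SketchNode> children;

    SketchNode(String label, SketchNode... children) {
        this.label = label;
        this.children = Arrays.asList(children);
    }

    /** A leaf is a word: it has no children. */
    boolean isLeaf() {
        return children.isEmpty();
    }

    /** A preterminal is a tag node whose only child is a word. */
    boolean isPreterminal() {
        return children.size() == 1 && children.get(0).isLeaf();
    }
}

final class DominatesVSketch {

    /** True if n is, or dominates, a preterminal whose tag starts with "V". */
    static boolean dominatesVerb(SketchNode n) {
        if (n.isLeaf()) {
            return false;
        }
        if (n.isPreterminal()) {
            return n.label.startsWith("V");
        }
        for (SketchNode child : n.children) {
            if (dominatesVerb(child)) {
                return true;
            }
        }
        return false;
    }

    /**
     * Returns the node's category, with an assumed "-v" suffix appended
     * when a phrasal node dominates a verb. Words and tags are unchanged.
     */
    static String annotate(SketchNode n) {
        boolean phrasal = !n.isLeaf() && !n.isPreterminal();
        return (phrasal && dominatesVerb(n)) ? n.label + "-v" : n.label;
    }

    public static void main(String[] args) {
        // (IP (NP (NN tamen)) (VP (VV lai))), with romanized placeholder words
        SketchNode np = new SketchNode("NP", new SketchNode("NN", new SketchNode("tamen")));
        SketchNode vp = new SketchNode("VP", new SketchNode("VV", new SketchNode("lai")));
        SketchNode ip = new SketchNode("IP", np, vp);
        System.out.println(annotate(ip));
        System.out.println(annotate(np));
        System.out.println(annotate(vp));
    }
}
```

The other boolean mark* and split* fields can be read the same way: each one switches a particular structural test on or off, and a node that passes the test gets a finer-grained category.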
Constructor and Description |
---|
ChineseTreebankParserParams() |
Modifier and Type | Method and Description |
---|---|
AbstractCollinizer |
collinizer()
Returns a ChineseCollinizer
|
AbstractCollinizer |
collinizerEvalb()
Returns a ChineseCollinizer that doesn't delete punctuation
|
java.util.ArrayList<Word> |
defaultTestSentence()
Return a default sentence for the language (for testing)
|
Extractor<DependencyGrammar> |
dependencyGrammarExtractor(Options op,
Index<java.lang.String> wordIndex,
Index<java.lang.String> tagIndex) |
DiskTreebank |
diskTreebank()
Uses a DiskTreebank with a CHTBTokenizer and a
BobChrisTreeNormalizer.
|
void |
display()
Display (write to stderr) language-specific settings.
|
boolean |
generateOriginalDependencies()
Whether to generate original Stanford Dependencies or the newer
Universal Dependencies.
|
GrammaticalStructure |
getGrammaticalStructure(Tree t,
java.util.function.Predicate<java.lang.String> filter,
HeadFinder hf)
Build a GrammaticalStructure from a Tree.
|
HeadFinder |
headFinder()
Returns a ChineseHeadFinder
|
Lexicon |
lex(Options op,
Index<java.lang.String> wordIndex,
Index<java.lang.String> tagIndex)
Returns a ChineseLexicon
|
static void |
main(java.lang.String[] args)
For testing: loads a treebank and prints the trees.
|
MemoryTreebank |
memoryTreebank()
Uses a MemoryTreebank with a CHTBTokenizer and a
BobChrisTreeNormalizer
|
double[] |
MLEDependencyGrammarSmoothingParams()
Give the parameters for smoothing in the MLEDependencyGrammar.
|
java.util.List<GrammaticalStructure> |
readGrammaticalStructureFromFile(java.lang.String filename)
Reads the given file and turns its content into a list of
GrammaticalStructures.
|
int |
setOptionFlag(java.lang.String[] args,
int i)
Set language-specific options according to flags.
|
java.lang.String[] |
sisterSplitters()
Returns the splitting strings used for selective splits.
|
boolean |
supportsBasicDependencies()
By default, parsers are assumed to not support dependencies.
|
Tree |
transformTree(Tree t,
Tree root)
transformTree does all language-specific tree
transformations.
|
TreeReaderFactory |
treeReaderFactory()
Returns a factory for reading in trees from the source you want.
|
HeadFinder |
typedDependencyHeadFinder()
The HeadFinder to use when extracting typed dependencies.
|
defaultCoreNLPFlags, getInputEncoding, getOutputEncoding, isEvalGF, ppAttachmentEval, processHeadWord, pw, pw, setEvalGF, setEvaluateGrammaticalFunctions, setGenerateOriginalDependencies, setInputEncoding, setOutputEncoding, subcategoryStripper, testMemoryTreebank, treebank, treebankLanguagePack, treeTokenizerFactory
public boolean charTags
public boolean useCharacterBasedLexicon
public boolean useMaxentLexicon
public boolean useMaxentDepGrammar
public boolean segment
public boolean segmentMarkov
public boolean sunJurafskyHeadFinder
public boolean bikelHeadFinder
public boolean discardFrags
public boolean useSimilarWordMap
public java.lang.String segmenterClass
public boolean chineseSplitDouHao
public boolean chineseSplitPunct
public boolean chineseSplitPunctLR
public boolean markVVsisterIP
public boolean markPsisterIP
public boolean markIPsisterVVorP
public boolean markADgrandchildOfIP
public boolean gpaAD
public boolean chineseVerySelectiveTagPA
public boolean chineseSelectiveTagPA
public boolean markIPsisterBA
public boolean markVPadjunct
public boolean markNPmodNP
public boolean markModifiedNP
public boolean markNPconj
public boolean markMultiNtag
public boolean markIPsisDEC
public boolean markIPconj
public boolean markIPadjsubj
public int chineseSplitVP
public boolean mergeNNVV
public boolean unaryIP
public boolean unaryCP
public boolean paRootDtr
public boolean markPostverbalP
public boolean markPostverbalPP
public boolean splitBaseNP
public boolean tagWordSize
public boolean markCC
public boolean splitNPTMP
public boolean splitPPTMP
public boolean splitXPTMP
public boolean dominatesV
public static final boolean DEFAULT_USE_GOOD_TURNING_UNKNOWN_WORD_MODEL
public boolean useGoodTuringUnknownWordModel
public boolean useCharBasedUnknownWordModel
public double lengthPenalty
public boolean useUnknownCharacterModel
public int penaltyType
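The penaltyType field is an int with three documented codes (0, 1, 2), and its description carries a TODO to turn it into an enum. The sketch below shows one way such an enum could look. PenaltyType, its constant names, and fromCode are hypothetical and are not part of this class; only the three codes and their meanings come from the field description.

```java
/** Hypothetical enum mirroring the three documented penaltyType codes. */
enum PenaltyType {
    NONE(0),               // 0: no length penalty
    QUADRATIC(1),          // 1: quadratic length penalty
    CONTINUATION_ONLY(2);  // 2: penalty for continuation chars only

    private final int code;

    PenaltyType(int code) {
        this.code = code;
    }

    /** The legacy int value, as stored in the penaltyType field. */
    int code() {
        return code;
    }

    /** Maps a legacy int code to its enum constant, rejecting unknown codes. */
    static PenaltyType fromCode(int code) {
        for (PenaltyType p : values()) {
            if (p.code == code) {
                return p;
            }
        }
        throw new IllegalArgumentException("Unknown penaltyType: " + code);
    }
}

final class PenaltyTypeDemo {
    public static void main(String[] args) {
        for (int code = 0; code <= 2; code++) {
            System.out.println(code + " -> " + PenaltyType.fromCode(code));
        }
    }
}
```

Keeping an explicit code on each constant, rather than relying on ordinal(), lets the int values stay stable if constants are later reordered.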
public HeadFinder headFinder()
headFinder
in interface TreebankLangParserParams
headFinder
in class AbstractTreebankParserParams
public HeadFinder typedDependencyHeadFinder()
AbstractTreebankParserParams
typedDependencyHeadFinder
in interface TreebankLangParserParams
typedDependencyHeadFinder
in class AbstractTreebankParserParams
public Lexicon lex(Options op, Index<java.lang.String> wordIndex, Index<java.lang.String> tagIndex)
lex
in interface TreebankLangParserParams
lex
in class AbstractTreebankParserParams
op
- Options as to how the Lexicon behaves
public double[] MLEDependencyGrammarSmoothingParams()
AbstractTreebankParserParams
MLEDependencyGrammarSmoothingParams
in interface TreebankLangParserParams
MLEDependencyGrammarSmoothingParams
in class AbstractTreebankParserParams
public TreeReaderFactory treeReaderFactory()
AbstractTreebankParserParams
treeReaderFactory
in interface TreebankLangParserParams
treeReaderFactory
in class AbstractTreebankParserParams
public DiskTreebank diskTreebank()
diskTreebank
in interface TreebankLangParserParams
diskTreebank
in class AbstractTreebankParserParams
public MemoryTreebank memoryTreebank()
memoryTreebank
in interface TreebankLangParserParams
memoryTreebank
in class AbstractTreebankParserParams
public AbstractCollinizer collinizer()
collinizer
in interface TreebankLangParserParams
collinizer
in class AbstractTreebankParserParams
public AbstractCollinizer collinizerEvalb()
collinizerEvalb
in interface TreebankLangParserParams
collinizerEvalb
in class AbstractTreebankParserParams
public java.lang.String[] sisterSplitters()
AbstractTreebankParserParams
sisterSplitters
in interface TreebankLangParserParams
sisterSplitters
in class AbstractTreebankParserParams
public Tree transformTree(Tree t, Tree root)
transformTree
in interface TreebankLangParserParams
transformTree
in class AbstractTreebankParserParams
t
- The input tree (with non-language specific annotation already
done, so you need to strip back to basic categories)
root
- The root of the current tree (can be null for words)
public void display()
AbstractTreebankParserParams
display
in interface TreebankLangParserParams
display
in class AbstractTreebankParserParams
public int setOptionFlag(java.lang.String[] args, int i)
setOptionFlag
in interface TreebankLangParserParams
setOptionFlag
in class AbstractTreebankParserParams
args
- Array of command line arguments
i
- Index in command line arguments to try to process as an option
public Extractor<DependencyGrammar> dependencyGrammarExtractor(Options op, Index<java.lang.String> wordIndex, Index<java.lang.String> tagIndex)
dependencyGrammarExtractor
in interface TreebankLangParserParams
dependencyGrammarExtractor
in class AbstractTreebankParserParams
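The setOptionFlag(String[] args, int i) signature takes the whole argument array plus an index and returns an int. The sketch below illustrates one plausible reading of that contract: the method returns the index just past whatever it consumed, or returns i unchanged when it does not recognize the option. This page does not state that return convention, so it is an assumption here. OptionFlagSketch is a hypothetical stand-in for the real class, and the flag spellings and argument shapes are modeled on the field names rather than taken from the source.

```java
/** Hypothetical stand-in; NOT the real ChineseTreebankParserParams. */
final class OptionFlagSketch {
    boolean chineseSplitDouHao = false;
    int chineseSplitVP = 0;

    /**
     * Tries to process the option starting at args[i].
     * Assumed convention: returns the index just past the arguments
     * consumed, or i unchanged if the option is not recognized.
     */
    int setOptionFlag(String[] args, int i) {
        if (args[i].equalsIgnoreCase("-chineseSplitDouHao")) {
            chineseSplitDouHao = true;   // assumed: boolean flag, no value
            return i + 1;
        }
        if (args[i].equalsIgnoreCase("-chineseSplitVP") && i + 1 < args.length) {
            chineseSplitVP = Integer.parseInt(args[i + 1]);  // assumed: one int value
            return i + 2;
        }
        return i;  // nothing consumed
    }

    /** Driver loop: applies every option and counts the unrecognized ones. */
    static int applyAll(OptionFlagSketch params, String[] args) {
        int unrecognized = 0;
        int i = 0;
        while (i < args.length) {
            int next = params.setOptionFlag(args, i);
            if (next == i) {
                unrecognized++;  // skip it so the loop always advances
                i++;
            } else {
                i = next;
            }
        }
        return unrecognized;
    }

    public static void main(String[] argv) {
        OptionFlagSketch params = new OptionFlagSketch();
        String[] options = {"-chineseSplitDouHao", "-chineseSplitVP", "3", "-bogus"};
        int unrecognized = applyAll(params, options);
        System.out.println("chineseSplitDouHao = " + params.chineseSplitDouHao);
        System.out.println("chineseSplitVP = " + params.chineseSplitVP);
        System.out.println("unrecognized options = " + unrecognized);
    }
}
```

Returning an index rather than a boolean lets a single call consume an option together with its value, and lets the caller detect an unhandled option by comparing the result with i.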
public java.util.ArrayList<Word> defaultTestSentence()
defaultTestSentence
in interface TreebankLangParserParams
defaultTestSentence
in class AbstractTreebankParserParams
public java.util.List<GrammaticalStructure> readGrammaticalStructureFromFile(java.lang.String filename)
TreebankLangParserParams
readGrammaticalStructureFromFile
in interface TreebankLangParserParams
readGrammaticalStructureFromFile
in class AbstractTreebankParserParams
public GrammaticalStructure getGrammaticalStructure(Tree t, java.util.function.Predicate<java.lang.String> filter, HeadFinder hf)
TreebankLangParserParams
getGrammaticalStructure
in interface TreebankLangParserParams
getGrammaticalStructure
in class AbstractTreebankParserParams
public boolean supportsBasicDependencies()
AbstractTreebankParserParams
supportsBasicDependencies
in interface TreebankLangParserParams
supportsBasicDependencies
in class AbstractTreebankParserParams
public boolean generateOriginalDependencies()
TreebankLangParserParams
generateOriginalDependencies
in interface TreebankLangParserParams
generateOriginalDependencies
in class AbstractTreebankParserParams
public static void main(java.lang.String[] args)