TreebankLangParserParams (Stanford JavaNLP API)

All Superinterfaces:

java.io.Serializable, TreebankFactory

All Known Implementing Classes:

AbstractTreebankParserParams, ArabicTreebankParserParams, ChineseTreebankParserParams, EnglishTreebankParserParams, FrenchTreebankParserParams, GenericTreebankParserParams, HebrewTreebankParserParams, HungarianTreebankParserParams, ItalianTreebankParserParams, NegraPennTreebankParserParams, SpanishTreebankParserParams, TregexPoweredTreebankParserParams, TueBaDZParserParams
```
public interface TreebankLangParserParams
extends TreebankFactory, java.io.Serializable
```
Contains language-specific methods commonly necessary to get a parser to parse an arbitrary treebank.

Version:

03/05/2003

Author:

Roger Levy

Method Summary

Modifier and Type	Method and Description
`AbstractCollinizer`	`collinizer()` The tree transformer applied to trees prior to evaluation.
`AbstractCollinizer`	`collinizerEvalb()` The tree transformer used to produce trees for evaluation.
`java.lang.String[]`	`defaultCoreNLPFlags()` The flags that should be used by default when run inside StanfordCoreNLP.
`java.util.List<? extends HasWord>`	`defaultTestSentence()` Return a default sentence of the language (for testing).
`Extractor<DependencyGrammar>`	`dependencyGrammarExtractor(Options op, Index<java.lang.String> wordIndex, Index<java.lang.String> tagIndex)`
`DiskTreebank`	`diskTreebank()` Returns a DiskTreebank appropriate to the treebank source.
`void`	`display()` Displays language-specific settings.
`boolean`	`generateOriginalDependencies()` Whether to generate original Stanford Dependencies or the newer Universal Dependencies.
`GrammaticalStructure`	`getGrammaticalStructure(Tree t, java.util.function.Predicate<java.lang.String> filter, HeadFinder hf)` Build a GrammaticalStructure from a Tree.
`java.lang.String`	`getInputEncoding()` Returns the input encoding being used.
`java.lang.String`	`getOutputEncoding()` Returns the output encoding being used.
`HeadFinder`	`headFinder()`
`Lexicon`	`lex(Options op, Index<java.lang.String> wordIndex, Index<java.lang.String> tagIndex)` Vends a `Lexicon` object suitable to the particular language/treebank combination of interest.
`MemoryTreebank`	`memoryTreebank()` Returns a MemoryTreebank appropriate to the treebank source.
`double[]`	`MLEDependencyGrammarSmoothingParams()` Give the parameters for smoothing in the MLEDependencyGrammar.
`AbstractEval`	`ppAttachmentEval()` Returns a language-specific object for evaluating PP attachment.
`Label`	`processHeadWord(Label headWord)` Allows language-specific processing (e.g., stemming) of head words.
`java.io.PrintWriter`	`pw()` Returns a PrintWriter used to print output.
`java.io.PrintWriter`	`pw(java.io.OutputStream o)` Returns a PrintWriter used to print output to the OutputStream o.
`java.util.List<GrammaticalStructure>`	`readGrammaticalStructureFromFile(java.lang.String filename)` Reads the given file and turns its contents into a list of GrammaticalStructures.
`void`	`setEvaluateGrammaticalFunctions(boolean evalGFs)` If evalGFs = true, then the evaluation of parse trees will include evaluation on grammatical functions.
`void`	`setGenerateOriginalDependencies(boolean originalDependencies)` Set whether to generate original Stanford Dependencies or the newer Universal Dependencies.
`void`	`setInputEncoding(java.lang.String encoding)`
`int`	`setOptionFlag(java.lang.String[] args, int i)` Set a language-specific option according to command-line flags.
`void`	`setOutputEncoding(java.lang.String encoding)`
`java.lang.String[]`	`sisterSplitters()` Returns the splitting strings used for selective splits.
`TreeTransformer`	`subcategoryStripper()` Returns a TreeTransformer appropriate to the Treebank which can be used to remove functional tags (such as "-TMP") from categories.
`boolean`	`supportsBasicDependencies()` Whether our code provides support for converting phrase structure (constituency) parses to (basic) dependency parses.
`MemoryTreebank`	`testMemoryTreebank()` Returns a MemoryTreebank appropriate to the testing treebank source.
`Tree`	`transformTree(Tree t, Tree root)` This method does language-specific tree transformations such as annotating particular nodes with language-relevant features.
`Treebank`	`treebank()` Required to extend TreebankFactory
`TreebankLanguagePack`	`treebankLanguagePack()` Returns a TreebankLanguagePack containing treebank-specific (but not parser-specific) information, such as what counts as punctuation, and also information about the structure of labels.
`TreeReaderFactory`	`treeReaderFactory()` Returns a factory for reading in trees from the source you want.
`TokenizerFactory<Tree>`	`treeTokenizerFactory()`
`HeadFinder`	`typedDependencyHeadFinder()`

- Method Detail
  - headFinder
```
HeadFinder headFinder()
```
  - typedDependencyHeadFinder
```
HeadFinder typedDependencyHeadFinder()
```
  - processHeadWord
```
Label processHeadWord(Label headWord)
```
    Allows language-specific processing (e.g., stemming) of head words.
    
    Parameters:
    
    headWord - A Label that minimally implements the HasWord and HasTag interfaces.
    
    Returns:
    
    A processed Label
  - setInputEncoding
```
void setInputEncoding(java.lang.String encoding)
```
  - setOutputEncoding
```
void setOutputEncoding(java.lang.String encoding)
```
  - setEvaluateGrammaticalFunctions
```
void setEvaluateGrammaticalFunctions(boolean evalGFs)
```
    If evalGFs = true, then the evaluation of parse trees will include evaluation on grammatical functions. Otherwise, evaluation will strip the grammatical functions.
  - getOutputEncoding
```
java.lang.String getOutputEncoding()
```
    Returns the output encoding being used.
    
    Returns:
    
    The output encoding being used.
  - getInputEncoding
```
java.lang.String getInputEncoding()
```
    Returns the input encoding being used.
    
    Returns:
    
    The input encoding being used.
  - treeReaderFactory
```
TreeReaderFactory treeReaderFactory()
```
    Returns a factory for reading in trees from the source you want. It is the responsibility of the returned TreeReaderFactory to deal properly with the character-set encoding of the input. It is also its responsibility to normalize trees properly.
    
    Returns:
    
    A factory that vends an appropriate TreeReader
  - lex
```
Lexicon lex(Options op,
            Index<java.lang.String> wordIndex,
            Index<java.lang.String> tagIndex)
```
    Vends a Lexicon object suitable to the particular language/treebank combination of interest.
    
    Parameters:
    
    op - Options as to how the Lexicon behaves
    
    Returns:
    
    A Lexicon, constructed based on the given option
  - collinizer
```
AbstractCollinizer collinizer()
```
    The tree transformer applied to trees prior to evaluation. For instance, it might delete punctuation nodes. This method will be applied both to the parse output tree and to the gold tree. The exact specification depends on "standard practice" for various treebanks.
    
    Returns:
    
    A TreeTransformer that performs adjustments to trees to delete or equivalence class things not evaluated in the parser performance evaluation.
  - collinizerEvalb
```
AbstractCollinizer collinizerEvalb()
```
    The tree transformer used to produce trees for evaluation. It is applied both to the parse output tree and to the gold tree. It should strip punctuation and may make some other adjustments. The evalb version should strip off somewhat more.
  - memoryTreebank
```
MemoryTreebank memoryTreebank()
```
    Returns a MemoryTreebank appropriate to the treebank source.
  - diskTreebank
```
DiskTreebank diskTreebank()
```
    Returns a DiskTreebank appropriate to the treebank source.
  - testMemoryTreebank
```
MemoryTreebank testMemoryTreebank()
```
    Returns a MemoryTreebank appropriate to the testing treebank source.
  - treebank
```
Treebank treebank()
```
    Required to extend TreebankFactory
    
    Specified by:
    
    treebank in interface TreebankFactory
  - treebankLanguagePack
```
TreebankLanguagePack treebankLanguagePack()
```
    Returns a TreebankLanguagePack containing treebank-specific (but not parser-specific) information, such as what counts as punctuation, and also information about the structure of labels.
  - pw
```
java.io.PrintWriter pw()
```
    Returns a PrintWriter used to print output. It is the responsibility of the returned PrintWriter to deal properly with character encodings for the relevant treebank.
  - pw
```
java.io.PrintWriter pw(java.io.OutputStream o)
```
    Returns a PrintWriter used to print output to the OutputStream o. It is the responsibility of the returned PrintWriter to deal properly with character encodings for the relevant treebank.
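
    One way an implementation might meet the encoding responsibility described for the two pw methods is to wrap the stream in an OutputStreamWriter that uses the configured output encoding. This is a minimal sketch under stated assumptions, not the code of any shipped implementation: the class name EncodingPrintWriterSketch, the "UTF-8" default, and the choice to turn on auto-flush are all assumptions made for illustration.

```java
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UnsupportedEncodingException;

/** Hypothetical stand-in for the encoding-related methods; not part of the Stanford API. */
final class EncodingPrintWriterSketch {
  // Assumed default; real implementations choose their own per treebank.
  private String outputEncoding = "UTF-8";

  void setOutputEncoding(String encoding) {
    outputEncoding = encoding;
  }

  String getOutputEncoding() {
    return outputEncoding;
  }

  /** Wraps o so that everything printed is encoded with the configured output encoding. */
  PrintWriter pw(OutputStream o) {
    try {
      // The second PrintWriter argument enables auto-flush on println (an assumption of this sketch).
      return new PrintWriter(new OutputStreamWriter(o, outputEncoding), true);
    } catch (UnsupportedEncodingException e) {
      throw new IllegalArgumentException("Unsupported encoding: " + outputEncoding, e);
    }
  }

  /** No-argument variant: prints to standard output with the same encoding. */
  PrintWriter pw() {
    return pw(System.out);
  }
}
```

    With this arrangement, callers never pick an encoding themselves; they call setOutputEncoding once and then obtain writers through pw.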
  - sisterSplitters
```
java.lang.String[] sisterSplitters()
```
    Returns the splitting strings used for selective splits.
    
    Returns:
    
    An array containing ancestor-annotated Strings: categories should be split according to these ancestor annotations.
  - subcategoryStripper
```
TreeTransformer subcategoryStripper()
```
    Returns a TreeTransformer appropriate to the Treebank which can be used to remove functional tags (such as "-TMP") from categories.
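
    The effect of removing functional tags can be illustrated at the level of a single category label. The following is a simplified sketch, not the library's stripper: the real method returns a TreeTransformer that works on whole trees, while the hypothetical helper basicCategory below only handles one label string and assumes that functional tags are introduced by a hyphen and that labels beginning with a hyphen (such as "-NONE-") must be left alone.

```java
/** Hypothetical label-level illustration of functional-tag stripping; not part of the Stanford API. */
final class FunctionalTagSketch {
  /**
   * Returns the label with everything from the first non-initial hyphen removed.
   * Labels that are null, empty, or start with '-' are returned unchanged.
   */
  static String basicCategory(String label) {
    if (label == null || label.isEmpty() || label.charAt(0) == '-') {
      return label;
    }
    int cut = label.indexOf('-', 1); // first hyphen after the first character
    return cut < 0 ? label : label.substring(0, cut);
  }
}
```

    Under these assumptions, basicCategory("NP-TMP") yields "NP", while "VP" and "-NONE-" come back unchanged.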
  - transformTree
```
Tree transformTree(Tree t,
                   Tree root)
```
    This method does language-specific tree transformations such as annotating particular nodes with language-relevant features. Such parameterizations should be inside the specific TreebankLangParserParams class. This method is recursively applied to each node in the tree (depth first, left-to-right), so you shouldn't write this method to apply recursively to tree members. This method is allowed to (and in some cases does) destructively change the input tree t. It changes both labels and the tree shape.
    
    Parameters:
    
    t - The input tree (with non-language specific annotation already done, so you need to strip back to basic categories)
    
    root - The root of the current tree (can be null for words)
    
    Returns:
    
    The fully annotated tree node (with daughters still as you want them in the final result)
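
    The division of labour described above, where a caller walks the tree and the per-node method does not recurse, can be sketched with a stand-in tree type. Everything here is hypothetical: the Node class replaces the library's Tree, the "^TOP" annotation is an invented feature, and visiting children before their parent is only one traversal consistent with "depth first, left-to-right"; the actual caller may order the visits differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of a non-recursive per-node transform; not part of the Stanford API. */
final class TransformTreeSketch {
  /** Minimal stand-in for a tree node. */
  static final class Node {
    String label;
    final List<Node> children = new ArrayList<>();

    Node(String label, Node... kids) {
      this.label = label;
      for (Node k : kids) {
        children.add(k);
      }
    }

    boolean isLeaf() {
      return children.isEmpty();
    }
  }

  /** Per-node hook: marks non-leaf daughters of the root. It changes t in place and does not recurse. */
  static Node transformTree(Node t, Node root) {
    if (root != null && t != root && !t.isLeaf() && root.children.contains(t)) {
      t.label = t.label + "^TOP"; // invented annotation, for illustration only
    }
    return t;
  }

  /** Driver: visits children left to right, then the node itself, calling the hook once per node. */
  static Node applyToAll(Node t, Node root) {
    for (int i = 0; i < t.children.size(); i++) {
      t.children.set(i, applyToAll(t.children.get(i), root));
    }
    return transformTree(t, root);
  }
}
```

    Because the driver already reaches every node, a transformTree that recursed on its own would process descendants more than once.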
  - display
```
void display()
```
    Displays language-specific settings.
  - setOptionFlag
```
int setOptionFlag(java.lang.String[] args,
                  int i)
```
    Set a language-specific option according to command-line flags. This routine should try to process the option starting at args[i] (which might potentially be several arguments long if it takes arguments). It should return the index after the last index it consumed in processing. In particular, if it cannot process the current option, the return value should be i.
    
    Parameters:
    
    args - Array of command line arguments
    
    i - Index in command line arguments to try to process as an option
    
    Returns:
    
    The index of the item after arguments processed as part of this command line option.
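
    The return-value contract can be sketched in isolation. This is a minimal sketch, not the code of any shipped implementation: the class OptionFlagSketch, the flags -exampleFlag and -exampleEncoding, and the parseAll caller loop are hypothetical names introduced only to show how "return i when nothing was consumed" lets a caller detect unrecognized options.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of the setOptionFlag contract; not part of the Stanford API. */
final class OptionFlagSketch {
  boolean exampleFlag = false;      // set by the invented zero-argument option
  String exampleEncoding = "UTF-8"; // set by the invented one-argument option

  /** Tries to consume the option at args[i]; returns the index after the last argument consumed, or i. */
  int setOptionFlag(String[] args, int i) {
    if (args[i].equalsIgnoreCase("-exampleFlag")) {
      exampleFlag = true;
      return i + 1; // consumed the flag only
    }
    if (args[i].equalsIgnoreCase("-exampleEncoding") && i + 1 < args.length) {
      exampleEncoding = args[i + 1];
      return i + 2; // consumed the flag and its argument
    }
    return i; // not recognized: nothing consumed
  }

  /** Caller loop: processes every argument and collects the ones no option handler consumed. */
  List<String> parseAll(String[] args) {
    List<String> unrecognized = new ArrayList<>();
    int i = 0;
    while (i < args.length) {
      int next = setOptionFlag(args, i);
      if (next == i) {
        unrecognized.add(args[i]);
        next = i + 1; // skip it so the loop always makes progress
      }
      i = next;
    }
    return unrecognized;
  }
}
```

    Returning i unchanged, rather than throwing, leaves the decision about unknown flags to the caller, which can hand the same argument to another option handler before giving up.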
  - defaultTestSentence
```
java.util.List<? extends HasWord> defaultTestSentence()
```
    Return a default sentence of the language (for testing).
    
    Returns:
    
    A default sentence of the language
  - treeTokenizerFactory
```
TokenizerFactory<Tree> treeTokenizerFactory()
```
  - dependencyGrammarExtractor
```
Extractor<DependencyGrammar> dependencyGrammarExtractor(Options op,
                                                        Index<java.lang.String> wordIndex,
                                                        Index<java.lang.String> tagIndex)
```
  - MLEDependencyGrammarSmoothingParams
```
double[] MLEDependencyGrammarSmoothingParams()
```
    Give the parameters for smoothing in the MLEDependencyGrammar.
    
    Returns:
    
    An array of doubles containing smooth_aT_hTWd, smooth_aTW_hTWd, smooth_stop, and interp.
  - ppAttachmentEval
```
AbstractEval ppAttachmentEval()
```
    Returns a language-specific object for evaluating PP attachment.
    
    Returns:
    
    An object that implements AbstractEval
  - readGrammaticalStructureFromFile
```
java.util.List<GrammaticalStructure> readGrammaticalStructureFromFile(java.lang.String filename)
```
    Reads the given file and turns its contents into a list of GrammaticalStructures. Throws UnsupportedOperationException if the language doesn't support dependencies or GrammaticalStructures.
  - getGrammaticalStructure
```
GrammaticalStructure getGrammaticalStructure(Tree t,
                                             java.util.function.Predicate<java.lang.String> filter,
                                             HeadFinder hf)
```
    Build a GrammaticalStructure from a Tree. Throws UnsupportedOperationException if the language doesn't support dependencies or GrammaticalStructures.
  - supportsBasicDependencies
```
boolean supportsBasicDependencies()
```
    Whether our code provides support for converting phrase structure (constituency) parses to (basic) dependency parses.
    
    Returns:
    
    Whether dependencies are supported for a language
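
    A caller can use this check to avoid the UnsupportedOperationException documented for getGrammaticalStructure. The sketch below is hypothetical throughout: Params is a cut-down stand-in with a simplified String-based signature (the real method takes a Tree, a Predicate, and a HeadFinder), and it assumes that a language reporting no dependency support is also one whose getGrammaticalStructure throws.

```java
import java.util.Optional;

/** Hypothetical illustration of guarding dependency conversion; not part of the Stanford API. */
final class DependencyGuardSketch {
  /** Cut-down stand-in for the two methods involved. */
  interface Params {
    boolean supportsBasicDependencies();

    String getGrammaticalStructure(String tree); // simplified signature, for illustration only
  }

  /** A language without dependency support, mirroring the documented exception. */
  static final class NoDependencies implements Params {
    public boolean supportsBasicDependencies() {
      return false;
    }

    public String getGrammaticalStructure(String tree) {
      throw new UnsupportedOperationException("No dependency support for this language");
    }
  }

  /** Converts only when supported; otherwise returns an empty Optional instead of throwing. */
  static Optional<String> dependenciesIfSupported(Params params, String tree) {
    if (!params.supportsBasicDependencies()) {
      return Optional.empty();
    }
    return Optional.of(params.getGrammaticalStructure(tree));
  }
}
```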
  - setGenerateOriginalDependencies
```
void setGenerateOriginalDependencies(boolean originalDependencies)
```
    Set whether to generate original Stanford Dependencies or the newer Universal Dependencies.
    
    Parameters:
    
    originalDependencies - Whether to generate SD
  - generateOriginalDependencies
```
boolean generateOriginalDependencies()
```
    Whether to generate original Stanford Dependencies or the newer Universal Dependencies.
    
    Returns:
    
    Whether to generate SD
  - defaultCoreNLPFlags
```
java.lang.String[] defaultCoreNLPFlags()
```
    The flags that should be used by default when run inside StanfordCoreNLP. For example, the current use is that English should run with the option to retain "-TMP" functional tags, without imposing that option on other languages.
