edu.stanford.nlp.parser.lexparser
Interface TreebankLangParserParams

All Superinterfaces:
Serializable, TreebankFactory
All Known Implementing Classes:
AbstractTreebankParserParams, ArabicTreebankParserParams, ChineseTreebankParserParams, EnglishTreebankParserParams, FrenchTreebankParserParams, HebrewTreebankParserParams, NegraPennTreebankParserParams, TueBaDZParserParams

public interface TreebankLangParserParams
extends TreebankFactory, Serializable

Contains language-specific methods necessary to get the parser to parse an arbitrary treebank.

Author:
Roger Levy

Method Summary
 TreeTransformer collinizer()
          The tree transformer applied to trees prior to evaluation.
 TreeTransformer collinizerEvalb()
          The tree transformer used to produce trees for evaluation (evalb version).
 List<? extends HasWord> defaultTestSentence()
          Return a default sentence of the language (for testing).
 Extractor<DependencyGrammar> dependencyGrammarExtractor(Options op, Index<String> wordIndex, Index<String> tagIndex)
           
 DiskTreebank diskTreebank()
          Returns a DiskTreebank appropriate to the treebank source.
 void display()
          Displays language-specific settings.
 String getInputEncoding()
          Returns the input encoding being used.
 String getOutputEncoding()
          Returns the output encoding being used.
 HeadFinder headFinder()
           
 Lexicon lex(Options op, Index<String> wordIndex, Index<String> tagIndex)
          Vends a Lexicon object suitable to the particular language/treebank combination of interest.
 MemoryTreebank memoryTreebank()
          Returns a MemoryTreebank appropriate to the treebank source.
 double[] MLEDependencyGrammarSmoothingParams()
          Give the parameters for smoothing in the MLEDependencyGrammar.
 AbstractEval ppAttachmentEval()
          Returns a language-specific object for evaluating PP attachment.
 Label processHeadWord(Label headWord)
          Allows language specific processing (e.g., stemming) of head words.
 PrintWriter pw()
          Returns a PrintWriter used to print output.
 PrintWriter pw(OutputStream o)
          Returns a PrintWriter used to print output to the OutputStream o.
 void setEvaluateGrammaticalFunctions(boolean evalGFs)
          If evalGFs is true, evaluation of parse trees includes grammatical functions.
 void setInputEncoding(String encoding)
           
 int setOptionFlag(String[] args, int i)
          Set a language-specific option according to command-line flags.
 void setOutputEncoding(String encoding)
           
 void setupForEval()
          Convenience method for setting state parameters specific to evaluation.
 String[] sisterSplitters()
          Returns the splitting strings used for selective splits.
 TreeTransformer subcategoryStripper()
          Returns a TreeTransformer appropriate to the Treebank which can be used to remove functional tags (such as "-TMP") from categories.
 MemoryTreebank testMemoryTreebank()
          Returns a MemoryTreebank appropriate to the testing treebank source.
 Tree transformTree(Tree t, Tree root)
          This method does language-specific tree transformations such as annotating particular nodes with language-relevant features.
 Treebank treebank()
          Required to extend TreebankFactory
 TreebankLanguagePack treebankLanguagePack()
          Returns a TreebankLanguagePack containing treebank-specific (but not parser-specific) information, such as which symbols count as punctuation and how labels are structured.
 TreeReaderFactory treeReaderFactory()
          Returns a factory for reading in trees from the source you want.
 TokenizerFactory<Tree> treeTokenizerFactory()
           
 HeadFinder typedDependencyHeadFinder()
           
 

Method Detail

headFinder

HeadFinder headFinder()

typedDependencyHeadFinder

HeadFinder typedDependencyHeadFinder()

processHeadWord

Label processHeadWord(Label headWord)
Allows language-specific processing (e.g., stemming) of head words.

Parameters:
headWord - A Label that minimally implements the HasWord and HasTag interfaces.
Returns:
A processed Label
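As an illustration of the kind of processing this hook permits, the sketch below applies a deliberately crude plural-stripping rule to a head word while leaving its tag unchanged. The HeadWord class stands in for a Label implementing HasWord and HasTag, and both it and the stemming rule are hypothetical; a real implementation would operate on the Label itself and could use a proper stemmer.

```java
/** Hypothetical word/tag pair standing in for a Label that implements HasWord and HasTag. */
final class HeadWord {
  final String word;
  final String tag;

  HeadWord(String word, String tag) {
    this.word = word;
    this.tag = tag;
  }
}

final class CrudeHeadStemmer {
  private CrudeHeadStemmer() {}

  /** Strips a final "s" from plural nouns (tag NNS) longer than three characters; otherwise returns hw unchanged. */
  static HeadWord processHeadWord(HeadWord hw) {
    if ("NNS".equals(hw.tag) && hw.word.length() > 3 && hw.word.endsWith("s")) {
      return new HeadWord(hw.word.substring(0, hw.word.length() - 1), hw.tag);
    }
    return hw;
  }
}
```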

setupForEval

void setupForEval()
Convenience method for setting state parameters specific to evaluation. For example, if grammatical functions are retained during training but discarded during evaluation, this method may be used to make that state change.


setInputEncoding

void setInputEncoding(String encoding)

setOutputEncoding

void setOutputEncoding(String encoding)

setEvaluateGrammaticalFunctions

void setEvaluateGrammaticalFunctions(boolean evalGFs)
If evalGFs is true, evaluation of parse trees includes grammatical functions. Otherwise, grammatical functions are stripped before evaluation.
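A minimal sketch of how setEvaluateGrammaticalFunctions and setupForEval might interact: grammatical-function suffixes such as "-SBJ" are kept during training and stripped at evaluation time unless evalGFs is true. The class GfEvalParams, its fields, and the assumption that a grammatical function follows the first '-' in a label are all hypothetical, not the parser's implementation.

```java
/** Hypothetical parser-params object tracking grammatical-function (GF) handling. */
class GfEvalParams {
  private boolean evalGFs = false;   // should evaluation score GFs?
  private boolean retainGFs = true;  // keep GFs on labels (training state)

  void setEvaluateGrammaticalFunctions(boolean evalGFs) {
    this.evalGFs = evalGFs;
  }

  /** Switches from the training state to the evaluation state. */
  void setupForEval() {
    retainGFs = evalGFs; // keep GFs only if evaluation is meant to score them
  }

  /** Label as it would be compared during evaluation; assumes a GF follows the first '-'. */
  String evalLabel(String label) {
    // Labels that start with '-' (e.g. "-NONE-") are left alone.
    if (retainGFs || label.startsWith("-")) return label;
    int dash = label.indexOf('-');
    return dash < 0 ? label : label.substring(0, dash);
  }
}
```

Keeping the switch in setupForEval means training code never has to know whether evaluation will score grammatical functions.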


getOutputEncoding

String getOutputEncoding()
Returns the output encoding being used.

Returns:
The output encoding being used.

getInputEncoding

String getInputEncoding()
Returns the input encoding being used.

Returns:
The input encoding being used.

treeReaderFactory

TreeReaderFactory treeReaderFactory()
Returns a factory for reading in trees from the source you want. The returned factory is responsible for handling the character encoding of the input correctly, and also for properly normalizing trees.

Returns:
A factory that vends an appropriate TreeReader

lex

Lexicon lex(Options op,
            Index<String> wordIndex,
            Index<String> tagIndex)
Vends a Lexicon object suitable to the particular language/treebank combination of interest.

Parameters:
op - Options controlling how the Lexicon behaves
Returns:
A Lexicon constructed according to the given options

collinizer

TreeTransformer collinizer()
The tree transformer applied to trees prior to evaluation. For instance, it might delete punctuation nodes. This method will be applied both to the parse output tree and to the gold tree. The exact specification depends on "standard practice" for various treebanks.

Returns:
A TreeTransformer that adjusts trees by deleting, or collapsing into equivalence classes, material that is not scored in the parser performance evaluation.
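A minimal sketch of one adjustment a collinizer might make: deleting punctuation preterminals and dropping any phrasal node left without children. The SimpleTree class and the Penn-style punctuation tag set are assumptions for illustration; real implementations follow each treebank's own evaluation conventions and could draw punctuation information from the TreebankLanguagePack.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

/** Minimal hypothetical tree: a leaf has no children; a preterminal has one leaf child. */
class SimpleTree {
  final String label;
  final List<SimpleTree> kids;

  SimpleTree(String label, SimpleTree... kids) {
    this.label = label;
    this.kids = new ArrayList<>(Arrays.asList(kids));
  }

  boolean isLeaf() { return kids.isEmpty(); }

  boolean isPreterminal() { return kids.size() == 1 && kids.get(0).isLeaf(); }

  @Override public String toString() {
    if (isLeaf()) return label;
    StringBuilder sb = new StringBuilder("(").append(label);
    for (SimpleTree k : kids) sb.append(' ').append(k);
    return sb.append(')').toString();
  }
}

class PunctuationCollinizer {
  // Assumed Penn-style punctuation tags; a real collinizer is treebank-specific.
  private static final Set<String> PUNCT_TAGS =
      Set.of(",", ".", ":", "``", "''", "-LRB-", "-RRB-");

  /** Returns a copy of t without punctuation preterminals, or null if nothing remains. */
  static SimpleTree transform(SimpleTree t) {
    if (t.isLeaf()) return new SimpleTree(t.label);
    if (t.isPreterminal() && PUNCT_TAGS.contains(t.label)) return null;
    List<SimpleTree> kept = new ArrayList<>();
    for (SimpleTree k : t.kids) {
      SimpleTree nk = transform(k);
      if (nk != null) kept.add(nk);
    }
    if (kept.isEmpty()) return null; // phrasal node emptied by deletion
    return new SimpleTree(t.label, kept.toArray(new SimpleTree[0]));
  }
}
```

Running the same transform over both the gold tree and the parser output keeps the comparison symmetric.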

collinizerEvalb

TreeTransformer collinizerEvalb()
The tree transformer used to produce trees for evaluation (evalb version). It is applied both to the parser output tree and to the gold tree. It should strip punctuation and may perform other normalizations; compared with collinizer(), the evalb version should strip some additional material.


memoryTreebank

MemoryTreebank memoryTreebank()
Returns a MemoryTreebank appropriate to the treebank source.


diskTreebank

DiskTreebank diskTreebank()
Returns a DiskTreebank appropriate to the treebank source.


testMemoryTreebank

MemoryTreebank testMemoryTreebank()
Returns a MemoryTreebank appropriate to the testing treebank source.


treebank

Treebank treebank()
Required to extend TreebankFactory

Specified by:
treebank in interface TreebankFactory

treebankLanguagePack

TreebankLanguagePack treebankLanguagePack()
Returns a TreebankLanguagePack containing treebank-specific (but not parser-specific) information, such as which symbols count as punctuation and how labels are structured.


pw

PrintWriter pw()
Returns a PrintWriter used to print output. The returned PrintWriter is responsible for handling the character encoding appropriate to the relevant treebank.


pw

PrintWriter pw(OutputStream o)
Returns a PrintWriter used to print output to the OutputStream o. The returned PrintWriter is responsible for handling the character encoding appropriate to the relevant treebank.
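One plausible way for an implementation to honor the output encoding is to wrap the stream in an OutputStreamWriter built from that encoding. The sketch below uses only the Java standard library; the class name, the UTF-8 default, and having pw() delegate to standard output are assumptions, not the parser's actual code.

```java
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;

/** Hypothetical helper mirroring pw()/pw(OutputStream) with a configurable output encoding. */
class EncodingAwareWriters {
  private String outputEncoding = "UTF-8"; // assumed default

  void setOutputEncoding(String encoding) { this.outputEncoding = encoding; }

  String getOutputEncoding() { return outputEncoding; }

  /** Wraps o so that characters are encoded with the current output encoding. */
  PrintWriter pw(OutputStream o) {
    // Charset.forName throws UnsupportedCharsetException for unknown encoding names.
    // Auto-flush (second argument) applies to println/printf/format, not print.
    return new PrintWriter(new OutputStreamWriter(o, Charset.forName(outputEncoding)), true);
  }

  /** In this sketch, the no-argument variant writes to standard output. */
  PrintWriter pw() { return pw(System.out); }
}
```

Resolving the Charset at call time means a later setOutputEncoding affects every writer created afterwards, but not writers already handed out.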


sisterSplitters

String[] sisterSplitters()
Returns the splitting strings used for selective splits.

Returns:
An array containing ancestor-annotated Strings: categories should be split according to these ancestor annotations.

subcategoryStripper

TreeTransformer subcategoryStripper()
Returns a TreeTransformer appropriate to the Treebank which can be used to remove functional tags (such as "-TMP") from categories.
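To show what stripping a functional tag means for a single category, the sketch below reduces a label such as NP-TMP to NP. It assumes that functional tags are introduced by '-' or '=' and that labels beginning with '-' (such as -NONE- or -LRB-) should be left intact; the helper class is hypothetical, and the real method returns a TreeTransformer that works over whole trees using the treebank's own conventions.

```java
/** Hypothetical helper: strips functional tags from a single category label. */
final class FunctionalTagStripper {
  private FunctionalTagStripper() {}

  /** Assumed characters that introduce a functional tag or other annotation. */
  private static final String ANNOTATION_CHARS = "-=";

  static String basicCategory(String category) {
    // Keep special labels such as "-NONE-" or "-LRB-" unchanged.
    if (category.isEmpty() || category.charAt(0) == '-') return category;
    for (int i = 1; i < category.length(); i++) {
      if (ANNOTATION_CHARS.indexOf(category.charAt(i)) >= 0) {
        return category.substring(0, i);
      }
    }
    return category;
  }
}
```

A tree-level stripper built on this helper would apply basicCategory to every nonterminal label and leave the words untouched.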


transformTree

Tree transformTree(Tree t,
                   Tree root)
This method does language-specific tree transformations, such as annotating particular nodes with language-relevant features. Such parameterizations belong inside the specific TreebankLangParserParams implementation. This method is applied by the caller to each node in the tree (depth first, left-to-right), so it should not itself recurse over tree members. It is allowed to (and in some cases does) destructively change the input tree t, altering both labels and tree shape.

Parameters:
t - The input tree (with non-language-specific annotation already done, so you need to strip back to basic categories)
root - The root of the current tree (can be null for words)
Returns:
The fully annotated tree node (with daughters still as you want them in the final result)
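Because the caller, not transformTree, handles recursion, an implementation only decides what to do with the single node it is given. The sketch below pairs a hypothetical driver, which visits children left-to-right before their parent, with a non-recursive per-node transform that performs parent annotation (appending ^ and the parent's category to phrasal labels). The AnnTree class, the post-order visiting order, and the annotation scheme are assumptions for illustration; the interface does not specify whether the real parser visits a parent before or after its children.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Minimal hypothetical mutable tree; not a Stanford class. */
class AnnTree {
  String label;
  final List<AnnTree> kids;

  AnnTree(String label, AnnTree... kids) {
    this.label = label;
    this.kids = new ArrayList<>(Arrays.asList(kids));
  }

  boolean isLeaf() { return kids.isEmpty(); }

  /** Returns the parent of target within the subtree rooted here, or null. */
  AnnTree parentOf(AnnTree target) {
    for (AnnTree k : kids) {
      if (k == target) return this;
      AnnTree p = k.parentOf(target);
      if (p != null) return p;
    }
    return null;
  }

  @Override public String toString() {
    if (isLeaf()) return label;
    StringBuilder sb = new StringBuilder("(").append(label);
    for (AnnTree k : kids) sb.append(' ').append(k);
    return sb.append(')').toString();
  }
}

class ParentAnnotationParams {
  /** Per-node transform; deliberately non-recursive, since the driver does the traversal. */
  static AnnTree transformTree(AnnTree t, AnnTree root) {
    // Leave words and preterminals (nodes whose first child is a word) unchanged.
    if (t.isLeaf() || t.kids.get(0).isLeaf()) return t;
    AnnTree parent = root.parentOf(t);
    String parentCat = (parent == null) ? "ROOT" : parent.label;
    t.label = t.label + "^" + parentCat; // destructive change, which the contract permits
    return t;
  }

  /** Hypothetical driver: depth first, children left-to-right, then the node itself. */
  static AnnTree annotate(AnnTree t, AnnTree root) {
    for (int i = 0; i < t.kids.size(); i++) {
      t.kids.set(i, annotate(t.kids.get(i), root));
    }
    // Children are done first, so each child saw its parent's unannotated category.
    return transformTree(t, root);
  }
}
```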

display

void display()
Displays language-specific settings.


setOptionFlag

int setOptionFlag(String[] args,
                  int i)
Set a language-specific option according to command-line flags. This routine should try to process the option starting at args[i] (which may span several arguments if the option takes arguments). It should return the index just after the last argument it consumed. In particular, if it cannot process the current option, it should return i.

Parameters:
args - Array of command line arguments
i - Index in command line arguments to try to process as an option
Returns:
The index of the item after arguments processed as part of this command line option.
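A minimal sketch of the setOptionFlag contract: a flag without arguments consumes one slot, a flag with one argument consumes two, and an unrecognized flag returns i unchanged so the caller can report it or hand it to another component. The class DemoParserParams, the flag names -markPossessives and -splitLevel, and the driver loop are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical language-specific options; the flag names are invented for illustration. */
class DemoParserParams {
  boolean markPossessives = false;
  int splitLevel = 0;

  /** Returns the index after the last argument consumed, or i if args[i] is not recognized. */
  int setOptionFlag(String[] args, int i) {
    if (args[i].equals("-markPossessives")) {
      markPossessives = true;
      return i + 1;              // flag only: consumed args[i]
    } else if (args[i].equals("-splitLevel") && i + 1 < args.length) {
      splitLevel = Integer.parseInt(args[i + 1]);
      return i + 2;              // flag plus one value
    }
    return i;                    // not processed
  }

  /** Hypothetical driver: lets the params try each flag and collects the ones it rejects. */
  static List<String> parse(DemoParserParams p, String[] args) {
    List<String> unknown = new ArrayList<>();
    int i = 0;
    while (i < args.length) {
      int next = p.setOptionFlag(args, i);
      if (next == i) {           // unrecognized: record it and skip, avoiding an infinite loop
        unknown.add(args[i]);
        next = i + 1;
      }
      i = next;
    }
    return unknown;
  }
}
```

Returning i rather than throwing lets a caller chain several option handlers and only complain once none of them accepts a flag.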

defaultTestSentence

List<? extends HasWord> defaultTestSentence()
Return a default sentence of the language (for testing).

Returns:
A default sentence of the language

treeTokenizerFactory

TokenizerFactory<Tree> treeTokenizerFactory()

dependencyGrammarExtractor

Extractor<DependencyGrammar> dependencyGrammarExtractor(Options op,
                                                        Index<String> wordIndex,
                                                        Index<String> tagIndex)

MLEDependencyGrammarSmoothingParams

double[] MLEDependencyGrammarSmoothingParams()
Give the parameters for smoothing in the MLEDependencyGrammar.

Returns:
An array of doubles containing smooth_aT_hTWd, smooth_aTW_hTWd, smooth_stop, and interp
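Since the smoothing parameters come back as a bare double[], callers must read them by position. The sketch below unpacks the array into named fields, assuming the positions follow the order listed above (smooth_aT_hTWd, smooth_aTW_hTWd, smooth_stop, interp); the holder class DepSmoothing and the sample values used to exercise it are hypothetical.

```java
/** Hypothetical holder that names the positional smoothing parameters. */
final class DepSmoothing {
  final double smoothATHTWd;   // index 0: smooth_aT_hTWd (assumed position)
  final double smoothATWHTWd;  // index 1: smooth_aTW_hTWd (assumed position)
  final double smoothStop;     // index 2: smooth_stop (assumed position)
  final double interp;         // index 3: interp (assumed position)

  DepSmoothing(double[] params) {
    if (params == null || params.length != 4) {
      throw new IllegalArgumentException("expected 4 smoothing parameters");
    }
    smoothATHTWd = params[0];
    smoothATWHTWd = params[1];
    smoothStop = params[2];
    interp = params[3];
  }
}
```

Checking the length up front turns a silent misreading of a malformed array into an immediate error.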

ppAttachmentEval

AbstractEval ppAttachmentEval()
Returns a language-specific object for evaluating PP attachment.

Returns:
An object that implements AbstractEval


Stanford NLP Group