edu.stanford.nlp.parser.lexparser
Class AbstractTreebankParserParams

java.lang.Object
  extended by edu.stanford.nlp.parser.lexparser.AbstractTreebankParserParams
All Implemented Interfaces:
TreebankLangParserParams, TreebankFactory, Serializable
Direct Known Subclasses:
ArabicTreebankParserParams, ChineseTreebankParserParams, EnglishTreebankParserParams, FrenchTreebankParserParams, HebrewTreebankParserParams, NegraPennTreebankParserParams, TueBaDZParserParams

public abstract class AbstractTreebankParserParams
extends Object
implements TreebankLangParserParams

An abstract base class that supplies common method implementations for classes implementing TreebankLangParserParams.

Some subclasses need access to special attributes of their TreebankLanguagePack while still relying on this class's code for making the TreebankLanguagePack accessible. A good way to do this is to pass a new instance of the appropriate TreebankLanguagePack to this class's constructor, then retrieve it later by casting the result of treebankLanguagePack(). See ChineseTreebankParserParams for an example.
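
The constructor-and-cast idiom can be sketched as follows. So that the sketch compiles on its own, the types LanguagePack, SpecialLanguagePack, BaseParams and SpecialParams are hypothetical stand-ins for TreebankLanguagePack, a language-specific pack, this class, and a subclass; none of them is part of the Stanford API.

```java
// Hypothetical stand-in for TreebankLanguagePack.
interface LanguagePack {
    String encoding();
}

// Hypothetical language-specific pack with one extra attribute.
class SpecialLanguagePack implements LanguagePack {
    @Override
    public String encoding() { return "UTF-8"; }

    // Extra attribute that is not visible through the general interface.
    public String specialAttribute() { return "language-specific value"; }
}

// Stand-in for AbstractTreebankParserParams: stores the pack and exposes it.
abstract class BaseParams {
    protected final LanguagePack tlp;

    protected BaseParams(LanguagePack tlp) { this.tlp = tlp; }

    public LanguagePack treebankLanguagePack() { return tlp; }
}

// Stand-in for a subclass such as ChineseTreebankParserParams.
class SpecialParams extends BaseParams {
    SpecialParams() {
        // Pass a fresh instance of the specific pack to the base constructor.
        super(new SpecialLanguagePack());
    }

    String describe() {
        // Cast the accessor's result back to the specific type when needed.
        SpecialLanguagePack pack = (SpecialLanguagePack) treebankLanguagePack();
        return pack.specialAttribute();
    }
}
```

The cast is safe here only because the subclass itself supplied the pack in its constructor.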

Author:
Roger Levy
See Also:
Serialized Form

Nested Class Summary
protected static class AbstractTreebankParserParams.AnnotatePunctuationFunction
          Annotation function for mapping punctuation to PTB-style equivalence classes.
protected  class AbstractTreebankParserParams.RemoveGFSubcategoryStripper
          The job of this class is to remove subcategorizations from tag and category nodes, so as to put a tree in a suitable state for evaluation.
protected  class AbstractTreebankParserParams.SubcategoryStripper
          The job of this class is to remove subcategorizations from tag and category nodes, so as to put a tree in a suitable state for evaluation.
 
Field Summary
protected  boolean evalGF
          If true, then evaluation is over grammatical functions as well as the labels. If false, then grammatical functions are stripped for evaluation.
protected  String inputEncoding
           
protected  String outputEncoding
           
protected  TreebankLanguagePack tlp
           
 
Constructor Summary
protected AbstractTreebankParserParams(TreebankLanguagePack tlp)
          Stores the passed-in TreebankLanguagePack and sets up charset encodings.
 
Method Summary
abstract  TreeTransformer collinizer()
          The tree transformer used to produce trees for evaluation.
abstract  TreeTransformer collinizerEvalb()
          The tree transformer used to produce trees for evaluation.
 Extractor<DependencyGrammar> dependencyGrammarExtractor(Options op, Index<String> wordIndex, Index<String> tagIndex)
           
static
<E> Collection<E>
dependencyObjectify(Tree t, HeadFinder hf, TreeTransformer collinizer, DependencyTyper<E> typer)
          Returns the set of dependencies in a tree, according to some DependencyTyper.
abstract  DiskTreebank diskTreebank()
          Returns a DiskTreebank appropriate to the treebank source.
abstract  void display()
          Displays language-specific settings.
 String getInputEncoding()
          Returns the input encoding being used.
 String getOutputEncoding()
          Returns the output encoding being used.
abstract  HeadFinder headFinder()
          The HeadFinder to use for your treebank.
 boolean isEvalGF()
           
 Lexicon lex(Index<String> wordIndex, Index<String> tagIndex)
           
 Lexicon lex(Options op, Index<String> wordIndex, Index<String> tagIndex)
          Vends a Lexicon object suitable to the particular language/treebank combination of interest.
abstract  MemoryTreebank memoryTreebank()
          Returns a MemoryTreebank appropriate to the treebank source.
 double[] MLEDependencyGrammarSmoothingParams()
          Give the parameters for smoothing in the MLEDependencyGrammar.
static Collection<Constituent> parsevalObjectify(Tree t, TreeTransformer collinizer)
          Takes a Tree and a collinizer and returns a Collection of labeled Constituents for PARSEVAL.
static Collection<Constituent> parsevalObjectify(Tree t, TreeTransformer collinizer, boolean labelConstituents)
          Takes a Tree and a collinizer and returns a Collection of Constituents for PARSEVAL evaluation.
 AbstractEval ppAttachmentEval()
          Returns a language-specific object for evaluating PP attachment.
 Label processHeadWord(Label headWord)
          Allows language-specific processing (e.g., stemming) of head words.
 PrintWriter pw()
          The PrintWriter used to print output.
 PrintWriter pw(OutputStream o)
          The PrintWriter used to print output.
 void setEvalGF(boolean evalGF)
           
 void setEvaluateGrammaticalFunctions(boolean evalGFs)
          Sets whether to consider grammatical functions in evaluation.
 void setInputEncoding(String encoding)
          Sets the input encoding.
 int setOptionFlag(String[] args, int i)
          Set language-specific options according to flags.
 void setOutputEncoding(String encoding)
          Sets the output encoding.
 void setupForEval()
          Convenience method for setting state parameters specific to evaluation.
abstract  String[] sisterSplitters()
          Returns the splitting strings used for selective splits.
 TreeTransformer subcategoryStripper()
          Returns a TreeTransformer appropriate to the Treebank which can be used to remove functional tags (such as "-TMP") from categories.
 MemoryTreebank testMemoryTreebank()
          You can often return the same thing for testMemoryTreebank as for memoryTreebank.
abstract  Tree transformTree(Tree t, Tree root)
          This method does language-specific tree transformations such as annotating particular nodes with language-relevant features.
 Treebank treebank()
          Implemented as required by TreebankFactory.
 TreebankLanguagePack treebankLanguagePack()
          Returns an appropriate TreebankLanguagePack.
 TokenizerFactory<Tree> treeTokenizerFactory()
           
static EquivalenceClasser<List<String>,String> typedDependencyClasser()
          Returns an EquivalenceClasser that classes typed dependencies by the syntactic categories of mother, head and daughter, plus direction.
abstract  HeadFinder typedDependencyHeadFinder()
          The HeadFinder to use when extracting typed dependencies.
static Collection<List<String>> typedDependencyObjectify(Tree t, HeadFinder hf, TreeTransformer collinizer)
          Returns a collection of word-word dependencies typed by mother, head, daughter node syntactic categories.
static Collection<List<String>> unorderedTypedDependencyObjectify(Tree t, HeadFinder hf, TreeTransformer collinizer)
          Returns a collection of unordered (but directed!) typed word-word dependencies for the tree.
static Collection<List<String>> unorderedUntypedDependencyObjectify(Tree t, HeadFinder hf, TreeTransformer collinizer)
          Returns a collection of unordered (but directed!) untyped word-word dependencies for the tree.
static Collection<List<String>> untypedDependencyObjectify(Tree t, HeadFinder hf, TreeTransformer collinizer)
          Returns a collection of untyped word-word dependencies for the tree.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface edu.stanford.nlp.parser.lexparser.TreebankLangParserParams
defaultTestSentence, treeReaderFactory
 

Field Detail

evalGF

protected boolean evalGF
If true, then evaluation is over grammatical functions as well as the labels. If false, then grammatical functions are stripped for evaluation. This really only makes sense if you've trained with grammatical functions but want to evaluate without them.


inputEncoding

protected String inputEncoding

outputEncoding

protected String outputEncoding

tlp

protected TreebankLanguagePack tlp
Constructor Detail

AbstractTreebankParserParams

protected AbstractTreebankParserParams(TreebankLanguagePack tlp)
Stores the passed-in TreebankLanguagePack and sets up charset encodings.

Parameters:
tlp - The treebank language pack to use
Method Detail

setupForEval

public void setupForEval()
Description copied from interface: TreebankLangParserParams
Convenience method for setting state parameters specific to evaluation. For example, if grammatical functions are retained during training but discarded during evaluation, this method may be used to make that state change.

Specified by:
setupForEval in interface TreebankLangParserParams

processHeadWord

public Label processHeadWord(Label headWord)
Description copied from interface: TreebankLangParserParams
Allows language-specific processing (e.g., stemming) of head words.

Specified by:
processHeadWord in interface TreebankLangParserParams
Parameters:
headWord - A Label that minimally implements the HasWord and HasTag interfaces.
Returns:
A processed Label

setEvaluateGrammaticalFunctions

public void setEvaluateGrammaticalFunctions(boolean evalGFs)
Sets whether to consider grammatical functions in evaluation.

Specified by:
setEvaluateGrammaticalFunctions in interface TreebankLangParserParams

setInputEncoding

public void setInputEncoding(String encoding)
Sets the input encoding.

Specified by:
setInputEncoding in interface TreebankLangParserParams

setOutputEncoding

public void setOutputEncoding(String encoding)
Sets the output encoding.

Specified by:
setOutputEncoding in interface TreebankLangParserParams

getOutputEncoding

public String getOutputEncoding()
Returns the output encoding being used.

Specified by:
getOutputEncoding in interface TreebankLangParserParams
Returns:
The output encoding being used.

getInputEncoding

public String getInputEncoding()
Returns the input encoding being used.

Specified by:
getInputEncoding in interface TreebankLangParserParams
Returns:
The input encoding being used.

ppAttachmentEval

public AbstractEval ppAttachmentEval()
Returns a language-specific object for evaluating PP attachment.

Specified by:
ppAttachmentEval in interface TreebankLangParserParams
Returns:
An object that implements AbstractEval

memoryTreebank

public abstract MemoryTreebank memoryTreebank()
Returns a MemoryTreebank appropriate to the treebank source.

Specified by:
memoryTreebank in interface TreebankLangParserParams

diskTreebank

public abstract DiskTreebank diskTreebank()
Returns a DiskTreebank appropriate to the treebank source.

Specified by:
diskTreebank in interface TreebankLangParserParams

testMemoryTreebank

public MemoryTreebank testMemoryTreebank()
You can often return the same thing for testMemoryTreebank as for memoryTreebank.

Specified by:
testMemoryTreebank in interface TreebankLangParserParams

treebank

public Treebank treebank()
Implemented as required by TreebankFactory. Use diskTreebank() instead.

Specified by:
treebank in interface TreebankLangParserParams
Specified by:
treebank in interface TreebankFactory

pw

public PrintWriter pw()
The PrintWriter used to print output. It's the responsibility of pw to deal properly with character encodings for the relevant treebank.

Specified by:
pw in interface TreebankLangParserParams

pw

public PrintWriter pw(OutputStream o)
The PrintWriter used to print output. It's the responsibility of pw to deal properly with character encodings for the relevant treebank.

Specified by:
pw in interface TreebankLangParserParams
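
One way to honor an output encoding when building a PrintWriter is sketched below. This illustrates the responsibility described above and is not this class's actual implementation; the helper class EncodedWriters and its two-argument pw method are hypothetical.

```java
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;

// Hypothetical helper, not part of the Stanford API.
class EncodedWriters {
    // Wraps the stream in a writer that encodes characters with the named charset.
    // Charset.forName throws an unchecked exception if the name is not supported.
    static PrintWriter pw(OutputStream o, String encoding) {
        // The second PrintWriter argument enables auto-flush on println.
        return new PrintWriter(new OutputStreamWriter(o, Charset.forName(encoding)), true);
    }
}
```

Passing the charset explicitly avoids depending on the platform default, which matters for treebanks distributed in a fixed encoding.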

treebankLanguagePack

public TreebankLanguagePack treebankLanguagePack()
Returns an appropriate TreebankLanguagePack.

Specified by:
treebankLanguagePack in interface TreebankLangParserParams

headFinder

public abstract HeadFinder headFinder()
The HeadFinder to use for your treebank.

Specified by:
headFinder in interface TreebankLangParserParams

typedDependencyHeadFinder

public abstract HeadFinder typedDependencyHeadFinder()
The HeadFinder to use when extracting typed dependencies.

Specified by:
typedDependencyHeadFinder in interface TreebankLangParserParams

lex

public Lexicon lex(Index<String> wordIndex,
                   Index<String> tagIndex)

lex

public Lexicon lex(Options op,
                   Index<String> wordIndex,
                   Index<String> tagIndex)
Description copied from interface: TreebankLangParserParams
Vends a Lexicon object suitable to the particular language/treebank combination of interest.

Specified by:
lex in interface TreebankLangParserParams
Parameters:
op - Options as to how the Lexicon behaves
Returns:
A Lexicon, constructed based on the given option

MLEDependencyGrammarSmoothingParams

public double[] MLEDependencyGrammarSmoothingParams()
Give the parameters for smoothing in the MLEDependencyGrammar. Defaults are the ones previously hard coded into MLEDependencyGrammar.

Specified by:
MLEDependencyGrammarSmoothingParams in interface TreebankLangParserParams
Returns:
An array of doubles containing smooth_aT_hTWd, smooth_aTW_hTWd, smooth_stop, and interp, in that order

parsevalObjectify

public static Collection<Constituent> parsevalObjectify(Tree t,
                                                        TreeTransformer collinizer)
Takes a Tree and a collinizer and returns a Collection of labeled Constituents for PARSEVAL.

Parameters:
t - The tree to extract constituents from
collinizer - The TreeTransformer used to normalize the tree for evaluation
Returns:
The bag of Constituents for PARSEVAL.

parsevalObjectify

public static Collection<Constituent> parsevalObjectify(Tree t,
                                                        TreeTransformer collinizer,
                                                        boolean labelConstituents)
Takes a Tree and a collinizer and returns a Collection of Constituents for PARSEVAL evaluation. (Note that I haven't checked this rigorously yet with the PARSEVAL definition -- Roger.)


untypedDependencyObjectify

public static Collection<List<String>> untypedDependencyObjectify(Tree t,
                                                                  HeadFinder hf,
                                                                  TreeTransformer collinizer)
Returns a collection of untyped word-word dependencies for the tree.


unorderedUntypedDependencyObjectify

public static Collection<List<String>> unorderedUntypedDependencyObjectify(Tree t,
                                                                           HeadFinder hf,
                                                                           TreeTransformer collinizer)
Returns a collection of unordered (but directed!) untyped word-word dependencies for the tree.


typedDependencyObjectify

public static Collection<List<String>> typedDependencyObjectify(Tree t,
                                                                HeadFinder hf,
                                                                TreeTransformer collinizer)
Returns a collection of word-word dependencies typed by mother, head, daughter node syntactic categories.


unorderedTypedDependencyObjectify

public static Collection<List<String>> unorderedTypedDependencyObjectify(Tree t,
                                                                         HeadFinder hf,
                                                                         TreeTransformer collinizer)
Returns a collection of unordered (but directed!) typed word-word dependencies for the tree.


dependencyObjectify

public static <E> Collection<E> dependencyObjectify(Tree t,
                                                    HeadFinder hf,
                                                    TreeTransformer collinizer,
                                                    DependencyTyper<E> typer)
Returns the set of dependencies in a tree, according to some DependencyTyper.


typedDependencyClasser

public static EquivalenceClasser<List<String>,String> typedDependencyClasser()
Returns an EquivalenceClasser that classes typed dependencies by the syntactic categories of mother, head and daughter, plus direction.

Returns:
An EquivalenceClasser for typed dependencies

collinizer

public abstract TreeTransformer collinizer()
The tree transformer used to produce trees for evaluation. It is applied both to the parse output tree and to the gold tree. It should strip punctuation and may make other adjustments.

Specified by:
collinizer in interface TreebankLangParserParams
Returns:
A TreeTransformer that adjusts trees by deleting, or mapping to equivalence classes, anything not scored in the parser performance evaluation.

collinizerEvalb

public abstract TreeTransformer collinizerEvalb()
The tree transformer used to produce trees for evaluation. It is applied both to the parse output tree and to the gold tree. It should strip punctuation and may make other adjustments. The evalb version should strip somewhat more than collinizer() does. (finish this doc!)

Specified by:
collinizerEvalb in interface TreebankLangParserParams

sisterSplitters

public abstract String[] sisterSplitters()
Returns the splitting strings used for selective splits.

Specified by:
sisterSplitters in interface TreebankLangParserParams
Returns:
An array containing ancestor-annotated Strings: categories should be split according to these ancestor annotations.

subcategoryStripper

public TreeTransformer subcategoryStripper()
Returns a TreeTransformer appropriate to the Treebank which can be used to remove functional tags (such as "-TMP") from categories. Grammatical functions (GFs) are removed if evalGF is false; if GFs were not used in training, the results are equivalent.

Specified by:
subcategoryStripper in interface TreebankLangParserParams
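
As a rough illustration of what removing a functional tag such as "-TMP" means, the sketch below works on plain label strings rather than Tree nodes. The class FunctionTagSketch and its cutting rule (truncate at the first hyphen that is not the first character) are simplifying assumptions, not the behavior of the transformer this method returns.

```java
// Hypothetical helper operating on label strings; not part of the Stanford API.
class FunctionTagSketch {
    // Returns the basic category of a label, e.g. "NP-TMP" becomes "NP".
    // Labels that begin with '-' (for example "-LRB-") are returned unchanged.
    static String basicCategory(String label) {
        if (label.isEmpty() || label.charAt(0) == '-') {
            return label;
        }
        int cut = label.indexOf('-');
        return cut < 0 ? label : label.substring(0, cut);
    }
}
```

A real stripper for a given treebank typically has to handle further conventions, which is why the method is described as returning a transformer appropriate to the Treebank.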

transformTree

public abstract Tree transformTree(Tree t,
                                   Tree root)
This method does language-specific tree transformations such as annotating particular nodes with language-relevant features. Such parameterizations should be inside the specific TreebankLangParserParams class. This method is recursively applied to each node in the tree (depth first, left-to-right), so you shouldn't write this method to apply recursively to tree members. This method is allowed to (and in some cases does) destructively change the input tree t. It changes both labels and the tree shape.

Specified by:
transformTree in interface TreebankLangParserParams
Parameters:
t - The input tree (with non-language specific annotation already done, so you need to strip back to basic categories)
root - The root of the current tree (can be null for words)
Returns:
The fully annotated tree node (with daughters still as you want them in the final result)
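
The division of labor described above, where the caller walks the tree and the per-node method does not recurse, might look like the following sketch. Node is a hypothetical stand-in for Tree, the "^" annotation is an invented example, and visiting a node before its daughters is an assumption of this sketch; the text only says the traversal is depth first and left to right.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Tree.
class Node {
    String label;
    final List<Node> children = new ArrayList<>();

    Node(String label, Node... kids) {
        this.label = label;
        for (Node k : kids) {
            children.add(k);
        }
    }
}

class TransformSketch {
    // Per-node hook: annotates only the node it is given and does not recurse.
    // Leaves (words) and the root itself are left unchanged.
    static Node transformNode(Node t, Node root) {
        if (t != root && !t.children.isEmpty()) {
            t.label = t.label + "^" + root.label;
        }
        return t;
    }

    // Driver: the caller, not the hook, walks the tree depth first, left to right.
    static Node transformAll(Node t, Node root) {
        Node result = transformNode(t, root);
        for (int i = 0; i < result.children.size(); i++) {
            result.children.set(i, transformAll(result.children.get(i), root));
        }
        return result;
    }
}
```

Like the method documented here, the hook in this sketch changes the input node destructively rather than copying it.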

display

public abstract void display()
Displays language-specific settings.

Specified by:
display in interface TreebankLangParserParams

setOptionFlag

public int setOptionFlag(String[] args,
                         int i)
Set language-specific options according to flags. This routine should process the option starting in args[i] (which might potentially be several arguments long if it takes arguments). It should return the index after the last index it consumed in processing. In particular, if it cannot process the current option, the return value should be i.

Generic options are processed separately by Options.setOption(String[],int), and implementations of this method do not have to worry about them. The Options class handles routing options. TreebankParserParams that extend this class should call super when overriding this method.

Specified by:
setOptionFlag in interface TreebankLangParserParams
Parameters:
args - Array of command line arguments
i - Index in command line arguments to try to process as an option
Returns:
The index of the item after arguments processed as part of this command line option.
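
The return-value contract can be sketched with a minimal stand-alone class. The class OptionFlagSketch and the flags -exampleSwitch and -exampleValue are hypothetical and exist only for illustration; a real subclass would also call super, as noted above.

```java
// Hypothetical class illustrating the index contract; not part of the Stanford API.
class OptionFlagSketch {
    boolean exampleSwitch = false; // set by a flag that takes no argument
    String exampleValue = null;    // set by a flag that takes one argument

    // Processes the option starting at args[i] and returns the index after the
    // last argument consumed, or i itself when the option is not recognized.
    int setOptionFlag(String[] args, int i) {
        if (args[i].equalsIgnoreCase("-exampleSwitch")) {
            exampleSwitch = true;
            return i + 1;
        }
        if (args[i].equalsIgnoreCase("-exampleValue") && i + 1 < args.length) {
            exampleValue = args[i + 1];
            return i + 2;
        }
        return i; // not handled here: leave the index unchanged
    }
}
```

A caller can detect an unrecognized option by checking whether the returned index equals the index it passed in.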

treeTokenizerFactory

public TokenizerFactory<Tree> treeTokenizerFactory()
Specified by:
treeTokenizerFactory in interface TreebankLangParserParams

dependencyGrammarExtractor

public Extractor<DependencyGrammar> dependencyGrammarExtractor(Options op,
                                                               Index<String> wordIndex,
                                                               Index<String> tagIndex)
Specified by:
dependencyGrammarExtractor in interface TreebankLangParserParams

isEvalGF

public boolean isEvalGF()

setEvalGF

public void setEvalGF(boolean evalGF)


Stanford NLP Group