edu.stanford.nlp.trees
Class PennTreebankLanguagePack

java.lang.Object
  extended by edu.stanford.nlp.trees.AbstractTreebankLanguagePack
      extended by edu.stanford.nlp.trees.PennTreebankLanguagePack
All Implemented Interfaces:
TreebankLanguagePack, java.io.Serializable

public class PennTreebankLanguagePack
extends AbstractTreebankLanguagePack

Specifies the treebank/language-specific components needed for parsing the English Penn Treebank.

Author:
Christopher Manning
See Also:
Serialized Form

Field Summary
 
Fields inherited from class edu.stanford.nlp.trees.AbstractTreebankLanguagePack
DEFAULT_ENCODING, DEFAULT_GF_CHAR, gfCharacter
 
Constructor Summary
PennTreebankLanguagePack()
          Gives a handle to the TreebankLanguagePack.
 
Method Summary
 java.lang.String[] evalBIgnoredPunctuationTags()
          Returns a String array of punctuation tags that EVALB-style evaluation should ignore for this treebank/language.
 TokenizerFactory<Word> getTokenizerFactory()
          Returns a factory for PTBTokenizer.
 GrammaticalStructureFactory grammaticalStructureFactory()
          Returns a GrammaticalStructureFactory suitable for this language/treebank.
 GrammaticalStructureFactory grammaticalStructureFactory(Filter<java.lang.String> puncFilter)
          Returns a GrammaticalStructureFactory suitable for this language/treebank.
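 GrammaticalStructureFactory grammaticalStructureFactory(Filter<java.lang.String> puncFilter, HeadFinder hf)
          Returns a GrammaticalStructureFactory suitable for this language/treebank, using the given punctuation filter and HeadFinder.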
 HeadFinder headFinder()
          The HeadFinder to use for your treebank.
 char[] labelAnnotationIntroducingCharacters()
          Return an array of characters at which a String should be truncated to give the basic syntactic category of a label.
static void main(java.lang.String[] args)
          Prints a few aspects of the TreebankLanguagePack, just for debugging.
 java.lang.String[] punctuationTags()
          Returns a String array of punctuation tags for this treebank/language.
 java.lang.String[] punctuationWords()
          Returns a String array of punctuation words for this treebank/language.
 java.lang.String[] sentenceFinalPunctuationTags()
          Returns a String array of sentence final punctuation tags for this treebank/language.
 java.lang.String[] sentenceFinalPunctuationWords()
          Returns a String array of sentence final punctuation words for this treebank/language.
 java.lang.String[] startSymbols()
          Returns a String array of treebank start symbols.
 java.lang.String treebankFileExtension()
          Returns the extension of treebank files for this treebank.
 
Methods inherited from class edu.stanford.nlp.trees.AbstractTreebankLanguagePack
basicCategory, categoryAndFunction, evalBIgnoredPunctuationTagAcceptFilter, evalBIgnoredPunctuationTagRejectFilter, getBasicCategoryFunction, getCategoryAndFunctionFunction, getEncoding, getGfCharacter, isEvalBIgnoredPunctuationTag, isLabelAnnotationIntroducingCharacter, isPunctuationTag, isPunctuationWord, isSentenceFinalPunctuationTag, isStartSymbol, morphFeatureSpec, punctuationTagAcceptFilter, punctuationTagRejectFilter, punctuationWordAcceptFilter, punctuationWordRejectFilter, sentenceFinalPunctuationTagAcceptFilter, setGfCharacter, startSymbol, startSymbolAcceptFilter, stripGF, treeReaderFactory, treeTokenizerFactory
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

PennTreebankLanguagePack

public PennTreebankLanguagePack()
Gives a handle to the TreebankLanguagePack.

Method Detail

punctuationTags

public java.lang.String[] punctuationTags()
Returns a String array of punctuation tags for this treebank/language.

Specified by:
punctuationTags in interface TreebankLanguagePack
Specified by:
punctuationTags in class AbstractTreebankLanguagePack
Returns:
The punctuation tags

punctuationWords

public java.lang.String[] punctuationWords()
Returns a String array of punctuation words for this treebank/language.

Specified by:
punctuationWords in interface TreebankLanguagePack
Specified by:
punctuationWords in class AbstractTreebankLanguagePack
Returns:
The punctuation words

sentenceFinalPunctuationTags

public java.lang.String[] sentenceFinalPunctuationTags()
Returns a String array of sentence final punctuation tags for this treebank/language.

Specified by:
sentenceFinalPunctuationTags in interface TreebankLanguagePack
Specified by:
sentenceFinalPunctuationTags in class AbstractTreebankLanguagePack
Returns:
The sentence final punctuation tags

sentenceFinalPunctuationWords

public java.lang.String[] sentenceFinalPunctuationWords()
Returns a String array of sentence final punctuation words for this treebank/language.

Returns:
The sentence final punctuation words

evalBIgnoredPunctuationTags

public java.lang.String[] evalBIgnoredPunctuationTags()
Returns a String array of punctuation tags that EVALB-style evaluation should ignore for this treebank/language. Traditionally, EVALB has ignored a subset of the total set of punctuation tags in the English Penn Treebank (quotes, period, comma, colon, etc., but not brackets).

Specified by:
evalBIgnoredPunctuationTags in interface TreebankLanguagePack
Overrides:
evalBIgnoredPunctuationTags in class AbstractTreebankLanguagePack
Returns:
The EVALB-ignored punctuation tags
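
As an illustration of how such a tag list might be consumed, the following minimal sketch drops every tag found in an ignored set before scoring. The class EvalbIgnoreSketch, the method keepScoredTags, and the tag values are hypothetical; the values are chosen only to match the description above (quotes, period, comma, and colon ignored, brackets kept) and are not read from this class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class EvalbIgnoreSketch {

  // Returns the tags that remain after removing every tag listed in ignoredTags.
  static List<String> keepScoredTags(List<String> tags, String[] ignoredTags) {
    Set<String> ignored = new HashSet<>(Arrays.asList(ignoredTags));
    List<String> kept = new ArrayList<>();
    for (String tag : tags) {
      if (!ignored.contains(tag)) {
        kept.add(tag);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    // Illustrative values only (an assumption, not the array returned by this class).
    String[] ignoredTags = {"``", "''", ".", ",", ":"};
    List<String> tags = Arrays.asList("DT", "NN", "-LRB-", "NN", "-RRB-", "VBZ", ".");
    // Prints [DT, NN, -LRB-, NN, -RRB-, VBZ]
    System.out.println(keepScoredTags(tags, ignoredTags));
  }
}
```

In real use, the array passed as ignoredTags would presumably come from evalBIgnoredPunctuationTags().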

labelAnnotationIntroducingCharacters

public char[] labelAnnotationIntroducingCharacters()
Return an array of characters at which a String should be truncated to give the basic syntactic category of a label. Penn Treebank-style labels follow a syntactic category with functional and cross-referencing information introduced by special characters, as in "NP-SBJ=1". An array containing '-' and '=' would truncate this label to "NP".

Specified by:
labelAnnotationIntroducingCharacters in interface TreebankLanguagePack
Overrides:
labelAnnotationIntroducingCharacters in class AbstractTreebankLanguagePack
Returns:
An array of characters that set off label name suffixes
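
The truncation described above can be sketched in plain Java as follows. This is a minimal illustration, not the library's implementation: the class LabelTruncationSketch and the method truncateLabel are hypothetical names, and starting the scan at index 1 (so that a label beginning with one of these characters is not reduced to an empty string) is an assumption of this sketch.

```java
final class LabelTruncationSketch {

  // Cuts label at the first introducer character found after index 0.
  // Returns the label unchanged if no introducer character occurs.
  static String truncateLabel(String label, char[] introducers) {
    for (int i = 1; i < label.length(); i++) {
      char ch = label.charAt(i);
      for (char introducer : introducers) {
        if (ch == introducer) {
          return label.substring(0, i);
        }
      }
    }
    return label;
  }

  public static void main(String[] args) {
    char[] introducers = {'-', '='};
    // Prints NP
    System.out.println(truncateLabel("NP-SBJ=1", introducers));
  }
}
```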

startSymbols

public java.lang.String[] startSymbols()
Returns a String array of treebank start symbols.

Specified by:
startSymbols in interface TreebankLanguagePack
Specified by:
startSymbols in class AbstractTreebankLanguagePack
Returns:
The start symbols

getTokenizerFactory

public TokenizerFactory<Word> getTokenizerFactory()
Returns a factory for PTBTokenizer.

Specified by:
getTokenizerFactory in interface TreebankLanguagePack
Overrides:
getTokenizerFactory in class AbstractTreebankLanguagePack
Returns:
A tokenizer factory

treebankFileExtension

public java.lang.String treebankFileExtension()
Returns the extension of treebank files for this treebank. This is "mrg".

Returns:
the extension on files for this treebank
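
One plausible use of this value is selecting treebank files by name. The sketch below is an assumption about usage, not code from the library; the class TreebankFileFilterSketch, the method withExtension, and the file names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TreebankFileFilterSketch {

  // Keeps only the file names that end with "." followed by the given extension.
  static List<String> withExtension(List<String> fileNames, String extension) {
    String suffix = "." + extension;
    List<String> matches = new ArrayList<>();
    for (String name : fileNames) {
      if (name.endsWith(suffix)) {
        matches.add(name);
      }
    }
    return matches;
  }

  public static void main(String[] args) {
    // Hypothetical file names, for illustration only.
    List<String> names = Arrays.asList("wsj_0001.mrg", "wsj_0001.pos", "README");
    // Prints [wsj_0001.mrg]
    System.out.println(withExtension(names, "mrg"));
  }
}
```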

grammaticalStructureFactory

public GrammaticalStructureFactory grammaticalStructureFactory()
Returns a GrammaticalStructureFactory suitable for this language/treebank.

Specified by:
grammaticalStructureFactory in interface TreebankLanguagePack
Overrides:
grammaticalStructureFactory in class AbstractTreebankLanguagePack
Returns:
A GrammaticalStructureFactory suitable for this language/treebank.

grammaticalStructureFactory

public GrammaticalStructureFactory grammaticalStructureFactory(Filter<java.lang.String> puncFilter)
Returns a GrammaticalStructureFactory suitable for this language/treebank.

Note: This is loaded by reflection so basic treebank use does not require all the Stanford Dependencies code.

Specified by:
grammaticalStructureFactory in interface TreebankLanguagePack
Overrides:
grammaticalStructureFactory in class AbstractTreebankLanguagePack
Parameters:
puncFilter - A filter which should reject punctuation words (as Strings)
Returns:
A GrammaticalStructureFactory suitable for this language/treebank.
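
The note on reflection describes a general pattern: naming a class by String and instantiating it at run time, so that the calling code compiles and loads even when the optional class is absent. The sketch below shows that pattern only and does not reproduce this library's loading code; the class ReflectiveLoadSketch, the method loadOrNull, and the class name example.MissingDependency are hypothetical.

```java
final class ReflectiveLoadSketch {

  // Instantiates className through its no-argument constructor.
  // Returns null if the class is missing or cannot be instantiated.
  static Object loadOrNull(String className) {
    try {
      return Class.forName(className).getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      return null;
    }
  }

  public static void main(String[] args) {
    // java.util.ArrayList stands in for an optional dependency that is present.
    Object present = loadOrNull("java.util.ArrayList");
    // A hypothetical class name that is not on the classpath.
    Object missing = loadOrNull("example.MissingDependency");
    System.out.println(present != null); // prints true
    System.out.println(missing == null); // prints true
  }
}
```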

grammaticalStructureFactory

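public GrammaticalStructureFactory grammaticalStructureFactory(Filter<java.lang.String> puncFilter,
                                                               HeadFinder hf)
Returns a GrammaticalStructureFactory suitable for this language/treebank, using the given punctuation filter and HeadFinder.

Parameters:
puncFilter - A filter which should reject punctuation words (as Strings)
hf - The HeadFinder to use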

headFinder

public HeadFinder headFinder()
The HeadFinder to use for your treebank.

Returns:
A suitable HeadFinder

main

public static void main(java.lang.String[] args)
Prints a few aspects of the TreebankLanguagePack, just for debugging.



Stanford NLP Group