edu.stanford.nlp.trees.international.tuebadz
Class TueBaDZLanguagePack

java.lang.Object
  extended by edu.stanford.nlp.trees.AbstractTreebankLanguagePack
      extended by edu.stanford.nlp.trees.international.tuebadz.TueBaDZLanguagePack
All Implemented Interfaces:
TreebankLanguagePack, Serializable

public class TueBaDZLanguagePack
extends AbstractTreebankLanguagePack

Language pack for the Tuebingen Treebank of Written German (TueBa-D/Z); see http://www.sfs.nphil.uni-tuebingen.de/en_tuebadz.shtml. The treebank is encoded in UTF-8.

Author:
Roger Levy (rog@stanford.edu)
See Also:
Serialized Form

Field Summary
 
Fields inherited from class edu.stanford.nlp.trees.AbstractTreebankLanguagePack
DEFAULT_ENCODING, DEFAULT_GF_CHAR, gfCharacter
 
Constructor Summary
TueBaDZLanguagePack()
          Gives a handle to the TreebankLanguagePack.
TueBaDZLanguagePack(boolean leaveGF)
          Makes a new language pack in which the use of grammatical functions is determined by leaveGF.
TueBaDZLanguagePack(boolean useLimitedGF, boolean leaveGF, char gfChar)
          Makes a new language pack in which the use of grammatical functions is determined by leaveGF and grammatical functions are marked with the character gfChar.
TueBaDZLanguagePack(boolean leaveGF, char gfChar)
          Makes a new language pack in which the use of grammatical functions is determined by leaveGF and grammatical functions are marked with the character gfChar.
 
Method Summary
 String basicCategory(String category)
          Returns the basic syntactic category of a String.
 String getEncoding()
          Return the input Charset encoding for the Treebank.
 HeadFinder headFinder()
          The HeadFinder to use for your treebank.
 boolean isLeaveGF()
           
 boolean isLimitedGF()
           
 char[] labelAnnotationIntroducingCharacters()
          Return an array of characters at which a String should be truncated to give the basic syntactic category of a label.
static void main(String[] args)
          Prints a few aspects of the TreebankLanguagePack, just for debugging.
 String[] punctuationTags()
          Returns a String array of punctuation tags for this treebank/language.
 String[] punctuationWords()
          Returns a String array of punctuation words for this treebank/language.
 String[] sentenceFinalPunctuationTags()
          Returns a String array of sentence final punctuation tags for this treebank/language.
 String[] sentenceFinalPunctuationWords()
          Returns a String array of sentence final punctuation words for this treebank/language.
 void setLeaveGF(boolean leaveGF)
           
 void setLimitedGF(boolean limitedGF)
           
 String[] startSymbols()
          Returns a String array of treebank start symbols.
 String stripGF(String category)
          Returns the category for a String with everything following the gf character (which may be language specific) stripped.
 String treebankFileExtension()
          Returns the extension of treebank files for this treebank.
 TreeReaderFactory treeReaderFactory()
          Returns a TreeReaderFactory suitable for general purpose use with this language/treebank.
 
Methods inherited from class edu.stanford.nlp.trees.AbstractTreebankLanguagePack
categoryAndFunction, evalBIgnoredPunctuationTagAcceptFilter, evalBIgnoredPunctuationTagRejectFilter, evalBIgnoredPunctuationTags, getBasicCategoryFunction, getCategoryAndFunctionFunction, getGfCharacter, getTokenizerFactory, grammaticalStructureFactory, grammaticalStructureFactory, isEvalBIgnoredPunctuationTag, isLabelAnnotationIntroducingCharacter, isPunctuationTag, isPunctuationWord, isSentenceFinalPunctuationTag, isStartSymbol, morphFeatureSpec, punctuationTagAcceptFilter, punctuationTagRejectFilter, punctuationWordAcceptFilter, punctuationWordRejectFilter, sentenceFinalPunctuationTagAcceptFilter, setGfCharacter, startSymbol, startSymbolAcceptFilter, treeTokenizerFactory
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

TueBaDZLanguagePack

public TueBaDZLanguagePack()
Gives a handle to the TreebankLanguagePack.


TueBaDZLanguagePack

public TueBaDZLanguagePack(boolean leaveGF)
Makes a new language pack in which the use of grammatical functions is determined by leaveGF.


TueBaDZLanguagePack

public TueBaDZLanguagePack(boolean leaveGF,
                           char gfChar)
Makes a new language pack in which the use of grammatical functions is determined by leaveGF and grammatical functions are marked with the character gfChar. gfChar must not be one of the annotation-introducing characters returned by labelAnnotationIntroducingCharacters().


TueBaDZLanguagePack

public TueBaDZLanguagePack(boolean useLimitedGF,
                           boolean leaveGF,
                           char gfChar)
Makes a new language pack in which the use of grammatical functions is determined by leaveGF and grammatical functions are marked with the character gfChar. gfChar must not be one of the annotation-introducing characters returned by labelAnnotationIntroducingCharacters().

Method Detail

labelAnnotationIntroducingCharacters

public char[] labelAnnotationIntroducingCharacters()
Return an array of characters at which a String should be truncated to give the basic syntactic category of a label. Penn Treebank style labels follow a syntactic category with functional and cross-referencing information introduced by special characters, as in "NP-SBJ=1". With an array containing '-' and '=', that label is truncated to "NP".

Specified by:
labelAnnotationIntroducingCharacters in interface TreebankLanguagePack
Overrides:
labelAnnotationIntroducingCharacters in class AbstractTreebankLanguagePack
Returns:
An array of characters that set off label name suffixes

punctuationTags

public String[] punctuationTags()
Description copied from class: AbstractTreebankLanguagePack
Returns a String array of punctuation tags for this treebank/language.

Specified by:
punctuationTags in interface TreebankLanguagePack
Specified by:
punctuationTags in class AbstractTreebankLanguagePack
Returns:
The punctuation tags

punctuationWords

public String[] punctuationWords()
Description copied from class: AbstractTreebankLanguagePack
Returns a String array of punctuation words for this treebank/language.

Specified by:
punctuationWords in interface TreebankLanguagePack
Specified by:
punctuationWords in class AbstractTreebankLanguagePack
Returns:
The punctuation words

sentenceFinalPunctuationTags

public String[] sentenceFinalPunctuationTags()
Description copied from class: AbstractTreebankLanguagePack
Returns a String array of sentence final punctuation tags for this treebank/language.

Specified by:
sentenceFinalPunctuationTags in interface TreebankLanguagePack
Specified by:
sentenceFinalPunctuationTags in class AbstractTreebankLanguagePack
Returns:
The sentence final punctuation tags

startSymbols

public String[] startSymbols()
Description copied from class: AbstractTreebankLanguagePack
Returns a String array of treebank start symbols.

Specified by:
startSymbols in interface TreebankLanguagePack
Specified by:
startSymbols in class AbstractTreebankLanguagePack
Returns:
The start symbols

sentenceFinalPunctuationWords

public String[] sentenceFinalPunctuationWords()
Description copied from interface: TreebankLanguagePack
Returns a String array of sentence final punctuation words for this treebank/language.

Returns:
The sentence final punctuation words

treebankFileExtension

public String treebankFileExtension()
Description copied from interface: TreebankLanguagePack
Returns the extension of treebank files for this treebank. This should be passed as an argument to Treebank loading classes. It might be "mrg" or "fid", for example. Do not include the period.

Returns:
the extension on files for this treebank
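
Because the returned extension omits the period, a caller that matches file names has to supply the separator itself. The following is a minimal sketch of that convention. The class and method names are hypothetical, the file names are made up, and "mrg" is only the example value mentioned above, not necessarily what this language pack returns.

```java
// Hypothetical helper, not part of the library: shows how an extension
// given without a leading period can be matched against file names.
final class ExtensionSketch {

  /** Returns true if fileName ends with "." followed by the given extension. */
  static boolean hasExtension(String fileName, String extension) {
    return fileName.endsWith("." + extension);
  }

  public static void main(String[] args) {
    String ext = "mrg"; // assumed example value, without the period
    System.out.println(hasExtension("sample.mrg", ext)); // true
    System.out.println(hasExtension("notes.txt", ext));  // false
    System.out.println(hasExtension("filemrg", ext));    // false: no period
  }
}
```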

basicCategory

public String basicCategory(String category)
Description copied from class: AbstractTreebankLanguagePack
Returns the basic syntactic category of a String. This implementation truncates the label at the first occurrence of one of the labelAnnotationIntroducingCharacters(). Two special cases handle labels that themselves contain such characters: (i) a character from this set in first position never triggers truncation (e.g., '-' or '=' as a token), and (ii) if the label starts with a character from this set, a second instance of that same character is also exempt from truncation (to deal with '-LRB-', '-RCB-', etc.).

Specified by:
basicCategory in interface TreebankLanguagePack
Overrides:
basicCategory in class AbstractTreebankLanguagePack
Parameters:
category - The whole String name of the label
Returns:
The basic category of the String
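
The truncation rules in the description can be sketched in a few lines of plain Java. This is one reading of the documented behaviour, not the library's source. The class name and the two-argument signature are hypothetical, and the sketch does not model anything specific to the override in this class.

```java
// Sketch of the documented truncation rules; an illustration only.
final class BasicCategorySketch {

  private static boolean contains(char[] chars, char ch) {
    for (char c : chars) {
      if (c == ch) return true;
    }
    return false;
  }

  /**
   * Truncates category at the first annotation-introducing character,
   * except that such a character at index 0 is kept, and so is the next
   * occurrence of that same character.
   */
  static String basicCategory(String category, char[] introducers) {
    if (category == null) return null;
    boolean openAtZero = false; // label started with an introducer
    char atZero = '\0';         // which introducer it started with
    int i = 0;
    for (; i < category.length(); i++) {
      char ch = category.charAt(i);
      if (!contains(introducers, ch)) continue;
      if (i == 0) {
        openAtZero = true;      // rule (i): never truncate at index 0
        atZero = ch;
      } else if (openAtZero && ch == atZero) {
        openAtZero = false;     // rule (ii): skip the matching second instance
      } else {
        break;                  // ordinary case: truncate here
      }
    }
    return category.substring(0, i);
  }

  public static void main(String[] args) {
    char[] introducers = {'-', '='};
    System.out.println(basicCategory("NP-SBJ=1", introducers));  // NP
    System.out.println(basicCategory("-RCB-", introducers));     // -RCB-
    System.out.println(basicCategory("-RCB--TMP", introducers)); // -RCB-
    System.out.println(basicCategory("=", introducers));         // =
  }
}
```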

stripGF

public String stripGF(String category)
Description copied from interface: TreebankLanguagePack
Returns the category for a String with everything following the gf character (which may be language specific) stripped.

Specified by:
stripGF in interface TreebankLanguagePack
Overrides:
stripGF in class AbstractTreebankLanguagePack
Parameters:
category - The String name of the label (may previously have had basic category called on it)
Returns:
The String stripped of grammatical functions
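
As a rough illustration of stripping a grammatical function, the sketch below cuts a label at the gf character. It rests on several assumptions: the class name and the explicit gfChar parameter are hypothetical, '-' and the label "NX-ON" are illustrative values only, and leaving a label intact when the character sits at index 0 is a choice made here, not documented behaviour.

```java
// Hypothetical sketch of gf stripping; not the library's implementation.
final class StripGFSketch {

  /** Removes gfChar and everything after it, if gfChar occurs after index 0. */
  static String stripGF(String category, char gfChar) {
    if (category == null) return null;
    int index = category.indexOf(gfChar);
    // Assumption: a label that begins with gfChar is returned unchanged.
    return index > 0 ? category.substring(0, index) : category;
  }

  public static void main(String[] args) {
    char gfChar = '-'; // illustrative marker character
    System.out.println(stripGF("NX-ON", gfChar)); // NX
    System.out.println(stripGF("NX", gfChar));    // NX
  }
}
```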

isLeaveGF

public boolean isLeaveGF()

setLeaveGF

public void setLeaveGF(boolean leaveGF)

getEncoding

public String getEncoding()
Return the input Charset encoding for the Treebank. See documentation for the Charset class.

Specified by:
getEncoding in interface TreebankLanguagePack
Overrides:
getEncoding in class AbstractTreebankLanguagePack
Returns:
Name of Charset
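
A charset name of this kind is typically resolved with java.nio.charset.Charset.forName before decoding treebank bytes. The sketch below assumes the name is "utf-8", following the class description. The class and method names are hypothetical and the byte array stands in for file contents.

```java
import java.nio.charset.Charset;

// Hypothetical sketch: resolve a charset name and decode bytes with it.
final class EncodingSketch {

  /** Decodes bytes using the charset registered under encodingName. */
  static String decode(byte[] bytes, String encodingName) {
    Charset charset = Charset.forName(encodingName); // throws if the name is unsupported
    return new String(bytes, charset);
  }

  public static void main(String[] args) {
    // UTF-8 bytes for "Tübingen"; 0xC3 0xBC encodes the u-umlaut.
    byte[] raw = {'T', (byte) 0xC3, (byte) 0xBC, 'b', 'i', 'n', 'g', 'e', 'n'};
    String text = decode(raw, "utf-8"); // assumed value of the encoding name
    System.out.println(text.length());  // 8 characters from 9 bytes
  }
}
```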

main

public static void main(String[] args)
Prints a few aspects of the TreebankLanguagePack, just for debugging.


isLimitedGF

public boolean isLimitedGF()

setLimitedGF

public void setLimitedGF(boolean limitedGF)

treeReaderFactory

public TreeReaderFactory treeReaderFactory()
Description copied from class: AbstractTreebankLanguagePack
Returns a TreeReaderFactory suitable for general purpose use with this language/treebank.

Specified by:
treeReaderFactory in interface TreebankLanguagePack
Overrides:
treeReaderFactory in class AbstractTreebankLanguagePack
Returns:
A TreeReaderFactory suitable for general purpose use with this language/treebank.

headFinder

public HeadFinder headFinder()
The HeadFinder to use for your treebank.

Returns:
A suitable HeadFinder


Stanford NLP Group