public class NegraPennLanguagePack extends AbstractTreebankLanguagePack
DEFAULT_ENCODING, DEFAULT_GF_CHAR, gfCharacter
| Constructor | Description |
|---|---|
| NegraPennLanguagePack() | Gives a handle to the TreebankLanguagePack |
| NegraPennLanguagePack(boolean leaveGF) | Gives a handle to the TreebankLanguagePack |
| NegraPennLanguagePack(boolean leaveGF, char gfChar) | Make a new language pack with grammatical functions used based on the value of leaveGF and marked with the character gfChar. |
| Modifier and Type | Method | Description |
|---|---|---|
| String | basicCategory(String category) | Returns the basic syntactic category of a String. |
| String[] | evalBIgnoredPunctuationTags() | Returns a String array of punctuation tags that EVALB-style evaluation should ignore for this treebank/language. |
| String | getEncoding() | Return the input Charset encoding for the Treebank. |
| TokenizerFactory<Word> | getTokenizerFactory() | Return a tokenizer which might be suitable for tokenizing text that will be used with this Treebank/Language pair, without tokenizing carriage returns (i.e., treating them as white space). |
| HeadFinder | headFinder() | The HeadFinder to use for your treebank. |
| boolean | isLeaveGF() | |
| char[] | labelAnnotationIntroducingCharacters() | Return an array of characters at which a String should be truncated to give the basic syntactic category of a label. |
| String[] | punctuationTags() | Returns a String array of punctuation tags for this treebank/language. |
| String[] | punctuationWords() | Returns a String array of punctuation words for this treebank/language. |
| String[] | sentenceFinalPunctuationTags() | Returns a String array of sentence final punctuation tags for this treebank/language. |
| String[] | sentenceFinalPunctuationWords() | Returns a String array of sentence final punctuation words for this treebank/language. |
| void | setLeaveGF(boolean leaveGF) | |
| String[] | startSymbols() | Returns a String array of treebank start symbols. |
| String | stripGF(String category) | Returns the category for a String with everything following the gf character (which may be language specific) stripped. |
| String | treebankFileExtension() | Returns the extension of treebank files for this treebank. |
| TreeReaderFactory | treeReaderFactory() | Returns a TreeReaderFactory suitable for general purpose use with this language/treebank. |
| HeadFinder | typedDependencyHeadFinder() | The HeadFinder to use when making typed dependencies. |
categoryAndFunction, evalBIgnoredPunctuationTagAcceptFilter, evalBIgnoredPunctuationTagRejectFilter, getBasicCategoryFunction, getCategoryAndFunctionFunction, getGfCharacter, grammaticalStructureFactory, grammaticalStructureFactory, grammaticalStructureFactory, isEvalBIgnoredPunctuationTag, isLabelAnnotationIntroducingCharacter, isPunctuationTag, isPunctuationWord, isSentenceFinalPunctuationTag, isStartSymbol, morphFeatureSpec, punctuationTagAcceptFilter, punctuationTagRejectFilter, punctuationWordAcceptFilter, punctuationWordRejectFilter, sentenceFinalPunctuationTagAcceptFilter, setGfCharacter, startSymbol, startSymbolAcceptFilter, supportsGrammaticalStructures, treeTokenizerFactory
public NegraPennLanguagePack()
public NegraPennLanguagePack(boolean leaveGF)
public NegraPennLanguagePack(boolean leaveGF, char gfChar)
public String[] punctuationTags()
punctuationTags
in interface TreebankLanguagePack
punctuationTags
in class AbstractTreebankLanguagePack
public String[] punctuationWords()
punctuationWords
in interface TreebankLanguagePack
punctuationWords
in class AbstractTreebankLanguagePack
public String[] sentenceFinalPunctuationTags()
sentenceFinalPunctuationTags
in interface TreebankLanguagePack
sentenceFinalPunctuationTags
in class AbstractTreebankLanguagePack
public String[] sentenceFinalPunctuationWords()
public String basicCategory(String category)
Returns the basic syntactic category of a String. As in AbstractTreebankLanguagePack, the label is truncated at an occurrence of one of the labelAnnotationIntroducingCharacters().
However, there are also special cases for labelAnnotationIntroducingCharacters that appear in the category labels themselves:
(i) if the first char is in this set, it is never truncated (e.g., '-' or '=' as a token), and
(ii) if the label starts with a character from this set, a second instance of the same character is also excluded from truncation (to deal with '-LRB-', '-RCB-', etc.).
basicCategory
in interface TreebankLanguagePack
basicCategory
in class AbstractTreebankLanguagePack
category - The whole String name of the label
public String stripGF(String category)
Returns the category for a String with everything following the gf character (which may be language specific) stripped.
stripGF
in interface TreebankLanguagePack
stripGF
in class AbstractTreebankLanguagePack
category - The String name of the label (may previously have had basic category called on it)
public String[] evalBIgnoredPunctuationTags()
evalBIgnoredPunctuationTags
in interface TreebankLanguagePack
evalBIgnoredPunctuationTags
in class AbstractTreebankLanguagePack
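The truncation rules described for basicCategory and stripGF can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the library's implementation: the class name LabelCategorySketch, the contents of ANNOTATION_CHARS, and the choice to drop the gf character itself in stripGF are all hypothetical and chosen only for the example.

```java
final class LabelCategorySketch {

    // Hypothetical set; a real language pack supplies its own characters
    // through labelAnnotationIntroducingCharacters().
    static final char[] ANNOTATION_CHARS = {'-', '='};

    static boolean isAnnotationChar(char c) {
        for (char a : ANNOTATION_CHARS) {
            if (a == c) {
                return true;
            }
        }
        return false;
    }

    /** Truncates a label at the first annotation character, with the two special cases. */
    static String basicCategory(String category) {
        if (category == null) {
            return null;
        }
        boolean awaitingSecond = false; // true while a leading annotation char is unmatched
        char leading = '\0';
        int i = 0;
        for (; i < category.length(); i++) {
            char ch = category.charAt(i);
            if (!isAnnotationChar(ch)) {
                continue;
            }
            if (i == 0) {
                // Case (i): an annotation char in first position never truncates.
                awaitingSecond = true;
                leading = ch;
            } else if (awaitingSecond && ch == leading) {
                // Case (ii): the second instance of the leading char is kept too.
                awaitingSecond = false;
            } else {
                break; // ordinary case: cut the label here
            }
        }
        return category.substring(0, i);
    }

    /** Assumption: the gf character and everything after it are removed. */
    static String stripGF(String category, char gfChar) {
        if (category == null) {
            return null;
        }
        int index = category.indexOf(gfChar);
        return index > 0 ? category.substring(0, index) : category;
    }
}
```

With these assumptions, basicCategory("NP-SB") yields "NP", while "-LRB-" and a bare "-" token are returned unchanged.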
public char[] labelAnnotationIntroducingCharacters()
labelAnnotationIntroducingCharacters
in interface TreebankLanguagePack
labelAnnotationIntroducingCharacters
in class AbstractTreebankLanguagePack
public String[] startSymbols()
startSymbols
in interface TreebankLanguagePack
startSymbols
in class AbstractTreebankLanguagePack
public String getEncoding()
Return the input Charset encoding for the Treebank. See the documentation for the Charset class.
getEncoding
in interface TreebankLanguagePack
getEncoding
in class AbstractTreebankLanguagePack
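Because getEncoding() returns the encoding as a String, a caller would typically resolve that name to a java.nio.charset.Charset before reading treebank text. The sketch below shows one way to do this with the standard library only; the class EncodingSketch and the method readLines are hypothetical names, and the encoding name is passed in as a parameter rather than taken from a language pack.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

final class EncodingSketch {

    /** Reads all lines from the stream, decoding bytes with the named charset. */
    static List<String> readLines(InputStream in, String encodingName) {
        // Throws an unchecked exception if the name is illegal or unsupported.
        Charset charset = Charset.forName(encodingName);
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }
}
```

Decoding with the wrong charset name does not fail here; malformed bytes are silently replaced, which is why passing the treebank's declared encoding matters for German text with umlauts and ß.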
public String treebankFileExtension()
public boolean isLeaveGF()
public void setLeaveGF(boolean leaveGF)
public TreeReaderFactory treeReaderFactory()
Returns a TreeReaderFactory suitable for general purpose use with this language/treebank.
treeReaderFactory
in interface TreebankLanguagePack
treeReaderFactory
in class AbstractTreebankLanguagePack
public HeadFinder headFinder()
public HeadFinder typedDependencyHeadFinder()
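public TokenizerFactory<Word> getTokenizerFactory()
Return a tokenizer which might be suitable for tokenizing text that will be used with this Treebank/Language pair, without tokenizing carriage returns (i.e., treating them as white space).
This method used to return a factory for WhitespaceTokenizer, but people didn't much like that. So now we provide PTBTokenizer. It's not customized to German, but will nevertheless do better than WhitespaceTokenizer at tokenizing German!
getTokenizerFactory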
in interface TreebankLanguagePack
getTokenizerFactory
in class AbstractTreebankLanguagePack
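To illustrate why a punctuation-aware tokenizer tends to do better than whitespace splitting on ordinary text, the following sketch contrasts the two strategies. It does not use WhitespaceTokenizer or PTBTokenizer; the class TokenizerContrastSketch and its regular expression are hypothetical stand-ins that only approximate the difference in behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TokenizerContrastSketch {

    // A run of letters/digits is one token; any other non-space char is its own token.
    private static final Pattern TOKEN =
        Pattern.compile("[\\p{L}\\p{N}]+|[^\\s\\p{L}\\p{N}]");

    /** Splits on whitespace only, so punctuation stays glued to words. */
    static List<String> whitespaceTokens(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
    }

    /** Separates punctuation from words; line breaks count as white space. */
    static List<String> punctuationAwareTokens(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(text);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }
}
```

For the input "Das ist gut, oder?\nJa." the whitespace version produces "gut," and "oder?" as single tokens, whereas the punctuation-aware version emits the comma, question mark, and period separately. In both cases the line break is treated as white space rather than as a token.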