public abstract class AbstractTreebankLanguagePack extends Object implements TreebankLanguagePack
Modifier and Type | Field and Description |
---|---|
static String | DEFAULT_ENCODING: Use this as the default encoding for Readers and Writers of Treebank data. |
protected static char | DEFAULT_GF_CHAR |
protected char | gfCharacter: Default character for indicating that something is a grammatical function; probably should be overridden by language-specific ones. |
Constructor and Description |
---|
AbstractTreebankLanguagePack(): Gives a handle to the TreebankLanguagePack. |
AbstractTreebankLanguagePack(char gfChar): Gives a handle to the TreebankLanguagePack. |
Modifier and Type | Method and Description |
---|---|
String | basicCategory(String category): Returns the basic syntactic category of a String. |
String | categoryAndFunction(String category): Returns the syntactic category and 'function' of a String. |
java.util.function.Predicate<String> | evalBIgnoredPunctuationTagAcceptFilter(): Returns a filter that accepts a String that is a punctuation tag that should be ignored by EVALB-style evaluation, and rejects everything else. |
java.util.function.Predicate<String> | evalBIgnoredPunctuationTagRejectFilter(): Returns a filter that accepts everything except a String that is a punctuation tag that should be ignored by EVALB-style evaluation. |
String[] | evalBIgnoredPunctuationTags(): Returns a String array of punctuation tags that EVALB-style evaluation should ignore for this treebank/language. |
java.util.function.Function<String,String> | getBasicCategoryFunction(): Returns a Function object that maps Strings to Strings according to this TreebankLanguagePack's basicCategory() method. |
java.util.function.Function<String,String> | getCategoryAndFunctionFunction(): Returns a Function object that maps Strings to Strings according to this TreebankLanguagePack's categoryAndFunction() method. |
String | getEncoding(): Returns the input Charset encoding for the Treebank. |
char | getGfCharacter() |
TokenizerFactory<? extends HasWord> | getTokenizerFactory(): Returns a tokenizer factory which might be suitable for tokenizing text that will be used with this Treebank/Language pair, without tokenizing carriage returns (i.e., treating them as white space). |
GrammaticalStructureFactory | grammaticalStructureFactory(): Returns a GrammaticalStructureFactory suitable for this language/treebank. |
GrammaticalStructureFactory | grammaticalStructureFactory(java.util.function.Predicate<String> puncFilt): Returns a GrammaticalStructureFactory suitable for this language/treebank. |
GrammaticalStructureFactory | grammaticalStructureFactory(java.util.function.Predicate<String> puncFilt, HeadFinder typedDependencyHeadFinder): Returns a GrammaticalStructureFactory suitable for this language/treebank. |
boolean | isEvalBIgnoredPunctuationTag(String str): Accepts a String that is a punctuation tag that should be ignored by EVALB-style evaluation, and rejects everything else. |
boolean | isLabelAnnotationIntroducingCharacter(char ch): Says whether this character is an annotation-introducing character. |
boolean | isPunctuationTag(String str): Accepts a String that is a punctuation tag name, and rejects everything else. |
boolean | isPunctuationWord(String str): Accepts a String that is a punctuation word, and rejects everything else. |
boolean | isSentenceFinalPunctuationTag(String str): Accepts a String that is a sentence-end punctuation tag, and rejects everything else. |
boolean | isStartSymbol(String str): Accepts a String that is a start symbol of the treebank. |
char[] | labelAnnotationIntroducingCharacters(): Returns an array of characters at which a String should be truncated to give the basic syntactic category of a label. |
MorphoFeatureSpecification | morphFeatureSpec(): Returns a morphological feature specification for words in this language. |
java.util.function.Predicate<String> | punctuationTagAcceptFilter(): Returns a filter that accepts a String that is a punctuation tag name, and rejects everything else. |
java.util.function.Predicate<String> | punctuationTagRejectFilter(): Returns a filter that rejects a String that is a punctuation tag name, and accepts everything else. |
abstract String[] | punctuationTags(): Returns a String array of punctuation tags for this treebank/language. |
java.util.function.Predicate<String> | punctuationWordAcceptFilter(): Returns a filter that accepts a String that is a punctuation word, and rejects everything else. |
java.util.function.Predicate<String> | punctuationWordRejectFilter(): Returns a filter that accepts a String that is not a punctuation word, and rejects punctuation. |
abstract String[] | punctuationWords(): Returns a String array of punctuation words for this treebank/language. |
java.util.function.Predicate<String> | sentenceFinalPunctuationTagAcceptFilter(): Returns a filter that accepts a String that is a sentence-end punctuation tag, and rejects everything else. |
abstract String[] | sentenceFinalPunctuationTags(): Returns a String array of sentence-final punctuation tags for this treebank/language. |
void | setGfCharacter(char gfCharacter): Sets the grammatical function indicating character to gfCharacter. |
String | startSymbol(): Returns a String which is the first (perhaps unique) start symbol of the treebank, or null if none is defined. |
java.util.function.Predicate<String> | startSymbolAcceptFilter(): Returns a filter that accepts a String that is a start symbol of the treebank, and rejects everything else. |
abstract String[] | startSymbols(): Returns a String array of treebank start symbols. |
String | stripGF(String category): Returns the category for a String with everything following the gf character (which may be language specific) stripped. |
boolean | supportsGrammaticalStructures(): Whether or not we have typed dependencies for this language. |
TreeReaderFactory | treeReaderFactory(): Returns a TreeReaderFactory suitable for general purpose use with this language/treebank. |
TokenizerFactory<Tree> | treeTokenizerFactory(): Returns a TokenizerFactory for Trees of this language/treebank. |
Methods inherited from class Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface TreebankLanguagePack: headFinder, sentenceFinalPunctuationWords, treebankFileExtension, typedDependencyHeadFinder
protected char gfCharacter
protected static final char DEFAULT_GF_CHAR
public static final String DEFAULT_ENCODING
public AbstractTreebankLanguagePack()
public AbstractTreebankLanguagePack(char gfChar)
Parameters: gfChar - The character that sets off grammatical functions in node labels.
public abstract String[] punctuationTags()
Specified by: punctuationTags in interface TreebankLanguagePack
public abstract String[] punctuationWords()
Specified by: punctuationWords in interface TreebankLanguagePack
public abstract String[] sentenceFinalPunctuationTags()
Specified by: sentenceFinalPunctuationTags in interface TreebankLanguagePack
public String[] evalBIgnoredPunctuationTags()
Specified by: evalBIgnoredPunctuationTags in interface TreebankLanguagePack
public boolean isPunctuationTag(String str)
Specified by: isPunctuationTag in interface TreebankLanguagePack
Parameters: str - The string to check
public boolean isPunctuationWord(String str)
Specified by: isPunctuationWord in interface TreebankLanguagePack
Parameters: str - The string to check
public boolean isSentenceFinalPunctuationTag(String str)
Specified by: isSentenceFinalPunctuationTag in interface TreebankLanguagePack
Parameters: str - The string to check
public boolean isEvalBIgnoredPunctuationTag(String str)
Specified by: isEvalBIgnoredPunctuationTag in interface TreebankLanguagePack
Parameters: str - The string to check
public java.util.function.Predicate<String> punctuationTagAcceptFilter()
Specified by: punctuationTagAcceptFilter in interface TreebankLanguagePack
public java.util.function.Predicate<String> punctuationTagRejectFilter()
Specified by: punctuationTagRejectFilter in interface TreebankLanguagePack
public java.util.function.Predicate<String> punctuationWordAcceptFilter()
Specified by: punctuationWordAcceptFilter in interface TreebankLanguagePack
public java.util.function.Predicate<String> punctuationWordRejectFilter()
Specified by: punctuationWordRejectFilter in interface TreebankLanguagePack
public java.util.function.Predicate<String> sentenceFinalPunctuationTagAcceptFilter()
Specified by: sentenceFinalPunctuationTagAcceptFilter in interface TreebankLanguagePack
public java.util.function.Predicate<String> evalBIgnoredPunctuationTagAcceptFilter()
Specified by: evalBIgnoredPunctuationTagAcceptFilter in interface TreebankLanguagePack
public java.util.function.Predicate<String> evalBIgnoredPunctuationTagRejectFilter()
Specified by: evalBIgnoredPunctuationTagRejectFilter in interface TreebankLanguagePack
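The accept and reject filters above come in complementary pairs: a reject filter returns true exactly where the matching accept filter returns false. The following is a minimal sketch of that relationship in plain Java, not the library's implementation. The class name PunctuationFilterSketch and the tag inventory are hypothetical; a real language pack supplies its own tags through punctuationTags().

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

public class PunctuationFilterSketch {
    // Hypothetical tag inventory, for illustration only.
    static final String[] PUNCTUATION_TAGS = {",", ".", ":", "``", "''"};

    // Accept filter: true only for strings contained in the tag inventory.
    static Predicate<String> acceptFilter(String[] tags) {
        Set<String> tagSet = new HashSet<>(Arrays.asList(tags));
        return tagSet::contains;
    }

    // Reject filter: the logical negation of the accept filter.
    static Predicate<String> rejectFilter(String[] tags) {
        return acceptFilter(tags).negate();
    }

    public static void main(String[] args) {
        Predicate<String> accept = acceptFilter(PUNCTUATION_TAGS);
        Predicate<String> reject = rejectFilter(PUNCTUATION_TAGS);
        System.out.println(accept.test(","));   // true
        System.out.println(accept.test("NN"));  // false
        System.out.println(reject.test(","));   // false
        System.out.println(reject.test("NN"));  // true
    }
}
```

Because both filters are ordinary Predicate<String> values, they can be passed directly to stream operations such as filter() or combined with and() and or().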
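public String getEncoding()
Returns the input Charset encoding for the Treebank. See the documentation for the Charset class.
Specified by: getEncoding in interface TreebankLanguagePack
public char[] labelAnnotationIntroducingCharacters()
Specified by: labelAnnotationIntroducingCharacters in interface TreebankLanguagePack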
public String basicCategory(String category)
Returns the basic syntactic category of a String. This implementation truncates the String at an occurrence of one of the labelAnnotationIntroducingCharacters(). However, there is also special-case handling for labelAnnotationIntroducingCharacters in category labels: (i) if the first char is in this set, it is never truncated (e.g., '-' or '=' as a token), and (ii) if the String starts with one of this set, a second instance of the same item from this set is also excluded (to deal with '-LRB-', '-RCB-', etc.).
Specified by: basicCategory in interface TreebankLanguagePack
Parameters: category - The whole String name of the label
public String stripGF(String category)
Returns the category for a String with everything following the gf character (which may be language specific) stripped.
Specified by: stripGF in interface TreebankLanguagePack
Parameters: category - The String name of the label (may previously have had basic category called on it)
public java.util.function.Function<String,String> getBasicCategoryFunction()
Returns a Function object that maps Strings to Strings according to this TreebankLanguagePack's basicCategory() method.
Specified by: getBasicCategoryFunction in interface TreebankLanguagePack
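The truncation rules described for basicCategory() can be sketched in plain Java as follows. This is an illustration written from the description above, not the library source. The class name BasicCategorySketch and the two-character set {'-', '='} are assumptions; each language pack defines its own set through labelAnnotationIntroducingCharacters().

```java
import java.util.function.Function;

public class BasicCategorySketch {
    // Hypothetical set of annotation-introducing characters, for illustration only.
    static final char[] INTRODUCING_CHARS = {'-', '='};

    static boolean isIntroducingChar(char ch) {
        for (char c : INTRODUCING_CHARS) {
            if (c == ch) return true;
        }
        return false;
    }

    // Truncates the label at the first introducing character, except that
    // (i) an introducing character at position 0 is kept, and
    // (ii) one later repeat of that same leading character is also kept.
    static String basicCategory(String category) {
        if (category == null) return null;
        boolean sawAtZero = false;
        char seenAtZero = '\0';
        int i = 0;
        for (; i < category.length(); i++) {
            char ch = category.charAt(i);
            if (!isIntroducingChar(ch)) continue;
            if (i == 0) {
                sawAtZero = true;      // rule (i): never truncate at the first char
                seenAtZero = ch;
            } else if (sawAtZero && ch == seenAtZero) {
                sawAtZero = false;     // rule (ii): allow one matching repeat
            } else {
                break;                 // ordinary case: truncate here
            }
        }
        return category.substring(0, i);
    }

    public static void main(String[] args) {
        // A Function view of the method, analogous to getBasicCategoryFunction().
        Function<String, String> fn = BasicCategorySketch::basicCategory;
        System.out.println(fn.apply("NP-SBJ-1"));  // NP
        System.out.println(fn.apply("-LRB-"));     // -LRB-
        System.out.println(fn.apply("-"));         // -
    }
}
```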
public String categoryAndFunction(String category)
Returns the syntactic category and 'function' of a String, for example in the form category-function. This implementation strips numeric tags after label-introducing characters (assuming that non-numeric things are functional tags).
Specified by: categoryAndFunction in interface TreebankLanguagePack
Parameters: category - The whole String name of the label
public java.util.function.Function<String,String> getCategoryAndFunctionFunction()
Returns a Function object that maps Strings to Strings according to this TreebankLanguagePack's categoryAndFunction() method.
Specified by: getCategoryAndFunctionFunction in interface TreebankLanguagePack
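As a rough illustration of the numeric-tag stripping described for categoryAndFunction(), the sketch below repeatedly removes a trailing suffix made of one introducing character followed only by digits, and leaves non-numeric functional tags in place. It is written from the description above under stated assumptions, not taken from the library: the class name, the character set {'-', '='}, and the choice to strip only trailing numeric tags are all hypothetical.

```java
public class CategoryAndFunctionSketch {
    // Hypothetical set of annotation-introducing characters, for illustration only.
    static final char[] INTRODUCING_CHARS = {'-', '='};

    static boolean isIntroducingChar(char ch) {
        for (char c : INTRODUCING_CHARS) {
            if (c == ch) return true;
        }
        return false;
    }

    // Index of an introducing character that is followed only by one or more
    // digits up to the end of the string, or -1 if there is no such suffix.
    static int trailingNumericTagStart(String s) {
        int i = s.length() - 1;
        while (i >= 0 && Character.isDigit(s.charAt(i))) i--;
        boolean hasDigits = i < s.length() - 1;
        return (hasDigits && i >= 0 && isIntroducingChar(s.charAt(i))) ? i : -1;
    }

    // Strips trailing numeric tags such as "-1" or "=2" while keeping
    // non-numeric tags such as "-SBJ".
    static String categoryAndFunction(String category) {
        if (category == null) return null;
        String result = category;
        int cut = trailingNumericTagStart(result);
        while (cut > 0) {  // cut == 0 would leave an empty label, so stop there
            result = result.substring(0, cut);
            cut = trailingNumericTagStart(result);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(categoryAndFunction("NP-SBJ-1"));  // NP-SBJ
        System.out.println(categoryAndFunction("NP=2-1"));    // NP
        System.out.println(categoryAndFunction("NP-SBJ"));    // NP-SBJ
    }
}
```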
public boolean isLabelAnnotationIntroducingCharacter(char ch)
Specified by: isLabelAnnotationIntroducingCharacter in interface TreebankLanguagePack
Parameters: ch - The character to check
public boolean isStartSymbol(String str)
Specified by: isStartSymbol in interface TreebankLanguagePack
Parameters: str - The String to test
public java.util.function.Predicate<String> startSymbolAcceptFilter()
Specified by: startSymbolAcceptFilter in interface TreebankLanguagePack
public abstract String[] startSymbols()
Specified by: startSymbols in interface TreebankLanguagePack
public String startSymbol()
Specified by: startSymbol in interface TreebankLanguagePack
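The relationship between startSymbols(), startSymbol(), and startSymbolAcceptFilter() can be sketched as below: the single start symbol is the first array element (or null when none is defined), and the filter is a membership test over the same array. This is a plain-Java illustration rather than the library code; the class name and the symbols "ROOT" and "TOP" are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

public class StartSymbolSketch {
    // Hypothetical start symbols, for illustration only.
    static final String[] START_SYMBOLS = {"ROOT", "TOP"};

    // First start symbol, or null if none is defined.
    static String startSymbol(String[] startSymbols) {
        return (startSymbols == null || startSymbols.length == 0) ? null : startSymbols[0];
    }

    // Filter that accepts exactly the strings listed as start symbols.
    static Predicate<String> startSymbolAcceptFilter(String[] startSymbols) {
        List<String> symbols = (startSymbols == null)
                ? Collections.emptyList()
                : Arrays.asList(startSymbols);
        return s -> symbols.contains(s);
    }

    public static void main(String[] args) {
        System.out.println(startSymbol(START_SYMBOLS));                        // ROOT
        System.out.println(startSymbol(new String[0]));                        // null
        System.out.println(startSymbolAcceptFilter(START_SYMBOLS).test("TOP")); // true
        System.out.println(startSymbolAcceptFilter(START_SYMBOLS).test("S"));   // false
    }
}
```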
public TokenizerFactory<? extends HasWord> getTokenizerFactory()
Returns a tokenizer factory which might be suitable for tokenizing text that will be used with this Treebank/Language pair, without tokenizing carriage returns (i.e., treating them as white space). This implementation returns a factory for WhitespaceTokenizer.
Specified by: getTokenizerFactory in interface TreebankLanguagePack
public GrammaticalStructureFactory grammaticalStructureFactory()
Specified by: grammaticalStructureFactory in interface TreebankLanguagePack
public GrammaticalStructureFactory grammaticalStructureFactory(java.util.function.Predicate<String> puncFilt)
Specified by: grammaticalStructureFactory in interface TreebankLanguagePack
Parameters: puncFilt - A filter which should reject punctuation words (as Strings)
public GrammaticalStructureFactory grammaticalStructureFactory(java.util.function.Predicate<String> puncFilt, HeadFinder typedDependencyHeadFinder)
Specified by: grammaticalStructureFactory in interface TreebankLanguagePack
Parameters: puncFilt - A filter which should reject punctuation words (as Strings)
typedDependencyHeadFinder - A HeadFinder which finds heads for typed dependencies
public boolean supportsGrammaticalStructures()
Whether or not we have typed dependencies for this language.
Specified by: supportsGrammaticalStructures in interface TreebankLanguagePack
public char getGfCharacter()
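The gf character exposed by this getter is what stripGF() cuts on. The sketch below shows one way the getter, the setter, and stripGF() could fit together. It is a hedged illustration, not the library implementation: the class name StripGfSketch is hypothetical, and cutting at the last occurrence of the character (while leaving a label that starts with it untouched) is an assumption, since the description only says that everything following the gf character is stripped.

```java
public class StripGfSketch {
    private char gfCharacter;

    public StripGfSketch(char gfCharacter) {
        this.gfCharacter = gfCharacter;
    }

    public char getGfCharacter() {
        return gfCharacter;
    }

    public void setGfCharacter(char gfCharacter) {
        this.gfCharacter = gfCharacter;
    }

    // Assumption: cut at the last occurrence of the gf character, and leave the
    // label alone when the character is absent or sits at position 0.
    public String stripGF(String category) {
        if (category == null) return null;
        int index = category.lastIndexOf(gfCharacter);
        return index > 0 ? category.substring(0, index) : category;
    }

    public static void main(String[] args) {
        StripGfSketch pack = new StripGfSketch('-');
        System.out.println(pack.stripGF("NP-SBJ"));  // NP
        pack.setGfCharacter('=');
        System.out.println(pack.stripGF("NP-SBJ"));  // NP-SBJ
        System.out.println(pack.stripGF("NP=SBJ"));  // NP
    }
}
```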
public void setGfCharacter(char gfCharacter)
Sets the grammatical function indicating character to gfCharacter.
Specified by: setGfCharacter in interface TreebankLanguagePack
Parameters: gfCharacter - The character in label names that sets off grammatical function marking (from the phrase label).
public TreeReaderFactory treeReaderFactory()
Specified by: treeReaderFactory in interface TreebankLanguagePack
public TokenizerFactory<Tree> treeTokenizerFactory()
Specified by: treeTokenizerFactory in interface TreebankLanguagePack
public MorphoFeatureSpecification morphFeatureSpec()
Specified by: morphFeatureSpec in interface TreebankLanguagePack