public interface TreebankLanguagePack
extends java.io.Serializable
Modifier and Type | Field and Description |
---|---|
static java.lang.String | DEFAULT_ENCODING: Use this as the default encoding for Readers and Writers of Treebank data. |
Modifier and Type | Method and Description |
---|---|
java.lang.String | basicCategory(java.lang.String category): Returns the basic syntactic category of a String by truncating everything after a (non-word-initial) occurrence of one of the labelAnnotationIntroducingCharacters(). |
java.lang.String | categoryAndFunction(java.lang.String category): Returns the syntactic category and 'function' of a String. |
java.util.function.Predicate<java.lang.String> | evalBIgnoredPunctuationTagAcceptFilter(): Returns a filter that accepts a String that is a punctuation tag that should be ignored by EVALB-style evaluation, and rejects everything else. |
java.util.function.Predicate<java.lang.String> | evalBIgnoredPunctuationTagRejectFilter(): Returns a filter that accepts everything except a String that is a punctuation tag that should be ignored by EVALB-style evaluation. |
java.lang.String[] | evalBIgnoredPunctuationTags(): Returns a String array of punctuation tags that EVALB-style evaluation should ignore for this treebank/language. |
boolean | generateOriginalDependencies(): Used for languages where an original Stanford Dependency converter and a Universal Dependency converter exist. |
java.util.function.Function<java.lang.String,java.lang.String> | getBasicCategoryFunction(): Returns a Function object that maps Strings to Strings according to this TreebankLanguagePack's basicCategory method. |
java.util.function.Function<java.lang.String,java.lang.String> | getCategoryAndFunctionFunction(): Returns a Function object that maps Strings to Strings according to this TreebankLanguagePack's categoryAndFunction method. |
java.lang.String | getEncoding(): Return the charset encoding of the Treebank. |
TokenizerFactory<? extends HasWord> | getTokenizerFactory(): Return a tokenizer factory which might be suitable for tokenizing text that will be used with this Treebank/Language pair. |
GrammaticalStructureFactory | grammaticalStructureFactory(): Return a GrammaticalStructureFactory suitable for this language/treebank. |
GrammaticalStructureFactory | grammaticalStructureFactory(java.util.function.Predicate<java.lang.String> puncFilter): Return a GrammaticalStructureFactory suitable for this language/treebank. |
GrammaticalStructureFactory | grammaticalStructureFactory(java.util.function.Predicate<java.lang.String> puncFilter, HeadFinder typedDependencyHF): Return a GrammaticalStructureFactory suitable for this language/treebank. |
HeadFinder | headFinder(): The HeadFinder to use for your treebank. |
boolean | isEvalBIgnoredPunctuationTag(java.lang.String str): Accepts a String that is a punctuation tag that should be ignored by EVALB-style evaluation, and rejects everything else. |
boolean | isLabelAnnotationIntroducingCharacter(char ch): Say whether this character is an annotation-introducing character. |
boolean | isPunctuationTag(java.lang.String str): Accepts a String that is a punctuation tag name, and rejects everything else. |
boolean | isPunctuationWord(java.lang.String str): Accepts a String that is a punctuation word, and rejects everything else. |
boolean | isSentenceFinalPunctuationTag(java.lang.String str): Accepts a String that is a sentence end punctuation tag, and rejects everything else. |
boolean | isStartSymbol(java.lang.String str): Accepts a String that is a start symbol of the treebank. |
char[] | labelAnnotationIntroducingCharacters(): Return an array of characters at which a String should be truncated to give the basic syntactic category of a label. |
MorphoFeatureSpecification | morphFeatureSpec(): The morphological feature specification for the language. |
java.util.function.Predicate<java.lang.String> | punctuationTagAcceptFilter(): Return a filter that accepts a String that is a punctuation tag name, and rejects everything else. |
java.util.function.Predicate<java.lang.String> | punctuationTagRejectFilter(): Return a filter that rejects a String that is a punctuation tag name, and accepts everything else. |
java.lang.String[] | punctuationTags(): Returns a String array of punctuation tags for this treebank/language. |
java.util.function.Predicate<java.lang.String> | punctuationWordAcceptFilter(): Returns a filter that accepts a String that is a punctuation word, and rejects everything else. |
java.util.function.Predicate<java.lang.String> | punctuationWordRejectFilter(): Returns a filter that accepts a String that is not a punctuation word, and rejects punctuation. |
java.lang.String[] | punctuationWords(): Returns a String array of punctuation words for this treebank/language. |
java.util.function.Predicate<java.lang.String> | sentenceFinalPunctuationTagAcceptFilter(): Returns a filter that accepts a String that is a sentence end punctuation tag, and rejects everything else. |
java.lang.String[] | sentenceFinalPunctuationTags(): Returns a String array of sentence final punctuation tags for this treebank/language. |
java.lang.String[] | sentenceFinalPunctuationWords(): Returns a String array of sentence final punctuation words for this treebank/language. |
void | setGenerateOriginalDependencies(boolean generateOriginalDependencies): Used for languages where an original Stanford Dependency converter and a Universal Dependency converter exist. |
void | setGfCharacter(char gfCharacter): Sets the grammatical function indicating character to gfCharacter. |
java.lang.String | startSymbol(): Returns a String which is the first (perhaps unique) start symbol of the treebank, or null if none is defined. |
java.util.function.Predicate<java.lang.String> | startSymbolAcceptFilter(): Return a filter that accepts a String that is a start symbol of the treebank, and rejects everything else. |
java.lang.String[] | startSymbols(): Returns a String array of treebank start symbols. |
java.lang.String | stripGF(java.lang.String category): Returns the category for a String with everything following the gf character (which may be language specific) stripped. |
boolean | supportsGrammaticalStructures(): Whether or not we have typed dependencies for this language. |
java.lang.String | treebankFileExtension(): Returns the extension of treebank files for this treebank. |
TreeReaderFactory | treeReaderFactory(): Returns a TreeReaderFactory suitable for general purpose use with this language/treebank. |
TokenizerFactory<Tree> | treeTokenizerFactory(): Return a TokenizerFactory for Trees of this language/treebank. |
HeadFinder | typedDependencyHeadFinder(): The HeadFinder to use when making typed dependencies. |
static final java.lang.String DEFAULT_ENCODING
boolean isPunctuationTag(java.lang.String str)
str - The string to check
boolean isPunctuationWord(java.lang.String str)
str - The string to check
boolean isSentenceFinalPunctuationTag(java.lang.String str)
str - The string to check
boolean isEvalBIgnoredPunctuationTag(java.lang.String str)
str - The string to check
java.util.function.Predicate<java.lang.String> punctuationTagAcceptFilter()
java.util.function.Predicate<java.lang.String> punctuationTagRejectFilter()
java.util.function.Predicate<java.lang.String> punctuationWordAcceptFilter()
java.util.function.Predicate<java.lang.String> punctuationWordRejectFilter()
java.util.function.Predicate<java.lang.String> sentenceFinalPunctuationTagAcceptFilter()
java.util.function.Predicate<java.lang.String> evalBIgnoredPunctuationTagAcceptFilter()
java.util.function.Predicate<java.lang.String> evalBIgnoredPunctuationTagRejectFilter()
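The accept and reject filters listed above come in complementary pairs. As a minimal sketch, assuming a pack keeps its punctuation tags in a String array, such a pair could be built with java.util.function.Predicate as follows. The class PunctuationFilterSketch, its method names, and the sample tags are hypothetical and are not part of TreebankLanguagePack.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical helper for illustration; not part of the TreebankLanguagePack API.
final class PunctuationFilterSketch {

  // Accept filter: true only for Strings that appear in the tag array.
  static Predicate<String> acceptFilter(String[] tags) {
    Set<String> tagSet = new HashSet<>(Arrays.asList(tags));
    return tagSet::contains;
  }

  // Reject filter: the logical negation of the accept filter.
  static Predicate<String> rejectFilter(String[] tags) {
    return acceptFilter(tags).negate();
  }

  public static void main(String[] args) {
    // Sample tags chosen for illustration only.
    String[] punctuationTags = {",", ".", ":"};
    Predicate<String> accept = acceptFilter(punctuationTags);
    Predicate<String> reject = rejectFilter(punctuationTags);

    System.out.println(accept.test(","));  // prints true
    System.out.println(accept.test("NN")); // prints false
    System.out.println(reject.test("NN")); // prints true
  }
}
```

Deriving the reject filter by negation keeps the two filters consistent by construction, so a tag can never be accepted or rejected by both.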
java.lang.String[] punctuationTags()
java.lang.String[] punctuationWords()
java.lang.String[] sentenceFinalPunctuationTags()
java.lang.String[] sentenceFinalPunctuationWords()
java.lang.String[] evalBIgnoredPunctuationTags()
GrammaticalStructureFactory grammaticalStructureFactory()
GrammaticalStructureFactory grammaticalStructureFactory(java.util.function.Predicate<java.lang.String> puncFilter)
puncFilter - A filter which should reject punctuation words (as Strings)
GrammaticalStructureFactory grammaticalStructureFactory(java.util.function.Predicate<java.lang.String> puncFilter, HeadFinder typedDependencyHF)
puncFilter - A filter which should reject punctuation words (as Strings)
typedDependencyHF - A HeadFinder which finds heads for typed dependencies
boolean supportsGrammaticalStructures()
java.lang.String getEncoding()
See the Charset class.
TokenizerFactory<? extends HasWord> getTokenizerFactory()
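getEncoding() returns the charset as a String, so a caller would typically resolve it through the Charset class before opening a Reader on treebank data. The following is a minimal sketch under that assumption. EncodingSketch and firstLine are hypothetical names, and "UTF-8" merely stands in for whatever getEncoding() returns for a given treebank.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the TreebankLanguagePack API.
final class EncodingSketch {

  // Decodes raw treebank bytes with the named charset and returns the first line.
  static String firstLine(byte[] data, String encodingName) {
    // Charset.forName throws an unchecked exception for unknown or illegal names.
    Charset charset = Charset.forName(encodingName);
    try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(new ByteArrayInputStream(data), charset))) {
      return reader.readLine();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // Assumed stand-in for the value a language pack's getEncoding() would return.
    String encodingName = "UTF-8";
    byte[] data = "(ROOT (NP (NN caf\u00e9)))\n".getBytes(StandardCharsets.UTF_8);
    System.out.println(firstLine(data, encodingName));
  }
}
```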
char[] labelAnnotationIntroducingCharacters()
boolean isLabelAnnotationIntroducingCharacter(char ch)
ch - A char
java.lang.String basicCategory(java.lang.String category)
Returns the basic syntactic category of a String by truncating
everything after a (non-word-initial) occurrence of one of the
labelAnnotationIntroducingCharacters(). This
function should work on phrasal category and POS tag labels,
but needn't (and couldn't be expected to) work on arbitrary
Word strings.
category - The whole String name of the label
java.lang.String stripGF(java.lang.String category)
category - The String name of the label (may previously have had basic category called on it)
java.util.function.Function<java.lang.String,java.lang.String> getBasicCategoryFunction()
Returns a Function object that maps Strings to Strings according
to this TreebankLanguagePack's basicCategory method.
java.lang.String categoryAndFunction(java.lang.String category)
Returns the syntactic category and 'function' of a String, perhaps
in the form category-function.
category - The whole String name of the label
java.util.function.Function<java.lang.String,java.lang.String> getCategoryAndFunctionFunction()
Returns a Function object that maps Strings to Strings according
to this TreebankLanguagePack's categoryAndFunction method.
boolean isStartSymbol(java.lang.String str)
str - The string to test
java.util.function.Predicate<java.lang.String> startSymbolAcceptFilter()
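The truncation rule described for basicCategory above can be sketched as follows. This implements only the rule as stated (cut at the first non-word-initial annotation-introducing character); an actual language pack may treat special labels differently. BasicCategorySketch, the extra annotationChars parameter, and the sample characters '-' and '=' are assumptions made for illustration.

```java
// Hypothetical helper for illustration; not part of the TreebankLanguagePack API.
final class BasicCategorySketch {

  // True if ch is one of the annotation-introducing characters.
  static boolean isAnnotationChar(char ch, char[] annotationChars) {
    for (char candidate : annotationChars) {
      if (candidate == ch) {
        return true;
      }
    }
    return false;
  }

  // Truncates the label at the first annotation character found after index 0.
  static String basicCategory(String category, char[] annotationChars) {
    if (category == null) {
      return null;
    }
    // Start at 1 so that a word-initial annotation character is left alone.
    for (int i = 1; i < category.length(); i++) {
      if (isAnnotationChar(category.charAt(i), annotationChars)) {
        return category.substring(0, i);
      }
    }
    return category;
  }

  public static void main(String[] args) {
    // Sample annotation characters chosen for illustration only.
    char[] annotationChars = {'-', '='};
    System.out.println(basicCategory("NP-SBJ=1", annotationChars)); // prints NP
    System.out.println(basicCategory("VP", annotationChars));       // prints VP
  }
}
```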
java.lang.String[] startSymbols()
java.lang.String startSymbol()
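The three start-symbol methods above are closely related: startSymbol() is described as the first entry of the start symbols, or null if none is defined. A minimal sketch of that relationship, assuming the symbols are held in a String array; StartSymbolSketch and the sample symbols are hypothetical.

```java
// Hypothetical helper for illustration; not part of the TreebankLanguagePack API.
final class StartSymbolSketch {

  private final String[] startSymbols;

  StartSymbolSketch(String... startSymbols) {
    // Defensive copy so later changes to the caller's array have no effect.
    this.startSymbols = startSymbols.clone();
  }

  // All start symbols of the treebank.
  String[] startSymbols() {
    return startSymbols.clone();
  }

  // The first start symbol, or null if none is defined.
  String startSymbol() {
    return startSymbols.length == 0 ? null : startSymbols[0];
  }

  // True if str equals any of the start symbols.
  boolean isStartSymbol(String str) {
    for (String symbol : startSymbols) {
      if (symbol.equals(str)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // Sample symbols chosen for illustration only.
    StartSymbolSketch pack = new StartSymbolSketch("ROOT", "TOP");
    System.out.println(pack.startSymbol());        // prints ROOT
    System.out.println(pack.isStartSymbol("TOP")); // prints true
  }
}
```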
java.lang.String treebankFileExtension()
void setGfCharacter(char gfCharacter)
gfCharacter - The character in label names that sets off
grammatical function marking (from the phrase label).
TreeReaderFactory treeReaderFactory()
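The interplay of setGfCharacter and stripGF can be sketched as below: the configured character decides where a label is cut. Which occurrence is used when the character appears more than once is not stated above, so this sketch assumes the first non-initial one. GfStripperSketch and the default character '-' are hypothetical.

```java
// Hypothetical helper for illustration; not part of the TreebankLanguagePack API.
final class GfStripperSketch {

  // Assumed default; a real language pack may use a different character.
  private char gfCharacter = '-';

  // Sets the character that introduces grammatical function marking.
  void setGfCharacter(char gfCharacter) {
    this.gfCharacter = gfCharacter;
  }

  // Returns the label with the gf character and everything after it removed.
  String stripGF(String category) {
    if (category == null) {
      return null;
    }
    // Search from index 1 so a label that starts with the gf character is kept.
    int index = category.indexOf(gfCharacter, 1);
    return index > 0 ? category.substring(0, index) : category;
  }

  public static void main(String[] args) {
    GfStripperSketch stripper = new GfStripperSketch();
    System.out.println(stripper.stripGF("NP-SBJ")); // prints NP

    stripper.setGfCharacter('+');
    System.out.println(stripper.stripGF("NP-SBJ")); // prints NP-SBJ
    System.out.println(stripper.stripGF("NP+SBJ")); // prints NP
  }
}
```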
TokenizerFactory<Tree> treeTokenizerFactory()
HeadFinder headFinder()
HeadFinder typedDependencyHeadFinder()
MorphoFeatureSpecification morphFeatureSpec()
void setGenerateOriginalDependencies(boolean generateOriginalDependencies)
boolean generateOriginalDependencies()