java.lang.Object
  edu.stanford.nlp.trees.AbstractTreebankLanguagePack
    edu.stanford.nlp.trees.international.negra.NegraPennLanguagePack
public class NegraPennLanguagePack extends AbstractTreebankLanguagePack
Language pack for Negra and Tiger treebanks after conversion to PTB format.
| Field Summary |
|---|

| Fields inherited from class edu.stanford.nlp.trees.AbstractTreebankLanguagePack |
|---|
| DEFAULT_ENCODING, DEFAULT_GF_CHAR, gfCharacter |
| Constructor Summary | |
|---|---|
| NegraPennLanguagePack() | Gives a handle to the TreebankLanguagePack. |
| NegraPennLanguagePack(boolean leaveGF, char gfChar) | Make a new language pack with grammatical functions used based on the value of leaveGF and marked with the character gfChar. |
| Method Summary | | |
|---|---|---|
| java.lang.String | basicCategory(java.lang.String category) | Returns the basic syntactic category of a String. |
| java.lang.String[] | evalBIgnoredPunctuationTags() | Returns a String array of punctuation tags that EVALB-style evaluation should ignore for this treebank/language. |
| java.lang.String | getEncoding() | Return the input Charset encoding for the Treebank. |
| TokenizerFactory<Word> | getTokenizerFactory() | Return a tokenizer which might be suitable for tokenizing text that will be used with this Treebank/Language pair, without tokenizing carriage returns (i.e., treating them as white space). |
| HeadFinder | headFinder() | The HeadFinder to use for your treebank. |
| boolean | isLeaveGF() | |
| char[] | labelAnnotationIntroducingCharacters() | Return an array of characters at which a String should be truncated to give the basic syntactic category of a label. |
| java.lang.String[] | punctuationTags() | Returns a String array of punctuation tags for this treebank/language. |
| java.lang.String[] | punctuationWords() | Returns a String array of punctuation words for this treebank/language. |
| java.lang.String[] | sentenceFinalPunctuationTags() | Returns a String array of sentence final punctuation tags for this treebank/language. |
| java.lang.String[] | sentenceFinalPunctuationWords() | Returns a String array of sentence final punctuation words for this treebank/language. |
| void | setLeaveGF(boolean leaveGF) | |
| java.lang.String[] | startSymbols() | Returns a String array of treebank start symbols. |
| java.lang.String | stripGF(java.lang.String category) | Returns the category for a String with everything following the gf character (which may be language specific) stripped. |
| java.lang.String | treebankFileExtension() | Returns the extension of treebank files for this treebank. |
| TreeReaderFactory | treeReaderFactory() | Returns a TreeReaderFactory suitable for general purpose use with this language/treebank. |
| HeadFinder | typedDependencyHeadFinder() | The HeadFinder to use when making typed dependencies. |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
public NegraPennLanguagePack()
public NegraPennLanguagePack(boolean leaveGF,
char gfChar)
| Method Detail |
|---|
public java.lang.String[] punctuationTags()
Specified by: punctuationTags in interface TreebankLanguagePack
Overrides: punctuationTags in class AbstractTreebankLanguagePack

public java.lang.String[] punctuationWords()
Specified by: punctuationWords in interface TreebankLanguagePack
Overrides: punctuationWords in class AbstractTreebankLanguagePack

public java.lang.String[] sentenceFinalPunctuationTags()
Specified by: sentenceFinalPunctuationTags in interface TreebankLanguagePack
Overrides: sentenceFinalPunctuationTags in class AbstractTreebankLanguagePack

public java.lang.String[] sentenceFinalPunctuationWords()
public java.lang.String basicCategory(java.lang.String category)
Description copied from class: AbstractTreebankLanguagePack
Returns the basic syntactic category of a String. This implementation truncates the label at an occurrence of one of the labelAnnotationIntroducingCharacters(). However, there are two special cases for labelAnnotationIntroducingCharacters in category labels: (i) if the first character is in this set, the label is never truncated there (e.g., '-' or '=' as a token), and (ii) if the label starts with a character from this set, a second instance of the same character is also skipped (to deal with '-LCB-', '-RCB-', etc.).
Specified by: basicCategory in interface TreebankLanguagePack
Overrides: basicCategory in class AbstractTreebankLanguagePack
Parameters: category - The whole String name of the label
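
The truncation rule described for basicCategory can be sketched in plain Java without the library. This is an illustration of the documented behaviour, not the library's source: the class name `BasicCategorySketch`, the helper `postBasicCategoryIndex`, and the character set `"-="` are assumptions chosen for the example.

```java
final class BasicCategorySketch {

    // Hypothetical set of annotation-introducing characters, for illustration only.
    static final String ANNOTATION_CHARS = "-=";

    /** Returns the index just past the basic category part of the label. */
    static int postBasicCategoryIndex(String category, String annotationChars) {
        boolean sawAtZero = false;   // true while a leading special character is unmatched
        char seenAtZero = '\u0000';  // the special character found at position 0, if any
        int i = 0;
        for (int len = category.length(); i < len; i++) {
            char ch = category.charAt(i);
            if (annotationChars.indexOf(ch) >= 0) {
                if (i == 0) {
                    // Case (i): a special character in first position never truncates.
                    sawAtZero = true;
                    seenAtZero = ch;
                } else if (sawAtZero && ch == seenAtZero) {
                    // Case (ii): the matching second instance (as in "-LCB-") is skipped too.
                    sawAtZero = false;
                } else {
                    break; // ordinary case: truncate here
                }
            }
        }
        return i;
    }

    static String basicCategory(String category, String annotationChars) {
        if (category == null) {
            return null;
        }
        return category.substring(0, postBasicCategoryIndex(category, annotationChars));
    }

    public static void main(String[] args) {
        for (String label : new String[] {"NP-SB", "-", "-LCB-", "-LCB-=2", "S=1"}) {
            System.out.println(label + " -> " + basicCategory(label, ANNOTATION_CHARS));
        }
    }
}
```

With this character set, a label such as `NP-SB` is cut back to `NP`, while a bare `-` token and a bracket escape such as `-LCB-` are returned whole.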
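public java.lang.String stripGF(java.lang.String category)
Description copied from interface: TreebankLanguagePack
Returns the category for a String with everything following the gf character (which may be language specific) stripped.
Specified by: stripGF in interface TreebankLanguagePack
Overrides: stripGF in class AbstractTreebankLanguagePack
Parameters: category - The String name of the label (may previously have had basicCategory called on it)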
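
A minimal standalone sketch of what stripping a grammatical function could look like follows. It is not the library's code: the class `StripGfSketch` is hypothetical, the gf character is passed in explicitly, and cutting at the last occurrence of that character (rather than the first) is an assumption of this sketch.

```java
final class StripGfSketch {

    /** Removes the gf character and everything after it, if it occurs past position 0. */
    static String stripGF(String category, char gfChar) {
        if (category == null) {
            return null;
        }
        // Assumption: cut at the last occurrence of the gf character.
        int index = category.lastIndexOf(gfChar);
        // If the only occurrence is at position 0, the label is left unchanged.
        if (index > 0) {
            category = category.substring(0, index);
        }
        return category;
    }

    public static void main(String[] args) {
        for (String label : new String[] {"NP-SB", "NP", "-"}) {
            System.out.println(label + " -> " + stripGF(label, '-'));
        }
    }
}
```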
public java.lang.String[] evalBIgnoredPunctuationTags()
Specified by: evalBIgnoredPunctuationTags in interface TreebankLanguagePack
Overrides: evalBIgnoredPunctuationTags in class AbstractTreebankLanguagePack

public char[] labelAnnotationIntroducingCharacters()
Specified by: labelAnnotationIntroducingCharacters in interface TreebankLanguagePack
Overrides: labelAnnotationIntroducingCharacters in class AbstractTreebankLanguagePack

public java.lang.String[] startSymbols()
Specified by: startSymbols in interface TreebankLanguagePack
Overrides: startSymbols in class AbstractTreebankLanguagePack

public java.lang.String getEncoding()
Return the input Charset encoding for the Treebank. See the documentation for the Charset class.
Specified by: getEncoding in interface TreebankLanguagePack
Overrides: getEncoding in class AbstractTreebankLanguagePack

public java.lang.String treebankFileExtension()
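
Since getEncoding() returns the encoding as a String, a caller would typically resolve it through java.nio.charset.Charset before reading treebank files. The sketch below shows that step only; the class `EncodingSketch` and the encoding name `"ISO-8859-1"` are placeholders for illustration, not values confirmed by this page.

```java
import java.nio.charset.Charset;

final class EncodingSketch {

    /** Decodes raw bytes using a charset looked up by name. */
    static String decode(byte[] bytes, String encodingName) {
        // Charset.forName throws UnsupportedCharsetException for unknown names.
        Charset charset = Charset.forName(encodingName);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // The bytes of a German word containing a sharp s, as encoded in ISO-8859-1.
        byte[] raw = {0x53, 0x74, 0x72, 0x61, (byte) 0xDF, 0x65};
        // "ISO-8859-1" stands in for whatever getEncoding() would return.
        System.out.println(decode(raw, "ISO-8859-1"));
    }
}
```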
public boolean isLeaveGF()
public void setLeaveGF(boolean leaveGF)
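public TreeReaderFactory treeReaderFactory()
Description copied from class: AbstractTreebankLanguagePack
Returns a TreeReaderFactory suitable for general purpose use with this language/treebank.
Specified by: treeReaderFactory in interface TreebankLanguagePack
Overrides: treeReaderFactory in class AbstractTreebankLanguagePack

public HeadFinder headFinder()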
public HeadFinder typedDependencyHeadFinder()
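public TokenizerFactory<Word> getTokenizerFactory()
Return a tokenizer which might be suitable for tokenizing text that will be used with this Treebank/Language pair, without tokenizing carriage returns (i.e., treating them as white space). This method used to return a WhitespaceTokenizer, but people didn't much like that. So now we provide PTBTokenizer. It's not customized to German, but will nevertheless do better than WhitespaceTokenizer at tokenizing German!
Specified by: getTokenizerFactory in interface TreebankLanguagePack
Overrides: getTokenizerFactory in class AbstractTreebankLanguagePack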
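
To see why whitespace-only tokenization tends to disappoint, the sketch below contrasts it with a simple punctuation-aware splitter on German text. Neither method is WhitespaceTokenizer or PTBTokenizer: the class `TokenizerComparisonSketch` and its regular expression are stand-ins written for this example, and a real PTB-style tokenizer handles many more cases.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TokenizerComparisonSketch {

    // A run of letters/digits, or a single non-space symbol such as punctuation.
    private static final Pattern TOKEN =
            Pattern.compile("[\\p{L}\\p{N}]+|[^\\s\\p{L}\\p{N}]");

    /** Splits on whitespace only, so punctuation stays glued to words. */
    static List<String> whitespaceTokens(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
    }

    /** Separates punctuation marks from adjacent words. */
    static List<String> punctuationAwareTokens(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String sentence = "Die Stra\u00DFe, bitte.";
        System.out.println(whitespaceTokens(sentence));
        System.out.println(punctuationAwareTokens(sentence));
    }
}
```

The whitespace version leaves the comma and the final period attached to the preceding words, so those tokens would not match the word forms or punctuation tags seen in a treebank.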