NegraPennLanguagePack (Stanford CoreNLP API)

java.lang.Object
- edu.stanford.nlp.trees.AbstractTreebankLanguagePack
- - edu.stanford.nlp.trees.international.negra.NegraPennLanguagePack

All Implemented Interfaces:

TreebankLanguagePack, Serializable
```
public class NegraPennLanguagePack
extends AbstractTreebankLanguagePack
```
Language pack for Negra and Tiger treebanks after conversion to PTB format.

Author:

Roger Levy, Spence Green

See Also:

Serialized Form

Field Summary
- Fields inherited from class edu.stanford.nlp.trees.AbstractTreebankLanguagePack
  DEFAULT_ENCODING, DEFAULT_GF_CHAR, gfCharacter

Constructor Summary

Constructors
Constructor and Description
`NegraPennLanguagePack()` Gives a handle to the TreebankLanguagePack.
`NegraPennLanguagePack(boolean leaveGF)` Gives a handle to the TreebankLanguagePack.
`NegraPennLanguagePack(boolean leaveGF, char gfChar)` Make a new language pack with grammatical functions used based on the value of leaveGF and marked with the character gfChar.

Method Summary

All Methods Instance Methods Concrete Methods
Modifier and Type	Method and Description
`String`	`basicCategory(String category)` Returns the basic syntactic category of a String.
`String[]`	`evalBIgnoredPunctuationTags()` Returns a String array of punctuation tags that EVALB-style evaluation should ignore for this treebank/language.
`String`	`getEncoding()` Return the input Charset encoding for the Treebank.
`TokenizerFactory<Word>`	`getTokenizerFactory()` Returns a tokenizer factory which might be suitable for tokenizing text that will be used with this Treebank/Language pair, without tokenizing carriage returns (i.e., treating them as white space).
`HeadFinder`	`headFinder()` The HeadFinder to use for your treebank.
`boolean`	`isLeaveGF()`
`char[]`	`labelAnnotationIntroducingCharacters()` Return an array of characters at which a String should be truncated to give the basic syntactic category of a label.
`String[]`	`punctuationTags()` Returns a String array of punctuation tags for this treebank/language.
`String[]`	`punctuationWords()` Returns a String array of punctuation words for this treebank/language.
`String[]`	`sentenceFinalPunctuationTags()` Returns a String array of sentence final punctuation tags for this treebank/language.
`String[]`	`sentenceFinalPunctuationWords()` Returns a String array of sentence final punctuation words for this treebank/language.
`void`	`setLeaveGF(boolean leaveGF)`
`String[]`	`startSymbols()` Returns a String array of treebank start symbols.
`String`	`stripGF(String category)` Returns the category for a String with everything following the gf character (which may be language specific) stripped.
`String`	`treebankFileExtension()` Returns the extension of treebank files for this treebank.
`TreeReaderFactory`	`treeReaderFactory()` Returns a TreeReaderFactory suitable for general purpose use with this language/treebank.
`HeadFinder`	`typedDependencyHeadFinder()` The HeadFinder to use when making typed dependencies.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - NegraPennLanguagePack
```
public NegraPennLanguagePack()
```
    Gives a handle to the TreebankLanguagePack.
  - NegraPennLanguagePack
```
public NegraPennLanguagePack(boolean leaveGF)
```
    Gives a handle to the TreebankLanguagePack.
  - NegraPennLanguagePack
```
public NegraPennLanguagePack(boolean leaveGF,
                             char gfChar)
```
    Make a new language pack with grammatical functions used based on the value of leaveGF and marked with the character gfChar. gfChar should *not* be an annotation introducing character.
- Method Detail
  - punctuationTags
```
public String[] punctuationTags()
```
    Returns a String array of punctuation tags for this treebank/language.
    
    Specified by:
    
    punctuationTags in interface TreebankLanguagePack
    
    Specified by:
    
    punctuationTags in class AbstractTreebankLanguagePack
    
    Returns:
    
    The punctuation tags
  - punctuationWords
```
public String[] punctuationWords()
```
    Returns a String array of punctuation words for this treebank/language.
    
    Specified by:
    
    punctuationWords in interface TreebankLanguagePack
    
    Specified by:
    
    punctuationWords in class AbstractTreebankLanguagePack
    
    Returns:
    
    The punctuation words
  - sentenceFinalPunctuationTags
```
public String[] sentenceFinalPunctuationTags()
```
    Returns a String array of sentence final punctuation tags for this treebank/language.
    
    Specified by:
    
    sentenceFinalPunctuationTags in interface TreebankLanguagePack
    
    Specified by:
    
    sentenceFinalPunctuationTags in class AbstractTreebankLanguagePack
    
    Returns:
    
    The sentence final punctuation tags
  - sentenceFinalPunctuationWords
```
public String[] sentenceFinalPunctuationWords()
```
    Returns a String array of sentence final punctuation words for this treebank/language.
    
    Returns:
    
    The sentence final punctuation words
  - basicCategory
```
public String basicCategory(String category)
```
    Description copied from class: AbstractTreebankLanguagePack
    
    Returns the basic syntactic category of a String. This implementation truncates the String at the first occurrence of one of the labelAnnotationIntroducingCharacters(). Two special cases handle such characters when they appear in category labels themselves: (i) if the first char is in this set, it is never truncated (e.g., '-' or '=' as a token), and (ii) if the label starts with a character from this set, a second instance of the same character is also excluded from truncation (to deal with '-LRB-', '-RCB-', etc.).
    
    Specified by:
    
    basicCategory in interface TreebankLanguagePack
    
    Overrides:
    
    basicCategory in class AbstractTreebankLanguagePack
    
    Parameters:
    
    category - The whole String name of the label
    
    Returns:
    
    The basic category of the String
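    
    The truncation rule described above can be sketched in plain Java. This is a minimal illustration that does not depend on CoreNLP, not the library's actual code: the class name `BasicCategorySketch`, the helper `isIntroducer`, and the choice of '-' and '=' as introducing characters are assumptions made for the example.

```java
// Hypothetical stand-alone sketch; names are illustrative, not CoreNLP API.
class BasicCategorySketch {

  // True if ch is one of the annotation-introducing characters.
  static boolean isIntroducer(char ch, char[] introducers) {
    for (char c : introducers) {
      if (c == ch) {
        return true;
      }
    }
    return false;
  }

  // Truncates category at the first introducing character, except that
  // (i) an introducing character at index 0 is kept, and
  // (ii) one later repeat of that same leading character is also kept.
  static String basicCategory(String category, char[] introducers) {
    if (category == null) {
      return null;
    }
    boolean sawAtZero = false;
    char seenAtZero = (char) 0;
    int i = 0;
    for (int len = category.length(); i < len; i++) {
      char ch = category.charAt(i);
      if (isIntroducer(ch, introducers)) {
        if (i == 0) {
          sawAtZero = true;      // case (i): leading character is kept
          seenAtZero = ch;
        } else if (sawAtZero && ch == seenAtZero) {
          sawAtZero = false;     // case (ii): keep one matching repeat
        } else {
          break;                 // ordinary case: truncate here
        }
      }
    }
    return category.substring(0, i);
  }

  public static void main(String[] args) {
    char[] introducers = {'-', '='};  // assumed set, taken from the example labels
    System.out.println(basicCategory("NP-SBJ=1", introducers)); // NP
    System.out.println(basicCategory("-LRB-", introducers));    // -LRB-
    System.out.println(basicCategory("-", introducers));        // -
  }
}
```

    The set of introducing characters that NegraPennLanguagePack itself uses may differ from the one assumed here; labelAnnotationIntroducingCharacters() returns the actual array.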
  - stripGF
```
public String stripGF(String category)
```
    Description copied from interface: TreebankLanguagePack
    
    Returns the category for a String with everything following the gf character (which may be language specific) stripped.
    
    Specified by:
    
    stripGF in interface TreebankLanguagePack
    
    Overrides:
    
    stripGF in class AbstractTreebankLanguagePack
    
    Parameters:
    
    category - The String name of the label (may previously have had basic category called on it)
    
    Returns:
    
    The String stripped of grammatical functions
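    
    As a rough illustration of stripping a grammatical function suffix, the following stand-alone sketch cuts a label at the gf character. It is not the library's implementation: the class name `StripGFSketch` is hypothetical, and cutting at the last occurrence of the character (and leaving a label alone when the character is at index 0) are assumptions made here.

```java
// Hypothetical stand-alone sketch; names are illustrative, not CoreNLP API.
class StripGFSketch {

  // Removes the gf character and everything after it.
  // Assumption: the cut is made at the LAST occurrence of gfChar, and a
  // label that starts with gfChar (index 0) is returned unchanged.
  static String stripGF(String category, char gfChar) {
    if (category == null) {
      return null;
    }
    int index = category.lastIndexOf(gfChar);
    if (index > 0) {
      return category.substring(0, index);
    }
    return category;
  }

  public static void main(String[] args) {
    // "NP-SB" is an example label: category NP with function SB after '-'.
    System.out.println(stripGF("NP-SB", '-')); // NP
    System.out.println(stripGF("NP", '-'));    // NP
  }
}
```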
  - evalBIgnoredPunctuationTags
```
public String[] evalBIgnoredPunctuationTags()
```
    Returns a String array of punctuation tags that EVALB-style evaluation should ignore for this treebank/language. Traditionally, EVALB has ignored a subset of the total set of punctuation tags in the English Penn Treebank (quotes and period, comma, colon, etc., but not brackets).
    
    Specified by:
    
    evalBIgnoredPunctuationTags in interface TreebankLanguagePack
    
    Overrides:
    
    evalBIgnoredPunctuationTags in class AbstractTreebankLanguagePack
    
    Returns:
    
    The punctuation tags that EVALB-style evaluation ignores
  - labelAnnotationIntroducingCharacters
```
public char[] labelAnnotationIntroducingCharacters()
```
    Return an array of characters at which a String should be truncated to give the basic syntactic category of a label. The idea here is that Penn Treebank-style labels follow a syntactic category with various functional and cross-referencing information introduced by special characters (such as "NP-SBJ=1"). This would be truncated to "NP" by the array containing '-' and '='.
    
    Specified by:
    
    labelAnnotationIntroducingCharacters in interface TreebankLanguagePack
    
    Overrides:
    
    labelAnnotationIntroducingCharacters in class AbstractTreebankLanguagePack
    
    Returns:
    
    An array of characters that set off label name suffixes
  - startSymbols
```
public String[] startSymbols()
```
    Returns a String array of treebank start symbols.
    
    Specified by:
    
    startSymbols in interface TreebankLanguagePack
    
    Specified by:
    
    startSymbols in class AbstractTreebankLanguagePack
    
    Returns:
    
    The start symbols
  - getEncoding
```
public String getEncoding()
```
    Return the input Charset encoding for the Treebank. See documentation for the Charset class.
    
    Specified by:
    
    getEncoding in interface TreebankLanguagePack
    
    Overrides:
    
    getEncoding in class AbstractTreebankLanguagePack
    
    Returns:
    
    Name of Charset
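    
    Because the method returns a charset name rather than a Charset object, a caller would typically resolve it with `Charset.forName`. The sketch below shows that step using only the JDK; the class name `EncodingSketch` is hypothetical, and "ISO-8859-1" is only an example name, not a claim about what this language pack returns.

```java
import java.nio.charset.Charset;

// Hypothetical stand-alone sketch; names are illustrative, not CoreNLP API.
class EncodingSketch {

  // Decodes raw treebank bytes using a charset given by name, as a caller
  // might do with the String returned by getEncoding().
  static String decode(byte[] bytes, String charsetName) {
    Charset cs = Charset.forName(charsetName); // throws if the name is unsupported
    return new String(bytes, cs);
  }

  public static void main(String[] args) {
    // The German word "Stra\u00DFe" encoded as ISO-8859-1: 0xDF is the sharp s.
    byte[] raw = {0x53, 0x74, 0x72, 0x61, (byte) 0xDF, 0x65};
    String word = decode(raw, "ISO-8859-1"); // example name only
    System.out.println(word.length()); // 6
  }
}
```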
  - treebankFileExtension
```
public String treebankFileExtension()
```
    Returns the extension of treebank files for this treebank. This is "mrg".
    
    Returns:
    
    the extension on files for this treebank
  - isLeaveGF
```
public boolean isLeaveGF()
```
  - setLeaveGF
```
public void setLeaveGF(boolean leaveGF)
```
  - treeReaderFactory
```
public TreeReaderFactory treeReaderFactory()
```
    Description copied from class: AbstractTreebankLanguagePack
    
    Returns a TreeReaderFactory suitable for general purpose use with this language/treebank.
    
    Specified by:
    
    treeReaderFactory in interface TreebankLanguagePack
    
    Overrides:
    
    treeReaderFactory in class AbstractTreebankLanguagePack
    
    Returns:
    
    A TreeReaderFactory suitable for general purpose use with this language/treebank.
  - headFinder
```
public HeadFinder headFinder()
```
    The HeadFinder to use for your treebank.
    
    Returns:
    
    A suitable HeadFinder
  - typedDependencyHeadFinder
```
public HeadFinder typedDependencyHeadFinder()
```
    The HeadFinder to use when making typed dependencies.
    
    Returns:
    
    A suitable HeadFinder
  - getTokenizerFactory
```
public TokenizerFactory<Word> getTokenizerFactory()
```
    Returns a tokenizer factory which might be suitable for tokenizing text that will be used with this Treebank/Language pair, without tokenizing carriage returns (i.e., treating them as white space). For German (Negra) this class used to provide only a WhitespaceTokenizer, but users found that unsatisfactory, so it now provides PTBTokenizer. PTBTokenizer is not customized for German, but it nevertheless tokenizes German better than WhitespaceTokenizer.
    
    Specified by:
    
    getTokenizerFactory in interface TreebankLanguagePack
    
    Overrides:
    
    getTokenizerFactory in class AbstractTreebankLanguagePack
    
    Returns:
    
    A tokenizer factory
