public abstract class AbstractTreebankParserParams extends Object implements TreebankLangParserParams
An abstract class providing a common method base from which to complete a
TreebankLangParserParams implementing class.
Some extending classes need access to special attributes of the
corresponding TreebankLanguagePack while still taking advantage of
this class's code for making the TreebankLanguagePack accessible.
A good way to do this is to pass a new instance of the appropriate
TreebankLanguagePack into this class's constructor, then retrieve it
later by casting the result of treebankLanguagePack().
See ChineseTreebankParserParams for an example.

Modifier and Type | Class and Description |
---|---|
protected static class | AbstractTreebankParserParams.AnnotatePunctuationFunction: Annotation function for mapping punctuation to PTB-style equivalence classes. |
protected class | AbstractTreebankParserParams.RemoveGFSubcategoryStripper: The job of this class is to remove subcategorizations from tag and category nodes, so as to put a tree in a suitable state for evaluation. |
protected class | AbstractTreebankParserParams.SubcategoryStripper: The job of this class is to remove subcategorizations from tag and category nodes, so as to put a tree in a suitable state for evaluation. |
Modifier and Type | Field and Description |
---|---|
protected boolean | evalGF: If true, evaluation is over grammatical functions as well as the labels. If false, grammatical functions are stripped for evaluation. |
protected String | inputEncoding |
protected String | outputEncoding |
protected TreebankLanguagePack | tlp |
Modifier | Constructor and Description |
---|---|
protected | AbstractTreebankParserParams(TreebankLanguagePack tlp): Stores the passed-in TreebankLanguagePack and sets up charset encodings. |
Modifier and Type | Method and Description |
---|---|
abstract TreeTransformer | collinizer(): The tree transformer used to produce trees for evaluation. |
abstract TreeTransformer | collinizerEvalb(): The tree transformer used to produce trees for evaluation. |
String[] | defaultCoreNLPFlags(): The flags to use by default when run inside StanfordCoreNLP. |
Extractor<DependencyGrammar> | dependencyGrammarExtractor(Options op, Index<String> wordIndex, Index<String> tagIndex) |
static <E> Collection<E> | dependencyObjectify(Tree t, HeadFinder hf, TreeTransformer collinizer, DependencyTyper<E> typer): Returns the set of dependencies in a tree, according to some DependencyTyper. |
abstract DiskTreebank | diskTreebank(): Returns a DiskTreebank appropriate to the treebank source. |
abstract void | display(): Displays language-specific settings. |
GrammaticalStructure | getGrammaticalStructure(Tree t, java.util.function.Predicate<String> filter, HeadFinder hf): Build a GrammaticalStructure from a Tree. |
String | getInputEncoding(): Returns the input encoding being used. |
String | getOutputEncoding(): Returns the output encoding being used. |
abstract HeadFinder | headFinder(): The HeadFinder to use for your treebank. |
boolean | isEvalGF() |
Lexicon | lex(Options op, Index<String> wordIndex, Index<String> tagIndex): Vends a Lexicon object suitable to the particular language/treebank combination of interest. |
abstract MemoryTreebank | memoryTreebank(): Returns a MemoryTreebank appropriate to the treebank source. |
double[] | MLEDependencyGrammarSmoothingParams(): Gives the parameters for smoothing in the MLEDependencyGrammar. |
static Collection<Constituent> | parsevalObjectify(Tree t, TreeTransformer collinizer): Takes a Tree and a collinizer and returns a Collection of labeled Constituents for PARSEVAL. |
static Collection<Constituent> | parsevalObjectify(Tree t, TreeTransformer collinizer, boolean labelConstituents): Takes a Tree and a collinizer and returns a Collection of Constituents for PARSEVAL evaluation. |
AbstractEval | ppAttachmentEval(): Returns a language-specific object for evaluating PP attachment. |
Label | processHeadWord(Label headWord): Allows language-specific processing (e.g., stemming) of head words. |
PrintWriter | pw(): The PrintWriter used to print output. |
PrintWriter | pw(OutputStream o): The PrintWriter used to print output. |
List<GrammaticalStructure> | readGrammaticalStructureFromFile(String filename): Reads the given file and turns its content into a list of GrammaticalStructures. |
void | setEvalGF(boolean evalGF) |
void | setEvaluateGrammaticalFunctions(boolean evalGFs): Sets whether to consider grammatical functions in evaluation. |
void | setInputEncoding(String encoding): Sets the input encoding. |
int | setOptionFlag(String[] args, int i): Set language-specific options according to flags. |
void | setOutputEncoding(String encoding): Sets the output encoding. |
abstract String[] | sisterSplitters(): Returns the splitting strings used for selective splits. |
TreeTransformer | subcategoryStripper(): Returns a TreeTransformer appropriate to the Treebank which can be used to remove functional tags (such as "-TMP") from categories. |
boolean | supportsBasicDependencies(): By default, parsers are assumed to not support dependencies. |
MemoryTreebank | testMemoryTreebank(): You can often return the same thing for testMemoryTreebank as for memoryTreebank. |
abstract Tree | transformTree(Tree t, Tree root): This method does language-specific tree transformations such as annotating particular nodes with language-relevant features. |
Treebank | treebank(): Implemented as required by TreebankFactory. |
TreebankLanguagePack | treebankLanguagePack(): Returns an appropriate TreebankLanguagePack. |
TokenizerFactory<Tree> | treeTokenizerFactory() |
static EquivalenceClasser<List<String>,String> | typedDependencyClasser(): Returns an EquivalenceClasser that classes typed dependencies by the syntactic categories of mother, head and daughter, plus direction. |
abstract HeadFinder | typedDependencyHeadFinder(): The HeadFinder to use when extracting typed dependencies. |
static Collection<List<String>> | typedDependencyObjectify(Tree t, HeadFinder hf, TreeTransformer collinizer): Returns a collection of word-word dependencies typed by mother, head, daughter node syntactic categories. |
static Collection<List<String>> | unorderedTypedDependencyObjectify(Tree t, HeadFinder hf, TreeTransformer collinizer): Returns a collection of unordered (but directed!) typed word-word dependencies for the tree. |
static Collection<List<String>> | unorderedUntypedDependencyObjectify(Tree t, HeadFinder hf, TreeTransformer collinizer): Returns a collection of unordered (but directed!) untyped word-word dependencies for the tree. |
static Collection<List<String>> | untypedDependencyObjectify(Tree t, HeadFinder hf, TreeTransformer collinizer): Returns a collection of untyped word-word dependencies for the tree. |
Methods inherited from class Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface TreebankLangParserParams: defaultTestSentence, treeReaderFactory
protected boolean evalGF
protected String inputEncoding
protected String outputEncoding
protected TreebankLanguagePack tlp
protected AbstractTreebankParserParams(TreebankLanguagePack tlp)
Parameters:
tlp - The treebank language pack to use
public Label processHeadWord(Label headWord)
Description copied from interface: TreebankLangParserParams
Allows language-specific processing (e.g., stemming) of head words.
Specified by: processHeadWord in interface TreebankLangParserParams
Parameters:
headWord - A Label that minimally implements the HasWord and HasTag interfaces.
Returns: Label
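As a rough illustration of the kind of head-word processing this hook allows, the sketch below lowercases a word and, for plural nouns, drops a trailing "s". SimpleLabel is a hypothetical stand-in for the real Label, HasWord, and HasTag types, and the crude stemming rule is an assumption made purely for illustration.

```java
import java.util.Locale;

final class HeadWordSketch {
    // Hypothetical stand-in for a Label that carries a word and a tag;
    // it is not the CoreNLP Label/HasWord/HasTag API.
    static final class SimpleLabel {
        final String word;
        final String tag;

        SimpleLabel(String word, String tag) {
            this.word = word;
            this.tag = tag;
        }
    }

    // Crude illustrative "stemmer": lowercases the word and, for the
    // plural noun tag NNS, drops one trailing "s". The tag is unchanged.
    static SimpleLabel processHeadWord(SimpleLabel headWord) {
        String w = headWord.word.toLowerCase(Locale.ROOT);
        if ("NNS".equals(headWord.tag) && w.length() > 1 && w.endsWith("s")) {
            w = w.substring(0, w.length() - 1);
        }
        return new SimpleLabel(w, headWord.tag);
    }

    public static void main(String[] args) {
        SimpleLabel out = processHeadWord(new SimpleLabel("Parsers", "NNS"));
        System.out.println(out.word + "/" + out.tag); // prints "parser/NNS"
    }
}
```

A real override would return an object of the same Label type it received; the stand-in type here only keeps the sketch free of library dependencies.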
public void setEvaluateGrammaticalFunctions(boolean evalGFs)
Specified by: setEvaluateGrammaticalFunctions in interface TreebankLangParserParams
public void setInputEncoding(String encoding)
Specified by: setInputEncoding in interface TreebankLangParserParams
public void setOutputEncoding(String encoding)
Specified by: setOutputEncoding in interface TreebankLangParserParams
public String getOutputEncoding()
Specified by: getOutputEncoding in interface TreebankLangParserParams
public String getInputEncoding()
Specified by: getInputEncoding in interface TreebankLangParserParams
public AbstractEval ppAttachmentEval()
Specified by: ppAttachmentEval in interface TreebankLangParserParams
Returns: AbstractEval
public abstract MemoryTreebank memoryTreebank()
Specified by: memoryTreebank in interface TreebankLangParserParams
public abstract DiskTreebank diskTreebank()
Specified by: diskTreebank in interface TreebankLangParserParams
public MemoryTreebank testMemoryTreebank()
Specified by: testMemoryTreebank in interface TreebankLangParserParams
public Treebank treebank()
Specified by: treebank in interface TreebankLangParserParams
Specified by: treebank in interface TreebankFactory
public PrintWriter pw()
Specified by: pw in interface TreebankLangParserParams
public PrintWriter pw(OutputStream o)
Specified by: pw in interface TreebankLangParserParams
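A minimal sketch of how a PrintWriter can honour a configured output encoding. It assumes, without confirmation from the text above, that pw(OutputStream) wraps the stream using the current value of outputEncoding. The class EncodingWriterSketch and its helper encodedLength are hypothetical; only JDK classes are used.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UnsupportedEncodingException;

final class EncodingWriterSketch {
    // Wraps the stream in an auto-flushing PrintWriter that encodes
    // characters with the named charset (for example "UTF-8").
    static PrintWriter pw(OutputStream o, String outputEncoding) {
        try {
            return new PrintWriter(new OutputStreamWriter(o, outputEncoding), true);
        } catch (UnsupportedEncodingException e) {
            // Unknown charset name: surface it as a configuration error.
            throw new IllegalArgumentException("Unsupported encoding: " + outputEncoding, e);
        }
    }

    // Demo helper: number of bytes produced when s is written in the encoding.
    static int encodedLength(String s, String encoding) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        PrintWriter w = pw(bytes, encoding);
        w.print(s);
        w.flush(); // print() does not auto-flush, so flush explicitly
        return bytes.size();
    }

    public static void main(String[] args) {
        String twoHanCharacters = "\u4e2d\u6587";
        // The same two characters occupy a different number of bytes
        // depending on the encoding.
        System.out.println(encodedLength(twoHanCharacters, "UTF-8"));    // prints 6
        System.out.println(encodedLength(twoHanCharacters, "UTF-16BE")); // prints 4
    }
}
```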
public TreebankLanguagePack treebankLanguagePack()
Specified by: treebankLanguagePack in interface TreebankLangParserParams
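The casting pattern mentioned in the class description can be sketched as follows. LanguagePack, ChineseLanguagePack, BaseParams, and ChineseParams are simplified, hypothetical stand-ins for TreebankLanguagePack and the parser params classes, and the keepPunctuation attribute is invented for illustration.

```java
final class LanguagePackCastSketch {
    // Stand-in for TreebankLanguagePack.
    interface LanguagePack {
        String encoding();
    }

    // Stand-in for a language-specific pack with an extra attribute.
    static final class ChineseLanguagePack implements LanguagePack {
        boolean keepPunctuation = false; // hypothetical special attribute

        @Override
        public String encoding() {
            return "UTF-8";
        }
    }

    // Stand-in for the abstract base class: stores and exposes the pack.
    abstract static class BaseParams {
        protected final LanguagePack tlp;

        protected BaseParams(LanguagePack tlp) {
            this.tlp = tlp;
        }

        LanguagePack treebankLanguagePack() {
            return tlp;
        }
    }

    // Stand-in for a subclass that needs the pack's special attribute.
    static final class ChineseParams extends BaseParams {
        ChineseParams() {
            // Pass a new instance of the specific pack to the base constructor.
            super(new ChineseLanguagePack());
        }

        void setKeepPunctuation(boolean keep) {
            // Get the pack back and cast it to reach the special attribute.
            ((ChineseLanguagePack) treebankLanguagePack()).keepPunctuation = keep;
        }

        boolean keepsPunctuation() {
            return ((ChineseLanguagePack) treebankLanguagePack()).keepPunctuation;
        }
    }

    public static void main(String[] args) {
        ChineseParams params = new ChineseParams();
        params.setKeepPunctuation(true);
        System.out.println(params.keepsPunctuation()); // prints "true"
    }
}
```

The cast is safe here only because the subclass itself chose which pack to pass to the constructor.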
public abstract HeadFinder headFinder()
Specified by: headFinder in interface TreebankLangParserParams
public abstract HeadFinder typedDependencyHeadFinder()
Specified by: typedDependencyHeadFinder in interface TreebankLangParserParams
public Lexicon lex(Options op, Index<String> wordIndex, Index<String> tagIndex)
Description copied from interface: TreebankLangParserParams
Vends a Lexicon object suitable to the particular language/treebank combination of interest.
Specified by: lex in interface TreebankLangParserParams
Parameters:
op - Options as to how the Lexicon behaves
public double[] MLEDependencyGrammarSmoothingParams()
Specified by: MLEDependencyGrammarSmoothingParams in interface TreebankLangParserParams
public static Collection<Constituent> parsevalObjectify(Tree t, TreeTransformer collinizer)
Takes a Tree and a collinizer and returns a Collection of labeled Constituents for PARSEVAL.
Parameters:
t - The tree to extract constituents from
collinizer - The TreeTransformer used to normalize the tree for evaluation
public static Collection<Constituent> parsevalObjectify(Tree t, TreeTransformer collinizer, boolean labelConstituents)
Takes a Tree and a collinizer and returns a Collection of Constituents for PARSEVAL evaluation. Some notes on this particular parseval: whether the bracketings are labeled or unlabeled depends on the value of the labelConstituents parameter.
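To make the idea of turning a tree into PARSEVAL constituents concrete, here is a minimal sketch that collects one bracket per phrasal node. Node and the string encoding of a bracket are hypothetical stand-ins for Tree and Constituent. The collinizer step is omitted, and the use of token-index spans is an assumption of this sketch, not a statement about how the real method indexes spans.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class ParsevalSketch {
    // Hypothetical stand-in for Tree: a label plus children.
    // A node without children is a word.
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }

        boolean isLeaf() {
            return children.isEmpty();
        }

        boolean isPreTerminal() {
            return children.size() == 1 && children.get(0).isLeaf();
        }
    }

    // Returns brackets as "LABEL[start,end)" when labelConstituents is true,
    // and as "[start,end)" otherwise. Tags over single words are skipped.
    static Set<String> objectify(Node root, boolean labelConstituents) {
        Set<String> out = new LinkedHashSet<>();
        collect(root, 0, labelConstituents, out);
        return out;
    }

    // Walks the tree and returns the index just past the last word under node.
    private static int collect(Node node, int start, boolean labeled, Set<String> out) {
        if (node.isLeaf()) {
            return start + 1;
        }
        int end = start;
        for (Node child : node.children) {
            end = collect(child, end, labeled, out);
        }
        if (!node.isPreTerminal()) {
            out.add((labeled ? node.label : "") + "[" + start + "," + end + ")");
        }
        return end;
    }

    public static void main(String[] args) {
        Node tree = new Node("S",
            new Node("NP", new Node("DT", new Node("the")), new Node("NN", new Node("dog"))),
            new Node("VP", new Node("VBZ", new Node("barks"))));
        System.out.println(objectify(tree, true));
        System.out.println(objectify(tree, false));
    }
}
```

Comparing the gold and guessed sets produced this way gives the matched-bracket counts from which precision and recall are computed.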
public static Collection<List<String>> untypedDependencyObjectify(Tree t, HeadFinder hf, TreeTransformer collinizer)
public static Collection<List<String>> unorderedUntypedDependencyObjectify(Tree t, HeadFinder hf, TreeTransformer collinizer)
public static Collection<List<String>> typedDependencyObjectify(Tree t, HeadFinder hf, TreeTransformer collinizer)
public static Collection<List<String>> unorderedTypedDependencyObjectify(Tree t, HeadFinder hf, TreeTransformer collinizer)
public static <E> Collection<E> dependencyObjectify(Tree t, HeadFinder hf, TreeTransformer collinizer, DependencyTyper<E> typer)
Returns the set of dependencies in a tree, according to some DependencyTyper.
public static EquivalenceClasser<List<String>,String> typedDependencyClasser()
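The untyped word-word dependencies that the dependency methods above return can be illustrated with a small sketch. Node is a hypothetical stand-in for Tree, and its headIndex field replaces the HeadFinder by recording directly which child is the head; the real methods take a HeadFinder and a collinizer instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class DependencySketch {
    // Hypothetical stand-in for Tree with the head child marked explicitly.
    static final class Node {
        final String label;
        final int headIndex; // index of the head child; ignored for words
        final List<Node> children;

        Node(String label, int headIndex, Node... children) {
            this.label = label;
            this.headIndex = headIndex;
            this.children = Arrays.asList(children);
        }

        boolean isLeaf() {
            return children.isEmpty();
        }
    }

    // Finds the head word by following head children down to a word.
    static String headWord(Node n) {
        while (!n.isLeaf()) {
            n = n.children.get(n.headIndex);
        }
        return n.label;
    }

    // Emits a [head, dependent] pair for each non-head child of each node.
    static List<List<String>> untypedDependencies(Node root) {
        List<List<String>> deps = new ArrayList<>();
        collect(root, deps);
        return deps;
    }

    private static void collect(Node n, List<List<String>> deps) {
        if (n.isLeaf()) {
            return;
        }
        String head = headWord(n);
        for (int i = 0; i < n.children.size(); i++) {
            Node child = n.children.get(i);
            if (i != n.headIndex) {
                deps.add(Arrays.asList(head, headWord(child)));
            }
            collect(child, deps);
        }
    }

    public static void main(String[] args) {
        Node tree = new Node("S", 1,
            new Node("NP", 1,
                new Node("DT", 0, new Node("the", 0)),
                new Node("NN", 0, new Node("dog", 0))),
            new Node("VP", 0, new Node("VBZ", 0, new Node("barks", 0))));
        System.out.println(untypedDependencies(tree));
    }
}
```

A typed variant would add the mother, head, and daughter categories to each pair, which is the information typedDependencyClasser() groups on.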
public abstract TreeTransformer collinizer()
Specified by: collinizer in interface TreebankLangParserParams
public abstract TreeTransformer collinizerEvalb()
Specified by: collinizerEvalb in interface TreebankLangParserParams
public abstract String[] sisterSplitters()
Specified by: sisterSplitters in interface TreebankLangParserParams
public TreeTransformer subcategoryStripper()
Specified by: subcategoryStripper in interface TreebankLangParserParams
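At the level of a single category label, removing functional tags such as "-TMP" amounts to something like the sketch below. The helper basicCategory is hypothetical, and treating '-' and '=' as the characters that introduce annotations is an assumption based on Penn Treebank conventions; the real TreeTransformer operates on whole trees.

```java
final class StripperSketch {
    // Returns the label up to the first '-' or '=' annotation marker.
    // Labels that start with '-' (such as "-LRB-") are returned unchanged,
    // since there the dash is part of the category name itself.
    static String basicCategory(String label) {
        if (label.isEmpty() || label.charAt(0) == '-') {
            return label;
        }
        for (int i = 1; i < label.length(); i++) {
            char c = label.charAt(i);
            if (c == '-' || c == '=') {
                return label.substring(0, i);
            }
        }
        return label;
    }

    public static void main(String[] args) {
        System.out.println(basicCategory("NP-TMP"));   // prints "NP"
        System.out.println(basicCategory("NP-SBJ=1")); // prints "NP"
        System.out.println(basicCategory("-LRB-"));    // prints "-LRB-"
    }
}
```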
public abstract Tree transformTree(Tree t, Tree root)
This method does language-specific tree transformations such as annotating particular nodes with language-relevant features. It operates on the input tree t and changes both labels and the tree shape.
Specified by: transformTree in interface TreebankLangParserParams
Parameters:
t - The input tree (with non-language-specific annotation already done, so you need to strip back to basic categories)
root - The root of the current tree (can be null for words)
public abstract void display()
Specified by: display in interface TreebankLangParserParams
public int setOptionFlag(String[] args, int i)
Set language-specific options according to flags.
Generic options are processed separately by Options.setOption(String[],int), and implementations of this method do not have to worry about them. The Options class handles routing options. Classes that extend this class should call super when overriding this method.
Specified by: setOptionFlag in interface TreebankLangParserParams
Parameters:
args - Array of command line arguments
i - Index in command line arguments to try to process as an option
public TokenizerFactory<Tree> treeTokenizerFactory()
Specified by: treeTokenizerFactory in interface TreebankLangParserParams
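The advice to call super when overriding setOptionFlag can be sketched as below. The sketch assumes a convention in which the method returns the index just past the last argument it consumed, and returns i unchanged when it does not recognise the flag. That convention, the flag names -evalGF and -splitNP, and the classes BaseParams and MyParams are all assumptions made for illustration, not part of the API documented here.

```java
final class OptionFlagSketch {
    // Hypothetical stand-in for the abstract base class.
    static class BaseParams {
        boolean evalGF = false;

        // Handles options shared by all subclasses.
        int setOptionFlag(String[] args, int i) {
            if (args[i].equalsIgnoreCase("-evalGF") && i + 1 < args.length) {
                evalGF = Boolean.parseBoolean(args[i + 1]);
                return i + 2; // consumed the flag and its value
            }
            return i; // not recognised: nothing consumed
        }
    }

    // Hypothetical language-specific subclass with one extra flag.
    static class MyParams extends BaseParams {
        boolean splitNP = false;

        @Override
        int setOptionFlag(String[] args, int i) {
            if (args[i].equalsIgnoreCase("-splitNP")) {
                splitNP = true;
                return i + 1; // consumed just the flag
            }
            // Let the superclass try the options it knows about.
            return super.setOptionFlag(args, i);
        }
    }

    public static void main(String[] args) {
        String[] argv = {"-splitNP", "-evalGF", "true", "-unknown"};
        MyParams params = new MyParams();
        int i = 0;
        while (i < argv.length) {
            int next = params.setOptionFlag(argv, i);
            if (next == i) {
                System.out.println("unhandled: " + argv[i]);
                next = i + 1; // skip it so the loop makes progress
            }
            i = next;
        }
        System.out.println(params.splitNP + " " + params.evalGF);
    }
}
```

Delegating to super last means the subclass can add flags without losing the ones the base class already understands.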
public Extractor<DependencyGrammar> dependencyGrammarExtractor(Options op, Index<String> wordIndex, Index<String> tagIndex)
Specified by: dependencyGrammarExtractor in interface TreebankLangParserParams
public boolean isEvalGF()
public void setEvalGF(boolean evalGF)
public List<GrammaticalStructure> readGrammaticalStructureFromFile(String filename)
Description copied from interface: TreebankLangParserParams
Reads the given file and turns its content into a list of GrammaticalStructures.
Specified by: readGrammaticalStructureFromFile in interface TreebankLangParserParams
public GrammaticalStructure getGrammaticalStructure(Tree t, java.util.function.Predicate<String> filter, HeadFinder hf)
Description copied from interface: TreebankLangParserParams
Build a GrammaticalStructure from a Tree.
Specified by: getGrammaticalStructure in interface TreebankLangParserParams
public boolean supportsBasicDependencies()
Specified by: supportsBasicDependencies in interface TreebankLangParserParams
public String[] defaultCoreNLPFlags()
Description copied from interface: TreebankLangParserParams
The flags to use by default when run inside StanfordCoreNLP.
Specified by: defaultCoreNLPFlags in interface TreebankLangParserParams