Interface | Description |
---|---|
ConstituentFactory |
A ConstituentFactory is a factory for creating objects
of class Constituent, or some descendant class. |
CopulaHeadFinder |
A mix-in interface for HeadFinders which support the
makesCopulaHead method, which says how the HeadFinder in question
handles "to be" verbs.
|
Dependency<G extends Label,D extends Label,N> |
An individual dependency between a governor and a dependent.
|
DependencyFactory |
A factory for dependencies of a certain type.
|
DependencyPrinter | |
DependencyReader | |
DependencyTyper<T> |
A generified interface for making some kind of dependency object
between a head and dependent.
|
GrammaticalStructureFactory |
A general factory for
GrammaticalStructure objects. |
GrammaticalStructureFromDependenciesFactory |
An interface for a factory that builds a GrammaticalStructure from
a list of TypedDependencies and a TreeGraphNode.
|
HasParent |
Only to be implemented by Tree subclasses that actually keep their
parent pointers.
|
HeadFinder |
An interface for finding the "head" daughter of a phrase structure tree.
|
Labeled |
Interface for Objects which have a Label. |
TreebankFactory |
An interface for treebank vendors.
|
TreebankLanguagePack |
This interface specifies language/treebank specific information for a
Treebank, which a parser or other treebank user might need to know.
|
TreebankTransformer | |
TreeFactory |
A TreeFactory acts as a factory for creating objects of
class Tree, or some descendant class. |
TreeReader |
A
TreeReader adds functionality to another Reader
by reading in Trees, or some descendant class. |
TreeReaderFactory |
A TreeReaderFactory is a factory for creating objects of
class TreeReader, or some descendant class. |
TreeTransformer |
This is a simple interface for a function that alters a
local Tree. |
TreeVisitor |
This is a simple strategy-type interface for operations that are applied to
a Tree. |
WordNetConnection |
Allows us to verify that a wordnet connection is available without compile
time errors if the package is not found.
|
Class | Description |
---|---|
AbstractCollinsHeadFinder |
A base class for a HeadFinder similar to the one described in
Michael Collins' 1999 thesis.
|
AbstractTreebankLanguagePack |
This provides an implementation of parts of the TreebankLanguagePack
API to reduce the load on fresh implementations.
|
BasicCategoryTreeTransformer |
Transforms trees by turning the labels into their basic categories
according to the
TreebankLanguagePack |
BobChrisTreeNormalizer |
Normalizes trees in the way used in Manning and Carpenter 1997.
|
BobChrisTreeNormalizer.AOverAFilter | |
BobChrisTreeNormalizer.EmptyFilter | |
CollinsDependency |
Extracts bilexical dependencies from Penn Treebank-style phrase structure trees
as described in Collins (1999; 2003).
|
CollinsHeadFinder |
Implements the HeadFinder found in Michael Collins' 1999 thesis.
|
CollinsRelation |
A relation 4-tuple for the dependency representation of Collins (1999; 2003).
|
CollocationFinder |
Finds WordNet collocations in parse trees.
|
CompositeTreebank | |
CompositeTreeTransformer |
A TreeTransformer that applies component TreeTransformers in order.
|
Constituent |
A
Constituent object defines a generic edge in a graph. |
CoordinationTransformer |
Coordination transformer transforms a PennTreebank tree containing
a coordination in a flat structure in order to get the dependencies
right.
|
DateTreeTransformer |
Flattens the following two structures:
(NP (NP (NNP Month) (CD Day) ) (, ,) (NP (CD Year) )) becomes (NP (NNP Month) (CD Day) (, ,) (CD Year) );
(NP (NP (NNP Month) ) (NP (CD Year) )) becomes (NP (NNP Month) (CD Year)). |
DeepTree |
A tree combined with a map from subtree to SimpleMatrix vectors.
|
Dependencies |
Utilities for Dependency objects.
|
Dependencies.DependentPuncTagRejectFilter<G extends Label,D extends Label,N> | |
Dependencies.DependentPuncWordRejectFilter<G extends Label,D extends Label,N> | |
DependencyScoring |
Scoring of typed dependencies
|
DependencyScoring.Score | |
DependencyTreeTransformer |
Transforms an English structure parse tree in order to get the dependencies right:
put a ROOT node; remove NONE nodes; retain only NP-TMP, NP-ADV, UCP-TMP tags.
The UCP- tags will later be turned into NP- anyway.
(Note [cdm]: A lot of this overlaps other existing functionality in trees.) |
DiskTreebank |
A DiskTreebank is a Collection of Trees. |
EnglishGrammaticalRelations |
EnglishGrammaticalRelations is a
set of GrammaticalRelation objects for the English language. |
EnglishGrammaticalStructure |
A GrammaticalStructure for English.
|
EnglishGrammaticalStructure.FromDependenciesFactory | |
EnglishGrammaticalStructureFactory | |
EnglishPatterns |
This class contains some English String or Tregex regular expression
patterns.
|
FilteringTreebank |
This class wraps another Treebank, and will vend trees that passed
a Filter<Tree>.
|
FilteringTreeReader |
A
FilteringTreeReader filters the output of another TreeReader. |
FindTreebankTree |
This utility looks for a given sentence in a file or directory of
tree files.
|
GenerateTrees |
Generates trees based on simple grammars.
|
GrammaticalFunctionTreeNormalizer |
Tree normalizer for cleaning up labels and preserving the whole node label,
the grammatical function and category information from the label, or only
the category information.
|
GrammaticalRelation |
GrammaticalRelation is used to define a
standardized, hierarchical set of grammatical relations,
together with patterns for identifying them in
parse trees. |
GrammaticalStructure |
A
GrammaticalStructure stores dependency relations between
nodes in a tree. |
GrammaticalStructureConversionUtils |
Contains several utility methods to convert constituency trees to
dependency trees.
|
LabeledConstituent |
A
LabeledConstituent object represents a single bracketing in
a derivation, including start and end points and Label
information, but excluding probabilistic information. |
LabeledScoredConstituent |
A
LabeledScoredConstituent object defines an edge in a graph
with a label and a score. |
LabeledScoredConstituentFactory |
A LabeledScoredConstituentFactory acts as a factory for
creating objects of class LabeledScoredConstituent. |
LabeledScoredTreeFactory |
A
LabeledScoredTreeFactory acts as a factory for creating
trees with labels and scores. |
LabeledScoredTreeNode |
A
LabeledScoredTreeNode represents a tree composed of a root
label, a score,
and an array of daughter parse trees. |
LabeledScoredTreeReaderFactory |
This class implements a
TreeReaderFactory that produces
labeled, scored array-based Trees, which have been cleaned up to
delete empties, etc. |
LeftHeadFinder |
HeadFinder that always returns the leftmost daughter as head.
|
LengthTreeFilter |
Only accept trees that are short enough (less than or equal to length).
|
MemoryTreebank |
A MemoryTreebank object stores a corpus of examples with
given tree structures in memory (as a List). |
ModCollinsHeadFinder |
Implements a variant on the HeadFinder found in Michael Collins' 1999
thesis.
|
NamedDependency |
An individual dependency between a head and a dependent.
|
NPTmpRetainingTreeNormalizer |
Same TreeNormalizer as BobChrisTreeNormalizer, but optionally provides
four extras.
|
NPTmpRetainingTreeNormalizer.NPTmpAdvRetainingTreeReaderFactory |
Implementation of TreeReaderFactory, mainly for convenience of
constructing by reflection.
|
NPTmpRetainingTreeNormalizer.NPTmpRetainingTreeReaderFactory |
Implementation of TreeReaderFactory, mainly for convenience of
constructing by reflection.
|
OrderedCombinationTreeNormalizer |
This class combines multiple tree normalizers.
|
OutputSubtrees |
Output a tree and all of its subtrees.
|
PennTreebankLanguagePack |
Specifies the treebank/language specific components needed for
parsing the English Penn Treebank.
|
PennTreebankTokenizer |
Builds a tokenizer for English PennTreebank (release 2) trees.
|
PennTreeReader |
This class implements the
TreeReader interface to read Penn Treebank-style
files. |
PennTreeReaderFactory |
Vends
PennTreeReader objects. |
ProcessDependencyConverterRequest | |
QPTreeTransformer |
Transforms an English structure parse tree in order to get the dependencies right.
Adds an extra structure in QP phrases:
(QP (RB well) (IN over) (CD 9)) becomes (QP (XS (RB well) (IN over)) (CD 9));
(QP (...) (CC ...) (...)) becomes (QP (NP ...) (CC ...) (NP ...)). |
RecursiveTreeTransformer |
A tool to recursively alter a tree in various ways.
|
RightHeadFinder |
HeadFinder that always returns the rightmost daughter as head.
|
SemanticHeadFinder |
Implements a 'semantic head' variant of the English HeadFinder
found in Michael Collins' 1999 thesis.
|
SimpleConstituent |
A
SimpleConstituent object defines a generic edge in a graph. |
SimpleConstituentFactory |
A ConstituentFactory acts as a factory for creating objects
of class Constituent, or some descendant class. |
SimpleTree |
A SimpleTree is a minimal concrete implementation of an
unlabeled, unscored Tree. |
SimpleTreeFactory |
A SimpleTreeFactory acts as a factory for creating objects
of class SimpleTree. |
Span |
A
Span is an optimized SimpleConstituent object. |
SplitTrainingSet |
Given a list of trees, splits the trees into three separate files.
|
StringLabeledScoredTreeReaderFactory |
This class implements a
TreeReaderFactory that produces
labeled, scored array-based Trees, which have been cleaned up to
delete empties, etc. |
SynchronizedTreeTransformer |
If you have a TreeTransformer which is not threadsafe, and you need
to call it from multiple threads, this will wrap it in a
synchronized manner.
|
TransformingTreebank |
This class wraps another Treebank, and will vend trees that have been through
a TreeTransformer.
|
Tree |
The abstract class
Tree is used to collect all of the
tree types, and acts as a generic extensible type. |
Treebank |
A
Treebank object provides access to a corpus of examples with
given tree structures. |
Treebanks |
This is just a main method and other static methods for
command-line manipulation, statistics, and testing of
Treebank objects.
|
TreebankTagUpdater |
Class for automatically applying tags to a treebank
|
TreeCoreAnnotations |
Set of common annotations for CoreMaps
that require classes from the trees package. |
TreeCoreAnnotations.BinarizedTreeAnnotation |
The CoreMap key for getting the binarized version of the
syntactic parse tree of a sentence.
|
TreeCoreAnnotations.HeadTagLabelAnnotation |
The standard key for storing a head tag in the map as a pointer to
the head label.
|
TreeCoreAnnotations.HeadWordLabelAnnotation |
The standard key for storing a head word in the map as a pointer to
the head label.
|
TreeCoreAnnotations.KBestTreesAnnotation |
The standard key for storing a list of k-best parses.
|
TreeCoreAnnotations.TreeAnnotation |
The CoreMap key for getting the syntactic parse tree of a sentence.
|
TreeFilters |
A location for general implementations of Filter<Tree>.
|
TreeFilters.HasMatchingChild | |
TreeFunctions |
This is a utility class which vends tree transformers to translate
trees from one factory type to trees of another.
|
TreeGraphNode |
A TreeGraphNode is simply a Tree
with some additional functionality. |
TreeGraphNodeFactory |
A TreeGraphNodeFactory acts as a factory for creating
tree nodes of type TreeGraphNode. |
TreeLeafLabelTransformer |
Applies a Function to the labels in a tree.
|
TreeLemmatizer | |
TreeLengthComparator |
A
TreeLengthComparator orders trees by their yield sentence
lengths. |
TreeNormalizer |
A class for tree normalization.
|
TreePrint |
A class for customizing the print method(s) for a
edu.stanford.nlp.trees.Tree as the output of the
parser. |
Trees |
Various static utilities for the
Tree class. |
TreeToBracketProcessor | |
TreeTokenizerFactory |
Wrapper for TreeReaderFactory.
|
TypedDependency |
A TypedDependency is a relation between two words in a
GrammaticalStructure. |
UniversalEnglishGrammaticalRelations |
UniversalEnglishGrammaticalRelations is a
set of GrammaticalRelation objects according to the Universal
Dependencies standard. |
UniversalEnglishGrammaticalStructure |
A GrammaticalStructure for Universal Dependencies English.
|
UniversalEnglishGrammaticalStructure.FromDependenciesFactory | |
UniversalEnglishGrammaticalStructureFactory | |
UniversalPOSMapper |
Helper class to perform a context-sensitive mapping of POS
tags in a tree to universal POS tags.
|
UniversalSemanticHeadFinder |
Implements a 'semantic head' variant of the HeadFinder found
in Michael Collins' 1999 thesis.
|
UnnamedConcreteDependency |
An individual dependency between a head and a dependent.
|
UnnamedDependency |
An individual dependency between a head and a dependent.
|
WordCatConstituent |
A class storing information about a constituent in a character-based tree.
|
WordCatEqualityChecker |
An EqualityChecker for WordCatConstituents.
|
WordCatEquivalenceClasser |
An EquivalenceClasser for WordCatConstituents.
|
WordStemmer |
Stems the Words in a Tree using Morphology.
|
Enum | Description |
---|---|
CollinsRelation.Direction | |
GrammaticalStructure.Extras |
A specification for the types of extra edges to add to the dependency tree.
|
GrammaticalStructureConversionUtils.ConverterOptions |
Enum to identify the different TokenizerTypes.
|
A package for (NLP) trees, sentences, and similar things. This package provides several key abstractions (via abstract classes) and a number of further classes for related objects. Most of these classes use a Factory pattern to instantiate objects.
A Label is something that can be the label of a Tree or a Constituent. The simplest label is a StringLabel. A Word or a TaggedWord is a Label. They can be constructed with a LabelFactory. A Label often implements various interfaces, such as HasWord.
A Constituent object defines a generic edge in a graph. It has a start and end, and usually a Label. A ConstituentFactory builds a Constituent.
A Tree object provides generic facilities for manipulating NLP trees. A TreeFactory can build a Tree.
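To make the Tree abstraction more concrete, here is a minimal sketch of a labeled tree read from a Penn Treebank-style bracketing. MiniTree and all of its members are hypothetical stand-ins written for this illustration. They are not classes of this package, and the package's own Tree offers far more functionality.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a tree class; NOT part of edu.stanford.nlp.trees. */
public class MiniTree {
    public final String label;
    public final List<MiniTree> children = new ArrayList<>();

    public MiniTree(String label) { this.label = label; }

    public boolean isLeaf() { return children.isEmpty(); }

    /** A preterminal has exactly one child, and that child is a leaf. */
    public boolean isPreTerminal() {
        return children.size() == 1 && children.get(0).isLeaf();
    }

    /** Collects the leaf labels (the words) from left to right. */
    public List<String> leaves() {
        List<String> out = new ArrayList<>();
        collectLeaves(this, out);
        return out;
    }

    private static void collectLeaves(MiniTree t, List<String> out) {
        if (t.isLeaf()) {
            out.add(t.label);
            return;
        }
        for (MiniTree kid : t.children) {
            collectLeaves(kid, out);
        }
    }

    /** Parses one well-formed bracketing such as "(NP (DT the) (NN dog))". */
    public static MiniTree parse(String s) {
        String[] toks = s.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");
        int[] pos = {0};  // shared cursor into the token array
        return parseNode(toks, pos);
    }

    private static MiniTree parseNode(String[] toks, int[] pos) {
        if (!toks[pos[0]].equals("(")) {
            return new MiniTree(toks[pos[0]++]);       // a bare token is a leaf
        }
        pos[0]++;                                       // consume "("
        MiniTree node = new MiniTree(toks[pos[0]++]);   // node label follows "("
        while (!toks[pos[0]].equals(")")) {
            node.children.add(parseNode(toks, pos));
        }
        pos[0]++;                                       // consume ")"
        return node;
    }

    public static void main(String[] args) {
        MiniTree t = parse("(S (NP (DT This)) (VP (VBD was) (JJ good)))");
        System.out.println(t.label + " " + t.leaves());
    }
}
```

The sketch does no error handling, so it expects balanced brackets. A factory in the sense described above would replace the direct constructor calls, letting callers choose which node class gets built.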
A Treebank provides an interface to a collection of parsed sentences (normally found on disk as a corpus). A TreeReader reads trees from a Reader. A TreeReaderFactory builds a TreeReader.
A TreeNormalizer canonicalizes a Tree on input from a File. A HeadFinder finds the head daughter of a Tree. The TreeProcessor interface is for general sequential processing of trees, and the TreeTransformer interface is for changing them.
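As a rough illustration of what finding the head daughter involves, the sketch below picks a head by consulting a per-category priority list and a search direction, which is the general shape of Collins-style head rules. HeadRuleSketch, its Rule class, and the three rules in its table are assumptions made up for this example. They are not the rule tables used by the HeadFinder implementations in this package.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical head-rule sketch; the rule table is illustrative only. */
public class HeadRuleSketch {

    /** A search direction plus child categories in order of preference. */
    static final class Rule {
        final boolean leftToRight;
        final List<String> priorities;

        Rule(boolean leftToRight, String... priorities) {
            this.leftToRight = leftToRight;
            this.priorities = Arrays.asList(priorities);
        }
    }

    private static final Map<String, Rule> RULES = new HashMap<>();
    static {
        // Invented rules for three categories, purely for demonstration.
        RULES.put("NP", new Rule(false, "NN", "NNS", "NNP", "NP"));
        RULES.put("VP", new Rule(true, "VBD", "VBZ", "VB", "VP"));
        RULES.put("S", new Rule(true, "VP", "S"));
    }

    /** Returns the index of the head daughter among the child categories. */
    public static int headIndex(String parent, List<String> kids) {
        Rule rule = RULES.get(parent);
        if (rule == null) {
            return 0;  // no rule for this category: fall back to the leftmost daughter
        }
        for (String wanted : rule.priorities) {
            if (rule.leftToRight) {
                for (int i = 0; i < kids.size(); i++) {
                    if (kids.get(i).equals(wanted)) return i;
                }
            } else {
                for (int i = kids.size() - 1; i >= 0; i--) {
                    if (kids.get(i).equals(wanted)) return i;
                }
            }
        }
        // Nothing matched: take the first daughter in the search direction.
        return rule.leftToRight ? 0 : kids.size() - 1;
    }

    public static void main(String[] args) {
        System.out.println(headIndex("NP", Arrays.asList("DT", "JJ", "NN")));
        System.out.println(headIndex("S", Arrays.asList("NP", "VP")));
    }
}
```

With an empty rule table the method degenerates into always choosing the leftmost daughter, which is the behaviour the class table above describes for LeftHeadFinder.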
A Sentence is a subclass of ArrayList. A Sentencebank provides an interface to a large number of sentences (normally found on disk as a corpus). A SentenceReader reads sentences from an InputStream. A SentenceReaderFactory builds a SentenceReader. A SentenceNormalizer canonicalizes a Sentence on input from a File. The SentenceProcessor interface is for general sequential processing of sentences.
There are also various subclasses of StreamTokenizer. The class PairFinder should probably be moved to samples.
Design notes: This package is the result of several iterations of trying to come up with a reusable and extendable set of tree classes. It may still be nonoptimal, but some thought went into it! At any rate, there are several things that it is important to understand in order to use the classes effectively. One is that a Label has a primary value() which is always a String, and this is the only thing that matters for fundamental Label operations, such as checking equality. While anything else (or nothing) can be stored in a Label, all other Label content is regarded as purely decorative. All Label implementations should implement a labelFactory() method that returns a LabelFactory for the appropriate kind of Label. Since this depends on the exact class, this method should always be overridden when a Label class is extended. The existing Label classes also provide a static factory() method which returns the same thing.
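The two Label conventions in the design notes can be sketched as follows: equality depends only on value(), and each subclass overrides labelFactory() so that the factory builds the right kind of label. Every type here (SketchLabel, SketchLabelFactory, PlainLabel, TaggedLabel) is a hypothetical stand-in invented for this illustration. The sketch follows the conventions as stated above and makes no claim about how the package's own Label classes are implemented.

```java
/** Hypothetical illustration of the Label conventions; NOT the package's classes. */
public class LabelSketch {

    public interface SketchLabelFactory {
        SketchLabel newLabel(String value);
    }

    public interface SketchLabel {
        String value();
        SketchLabelFactory labelFactory();
    }

    /** The simplest label: just a String value. */
    public static class PlainLabel implements SketchLabel {
        private final String value;

        public PlainLabel(String value) { this.value = value; }

        @Override public String value() { return value; }

        @Override public SketchLabelFactory labelFactory() { return PlainLabel::new; }

        /** Equality looks at value() only; any other content is decorative. */
        @Override public boolean equals(Object o) {
            return o instanceof SketchLabel && value.equals(((SketchLabel) o).value());
        }

        @Override public int hashCode() { return value.hashCode(); }
    }

    /** Adds decorative content (a tag) that equality ignores. */
    public static class TaggedLabel extends PlainLabel {
        private final String tag;

        public TaggedLabel(String value, String tag) {
            super(value);
            this.tag = tag;
        }

        public String tag() { return tag; }

        /** Overridden so that the factory produces TaggedLabels, not PlainLabels. */
        @Override public SketchLabelFactory labelFactory() {
            return v -> new TaggedLabel(v, null);
        }
    }

    public static void main(String[] args) {
        SketchLabel a = new PlainLabel("NP");
        SketchLabel b = new TaggedLabel("NP", "decorative");
        System.out.println(a.equals(b));                                             // prints true
        System.out.println(b.labelFactory().newLabel("VP") instanceof TaggedLabel);  // prints true
    }
}
```

If TaggedLabel did not override labelFactory(), code that copies a tree through the factory would silently produce PlainLabels, which is why the notes ask for the override whenever a Label class is extended.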
Examples of using the trees package
As well as the Treebank classes, there are corresponding Sentencebank classes (though they are not quite so extensively developed). The final example below shows use of a Sentencebank. It also illustrates the Visitor pattern for examining sentences in a Sentencebank. This was actually the original visitation pattern for Treebank and Sentencebank, but these days it is generally easier to use an Iterator. You can also get Sentences from a Treebank, by taking the yield() or taggedYield() of each Tree.
Here is some fairly straightforward code for loading trees from a treebank and iterating over the trees contained therein. It builds a histogram of sentence lengths.
import edu.stanford.nlp.io.NumberRangesFileFilter;
import edu.stanford.nlp.trees.*;
import edu.stanford.nlp.util.Timing;

/** This class prints a histogram of sentence lengths.
 *  Use: java SentenceLengths /turing/corpora/Treebank2/combined/wsj/07
 *       [fileRange]
 */
public class SentenceLengths {

  private static final int maxleng = 100;
  private static int[] lengthCounts = new int[maxleng+1];
  private static int numSents = 0;

  public static void main(String[] args) {
    Timing.startTime();
    Treebank treebank = new DiskTreebank(
        new LabeledScoredTreeReaderFactory());
    if (args.length > 1) {
      // optional second argument restricts loading to a range of file numbers
      treebank.loadPath(args[0], new NumberRangesFileFilter(args[1],
          true));
    } else {
      treebank.loadPath(args[0]);
    }
    // a Treebank is a collection of Trees, so it can be iterated directly
    for (Tree t : treebank) {
      numSents++;
      int len = t.yield().size();  // number of leaves, i.e. words
      if (len <= maxleng) {
        lengthCounts[len]++;
      }
    }
    System.out.print("Files " + args[0] + " ");
    if (args.length > 1) {
      System.out.print(args[1] + " ");
    }
    System.out.println("consists of " + numSents + " sentences");
    for (int i = 0; i <= maxleng; i++) {
      System.out.println("  " + lengthCounts[i] + " of length " + i);
    }
    Timing.endTime("Read/count all trees");
  }
}
Treebank, custom TreeReaderFactory, Tree, and Constituent
This example illustrates building a Treebank by hand, specifying a custom TreeReaderFactory, and illustrates more of the Tree package and the notion of a Constituent. A Constituent has a start and end point and a Label.
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

import edu.stanford.nlp.ling.StringLabelFactory;
// assumes the Counter classes that live in edu.stanford.nlp.stats
import edu.stanford.nlp.stats.ClassicCounter;
import edu.stanford.nlp.stats.Counter;
import edu.stanford.nlp.trees.*;

/** This class counts how often each constituent appears.
 *  Use: java ConstituentCounter /turing/corpora/Treebank2/combined/wsj/07
 */
public class ConstituentCounter {

  public static void main(String[] args) {
    Treebank treebank = new DiskTreebank(new TreeReaderFactory() {
      @Override
      public TreeReader newTreeReader(Reader in) {
        // TreeReader is an interface, so build a concrete PennTreeReader
        return new PennTreeReader(in,
            new LabeledScoredTreeFactory(new StringLabelFactory()),
            new BobChrisTreeNormalizer());
      }
    });
    treebank.loadPath(args[0]);
    Counter<Constituent> cnt = new ClassicCounter<>();
    ConstituentFactory confac = LabeledConstituent.factory();
    for (Tree t : treebank) {
      Set<Constituent> constituents = t.constituents(confac);
      for (Constituent c : constituents) {
        cnt.incrementCount(c);
      }
    }
    // sort by printed form, which does not require Constituent to be Comparable
    List<Constituent> sorted = new ArrayList<>(cnt.keySet());
    sorted.sort((a, b) -> a.toString().compareTo(b.toString()));
    for (Constituent c : sorted) {
      System.out.println(c + " " + cnt.getCount(c));
    }
  }
}
Tree and Label
Dealing with the Tree and Label classes is a central part of using this package. This code works out the set of tags (preterminal labels) used in a Treebank. It illustrates writing one's own code to recurse through a Tree, and getting a String value for a Label.
// assumes the Counter classes that live in edu.stanford.nlp.stats
import edu.stanford.nlp.stats.ClassicCounter;
import edu.stanford.nlp.stats.Counter;
import edu.stanford.nlp.trees.*;

/** This class reads trees from strings and counts their preterminals.
 *  Use: java TreesFromStrings '(S (NP (DT This)) (VP (VBD was) (JJ good)))'
 */
public class TreesFromStrings {

  /** Recursively counts the preterminal (tag) labels below t. */
  private static void addTerminals(Tree t, Counter<String> c) {
    if (t.isLeaf()) {
      // do nothing
    } else if (t.isPreTerminal()) {
      c.incrementCount(t.label().value());
    } else {
      // phrasal node
      for (Tree kid : t.children()) {
        addTerminals(kid, c);
      }
    }
  }

  public static void main(String[] args) {
    Treebank tb = new MemoryTreebank();
    for (String arg : args) {
      try {
        Tree t = Tree.valueOf(arg);
        tb.add(t);
      } catch (Exception e) {
        e.printStackTrace();
      }
    }
    Counter<String> c = new ClassicCounter<>();
    for (Tree t : tb) {
      addTerminals(t, c);
    }
    System.out.println(c);
  }
}
import java.io.*;
import edu.stanford.nlp.trees.*;

public class SentencePrinter {

  /** Loads SentenceBank from first argument and prints it out.
   *  Usage: java SentencePrinter sentencebankPath
   *  @param args Array of command-line arguments
   */
  public static void main(String[] args) {
    SentenceReaderFactory srf = new SentenceReaderFactory() {
      public SentenceReader newSentenceReader(Reader in) {
        return new SentenceReader(in, new TaggedWordFactory(),
            new PennSentenceNormalizer(),
            new PennTagbankStreamTokenizer(in));
      }
    };
    Sentencebank sentencebank = new DiskSentencebank(srf);
    sentencebank.loadPath(args[0]);
    sentencebank.apply(new SentenceVisitor() {
      public void visitSentence(final Sentence s) {
        // also print tag as well as word
        System.out.println(s.toString(false));
      }
    });
  }
}
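The text contrasts the Visitor style used in this last example with plain iteration. The sketch below shows the same computation written both ways over a tiny in-memory collection. BankSketch and SentenceVisitorSketch are hypothetical stand-ins invented for this comparison. They only mimic the apply-a-visitor shape of the example and are not classes of this package.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical comparison of visitor-style and iterator-style traversal. */
public class TraversalSketch {

    /** Stand-in for a visitor interface with a single callback. */
    public interface SentenceVisitorSketch {
        void visitSentence(List<String> sentence);
    }

    /** Stand-in for a bank of sentences supporting both traversal styles. */
    public static class BankSketch implements Iterable<List<String>> {
        private final List<List<String>> sentences = new ArrayList<>();

        public void add(String... words) {
            sentences.add(Arrays.asList(words));
        }

        /** Visitor style: the bank drives the loop and calls back. */
        public void apply(SentenceVisitorSketch visitor) {
            for (List<String> s : sentences) {
                visitor.visitSentence(s);
            }
        }

        /** Iterator style: the caller drives the loop. */
        @Override
        public Iterator<List<String>> iterator() {
            return sentences.iterator();
        }
    }

    public static int totalWordsByVisitor(BankSketch bank) {
        int[] total = {0};  // array cell, since a lambda cannot reassign a local variable
        bank.apply(s -> total[0] += s.size());
        return total[0];
    }

    public static int totalWordsByIterator(BankSketch bank) {
        int total = 0;
        for (List<String> s : bank) {
            total += s.size();
        }
        return total;
    }

    public static void main(String[] args) {
        BankSketch bank = new BankSketch();
        bank.add("This", "was", "good");
        bank.add("So", "it", "goes");
        System.out.println(totalWordsByVisitor(bank) + " " + totalWordsByIterator(bank));
    }
}
```

The iterator version keeps state in ordinary local variables and allows early exit with break, which is one reason iteration tends to be the easier style.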