Interface Summary | |
ConstituentFactory | A ConstituentFactory is a factory for creating objects of class Constituent, or some descendent class. |
HasAttributes | Something that implements the HasAttributes interface knows about attributes stored in a Map. |
HasCategory | Something that implements the HasCategory interface knows about categories. |
HasFollow | Something that implements the HasFollow interface knows about the characters that follow a token. |
HasTag | Something that implements the HasTag interface knows about part-of-speech tags. |
HasWord | Something that implements the HasWord interface knows about words. |
HeadFinder | An interface for finding the "head" daughter of a phrase structure tree. |
Label | Something that implements the Label interface can act as a constituent, node, or word label with linguistic attributes. |
Labeled | Interface for Objects which have a Label. |
LabelFactory | A LabelFactory object acts as a factory for creating objects of class Label, or some descendent class. |
SentenceProcessor | This is a simple interface for applying a transformer to a Sentence. |
SentenceReaderFactory | A SentenceReaderFactory is a factory for creating objects of class SentenceReader, or some descendent class. |
SentenceVisitor | This is a simple interface for operations that are going to be applied to a Sentence. |
TreebankLanguagePack | This interface specifies language/treebank specific information for a Treebank, which a parser might need to know. |
TreeFactory | A TreeFactory acts as a factory for creating objects of class Tree, or some descendent class. |
TreeProcessor | This is a simple interface for operations that are going to be applied to a Tree. |
TreeReader | A TreeReader adds functionality to another Reader by reading in Trees, or some descendant class. |
TreeReaderFactory | A TreeReaderFactory is a factory for creating objects of class TreeReader, or some descendent class. |
TreeTransformer | This is a simple interface for a function that alters a local Tree. |
Class Summary | |
AbstractLabel | An AbstractLabel object acts as a Label with linguistic attributes. |
AbstractTreebankLanguagePack | This provides an implementation of parts of the TreebankLanguagePack API to reduce the load on fresh implementations. |
AdaptiveLabelFactory | An AdaptiveLabelFactory object makes simple Labels for objects, by creating a label of an appropriate type depending on the arguments passed in. |
AdwaitSentenceReaderFactory | This class implements a SentenceReaderFactory which is suitable for reading in tagged or untagged sentences formatted one per line with underscores used only to separate words from POS tags. |
AdwaitStreamTokenizer | Builds a tokenizer for files where whitespace separates tokens, and end of line is significant. |
BobChrisTreeNormalizer | Normalizes trees in the way used in Manning and Carpenter 1997. |
Category | A Category object acts as a Label by containing a String that is a category (nonterminal). |
CategoryWordTag | A CategoryWordTag object acts as a complex Label which contains a category, a head word, and a tag. |
CategoryWordTagFactory | A CategoryWordTagFactory is a factory that makes a Label which is a CategoryWordTag triplet. |
ChineseCollinizer | Performs collinization operations on Chinese trees similar to those for English, namely: strips all functional and automatically added tags; strips all punctuation; merges PRN and ADVP; eliminates ROOT (note that there are a few non-unary ROOT nodes; these are not eliminated). |
ChineseHeadFinder | HeadFinder for the Penn Chinese Treebank. |
ChineseTreebankLanguagePack | Language pack for Chinese treebank. |
CollinsHeadFinder | Implements the HeadFinder found in Michael Collins' 1999 thesis. |
CollinsSemanticHeadFinder | Implements a 'semantic head' variant of the HeadFinder found in Michael Collins' 1999 thesis. |
Constituent | A Constituent object defines a generic edge in a graph. |
DanBobChrisTreeNormalizer | Normalizes trees roughly the way used in Manning and Carpenter 1997. |
Dependency | An individual dependency. |
DiskSentencebank | A DiskSentencebank object stores merely the information to get at a corpus of sentences that is stored on disk. |
DiskTreebank | A DiskTreebank object stores merely the information to get at a corpus of trees that is stored on disk. |
LabeledConstituent | A LabeledConstituent object represents a single bracketing in a derivation, including start and end points and Label information, but excluding probabilistic information. |
LabeledScoredConstituent | A LabeledScoredConstituent object defines an edge in a graph with a label and a score. |
LabeledScoredConstituentFactory | A LabeledScoredConstituentFactory acts as a factory for creating objects of class LabeledScoredConstituent. |
LabeledScoredTreeFactory | A LabeledScoredTreeFactory acts as a factory for creating trees with labels and scores. |
LabeledScoredTreeLeaf | A LabeledScoredTreeLeaf represents the leaf of a tree in a parse tree with labels and scores. |
LabeledScoredTreeNode | A LabeledScoredTreeNode represents a tree composed of a root label, a score, and an array of daughter parse trees. |
LabeledScoredTreeReaderFactory | This class implements a TreeReaderFactory that produces labeled, scored array-based Trees, which have been cleaned up to delete empties, etc. |
LeftHeadFinder | HeadFinder that always returns the leftmost daughter as head. |
MemorySentencebank | A MemorySentencebank object stores a corpus of examples with given sentence structures in memory (as a Collection). |
MemoryTreebank | A MemoryTreebank object stores a corpus of examples with given tree structures in memory (as a List). |
ModCollinsHeadFinder | Implements a variant on the HeadFinder found in Michael Collins' 1999 thesis. |
NegraTreeNormalizer | Tree normalizer for Negra. |
NoPunctTreeNormalizer | Normalizes trees roughly the way used in Manning and Carpenter 1997. |
NPTmpRetainingTreeNormalizer | Same TreeNormalizer as BobChrisTreeNormalizer, but optionally provides five extras. |
NullLabel | A NullLabel object acts as a Label with linguistic attributes, but doesn't actually store or return anything. |
OnePerLineSentenceNormalizer | A class for sentence normalization. |
ParametricTreeNormalizer | Normalizes trees based on parameter settings. |
ParentTransformedLabeledNormalizedTreeReaderFactory | This class implements a TreeReaderFactory that produces labeled, scored array-based Trees, which have been cleaned up to delete empties, etc. |
PennSentenceMrgNormalizer | A class for sentence normalization. |
PennSentenceNormalizer | A class for Penn tag directory sentence normalization. |
PennSentenceReaderFactory | This class implements a SentenceReaderFactory which is suitable for reading in tagged sentences from the Penn Treebank. |
PennTagbankStreamTokenizer | Builds a tokenizer for Penn POS-tagged directories. |
PennTreebankLanguagePack | Specifies the treebank/language specific components needed for parsing the English Penn Treebank. |
PennTreebankStreamTokenizer | Builds a tokenizer for English Penn Treebank (release 2) trees. |
PennTreeReader | A PennTreeReader is a TreeReader that reads in Penn Treebank-style files. |
PruneNodesStripSubtagsTreeNormalizer | ?? |
SbjRetainingTreeNormalizer | Same TreeNormalizer as BobChrisTreeNormalizer, but retains -SBJ labels on NP with the new identification NP#SBJ. |
Sentence | Sentence holds a single sentence, and mediates between word numbers and words. |
Sentencebank | A Sentencebank object provides access to a corpus of sentences -- just plain sentences or tagged sentences, etc. |
SentenceNormalizer | A class for sentence normalization. |
SentenceReader | A SentenceReader adds functionality to a Reader by reading in Sentences, or some descendant class. |
SepTreeNormalizer | Exactly like BobChrisTreeNormalizer, except does not strip functional tags from NPs. |
SimpleConstituent | A SimpleConstituent object defines a generic edge in a graph. |
SimpleConstituentFactory | A ConstituentFactory acts as a factory for creating objects of class Constituent, or some descendent class. |
SimpleSentenceReaderFactory | This class implements a simple default SentenceReaderFactory. |
SimpleTree | A SimpleTree is a minimal concrete implementation of an unlabeled, unscored Tree. |
SimpleTreeFactory | A SimpleTreeFactory acts as a factory for creating objects of class SimpleTree. |
SimpleTreeReaderFactory | This class implements a simple default TreeReaderFactory. |
Span | A Span is an optimized SimpleConstituent object. |
StringLabel | A StringLabel object acts as a Label by containing a single String, which it sets or returns in response to requests. |
StringLabeledScoredTreeReaderFactory | This class implements a TreeReaderFactory that produces labeled, scored array-based Trees, which have been cleaned up to delete empties, etc. |
StringLabelFactory | A StringLabelFactory object makes a simple StringLabel out of a String. |
Tag | A Tag object acts as a Label by containing a String that is a part-of-speech tag. |
TaggedWord | A TaggedWord object contains a word and its tag. |
TaggedWordFactory | A TaggedWordFactory acts as a factory for creating objects of class TaggedWord. |
TagMapper | A POS tag to POS tag mapper. |
Tree | The abstract class Tree is used to collect all of the tree types, and acts as a generic composite type. |
Treebank | A Treebank object provides access to a corpus of examples with given tree structures. |
TreeJugglers | |
TreeLengthComparator | A TreeLengthComparator orders trees by their yield sentence lengths. |
TreeNormalizer | A class for tree normalization. |
TreeNormalizers | A collection of static methods that return a TreeNormalizer. |
Trees | Various utilities for the Tree class. |
WeightedFollowedTaggedWord | A WeightedFollowedTaggedWord object contains a word and its tag, but it also records what text follows the token. |
Word | A Word object acts as a Label by containing a String. |
WordFactory | A WordFactory acts as a factory for creating objects of class Word. |
WordLabeledScoredTreeReaderFactory | This class implements a TreeReaderFactory that produces Word labeled, scored array-based Trees, which have been cleaned up to delete empties, etc., according to the BobChrisTreeNormalizer. |
WordTag | A WordTag corresponds to a tagged (e.g., for part of speech) word and is implemented with String-valued word and tag. |
WordTagFactory | A WordTagFactory acts as a factory for creating objects of class WordTag. |
A package for (NLP) trees, sentences, and similar things. This package provides several key abstractions (via abstract classes) and a number of further classes for related objects. Most of these classes use a Factory pattern to instantiate objects.
A Label is something that can be the label of a Tree or a Constituent. The simplest label is a StringLabel. A Word or a TaggedWord is a Label. They can be constructed with a LabelFactory. A Label often implements various interfaces, such as HasWord.
A Constituent object defines a generic edge in a graph. It has a start and end, and usually a Label. A ConstituentFactory builds a Constituent.
A Tree object provides generic facilities for manipulating NLP trees. A TreeFactory can build a Tree.
A Treebank provides an interface to a collection of parsed sentences (normally found on disk as a corpus). A TreeReader reads trees from a Reader. A TreeReaderFactory builds a TreeReader. A TreeNormalizer canonicalizes a Tree on input from a File. A HeadFinder finds the head daughter of a Tree. The TreeProcessor interface is for general sequential processing of trees, and the TreeTransformer interface is for changing them.
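As a rough illustration of the last two roles, the sketch below defines a tiny stand-in tree type and shows a head finder that always picks the leftmost daughter (the policy described for LeftHeadFinder) next to a transformer that returns an altered copy of a tree. MiniTree, MiniHeadFinder, and MiniTransformer are hypothetical names invented for this sketch; they are not the package's classes, and their method signatures are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a tree node: a label plus zero or more daughters. */
final class MiniTree {
    final String label;
    final List<MiniTree> children;

    MiniTree(String label, MiniTree... children) {
        this.label = label;
        this.children = Arrays.asList(children);
    }

    boolean isLeaf() {
        return children.isEmpty();
    }
}

/** Assumed shape of a head finder: choose one daughter of a local tree. */
interface MiniHeadFinder {
    MiniTree determineHead(MiniTree t);
}

/** Assumed shape of a transformer: map a tree to a (possibly new) tree. */
interface MiniTransformer {
    MiniTree transformTree(MiniTree t);
}

final class HeadAndTransformSketch {
    /** Always returns the leftmost daughter, or null for a leaf. */
    static final MiniHeadFinder LEFT_HEAD =
        t -> t.isLeaf() ? null : t.children.get(0);

    /** Rebuilds the tree, cutting each non-leaf label at its first '-' (NP-SBJ becomes NP). */
    static final MiniTransformer STRIP_FUNCTION_TAGS = new MiniTransformer() {
        @Override
        public MiniTree transformTree(MiniTree t) {
            if (t.isLeaf()) {
                return t; // words are left untouched
            }
            List<MiniTree> kids = new ArrayList<>();
            for (MiniTree kid : t.children) {
                kids.add(transformTree(kid));
            }
            // dash > 0 keeps labels that merely start with '-' intact
            int dash = t.label.indexOf('-');
            String base = dash > 0 ? t.label.substring(0, dash) : t.label;
            return new MiniTree(base, kids.toArray(new MiniTree[0]));
        }
    };

    public static void main(String[] args) {
        MiniTree np = new MiniTree("NP-SBJ", new MiniTree("DT", new MiniTree("This")));
        MiniTree vp = new MiniTree("VP", new MiniTree("VBD", new MiniTree("was")));
        MiniTree s = new MiniTree("S", np, vp);

        System.out.println(LEFT_HEAD.determineHead(s).label);
        System.out.println(STRIP_FUNCTION_TAGS.transformTree(s).children.get(0).label);
    }
}
```

The transformer builds new nodes instead of mutating its input, so the original tree can still be inspected afterwards; whether the package's own transformers copy or mutate is not stated here.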
A Sentence is a subclass of ArrayList. A Sentencebank provides an interface to a large number of sentences (normally found on disk as a corpus). A SentenceReader reads sentences from a Reader. A SentenceReaderFactory builds a SentenceReader. A SentenceNormalizer canonicalizes a Sentence on input from a File. The SentenceProcessor interface is for general sequential processing of sentences.
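Because a Sentence is described as a subclass of ArrayList, the usual List operations apply to it directly. The sketch below mimics that design with a hypothetical MiniSentence class; the class name and its length() and toString() behavior are assumptions made for illustration, not the package's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;

/** Hypothetical stand-in: a sentence that is itself a list of word strings. */
final class MiniSentence extends ArrayList<String> {
    private static final long serialVersionUID = 1L;

    MiniSentence(Collection<String> words) {
        super(words);
    }

    /** Number of words; simply another name for size(). */
    int length() {
        return size();
    }

    /** Words joined by single spaces. */
    @Override
    public String toString() {
        return String.join(" ", this);
    }

    public static void main(String[] args) {
        MiniSentence s = new MiniSentence(Arrays.asList("This", "was", "good"));
        s.add("."); // inherited from ArrayList
        System.out.println(s.length() + " tokens: " + s);
    }
}
```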
There are also various subclasses of StreamTokenizer. The class PairFinder should probably be moved to samples.
Design notes: This package is the result of several iterations of trying to come up with a reusable and extendable set of tree classes. It may still be nonoptimal, but some thought went into it! At any rate, there are several things that are important to understand in order to use the package effectively. One is that a Label has a primary value(), which is always a String, and this is the only thing that matters for fundamental Label operations, such as checking equality. While anything else (or nothing) can be stored in a Label, all other Label content is regarded as purely decorative. All Label implementations should implement a labelFactory() method that returns a LabelFactory for the appropriate kind of Label. Since this depends on the exact class, this method should always be overridden when a Label class is extended. The existing Label classes also provide a static factory() method which returns the same thing.
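The two conventions in these notes, equality on value() alone and a labelFactory() that each subclass overrides, can be sketched as follows. All of the types here (MiniLabel, MiniLabelFactory, BasicLabel, ScoredLabel) are hypothetical stand-ins written for this note, and the method signatures are assumptions rather than the package's real API.

```java
/** Assumed minimal label contract: a String value and a matching factory. */
interface MiniLabel {
    String value();
    MiniLabelFactory labelFactory();
}

/** Assumed factory contract: build a label from a String value. */
interface MiniLabelFactory {
    MiniLabel newLabel(String value);
}

/** Plain label: equality and hashing look only at value() (assumed non-null). */
class BasicLabel implements MiniLabel {
    private final String value;

    BasicLabel(String value) {
        this.value = value;
    }

    @Override
    public String value() {
        return value;
    }

    @Override
    public MiniLabelFactory labelFactory() {
        return BasicLabel::new;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof MiniLabel && value.equals(((MiniLabel) o).value());
    }

    @Override
    public int hashCode() {
        return value.hashCode();
    }
}

/** Subclass with extra "decorative" content that does not affect equality. */
class ScoredLabel extends BasicLabel {
    final double score;

    ScoredLabel(String value, double score) {
        super(value);
        this.score = score;
    }

    /** Overridden so that the factory produces this class, not BasicLabel. */
    @Override
    public MiniLabelFactory labelFactory() {
        return v -> new ScoredLabel(v, 0.0);
    }
}

final class LabelSketch {
    public static void main(String[] args) {
        MiniLabel a = new BasicLabel("NP");
        MiniLabel b = new ScoredLabel("NP", -3.5);
        System.out.println(a.equals(b) && b.equals(a));
        System.out.println(b.labelFactory().newLabel("VP").getClass().getSimpleName());
    }
}
```

If ScoredLabel did not override labelFactory(), code that asks an existing label for "another label of the same kind" would silently get a BasicLabel back, which is the failure the note warns about.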
Road Map: There are some plans to change things. We plan to redo Label, so that all Label classes just inherit from AbstractLabel, and do a full equality test on all their fields. The default type of Treebank should be useful. TreeReader should be PennTreeReader. And there is probably more.
Examples of using the trees package
The examples below cover, in order: loading and iterating over a Treebank (SentenceLengths); building a Treebank by hand with a custom TreeReaderFactory (ConstituentCounter); working with the Tree and Label classes (TreesFromStrings); and reading a Sentencebank (SentencePrinter).
As well as the Treebank classes, there are corresponding Sentencebank classes (though they are not quite so extensively developed). The final example shows use of a Sentencebank. It also illustrates the Visitor pattern for examining sentences in a Sentencebank. This was actually the original visitation pattern for Treebank and Sentencebank, but these days it is in general easier to use an Iterator. You can also get Sentences from a Treebank by taking the yield() or taggedYield() of each Tree.
Here is some fairly straightforward code for loading trees from a treebank and iterating over the trees contained therein. It builds a histogram of sentence lengths.
import java.util.Iterator;

import edu.stanford.nlp.trees.*;
import edu.stanford.nlp.io.NumberRangesFileFilter;
import edu.stanford.nlp.util.Timing;

/** This class just prints out sentences and their lengths.
 *  Use: java SentenceLengths /turing/corpora/Treebank2/combined/wsj/07
 *           [fileRange]
 */
public class SentenceLengths {

  private static final int maxleng = 100;
  private static int[] lengthCounts = new int[maxleng+1];
  private static int numSents = 0;

  public static void main(String[] args) {
    Timing.startTime();
    Treebank treebank = new DiskTreebank(
        new LabeledScoredTreeReaderFactory());
    if (args.length > 1) {
      treebank.loadPath(args[0], new NumberRangesFileFilter(args[1], true));
    } else {
      treebank.loadPath(args[0]);
    }

    // Count sentences by length; longer ones are tallied in numSents only.
    for (Iterator it = treebank.iterator(); it.hasNext(); ) {
      Tree t = (Tree) it.next();
      numSents++;
      int len = t.yield().length();
      if (len <= maxleng) {
        lengthCounts[len]++;
      }
    }

    System.out.print("Files " + args[0] + " ");
    if (args.length > 1) {
      System.out.print(args[1] + " ");
    }
    System.out.println("consists of " + numSents + " sentences");
    for (int i = 0; i <= maxleng; i++) {
      System.out.println("  " + lengthCounts[i] + " of length " + i);
    }
    Timing.endTime("Read/count all trees");
  }
}
Treebank, custom TreeReaderFactory, Tree, and Constituent
This example illustrates building a Treebank by hand, specifying a custom TreeReaderFactory, and illustrates more of the Tree package and the notion of a Constituent. A Constituent has a start and end point and a Label.
import java.io.*;
import java.util.*;

import edu.stanford.nlp.trees.*;
import edu.stanford.nlp.util.*;

/** This class counts how often each constituent appears.
 *  Use: java ConstituentCounter /turing/corpora/Treebank2/combined/wsj/07
 */
public class ConstituentCounter {

  public static void main(String[] args) {
    Treebank treebank = new DiskTreebank(new TreeReaderFactory() {
      public TreeReader newTreeReader(Reader in) {
        // TreeReader is listed among the interfaces above, so the
        // concrete PennTreeReader is instantiated here.
        return new PennTreeReader(in,
            new LabeledScoredTreeFactory(new StringLabelFactory()),
            new BobChrisTreeNormalizer());
      }
    });
    treebank.loadPath(args[0]);

    // Tally every labeled constituent of every tree.
    Counter cnt = new Counter();
    ConstituentFactory confac = LabeledConstituent.factory();
    for (Iterator it = treebank.iterator(); it.hasNext(); ) {
      Tree t = (Tree) it.next();
      Set constituents = t.constituents(confac);
      for (Iterator it2 = constituents.iterator(); it2.hasNext(); ) {
        Constituent c = (Constituent) it2.next();
        cnt.increment(c);
      }
    }

    // Print the constituents seen, in sorted order, with their counts.
    SortedSet ss = new TreeSet(cnt.seenSet());
    for (Iterator it = ss.iterator(); it.hasNext(); ) {
      Constituent c = (Constituent) it.next();
      System.out.println(c + "  " + cnt.countOf(c));
    }
  }
}
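To make the notion of a Constituent concrete without a treebank on disk, the following self-contained sketch derives labeled spans from a small hand-built tree and counts them in a map. SpanTree, LabeledSpan, and the inclusive word-index convention are assumptions made for this sketch; the package's own Constituent classes and its Counter may behave differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ConstituentSketch {
    /** Hypothetical tree node: a label and its daughters (none for a word). */
    static final class SpanTree {
        final String label;
        final List<SpanTree> kids;

        SpanTree(String label, SpanTree... kids) {
            this.label = label;
            this.kids = Arrays.asList(kids);
        }
    }

    /** Hypothetical labeled span: inclusive start and end word positions. */
    static final class LabeledSpan {
        final String label;
        final int start;
        final int end;

        LabeledSpan(String label, int start, int end) {
            this.label = label;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return label + "(" + start + "," + end + ")";
        }
    }

    /**
     * Appends one span per non-leaf node under t, where the first word of t
     * sits at position start. Returns the position just past t's last word.
     */
    static int collect(SpanTree t, int start, List<LabeledSpan> out) {
        if (t.kids.isEmpty()) {
            return start + 1; // a word occupies exactly one position
        }
        int next = start;
        for (SpanTree kid : t.kids) {
            next = collect(kid, next, out);
        }
        out.add(new LabeledSpan(t.label, start, next - 1));
        return next;
    }

    /** Counts how often each span's printed form occurs. */
    static Map<String, Integer> count(List<LabeledSpan> spans) {
        Map<String, Integer> counts = new TreeMap<>();
        for (LabeledSpan s : spans) {
            counts.merge(s.toString(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        SpanTree tree = new SpanTree("S",
            new SpanTree("NP", new SpanTree("DT", new SpanTree("This"))),
            new SpanTree("VP",
                new SpanTree("VBD", new SpanTree("was")),
                new SpanTree("JJ", new SpanTree("good"))));
        List<LabeledSpan> spans = new ArrayList<>();
        collect(tree, 0, spans);
        System.out.println(count(spans));
    }
}
```

This sketch treats preterminal nodes such as DT as spans too; whether the library's constituents() includes them is not specified in this page.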
Tree and Label
Dealing with the Tree and Label classes is a central part of using this package. This code works out the set of tags (preterminal labels) used in a Treebank. It illustrates writing one's own code to recurse through a Tree, and getting a String value for a Label.
import java.util.*;

import edu.stanford.nlp.trees.*;
import edu.stanford.nlp.util.Counter;

/** This class prints out trees from strings and counts their preterminals.
 *  Use: java TreesFromStrings '(S (NP (DT This)) (VP (VBD was) (JJ good)))'
 */
public class TreesFromStrings {

  /** Adds the label of every preterminal node under t to the counter. */
  private static void addTerminals(Tree t, Counter c) {
    if (t.isLeaf()) {
      // do nothing
    } else if (t.isPreTerminal()) {
      c.increment(t.label().value());
    } else {
      // phrasal node
      Tree[] kids = t.children();
      for (int i = 0; i < kids.length; i++) {
        addTerminals(kids[i], c);
      }
    }
  }

  public static void main(String[] args) {
    Treebank tb = new MemoryTreebank();
    for (int i = 0; i < args.length; i++) {
      try {
        Tree t = Tree.valueOf(args[i]);
        tb.add(t);
      } catch (Exception e) {
        e.printStackTrace();
      }
    }
    Counter c = new Counter();
    for (Iterator it = tb.iterator(); it.hasNext(); ) {
      Tree t = (Tree) it.next();
      addTerminals(t, c);
    }
    System.out.println(c);
  }
}
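For readers without the library at hand, the same traversal can be tried on a toy parser. The sketch below reads a Penn-style bracketed string into a hypothetical Node type and counts preterminal labels; the parser is a deliberately simplified assumption (whitespace-separated tokens, no escaping or empty elements) and is not how Tree.valueOf is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class BracketTreeSketch {
    /** Hypothetical node type: a label and its daughters (none for a word). */
    static final class Node {
        final String label;
        final List<Node> kids = new ArrayList<>();

        Node(String label) {
            this.label = label;
        }

        boolean isLeaf() {
            return kids.isEmpty();
        }

        boolean isPreTerminal() {
            return kids.size() == 1 && kids.get(0).isLeaf();
        }
    }

    private final String[] tokens;
    private int pos = 0;

    private BracketTreeSketch(String text) {
        // Pad the parentheses so that splitting on whitespace yields clean tokens.
        tokens = text.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");
    }

    /** Parses one bracketed tree such as "(NP (DT This))". */
    static Node parse(String text) {
        return new BracketTreeSketch(text).readNode();
    }

    private Node readNode() {
        if (pos >= tokens.length) {
            throw new IllegalArgumentException("unexpected end of input");
        }
        String tok = tokens[pos++];
        if (tok.equals(")")) {
            throw new IllegalArgumentException("unexpected closing parenthesis");
        }
        if (!tok.equals("(")) {
            return new Node(tok); // a bare token is a word
        }
        if (pos >= tokens.length) {
            throw new IllegalArgumentException("missing label after opening parenthesis");
        }
        Node node = new Node(tokens[pos++]); // first token after "(" is the label
        while (pos < tokens.length && !tokens[pos].equals(")")) {
            node.kids.add(readNode());
        }
        if (pos >= tokens.length) {
            throw new IllegalArgumentException("missing closing parenthesis");
        }
        pos++; // consume ")"
        return node;
    }

    /** Recursively counts the labels of preterminal nodes (the tags). */
    static void countTags(Node t, Map<String, Integer> counts) {
        if (t.isLeaf()) {
            return;
        }
        if (t.isPreTerminal()) {
            counts.merge(t.label, 1, Integer::sum);
            return;
        }
        for (Node kid : t.kids) {
            countTags(kid, counts);
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new TreeMap<>();
        countTags(parse("(S (NP (DT This)) (VP (VBD was) (JJ good)))"), counts);
        System.out.println(counts); // prints {DT=1, JJ=1, VBD=1}
    }
}
```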
Sentencebank
import java.io.*;

import edu.stanford.nlp.trees.*;

public class SentencePrinter {

  /** Loads a Sentencebank from the first argument and prints it out.
   *  Usage: java SentencePrinter sentencebankPath
   *  @param args Array of command-line arguments
   */
  public static void main(String[] args) {
    SentenceReaderFactory srf = new SentenceReaderFactory() {
      public SentenceReader newSentenceReader(Reader in) {
        return new SentenceReader(in, new TaggedWordFactory(),
            new PennSentenceNormalizer(),
            new PennTagbankStreamTokenizer(in));
      }
    };
    Sentencebank sentencebank = new DiskSentencebank(srf);
    sentencebank.loadPath(args[0]);

    // Visitor style: the Sentencebank calls visitSentence once per sentence.
    sentencebank.apply(new SentenceVisitor() {
      public void visitSentence(final Sentence s) {
        // also print tag as well as word
        System.out.println(s.toString(false));
      }
    });
  }
}
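The contrast between the visitor style used in SentencePrinter and plain iteration can be shown on an in-memory stand-in. MiniSentencebank and MiniSentenceVisitor below are hypothetical types written for this sketch, and apply() is modeled on the call in SentencePrinter rather than copied from the library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Assumed visitor contract: called once per sentence. */
interface MiniSentenceVisitor {
    void visitSentence(List<String> sentence);
}

/** Hypothetical in-memory sentence collection supporting both access styles. */
final class MiniSentencebank implements Iterable<List<String>> {
    private final List<List<String>> sentences = new ArrayList<>();

    void add(String... words) {
        sentences.add(Arrays.asList(words));
    }

    /** Visitor style: the collection drives the loop and calls back. */
    void apply(MiniSentenceVisitor visitor) {
        for (List<String> s : sentences) {
            visitor.visitSentence(s);
        }
    }

    /** Iterator style: the caller drives the loop. */
    @Override
    public Iterator<List<String>> iterator() {
        return sentences.iterator();
    }

    public static void main(String[] args) {
        MiniSentencebank bank = new MiniSentencebank();
        bank.add("This", "was", "good");
        bank.add("So", "was", "that", "one");

        // Visitor: state must live outside the callback, here in a one-element array.
        final int[] visited = {0};
        bank.apply(s -> visited[0] += s.size());

        // Iterator: an ordinary local variable is enough.
        int iterated = 0;
        for (List<String> s : bank) {
            iterated += s.size();
        }
        System.out.println(visited[0] + " " + iterated);
    }
}
```

Both loops count the same seven tokens. The iterator version keeps its running total in a plain local variable, which is one reason iteration tends to be the easier style.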
Stanford NLP Group