public abstract class Treebank extends java.util.AbstractCollection<Tree>

A Treebank object provides access to a corpus of examples with given tree structures.
This class implements the Collection interface. However, it may offer less than the full power of that interface: some Treebanks are read-only, and so may throw an UnsupportedOperationException.

Modifier and Type | Field and Description
---|---
static java.lang.String | DEFAULT_TREE_FILE_SUFFIX
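The Collection contract described above can be illustrated without CoreNLP on the classpath. The sketch below is a hypothetical `ReadOnlyCorpus` (not part of the library), with `String` standing in for `Tree`. Like Treebank, it extends `java.util.AbstractCollection`, and it throws `UnsupportedOperationException` from mutators, which is the behaviour the class description says read-only Treebanks may exhibit.

```java
import java.util.AbstractCollection;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical read-only corpus: String stands in for Tree.
final class ReadOnlyCorpus extends AbstractCollection<String> {
  private final List<String> items;

  ReadOnlyCorpus(List<String> items) {
    // Keep an unmodifiable defensive copy so callers cannot mutate it.
    this.items = Collections.unmodifiableList(new ArrayList<>(items));
  }

  static ReadOnlyCorpus of(String... items) {
    return new ReadOnlyCorpus(Arrays.asList(items));
  }

  @Override
  public Iterator<String> iterator() {
    // The unmodifiable list's iterator also rejects remove().
    return items.iterator();
  }

  @Override
  public int size() {
    return items.size();
  }

  @Override
  public boolean remove(Object o) {
    // Removal is not supported on a read-only collection.
    throw new UnsupportedOperationException("read-only corpus");
  }
  // add(...) is inherited from AbstractCollection, which already throws
  // UnsupportedOperationException by default.
}
```

Because `AbstractCollection` implements `contains`, `isEmpty`, and `toArray` on top of `iterator()` and `size()`, a subclass only has to supply those two methods to be usable as a read-only Collection.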
Constructor | Description
---|---
Treebank() | Create a new Treebank (using a LabeledScoredTreeReaderFactory).
Treebank(int initialCapacity) | Create a new Treebank.
Treebank(int initialCapacity, TreeReaderFactory trf) | Create a new Treebank.
Treebank(TreeReaderFactory trf) | Create a new Treebank.
Treebank(TreeReaderFactory trf, java.lang.String encoding) | Create a new Treebank.
Modifier and Type | Method | Description
---|---|---
abstract void | apply(TreeVisitor tp) | Apply a TreeVisitor to each tree in the Treebank.
abstract void | clear() | Empty a Treebank.
void | decimate(java.io.Writer trainW, java.io.Writer devW, java.io.Writer testW) | Divide a Treebank into three parts (training, dev, and test) by taking every 9th sentence for the dev set and every 10th for the test set.
java.lang.String | encoding() | Returns the encoding in use for treebank file bytestream access.
void | loadPath(java.io.File path) | Load a sequence of trees from the given file or directory and its subdirectories.
abstract void | loadPath(java.io.File path, java.io.FileFilter filt) | Load trees from the given path specification.
void | loadPath(java.io.File path, java.lang.String suffix, boolean recursively) | Load trees from the given directory.
void | loadPath(java.lang.String pathName) | Load a sequence of trees from the given directory and its subdirectories.
void | loadPath(java.lang.String pathName, java.io.FileFilter filt) | Load a sequence of trees from the given directory and its subdirectories which match the file filter.
void | loadPath(java.lang.String pathName, java.lang.String suffix, boolean recursively) | Load trees from the given directory.
boolean | remove(java.lang.Object o) | This operation isn't supported for a Treebank.
int | size() | Returns the size of the Treebank.
java.lang.String | textualSummary() | Return various statistics about the treebank (number of sentences, words, tag set, etc.).
java.lang.String | textualSummary(TreebankLanguagePack tlp) | Return various statistics about the treebank (number of sentences, words, tag set, etc.).
java.lang.String | toString() | Return the whole treebank as a series of big bracketed lists.
Treebank | transform(TreeTransformer treeTrans) | Return a Treebank (actually a TransformingTreebank) where each Tree in the current treebank has been transformed using the TreeTransformer.
TreeReaderFactory | treeReaderFactory() | Get the TreeReaderFactory for a Treebank. This method is provided to make the TreeReaderFactory available to subclasses.
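Methods inherited from class java.util.AbstractCollection: add, addAll, contains, containsAll, isEmpty, iterator, removeAll, retainAll, toArray, toArray
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait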
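The file-selection rules documented for `loadPath(pathName, suffix, recursively)` can be expressed with an ordinary `java.io.FileFilter`, the same interface that `loadPath(path, filt)` accepts. The sketch below is a hypothetical `TreeFileSelector`, not CoreNLP code: it assumes those documented rules and only lists the files they would select, without parsing any trees.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper mirroring the documented suffix/recursively rules.
final class TreeFileSelector {

  // Accepts files ending in "." + suffix (any file when suffix is null),
  // and accepts subdirectories only when recursing.
  static FileFilter filterFor(String suffix, boolean recursively) {
    return f -> f.isDirectory()
        ? recursively
        : (suffix == null || f.getName().endsWith("." + suffix));
  }

  // Returns the files the documented rules would select under path.
  static List<File> select(File path, String suffix, boolean recursively) {
    List<File> out = new ArrayList<>();
    if (!path.isDirectory()) {
      out.add(path);            // a plain file: suffix is ignored
      return out;
    }
    collect(path, filterFor(suffix, recursively), out);
    return out;
  }

  private static void collect(File dir, FileFilter filter, List<File> out) {
    File[] children = dir.listFiles(filter);
    if (children == null) {
      return;                   // I/O error or not a directory
    }
    Arrays.sort(children);      // deterministic order
    for (File child : children) {
      if (child.isDirectory()) {
        collect(child, filter, out);
      } else {
        out.add(child);
      }
    }
  }
}
```

Letting the filter accept directories only when `recursively` is true keeps the recursion decision in one place, so the walk itself does not need to know about the flag.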
public static final java.lang.String DEFAULT_TREE_FILE_SUFFIX
public Treebank()
public Treebank(TreeReaderFactory trf)
Parameters:
trf - the factory class to be called to create a new TreeReader

public Treebank(TreeReaderFactory trf, java.lang.String encoding)
Parameters:
trf - the factory class to be called to create a new TreeReader
encoding - the charset encoding to use for treebank file decoding

public Treebank(int initialCapacity)
Parameters:
initialCapacity - the initial size of the underlying Collection (if a Collection-based storage mechanism is being provided)

public Treebank(int initialCapacity, TreeReaderFactory trf)
Parameters:
initialCapacity - the initial size of the underlying Collection (if a Collection-based storage mechanism is being provided)
trf - the factory class to be called to create a new TreeReader

public TreeReaderFactory treeReaderFactory()
Get the TreeReaderFactory for a Treebank. This method is provided to make the TreeReaderFactory available to subclasses.

public java.lang.String encoding()

public abstract void clear()
Empty a Treebank.

public void loadPath(java.lang.String pathName)
Parameters:
pathName - file or directory name

public void loadPath(java.io.File path)
Parameters:
path - file specification

public void loadPath(java.lang.String pathName, java.lang.String suffix, boolean recursively)
Parameters:
pathName - file or directory name
suffix - extension of files to load. If pathName is a directory and suffix is non-null, all and only files ending in "." followed by this extension are loaded; if suffix is null, all files in the directories are loaded. If pathName is not a directory, this parameter is ignored.
recursively - descend into subdirectories as well

public void loadPath(java.io.File path, java.lang.String suffix, boolean recursively)
Parameters:
path - file or directory to load from
suffix - suffix of files to load
recursively - descend into subdirectories as well

public void loadPath(java.lang.String pathName, java.io.FileFilter filt)
Parameters:
pathName - file or directory name
filt - a filter used to determine which files match

public abstract void loadPath(java.io.File path, java.io.FileFilter filt)
Parameters:
path - file or directory to load from
filt - a FileFilter selecting the files to load

public abstract void apply(TreeVisitor tp)
Parameters:
tp - the TreeVisitor to be applied

public Treebank transform(TreeTransformer treeTrans)
Parameters:
treeTrans - the TreeTransformer to use

public java.lang.String toString()
Overrides:
toString in class java.util.AbstractCollection<Tree>
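The difference between apply and transform is that apply visits each tree for its side effects, while transform yields a new Treebank. The sketch below models that split with `java.util.function` types in place of TreeVisitor and TreeTransformer, and `String` in place of `Tree`. `MiniBank` is hypothetical, and building transform as a lazy view over the original collection is an assumption about the design, not a description of TransformingTreebank's internals.

```java
import java.util.AbstractCollection;
import java.util.Collection;
import java.util.Iterator;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical stand-ins: Consumer plays TreeVisitor, Function plays
// TreeTransformer, and String plays Tree.
final class MiniBank extends AbstractCollection<String> {
  private final Collection<String> source;

  MiniBank(Collection<String> source) {
    this.source = source;
  }

  // Analogue of apply(TreeVisitor): call the visitor once per tree.
  void apply(Consumer<String> visitor) {
    for (String t : this) {
      visitor.accept(t);
    }
  }

  // Analogue of transform(TreeTransformer): returns a view whose iterator
  // transforms each element on the fly; the source is not copied.
  MiniBank transform(Function<String, String> transformer) {
    MiniBank base = this;
    return new MiniBank(new AbstractCollection<String>() {
      @Override
      public Iterator<String> iterator() {
        Iterator<String> it = base.iterator();
        return new Iterator<String>() {
          @Override public boolean hasNext() { return it.hasNext(); }
          @Override public String next() { return transformer.apply(it.next()); }
        };
      }

      @Override
      public int size() {
        return base.size();
      }
    });
  }

  @Override
  public Iterator<String> iterator() {
    return source.iterator();
  }

  @Override
  public int size() {
    return source.size();
  }
}
```

A lazy view costs nothing until it is iterated, but the transformer runs again on every pass; copying the transformed trees once would trade memory for repeated work.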
public int size()
public void decimate(java.io.Writer trainW, java.io.Writer devW, java.io.Writer testW)
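The summary entry for decimate describes the split only briefly. The sketch below reads "every 9th sentence for the dev set and every 10th for the test set" as a repeating block of ten: within each block the ninth sentence goes to dev, the tenth to test, and the other eight to training, which gives roughly an 80/10/10 split. That reading is an assumption; `DecimateSketch` is hypothetical and writes plain strings with `println` where the real method would print Tree objects.

```java
import java.io.PrintWriter;
import java.io.Writer;
import java.util.List;

// Hypothetical sketch of a 10-way round-robin train/dev/test split.
final class DecimateSketch {

  static void decimate(List<String> trees, Writer trainW, Writer devW, Writer testW) {
    PrintWriter train = new PrintWriter(trainW, true);
    PrintWriter dev = new PrintWriter(devW, true);
    PrintWriter test = new PrintWriter(testW, true);
    int i = 0;                  // position within the current block of ten
    for (String t : trees) {
      if (i == 8) {
        dev.println(t);         // 9th item of the block
      } else if (i == 9) {
        test.println(t);        // 10th item of the block
      } else {
        train.println(t);       // remaining eight items
      }
      i = (i + 1) % 10;
    }
    train.flush();
    dev.flush();
    test.flush();
  }
}
```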
public java.lang.String textualSummary()

public java.lang.String textualSummary(TreebankLanguagePack tlp)
Parameters:
tlp - the TreebankLanguagePack used to determine punctuation and an appropriate character encoding
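textualSummary reports statistics such as the number of sentences, the number of words, and the tag set, and the overload taking a TreebankLanguagePack uses it to decide what counts as punctuation. The toy sketch below illustrates that kind of counting on pre-tagged "word/TAG" tokens rather than trees. `SummarySketch`, its input format, its output format, and the `Predicate` standing in for TreebankLanguagePack are all assumptions, and its report will not match the real method's output.

```java
import java.util.List;
import java.util.TreeSet;
import java.util.function.Predicate;

// Hypothetical toy summary over pre-tagged sentences ("word/TAG" tokens).
final class SummarySketch {

  static String summarize(List<List<String>> sentences, Predicate<String> isPunctTag) {
    int words = 0;
    int punct = 0;
    TreeSet<String> tags = new TreeSet<>();   // sorted for stable output
    for (List<String> sentence : sentences) {
      for (String token : sentence) {
        // Text after the last '/' is the tag; a token without '/' is its own tag.
        String tag = token.substring(token.lastIndexOf('/') + 1);
        tags.add(tag);
        if (isPunctTag.test(tag)) {
          punct++;
        } else {
          words++;
        }
      }
    }
    return "sentences=" + sentences.size()
        + " words=" + words
        + " punctuation=" + punct
        + " tags=" + tags;
  }
}
```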