public abstract class Treebank extends AbstractCollection<Tree>
A Treebank object provides access to a corpus of examples with
given tree structures.
This class implements the Collection interface. However, it may offer
less than the full power of the Collection interface: some Treebanks are
read-only, and so may throw an UnsupportedOperationException from modifying operations.
Modifier and Type | Field and Description |
---|---|
static String | DEFAULT_TREE_FILE_SUFFIX |
Constructor and Description |
---|
Treebank(): Create a new Treebank (using a LabeledScoredTreeReaderFactory). |
Treebank(int initialCapacity): Create a new Treebank. |
Treebank(int initialCapacity, TreeReaderFactory trf): Create a new Treebank. |
Treebank(TreeReaderFactory trf): Create a new Treebank. |
Treebank(TreeReaderFactory trf, String encoding): Create a new Treebank. |
Modifier and Type | Method and Description |
---|---|
abstract void | apply(TreeVisitor tp): Apply a TreeVisitor to each tree in the Treebank. |
abstract void | clear(): Empty a Treebank. |
void | decimate(Writer trainW, Writer devW, Writer testW): Divide a Treebank into 3, by taking every 9th sentence for the dev set and every 10th for the test set. |
String | encoding(): Returns the encoding in use for treebank file bytestream access. |
void | loadPath(File path): Load a sequence of trees from given file or directory and its subdirectories. |
abstract void | loadPath(File path, FileFilter filt): Load trees from given path specification. |
void | loadPath(File path, String suffix, boolean recursively): Load trees from given directory. |
void | loadPath(String pathName): Load a sequence of trees from given directory and its subdirectories. |
void | loadPath(String pathName, FileFilter filt): Load a sequence of trees from given directory and its subdirectories which match the file filter. |
void | loadPath(String pathName, String suffix, boolean recursively): Load trees from given directory. |
boolean | remove(Object o): This operation isn't supported for a Treebank. |
int | size(): Returns the size of the Treebank. |
String | textualSummary(): Return various statistics about the treebank (number of sentences, words, tag set, etc.). |
String | textualSummary(TreebankLanguagePack tlp): Return various statistics about the treebank (number of sentences, words, tag set, etc.). |
String | toString(): Return the whole treebank as a series of big bracketed lists. |
Treebank | transform(TreeTransformer treeTrans): Return a Treebank (actually a TransformingTreebank) where each Tree in the current treebank has been transformed using the TreeTransformer. |
protected TreeReaderFactory | treeReaderFactory(): Get the TreeReaderFactory for a Treebank; this method is provided in order to make the TreeReaderFactory available to subclasses. |
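Methods inherited from class AbstractCollection: add, addAll, contains, containsAll, isEmpty, iterator, removeAll, retainAll, toArray, toArray
Methods inherited from class Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface Collection: equals, hashCode, parallelStream, removeIf, spliterator, stream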
public static final String DEFAULT_TREE_FILE_SUFFIX
public Treebank()
public Treebank(TreeReaderFactory trf)
trf - the factory class to be called to create a new TreeReader
public Treebank(TreeReaderFactory trf, String encoding)
trf - the factory class to be called to create a new TreeReader
encoding - the charset encoding to use for treebank file decoding
public Treebank(int initialCapacity)
initialCapacity - the initial size of the underlying Collection (if a Collection-based storage mechanism is being provided)
public Treebank(int initialCapacity, TreeReaderFactory trf)
initialCapacity - the initial size of the underlying Collection (if a Collection-based storage mechanism is being provided)
trf - the factory class to be called to create a new TreeReader
protected TreeReaderFactory treeReaderFactory()
Get the TreeReaderFactory for a Treebank. This method is provided in order to make the TreeReaderFactory available to subclasses.
public String encoding()
public abstract void clear()
Empty a Treebank.
Specified by: clear in interface Collection<Tree>
Overrides: clear in class AbstractCollection<Tree>
public void loadPath(String pathName)
pathName - file or directory name
public void loadPath(File path)
path - File specification
public void loadPath(String pathName, String suffix, boolean recursively)
pathName - file or directory name
suffix - extension of files to load: if pathName is a directory and suffix is non-null, all and only files ending in "." followed by this extension are loaded; if suffix is null, all files in directories are loaded. If pathName is not a directory, this parameter is ignored.
recursively - descend into subdirectories as well
public void loadPath(File path, String suffix, boolean recursively)
path - file or directory to load from
suffix - suffix of files to load
recursively - descend into subdirectories as well
public void loadPath(String pathName, FileFilter filt)
pathName - file or directory name
filt - a filter used to determine which files match
public abstract void loadPath(File path, FileFilter filt)
path - file or directory to load from
filt - a FileFilter selecting the files to load
public abstract void apply(TreeVisitor tp)
tp - the TreeVisitor to be applied
public Treebank transform(TreeTransformer treeTrans)
treeTrans - the TreeTransformer to use
public String toString()
Overrides: toString in class AbstractCollection<Tree>
public int size()
Specified by: size in interface Collection<Tree>
Specified by: size in class AbstractCollection<Tree>
public void decimate(Writer trainW, Writer devW, Writer testW) throws IOException
Throws: IOException
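As a rough illustration of the three-way split that decimate describes, the sketch below routes plain strings (standing in for trees) into three lists instead of writing to Writers. It assumes sentences are counted in blocks of ten, with the 9th of each block going to dev and the 10th to test; that reading of "every 9th" and "every 10th" is an assumption, and the names DecimateSketch, Split, and split are hypothetical, not part of the Treebank API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the decimate split; operates on strings, not Trees. */
class DecimateSketch {

    /** Holds the three partitions produced by split(). */
    static final class Split {
        final List<String> train = new ArrayList<>();
        final List<String> dev = new ArrayList<>();
        final List<String> test = new ArrayList<>();
    }

    /**
     * Assumption: within each run of ten sentences, the 9th goes to dev,
     * the 10th goes to test, and the other eight go to train.
     */
    static Split split(List<String> sentences) {
        Split out = new Split();
        int position = 0; // 0-based position within the current block of ten
        for (String s : sentences) {
            if (position == 8) {
                out.dev.add(s);
            } else if (position == 9) {
                out.test.add(s);
            } else {
                out.train.add(s);
            }
            position = (position + 1) % 10;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> sentences = new ArrayList<>();
        for (int i = 1; i <= 20; i++) {
            sentences.add("s" + i);
        }
        Split result = split(sentences);
        System.out.println("train=" + result.train.size()
                + " dev=" + result.dev + " test=" + result.test);
    }
}
```

Under these assumptions, twenty inputs give sixteen training items, with s9 and s19 in dev and s10 and s20 in test, i.e. an 80/10/10 split.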
public String textualSummary()
public String textualSummary(TreebankLanguagePack tlp)
tlp - the TreebankLanguagePack used to determine punctuation and an appropriate character encoding
public boolean remove(Object o)
Specified by: remove in interface Collection<Tree>
Overrides: remove in class AbstractCollection<Tree>
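Because some Treebanks are read-only, callers that modify one through the Collection interface should be prepared for an UnsupportedOperationException. The following is a minimal sketch using only java.util; ReadOnlyBank and tryRemove are hypothetical names, the collection holds strings rather than Trees, and none of it is taken from this library.

```java
import java.util.AbstractCollection;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

/** Hypothetical read-only collection, loosely modelled on a read-only Treebank. */
class ReadOnlyBank extends AbstractCollection<String> {
    private final List<String> items;

    ReadOnlyBank(String... items) {
        this.items = Collections.unmodifiableList(Arrays.asList(items));
    }

    @Override
    public Iterator<String> iterator() {
        return items.iterator(); // iterator of an unmodifiable list rejects remove()
    }

    @Override
    public int size() {
        return items.size();
    }

    /** Removal is rejected outright, as for a collection that does not support it. */
    @Override
    public boolean remove(Object o) {
        throw new UnsupportedOperationException("read-only collection");
    }

    /** Attempts a removal; returns false if the collection rejected the operation. */
    static boolean tryRemove(Collection<String> c, String item) {
        try {
            c.remove(item);
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ReadOnlyBank bank = new ReadOnlyBank("(S (NP I) (VP run))", "(S (NP you) (VP walk))");
        System.out.println("size=" + bank.size());
        System.out.println("removed=" + tryRemove(bank, "(S (NP I) (VP run))"));
    }
}
```

Reading operations such as size, contains, and iteration keep working in this sketch, while add (inherited from AbstractCollection) and remove both throw.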