Interface | Description |
---|---|
Dataset | A generic interface for loading, processing, and writing a data set. |
Mapper | Generic interface for mapping one string to another given some contextual evidence. |
Class | Description |
---|---|
AbstractDataset | |
AbstractDataset.SplitFilter | |
ConfigParser | |
DefaultMapper | |
DistributionPackage | Adds data files to a tar'd / gzip'd distribution package. |
DuplicateTreeStringFilter | Filters trees based on duplicate toString() output. For example: java edu.stanford.nlp.trees.Treebanks -filter edu.stanford.nlp.trees.treebank.DuplicateTreeStringFilter -pennPrint /u/nlp/data/constituency-parser/models-4.0.0/data/ewt/ptb/train/ewt-train.mrg |
EnglishPTBTreebankCorrector | Corrects some of the errors in the LDC99T42 Penn Treebank 3. |
OntoNotesUDUpdater | Class for updating the OntoNotes data. |
PunctCountingTreeVisitor | Counts punctuation statistics of a treebank. |
TreebankPreprocessor | A data preparation pipeline for treebanks. |
UselessTreeFilter | Deletes trees from the EWT which we deem to be useless. |
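
The duplicate-string filtering described for DuplicateTreeStringFilter can be sketched as a stateful predicate that accepts an item only the first time its toString() value is seen. This is a minimal illustration under stated assumptions, not the class's actual implementation: the name FirstSeenFilter and its dedupe helper are hypothetical, and the sketch uses java.util.function.Predicate over plain objects rather than the library's own tree and filter types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical sketch; not part of edu.stanford.nlp.trees.treebank.
public class FirstSeenFilter<T> implements Predicate<T> {
  // String forms of every item accepted so far.
  private final Set<String> seen = new HashSet<>();

  // Returns true only the first time a given toString() value appears.
  @Override
  public boolean test(T item) {
    return seen.add(item.toString());
  }

  // Keeps the first occurrence of each distinct string form, preserving order.
  public static <T> List<T> dedupe(List<T> items) {
    FirstSeenFilter<T> filter = new FirstSeenFilter<>();
    List<T> kept = new ArrayList<>();
    for (T item : items) {
      if (filter.test(item)) {
        kept.add(item);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    // Bracketed strings stand in for printed trees; the third repeats the first.
    List<String> trees = Arrays.asList(
        "(S (NP a) (VP b))", "(S (NP c))", "(S (NP a) (VP b))");
    System.out.println(dedupe(trees));
  }
}
```

Because the filter keeps state, one instance should be applied to a whole treebank pass; a fresh instance starts with an empty set of seen strings.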
Enum | Description |
---|---|
Dataset.Encoding | |