| Interface | Description |
|---|---|
| Dataset | A generic interface for loading, processing, and writing a data set. |
| Mapper | Generic interface for mapping one string to another given some contextual evidence. |

| Class | Description |
|---|---|
| AbstractDataset | |
| AbstractDataset.SplitFilter | |
| ConfigParser | |
| DefaultMapper | |
| DistributionPackage | Adds data files to a tarred and gzipped distribution package. |
| DuplicateTreeStringFilter | Filters trees based on duplicate toString() output. For example: java edu.stanford.nlp.trees.Treebanks -filter edu.stanford.nlp.trees.treebank.DuplicateTreeStringFilter -pennPrint /u/nlp/data/constituency-parser/models-4.0.0/data/ewt/ptb/train/ewt-train.mrg |
| EnglishPTBTreebankCorrector | Corrects some of the errors in the Penn Treebank 3 (LDC99T42). |
| OntoNotesUDUpdater | Class for updating the OntoNotes data. |
| PunctCountingTreeVisitor | Counts punctuation statistics of a treebank. |
| TreebankPreprocessor | A data preparation pipeline for treebanks. |
| UselessTreeFilter | Deletes trees from the EWT that we deem to be useless. |

| Enum | Description |
|---|---|
| Dataset.Encoding | |
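
The idea behind filtering on duplicate toString() output, as described for DuplicateTreeStringFilter, can be illustrated with a minimal, self-contained sketch. This is not the CoreNLP implementation: the class and method names below (DuplicateStringFilterSketch, DuplicateToStringFilter, keepFirstOccurrences) are hypothetical, and the sketch assumes that the first occurrence of each string form is kept and later ones are dropped, which the summary above does not state.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

public class DuplicateStringFilterSketch {

    /**
     * Hypothetical filter: accepts an item only the first time its
     * toString() value is seen (an assumption, not the library's behavior).
     */
    static final class DuplicateToStringFilter<T> implements Predicate<T> {
        private final Set<String> seen = new HashSet<>();

        @Override
        public boolean test(T item) {
            // Set.add returns false when the string is already present,
            // so repeated string forms are rejected.
            return seen.add(item.toString());
        }
    }

    /** Returns the items whose string form has not appeared earlier in the list. */
    static <T> List<T> keepFirstOccurrences(List<T> items) {
        Predicate<T> filter = new DuplicateToStringFilter<>();
        List<T> kept = new ArrayList<>();
        for (T item : items) {
            if (filter.test(item)) {
                kept.add(item);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Plain strings stand in for parse trees in this sketch.
        List<String> trees = List.of(
                "(S (NP I) (VP ran))",
                "(S (NP You) (VP ran))",
                "(S (NP I) (VP ran))");
        System.out.println(keepFirstOccurrences(trees));
    }
}
```

Because the filter compares string forms rather than object identity, two distinct tree objects that print identically are treated as duplicates; the filter is also stateful, so a fresh instance is needed for each pass over a treebank.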