edu.stanford.nlp.process.treebank
Class AbstractDataset
java.lang.Object
edu.stanford.nlp.process.treebank.AbstractDataset
- All Implemented Interfaces:
- Dataset
- Direct Known Subclasses:
- ATBArabicDataset, FTBDataset
public abstract class AbstractDataset
- extends Object
- implements Dataset
- Author:
- Spence Green
outputFileList
protected final List<String> outputFileList
posMapper
protected Mapper posMapper
posMapOptions
protected String posMapOptions
lexMapper
protected Mapper lexMapper
lexMapOptions
protected String lexMapOptions
encoding
protected Dataset.Encoding encoding
pathsToData
protected final List<File> pathsToData
pathsToMappings
protected final List<File> pathsToMappings
splitFilter
protected FileFilter splitFilter
addDeterminer
protected boolean addDeterminer
removeDashTags
protected boolean removeDashTags
addRoot
protected boolean addRoot
removeEscapeTokens
protected boolean removeEscapeTokens
maxLen
protected int maxLen
morphDelim
protected String morphDelim
customTreeVisitor
protected TreeVisitor customTreeVisitor
outFileName
protected String outFileName
flatFileName
protected String flatFileName
makeFlatFile
protected boolean makeFlatFile
fileNameNormalizer
protected final Pattern fileNameNormalizer
treebank
protected Treebank treebank
configuredOptions
protected final Set<String> configuredOptions
requiredOptions
protected final Set<String> requiredOptions
toStringBuffer
protected final StringBuilder toStringBuffer
treeFileExtension
protected String treeFileExtension
options
protected StringMap options
- Provides subclasses with access to the dataset parameters.
AbstractDataset
public AbstractDataset()
build
public abstract void build()
- Description copied from interface: Dataset
- Generic method for loading, processing, and writing a dataset.
- Specified by: build in interface Dataset
setOptions
public boolean setOptions(StringMap opts)
- Description copied from interface: Dataset
- Sets options for a dataset.
- Specified by: setOptions in interface Dataset
- Parameters:
opts - A map from parameter types defined in ConfigParser to values
- Returns:
- true if opts contains all required options; false otherwise.
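The return contract of setOptions can be illustrated with a minimal sketch. It assumes that setOptions records the keys it receives in configuredOptions and compares them against requiredOptions; this fits the field names listed above but is not confirmed by this page. It also substitutes java.util.Map<String, String> for StringMap, and the classes OptionsSketch and ExampleDataset and the option names PATH and OUTPUT_ENCODING are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: Map<String, String> stands in for StringMap, and the
// option names used by ExampleDataset are invented for illustration.
abstract class OptionsSketch {
  // Mirror the requiredOptions / configuredOptions / options fields of AbstractDataset.
  protected final Set<String> requiredOptions = new HashSet<>();
  protected final Set<String> configuredOptions = new HashSet<>();
  protected Map<String, String> options = new HashMap<>();

  /** Stores opts; returns true only if every required option is present in opts. */
  public boolean setOptions(Map<String, String> opts) {
    options = new HashMap<>(opts);
    configuredOptions.clear();              // judge only the options from this call
    configuredOptions.addAll(opts.keySet());
    return configuredOptions.containsAll(requiredOptions);
  }
}

class ExampleDataset extends OptionsSketch {
  ExampleDataset() {
    // Hypothetical option names, not necessarily those defined in ConfigParser.
    requiredOptions.add("PATH");
    requiredOptions.add("OUTPUT_ENCODING");
  }
}
```

A caller would check the boolean before calling build(), so a dataset missing a required parameter is rejected up front rather than failing partway through processing.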
buildSplitMap
protected StringMap buildSplitMap(File path)
getFilenames
public List<String> getFilenames()
- Description copied from interface: Dataset
- Returns the filenames written by Dataset.build().
- Specified by: getFilenames in interface Dataset
- Returns:
- A collection of filenames
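The build() and getFilenames() contract follows a template-method arrangement: AbstractDataset declares build() abstract, so each subclass (such as ATBArabicDataset or FTBDataset) supplies its own load, process, and write pipeline. The sketch below is hypothetical: MiniDataset, MiniAbstractDataset, and InMemoryDataset stand in for the real types, no files are actually written, the trimming step is a placeholder transformation, and the assumption that getFilenames() returns the contents of outputFileList is inferred from the field name rather than stated on this page.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins for Dataset and AbstractDataset; the real classes carry
// many more fields (mappers, filters, encoding, and so on).
interface MiniDataset {
  void build();
  List<String> getFilenames();
}

abstract class MiniAbstractDataset implements MiniDataset {
  protected final List<String> outputFileList = new ArrayList<>();
  protected String outFileName = "out";

  // Each concrete dataset decides how to load, process, and write its data.
  @Override
  public abstract void build();

  // Read-only view of the names recorded by build().
  @Override
  public List<String> getFilenames() {
    return Collections.unmodifiableList(outputFileList);
  }
}

// Hypothetical subclass: "loading" reads in-memory strings, "processing" trims
// them, and "writing" only records the output name that would be written.
class InMemoryDataset extends MiniAbstractDataset {
  private final List<String> rawTrees;
  final List<String> processed = new ArrayList<>();

  InMemoryDataset(List<String> rawTrees) {
    this.rawTrees = rawTrees;
  }

  @Override
  public void build() {
    for (String tree : rawTrees) {            // load
      processed.add(tree.trim());             // process (placeholder transformation)
    }
    outputFileList.add(outFileName + ".txt"); // write (name recorded only)
  }
}
```

Returning an unmodifiable view keeps callers from altering the list the dataset maintains; whether the real class does this is not stated here.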
toString
public String toString()
- Overrides: toString in class Object
Stanford NLP Group