edu.stanford.nlp.international.process
Class AbstractDataset

java.lang.Object
  extended by edu.stanford.nlp.international.process.AbstractDataset
All Implemented Interfaces:
Dataset
Direct Known Subclasses:
ATBArabicDataset, FTBDataset

public abstract class AbstractDataset
extends java.lang.Object
implements Dataset

Author:
Spence Green

Nested Class Summary
protected  class AbstractDataset.SplitFilter
           
 
Nested classes/interfaces inherited from interface edu.stanford.nlp.international.process.Dataset
Dataset.Encoding
 
Field Summary
protected  boolean addDeterminer
           
protected  boolean addRoot
           
protected  java.util.Set<java.lang.String> configuredOptions
           
protected  TreeVisitor customTreeVisitor
           
protected  Dataset.Encoding encoding
           
protected  java.util.regex.Pattern fileNameNormalizer
           
protected  java.lang.String flatFileName
           
protected  java.lang.String lexMapOptions
           
protected  Mapper lexMapper
           
protected  boolean makeFlatFile
           
protected  int maxLen
           
protected  java.lang.String morphDelim
           
protected  StringMap options
          Gives subclasses access to the dataset parameters.
protected  java.lang.String outFileName
           
protected  java.util.List<java.lang.String> outputFileList
           
protected  java.util.List<java.io.File> pathsToData
           
protected  java.util.List<java.io.File> pathsToMappings
           
protected  java.lang.String posMapOptions
           
protected  Mapper posMapper
           
protected  boolean removeDashTags
           
protected  boolean removeEscapeTokens
           
protected  java.util.Set<java.lang.String> requiredOptions
           
protected  java.io.FileFilter splitFilter
           
protected  java.lang.StringBuilder toStringBuffer
           
protected  Treebank treebank
           
protected  java.lang.String treeFileExtension
           
 
Constructor Summary
AbstractDataset()
           
 
Method Summary
abstract  void build()
          Generic method for loading, processing, and writing a dataset.
protected  StringMap buildSplitMap(java.io.File path)
           
 java.util.List<java.lang.String> getFilenames()
          Returns the filenames written by Dataset.build().
 boolean setOptions(StringMap opts)
          Sets options for a dataset.
 java.lang.String toString()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

outputFileList

protected final java.util.List<java.lang.String> outputFileList

posMapper

protected Mapper posMapper

posMapOptions

protected java.lang.String posMapOptions

lexMapper

protected Mapper lexMapper

lexMapOptions

protected java.lang.String lexMapOptions

encoding

protected Dataset.Encoding encoding

pathsToData

protected final java.util.List<java.io.File> pathsToData

pathsToMappings

protected final java.util.List<java.io.File> pathsToMappings

splitFilter

protected java.io.FileFilter splitFilter

addDeterminer

protected boolean addDeterminer

removeDashTags

protected boolean removeDashTags

addRoot

protected boolean addRoot

removeEscapeTokens

protected boolean removeEscapeTokens

maxLen

protected int maxLen

morphDelim

protected java.lang.String morphDelim

customTreeVisitor

protected TreeVisitor customTreeVisitor

outFileName

protected java.lang.String outFileName

flatFileName

protected java.lang.String flatFileName

makeFlatFile

protected boolean makeFlatFile

fileNameNormalizer

protected final java.util.regex.Pattern fileNameNormalizer

treebank

protected Treebank treebank

configuredOptions

protected final java.util.Set<java.lang.String> configuredOptions

requiredOptions

protected final java.util.Set<java.lang.String> requiredOptions

toStringBuffer

protected final java.lang.StringBuilder toStringBuffer

treeFileExtension

protected java.lang.String treeFileExtension

options

protected StringMap options
Gives subclasses access to the dataset parameters.

Constructor Detail

AbstractDataset

public AbstractDataset()
Method Detail

build

public abstract void build()
Description copied from interface: Dataset
Generic method for loading, processing, and writing a dataset.

Specified by:
build in interface Dataset
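
The three public methods suggest a simple calling order: configure with setOptions, run build, then read the results from getFilenames. The following is a minimal sketch of that order, not the library's code. DatasetSketch, ToyDataset, LifecycleDemo, and the option key "NAME" are hypothetical stand-ins, and a plain java.util.Map replaces StringMap so the example compiles without the Stanford classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the Dataset interface; a plain Map replaces StringMap.
interface DatasetSketch {
    boolean setOptions(Map<String, String> opts);
    void build();
    List<String> getFilenames();
}

// Hypothetical concrete dataset: build() only records the name it would write.
class ToyDataset implements DatasetSketch {
    private final List<String> outputFileList = new ArrayList<>();
    private String outFileName;

    @Override
    public boolean setOptions(Map<String, String> opts) {
        outFileName = opts.get("NAME"); // "NAME" is an invented option key
        return outFileName != null;
    }

    @Override
    public void build() {
        // A real subclass would load, process, and write the dataset here.
        outputFileList.add(outFileName + ".txt");
    }

    @Override
    public List<String> getFilenames() {
        return Collections.unmodifiableList(outputFileList);
    }
}

class LifecycleDemo {
    // Configure, build, then report what was produced.
    static List<String> run(Map<String, String> opts) {
        DatasetSketch ds = new ToyDataset();
        if (!ds.setOptions(opts)) {
            return Collections.emptyList(); // required option missing: skip build()
        }
        ds.build();
        return ds.getFilenames();
    }

    public static void main(String[] args) {
        System.out.println(run(Map.of("NAME", "train"))); // prints [train.txt]
    }
}
```

Checking the boolean returned by setOptions before calling build keeps a misconfigured dataset from producing partial output.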

setOptions

public boolean setOptions(StringMap opts)
Description copied from interface: Dataset
Sets options for a dataset.

Specified by:
setOptions in interface Dataset
Parameters:
opts - A map from parameter types defined in ConfigParser to values
Returns:
true if opts contains all required options; false otherwise.
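
The return contract above, together with the requiredOptions and configuredOptions fields, suggests a set-containment check. The sketch below shows one plausible way to implement that contract; it is an assumption, not the actual method body. RequiredOptionsSketch and the option names "NAME" and "PATH" are invented for illustration, and a plain java.util.Map stands in for StringMap.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; a plain Map<String, String> stands in for StringMap.
class RequiredOptionsSketch {
    private final Set<String> requiredOptions = new HashSet<>();
    private final Set<String> configuredOptions = new HashSet<>();

    RequiredOptionsSketch() {
        // Invented option names, for illustration only.
        requiredOptions.add("NAME");
        requiredOptions.add("PATH");
    }

    // Records which options were supplied, then checks that none are missing.
    boolean setOptions(Map<String, String> opts) {
        configuredOptions.clear();
        configuredOptions.addAll(opts.keySet());
        return configuredOptions.containsAll(requiredOptions);
    }

    public static void main(String[] args) {
        RequiredOptionsSketch ds = new RequiredOptionsSketch();
        System.out.println(ds.setOptions(Map.of("NAME", "train", "PATH", "/data"))); // prints true
        System.out.println(ds.setOptions(Map.of("NAME", "train")));                  // prints false
    }
}
```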

buildSplitMap

protected StringMap buildSplitMap(java.io.File path)

getFilenames

public java.util.List<java.lang.String> getFilenames()
Description copied from interface: Dataset
Returns the filenames written by Dataset.build().

Specified by:
getFilenames in interface Dataset
Returns:
A list of the filenames written by build()
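
Because outputFileList is a protected final field and getFilenames returns a List, a subclass author may want callers to see the list without being able to change it. The sketch below shows one common way to do that with a read-only view. Whether AbstractDataset itself does this is not stated on this page, so treat it as an assumption; FilenameListSketch and callerCanModify are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of exposing an internal filename list as a read-only view.
class FilenameListSketch {
    protected final List<String> outputFileList = new ArrayList<>();

    List<String> getFilenames() {
        // The view reflects later additions but rejects modification by callers.
        return Collections.unmodifiableList(outputFileList);
    }

    // Returns true if the caller managed to add to the list, false if it was rejected.
    static boolean callerCanModify(List<String> names) {
        try {
            names.add("extra.txt");
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        FilenameListSketch ds = new FilenameListSketch();
        ds.outputFileList.add("train.txt");
        List<String> names = ds.getFilenames();
        System.out.println(names);                  // prints [train.txt]
        System.out.println(callerCanModify(names)); // prints false
    }
}
```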

toString

public java.lang.String toString()
Overrides:
toString in class java.lang.Object


Stanford NLP Group