java.lang.Object
  edu.stanford.nlp.classify.GeneralDataset<L,F>
    edu.stanford.nlp.classify.RVFDataset<L,F>
Type Parameters:
L - The type of the labels in the Dataset
F - The type of the features in the Dataset
public class RVFDataset<L,F>
extends GeneralDataset<L,F>
An interfacing class for ClassifierFactory that incrementally builds a more memory-efficient representation of a List of RVFDatum objects for the purposes of training a Classifier with a ClassifierFactory.
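The idea of a memory-efficient, incrementally built representation can be illustrated with parallel arrays, mirroring the labels, data, and values arguments of the fully specified constructor. The following is a minimal sketch under stated assumptions, not the class's actual implementation: SparseRvfStore and all of its members are hypothetical names, features and labels are plain strings, and the arrays double in size when full.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not part of Stanford NLP.
final class SparseRvfStore {
    private final Map<String, Integer> featureIndex = new HashMap<>();
    private final Map<String, Integer> labelIndex = new HashMap<>();
    private int[] labels;      // labels[i] = label id of datum i
    private int[][] data;      // data[i] = ids of the features present in datum i
    private double[][] values; // values[i][j] = value of feature data[i][j]
    private int size = 0;

    SparseRvfStore(int numDatums) {
        int capacity = Math.max(1, numDatums);
        labels = new int[capacity];
        data = new int[capacity][];
        values = new double[capacity][];
    }

    // Adds one datum, storing only the features it actually contains.
    void add(String label, Map<String, Double> features) {
        if (size == labels.length) { // grow all three arrays together
            int newCapacity = size * 2;
            labels = Arrays.copyOf(labels, newCapacity);
            data = Arrays.copyOf(data, newCapacity);
            values = Arrays.copyOf(values, newCapacity);
        }
        int[] ids = new int[features.size()];
        double[] vals = new double[features.size()];
        int j = 0;
        for (Map.Entry<String, Double> e : features.entrySet()) {
            ids[j] = indexOf(featureIndex, e.getKey());
            vals[j] = e.getValue();
            j++;
        }
        labels[size] = indexOf(labelIndex, label);
        data[size] = ids;
        values[size] = vals;
        size++;
    }

    // Returns the id of key, assigning the next free id on first sight.
    private static int indexOf(Map<String, Integer> index, String key) {
        Integer id = index.get(key);
        if (id == null) {
            id = index.size();
            index.put(key, id);
        }
        return id;
    }

    int size() { return size; }
    int numFeatures() { return featureIndex.size(); }
    int numClasses() { return labelIndex.size(); }
    double valueAt(int datum, int position) { return values[datum][position]; }
}
```

Each datum costs one int and one double per feature it contains, rather than one object per feature-value pair.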
Field Summary |
---|
Fields inherited from class edu.stanford.nlp.classify.GeneralDataset |
---|
data, featureIndex, labelIndex, labels, size |
Constructor Summary | |
---|---|
RVFDataset()
|
|
RVFDataset(Index<F> featureIndex,
Index<L> labelIndex)
|
|
RVFDataset(Index<L> labelIndex,
int[] labels,
Index<F> featureIndex,
int[][] data,
double[][] values)
Constructor that fully specifies a Dataset. |
|
RVFDataset(int numDatums)
|
|
RVFDataset(int numDatums,
Index<F> featureIndex,
Index<L> labelIndex)
|
Method Summary | |
---|---|
void |
add(Datum<L,F> d)
|
void |
add(Datum<L,F> d,
String src,
String id)
|
void |
applyFeatureCountThreshold(int k)
Applies a feature count threshold to the RVFDataset. |
void |
applyFeatureMaxCountThreshold(int k)
Applies a feature max count threshold to the RVFDataset. |
void |
clear()
Resets the Dataset so that it is empty and ready to collect data. |
void |
clear(int numDatums)
Resets the Dataset so that it is empty and ready to collect data. |
void |
ensureRealValues()
Checks if the dataset has any unbounded values. |
RVFDatum<L,F> |
getDatum(int index)
|
RVFDatum<L,F> |
getRVFDatum(int index)
|
String |
getRVFDatumId(int index)
|
String |
getRVFDatumSource(int index)
|
double[][] |
getValuesArray()
|
protected void |
initialize(int numDatums)
Resets the values of the dataset so that it is empty with an initial capacity of numDatums. Should be accessed only by appropriate methods within the class, such as clear(), which take care of other parts of the emptying of data. |
Iterator<RVFDatum<L,F>> |
iterator()
|
static void |
main(String[] args)
|
void |
printFullFeatureMatrix(PrintWriter pw)
prints the full feature matrix in tab-delimited form. |
void |
printFullFeatureMatrixWithValues(PrintWriter pw)
Modification of printFullFeatureMatrix to correct bugs & print values (Rajat). |
void |
printSparseFeatureMatrix()
Prints the sparse feature matrix using printSparseFeatureMatrix(PrintWriter) to System.out . |
void |
printSparseFeatureMatrix(PrintWriter pw)
Prints a sparse feature matrix representation of the Dataset. |
void |
printSparseFeatureValues(int datumNo,
PrintWriter pw)
Prints a sparse feature-value output of the Dataset. |
void |
printSparseFeatureValues(PrintWriter pw)
Prints a sparse feature-value output of the Dataset. |
void |
randomize(int randomSeed)
Randomizes the data array in place. Redefined here because the values must be randomized as well. |
void |
readSVMLightFormat(File file)
Read SVM-light formatted data into this dataset. |
static RVFDataset<String,String> |
readSVMLightFormat(String filename)
Constructs a Dataset by reading in a file in SVM light format. |
static RVFDataset<String,String> |
readSVMLightFormat(String filename,
Index<String> featureIndex,
Index<String> labelIndex)
Constructs a Dataset by reading in a file in SVM light format. |
static RVFDataset<String,String> |
readSVMLightFormat(String filename,
List<String> lines)
Constructs a Dataset by reading in a file in SVM light format. |
RVFDataset<L,F> |
scaleDataset(RVFDataset<L,F> dataset)
Linearly scales the values of each feature in the given dataset using the min and max values found in the training set. |
RVFDataset<L,F> |
scaleDatasetGaussian(RVFDataset<L,F> dataset)
|
RVFDatum<L,F> |
scaleDatum(RVFDatum<L,F> datum)
Scales the values of each feature linearly using the min and max values found in the training set. |
RVFDatum<L,F> |
scaleDatumGaussian(RVFDatum<L,F> datum)
|
void |
scaleFeatures()
Scales feature values linearly such that each feature value lies between 0 and 1. |
void |
scaleFeaturesGaussian()
|
void |
selectFeaturesFromSet(Set<F> featureSet)
Removes all features from the dataset that are not in featureSet. |
Pair<GeneralDataset<L,F>,GeneralDataset<L,F>> |
split(double percentDev)
|
Pair<GeneralDataset<L,F>,GeneralDataset<L,F>> |
split(int start,
int end)
|
void |
summaryStatistics()
Prints some summary statistics to stderr for the Dataset. |
static RVFDatum<String,String> |
svmLightLineToRVFDatum(String l)
|
String |
toString()
|
String |
toSummaryString()
|
void |
writeSVMLightFormat(File file)
Write the dataset in SVM-light format to the file. |
void |
writeSVMLightFormat(PrintWriter writer)
|
Methods inherited from class edu.stanford.nlp.classify.GeneralDataset |
---|
addAll, featureIndex, getDataArray, getFeatureCounts, getLabelsArray, labelIndex, labelIterator, makeSvmLabelMap, mapDataset, mapDataset, mapDatum, numClasses, numFeatures, numFeatureTokens, numFeatureTypes, printSVMLightFormat, printSVMLightFormat, sampleDataset, size, trimData, trimLabels, trimToSize, trimToSize, trimToSize |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Constructor Detail |
---|
public RVFDataset()
public RVFDataset(int numDatums, Index<F> featureIndex, Index<L> labelIndex)
public RVFDataset(Index<F> featureIndex, Index<L> labelIndex)
public RVFDataset(int numDatums)
public RVFDataset(Index<L> labelIndex, int[] labels, Index<F> featureIndex, int[][] data, double[][] values)
Method Detail |
---|
public Pair<GeneralDataset<L,F>,GeneralDataset<L,F>> split(double percentDev)
split in class GeneralDataset<L,F>
public void scaleFeaturesGaussian()
public void scaleFeatures()
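Linear scaling of each feature into the range 0 to 1 is commonly done with min-max normalization. The sketch below shows that technique on a dense matrix for brevity. It is an illustration under assumptions, not this method's implementation: MinMaxScaler is a hypothetical name, the dense layout is a simplification of the sparse storage, and mapping constant features to 0 is an arbitrary choice.

```java
// Hypothetical illustration only; not part of Stanford NLP.
final class MinMaxScaler {
    // Rescales every column of x in place so that its values lie in [0, 1].
    static void scaleFeatures(double[][] x) {
        if (x.length == 0) {
            return;
        }
        int numFeatures = x[0].length;
        for (int f = 0; f < numFeatures; f++) {
            double min = Double.POSITIVE_INFINITY;
            double max = Double.NEGATIVE_INFINITY;
            for (double[] row : x) {
                min = Math.min(min, row[f]);
                max = Math.max(max, row[f]);
            }
            double range = max - min;
            for (double[] row : x) {
                // Assumption: a feature with a single distinct value maps to 0.
                row[f] = (range == 0.0) ? 0.0 : (row[f] - min) / range;
            }
        }
    }
}
```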
public void ensureRealValues()
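A check for unbounded values can be sketched as a scan for NaN and infinite entries in the values array. This is a hedged illustration only: RealValueCheck is a hypothetical name, and how the actual method reports a failure is not stated on this page.

```java
// Hypothetical illustration only; not part of Stanford NLP.
final class RealValueCheck {
    // Returns true only if every stored value is a finite real number.
    static boolean allFinite(double[][] values) {
        for (double[] row : values) {
            for (double v : row) {
                if (Double.isNaN(v) || Double.isInfinite(v)) {
                    return false;
                }
            }
        }
        return true;
    }
}
```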
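public RVFDataset<L,F> scaleDataset(RVFDataset<L,F> dataset)
Linearly scales the values of each feature in the given dataset using the min and max values found in the training set.
dataset - the dataset to scale
public RVFDatum<L,F> scaleDatum(RVFDatum<L,F> datum)
Scales the values of each feature linearly using the min and max values found in the training set.
datum - the datum to scale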
public RVFDataset<L,F> scaleDatasetGaussian(RVFDataset<L,F> dataset)
public RVFDatum<L,F> scaleDatumGaussian(RVFDatum<L,F> datum)
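The Gaussian scaling methods carry no description on this page. One common reading is standardization: subtract each feature's training mean and divide by its training standard deviation, then reuse those statistics for new datums. The sketch below assumes that reading. GaussianScaler is a hypothetical name, the population standard deviation (division by n) is an assumption, and the dense layout is a simplification.

```java
// Hypothetical illustration only; not part of Stanford NLP.
final class GaussianScaler {
    private final double[] mean;
    private final double[] std;

    // Computes per-feature mean and standard deviation from a non-empty training matrix.
    GaussianScaler(double[][] train) {
        int n = train.length;
        int d = train[0].length;
        mean = new double[d];
        std = new double[d];
        for (double[] row : train) {
            for (int f = 0; f < d; f++) {
                mean[f] += row[f] / n;
            }
        }
        for (double[] row : train) {
            for (int f = 0; f < d; f++) {
                double diff = row[f] - mean[f];
                std[f] += diff * diff / n;
            }
        }
        for (int f = 0; f < d; f++) {
            std[f] = Math.sqrt(std[f]);
        }
    }

    // Standardizes one datum using the training statistics; returns a new array.
    double[] scale(double[] datum) {
        double[] out = new double[datum.length];
        for (int f = 0; f < datum.length; f++) {
            // Assumption: a feature with zero variance maps to 0.
            out[f] = (std[f] == 0.0) ? 0.0 : (datum[f] - mean[f]) / std[f];
        }
        return out;
    }
}
```

A datum scaled this way is not confined to any fixed range, since test values can fall far from the training mean.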
public Pair<GeneralDataset<L,F>,GeneralDataset<L,F>> split(int start, int end)
split in class GeneralDataset<L,F>
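Splitting by an index range can be sketched as follows. The page does not say which half of the returned pair receives the range, so the sketch assumes that indices from start (inclusive) to end (exclusive) form the second, held-out part and everything else forms the first. RangeSplit is a hypothetical name, and a list of two lists stands in for the Pair type.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Stanford NLP.
final class RangeSplit {
    // Returns [rest, range]: items with index in [start, end) go to the second list.
    static <T> List<List<T>> split(List<T> items, int start, int end) {
        List<T> rest = new ArrayList<>();
        List<T> range = new ArrayList<>();
        for (int i = 0; i < items.size(); i++) {
            if (i >= start && i < end) {
                range.add(items.get(i));
            } else {
                rest.add(items.get(i));
            }
        }
        List<List<T>> result = new ArrayList<>();
        result.add(rest);
        result.add(range);
        return result;
    }
}
```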
public void add(Datum<L,F> d)
add in class GeneralDataset<L,F>
public void add(Datum<L,F> d, String src, String id)
public RVFDatum<L,F> getDatum(int index)
getDatum in class GeneralDataset<L,F>
public RVFDatum<L,F> getRVFDatum(int index)
getRVFDatum in class GeneralDataset<L,F>
public String getRVFDatumSource(int index)
public String getRVFDatumId(int index)
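public void clear()
Resets the Dataset so that it is empty and ready to collect data.
clear in class GeneralDataset<L,F>
public void clear(int numDatums)
Resets the Dataset so that it is empty and ready to collect data.
clear in class GeneralDataset<L,F>
numDatums - initial capacity of dataset
protected void initialize(int numDatums)
Description copied from class: GeneralDataset
Resets the values of the dataset so that it is empty with an initial capacity of numDatums. Should be accessed only by appropriate methods within the class, such as clear(), which take care of other parts of the emptying of data.
initialize in class GeneralDataset<L,F>
numDatums - initial capacity of dataset
public void summaryStatistics()
Prints some summary statistics to stderr for the Dataset.
summaryStatistics in class GeneralDataset<L,F>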
public void printFullFeatureMatrix(PrintWriter pw)
public void printFullFeatureMatrixWithValues(PrintWriter pw)
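Printing a full feature matrix from sparse storage means expanding each datum to one column per feature, with 0.0 for absent features. The sketch below shows that expansion in tab-delimited form. It is an assumption-laden illustration: FullMatrixPrinter is a hypothetical name, and the actual methods may also print labels or a header row, which this page does not specify.

```java
import java.io.PrintWriter;

// Hypothetical illustration only; not part of Stanford NLP.
final class FullMatrixPrinter {
    // data[i] holds feature ids of datum i; values[i][j] is the value of feature data[i][j].
    static void print(PrintWriter pw, int[][] data, double[][] values, int numFeatures) {
        for (int i = 0; i < data.length; i++) {
            double[] dense = new double[numFeatures]; // absent features stay 0.0
            for (int j = 0; j < data[i].length; j++) {
                dense[data[i][j]] = values[i][j];
            }
            StringBuilder line = new StringBuilder();
            for (int f = 0; f < numFeatures; f++) {
                if (f > 0) {
                    line.append('\t');
                }
                line.append(dense[f]);
            }
            pw.println(line);
        }
        pw.flush();
    }
}
```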
public static RVFDataset<String,String> readSVMLightFormat(String filename)
public static RVFDataset<String,String> readSVMLightFormat(String filename, List<String> lines)
public static RVFDataset<String,String> readSVMLightFormat(String filename, Index<String> featureIndex, Index<String> labelIndex)
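public void selectFeaturesFromSet(Set<F> featureSet)
Removes all features from the dataset that are not in featureSet.
featureSet - the features to keep
public void applyFeatureCountThreshold(int k)
Applies a feature count threshold to the RVFDataset.
applyFeatureCountThreshold in class GeneralDataset<L,F>
public void applyFeatureMaxCountThreshold(int k)
Applies a feature max count threshold to the RVFDataset.
applyFeatureMaxCountThreshold in class GeneralDataset<L,F>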
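The exact semantics of the two thresholds are not spelled out on this page. A plausible reading, assumed in the sketch below, is that the count threshold drops features occurring fewer than k times and the max count threshold drops features occurring more than k times. FeatureCountFilter is a hypothetical name, and only feature ids are filtered here; a real-valued dataset would need to filter the parallel values array in the same way.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntPredicate;

// Hypothetical illustration only; not part of Stanford NLP.
final class FeatureCountFilter {
    // Assumed semantics: keep features that occur at least k times in the dataset.
    static int[][] keepAtLeast(int[][] data, int k) {
        return filter(data, count -> count >= k);
    }

    // Assumed semantics: keep features that occur at most k times in the dataset.
    static int[][] keepAtMost(int[][] data, int k) {
        return filter(data, count -> count <= k);
    }

    private static int[][] filter(int[][] data, IntPredicate keepCount) {
        // Count how often each feature id appears across all datums.
        Map<Integer, Integer> counts = new HashMap<>();
        for (int[] row : data) {
            for (int f : row) {
                counts.merge(f, 1, Integer::sum);
            }
        }
        // Rebuild each datum with only the surviving feature ids.
        int[][] out = new int[data.length][];
        for (int i = 0; i < data.length; i++) {
            out[i] = Arrays.stream(data[i])
                    .filter(f -> keepCount.test(counts.get(f)))
                    .toArray();
        }
        return out;
    }
}
```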
public static RVFDatum<String,String> svmLightLineToRVFDatum(String l)
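An SVM-light line has the general shape "label feature:value feature:value ... # comment". The sketch below parses one such line into a string label and a feature-to-value map, which matches the String label and feature types of the return value. It is a minimal illustration, not this method's implementation: SvmLightLine is a hypothetical name, and the handling of comments and malformed tokens is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not part of Stanford NLP.
final class SvmLightLine {
    final String label;
    final Map<String, Double> features;

    private SvmLightLine(String label, Map<String, Double> features) {
        this.label = label;
        this.features = features;
    }

    static SvmLightLine parse(String line) {
        // Assumption: everything after '#' is a trailing comment and is ignored.
        int hash = line.indexOf('#');
        String body = (hash >= 0 ? line.substring(0, hash) : line).trim();
        String[] tokens = body.split("\\s+");
        Map<String, Double> features = new LinkedHashMap<>();
        for (int i = 1; i < tokens.length; i++) {
            int colon = tokens[i].lastIndexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("Expected feature:value but found " + tokens[i]);
            }
            String name = tokens[i].substring(0, colon);
            double value = Double.parseDouble(tokens[i].substring(colon + 1));
            features.put(name, value);
        }
        return new SvmLightLine(tokens[0], features);
    }
}
```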
public void readSVMLightFormat(File file)
Read SVM-light formatted data into this dataset.
file - The file from which the data should be read.
public void writeSVMLightFormat(File file) throws FileNotFoundException
Write the dataset in SVM-light format to the file. See also readSVMLightFormat(File).
file - The location where the dataset should be written.
Throws FileNotFoundException
public void writeSVMLightFormat(PrintWriter writer)
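Writing in SVM-light format amounts to emitting one line per datum, with the label followed by index:value pairs. The sketch below writes a single line with integer label and feature ids in ascending feature order, as the format conventionally expects. It is an illustration under assumptions: SvmLightWriter is a hypothetical name, and the integer encoding is assumed rather than taken from this page.

```java
import java.io.PrintWriter;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not part of Stanford NLP.
final class SvmLightWriter {
    // Writes "label id:value id:value ..." with feature ids in ascending order.
    static void writeLine(PrintWriter writer, int label, Map<Integer, Double> features) {
        StringBuilder line = new StringBuilder().append(label);
        for (Map.Entry<Integer, Double> e : new TreeMap<>(features).entrySet()) {
            line.append(' ').append(e.getKey()).append(':').append(e.getValue());
        }
        writer.println(line);
    }
}
```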
public void printSparseFeatureMatrix()
Prints the sparse feature matrix using printSparseFeatureMatrix(PrintWriter) to System.out.
public void printSparseFeatureMatrix(PrintWriter pw)
Prints a sparse feature matrix representation of the Dataset, using the Object.toString() representations of features.
public void printSparseFeatureValues(PrintWriter pw)
Prints a sparse feature-value output of the Dataset, using the Object.toString() representations of features. This is probably what you want for RVFDataset, since the two printSparseFeatureMatrix methods above seem useless and unused.
public void printSparseFeatureValues(int datumNo, PrintWriter pw)
Prints a sparse feature-value output of the Dataset, using the Object.toString() representations of features. This is probably what you want for RVFDataset, since the two printSparseFeatureMatrix methods above seem useless and unused.
public static void main(String[] args)
public double[][] getValuesArray()
getValuesArray in class GeneralDataset<L,F>
public String toString()
toString in class Object
public String toSummaryString()
public Iterator<RVFDatum<L,F>> iterator()
iterator in interface Iterable<RVFDatum<L,F>>
iterator in class GeneralDataset<L,F>
public void randomize(int randomSeed)
Randomizes the data array in place. Redefined here because the values must be randomized as well.
randomize in class GeneralDataset<L,F>
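The reason the values need randomizing along with the data is that the arrays are parallel: row i of each array describes the same datum, so every swap must be applied to all of them. The sketch below shows a seeded Fisher-Yates shuffle that keeps labels, data, and values aligned. ParallelShuffle is a hypothetical name, and the actual method's shuffle algorithm is not stated on this page.

```java
import java.util.Random;

// Hypothetical illustration only; not part of Stanford NLP.
final class ParallelShuffle {
    // Shuffles the three parallel arrays in place with the same sequence of swaps.
    static void randomize(int[] labels, int[][] data, double[][] values, int randomSeed) {
        Random rng = new Random(randomSeed);
        for (int j = labels.length - 1; j > 0; j--) {
            int i = rng.nextInt(j + 1); // uniform index in [0, j]

            int tmpLabel = labels[i];
            labels[i] = labels[j];
            labels[j] = tmpLabel;

            int[] tmpData = data[i];
            data[i] = data[j];
            data[j] = tmpData;

            double[] tmpValues = values[i];
            values[i] = values[j];
            values[j] = tmpValues;
        }
    }
}
```

Using a fixed seed makes the resulting order reproducible across runs.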