mark.nlp.data
Class DataSet

java.lang.Object
  |
  +--mark.nlp.data.DataSet

public class DataSet
extends java.lang.Object

A DataSet stores a vocabulary, priors, and a set of sufficient statistics necessary to describe the data in a dataset.
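This page does not say how the priors are computed. The following is a minimal sketch that rests on two assumptions this page does not confirm: fCounts[c] holds the number of instances in category c, and fPriors are the maximum-likelihood relative frequencies of those counts. PriorSketch and priorsFromCounts are hypothetical names used only for illustration:

```java
import java.util.Arrays;

/** Hypothetical sketch: derive class priors from per-category instance counts. */
public class PriorSketch {

    // Assumes counts[c] is the number of training instances labeled with category c.
    // Returns counts[c] / total for each c, or a uniform prior if there is no data.
    public static double[] priorsFromCounts(int[] counts) {
        long total = 0;
        for (int c : counts) {
            total += c;
        }
        double[] priors = new double[counts.length];
        if (total == 0) {
            // No observations: fall back to a uniform prior.
            Arrays.fill(priors, 1.0 / counts.length);
            return priors;
        }
        for (int i = 0; i < counts.length; i++) {
            priors[i] = (double) counts[i] / total;
        }
        return priors;
    }

    public static void main(String[] args) {
        double[] p = priorsFromCounts(new int[] {3, 1});
        System.out.println(Arrays.toString(p)); // [0.75, 0.25]
    }
}
```

A smoothed estimate (for example, adding one to every count) would be an equally plausible design. Which one DataSet uses, if either, is not stated here.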


Field Summary
 Bag[] fCatBags
           
 BagCorpusCounter[] fCatCounters
           
 Bag fCorpusBag
           
 BagCorpusCounter fCorpusCounter
           
 int[] fCounts
           
 SparseBagInstance[] fInstanceBags
           
 double[] fPriors
           
 ObjectMap fVocabulary
           
 
Constructor Summary
DataSet(int numCategories, ObjectMap vocabulary, SparseBagInstance[] sparseBagInstances)
          Constructs a DataSet.
DataSet(int numCategories, TextInstance[] textInstances, java.lang.String scannerName)
          Constructs a DataSet.
 
Method Summary
static SparseBagInstance[] bagize(TextInstance[] textInstances, ObjectMap vocabulary, java.lang.String scannerName)
          Generates a list of bags from a list of text instances.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

fVocabulary

public ObjectMap fVocabulary

fCounts

public int[] fCounts

fPriors

public double[] fPriors

fInstanceBags

public SparseBagInstance[] fInstanceBags

fCatBags

public Bag[] fCatBags

fCorpusBag

public Bag fCorpusBag

fCatCounters

public BagCorpusCounter[] fCatCounters

fCorpusCounter

public BagCorpusCounter fCorpusCounter

Constructor Detail

DataSet

public DataSet(int numCategories,
               TextInstance[] textInstances,
               java.lang.String scannerName)
        throws java.lang.Exception
Constructs a DataSet.

Parameters:
numCategories - the number of categories.
textInstances - the text instances that comprise the data set.
scannerName - the name of the scanner to use in tokenizing the text instances.

DataSet

public DataSet(int numCategories,
               ObjectMap vocabulary,
               SparseBagInstance[] sparseBagInstances)
Constructs a DataSet.

Parameters:
numCategories - the number of categories.
vocabulary - the vocabulary.
sparseBagInstances - the sparse bag instances that comprise the data set.
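The second constructor starts from instances that have already been bagged. One plausible way to build per-category and corpus-wide statistics such as fCatBags and fCorpusBag from those instances is to sum token counts, as in the sketch below. This page does not document the real APIs of SparseBagInstance and Bag. The Instance class (with its category label and map-based bag) and the helper methods are hypothetical stand-ins:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of aggregating per-category and corpus-wide token counts. */
public class BagAggregationSketch {

    /** Stand-in for SparseBagInstance (assumed shape): a category label plus token-id counts. */
    public static final class Instance {
        final int category;
        final Map<Integer, Integer> tokenCounts;

        public Instance(int category, Map<Integer, Integer> tokenCounts) {
            this.category = category;
            this.tokenCounts = tokenCounts;
        }
    }

    /** Sums token counts into one bag per category, indexed 0..numCategories-1. */
    public static Map<Integer, Integer>[] categoryBags(int numCategories, Instance[] instances) {
        @SuppressWarnings("unchecked")
        Map<Integer, Integer>[] bags = new Map[numCategories];
        for (int c = 0; c < numCategories; c++) {
            bags[c] = new HashMap<>();
        }
        for (Instance inst : instances) {
            Map<Integer, Integer> bag = bags[inst.category];
            for (Map.Entry<Integer, Integer> e : inst.tokenCounts.entrySet()) {
                bag.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return bags;
    }

    /** The corpus bag is the element-wise sum of all category bags. */
    public static Map<Integer, Integer> corpusBag(Map<Integer, Integer>[] catBags) {
        Map<Integer, Integer> corpus = new HashMap<>();
        for (Map<Integer, Integer> bag : catBags) {
            bag.forEach((token, count) -> corpus.merge(token, count, Integer::sum));
        }
        return corpus;
    }

    public static void main(String[] args) {
        Instance[] data = {
            new Instance(0, Map.of(0, 2, 1, 1)),
            new Instance(1, Map.of(1, 3)),
            new Instance(0, Map.of(1, 1))
        };
        Map<Integer, Integer>[] cats = categoryBags(2, data);
        System.out.println(cats[0].get(1));          // 2
        System.out.println(corpusBag(cats).get(1));  // 5
    }
}
```

Category bags are computed first, and the corpus bag is derived from them, so the two stay consistent by construction.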

Method Detail

bagize

public static SparseBagInstance[] bagize(TextInstance[] textInstances,
                                         ObjectMap vocabulary,
                                         java.lang.String scannerName)
                                  throws java.lang.Exception
Generates a list of bags from a list of text instances. Each bag contains the tokens in the corresponding text instance.

Parameters:
textInstances - the list of text instances.
vocabulary - the vocabulary.
scannerName - the name of the scanner to use in tokenizing the text instances.
Throws:
java.lang.Exception
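bagize maps each text instance to a bag of token counts. The sketch below makes several substitutions, all assumptions: plain strings stand in for TextInstance, a HashMap<String, Integer> stands in for ObjectMap, and a lowercase split on non-alphanumeric runs stands in for the named scanner. It also assumes the vocabulary grows as unseen tokens appear. BagizeSketch, scan, and bagizeAll are hypothetical names:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of bagizing: text -> tokens -> token ids -> sparse counts. */
public class BagizeSketch {

    // Stand-in for a named scanner (assumption): lowercase, then split on runs of
    // characters that are neither letters nor digits, dropping empty tokens.
    public static List<String> scan(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t); // split() yields a leading "" when text starts with a separator
            }
        }
        return tokens;
    }

    // Builds one sparse bag (token id -> count) for a single text.
    // Unseen tokens get the next free id, so the vocabulary grows (assumption).
    public static Map<Integer, Integer> bagize(String text, Map<String, Integer> vocabulary) {
        Map<Integer, Integer> bag = new TreeMap<>(); // sorted by id for readable output
        for (String token : scan(text)) {
            int id = vocabulary.computeIfAbsent(token, t -> vocabulary.size());
            bag.merge(id, 1, Integer::sum);
        }
        return bag;
    }

    // One bag per input text, in the same order as the inputs.
    public static List<Map<Integer, Integer>> bagizeAll(String[] texts, Map<String, Integer> vocabulary) {
        List<Map<Integer, Integer>> bags = new ArrayList<>();
        for (String text : texts) {
            bags.add(bagize(text, vocabulary));
        }
        return bags;
    }

    public static void main(String[] args) {
        Map<String, Integer> vocab = new HashMap<>();
        List<Map<Integer, Integer>> bags =
            bagizeAll(new String[] {"The cat saw the dog.", "A dog!"}, vocab);
        System.out.println(bags.get(0)); // {0=2, 1=1, 2=1, 3=1}
        System.out.println(bags.get(1)); // {3=1, 4=1}
    }
}
```

This page does not say whether the real bagize adds unseen tokens to the vocabulary or skips them. To skip them instead, replace computeIfAbsent with a get and a null check.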