mark.nlp.data
Class DataSet
java.lang.Object
|
+--mark.nlp.data.DataSet
- public class DataSet
- extends java.lang.Object
A DataSet stores a vocabulary, priors, and a set of
sufficient statistics necessary to describe the data in a data set.
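As a rough illustration of what such statistics might look like, the sketch below computes per-category instance counts and category priors from an array of category labels. The class `PriorSketch`, its method names, and the use of relative-frequency priors are assumptions made for illustration; this page does not say how `fCounts` and `fPriors` are actually populated.

```java
import java.util.Arrays;

public class PriorSketch {
    // Hypothetical helper: counts how many instances carry each category label.
    public static int[] countCategories(int numCategories, int[] labels) {
        int[] counts = new int[numCategories];
        for (int label : labels) {
            counts[label]++;
        }
        return counts;
    }

    // Hypothetical helper: relative-frequency priors.
    // This is an assumption, not the documented behavior of DataSet.
    public static double[] priorsFromCounts(int[] counts) {
        int total = 0;
        for (int c : counts) {
            total += c;
        }
        double[] priors = new double[counts.length];
        if (total == 0) {
            return priors; // no instances: leave every prior at zero
        }
        for (int i = 0; i < counts.length; i++) {
            priors[i] = (double) counts[i] / total;
        }
        return priors;
    }

    public static void main(String[] args) {
        int[] labels = {0, 1, 1, 2, 1};
        int[] counts = countCategories(3, labels);
        double[] priors = priorsFromCounts(counts);
        System.out.println(Arrays.toString(counts));
        System.out.println(Arrays.toString(priors));
    }
}
```

Counts and priors of this kind are enough to summarize how instances are distributed over categories without keeping the label array itself.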
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
fVocabulary
public ObjectMap fVocabulary
fCounts
public int[] fCounts
fPriors
public double[] fPriors
fInstanceBags
public SparseBagInstance[] fInstanceBags
fCatBags
public Bag[] fCatBags
fCorpusBag
public Bag fCorpusBag
fCatCounters
public BagCorpusCounter[] fCatCounters
fCorpusCounter
public BagCorpusCounter fCorpusCounter
DataSet
public DataSet(int numCategories,
TextInstance[] textInstances,
java.lang.String scannerName)
throws java.lang.Exception
- Constructs a DataSet.
- Parameters:
numCategories - the number of categories.
textInstances - the text instances that comprise the data set.
scannerName - the name of the scanner to use in tokenizing the text instances.
DataSet
public DataSet(int numCategories,
ObjectMap vocabulary,
SparseBagInstance[] sparseBagInstances)
- Constructs a DataSet.
- Parameters:
numCategories - the number of categories.
vocabulary - the vocabulary.
bagize
public static SparseBagInstance[] bagize(TextInstance[] textInstances,
ObjectMap vocabulary,
java.lang.String scannerName)
throws java.lang.Exception
- Generates a list of bags from a list of text instances. Each bag
contains the tokens in the corresponding text instance.
- Parameters:
textInstances - the list of text instances.
vocabulary - the vocabulary.
- Throws:
java.lang.Exception
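The bag generation described for bagize can be sketched roughly as follows. This sketch does not use the library's `TextInstance`, `ObjectMap`, `SparseBagInstance`, or scanner classes: plain strings, a `Map<String, Integer>` vocabulary, and whitespace tokenization are stand-ins assumed for illustration, and `BagizeSketch` is a hypothetical name. Adding unseen tokens to the vocabulary is also an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class BagizeSketch {
    // Hypothetical stand-in for bagize: returns one sparse bag
    // (token index -> occurrence count) per input text.
    public static List<Map<Integer, Integer>> bagize(String[] texts,
                                                     Map<String, Integer> vocabulary) {
        List<Map<Integer, Integer>> bags = new ArrayList<>();
        for (String text : texts) {
            Map<Integer, Integer> bag = new TreeMap<>();
            // Whitespace splitting stands in for the named scanner.
            for (String token : text.trim().split("\\s+")) {
                if (token.isEmpty()) {
                    continue; // blank text yields a single empty token
                }
                Integer index = vocabulary.get(token);
                if (index == null) {
                    // Assumption: unseen tokens get the next free index.
                    index = vocabulary.size();
                    vocabulary.put(token, index);
                }
                bag.merge(index, 1, Integer::sum);
            }
            bags.add(bag);
        }
        return bags;
    }

    public static void main(String[] args) {
        Map<String, Integer> vocabulary = new HashMap<>();
        String[] texts = {"the cat sat", "the cat saw the dog"};
        System.out.println(bagize(texts, vocabulary));
    }
}
```

Storing only the indices that actually occur keeps each bag small when the vocabulary is much larger than any single text, which is presumably the motivation for a sparse representation.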