edu.stanford.nlp.classify
Class LinearClassifier<L,F>

java.lang.Object
  extended by edu.stanford.nlp.classify.LinearClassifier<L,F>
Type Parameters:
L - The type of the labels in the Classifier
F - The type of the features in the Classifier
All Implemented Interfaces:
Classifier<L,F>, ProbabilisticClassifier<L,F>, RVFClassifier<L,F>, java.io.Serializable

public class LinearClassifier<L,F>
extends java.lang.Object
implements ProbabilisticClassifier<L,F>, RVFClassifier<L,F>

Implements a multiclass linear classifier. At classification time this can be any generalized linear model classifier (such as a perceptron, naive logistic regression, or an SVM).
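
As a rough illustration of what "linear" means here, the sketch below scores each label by summing the weights of a datum's active features and picks the highest-scoring label. It is a minimal stand-alone example, not the LinearClassifier implementation: the class name LinearScoringSketch, the nested-map weight layout, and the use of String labels and features are all assumptions made for readability.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; not the LinearClassifier implementation.
final class LinearScoringSketch {

    // weights.get(label).get(feature) holds the weight of a feature for a label
    // (a hypothetical nested-map layout chosen for readability).
    static Map<String, Double> scoresOf(Map<String, Map<String, Double>> weights,
                                        List<String> activeFeatures) {
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Map<String, Double>> entry : weights.entrySet()) {
            double sum = 0.0;
            for (String feature : activeFeatures) {
                // Features with no stored weight contribute nothing to the score.
                sum += entry.getValue().getOrDefault(feature, 0.0);
            }
            scores.put(entry.getKey(), sum);
        }
        return scores;
    }

    // Returns the label with the highest score, or null if there are no labels.
    static String classOf(Map<String, Map<String, Double>> weights,
                          List<String> activeFeatures) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> e : scoresOf(weights, activeFeatures).entrySet()) {
            if (e.getValue() > bestScore) {
                bestScore = e.getValue();
                best = e.getKey();
            }
        }
        return best;
    }
}
```

The models named above differ in how the weights are trained; at classification time they all reduce to this kind of weighted sum.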

Author:
Dan Klein, Jenny Finkel, Galen Andrew (converted to arrays and indices), Christopher Manning (most of the printing options), Eric Yeh (save to text file, new constructor w/thresholds), Sarah Spikes (sdspikes@cs.stanford.edu) (Templatization), (nmramesh@cs.stanford.edu) weightsAsMapOfCounters(), Angel Chang (Add functions to get top features, and number of features with weights above a certain threshold)
See Also:
Serialized Form

Field Summary
 boolean intern
           
static java.lang.String TEXT_SERIALIZATION_DELIMITER
           
 
Constructor Summary
LinearClassifier(Counter<? extends Pair<F,L>> weightCounter)
           
LinearClassifier(Counter<? extends Pair<F,L>> weightCounter, Counter<L> thresholdsC)
           
LinearClassifier(double[][] weights, Index<F> featureIndex, Index<L> labelIndex)
           
LinearClassifier(double[][] weights, Index<F> featureIndex, Index<L> labelIndex, double[] thresholds)
           
LinearClassifier(double[] weights, Index<Pair<F,L>> weightIndex)
           
 
Method Summary
 void adaptWeights(Dataset<L,F> adapt, LinearClassifierFactory<L,F> lcf)
           
 L classOf(Datum<L,F> example)
           
 L classOf(RVFDatum<L,F> example)
          Deprecated. 
 void dump()
          Print all features in the classifier and the weight that they assign to each class.
 void dump(java.io.PrintWriter pw)
           
 void dumpSorted()
          Print all features in the classifier and the weight that they assign to each class.
 L experimentalClassOf(Datum<L,F> example)
           
 Index<F> featureIndex()
           
 java.util.Collection<F> features()
           
 int getFeatureCount(double threshold, boolean useMagnitude)
          Returns number of features with weight above a certain threshold (across all labels)
 int getFeatureCount(java.util.Set<L> labels, double threshold, boolean useMagnitude)
          Returns number of features with weight above a certain threshold
protected  int getFeatureCountLabelIndices(java.util.Set<java.lang.Integer> iLabels, double threshold, boolean useMagnitude)
          Returns number of features with weight above a certain threshold
protected  java.util.Set<java.lang.Integer> getLabelIndices(java.util.Set<L> labels)
          Returns indices of labels
 java.util.List<Triple<F,L,java.lang.Double>> getTopFeatures(double threshold, boolean useMagnitude, int numFeatures)
          Returns list of top features with weight above a certain threshold (list is descending and across all labels)
 java.util.List<Triple<F,L,java.lang.Double>> getTopFeatures(java.util.Set<L> labels, double threshold, boolean useMagnitude, int numFeatures, boolean descending)
          Returns list of top features with weight above a certain threshold
protected  java.util.List<Triple<F,L,java.lang.Double>> getTopFeaturesLabelIndices(java.util.Set<java.lang.Integer> iLabels, double threshold, boolean useMagnitude, int numFeatures, boolean descending)
          Returns list of top features with weight above a certain threshold
 void justificationOf(Datum<L,F> example)
           
 void justificationOf(Datum<L,F> example, java.io.PrintWriter pw)
          Print all features active for a particular datum and the weight that the classifier assigns to each class for those features.
 void justificationOf(Datum<L,F> example, java.io.PrintWriter pw, boolean sorted)
          Print all features active for a particular datum and the weight that the classifier assigns to each class for those features.
<T> void
justificationOf(Datum<L,F> example, java.io.PrintWriter pw, Function<F,T> printer)
           
<T> void
justificationOf(Datum<L,F> example, java.io.PrintWriter pw, Function<F,T> printer, boolean sortedByFeature)
          Print all features active for a particular datum and the weight that the classifier assigns to each class for those features.
 void justificationOf(RVFDatum<L,F> example)
          Deprecated. 
 void justificationOf(RVFDatum<L,F> example, java.io.PrintWriter pw)
          Deprecated. 
 Index<L> labelIndex()
           
 java.util.Collection<L> labels()
           
 Counter<L> logProbabilityOf(Datum<L,F> example)
          Returns a counter mapping from each class name to the log probability of that class for a certain example.
 Counter<L> logProbabilityOf(RVFDatum<L,F> example)
          Deprecated. 
 Counter<L> probabilityOf(Datum<L,F> example)
          Returns a counter mapping from each class name to the probability of that class for a certain example.
 Counter<L> probabilityOf(RVFDatum<L,F> example)
          Deprecated. 
static
<L,F> LinearClassifier<L,F>
readClassifier(java.lang.String loadPath)
          Loads a classifier from a file.
 void saveToFilename(java.lang.String file)
          Saves this out to a standard text file, instead of as a serialized Java object.
 double scoreOf(Datum<L,F> example, L label)
          Returns the score of the Datum for the specified label.
 double scoreOf(RVFDatum<L,F> example, L label)
          Deprecated. 
 Counter<L> scoresOf(Datum<L,F> example)
          Constructs a counter whose keys are the labels of the classifier and whose values are the scores (unnormalized log probabilities) of each class.
 Counter<L> scoresOf(Datum<L,F> example, java.util.Collection<L> possibleLabels)
           
 Counter<L> scoresOf(RVFDatum<L,F> example)
          Deprecated. 
 void setWeights(double[][] newWeights)
           
 java.lang.String toAllWeightsString()
           
 java.lang.String toBiggestWeightFeaturesString(boolean useMagnitude, int numFeatures, boolean printDescending)
          Return a String that prints features with large weights.
 java.lang.String toDistributionString(int treshold)
          Similar to the histogram, but reports the exact values of the weights to show whether there are many equal weights.
 java.lang.String toHistogramString()
           
 java.lang.String topFeaturesToString(java.util.List<Triple<F,L,java.lang.Double>> topFeatures)
          Returns string representation of a list of top features
 java.lang.String toString()
          Print out a partial representation of a linear classifier.
 java.lang.String toString(java.lang.String style, int param)
          Print out a partial representation of a linear classifier in one of several ways.
 int totalSize()
           
 double weight(F feature, L label)
           
 double[][] weights()
           
 java.util.Map<L,Counter<F>> weightsAsMapOfCounters()
          This method returns a map from each label to a counter of feature weights for that label.
static void writeClassifier(LinearClassifier<?,?> classifier, java.lang.String writePath)
          Convenience wrapper for IOUtils.writeObjectToFile
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

intern

public boolean intern

TEXT_SERIALIZATION_DELIMITER

public static final java.lang.String TEXT_SERIALIZATION_DELIMITER
See Also:
Constant Field Values
Constructor Detail

LinearClassifier

public LinearClassifier(double[][] weights,
                        Index<F> featureIndex,
                        Index<L> labelIndex)

LinearClassifier

public LinearClassifier(double[][] weights,
                        Index<F> featureIndex,
                        Index<L> labelIndex,
                        double[] thresholds)
                 throws java.lang.Exception
Throws:
java.lang.Exception

LinearClassifier

public LinearClassifier(double[] weights,
                        Index<Pair<F,L>> weightIndex)

LinearClassifier

public LinearClassifier(Counter<? extends Pair<F,L>> weightCounter)

LinearClassifier

public LinearClassifier(Counter<? extends Pair<F,L>> weightCounter,
                        Counter<L> thresholdsC)
Method Detail

labels

public java.util.Collection<L> labels()
Specified by:
labels in interface Classifier<L,F>

features

public java.util.Collection<F> features()

labelIndex

public Index<L> labelIndex()

featureIndex

public Index<F> featureIndex()

weight

public double weight(F feature,
                     L label)

scoresOf

public Counter<L> scoresOf(Datum<L,F> example)
Constructs a counter whose keys are the labels of the classifier and whose values are the scores (unnormalized log probabilities) of each class.

Specified by:
scoresOf in interface Classifier<L,F>

scoreOf

public double scoreOf(Datum<L,F> example,
                      L label)
Returns the score of the Datum for the specified label. Ignores the true label of the Datum.


scoresOf

@Deprecated
public Counter<L> scoresOf(RVFDatum<L,F> example)
Deprecated. 

Constructs a counter whose keys are the labels of the classifier and whose values are the scores (unnormalized log probabilities) of each class for an RVFDatum.

Specified by:
scoresOf in interface RVFClassifier<L,F>

scoreOf

@Deprecated
public double scoreOf(RVFDatum<L,F> example,
                      L label)
Deprecated. 

Returns the score of the RVFDatum for the specified label. Ignores the true label of the RVFDatum.


probabilityOf

public Counter<L> probabilityOf(Datum<L,F> example)
Returns a counter mapping from each class name to the probability of that class for a certain example. The counts v should sum to 1.0.

Specified by:
probabilityOf in interface ProbabilisticClassifier<L,F>
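
Since scoresOf describes its values as unnormalized log probabilities, one plausible way to turn scores into probabilities that sum to 1.0 is a softmax. The sketch below shows that computation in plain Java. The class name SoftmaxSketch and the String-keyed map are assumptions, and this is not claimed to be how LinearClassifier computes its result.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: softmax over per-label scores.
final class SoftmaxSketch {

    static Map<String, Double> probabilities(Map<String, Double> scores) {
        // Subtract the maximum score before exponentiating so that
        // large scores do not overflow Math.exp.
        double max = Double.NEGATIVE_INFINITY;
        for (double s : scores.values()) {
            max = Math.max(max, s);
        }
        double total = 0.0;
        for (double s : scores.values()) {
            total += Math.exp(s - max);
        }
        Map<String, Double> probs = new HashMap<>();
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            probs.put(e.getKey(), Math.exp(e.getValue() - max) / total);
        }
        return probs;
    }
}
```

Shifting by the maximum does not change the result, because the common factor cancels in the ratio.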

probabilityOf

@Deprecated
public Counter<L> probabilityOf(RVFDatum<L,F> example)
Deprecated. 

Returns a counter mapping from each class name to the probability of that class for a certain example. The counts v should sum to 1.0.


logProbabilityOf

public Counter<L> logProbabilityOf(Datum<L,F> example)
Returns a counter mapping from each class name to the log probability of that class for a certain example. The sum of e^v over all counts v should be 1.0.

Specified by:
logProbabilityOf in interface ProbabilisticClassifier<L,F>
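
The property stated above (the values e^v sum to 1.0) can be obtained by subtracting the log of the sum of exponentiated scores from each score. The following sketch shows that log-normalization on a plain array. The class and method names are hypothetical, and the code is an illustration of the arithmetic rather than the library's implementation.

```java
// Illustrative sketch only: log-normalizing a non-empty array of scores.
final class LogNormalizeSketch {

    // Computes log(sum_i exp(values[i])) in a numerically stable way.
    static double logSumExp(double[] values) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            max = Math.max(max, v);
        }
        double sum = 0.0;
        for (double v : values) {
            sum += Math.exp(v - max);
        }
        return max + Math.log(sum);
    }

    // Returns scores[i] - logSumExp(scores), so that exp of the results sums to 1.
    static double[] logProbabilities(double[] scores) {
        double logZ = logSumExp(scores);
        double[] out = new double[scores.length];
        for (int i = 0; i < scores.length; i++) {
            out[i] = scores[i] - logZ;
        }
        return out;
    }
}
```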

logProbabilityOf

@Deprecated
public Counter<L> logProbabilityOf(RVFDatum<L,F> example)
Deprecated. 

Returns a counter mapping from each class to the log probability of that class for an RVFDatum. The sum of e^v over all counts v should be 1.0.


getLabelIndices

protected java.util.Set<java.lang.Integer> getLabelIndices(java.util.Set<L> labels)
Returns indices of labels

Parameters:
labels - Set of labels whose indices are wanted
Returns:
Set of indices

getFeatureCount

public int getFeatureCount(double threshold,
                           boolean useMagnitude)
Returns number of features with weight above a certain threshold (across all labels)

Parameters:
threshold - Threshold above which we will count the feature
useMagnitude - Whether the notion of "large" should ignore the sign of the feature weight.
Returns:
number of features satisfying the specified conditions

getFeatureCount

public int getFeatureCount(java.util.Set<L> labels,
                           double threshold,
                           boolean useMagnitude)
Returns number of features with weight above a certain threshold

Parameters:
labels - Set of labels we care about when counting features. Use null to get counts across all labels.
threshold - Threshold above which we will count the feature
useMagnitude - Whether the notion of "large" should ignore the sign of the feature weight.
Returns:
number of features satisfying the specified conditions
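
To make the roles of threshold, useMagnitude, and the null label set concrete, here is a small stand-alone sketch of this kind of count. It assumes a weights[feature][label] array layout and counts each (feature, label) weight separately; both are assumptions for illustration, and the class name FeatureCountSketch is hypothetical.

```java
import java.util.Set;

// Illustrative sketch only: counting weights above a threshold.
final class FeatureCountSketch {

    // weights[f][l] is taken to be the weight of feature f for label l (assumed layout).
    // A null labelIndices set means "consider every label".
    static int countAbove(double[][] weights, Set<Integer> labelIndices,
                          double threshold, boolean useMagnitude) {
        int n = 0;
        for (double[] row : weights) {
            for (int l = 0; l < row.length; l++) {
                if (labelIndices != null && !labelIndices.contains(l)) {
                    continue;
                }
                // With useMagnitude, a large negative weight counts as "large".
                double w = useMagnitude ? Math.abs(row[l]) : row[l];
                if (w > threshold) {
                    n++;
                }
            }
        }
        return n;
    }
}
```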

getFeatureCountLabelIndices

protected int getFeatureCountLabelIndices(java.util.Set<java.lang.Integer> iLabels,
                                          double threshold,
                                          boolean useMagnitude)
Returns number of features with weight above a certain threshold

Parameters:
iLabels - Set of label indices we care about when counting features. Use null to get counts across all labels.
threshold - Threshold above which we will count the feature
useMagnitude - Whether the notion of "large" should ignore the sign of the feature weight.
Returns:
number of features satisfying the specified conditions

getTopFeatures

public java.util.List<Triple<F,L,java.lang.Double>> getTopFeatures(double threshold,
                                                                   boolean useMagnitude,
                                                                   int numFeatures)
Returns list of top features with weight above a certain threshold (list is descending and across all labels)

Parameters:
threshold - Threshold above which we will count the feature
useMagnitude - Whether the notion of "large" should ignore the sign of the feature weight.
numFeatures - How many top features to return (-1 for unlimited)
Returns:
List of triples indicating feature, label, weight

getTopFeatures

public java.util.List<Triple<F,L,java.lang.Double>> getTopFeatures(java.util.Set<L> labels,
                                                                   double threshold,
                                                                   boolean useMagnitude,
                                                                   int numFeatures,
                                                                   boolean descending)
Returns list of top features with weight above a certain threshold

Parameters:
labels - Set of labels we care about when getting features. Use null to get features across all labels.
threshold - Threshold above which we will count the feature
useMagnitude - Whether the notion of "large" should ignore the sign of the feature weight.
numFeatures - How many top features to return (-1 for unlimited)
descending - Return weights in descending order
Returns:
List of triples indicating feature, label, weight
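
The parameters above combine a filter (threshold, useMagnitude), a size limit (numFeatures, with -1 meaning unlimited), and an output order (descending). The sketch below shows one way these could interact on a plain weights[feature][label] array: keep the weights above the threshold, select the largest numFeatures of them, then order the result. The Entry class, the array layout, and the choice to select the largest entries before applying the order are assumptions, not a description of the library's behavior.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch only: selecting the top-weighted (feature, label) pairs.
final class TopFeaturesSketch {

    // Hypothetical stand-in for a (feature, label, weight) triple, using indices.
    static final class Entry {
        final int feature;
        final int label;
        final double weight;

        Entry(int feature, int label, double weight) {
            this.feature = feature;
            this.label = label;
            this.weight = weight;
        }
    }

    static List<Entry> top(double[][] weights, double threshold, boolean useMagnitude,
                           int numFeatures, boolean descending) {
        List<Entry> kept = new ArrayList<>();
        for (int f = 0; f < weights.length; f++) {
            for (int l = 0; l < weights[f].length; l++) {
                double value = useMagnitude ? Math.abs(weights[f][l]) : weights[f][l];
                if (value > threshold) {
                    kept.add(new Entry(f, l, weights[f][l]));
                }
            }
        }
        // Sort largest first, comparing by magnitude when requested.
        Comparator<Entry> byValue =
            Comparator.comparingDouble(e -> useMagnitude ? Math.abs(e.weight) : e.weight);
        kept.sort(byValue.reversed());
        // A negative numFeatures (such as -1) means "no limit".
        if (numFeatures >= 0 && kept.size() > numFeatures) {
            kept = new ArrayList<>(kept.subList(0, numFeatures));
        }
        if (!descending) {
            Collections.reverse(kept);
        }
        return kept;
    }
}
```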

getTopFeaturesLabelIndices

protected java.util.List<Triple<F,L,java.lang.Double>> getTopFeaturesLabelIndices(java.util.Set<java.lang.Integer> iLabels,
                                                                                  double threshold,
                                                                                  boolean useMagnitude,
                                                                                  int numFeatures,
                                                                                  boolean descending)
Returns list of top features with weight above a certain threshold

Parameters:
iLabels - Set of label indices we care about when getting features. Use null to get features across all labels.
threshold - Threshold above which we will count the feature
useMagnitude - Whether the notion of "large" should ignore the sign of the feature weight.
numFeatures - How many top features to return (-1 for unlimited)
descending - Return weights in descending order
Returns:
List of triples indicating feature, label, weight

topFeaturesToString

public java.lang.String topFeaturesToString(java.util.List<Triple<F,L,java.lang.Double>> topFeatures)
Returns string representation of a list of top features

Parameters:
topFeatures - List of triples indicating feature, label, weight
Returns:
String representation of the list of features

toBiggestWeightFeaturesString

public java.lang.String toBiggestWeightFeaturesString(boolean useMagnitude,
                                                      int numFeatures,
                                                      boolean printDescending)
Return a String that prints features with large weights.

Parameters:
useMagnitude - Whether the notion of "large" should ignore the sign of the feature weight.
numFeatures - How many top features to print
printDescending - Print weights in descending order
Returns:
The String representation of features with large weights

toDistributionString

public java.lang.String toDistributionString(int treshold)
Similar to the histogram, but reports the exact values of the weights to show whether there are many equal weights.

Returns:
A human readable string about the classifier distribution.

totalSize

public int totalSize()

toHistogramString

public java.lang.String toHistogramString()

toString

public java.lang.String toString()
Print out a partial representation of a linear classifier. This just calls toString("WeightHistogram", 0).

Overrides:
toString in class java.lang.Object

toString

public java.lang.String toString(java.lang.String style,
                                 int param)
Print out a partial representation of a linear classifier in one of several ways.

Parameters:
style - Options are: HighWeight: print out the param parameters with largest weights; HighMagnitude: print out the param parameters for which the absolute value of their weight is largest; AllWeights: print out the weights of all features; WeightHistogram: print out a particular hard-coded textual histogram representation of a classifier; WeightDistribution.
param - Determines the number of things printed in certain styles
Throws:
java.lang.IllegalArgumentException - if the style name is unrecognized

toAllWeightsString

public java.lang.String toAllWeightsString()

dump

public void dump()
Print all features in the classifier and the weight that they assign to each class.


dump

public void dump(java.io.PrintWriter pw)

justificationOf

@Deprecated
public void justificationOf(RVFDatum<L,F> example)
Deprecated. 


justificationOf

@Deprecated
public void justificationOf(RVFDatum<L,F> example,
                            java.io.PrintWriter pw)
Deprecated. 

Print all features active for a particular datum and the weight that the classifier assigns to each class for those features.


justificationOf

public void justificationOf(Datum<L,F> example)

justificationOf

public <T> void justificationOf(Datum<L,F> example,
                                java.io.PrintWriter pw,
                                Function<F,T> printer)

justificationOf

public <T> void justificationOf(Datum<L,F> example,
                                java.io.PrintWriter pw,
                                Function<F,T> printer,
                                boolean sortedByFeature)
Print all features active for a particular datum and the weight that the classifier assigns to each class for those features.

Parameters:
example - The datum for which features are to be printed
pw - Where to print it to
printer - If this is non-null, then it is applied to each feature to convert it to a more readable form
sortedByFeature - Whether to sort by feature names

weightsAsMapOfCounters

public java.util.Map<L,Counter<F>> weightsAsMapOfCounters()
This method returns a map from each label to a counter of feature weights for that label. Useful for feature analysis.

Returns:
a map of counters
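
The shape of the returned structure (label, then feature, then weight) can be pictured as a regrouping of a weight matrix by label. The sketch below performs that regrouping with plain java.util maps in place of Counter objects. The weights[feature][label] layout, the List-based indices, and the class name WeightsByLabelSketch are assumptions for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: regrouping a weight matrix into per-label maps.
final class WeightsByLabelSketch {

    // weights[f][l] is taken to be the weight of features.get(f) for labels.get(l).
    static Map<String, Map<String, Double>> byLabel(double[][] weights,
                                                    List<String> features,
                                                    List<String> labels) {
        Map<String, Map<String, Double>> result = new LinkedHashMap<>();
        for (int l = 0; l < labels.size(); l++) {
            Map<String, Double> perFeature = new LinkedHashMap<>();
            for (int f = 0; f < features.size(); f++) {
                perFeature.put(features.get(f), weights[f][l]);
            }
            result.put(labels.get(l), perFeature);
        }
        return result;
    }
}
```

With a structure like this, the weights for a single label can be inspected or sorted without touching the other labels, which is what makes it convenient for feature analysis.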

justificationOf

public void justificationOf(Datum<L,F> example,
                            java.io.PrintWriter pw)
Print all features active for a particular datum and the weight that the classifier assigns to each class for those features.


dumpSorted

public void dumpSorted()
Print all features in the classifier and the weight that they assign to each class. The feature names are printed in sorted order.


justificationOf

public void justificationOf(Datum<L,F> example,
                            java.io.PrintWriter pw,
                            boolean sorted)
Print all features active for a particular datum and the weight that the classifier assigns to each class for those features. Sorts by feature name if 'sorted' is true.


scoresOf

public Counter<L> scoresOf(Datum<L,F> example,
                           java.util.Collection<L> possibleLabels)

experimentalClassOf

public L experimentalClassOf(Datum<L,F> example)

classOf

public L classOf(Datum<L,F> example)
Specified by:
classOf in interface Classifier<L,F>

classOf

@Deprecated
public L classOf(RVFDatum<L,F> example)
Deprecated. 

Specified by:
classOf in interface RVFClassifier<L,F>

adaptWeights

public void adaptWeights(Dataset<L,F> adapt,
                         LinearClassifierFactory<L,F> lcf)

weights

public double[][] weights()

setWeights

public void setWeights(double[][] newWeights)

readClassifier

public static <L,F> LinearClassifier<L,F> readClassifier(java.lang.String loadPath)
Loads a classifier from a file. Simple convenience wrapper for IOUtils.readFromString.


writeClassifier

public static void writeClassifier(LinearClassifier<?,?> classifier,
                                   java.lang.String writePath)
Convenience wrapper for IOUtils.writeObjectToFile


saveToFilename

public void saveToFilename(java.lang.String file)
Saves this out to a standard text file, instead of as a serialized Java object. NOTE: this currently assumes feature and weights are represented as Strings.

Parameters:
file - String filepath to write out to.


Stanford NLP Group