L
- The type of the labels in the Classifier
F
- The type of the features in the Classifier
public class LinearClassifier<L,F> extends java.lang.Object implements ProbabilisticClassifier<L,F>, RVFClassifier<L,F>
weightsAsMapOfCounters()
Angel Chang (Add functions to get top features, and number of features with weights above a certain threshold)

Modifier and Type | Field and Description |
---|---|
boolean | intern |
static java.lang.String | TEXT_SERIALIZATION_DELIMITER |

Constructor and Description |
---|
LinearClassifier(Counter<? extends Pair<F,L>> weightCounter) |
LinearClassifier(Counter<? extends Pair<F,L>> weightCounter, Counter<L> thresholdsC) |
LinearClassifier(double[][] weights, Index<F> featureIndex, Index<L> labelIndex) Make a linear classifier from the parameters. |
LinearClassifier(double[][] weights, Index<F> featureIndex, Index<L> labelIndex, double[] thresholds) |
LinearClassifier(double[] weights, Index<Pair<F,L>> weightIndex) |

Modifier and Type | Method and Description |
---|---|
void | adaptWeights(Dataset<L,F> adapt, LinearClassifierFactory<L,F> lcf) |
L | classOf(Datum<L,F> example) |
L | classOf(RVFDatum<L,F> example) Deprecated. |
void | dump() Print all features in the classifier and the weight that they assign to each class. |
void | dump(java.io.PrintWriter pw) Print all features in the classifier and the weight that they assign to each class. |
void | dumpSorted() Print all features in the classifier and the weight that they assign to each class. |
Index<F> | featureIndex() |
java.util.Collection<F> | features() |
int | getFeatureCount(double threshold, boolean useMagnitude) Returns number of features with weight above a certain threshold (across all labels). |
int | getFeatureCount(java.util.Set<L> labels, double threshold, boolean useMagnitude) Returns number of features with weight above a certain threshold. |
protected int | getFeatureCountLabelIndices(java.util.Set<java.lang.Integer> iLabels, double threshold, boolean useMagnitude) Returns number of features with weight above a certain threshold. |
protected java.util.Set<java.lang.Integer> | getLabelIndices(java.util.Set<L> labels) Returns indices of labels. |
java.util.List<Triple<F,L,java.lang.Double>> | getTopFeatures(double threshold, boolean useMagnitude, int numFeatures) Returns list of top features with weight above a certain threshold (list is descending and across all labels). |
java.util.List<Triple<F,L,java.lang.Double>> | getTopFeatures(java.util.Set<L> labels, double threshold, boolean useMagnitude, int numFeatures, boolean descending) Returns list of top features with weight above a certain threshold. |
protected java.util.List<Triple<F,L,java.lang.Double>> | getTopFeaturesLabelIndices(java.util.Set<java.lang.Integer> iLabels, double threshold, boolean useMagnitude, int numFeatures, boolean descending) Returns list of top features with weight above a certain threshold. |
void | justificationOf(Datum<L,F> example) |
void | justificationOf(Datum<L,F> example, java.io.PrintWriter pw) Print all features active for a particular datum and the weight that the classifier assigns to each class for those features. |
void | justificationOf(Datum<L,F> example, java.io.PrintWriter pw, boolean sorted) Print all features active for a particular datum and the weight that the classifier assigns to each class for those features. |
<T> void | justificationOf(Datum<L,F> example, java.io.PrintWriter pw, java.util.function.Function<F,T> printer) |
<T> void | justificationOf(Datum<L,F> example, java.io.PrintWriter pw, java.util.function.Function<F,T> printer, boolean sortedByFeature) Print all features active for a particular datum and the weight that the classifier assigns to each class for those features. |
Index<L> | labelIndex() |
java.util.Collection<L> | labels() |
Counter<L> | logProbabilityOf(Datum<L,F> example) Returns a counter mapping from each class name to the log probability of that class for a certain example. |
Counter<L> | logProbabilityOf(int[] features) Given a datum's features, returns a counter mapping from each class name to the log probability of that class. |
Counter<L> | logProbabilityOf(RVFDatum<L,F> example) Deprecated. |
Counter<L> | probabilityOf(Datum<L,F> example) Returns a counter mapping from each class name to the probability of that class for a certain example. |
Counter<L> | probabilityOf(int[] features) |
Counter<L> | probabilityOf(RVFDatum<L,F> example) Deprecated. |
static <L,F> LinearClassifier<L,F> | readClassifier(java.lang.String loadPath) Loads a classifier from a file. |
void | saveToFilename(java.lang.String file) Saves this out to a standard text file, instead of as a serialized Java object. |
double | scoreOf(Datum<L,F> example, L label) Returns the score of the Datum for the specified label. |
Counter<L> | scoresOf(Datum<L,F> example) Construct a counter with keys the labels of the classifier and values the score (unnormalized log probability) of each class. |
Counter<L> | scoresOf(Datum<L,F> example, java.util.Collection<L> possibleLabels) |
Counter<L> | scoresOf(int[] features) Given a datum's features, construct a counter with keys the labels and values the score (unnormalized log probability) for each class. |
Counter<L> | scoresOf(RVFDatum<L,F> example) Deprecated. |
void | setWeights(double[][] newWeights) |
java.lang.String | toAllWeightsString() |
java.lang.String | toBiggestWeightFeaturesString(boolean useMagnitude, int numFeatures, boolean printDescending) Return a String that prints features with large weights. |
java.lang.String | toDistributionString(int threshold) Similar to histogram but exact values of the weights to see whether there are many equal weights. |
java.lang.String | toHistogramString() |
java.lang.String | topFeaturesToString(java.util.List<Triple<F,L,java.lang.Double>> topFeatures) Returns string representation of a list of top features. |
java.lang.String | toString() Print out a partial representation of a linear classifier. |
java.lang.String | toString(java.lang.String style, int param) Print out a partial representation of a linear classifier in one of several ways. |
int | totalSize() |
double | weight(F feature, L label) |
double[][] | weights() |
java.util.Map<L,Counter<F>> | weightsAsMapOfCounters() This method returns a map from each label to a counter of feature weights for that label. |
static void | writeClassifier(LinearClassifier<?,?> classifier, java.lang.String serializePath) Convenience wrapper for IOUtils.writeObjectToFile. |
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface Classifier: evaluateAccuracy, evaluatePrecisionAndRecall
public boolean intern
public static final java.lang.String TEXT_SERIALIZATION_DELIMITER
public LinearClassifier(double[][] weights, Index<F> featureIndex, Index<L> labelIndex)
weights
- The parameters of the classifier. The first index is the featureIndex value and second index is the labelIndex value.
featureIndex
- An index from F to integers used to index the features in the weights array
labelIndex
- An index from L to integers used to index the labels in the weights array
public LinearClassifier(double[][] weights, Index<F> featureIndex, Index<L> labelIndex, double[] thresholds) throws java.lang.Exception
java.lang.Exception
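The weights layout described for these constructors (first index is the feature, second index is the label) can be illustrated with a small standalone sketch. It uses plain arrays only and does not call the library; the class name WeightLayoutSketch and its scores helper are hypothetical. The sketch assumes that a label's score is the sum of the weights of the active features for that label, and it ignores the thresholds argument.

```java
import java.util.Arrays;

/** Illustrative sketch only: mimics the documented weights[feature][label] layout. */
final class WeightLayoutSketch {

    /**
     * Sums, for each label, the weights of the active features.
     * activeFeatures holds feature indices, i.e. row indices of the weights array.
     * Assumption: no per-label thresholds are applied.
     */
    static double[] scores(double[][] weights, int[] activeFeatures) {
        int numLabels = weights.length == 0 ? 0 : weights[0].length;
        double[] scores = new double[numLabels];
        for (int f : activeFeatures) {
            for (int l = 0; l < numLabels; l++) {
                scores[l] += weights[f][l]; // first index: feature, second index: label
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        // Three features (rows) and two labels (columns).
        double[][] weights = {
            {1.0, -1.0},
            {0.5, 0.25},
            {-2.0, 3.0},
        };
        // Features 0 and 2 are active.
        System.out.println(Arrays.toString(scores(weights, new int[] {0, 2}))); // prints [-1.0, 2.0]
    }
}
```

Each row of the array belongs to one feature, so adding or removing an active feature only adds or removes one row from the sum for every label.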
public java.util.Collection<L> labels()
labels
in interface Classifier<L,F>
public java.util.Collection<F> features()
public Counter<L> scoresOf(Datum<L,F> example)
The scores are unnormalized log scores. To convert them into probabilities, use a "softmax regression" formulation: take the exponential e^x of each score and divide it by the sum of the exponentials over all classes (including itself); the result is the probability of that class.
scoresOf
in interface Classifier<L,F>
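The softmax conversion described for scoresOf can be sketched in plain Java. This is a minimal illustration, not the library's implementation: it uses a java.util.Map with String labels in place of Counter<L>, and the class name SoftmaxSketch is hypothetical. Subtracting the maximum score before exponentiating is an added numerical-stability step that leaves the resulting probabilities unchanged.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative sketch only: turns unnormalized log scores into probabilities. */
final class SoftmaxSketch {

    static Map<String, Double> softmax(Map<String, Double> scores) {
        // Find the largest score so the exponentials below cannot overflow.
        double max = Double.NEGATIVE_INFINITY;
        for (double s : scores.values()) {
            max = Math.max(max, s);
        }
        // Sum of exponentials over all classes.
        double total = 0.0;
        for (double s : scores.values()) {
            total += Math.exp(s - max);
        }
        // Each probability is exp(score) divided by that sum.
        Map<String, Double> probabilities = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            probabilities.put(e.getKey(), Math.exp(e.getValue() - max) / total);
        }
        return probabilities;
    }

    public static void main(String[] args) {
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("pos", 2.0);
        scores.put("neg", 0.0);
        System.out.println(softmax(scores));
    }
}
```

The returned values are positive and sum to 1, and the label with the highest score keeps the highest probability.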
public Counter<L> scoresOf(int[] features)
public double scoreOf(Datum<L,F> example, L label)
@Deprecated public Counter<L> scoresOf(RVFDatum<L,F> example)
scoresOf
in interface RVFClassifier<L,F>
public Counter<L> probabilityOf(Datum<L,F> example)
probabilityOf
in interface ProbabilisticClassifier<L,F>
@Deprecated public Counter<L> probabilityOf(RVFDatum<L,F> example)
public Counter<L> logProbabilityOf(Datum<L,F> example)
logProbabilityOf
in interface ProbabilisticClassifier<L,F>
public Counter<L> logProbabilityOf(int[] features)
@Deprecated public Counter<L> logProbabilityOf(RVFDatum<L,F> example)
protected java.util.Set<java.lang.Integer> getLabelIndices(java.util.Set<L> labels)
labels
- Set of labels to get indices
public int getFeatureCount(double threshold, boolean useMagnitude)
threshold
- Threshold above which we will count the feature
useMagnitude
- Whether the notion of "large" should ignore the sign of the feature weight.
public int getFeatureCount(java.util.Set<L> labels, double threshold, boolean useMagnitude)
labels
- Set of labels we care about when counting features. Use null to get counts across all labels.
threshold
- Threshold above which we will count the feature
useMagnitude
- Whether the notion of "large" should ignore the sign of the feature weight.
protected int getFeatureCountLabelIndices(java.util.Set<java.lang.Integer> iLabels, double threshold, boolean useMagnitude)
iLabels
- Set of label indices we care about when counting features. Use null to get counts across all labels.
threshold
- Threshold above which we will count the feature
useMagnitude
- Whether the notion of "large" should ignore the sign of the feature weight.
public java.util.List<Triple<F,L,java.lang.Double>> getTopFeatures(double threshold, boolean useMagnitude, int numFeatures)
threshold
- Threshold above which we will count the feature
useMagnitude
- Whether the notion of "large" should ignore the sign of the feature weight.
numFeatures
- How many top features to return (-1 for unlimited)
public java.util.List<Triple<F,L,java.lang.Double>> getTopFeatures(java.util.Set<L> labels, double threshold, boolean useMagnitude, int numFeatures, boolean descending)
labels
- Set of labels we care about when getting features. Use null to get features across all labels.
threshold
- Threshold above which we will count the feature
useMagnitude
- Whether the notion of "large" should ignore the sign of the feature weight.
numFeatures
- How many top features to return (-1 for unlimited)
descending
- Return weights in descending order
protected java.util.List<Triple<F,L,java.lang.Double>> getTopFeaturesLabelIndices(java.util.Set<java.lang.Integer> iLabels, double threshold, boolean useMagnitude, int numFeatures, boolean descending)
iLabels
- Set of label indices we care about when getting features. Use null to get features across all labels.
threshold
- Threshold above which we will count the feature
useMagnitude
- Whether the notion of "large" should ignore the sign of the feature weight.
numFeatures
- How many top features to return (-1 for unlimited)
descending
- Return weights in descending order
public java.lang.String topFeaturesToString(java.util.List<Triple<F,L,java.lang.Double>> topFeatures)
topFeatures
- List of triples indicating feature, label, weight
public java.lang.String toBiggestWeightFeaturesString(boolean useMagnitude, int numFeatures, boolean printDescending)
useMagnitude
- Whether the notion of "large" should ignore the sign of the feature weight.
numFeatures
- How many top features to print
printDescending
- Print weights in descending order
public java.lang.String toDistributionString(int threshold)
public int totalSize()
public java.lang.String toHistogramString()
public java.lang.String toString()
toString
in class java.lang.Object
public java.lang.String toString(java.lang.String style, int param)
style
- Options are:
HighWeight: print out the param parameters with largest weights;
HighMagnitude: print out the param parameters for which the absolute value of their weight is largest;
AllWeights: print out the weights of all features;
WeightHistogram: print out a particular hard-coded textual histogram representation of a classifier;
WeightDistribution;
param
- Determines the number of things printed in certain styles
java.lang.IllegalArgumentException
- if the style name is unrecognized
public java.lang.String toAllWeightsString()
public void dump()
public void dump(java.io.PrintWriter pw)
public void dumpSorted()
public void justificationOf(Datum<L,F> example, java.io.PrintWriter pw)
public void justificationOf(Datum<L,F> example, java.io.PrintWriter pw, boolean sorted)
public <T> void justificationOf(Datum<L,F> example, java.io.PrintWriter pw, java.util.function.Function<F,T> printer)
public <T> void justificationOf(Datum<L,F> example, java.io.PrintWriter pw, java.util.function.Function<F,T> printer, boolean sortedByFeature)
example
- The datum for which features are to be printed
pw
- Where to print it to
printer
- If this is non-null, then it is applied to each feature to convert it to a more readable form
sortedByFeature
- Whether to sort by feature names
public java.util.Map<L,Counter<F>> weightsAsMapOfCounters()
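The method summary describes weightsAsMapOfCounters() as returning a map from each label to a counter of feature weights for that label. The shape of that result can be sketched without the library as follows. This is an assumption-laden illustration: a nested java.util.Map stands in for Counter<F>, String lists stand in for the feature and label indices, and the class name WeightsByLabelSketch and its byLabel helper are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only: regroups a weights[feature][label] array by label. */
final class WeightsByLabelSketch {

    static Map<String, Map<String, Double>> byLabel(
            double[][] weights, List<String> features, List<String> labels) {
        Map<String, Map<String, Double>> result = new LinkedHashMap<>();
        for (int l = 0; l < labels.size(); l++) {
            // One inner map per label, holding that label's weight for every feature.
            Map<String, Double> perFeature = new LinkedHashMap<>();
            for (int f = 0; f < features.size(); f++) {
                perFeature.put(features.get(f), weights[f][l]);
            }
            result.put(labels.get(l), perFeature);
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] weights = {
            {1.5, -1.5},  // weights of feature "good" for labels "pos" and "neg"
            {-2.0, 2.0},  // weights of feature "bad" for labels "pos" and "neg"
        };
        Map<String, Map<String, Double>> grouped =
            byLabel(weights, Arrays.asList("good", "bad"), Arrays.asList("pos", "neg"));
        System.out.println(grouped); // prints {pos={good=1.5, bad=-2.0}, neg={good=-1.5, bad=2.0}}
    }
}
```

Grouping by label this way makes it easy to inspect or sort the features that matter for one class at a time.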
@Deprecated public L classOf(RVFDatum<L,F> example)
classOf
in interface RVFClassifier<L,F>
public double[][] weights()
public void setWeights(double[][] newWeights)
public static <L,F> LinearClassifier<L,F> readClassifier(java.lang.String loadPath)
public static void writeClassifier(LinearClassifier<?,?> classifier, java.lang.String serializePath)
public void saveToFilename(java.lang.String file)
file
- String filepath to write out to.