edu.stanford.nlp.stats
Class Counters

java.lang.Object
  extended by edu.stanford.nlp.stats.Counters

public class Counters
extends Object

Static methods for operating on Counters.

Author:
Galen Andrew (galand@cs.stanford.edu), Jeff Michels (jmichels@stanford.edu)

Method Summary
static
<E> Counter<E>
absoluteDifference(GenericCounter<E> c1, GenericCounter<E> c2)
          Returns |c1 - c2|.
static
<E> Counter<E>
average(GenericCounter<E> c1, GenericCounter<E> c2)
          Returns a new Counter with counts averaged from the two given Counters.
static
<E> double
cosine(GenericCounter<E> c1, GenericCounter<E> c2)
           
static
<E> Counter<E>
createCounterFromCollection(Collection<E> l)
           
static
<E> Counter<E>
createCounterFromList(List<E> l)
           
static
<E> double
crossEntropy(GenericCounter<E> from, Counter<E> to)
          Note that this implementation doesn't normalize the "from" Counter.
static
<E> double
crossEntropy(GenericCounter<E> from, GenericCounter<E> to)
          Note that this implementation doesn't normalize the "from" Counter.
static Counter deserializeCounter(String filename)
           
static
<E> Counter<E>
division(GenericCounter<E> c1, GenericCounter<E> c2)
          Returns c1 divided by c2.
static
<E> double
dotProduct(GenericCounter<E> c1, GenericCounter<E> c2)
          Returns the dot product of c1 and c2.
static
<E> double
entropy(GenericCounter<E> c)
          Calculates the entropy of the given counter (in bits).
static
<T> Counter<T>
exp(Counter<T> c)
           
static
<E> Counter<Double>
getCountCounts(GenericCounter<E> c)
           
static
<E> Counter<E>
intersection(GenericCounter<E> c1, GenericCounter<E> c2)
          Returns a counter that is the intersection of c1 and c2.
static
<E> double
jaccardCoefficient(GenericCounter<E> c1, GenericCounter<E> c2)
          Returns the Jaccard Coefficient of the two counters.
static
<E> double
jensenShannonDivergence(GenericCounter<E> c1, GenericCounter<E> c2)
          Calculates the Jensen-Shannon divergence between the two counters.
static
<E> double
klDivergence(GenericCounter<E> from, GenericCounter<E> to)
          Calculates the KL divergence between the two counters.
static
<E> Counter<E>
L2Normalize(GenericCounter<E> c)
          L2 normalize a counter.
static
<E> Counter<E>
linearCombination(GenericCounter<E> c1, double w1, GenericCounter<E> c2, double w2)
          Returns the linear combination w1*c1 + w2*c2 of the two Counters.
static
<E> Counter<E>
loadCounter(String filename, Class c)
          Loads a Counter from a text file.
static IntCounter loadIntCounter(String filename, Class c)
          Loads an IntCounter from a text file.
static
<E> Counter<E>
perturbCounts(GenericCounter<E> c, Random random, double p)
           
static
<T> Counter<T>
pow(Counter<T> c, double temp)
           
static
<E> void
printCounterComparison(GenericCounter<E> a, GenericCounter<E> b)
          Prints a comparison of the two Counters; useful for debugging.
static
<E> void
printCounterComparison(GenericCounter<E> a, GenericCounter<E> b, PrintStream out)
          Prints a comparison of the two Counters to the given PrintStream; useful for debugging.
static
<E> void
printCounterSortedByKeys(GenericCounter<E> c)
           
static
<E> Counter<E>
product(GenericCounter<E> c1, GenericCounter<E> c2)
          Returns the product of c1 and c2.
static
<E> Object
restrictedArgMax(Counter<E> c, Collection<E> restriction)
           
static
<T> T
sample(Counter<T> c, Random rand)
          Draws a random key from c, treating its counts as probabilities; assumes c is normalized.
static
<E> void
saveCounter(GenericCounter<E> c, String filename)
          Saves a Counter to a text file.
static
<E> Counter<E>
scale(GenericCounter<E> c, double s)
          Returns a new Counter which is scaled by the given scale factor.
static
<T1,T2> TwoDimensionalCounter<T1,T2>
scale(TwoDimensionalCounter<T1,T2> c, double d)
          Creates a new TwoDimensionalCounter where all the counts are scaled by d.
static void serializeCounter(GenericCounter c, String filename)
           
static
<E> double
skewDivergence(GenericCounter<E> c1, GenericCounter<E> c2, double skew)
          Calculates the skew divergence between the two counters.
static
<E> List<E>
sortedKeys(Counter<E> x)
           
static
<E> String
toBiggestValuesFirstString(Counter<E> c)
           
static
<E> String
toBiggestValuesFirstString(Counter<E> c, int k)
           
static
<T> Counter<T>
toCounter(double[] counts, Index<T> index)
           
static
<E> PriorityQueue
toPriorityQueue(GenericCounter<E> c)
          Returns a PriorityQueue of the keys in c, where each key's priority is its count.
static
<E> List<E>
toSortedList(GenericCounter<E> c)
          A List of the keys in c, sorted from highest count to lowest.
static String toVerticalString(Counter c)
           
static String toVerticalString(Counter c, int k)
           
static
<E> Counter<E>
union(GenericCounter<E> c1, GenericCounter<E> c2)
          Returns a Counter that is the union of the two Counters passed in (counts are added).
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

union

public static <E> Counter<E> union(GenericCounter<E> c1,
                                   GenericCounter<E> c2)
Returns a Counter that is the union of the two Counters passed in (counts are added).

Parameters:
c1 -
c2 -
Returns:
A Counter that is the union of the two Counters passed in (counts are added).

intersection

public static <E> Counter<E> intersection(GenericCounter<E> c1,
                                          GenericCounter<E> c2)
Returns a counter that is the intersection of c1 and c2. If both c1 and c2 contain a key, the min of the two counts is used.

Parameters:
c1 -
c2 -
Returns:
A counter that is the intersection of c1 and c2
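The difference between union (counts added) and intersection (minimum count, shared keys only) can be illustrated with a minimal sketch. It assumes a plain Map<E, Double> as a hypothetical stand-in for a Counter; the class and method names are illustrative, not the library's implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in: a Map<E, Double> plays the role of a Counter.
class UnionIntersectionSketch {

    // union: every key from either map; counts of shared keys are added.
    static <E> Map<E, Double> union(Map<E, Double> c1, Map<E, Double> c2) {
        Map<E, Double> result = new HashMap<>(c1);
        for (Map.Entry<E, Double> e : c2.entrySet()) {
            result.merge(e.getKey(), e.getValue(), Double::sum);
        }
        return result;
    }

    // intersection: only keys present in both maps, keeping the smaller count.
    static <E> Map<E, Double> intersection(Map<E, Double> c1, Map<E, Double> c2) {
        Map<E, Double> result = new HashMap<>();
        for (Map.Entry<E, Double> e : c1.entrySet()) {
            Double other = c2.get(e.getKey());
            if (other != null) {
                result.put(e.getKey(), Math.min(e.getValue(), other));
            }
        }
        return result;
    }
}
```

For {x=2, y=1} and {y=3, z=4}, the union is {x=2, y=4, z=4} and the intersection is {y=1}.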

jaccardCoefficient

public static <E> double jaccardCoefficient(GenericCounter<E> c1,
                                            GenericCounter<E> c2)
Returns the Jaccard coefficient of the two counters, calculated as |c1 intersect c2| / (|c1| + |c2| - |c1 intersect c2|).

Parameters:
c1 -
c2 -
Returns:
The Jaccard Coefficient of the two counters
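Reading |c| as the total count of c and taking the intersection with per-key minimum counts (as intersection() describes) is an assumption, but under it the formula simplifies: since min(a, b) + max(a, b) = a + b for each key, the denominator |c1| + |c2| - |c1 intersect c2| equals the sum of per-key maxima. A minimal sketch over a hypothetical Map<E, Double> stand-in:

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: Map<E, Double> stands in for a Counter; missing keys count as 0.
class JaccardSketch {

    static <E> double jaccard(Map<E, Double> c1, Map<E, Double> c2) {
        Set<E> keys = new HashSet<>(c1.keySet());
        keys.addAll(c2.keySet());
        double minSum = 0.0; // |c1 intersect c2|: total of per-key minimum counts
        double maxSum = 0.0; // equals |c1| + |c2| - |c1 intersect c2|
        for (E k : keys) {
            double a = c1.getOrDefault(k, 0.0);
            double b = c2.getOrDefault(k, 0.0);
            minSum += Math.min(a, b);
            maxSum += Math.max(a, b);
        }
        return minSum / maxSum;
    }
}
```

For {x=2, y=1} and {y=3, z=4}: |c1| = 3, |c2| = 7, and the intersection is 1, so the coefficient is 1 / (3 + 7 - 1) = 1/9.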

product

public static <E> Counter<E> product(GenericCounter<E> c1,
                                     GenericCounter<E> c2)
Returns the element-wise product of c1 and c2: the count of each key is the product of its counts in c1 and c2.

Parameters:
c1 -
c2 -
Returns:
The element-wise product of c1 and c2.

dotProduct

public static <E> double dotProduct(GenericCounter<E> c1,
                                    GenericCounter<E> c2)
Returns the dot product of c1 and c2, that is, the sum over all keys of the product of the key's counts in c1 and c2.

Parameters:
c1 -
c2 -
Returns:
The dot product of c1 and c2.
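The relationship between product (a Counter) and dotProduct (a double) is that the dot product is the sum of the element-wise products. A minimal sketch, assuming a hypothetical Map<E, Double> stand-in for Counter and treating missing keys as count 0; names are illustrative only:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: Map<E, Double> stands in for a Counter; missing keys count as 0.
class ProductSketch {

    // Element-wise product; keys missing from either map would have product 0,
    // so this sketch keeps only keys present in both.
    static <E> Map<E, Double> product(Map<E, Double> c1, Map<E, Double> c2) {
        Map<E, Double> result = new HashMap<>();
        for (Map.Entry<E, Double> e : c1.entrySet()) {
            Double other = c2.get(e.getKey());
            if (other != null) {
                result.put(e.getKey(), e.getValue() * other);
            }
        }
        return result;
    }

    // Dot product: the sum of the element-wise products.
    static <E> double dotProduct(Map<E, Double> c1, Map<E, Double> c2) {
        double sum = 0.0;
        for (Map.Entry<E, Double> e : c1.entrySet()) {
            sum += e.getValue() * c2.getOrDefault(e.getKey(), 0.0);
        }
        return sum;
    }
}
```

For {x=2, y=3} and {x=4, y=5, z=7}, the product is {x=8, y=15} and the dot product is 8 + 15 = 23.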

absoluteDifference

public static <E> Counter<E> absoluteDifference(GenericCounter<E> c1,
                                                GenericCounter<E> c2)
Returns |c1 - c2|.

Parameters:
c1 -
c2 -
Returns:
A Counter whose count for each key is the absolute difference between that key's counts in c1 and c2.
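|c1 - c2| is a per-key operation over the union of the two key sets, not a set difference. A minimal sketch, assuming a hypothetical Map<E, Double> stand-in for Counter with missing keys treated as 0:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: Map<E, Double> stands in for a Counter; missing keys count as 0.
class AbsoluteDifferenceSketch {

    static <E> Map<E, Double> absoluteDifference(Map<E, Double> c1, Map<E, Double> c2) {
        Set<E> keys = new HashSet<>(c1.keySet());
        keys.addAll(c2.keySet());
        Map<E, Double> result = new HashMap<>();
        for (E k : keys) {
            // |c1(k) - c2(k)| for every key that appears in either map
            result.put(k, Math.abs(c1.getOrDefault(k, 0.0) - c2.getOrDefault(k, 0.0)));
        }
        return result;
    }
}
```

For {x=2, y=1} and {y=3, z=4}, the result is {x=2, y=2, z=4}.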

division

public static <E> Counter<E> division(GenericCounter<E> c1,
                                      GenericCounter<E> c2)
Returns c1 divided by c2. Note that this can produce infinite counts for keys that have a non-zero count in c1 but a zero count in c2, and NaN where both counts are zero, following Java floating-point division.

Parameters:
c1 -
c2 -
Returns:
c1 divided by c2.
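Java floating-point division does not throw on a zero divisor: x / 0.0 is ±Infinity for non-zero x, and 0.0 / 0.0 is NaN. The sketch below shows both cases. It assumes a hypothetical Map<E, Double> stand-in for Counter and, as a hypothetical choice, divides only over the keys of c1.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: Map<E, Double> stands in for a Counter; keys missing from c2 count as 0.
class DivisionSketch {

    // Per-key c1(k) / c2(k) over the keys of c1 (an assumption of this sketch).
    static <E> Map<E, Double> division(Map<E, Double> c1, Map<E, Double> c2) {
        Map<E, Double> result = new HashMap<>();
        for (Map.Entry<E, Double> e : c1.entrySet()) {
            // Non-zero / 0.0 gives Infinity; 0.0 / 0.0 gives NaN.
            result.put(e.getKey(), e.getValue() / c2.getOrDefault(e.getKey(), 0.0));
        }
        return result;
    }
}
```

Dividing {x=6, y=1, z=0} by {x=3} yields x=2.0, y=Infinity, and z=NaN.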

entropy

public static <E> double entropy(GenericCounter<E> c)
Calculates the entropy of the given counter (in bits). This method internally uses normalized counts (so they sum to one), but the value returned is meaningless if some of the counts are negative.

Returns:
The entropy of the given counter (in bits)
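Entropy in bits is H = -sum_k p(k) log2 p(k), where p(k) is the count of k divided by the total count. A minimal sketch, assuming a hypothetical Map<E, Double> stand-in for Counter and non-negative counts:

```java
import java.util.Map;

// Hypothetical sketch: Map<E, Double> stands in for a Counter.
class EntropySketch {

    // Entropy in bits of the distribution obtained by normalizing c's counts.
    static <E> double entropy(Map<E, Double> c) {
        double total = 0.0;
        for (double v : c.values()) total += v;
        double h = 0.0;
        for (double v : c.values()) {
            if (v == 0.0) continue;              // 0 * log 0 is taken as 0
            double p = v / total;
            h -= p * Math.log(p) / Math.log(2);  // log2 via change of base
        }
        return h;
    }
}
```

Two equal counts give 1 bit, four equal counts give 2 bits, and a single key gives 0 bits, regardless of the raw totals.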

crossEntropy

public static <E> double crossEntropy(GenericCounter<E> from,
                                      GenericCounter<E> to)
Note that this implementation doesn't normalize the "from" Counter. It does, however, normalize the "to" Counter. Result is meaningless if any of the counts are negative.

Returns:
The cross entropy H(from, to)

crossEntropy

public static <E> double crossEntropy(GenericCounter<E> from,
                                      Counter<E> to)
Note that this implementation doesn't normalize the "from" Counter. Result is meaningless if any of the counts are negative.

Returns:
The cross entropy H(from, to)
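The normalization asymmetry described for the GenericCounter overload (the "to" Counter is normalized, "from" is used as-is) means H(from, to) = -sum_k from(k) log2(to(k) / |to|), so scaling "from" scales the result. A minimal sketch over a hypothetical Map<E, Double> stand-in; the base-2 logarithm is an assumption chosen to match entropy(), since the base is not stated here.

```java
import java.util.Map;

// Hypothetical sketch: Map<E, Double> stands in for a Counter; missing keys count as 0.
class CrossEntropySketch {

    // H(from, to) = -sum_k from(k) * log2(to(k) / |to|).
    // "from" is used as-is (not normalized); "to" is normalized to sum to one.
    static <E> double crossEntropy(Map<E, Double> from, Map<E, Double> to) {
        double toTotal = 0.0;
        for (double v : to.values()) toTotal += v;
        double h = 0.0;
        for (Map.Entry<E, Double> e : from.entrySet()) {
            double p = e.getValue();
            if (p == 0.0) continue;
            double q = to.getOrDefault(e.getKey(), 0.0) / toTotal;
            h -= p * Math.log(q) / Math.log(2);  // q == 0 makes the result +Infinity
        }
        return h;
    }
}
```

With "to" = {a=1, b=1}, a normalized "from" of {a=0.5, b=0.5} gives 1.0, while the unnormalized {a=2, b=2} gives 4.0.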

klDivergence

public static <E> double klDivergence(GenericCounter<E> from,
                                      GenericCounter<E> to)
Calculates the KL divergence between the two counters. That is, it calculates KL(from || to). This method internally uses normalized counts (so they sum to one), but the value returned is meaningless if any of the counts are negative. In other words, it measures how well "from" can be represented by "to". If some key has non-zero probability in "from" but zero probability in "to", the method returns positive infinity.

Parameters:
from -
to -
Returns:
The KL divergence between the distributions
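KL(from || to) = sum_k p(k) log(p(k) / q(k)), where p and q are the normalized "from" and "to" counts, becomes infinite as soon as p(k) > 0 and q(k) = 0. A minimal sketch over a hypothetical Map<E, Double> stand-in for Counter; base-2 logarithms are an assumption of the sketch.

```java
import java.util.Map;

// Hypothetical sketch: Map<E, Double> stands in for a Counter; missing keys count as 0.
class KlDivergenceSketch {

    // KL(from || to) in bits, after normalizing both maps to sum to one.
    static <E> double klDivergence(Map<E, Double> from, Map<E, Double> to) {
        double fromTotal = total(from);
        double toTotal = total(to);
        double kl = 0.0;
        for (Map.Entry<E, Double> e : from.entrySet()) {
            double p = e.getValue() / fromTotal;
            if (p == 0.0) continue;                         // 0 * log(0 / q) is taken as 0
            double q = to.getOrDefault(e.getKey(), 0.0) / toTotal;
            if (q == 0.0) return Double.POSITIVE_INFINITY;  // "from" has mass where "to" has none
            kl += p * Math.log(p / q) / Math.log(2);
        }
        return kl;
    }

    static <E> double total(Map<E, Double> c) {
        double s = 0.0;
        for (double v : c.values()) s += v;
        return s;
    }
}
```

KL({a=1} || {a=1, b=1}) is 1 bit, while KL({a=1, b=1} || {a=1}) is infinite, which shows the divergence is not symmetric.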

jensenShannonDivergence

public static <E> double jensenShannonDivergence(GenericCounter<E> c1,
                                                 GenericCounter<E> c2)
Calculates the Jensen-Shannon divergence between the two counters. That is, it calculates 1/2 [KL(c1 || avg(c1,c2)) + KL(c2 || avg(c1,c2))].

Parameters:
c1 -
c2 -
Returns:
The Jensen-Shannon divergence between the distributions
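Following the formula above literally, avg(c1, c2) can be read as the per-key mean of the raw counts (as average() describes), with each KL term normalizing internally; for already-normalized inputs this is the usual Jensen-Shannon divergence. That reading, the base-2 logarithm, and the Map<E, Double> stand-in for Counter are assumptions of this minimal sketch.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: Map<E, Double> stands in for a Counter; missing keys count as 0.
class JensenShannonSketch {

    // 1/2 [KL(c1 || avg) + KL(c2 || avg)], where avg is the per-key mean of the raw counts.
    static <E> double jensenShannon(Map<E, Double> c1, Map<E, Double> c2) {
        Set<E> keys = new HashSet<>(c1.keySet());
        keys.addAll(c2.keySet());
        Map<E, Double> avg = new HashMap<>();
        for (E k : keys) {
            avg.put(k, (c1.getOrDefault(k, 0.0) + c2.getOrDefault(k, 0.0)) / 2.0);
        }
        return 0.5 * (kl(c1, avg) + kl(c2, avg));
    }

    // KL(from || to) in bits, after normalizing both maps to sum to one.
    static <E> double kl(Map<E, Double> from, Map<E, Double> to) {
        double fromTotal = total(from);
        double toTotal = total(to);
        double sum = 0.0;
        for (Map.Entry<E, Double> e : from.entrySet()) {
            double p = e.getValue() / fromTotal;
            if (p == 0.0) continue;  // 0 * log(0 / q) is taken as 0
            double q = to.getOrDefault(e.getKey(), 0.0) / toTotal;
            sum += p * Math.log(p / q) / Math.log(2);
        }
        return sum;
    }

    static <E> double total(Map<E, Double> c) {
        double s = 0.0;
        for (double v : c.values()) s += v;
        return s;
    }
}
```

Because avg has mass wherever either input does, the result stays finite: two disjoint counters give the maximum of 1 bit.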

skewDivergence

public static <E> double skewDivergence(GenericCounter<E> c1,
                                        GenericCounter<E> c2,
                                        double skew)
Calculates the skew divergence between the two counters. That is, it calculates KL(c1 || (c2*skew + c1*(1-skew))). In other words, it measures how well c1 can be represented by a "smoothed" c2.

Parameters:
c1 -
c2 -
skew - the weight given to c2 in the smoothed distribution
Returns:
The skew divergence between the distributions
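Mixing a little of c1 into c2 keeps the divergence finite whenever skew < 1: at skew = 0 the mixture is c1 itself (divergence 0), and at skew = 1 it reduces to KL(c1 || c2). A minimal sketch of the formula above, assuming raw counts are mixed before KL normalizes them, base-2 logarithms, and a hypothetical Map<E, Double> stand-in for Counter:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: Map<E, Double> stands in for a Counter; missing keys count as 0.
class SkewDivergenceSketch {

    // KL(c1 || skew*c2 + (1-skew)*c1), mixing the raw counts as in the formula.
    static <E> double skewDivergence(Map<E, Double> c1, Map<E, Double> c2, double skew) {
        Set<E> keys = new HashSet<>(c1.keySet());
        keys.addAll(c2.keySet());
        Map<E, Double> mixture = new HashMap<>();
        for (E k : keys) {
            mixture.put(k, skew * c2.getOrDefault(k, 0.0) + (1 - skew) * c1.getOrDefault(k, 0.0));
        }
        return kl(c1, mixture);
    }

    // KL(from || to) in bits after normalizing both maps; yields +Infinity when
    // "from" has mass on a key where "to" has none.
    static <E> double kl(Map<E, Double> from, Map<E, Double> to) {
        double fromTotal = total(from);
        double toTotal = total(to);
        double sum = 0.0;
        for (Map.Entry<E, Double> e : from.entrySet()) {
            double p = e.getValue() / fromTotal;
            if (p == 0.0) continue;  // 0 * log(0 / q) is taken as 0
            double q = to.getOrDefault(e.getKey(), 0.0) / toTotal;
            sum += p * Math.log(p / q) / Math.log(2);
        }
        return sum;
    }

    static <E> double total(Map<E, Double> c) {
        double s = 0.0;
        for (double v : c.values()) s += v;
        return s;
    }
}
```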

L2Normalize

public static <E> Counter<E> L2Normalize(GenericCounter<E> c)
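L2-normalizes a counter: each count is divided by the L2 norm of the counter (the square root of the sum of its squared counts).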

Parameters:
c - the GenericCounter to be L2 normalized.

cosine

public static <E> double cosine(GenericCounter<E> c1,
                                GenericCounter<E> c2)
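The cosine method above has no description. Assuming it computes cosine similarity, that is, the dot product of the two L2-normalized counters, a minimal sketch might look like the following; the Map<E, Double> stand-in for Counter and all names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: Map<E, Double> stands in for a Counter; missing keys count as 0.
class CosineSketch {

    // Divide every count by the L2 norm sqrt(sum of squared counts).
    static <E> Map<E, Double> l2Normalize(Map<E, Double> c) {
        double sumSquares = 0.0;
        for (double v : c.values()) sumSquares += v * v;
        double norm = Math.sqrt(sumSquares);
        Map<E, Double> result = new HashMap<>();
        for (Map.Entry<E, Double> e : c.entrySet()) {
            result.put(e.getKey(), e.getValue() / norm);
        }
        return result;
    }

    // Assumed meaning of cosine: dot product of the L2-normalized counters.
    static <E> double cosine(Map<E, Double> c1, Map<E, Double> c2) {
        Map<E, Double> u = l2Normalize(c1);
        Map<E, Double> v = l2Normalize(c2);
        double dot = 0.0;
        for (Map.Entry<E, Double> e : u.entrySet()) {
            dot += e.getValue() * v.getOrDefault(e.getKey(), 0.0);
        }
        return dot;
    }
}
```

L2-normalizing {a=3, b=4} gives {a=0.6, b=0.8}; counters that differ only by a scale factor have cosine 1, and counters with no shared keys have cosine 0.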

average

public static <E> Counter<E> average(GenericCounter<E> c1,
                                     GenericCounter<E> c2)
Returns a new Counter with counts averaged from the two given Counters. The average Counter will contain the union of keys in both source Counters, and each count will be the average of the two source counts for that key, where as usual a missing count in one Counter is treated as count 0.

Returns:
A new counter whose counts are the means of the respective counts in the given counters.

linearCombination

public static <E> Counter<E> linearCombination(GenericCounter<E> c1,
                                               double w1,
                                               GenericCounter<E> c2,
                                               double w2)
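Returns a Counter which is a linear combination of c1 and c2: counts from c1 are weighted by w1 and counts from c2 by w2, so each key's count is w1*c1(key) + w2*c2(key). The result is a weighted average when the weights are non-negative and sum to one.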
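average() can be seen as the special case of linearCombination() with w1 = w2 = 0.5, taken over the union of keys with missing counts treated as 0. A minimal sketch of that relationship, assuming a hypothetical Map<E, Double> stand-in for Counter:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: Map<E, Double> stands in for a Counter; missing keys count as 0.
class LinearCombinationSketch {

    // w1*c1(k) + w2*c2(k) for every key in either map.
    static <E> Map<E, Double> linearCombination(Map<E, Double> c1, double w1,
                                                Map<E, Double> c2, double w2) {
        Set<E> keys = new HashSet<>(c1.keySet());
        keys.addAll(c2.keySet());
        Map<E, Double> result = new HashMap<>();
        for (E k : keys) {
            result.put(k, w1 * c1.getOrDefault(k, 0.0) + w2 * c2.getOrDefault(k, 0.0));
        }
        return result;
    }

    // The per-key mean: equal weights of one half.
    static <E> Map<E, Double> average(Map<E, Double> c1, Map<E, Double> c2) {
        return linearCombination(c1, 0.5, c2, 0.5);
    }
}
```

For {x=2, y=1} and {y=3, z=4}, the average is {x=1, y=2, z=2}; with weights 2 and -1 the result is {x=4, y=-1, z=-4}, which is a linear combination but not a weighted average.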


perturbCounts

public static <E> Counter<E> perturbCounts(GenericCounter<E> c,
                                           Random random,
                                           double p)

createCounterFromList

public static <E> Counter<E> createCounterFromList(List<E> l)

createCounterFromCollection

public static <E> Counter<E> createCounterFromCollection(Collection<E> l)

toSortedList

public static <E> List<E> toSortedList(GenericCounter<E> c)
A List of the keys in c, sorted from highest count to lowest.

Parameters:
c -
Returns:
A List of the keys in c, sorted from highest count to lowest.
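Sorting keys from highest to lowest count only needs a comparator on the counts. A minimal sketch, assuming a hypothetical Map<E, Double> stand-in for Counter; the order among keys with equal counts is not specified above, and this sketch makes no promise about it either.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: Map<E, Double> stands in for a Counter.
class SortedListSketch {

    static <E> List<E> toSortedList(Map<E, Double> c) {
        List<E> keys = new ArrayList<>(c.keySet());
        // Compare b against a so that the highest count comes first.
        keys.sort((a, b) -> Double.compare(c.get(b), c.get(a)));
        return keys;
    }
}
```

For {x=1, y=3, z=2}, the result is [y, z, x].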

toPriorityQueue

public static <E> PriorityQueue toPriorityQueue(GenericCounter<E> c)
Returns a PriorityQueue of the keys in c, where each key's priority is its count.


printCounterComparison

public static <E> void printCounterComparison(GenericCounter<E> a,
                                              GenericCounter<E> b)
Prints a comparison of the counts in a and b; useful for debugging.

Parameters:
a -
b -

printCounterComparison

public static <E> void printCounterComparison(GenericCounter<E> a,
                                              GenericCounter<E> b,
                                              PrintStream out)
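Prints a comparison of the counts in a and b to out; useful for debugging.

Parameters:
a -
b -
out - the PrintStream to print the comparison to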

getCountCounts

public static <E> Counter<Double> getCountCounts(GenericCounter<E> c)

scale

public static <E> Counter<E> scale(GenericCounter<E> c,
                                   double s)
Returns a new Counter in which every count of c is multiplied by the scale factor s.


printCounterSortedByKeys

public static <E> void printCounterSortedByKeys(GenericCounter<E> c)

loadCounter

public static <E> Counter<E> loadCounter(String filename,
                                         Class c)
                              throws Exception
Loads a Counter from a text file. The file must contain one key/count pair per line, separated by whitespace.

Parameters:
filename - the path to the file to load the Counter from
c - the Class used to instantiate each key. It must have a constructor that takes a single String.
Returns:
The counter loaded from the file.
Throws:
Exception

loadIntCounter

public static IntCounter loadIntCounter(String filename,
                                        Class c)
                                 throws Exception
Loads an IntCounter from a text file. The file must contain one key/count pair per line, separated by whitespace.

Parameters:
filename - the path to the file to load the IntCounter from
c - the Class used to instantiate each key. It must have a constructor that takes a single String.
Returns:
The IntCounter loaded from the file.
Throws:
Exception

saveCounter

public static <E> void saveCounter(GenericCounter<E> c,
                                   String filename)
                        throws IOException
Saves a Counter to a text file. The Counter is written as one key/count pair per line, separated by whitespace.

Parameters:
c -
filename -
Throws:
IOException
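The text format shared by loadCounter, loadIntCounter, and saveCounter (one whitespace-separated key/count pair per line) can be illustrated as a round trip. This minimal sketch makes several assumptions: it reads from a Reader and writes to a Writer instead of taking a filename, uses a hypothetical Map<E, Double> in place of a Counter, and builds each key reflectively through the String constructor that the c parameter is documented to require. Keys containing whitespace would not survive this format.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.Reader;
import java.io.Writer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the key/count text format; not the library's implementation.
class CounterTextFormatSketch {

    // Writes one "key<TAB>count" pair per line.
    static <E> void save(Map<E, Double> c, Writer out) {
        PrintWriter pw = new PrintWriter(out);
        for (Map.Entry<E, Double> e : c.entrySet()) {
            pw.println(e.getKey() + "\t" + e.getValue());
        }
        pw.flush();
    }

    // Reads whitespace-separated "key count" lines, skipping blank lines.
    // Each key is created with keyClass's single-String constructor.
    static <E> Map<E, Double> load(Reader in, Class<E> keyClass)
            throws IOException, ReflectiveOperationException {
        Map<E, Double> result = new HashMap<>();
        BufferedReader br = new BufferedReader(in);
        String line;
        while ((line = br.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) continue;
            String[] parts = line.split("\\s+");
            if (parts.length != 2) throw new IOException("Malformed line: " + line);
            E key = keyClass.getConstructor(String.class).newInstance(parts[0]);
            result.put(key, Double.parseDouble(parts[1]));
        }
        return result;
    }
}
```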

serializeCounter

public static void serializeCounter(GenericCounter c,
                                    String filename)
                             throws IOException
Throws:
IOException

deserializeCounter

public static Counter deserializeCounter(String filename)
                                  throws Exception
Throws:
Exception

sortedKeys

public static <E> List<E> sortedKeys(Counter<E> x)

toBiggestValuesFirstString

public static <E> String toBiggestValuesFirstString(Counter<E> c)

toBiggestValuesFirstString

public static <E> String toBiggestValuesFirstString(Counter<E> c,
                                                    int k)

toVerticalString

public static String toVerticalString(Counter c)

toVerticalString

public static String toVerticalString(Counter c,
                                      int k)

restrictedArgMax

public static <E> Object restrictedArgMax(Counter<E> c,
                                          Collection<E> restriction)
Parameters:
c -
restriction -
Returns:
The key in c with the highest count among the keys in the restriction Collection
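Reading "maximum element" as the key with the highest count, a restricted argmax only scans the candidate keys. A minimal sketch, assuming a hypothetical Map<E, Double> stand-in for Counter; skipping keys absent from c and returning null when no candidate is present are hypothetical choices, not documented behavior.

```java
import java.util.Collection;
import java.util.Map;

// Hypothetical sketch: Map<E, Double> stands in for a Counter.
class RestrictedArgMaxSketch {

    // Highest-count key among the candidates in `restriction`; null if none occur in c.
    static <E> E restrictedArgMax(Map<E, Double> c, Collection<E> restriction) {
        E best = null;
        double bestCount = Double.NEGATIVE_INFINITY;
        for (E key : restriction) {
            Double count = c.get(key);
            if (count != null && count > bestCount) {
                best = key;
                bestCount = count;
            }
        }
        return best;
    }
}
```

For {x=5, y=3, z=4} restricted to [y, z], the result is z even though x has the highest count overall.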

toCounter

public static <T> Counter<T> toCounter(double[] counts,
                                       Index<T> index)

scale

public static <T1,T2> TwoDimensionalCounter<T1,T2> scale(TwoDimensionalCounter<T1,T2> c,
                                                         double d)
Creates a new TwoDimensionalCounter where all the counts are scaled by d. Internally uses Counters.scale().

Parameters:
c -
d -
Returns:
The TwoDimensionalCounter

sample

public static <T> T sample(Counter<T> c,
                           Random rand)
Draws a random key from c, treating its counts as probabilities. Assumes c is normalized (its counts sum to one).

Parameters:
c -
rand -
Returns:
A sample from c
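One common way to sample from a normalized distribution is cumulative-sum (inverse CDF) sampling: draw a uniform number in [0, 1) and walk the keys until the running total exceeds it. This minimal sketch uses that technique over a hypothetical Map<T, Double> stand-in for Counter; it is not necessarily how the library samples.

```java
import java.util.Map;
import java.util.Random;

// Hypothetical sketch: Map<T, Double> stands in for a normalized Counter.
class SampleSketch {

    // Returns each key with probability equal to its count, assuming counts sum to one.
    static <T> T sample(Map<T, Double> c, Random rand) {
        double r = rand.nextDouble();  // uniform in [0, 1)
        double cumulative = 0.0;
        T last = null;
        for (Map.Entry<T, Double> e : c.entrySet()) {
            cumulative += e.getValue();
            last = e.getKey();
            if (r < cumulative) return e.getKey();
        }
        // Rounding can leave the cumulative sum slightly below 1; fall back to the last key.
        return last;
    }
}
```

Keys with a count of 0 are never returned, and over many draws from {a=0.25, b=0.75} roughly a quarter of the samples are a.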

pow

public static <T> Counter<T> pow(Counter<T> c,
                                 double temp)

exp

public static <T> Counter<T> exp(Counter<T> c)


Stanford NLP Group