edu.stanford.nlp.stats
Class Distribution<E>

java.lang.Object
  extended by edu.stanford.nlp.stats.Distribution<E>
All Implemented Interfaces:
ProbabilityDistribution<E>, Sampler<E>, java.io.Serializable

public class Distribution<E>
extends java.lang.Object
implements Sampler<E>, ProbabilityDistribution<E>

Immutable class for representing normalized, smoothed discrete distributions from Counters. Smoothed counters reserve probability mass for unseen items, so queries for the probability of unseen items will return a small positive amount. Normalization is L1 normalization: totalCount() should always return 1.

A Counter passed into a constructor is copied. This class is Serializable.
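
The following is a minimal sketch of how reserved mass could translate into a small positive probability for an unseen item. It is not taken from the Stanford NLP source: the class name ReservedMassSketch, the use of a plain java.util.Map in place of a Counter, and the even split of the reserved mass over the unseen keys are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; not the library implementation.
class ReservedMassSketch {
    // Probability of key, ASSUMING the reserved mass is shared evenly by the
    // (numberOfKeys - seen.size()) keys that were never observed.
    static double probabilityOf(Map<String, Double> seen, int numberOfKeys,
                                double reservedMass, String key) {
        Double p = seen.get(key);
        if (p != null) {
            return p;
        }
        int unseen = numberOfKeys - seen.size();
        return unseen <= 0 ? 0.0 : reservedMass / unseen;
    }

    public static void main(String[] args) {
        Map<String, Double> seen = new HashMap<>();
        seen.put("a", 0.5);
        seen.put("b", 0.25);
        // 0.25 of the mass is reserved for the 5 keys that were never seen.
        System.out.println(probabilityOf(seen, 7, 0.25, "z"));
    }
}
```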

Author:
Galen Andrew (galand@cs.stanford.edu), Sebastian Pado
See Also:
Serialized Form

Field Summary
protected  Counter<E> counter
           
 
Method Summary
static
<E> Distribution<E>
absolutelyDiscountedDistribution(Counter<E> counter, int numberOfKeys, double discount)
           
 void addToKeySet(E o)
          Ensures that the object is in the key set (possibly with a zero value).
 E argmax()
           
 boolean containsKey(E key)
           
static
<E> Distribution<E>
distributionFromLogisticCounter(Counter<E> cntr)
          Maps a counter representing the linear weights of a multiclass logistic regression model to the probabilities of each class.
static
<E> Distribution<E>
distributionWithDirichletPrior(Counter<E> c, Distribution<E> prior, double weight)
          Returns a Distribution that uses prior as a Dirichlet prior weighted by weight.
 E drawSample()
          Exactly the same as sampleFrom(), needed for the Sampler interface.
 E drawSample(java.util.Random random)
          Draws a sample using a caller-supplied random number generator.
static
<E> Distribution<E>
dynamicCounterWithDirichletPrior(Counter<E> c, Distribution<E> prior, double weight)
          Like distributionWithDirichletPrior, except that probabilities are computed dynamically from the counter and prior instead of all at once up front.
 boolean equals(Distribution<E> distribution)
           
 boolean equals(java.lang.Object o)
           
 double getCount(E key)
          Returns the current count for the given key, which is 0 if it hasn't been seen before.
 Counter<E> getCounter()
           
static
<E> Distribution<E>
getDistribution(Counter<E> counter)
          Creates a Distribution from the given counter.
static
<E> Distribution<E>
getDistributionFromLogValues(Counter<E> counter)
          Creates a Distribution from the given counter of log values, i.e. makes an internal copy of the counter, exponentiates each value, and divides by the resulting total.
static
<E> Distribution<E>
getDistributionFromPartiallySpecifiedCounter(Counter<E> c, int numKeys)
          Assuming that c has a total count < 1, returns a new Distribution using the counts in c as probabilities.
static
<E> Distribution<E>
getDistributionWithReservedMass(Counter<E> counter, double reservedMass)
           
 int getNumberOfKeys()
           
static
<E> Distribution<E>
getPerturbedDistribution(Counter<E> wordCounter, java.util.Random r)
           
static
<E> Distribution<E>
getPerturbedUniformDistribution(java.util.Set<E> s, java.util.Random r)
           
 double getReservedMass()
           
static
<E> Distribution<E>
getUniformDistribution(java.util.Set<E> s)
           
static
<E> Distribution<E>
goodTuringSmoothedCounter(Counter<E> counter, int numberOfKeys)
          Creates a Good-Turing smoothed Distribution from the given counter.
static
<E> Distribution<E>
goodTuringWithExplicitUnknown(Counter<E> counter, E UNK)
          Creates a Good-Turing smoothed Distribution from the given counter without creating any reserved mass; instead, the special object UNK in the counter is assumed to hold the count of "UNSEEN" items.
 int hashCode()
           
 java.util.Set<E> keySet()
           
static
<E> Distribution<E>
laplaceSmoothedDistribution(Counter<E> counter, int numberOfKeys)
          Creates a Laplace smoothed Distribution from the given counter, i.e. adds one count to every item, including unseen ones, and divides by the total count.
static
<E> Distribution<E>
laplaceSmoothedDistribution(Counter<E> counter, int numberOfKeys, double lambda)
          Creates a smoothed Distribution using Lidstone's law, i.e. adds lambda (typically between 0 and 1) to every item, including unseen ones, and divides by the total count.
static
<E> Distribution<E>
laplaceWithExplicitUnknown(Counter<E> counter, double lambda, E UNK)
          Creates a smoothed Distribution with Laplace smoothing, but assumes an explicit count of "UNKNOWN" items.
 double logProbabilityOf(E key)
          Returns the natural logarithm of the object's probability.
static void main(java.lang.String[] args)
          For internal testing purposes only.
 double probabilityOf(E key)
          Returns the normalized count of the given object.
 E sampleFrom()
          Returns an object sampled from the distribution using Math.random().
 E sampleFrom(java.util.Random random)
          Returns an object sampled from the distribution using a caller-supplied random number generator.
static
<E> Distribution<E>
simpleGoodTuring(Counter<E> counter, int numberOfKeys)
          Creates a Distribution from the given counter using Gale & Sampson's "simple Good-Turing" smoothing.
 java.lang.String toString()
           
 java.lang.String toString(java.text.NumberFormat nf)
           
 double totalCount()
           
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

counter

protected Counter<E> counter
Method Detail

getCounter

public Counter<E> getCounter()

drawSample

public E drawSample()
Exactly the same as sampleFrom(), needed for the Sampler interface.

Specified by:
drawSample in interface Sampler<E>
Returns:
labels (of type E) drawn from the underlying distribution for the observation this Sampler was created for.

drawSample

public E drawSample(java.util.Random random)
Draws a sample using a caller-supplied random number generator. Needed for the ProbabilityDistribution interface.

Specified by:
drawSample in interface ProbabilityDistribution<E>

toString

public java.lang.String toString(java.text.NumberFormat nf)

getReservedMass

public double getReservedMass()

getNumberOfKeys

public int getNumberOfKeys()

keySet

public java.util.Set<E> keySet()

containsKey

public boolean containsKey(E key)

getCount

public double getCount(E key)
Returns the current count for the given key, which is 0 if it hasn't been seen before. This is a convenient version of get that casts and extracts the primitive value.

Parameters:
key - The key to look up.
Returns:
The current count for the given key, which is 0 if it hasn't been seen before

getDistributionFromPartiallySpecifiedCounter

public static <E> Distribution<E> getDistributionFromPartiallySpecifiedCounter(Counter<E> c,
                                                                               int numKeys)
Assuming that c has a total count < 1, returns a new Distribution using the counts in c as probabilities. If c has a total count > 1, returns a normalized distribution with no remaining mass.
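
The two cases described above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the library code: PartialCounterSketch is a hypothetical name, a java.util.Map stands in for the Counter, and whatever mass is left below 1 is assumed to become reserved mass for keys missing from c.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; not the library implementation.
class PartialCounterSketch {
    static double total(Map<String, Double> c) {
        double sum = 0.0;
        for (double v : c.values()) {
            sum += v;
        }
        return sum;
    }

    // Mass left over for keys missing from c; zero once the total reaches 1.
    static double reservedMass(Map<String, Double> c) {
        return Math.max(0.0, 1.0 - total(c));
    }

    // Counts are used directly as probabilities when the total is at most 1,
    // and are L1-normalized when the total exceeds 1.
    static Map<String, Double> probabilities(Map<String, Double> c) {
        double total = total(c);
        double divisor = total > 1.0 ? total : 1.0;
        Map<String, Double> out = new HashMap<>();
        for (Map.Entry<String, Double> e : c.entrySet()) {
            out.put(e.getKey(), e.getValue() / divisor);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Double> c = new HashMap<>();
        c.put("a", 0.2);
        c.put("b", 0.3);
        System.out.println(probabilities(c) + " reserved=" + reservedMass(c));
    }
}
```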


getUniformDistribution

public static <E> Distribution<E> getUniformDistribution(java.util.Set<E> s)
Parameters:
s - a Set of keys.

getPerturbedUniformDistribution

public static <E> Distribution<E> getPerturbedUniformDistribution(java.util.Set<E> s,
                                                                  java.util.Random r)
Parameters:
s - a Set of keys.

getPerturbedDistribution

public static <E> Distribution<E> getPerturbedDistribution(Counter<E> wordCounter,
                                                           java.util.Random r)

getDistribution

public static <E> Distribution<E> getDistribution(Counter<E> counter)
Creates a Distribution from the given counter. It makes an internal copy of the counter and divides all counts by the total count.

Returns:
a new Distribution
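
As a rough illustration of the copy-and-divide step described above, the sketch below L1-normalizes a map of counts into a new map. NormalizeSketch is a hypothetical name and java.util.Map stands in for Counter; the sketch assumes the total count is positive.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; not the library implementation.
class NormalizeSketch {
    // Returns a new map of count / total; the input map is left untouched.
    // Assumes the total count is positive.
    static Map<String, Double> normalize(Map<String, Double> counts) {
        double total = 0.0;
        for (double v : counts.values()) {
            total += v;
        }
        Map<String, Double> probs = new HashMap<>();
        for (Map.Entry<String, Double> e : counts.entrySet()) {
            probs.put(e.getKey(), e.getValue() / total);
        }
        return probs;
    }

    public static void main(String[] args) {
        Map<String, Double> counts = new HashMap<>();
        counts.put("a", 3.0);
        counts.put("b", 1.0);
        System.out.println(normalize(counts));
    }
}
```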

getDistributionWithReservedMass

public static <E> Distribution<E> getDistributionWithReservedMass(Counter<E> counter,
                                                                  double reservedMass)

getDistributionFromLogValues

public static <E> Distribution<E> getDistributionFromLogValues(Counter<E> counter)
Creates a Distribution from the given counter of log values, i.e. makes an internal copy of the counter, exponentiates each value, and divides by the resulting total.

Returns:
a new Distribution

absolutelyDiscountedDistribution

public static <E> Distribution<E> absolutelyDiscountedDistribution(Counter<E> counter,
                                                                   int numberOfKeys,
                                                                   double discount)

laplaceSmoothedDistribution

public static <E> Distribution<E> laplaceSmoothedDistribution(Counter<E> counter,
                                                              int numberOfKeys)
Creates a Laplace smoothed Distribution from the given counter, i.e. adds one count to every item, including unseen ones, and divides by the total count.

Returns:
a new add-1 smoothed Distribution

laplaceSmoothedDistribution

public static <E> Distribution<E> laplaceSmoothedDistribution(Counter<E> counter,
                                                              int numberOfKeys,
                                                              double lambda)
Creates a smoothed Distribution using Lidstone's law, i.e. adds lambda (typically between 0 and 1) to every item, including unseen ones, and divides by the total count.

Returns:
a new Lidstone smoothed Distribution
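
Lidstone's law as described above gives each key the probability (count + lambda) / (total + lambda * numberOfKeys); with lambda = 1 this reduces to add-one (Laplace) smoothing. The sketch below is one way to write that formula down. LidstoneSketch is a hypothetical name, and a java.util.Map stands in for the Counter.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; not the library implementation.
class LidstoneSketch {
    // (count + lambda) / (total + lambda * numberOfKeys), where numberOfKeys
    // counts every possible key, seen or unseen.
    static double probability(Map<String, Double> counts, int numberOfKeys,
                              double lambda, String key) {
        double total = 0.0;
        for (double v : counts.values()) {
            total += v;
        }
        double count = counts.getOrDefault(key, 0.0);
        return (count + lambda) / (total + lambda * numberOfKeys);
    }

    public static void main(String[] args) {
        Map<String, Double> counts = new HashMap<>();
        counts.put("a", 3.0);
        counts.put("b", 1.0);
        // Four possible keys, two of them never observed.
        System.out.println(probability(counts, 4, 1.0, "a"));
        System.out.println(probability(counts, 4, 1.0, "unseen"));
    }
}
```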

laplaceWithExplicitUnknown

public static <E> Distribution<E> laplaceWithExplicitUnknown(Counter<E> counter,
                                                             double lambda,
                                                             E UNK)
Creates a smoothed Distribution with Laplace smoothing, but assumes an explicit count of "UNKNOWN" items. Thus anything not in the original counter will have probability zero.

Parameters:
counter - the counter to normalize
lambda - the value to add to each count
UNK - the UNKNOWN symbol
Returns:
a new Laplace-smoothed distribution
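
A minimal sketch of the explicit-unknown variant, under assumptions: lambda is added only to keys present in the counter (including UNK), the result is normalized over those keys, and any other key gets probability zero. ExplicitUnknownSketch is a hypothetical name, java.util.Map stands in for Counter, and giving UNK an entry even when it was never counted is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; not the library implementation.
class ExplicitUnknownSketch {
    static Map<String, Double> smooth(Map<String, Double> counts,
                                      double lambda, String unk) {
        Map<String, Double> smoothed = new HashMap<>(counts);
        // ASSUMPTION: UNK always receives an entry, even with zero count.
        smoothed.putIfAbsent(unk, 0.0);
        double total = 0.0;
        for (double v : smoothed.values()) {
            total += v + lambda;
        }
        for (Map.Entry<String, Double> e : smoothed.entrySet()) {
            e.setValue((e.getValue() + lambda) / total);
        }
        return smoothed;
    }

    // Keys outside the smoothed map have probability zero; no mass is reserved.
    static double probabilityOf(Map<String, Double> smoothed, String key) {
        return smoothed.getOrDefault(key, 0.0);
    }

    public static void main(String[] args) {
        Map<String, Double> counts = new HashMap<>();
        counts.put("a", 2.0);
        counts.put("UNK", 1.0);
        System.out.println(smooth(counts, 1.0, "UNK"));
    }
}
```

With this design, a caller would map any out-of-vocabulary item to UNK before querying.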

goodTuringSmoothedCounter

public static <E> Distribution<E> goodTuringSmoothedCounter(Counter<E> counter,
                                                            int numberOfKeys)
Creates a Good-Turing smoothed Distribution from the given counter.

Returns:
a new Good-Turing smoothed Distribution.
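
For orientation, the sketch below shows the two textbook Good-Turing quantities: the mass set aside for unseen items, N1 / N, and the adjusted count r* = (r + 1) * N(r+1) / N(r), where N(r) is the number of distinct keys seen exactly r times. It illustrates the general technique only and makes no claim about how this method handles gaps or zero values of N(r). GoodTuringSketch is a hypothetical name, and integer counts in a java.util.Map stand in for the Counter.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch of basic Good-Turing; not the library implementation.
class GoodTuringSketch {
    // Maps r to N(r), the number of distinct keys observed exactly r times.
    static Map<Integer, Integer> countOfCounts(Map<String, Integer> counts) {
        Map<Integer, Integer> n = new TreeMap<>();
        for (int r : counts.values()) {
            n.merge(r, 1, Integer::sum);
        }
        return n;
    }

    // Estimated total probability of all unseen items: N(1) / N.
    static double unseenMass(Map<String, Integer> counts) {
        int total = 0;
        for (int r : counts.values()) {
            total += r;
        }
        int n1 = countOfCounts(counts).getOrDefault(1, 0);
        return (double) n1 / total;
    }

    // Adjusted count r* = (r + 1) * N(r + 1) / N(r).
    static double adjustedCount(Map<String, Integer> counts, int r) {
        Map<Integer, Integer> n = countOfCounts(counts);
        Integer nr = n.get(r);
        if (nr == null) {
            throw new IllegalArgumentException("no key has count " + r);
        }
        int nNext = n.getOrDefault(r + 1, 0);
        return (r + 1) * (double) nNext / nr;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("a", 1);
        counts.put("b", 1);
        counts.put("c", 2);
        counts.put("d", 3);
        System.out.println(unseenMass(counts));
        System.out.println(adjustedCount(counts, 1));
    }
}
```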

goodTuringWithExplicitUnknown

public static <E> Distribution<E> goodTuringWithExplicitUnknown(Counter<E> counter,
                                                                E UNK)
Creates a Good-Turing smoothed Distribution from the given counter without creating any reserved mass; instead, the special object UNK in the counter is assumed to hold the count of "UNSEEN" items. The probability of objects not in the original counter will be zero.

Parameters:
counter - the counter
UNK - the unknown symbol
Returns:
a good-turing smoothed distribution

simpleGoodTuring

public static <E> Distribution<E> simpleGoodTuring(Counter<E> counter,
                                                   int numberOfKeys)
Creates a Distribution from the given counter using Gale & Sampson's "simple Good-Turing" smoothing.

Returns:
a new simple Good-Turing smoothed Distribution.

distributionWithDirichletPrior

public static <E> Distribution<E> distributionWithDirichletPrior(Counter<E> c,
                                                                 Distribution<E> prior,
                                                                 double weight)
Returns a Distribution that uses prior as a Dirichlet prior weighted by weight. Essentially adds "pseudo-counts" for each Object in prior equal to that Object's mass in prior times weight, then normalizes.

WARNING: If an unseen item is encountered in c, the total may not be 1.

NOTE: This will not work if prior is a DynamicDistribution. To fix this, you could add a CounterView to Distribution and use that in the linearCombination call in the implementation.

Parameters:
weight - multiplier of prior to get "pseudo-count"
Returns:
new Distribution
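
The pseudo-count idea described above can be sketched as follows: each key receives count + weight * prior(key), and the result is normalized. This is an illustration, not the library code. DirichletPriorSketch is a hypothetical name, java.util.Map stands in for both the Counter and the prior Distribution, and the prior is assumed to sum to 1.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; not the library implementation.
class DirichletPriorSketch {
    // Adds weight * prior(key) as a pseudo-count to each key, then normalizes.
    static Map<String, Double> smooth(Map<String, Double> counts,
                                      Map<String, Double> prior, double weight) {
        Map<String, Double> pseudo = new HashMap<>(counts);
        for (Map.Entry<String, Double> e : prior.entrySet()) {
            pseudo.merge(e.getKey(), weight * e.getValue(), Double::sum);
        }
        double total = 0.0;
        for (double v : pseudo.values()) {
            total += v;
        }
        for (Map.Entry<String, Double> e : pseudo.entrySet()) {
            e.setValue(e.getValue() / total);
        }
        return pseudo;
    }

    public static void main(String[] args) {
        Map<String, Double> counts = new HashMap<>();
        counts.put("a", 3.0);
        counts.put("b", 1.0);
        Map<String, Double> prior = new HashMap<>();
        prior.put("a", 0.5);
        prior.put("b", 0.25);
        prior.put("c", 0.25);
        System.out.println(smooth(counts, prior, 4.0));
    }
}
```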

dynamicCounterWithDirichletPrior

public static <E> Distribution<E> dynamicCounterWithDirichletPrior(Counter<E> c,
                                                                   Distribution<E> prior,
                                                                   double weight)
Like distributionWithDirichletPrior, except that probabilities are computed dynamically from the counter and prior instead of all at once up front. The main advantage arises when you are making many distributions from relatively sparse counters using the same relatively dense prior: the prior is represented only once, which can save a large amount of memory.

Parameters:
weight - multiplier of prior to get "pseudo-count"
Returns:
new Distribution
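
To illustrate the memory argument above, the sketch below keeps a reference to a shared prior and computes each probability on demand instead of materializing a smoothed table per distribution. It is a sketch under assumptions, not the library code: DynamicDirichletSketch is a hypothetical name, java.util.Map stands in for Counter and Distribution, and the prior is assumed to sum to 1.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; not the library implementation.
class DynamicDirichletSketch {
    private final Map<String, Double> counts;
    private final Map<String, Double> prior; // shared reference, never copied
    private final double weight;
    private final double total;

    DynamicDirichletSketch(Map<String, Double> counts,
                           Map<String, Double> prior, double weight) {
        this.counts = new HashMap<>(counts); // the sparse counts are copied
        this.prior = prior;
        this.weight = weight;
        double sum = 0.0;
        for (double v : counts.values()) {
            sum += v;
        }
        this.total = sum;
    }

    // (count + weight * prior(key)) / (total + weight), computed per query.
    double probabilityOf(String key) {
        double count = counts.getOrDefault(key, 0.0);
        double priorMass = prior.getOrDefault(key, 0.0);
        return (count + weight * priorMass) / (total + weight);
    }

    public static void main(String[] args) {
        Map<String, Double> prior = new HashMap<>();
        prior.put("a", 0.5);
        prior.put("b", 0.25);
        prior.put("c", 0.25);
        Map<String, Double> counts = new HashMap<>();
        counts.put("a", 3.0);
        counts.put("b", 1.0);
        DynamicDirichletSketch d = new DynamicDirichletSketch(counts, prior, 4.0);
        System.out.println(d.probabilityOf("c"));
    }
}
```

Many such objects can point at the same prior map, so only the sparse per-distribution counts are stored repeatedly.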

distributionFromLogisticCounter

public static <E> Distribution<E> distributionFromLogisticCounter(Counter<E> cntr)
Maps a counter representing the linear weights of a multiclass logistic regression model to the probabilities of each class.
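
Mapping linear weights to class probabilities in multiclass logistic regression is conventionally done with the softmax function, exp(w_k) / sum_j exp(w_j). The sketch below shows that computation. SoftmaxSketch is a hypothetical name, java.util.Map stands in for Counter, and subtracting the maximum weight before exponentiating is a numerical-stability choice of this sketch, not something the documentation above states.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; not the library implementation.
class SoftmaxSketch {
    static Map<String, Double> softmax(Map<String, Double> weights) {
        // Shift by the maximum so the largest exponent is exp(0) = 1.
        double max = Double.NEGATIVE_INFINITY;
        for (double w : weights.values()) {
            max = Math.max(max, w);
        }
        Map<String, Double> probs = new HashMap<>();
        double total = 0.0;
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            double x = Math.exp(e.getValue() - max);
            probs.put(e.getKey(), x);
            total += x;
        }
        for (Map.Entry<String, Double> e : probs.entrySet()) {
            e.setValue(e.getValue() / total);
        }
        return probs;
    }

    public static void main(String[] args) {
        Map<String, Double> weights = new HashMap<>();
        weights.put("pos", Math.log(3.0));
        weights.put("neg", 0.0);
        System.out.println(softmax(weights));
    }
}
```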


sampleFrom

public E sampleFrom()
Returns an object sampled from the distribution using Math.random(). There may be a faster way to do this if you need to...

Returns:
a sampled object

sampleFrom

public E sampleFrom(java.util.Random random)
Returns an object sampled from the distribution using a caller-supplied random number generator.

Returns:
a sampled object
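
One common way to sample from a discrete distribution with a supplied java.util.Random is to draw a uniform number in [0, 1) and walk the cumulative probabilities until they exceed it. The sketch below shows that technique; whether this class samples the same way is not stated in this documentation. SamplingSketch is a hypothetical name, and java.util.Map stands in for the distribution.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

// Illustrative sketch of cumulative-sum sampling; not the library implementation.
class SamplingSketch {
    // Assumes probs is non-empty and its values sum to (approximately) 1.
    static String sampleFrom(Map<String, Double> probs, Random random) {
        double u = random.nextDouble(); // uniform in [0, 1)
        double cumulative = 0.0;
        String last = null;
        for (Map.Entry<String, Double> e : probs.entrySet()) {
            cumulative += e.getValue();
            last = e.getKey();
            if (u < cumulative) {
                return last;
            }
        }
        // Reached only if rounding leaves the cumulative sum slightly below u.
        return last;
    }

    public static void main(String[] args) {
        Map<String, Double> probs = new LinkedHashMap<>();
        probs.put("a", 0.75);
        probs.put("b", 0.25);
        // A fixed seed makes the sequence of draws reproducible.
        Random random = new Random(42L);
        System.out.println(sampleFrom(probs, random));
    }
}
```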

probabilityOf

public double probabilityOf(E key)
Returns the normalized count of the given object.

Specified by:
probabilityOf in interface ProbabilityDistribution<E>
Returns:
the normalized count of the object

logProbabilityOf

public double logProbabilityOf(E key)
Returns the natural logarithm of the object's probability.

Specified by:
logProbabilityOf in interface ProbabilityDistribution<E>
Returns:
the natural logarithm of the normalized count (not a finite value if the probability is 0.0)

argmax

public E argmax()

totalCount

public double totalCount()

addToKeySet

public void addToKeySet(E o)
Ensures that the object is in the key set (possibly with a zero value).

Parameters:
o - object to put in keyset

equals

public boolean equals(java.lang.Object o)
Overrides:
equals in class java.lang.Object

equals

public boolean equals(Distribution<E> distribution)

hashCode

public int hashCode()
Overrides:
hashCode in class java.lang.Object

toString

public java.lang.String toString()
Overrides:
toString in class java.lang.Object

main

public static void main(java.lang.String[] args)
For internal testing purposes only.



Stanford NLP Group