public class Distribution<E> extends Object implements Sampler<E>, ProbabilityDistribution<E>
Represents normalized, smoothed discrete distributions built from Counters.
Smoothed counters reserve probability mass for unseen items, so queries for
the probability of an unseen item return a small positive amount.
Normalization is L1 normalization: totalCount() should always return 1.
A Counter passed into a constructor is copied. This class is Serializable.

Modifier and Type | Method and Description |
---|---|
static <E> Distribution<E> | absolutelyDiscountedDistribution(Counter<E> counter, int numberOfKeys, double discount) |
void | addToKeySet(E o): Ensures that the object is in the key set (possibly with a zero value). |
E | argmax() |
boolean | containsKey(E key) |
static <E> Distribution<E> | distributionFromLogisticCounter(Counter<E> cntr): Maps a counter representing the linear weights of a multiclass logistic regression model to the probabilities of each class. |
static <E> Distribution<E> | distributionWithDirichletPrior(Counter<E> c, Distribution<E> prior, double weight): Returns a Distribution that uses prior as a Dirichlet prior weighted by weight. |
E | drawSample(): Exactly the same as sampleFrom(); needed for the Sampler interface. |
E | drawSample(Random random): Draws a sample using a caller-supplied random number generator. |
static <E> Distribution<E> | dynamicCounterWithDirichletPrior(Counter<E> c, Distribution<E> prior, double weight): Like normalizedCounterWithDirichletPrior, except that probabilities are computed dynamically from the counter and prior instead of all at once up front. |
boolean | equals(Distribution<E> distribution) |
boolean | equals(Object o) |
double | getCount(E key): Returns the current count for the given key, which is 0 if it hasn't been seen before. |
Counter<E> | getCounter() |
static <E> Distribution<E> | getDistribution(Counter<E> counter): Creates a Distribution from the given counter. |
static <E> Distribution<E> | getDistributionFromLogValues(Counter<E> counter): Creates a Distribution from the given counter, i.e., makes an internal copy of the counter and divides all counts by the total count. |
static <E> Distribution<E> | getDistributionFromPartiallySpecifiedCounter(Counter<E> c, int numKeys): Assuming that c has a total count < 1, returns a new Distribution using the counts in c as probabilities. |
static <E> Distribution<E> | getDistributionWithReservedMass(Counter<E> counter, double reservedMass) |
int | getNumberOfKeys() |
static <E> Distribution<E> | getPerturbedDistribution(Counter<E> wordCounter, Random r) |
static <E> Distribution<E> | getPerturbedUniformDistribution(Collection<E> s, Random r) |
double | getReservedMass() |
static <E> Distribution<E> | getUniformDistribution(Collection<E> s) |
static <E> Distribution<E> | goodTuringSmoothedCounter(Counter<E> counter, int numberOfKeys): Creates a Good-Turing smoothed Distribution from the given counter. |
static <E> Distribution<E> | goodTuringWithExplicitUnknown(Counter<E> counter, E UNK): Creates a Good-Turing smoothed Distribution from the given counter without creating any reserved mass; instead, the special object UNK in the counter is assumed to be the count of "UNSEEN" items. |
int | hashCode() |
Set<E> | keySet() |
static <E> Distribution<E> | laplaceSmoothedDistribution(Counter<E> counter, int numberOfKeys): Creates a Laplace smoothed Distribution from the given counter, i.e., adds one count to every item, including unseen ones, and divides by the total count. |
static <E> Distribution<E> | laplaceSmoothedDistribution(Counter<E> counter, int numberOfKeys, double lambda): Creates a smoothed Distribution using Lidstone's law, i.e., adds lambda (typically between 0 and 1) to every item, including unseen ones, and divides by the total count. |
static <E> Distribution<E> | laplaceWithExplicitUnknown(Counter<E> counter, double lambda, E UNK): Creates a smoothed Distribution with Laplace smoothing, but assumes an explicit count of "UNKNOWN" items. |
double | logProbabilityOf(E key): Returns the natural logarithm of the object's probability. |
static void | main(String[] args): For internal testing purposes only. |
double | probabilityOf(E key): Returns the normalized count of the given object. |
E | sampleFrom(): Returns an object sampled from the distribution using Math.random(). |
E | sampleFrom(Random random): Returns an object sampled from the distribution using a caller-supplied random number generator. |
static <E> Distribution<E> | simpleGoodTuring(Counter<E> counter, int numberOfKeys): Creates a Distribution from the given counter using Gale & Sampson's "simple Good-Turing" smoothing. |
String | toString() |
String | toString(NumberFormat nf) |
double | totalCount() |

public E drawSample()
Specified by: drawSample in interface Sampler<E>
public E drawSample(Random random)
Specified by: drawSample in interface ProbabilityDistribution<E>
public String toString(NumberFormat nf)
public double getReservedMass()
public int getNumberOfKeys()
public boolean containsKey(E key)
public double getCount(E key)
A convenience version of get that casts and extracts the primitive value.
key - The key to look up.
public static <E> Distribution<E> getDistributionFromPartiallySpecifiedCounter(Counter<E> c, int numKeys)
public static <E> Distribution<E> getUniformDistribution(Collection<E> s)
s - a Collection of keys.
public static <E> Distribution<E> getPerturbedUniformDistribution(Collection<E> s, Random r)
s - a Collection of keys.
public static <E> Distribution<E> getPerturbedDistribution(Counter<E> wordCounter, Random r)
public static <E> Distribution<E> getDistribution(Counter<E> counter)
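The class overview states that normalization is L1 normalization, so the stored values sum to 1. As a rough illustration of that step only, the sketch below divides each count by the total, using a plain `Map<String, Double>` in place of the library's `Counter`. The names `NormalizeSketch` and `normalize` are hypothetical and are not part of this API; the library's own implementation may differ in details such as handling of an empty counter.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class NormalizeSketch {

  /** Returns a new map in which every count is divided by the sum of all counts. */
  static Map<String, Double> normalize(Map<String, Double> counts) {
    double total = 0.0;
    for (double c : counts.values()) {
      total += c;
    }
    if (total <= 0.0) {
      // Assumption for this sketch: an empty or all-zero input is rejected.
      throw new IllegalArgumentException("total count must be positive");
    }
    Map<String, Double> probs = new LinkedHashMap<>();
    for (Map.Entry<String, Double> e : counts.entrySet()) {
      probs.put(e.getKey(), e.getValue() / total);
    }
    return probs;
  }

  public static void main(String[] args) {
    Map<String, Double> counts = new LinkedHashMap<>();
    counts.put("a", 3.0);
    counts.put("b", 1.0);
    System.out.println(normalize(counts)); // prints {a=0.75, b=0.25}
  }
}
```

The input map is left unmodified, which mirrors the documented behavior that a Counter passed in is copied.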
public static <E> Distribution<E> getDistributionWithReservedMass(Counter<E> counter, double reservedMass)
public static <E> Distribution<E> getDistributionFromLogValues(Counter<E> counter)
public static <E> Distribution<E> absolutelyDiscountedDistribution(Counter<E> counter, int numberOfKeys, double discount)
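This page gives no description for absolutelyDiscountedDistribution. The sketch below shows the textbook form of absolute discounting, as an assumption about what the method name refers to: subtract a fixed discount from every seen count, and reserve the freed mass for unseen keys, split evenly among them. `AbsoluteDiscountSketch` is a hypothetical class built on a plain `Map`; it is not the library's implementation, and how the library spreads reserved mass over unseen keys is also assumed here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class AbsoluteDiscountSketch {
  private final Map<String, Double> probs = new LinkedHashMap<>();
  private final double reservedMass;
  private final int numberOfKeys;

  AbsoluteDiscountSketch(Map<String, Double> counts, int numberOfKeys, double discount) {
    double total = 0.0;
    for (double c : counts.values()) {
      total += c;
    }
    if (total <= 0.0) {
      throw new IllegalArgumentException("total count must be positive");
    }
    for (Map.Entry<String, Double> e : counts.entrySet()) {
      // Assumes every seen count is at least as large as the discount.
      probs.put(e.getKey(), (e.getValue() - discount) / total);
    }
    // Mass removed from the seen keys: one discount per seen key.
    this.reservedMass = discount * counts.size() / total;
    this.numberOfKeys = numberOfKeys;
  }

  double getReservedMass() {
    return reservedMass;
  }

  /** Seen keys return their discounted share; unseen keys split the reserved mass evenly. */
  double probabilityOf(String key) {
    Double p = probs.get(key);
    if (p != null) {
      return p;
    }
    int unseen = numberOfKeys - probs.size();
    return unseen > 0 ? reservedMass / unseen : 0.0;
  }
}
```

With counts a=3 and b=1, four possible keys, and a discount of 0.5, this gives 0.625 for a, 0.125 for b, and 0.125 for each of the two unseen keys, which sums to 1.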
public static <E> Distribution<E> laplaceSmoothedDistribution(Counter<E> counter, int numberOfKeys)
public static <E> Distribution<E> laplaceSmoothedDistribution(Counter<E> counter, int numberOfKeys, double lambda)
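The method summary describes this overload as Lidstone's law: add lambda to every item, seen or unseen, then divide by the total. A minimal sketch of that arithmetic follows, assuming the usual formula (count + lambda) / (total + lambda * numberOfKeys) and a plain `Map` in place of `Counter`. `LidstoneSketch` is a hypothetical name, not the library's code. Setting lambda to 1 gives the Laplace case of the two-argument overload.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LidstoneSketch {

  /** Smoothed probability of key, given raw counts over a vocabulary of numberOfKeys items. */
  static double probability(Map<String, Double> counts, int numberOfKeys,
                            double lambda, String key) {
    double total = 0.0;
    for (double c : counts.values()) {
      total += c;
    }
    // Unseen keys contribute a raw count of zero before lambda is added.
    double count = counts.getOrDefault(key, 0.0);
    return (count + lambda) / (total + lambda * numberOfKeys);
  }

  public static void main(String[] args) {
    Map<String, Double> counts = new LinkedHashMap<>();
    counts.put("a", 3.0);
    counts.put("b", 1.0);
    System.out.println(probability(counts, 4, 1.0, "a"));      // prints 0.5
    System.out.println(probability(counts, 4, 1.0, "unseen")); // prints 0.125
  }
}
```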
public static <E> Distribution<E> laplaceWithExplicitUnknown(Counter<E> counter, double lambda, E UNK)
counter - the counter to normalize
lambda - the value to add to each count
UNK - the UNKNOWN symbol
public static <E> Distribution<E> goodTuringSmoothedCounter(Counter<E> counter, int numberOfKeys)
public static <E> Distribution<E> goodTuringWithExplicitUnknown(Counter<E> counter, E UNK)
counter - the counter
UNK - the unknown symbol
public static <E> Distribution<E> simpleGoodTuring(Counter<E> counter, int numberOfKeys)
public static <E> Distribution<E> distributionWithDirichletPrior(Counter<E> c, Distribution<E> prior, double weight)
weight - multiplier of prior to get "pseudo-count"
public static <E> Distribution<E> dynamicCounterWithDirichletPrior(Counter<E> c, Distribution<E> prior, double weight)
weight - multiplier of prior to get "pseudo-count"
public static <E> Distribution<E> distributionFromLogisticCounter(Counter<E> cntr)
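The summary says distributionFromLogisticCounter maps the linear weights of a multiclass logistic regression model to class probabilities. The standard way to do that is the softmax function, sketched below with the maximum score subtracted first for numerical stability. This is an illustration under that assumption; `SoftmaxSketch` is a hypothetical name and the library's exact procedure is not shown on this page.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SoftmaxSketch {

  /** Converts per-class linear scores into probabilities that sum to 1. */
  static Map<String, Double> softmax(Map<String, Double> scores) {
    if (scores.isEmpty()) {
      throw new IllegalArgumentException("scores must not be empty");
    }
    // Subtracting the maximum keeps Math.exp from overflowing on large scores.
    double max = Double.NEGATIVE_INFINITY;
    for (double s : scores.values()) {
      max = Math.max(max, s);
    }
    double sum = 0.0;
    Map<String, Double> probs = new LinkedHashMap<>();
    for (Map.Entry<String, Double> e : scores.entrySet()) {
      double unnormalized = Math.exp(e.getValue() - max);
      probs.put(e.getKey(), unnormalized);
      sum += unnormalized;
    }
    // Divide by the total so the result is L1-normalized.
    for (Map.Entry<String, Double> e : probs.entrySet()) {
      e.setValue(e.getValue() / sum);
    }
    return probs;
  }
}
```

Two classes with equal scores each receive probability 0.5, even when both scores are as large as 1000, because only score differences matter after the shift.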
public E sampleFrom()
public E sampleFrom(Random random)
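Sampling from a discrete distribution with a caller-supplied generator is commonly done by drawing a uniform number in [0, 1) and walking the cumulative probabilities until the draw is covered. The sketch below shows that approach over a plain `Map` of probabilities. `SampleSketch` is a hypothetical name, the code ignores reserved mass for unseen keys, and it is not taken from the library's source.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

final class SampleSketch {

  /** Draws one key with probability proportional to its (already normalized) value. */
  static String sampleFrom(Map<String, Double> probs, Random random) {
    if (probs.isEmpty()) {
      throw new IllegalArgumentException("distribution must not be empty");
    }
    double r = random.nextDouble(); // uniform in [0, 1)
    double cumulative = 0.0;
    String last = null;
    for (Map.Entry<String, Double> e : probs.entrySet()) {
      cumulative += e.getValue();
      last = e.getKey();
      if (r < cumulative) {
        return last;
      }
    }
    // Floating-point rounding can leave the cumulative sum slightly below 1;
    // fall back to the final key in that case.
    return last;
  }

  public static void main(String[] args) {
    Map<String, Double> probs = new LinkedHashMap<>();
    probs.put("a", 0.75);
    probs.put("b", 0.25);
    // A fixed seed makes the sequence of draws reproducible across runs.
    Random random = new Random(42L);
    for (int i = 0; i < 5; i++) {
      System.out.println(sampleFrom(probs, random));
    }
  }
}
```

Passing the generator in, rather than relying on Math.random(), lets callers seed it and reproduce a run exactly.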
public double probabilityOf(E key)
Specified by: probabilityOf in interface ProbabilityDistribution<E>
public double logProbabilityOf(E key)
Specified by: logProbabilityOf in interface ProbabilityDistribution<E>
public E argmax()
public double totalCount()
public void addToKeySet(E o)
o - object to put in keyset
public boolean equals(Distribution<E> distribution)
public static void main(String[] args)