edu.stanford.nlp.stats

## Class Distributions

• java.lang.Object
• edu.stanford.nlp.stats.Distributions

• ```public class Distributions
extends java.lang.Object```
Static methods for operating on `Distribution`s. In general, if a method operates on a pair of Distribution objects, we assume that the set of possible keys for each Distribution is the same. We therefore require that d1.numberOfKeys = d2.numberOfKeys and that the number of keys in the union of the two key sets is <= numKeys.
Author:
Jeff Michels (jmichels@stanford.edu)
• ### Method Summary

| Modifier and Type | Method and Description |
| --- | --- |
| `static <K> Distribution<K>` | `average(Distribution<K> d1, Distribution<K> d2)` |
| `protected static <K> java.util.Set<K>` | `getSetOfAllKeys(Distribution<K> d1, Distribution<K> d2)` |
| `static <K> double` | `informationRadius(Distribution<K> d1, Distribution<K> d2)` Calculates the information radius (aka the Jensen-Shannon divergence) between the two Distributions. |
| `static <K> double` | `jensenShannonDivergence(Distribution<K> d1, Distribution<K> d2)` Calculates the Jensen-Shannon divergence between the two distributions. |
| `static <K> double` | `klDivergence(Distribution<K> from, Distribution<K> to)` Calculates the KL divergence between the two distributions. |
| `static <K> double` | `overlap(Distribution<K> d1, Distribution<K> d2)` Returns a double between 0 and 1 representing the overlap of d1 and d2. |
| `static <K> double` | `skewDivergence(Distribution<K> d1, Distribution<K> d2, double skew)` Calculates the skew divergence between the two distributions. |
| `static <K> Distribution<K>` | `weightedAverage(Distribution<K> d1, double w1, Distribution<K> d2)` Returns a new `Distribution<K>` with counts averaged from the two given Distributions. |
• ### Methods inherited from class java.lang.Object

`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`
• ### Method Detail

• #### getSetOfAllKeys

```protected static <K> java.util.Set<K> getSetOfAllKeys(Distribution<K> d1,
Distribution<K> d2)```
• #### overlap

```public static <K> double overlap(Distribution<K> d1,
Distribution<K> d2)```
Returns a double between 0 and 1 representing the overlap of d1 and d2. Equals 0 if there is no overlap, and equals 1 iff d1 == d2.
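One measure with these properties is the sum, over all keys, of the smaller of the two probabilities. The sketch below computes it over plain `Map<String, Double>` probability tables instead of the library's `Distribution` class. The class name `OverlapSketch` and the map representation are illustrative assumptions, and the library's own implementation may differ, for example in how it treats reserved probability mass.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration; not part of edu.stanford.nlp.stats.
class OverlapSketch {
  // Sums min(p1(k), p2(k)) over the union of keys.
  // Assumption: a key missing from a map has probability 0.
  static double overlap(Map<String, Double> p1, Map<String, Double> p2) {
    Set<String> keys = new HashSet<>(p1.keySet());
    keys.addAll(p2.keySet());
    double result = 0.0;
    for (String key : keys) {
      result += Math.min(p1.getOrDefault(key, 0.0), p2.getOrDefault(key, 0.0));
    }
    return result;
  }
}
```

With this definition, two tables that share no keys give 0, and two identical tables that each sum to 1 give 1.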
• #### weightedAverage

```public static <K> Distribution<K> weightedAverage(Distribution<K> d1,
double w1,
Distribution<K> d2)```
Returns a new `Distribution<K>` with counts averaged from the two given Distributions. The average `Distribution<K>` will contain the union of keys in both source Distributions, and each count will be the weighted average of the two source counts for that key. A count that is missing from one Distribution is treated as having the probability that the `probabilityOf()` function returns for that key.
Returns:
A new distribution whose counts are the mean of the respective counts in the given distributions, with the remaining probability mass adjusted accordingly.
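The following sketch illustrates the weighted average on plain `Map<String, Double>` probability tables. It rests on two assumptions that this page does not state: that the weight of `d2` is `1 - w1`, and that a missing key has probability 0 (the library uses `probabilityOf()` instead, which may return a non-zero value). The class name `WeightedAverageSketch` is hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration; not part of edu.stanford.nlp.stats.
class WeightedAverageSketch {
  static Map<String, Double> weightedAverage(Map<String, Double> p1, double w1,
                                             Map<String, Double> p2) {
    double w2 = 1.0 - w1; // assumption: the two weights sum to 1
    Set<String> keys = new HashSet<>(p1.keySet());
    keys.addAll(p2.keySet()); // the result covers the union of both key sets
    Map<String, Double> result = new HashMap<>();
    for (String key : keys) {
      // assumption: a key missing from a map has probability 0
      double a = p1.getOrDefault(key, 0.0);
      double b = p2.getOrDefault(key, 0.0);
      result.put(key, w1 * a + w2 * b);
    }
    return result;
  }
}
```

Under these assumptions, setting `w1` to 0.5 gives the unweighted mean of the two tables.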
• #### average

```public static <K> Distribution<K> average(Distribution<K> d1,
Distribution<K> d2)```
• #### klDivergence

```public static <K> double klDivergence(Distribution<K> from,
Distribution<K> to)```
Calculates the KL divergence between the two distributions. That is, it calculates KL(from || to). In other words, it measures how well `from` can be represented by `to`. If some value in `from` gets zero probability in `to`, the method returns positive infinity.
Returns:
The KL divergence between the distributions
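As a rough illustration of KL(from || to), the sketch below sums p * log(p / q) over the keys of `from`, using plain `Map<String, Double>` probability tables. The class name `KlDivergenceSketch`, the map representation, and the choice of the natural logarithm are assumptions; this page does not say which logarithm base the library uses.

```java
import java.util.Map;

// Hypothetical helper for illustration; not part of edu.stanford.nlp.stats.
class KlDivergenceSketch {
  static double klDivergence(Map<String, Double> from, Map<String, Double> to) {
    double result = 0.0;
    for (Map.Entry<String, Double> e : from.entrySet()) {
      double p = e.getValue();
      if (p == 0.0) {
        continue; // terms with zero probability in `from` contribute nothing
      }
      double q = to.getOrDefault(e.getKey(), 0.0);
      if (q == 0.0) {
        // `from` puts mass on a value that `to` considers impossible
        return Double.POSITIVE_INFINITY;
      }
      result += p * Math.log(p / q); // natural log: an assumption
    }
    return result;
  }
}
```

The measure is not symmetric: swapping the arguments generally changes the result.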
• #### jensenShannonDivergence

```public static <K> double jensenShannonDivergence(Distribution<K> d1,
Distribution<K> d2)```
Calculates the Jensen-Shannon divergence between the two distributions. That is, it calculates 1/2 [KL(d1 || avg(d1,d2)) + KL(d2 || avg(d1,d2))].
Returns:
The Jensen-Shannon divergence between the distributions
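The formula can be sketched directly on plain `Map<String, Double>` probability tables: build the equal-weight average, then take half the sum of the two KL divergences to it. The class name `JensenShannonSketch`, the map representation, and the natural logarithm are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of edu.stanford.nlp.stats.
class JensenShannonSketch {
  // KL(from || to) with natural log (an assumption); infinite if `to` gives
  // zero probability to a value that `from` supports.
  static double kl(Map<String, Double> from, Map<String, Double> to) {
    double result = 0.0;
    for (Map.Entry<String, Double> e : from.entrySet()) {
      double p = e.getValue();
      if (p == 0.0) continue;
      double q = to.getOrDefault(e.getKey(), 0.0);
      if (q == 0.0) return Double.POSITIVE_INFINITY;
      result += p * Math.log(p / q);
    }
    return result;
  }

  static double jensenShannon(Map<String, Double> d1, Map<String, Double> d2) {
    // avg(d1, d2): each key gets half of its probability from each table
    Map<String, Double> avg = new HashMap<>();
    d1.forEach((k, v) -> avg.merge(k, 0.5 * v, Double::sum));
    d2.forEach((k, v) -> avg.merge(k, 0.5 * v, Double::sum));
    return 0.5 * (kl(d1, avg) + kl(d2, avg));
  }
}
```

Because the average gives non-zero probability to every value that either input supports, the result stays finite even when the plain KL divergence between the inputs would be infinite.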
• #### skewDivergence

```public static <K> double skewDivergence(Distribution<K> d1,
Distribution<K> d2,
double skew)```
Calculates the skew divergence between the two distributions. That is, it calculates KL(d1 || (d2*skew + d1*(1-skew))). In other words, it measures how well d1 can be represented by a "smoothed" d2.
Returns:
The skew divergence between the distributions
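A minimal sketch of this formula on plain `Map<String, Double>` probability tables follows. The class name `SkewDivergenceSketch`, the map representation, and the natural logarithm are assumptions made for illustration, not the library's implementation.

```java
import java.util.Map;

// Hypothetical helper for illustration; not part of edu.stanford.nlp.stats.
class SkewDivergenceSketch {
  static double skewDivergence(Map<String, Double> d1, Map<String, Double> d2,
                               double skew) {
    double result = 0.0;
    for (Map.Entry<String, Double> e : d1.entrySet()) {
      double p = e.getValue();
      if (p == 0.0) continue; // zero-probability terms contribute nothing
      // "smoothed" d2: mix d2 with d1 itself; a missing key in d2 counts as 0
      double q = skew * d2.getOrDefault(e.getKey(), 0.0) + (1.0 - skew) * p;
      if (q == 0.0) return Double.POSITIVE_INFINITY; // only possible when skew is 1
      result += p * Math.log(p / q); // natural log: an assumption
    }
    return result;
  }
}
```

In this sketch, a skew of 0 compares d1 with itself and gives 0, a skew of 1 reduces to KL(d1 || d2), and any skew strictly between them keeps the result finite.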
• #### informationRadius

```public static <K> double informationRadius(Distribution<K> d1,