Assert invariants.
Computes the total cross-entropy of the terms in the second half of the document based on an estimate of theta from the terms in the first half of the document. Returns (sum crossEntropy, numTerms). This is used as the basis of computePerplexity.
Computes the log probability for the current document. This measure treats the assignment to theta and the model counts as observed. Returns sum_i log P(w_i | theta*, beta*). Beta maps from (topic, term) to probability.
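The document log probability described above can be sketched as follows. This is an illustrative stand-alone version, not the codebase's actual method: `theta` is assumed to be a K-length topic distribution and `beta[k][w]` the probability of term `w` under topic `k`.

```java
// Hypothetical sketch: sum_i log P(w_i | theta, beta) for one document.
public class DocLogProb {
    static double logProb(int[] termIds, double[] theta, double[][] beta) {
        double total = 0.0;
        for (int w : termIds) {
            double pw = 0.0;                 // P(w) = sum_k theta_k * beta[k][w]
            for (int k = 0; k < theta.length; k++) {
                pw += theta[k] * beta[k][w];
            }
            total += Math.log(pw);           // accumulate log P(w_i | theta, beta)
        }
        return total;
    }

    public static void main(String[] args) {
        double[] theta = {0.5, 0.5};
        double[][] beta = {{0.25, 0.75}, {0.75, 0.25}};
        // Every term has marginal probability 0.5, so logProb = 3 * ln(0.5).
        System.out.println(logProb(new int[]{0, 1, 0}, theta, beta));
    }
}
```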
Computes the average per-word perplexity of the given dataset.
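Given the (sum crossEntropy, numTerms) pair described above, per-word perplexity is the exponential of the average per-word cross-entropy. A minimal sketch, with illustrative names not taken from the API:

```java
// Hypothetical sketch: per-word perplexity from summed cross-entropy.
public class PerplexityDemo {
    /** exp of average per-word cross-entropy (natural-log base assumed). */
    static double perplexity(double crossEntropySum, long numTerms) {
        return Math.exp(crossEntropySum / numTerms);
    }

    public static void main(String[] args) {
        // A document of 4 terms, each with probability 1/8 under the model:
        // cross-entropy sum = 4 * ln(8), so perplexity = 8.
        System.out.println(perplexity(4 * Math.log(8), 4));
    }
}
```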
How many times each topic is seen overall.
How many times each term is seen in each topic.
Creates a document from the given document parameters.
Returns the distribution over terms for the given topic. The return value of this method is assumed to have already incorporated the corresponding getTermSmoothing to the appropriate extent.
Returns an array of per-topic probabilities. Loops while the largest difference between iterations in probabilities for any given topic is greater than delta (default 1e-5).
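The convergence test described above can be sketched generically: iterate an update step until no topic's probability changes by more than delta between iterations. `updateStep` here is a hypothetical stand-in for the real per-iteration update, which is not shown in these comments.

```java
import java.util.function.UnaryOperator;

// Hedged sketch of "loop until the max per-topic change is <= delta".
public class ConvergeDemo {
    static double[] iterateUntilConverged(double[] probs,
                                          UnaryOperator<double[]> updateStep,
                                          double delta) {
        while (true) {
            double[] next = updateStep.apply(probs);
            double maxDiff = 0.0;
            for (int k = 0; k < probs.length; k++) {
                maxDiff = Math.max(maxDiff, Math.abs(next[k] - probs[k]));
            }
            probs = next;
            if (maxDiff <= delta) return probs;   // default delta: 1e-5
        }
    }

    public static void main(String[] args) {
        // Toy update that halves the distance to the uniform distribution.
        UnaryOperator<double[]> step = p -> {
            double[] q = new double[p.length];
            for (int k = 0; k < p.length; k++) q[k] = (p[k] + 1.0 / p.length) / 2;
            return q;
        };
        double[] result = iterateUntilConverged(new double[]{1.0, 0.0}, step, 1e-5);
        System.out.println(result[0] + " " + result[1]); // both near 0.5
    }
}
```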
Does inference on the given document until convergence.
Where log messages go. Defaults to System.err.println.
The number of terms in the model.
The number of topics in the model.
Returns the probability of the given term in the given topic.
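Combining the count arrays and the add-k (eta) term prior described in these comments, the topic-term probability is presumably the smoothed relative frequency. A minimal sketch; all names here are assumptions, not the model's actual fields:

```java
// Illustrative sketch of an add-k (eta) smoothed topic-term probability.
public class TermProbDemo {
    static double termProbability(int topic, int term,
                                  int[][] topicTermCounts, // counts[topic][term]
                                  int[] topicCounts,       // total count per topic
                                  double[] eta) {          // per-term prior counts
        double etaSum = 0.0;
        for (double e : eta) etaSum += e;
        return (topicTermCounts[topic][term] + eta[term])
             / (topicCounts[topic] + etaSum);
    }

    public static void main(String[] args) {
        int[][] counts = {{3, 1}};   // one topic, two terms
        int[] totals = {4};
        double[] eta = {1.0, 1.0};   // symmetric add-one prior
        // (3 + 1) / (4 + 2)
        System.out.println(termProbability(0, 0, counts, totals, eta));
    }
}
```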
The parameters used to create this model.
Registers a function as a checker of invariants.
Resets to the default state.
Gets the current state of this object.
Sets the current state of this object.
Returns a human-readable summary of the current topic model.
The term index describing which terms are in the model.
Add-k prior counts for each term (eta in the model formulation).
Tokenizes the given input string using our stored tokenizer and term index, if available. Otherwise, throws an IllegalArgumentException.
The tokenizer used to break input documents into terms.
The term index describing which terms are in the model.
Gets the name for this topic.
Prior counts for each topic (alpha in the model formulation).
CVB0 learning and inference model for vanilla LDA. This algorithm is like the collapsed Gibbs sampler implemented in GibbsLDA, except that instead of sampling a hard topic assignment for each z as it iterates through the data, the model keeps the soft assignment distribution. Consequently, the algorithm converges in fewer iterations (and it is easier to determine convergence) but requires more memory during training.
See "On Smoothing and Inference for Topic Models" in UAI 2009 for more on CVB0.
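The per-token CVB0 update from that paper can be sketched as below: the soft assignment gamma for one token is proportional to (topicTermCount + eta) * (docTopicCount + alpha) / (topicCount + V * eta), with the token's own current gamma first subtracted from each count. This is a hedged, self-contained illustration; the array and parameter names are assumptions, not this codebase's fields.

```java
// Sketch of one CVB0 soft-assignment update (Asuncion et al., UAI 2009).
public class Cvb0Step {
    static double[] updateGamma(double[] gamma, int term, int vocabSize,
                                double[][] topicTermCounts, // soft counts [topic][term]
                                double[] topicCounts,       // soft total per topic
                                double[] docTopicCounts,    // soft counts in this doc
                                double alpha, double eta) {
        int K = gamma.length;
        double[] next = new double[K];
        double norm = 0.0;
        for (int k = 0; k < K; k++) {
            // Exclude this token's current contribution ("collapsed" counts).
            double nkw = topicTermCounts[k][term] - gamma[k];
            double nk  = topicCounts[k] - gamma[k];
            double ndk = docTopicCounts[k] - gamma[k];
            next[k] = (nkw + eta) * (ndk + alpha) / (nk + vocabSize * eta);
            norm += next[k];
        }
        for (int k = 0; k < K; k++) next[k] /= norm; // renormalize to a distribution
        return next;
    }

    public static void main(String[] args) {
        // Fully symmetric counts: the updated gamma stays uniform.
        double[] g = updateGamma(new double[]{0.5, 0.5}, 0, 2,
                                 new double[][]{{1.0, 1.0}, {1.0, 1.0}},
                                 new double[]{2.0, 2.0}, new double[]{1.0, 1.0},
                                 0.1, 0.1);
        System.out.println(g[0] + " " + g[1]);
    }
}
```

In training, this update runs over every token of every document each iteration, with the soft counts incremented by the new gamma after each update; that running bookkeeping is why CVB0 needs more memory than hard-assignment Gibbs sampling.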