Assert invariants.
Computes the total cross-entropy of the terms in the second half of the document based on an estimate of theta from the terms in the first half of the document. Returns (sum crossEntropy, numTerms). This is used as the basis of computePerplexity.
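A minimal sketch of this split-document evaluation, assuming base-2 cross-entropy and a dense beta; the class and parameter names are illustrative, not the actual API:

```java
// Sketch: given theta estimated from the first half of a document, sum the
// cross-entropy (in bits, an assumption) of the terms in the second half
// under the topic mixture.
public class HeldOutCrossEntropy {

    // theta[k]   : probability of topic k for this document.
    // beta[k][w] : probability of term w under topic k.
    public static double crossEntropyOfSecondHalf(int[] secondHalfTerms,
                                                  double[] theta,
                                                  double[][] beta) {
        double total = 0.0;
        for (int w : secondHalfTerms) {
            double p = 0.0;
            for (int k = 0; k < theta.length; k++) {
                p += theta[k] * beta[k][w];
            }
            total -= Math.log(p) / Math.log(2.0); // -log2 p(w), in bits
        }
        return total; // pair with secondHalfTerms.length for (sum, numTerms)
    }
}
```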
Computes the log probability for the current document. This measure treats the assignment to theta and the model counts as observed. Returns sum_i log P(w_i | theta*, beta*). Beta maps from (topic, term) to probability.
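A hedged sketch of the quantity described, sum_i log P(w_i | theta*, beta*), with theta and beta treated as fixed; names and the dense-array layout are assumptions:

```java
// Sketch: document log probability under an observed theta* and beta*.
public class DocLogProbability {

    // theta[k]   : probability of topic k for this document (theta*).
    // beta[k][w] : probability of term w under topic k (beta*).
    public static double logProbability(int[] terms, double[] theta, double[][] beta) {
        double logProb = 0.0;
        for (int w : terms) {
            double p = 0.0;
            for (int k = 0; k < theta.length; k++) {
                p += theta[k] * beta[k][w]; // P(w | theta*, beta*)
            }
            logProb += Math.log(p);
        }
        return logProb;
    }
}
```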
Computes the average per-word perplexity of the given dataset.
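A one-line sketch of how the (sum crossEntropy, numTerms) pair described above would yield per-word perplexity, assuming the cross-entropy is measured in bits (the base is an assumption; with natural log it would be Math.exp):

```java
// Sketch: average per-word perplexity from summed cross-entropy over a dataset.
public class Perplexity {
    public static double perplexity(double totalCrossEntropyBits, long numTerms) {
        return Math.pow(2.0, totalCrossEntropyBits / numTerms);
    }
}
```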
How many times each topic is seen overall.
How many times each term is seen in each topic.
Creates a document from the given document parameters.
Returns the distribution over terms for the given topic. The return value of this method is assumed to have already incorporated the corresponding getTermSmoothing to the appropriate extent.
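A sketch of the smoothed estimate this describes, folding the per-term prior counts (eta) into the topic's term counts; the names and the choice to normalize over (count + eta) are illustrative assumptions:

```java
// Sketch: topic-term distribution with add-k (eta) smoothing already applied.
public class TermDistribution {

    // termTopicCounts[w] : times term w was assigned to this topic.
    // eta[w]             : add-k prior count for term w.
    public static double[] termDistribution(int[] termTopicCounts, double[] eta) {
        double norm = 0.0;
        for (int w = 0; w < termTopicCounts.length; w++) {
            norm += termTopicCounts[w] + eta[w];
        }
        double[] dist = new double[termTopicCounts.length];
        for (int w = 0; w < termTopicCounts.length; w++) {
            dist[w] = (termTopicCounts[w] + eta[w]) / norm;
        }
        return dist;
    }
}
```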
Does inference on the given document until convergence.
Gets a thread-local inference sampler.
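One way such a thread-local sampler could be held, matching the class description's note that each thread gets its own random number generator to avoid synchronization overhead; the class and field names here are assumptions:

```java
import java.util.Random;

// Sketch: each thread lazily receives and reuses its own RNG/sampler.
public class Samplers {
    private static final ThreadLocal<Random> SAMPLER =
        ThreadLocal.withInitial(Random::new);

    public static Random get() {
        return SAMPLER.get(); // same instance for repeat calls on one thread
    }
}
```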
Where log messages go. Defaults to System.err.println.
The number of terms in the model.
The number of topics in the model.
Returns the probability of the given term in the given topic.
The parameters used to create this model.
Registers a function as a checker of invariants.
Resets to the default state.
Gets the current state of this object.
Sets the current state of this object.
Returns a human-readable summary of the current topic model.
The term index describing which terms are in the model.
Add-k prior counts for each term (eta in the model formulation).
Tokenizes the given input string using our stored tokenizer and term index, if available. Otherwise, throws an IllegalArgumentException.
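An illustrative sketch of the documented behavior: run the stored tokenizer, map tokens through the term index, and throw IllegalArgumentException when no tokenizer is available. The field names, the functional-interface tokenizer, and the choice to silently drop out-of-vocabulary tokens are all assumptions:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Sketch: string -> term ids via a stored tokenizer and term index.
public class TokenizeSketch {
    private final Function<String, List<String>> tokenizer; // may be null
    private final Map<String, Integer> termIndex;

    public TokenizeSketch(Function<String, List<String>> tokenizer,
                          Map<String, Integer> termIndex) {
        this.tokenizer = tokenizer;
        this.termIndex = termIndex;
    }

    public List<Integer> tokenize(String input) {
        if (tokenizer == null) {
            throw new IllegalArgumentException("no tokenizer available");
        }
        List<Integer> termIds = new ArrayList<>();
        for (String token : tokenizer.apply(input)) {
            Integer id = termIndex.get(token);
            if (id != null) { // unknown terms are dropped in this sketch
                termIds.add(id);
            }
        }
        return termIds;
    }
}
```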
The tokenizer used to break input documents into terms.
The term index describing which terms are in the model.
Gets the name for this topic.
Prior counts for each topic (alpha in the model formulation).
Collapsed Gibbs sampler for LDA learning and inference. This class is not thread-safe for learning. It is thread-safe for inference, but no guarantees are provided about repeatability in a threaded environment if the number of threads differs between runs. This is because each thread is given its own random number generator to avoid synchronization overhead, so the sequence of random numbers seen on a particular document may be a function of the number of threads.
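A hedged sketch of one collapsed Gibbs sweep over a document's tokens, using the alpha and eta priors described above with symmetric scalar values for brevity; all names and array layouts are illustrative, not the actual implementation:

```java
import java.util.Random;

// Sketch of the standard collapsed Gibbs update for LDA:
//   p(z_i = k) proportional to (n_dk + alpha) * (n_kw + eta) / (n_k + V * eta)
// topicCounts[k]        : total assignments to topic k across the corpus (n_k)
// termTopicCounts[k][w] : assignments of term w to topic k (n_kw)
// docTopicCounts[k]     : assignments to topic k within this document (n_dk)
public class GibbsSweep {
    public static void sweep(int[] terms, int[] assignments,
                             int[] topicCounts, int[][] termTopicCounts,
                             int[] docTopicCounts,
                             double alpha, double eta, Random rng) {
        int numTopics = topicCounts.length;
        int numTerms = termTopicCounts[0].length; // V
        double[] p = new double[numTopics];
        for (int i = 0; i < terms.length; i++) {
            int w = terms[i];
            int old = assignments[i];
            // Remove the current assignment from all counts.
            topicCounts[old]--;
            termTopicCounts[old][w]--;
            docTopicCounts[old]--;
            // Unnormalized conditional over topics.
            double total = 0.0;
            for (int k = 0; k < numTopics; k++) {
                total += p[k] = (docTopicCounts[k] + alpha)
                        * (termTopicCounts[k][w] + eta)
                        / (topicCounts[k] + numTerms * eta);
            }
            // Sample the new topic and restore the counts.
            double u = rng.nextDouble() * total;
            int k = 0;
            for (; k < numTopics - 1 && u > p[k]; k++) u -= p[k];
            assignments[i] = k;
            topicCounts[k]++;
            termTopicCounts[k][w]++;
            docTopicCounts[k]++;
        }
    }
}
```

Because the sampled topic depends on the RNG, this is where the class's repeatability caveat comes from: per-thread generators mean the draw sequence for a document can change with the thread count.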