edu.stanford.nlp.tmt.model.lda

CVB0LDA

class CVB0LDA extends LDA[SoftAssignmentModelState, CVB0LDADocument, (String, Array[Array[Double]])] with SoftAssignmentModel[LDAModelParams, LDADocumentParams, CVB0LDADocument]

CVB0 learning and inference model for vanilla LDA. This algorithm is like the collapsed Gibbs sampler implemented in GibbsLDA except that instead of sampling a hard topic assignment for each z as it iterates through the data, the model keeps the soft assignment distribution. Consequently, the algorithm converges in fewer iterations (and it is easier to determine convergence) but requires more memory during training.

See "On Smoothing and Inference for Topic Models" in UAI 2009 for more on CVB0.
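
As a rough illustration of the soft-assignment update described above, the following sketch applies one CVB0 step to a single token, using the update rule from the UAI 2009 paper. It is not the toolkit's implementation: the object Cvb0Sketch, the method updateToken, its parameters, and the use of symmetric scalar alpha and eta are all assumptions made for this example.

```scala
object Cvb0Sketch {
  /** One CVB0 update for a single token. All names are illustrative, not toolkit API. */
  def updateToken(
      term: Int,                            // index of the token's term
      gamma: Array[Double],                 // token's current soft assignment over topics (sums to 1)
      countDocTopic: Array[Double],         // expected topic counts for this document
      countTopicTerm: Array[Array[Double]], // expected counts indexed as (topic)(term)
      countTopic: Array[Double],            // expected total count for each topic
      alpha: Double,                        // symmetric topic smoothing (assumed scalar here)
      eta: Double,                          // symmetric term smoothing (assumed scalar here)
      numTerms: Int): Unit = {
    val numTopics = gamma.length

    // Remove this token's current soft assignment from the expected counts.
    for (k <- 0 until numTopics) {
      countDocTopic(k) -= gamma(k)
      countTopicTerm(k)(term) -= gamma(k)
      countTopic(k) -= gamma(k)
    }

    // Unnormalised new assignment: p(term | topic) * (document's affinity for topic).
    val fresh = Array.tabulate(numTopics) { k =>
      (countTopicTerm(k)(term) + eta) / (countTopic(k) + numTerms * eta) *
        (countDocTopic(k) + alpha)
    }
    val normaliser = fresh.sum

    // Keep the whole distribution (no sampling) and add it back into the counts.
    for (k <- 0 until numTopics) {
      gamma(k) = fresh(k) / normaliser
      countDocTopic(k) += gamma(k)
      countTopicTerm(k)(term) += gamma(k)
      countTopic(k) += gamma(k)
    }
  }
}
```

A collapsed Gibbs sampler would draw a single topic from the normalised distribution at the last step; storing the full gamma vector per token is what accounts for the extra memory during training.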

Linear Supertypes
  1. CVB0LDA
  2. SoftAssignmentModel
  3. LDA
  4. DirichletTopicSmoothing
  5. DirichletTermSmoothing
  6. ClosedTopicSet
  7. TopicModel
  8. RepCheck
  9. Stateful
  10. AnyRef
  11. Any

Instance Constructors

  1. new CVB0LDA (params: LDAModelParams, seed: Long, log: (String) ⇒ Unit)

Value Members

  1. def != (arg0: AnyRef): Boolean

    Attributes
    final
    Definition Classes
    AnyRef
  2. def != (arg0: Any): Boolean

    Attributes
    final
    Definition Classes
    Any
  3. def ## (): Int

    Attributes
    final
    Definition Classes
    AnyRef → Any
  4. def == (arg0: AnyRef): Boolean

    Attributes
    final
    Definition Classes
    AnyRef
  5. def == (arg0: Any): Boolean

    Attributes
    final
    Definition Classes
    Any
  6. def asInstanceOf [T0] : T0

    Attributes
    final
    Definition Classes
    Any
  7. var checkers : List[Function0[_]]

    Attributes
    protected
    Definition Classes
    RepCheck
  8. def checkrep (): Unit

    Assert invariants.

    Attributes
    protected final
    Definition Classes
    RepCheck
  9. def clone (): AnyRef

    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws()
  10. def computeCrossEntropy (doc: LDADocumentParams): (Double, Int)

    Computes the total cross-entropy of the terms in the second half of the document based on an estimate of theta from the terms in the first half of the document. Returns (sum crossEntropy, numTerms). This is used as the basis of computePerplexity.

    Definition Classes
    LDA
  11. def computeLogPW (doc: CVB0LDADocument): Double

    Computes the log probability for the current document. This measure treats the assignment to theta and the model counts as observed. Returns sum_i log P(w_i | theta*, beta*). Beta maps from (topic, term) to probability.

    Definition Classes
    LDA
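
The quantity described for computeLogPW can be sketched as follows, assuming the per-term probability is the mixture sum_k theta(k) * beta(k)(w) and that the document score is the sum of the logs of those probabilities. The object LogPWSketch and its signature are hypothetical and do not mirror the toolkit's internals.

```scala
object LogPWSketch {
  /**
   * Log probability of a document's terms given fixed theta and beta.
   * theta(k) is the document's probability of topic k.
   * beta(k)(w) is the probability of term w in topic k.
   */
  def logPW(terms: Seq[Int], theta: Array[Double], beta: Array[Array[Double]]): Double =
    terms.map { w =>
      // Marginalise over topics to get P(w | theta, beta).
      val pTerm = theta.indices.map(k => theta(k) * beta(k)(w)).sum
      math.log(pTerm)
    }.sum
}
```
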
  12. def computePerplexity (docs: Traversable[LDADocumentParams]): Double

    Computes the average per-word perplexity of the given dataset.

    Definition Classes
    LDA
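
Since computeCrossEntropy returns a (sum crossEntropy, numTerms) pair per document, one plausible way to turn those pairs into a per-word perplexity is sketched below. It assumes the cross-entropy is measured in nats (natural log), which the documentation does not state; PerplexitySketch and its method are hypothetical names.

```scala
object PerplexitySketch {
  /** Combines per-document (total cross-entropy, term count) pairs into one perplexity value. */
  def perplexity(perDocument: Iterable[(Double, Int)]): Double = {
    // Pool entropy and term counts over all documents before averaging,
    // so that long documents are weighted by their length.
    val (totalEntropy, totalTerms) = perDocument.foldLeft((0.0, 0)) {
      case ((entropySum, termSum), (entropy, terms)) => (entropySum + entropy, termSum + terms)
    }
    require(totalTerms > 0, "need at least one held-out term")
    math.exp(totalEntropy / totalTerms)
  }
}
```

Under this reading, a model that assigns every held-out term probability 1/4 has a perplexity of 4.
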
  13. val countTopic : Array[Double]

    How many times each topic is seen overall.

    Definition Classes
    SoftAssignmentModel
  14. val countTopicTerm : Array[Array[Double]]

    How many times each term is seen in each topic.

    Definition Classes
    SoftAssignmentModel
  15. def create (dp: LDADocumentParams): CVB0LDADocument

    Creates a document from the given document parameters.

    Definition Classes
    CVB0LDA → TopicModel
  16. def doAssignments (doc: CVB0LDADocument, learn: Boolean): Unit

  17. def doCounts (doc: CVB0LDADocument): Unit

  18. def eq (arg0: AnyRef): Boolean

    Attributes
    final
    Definition Classes
    AnyRef
  19. def equals (arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  20. def finalize (): Unit

    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws()
  21. def getClass (): java.lang.Class[_]

    Attributes
    final
    Definition Classes
    AnyRef → Any
  22. def getTopicTermDistribution (topic: String): Array[Double]

    Returns the distribution over terms for the given topic. The return value of this method is assumed to have already incorporated the corresponding getTermSmoothing to the appropriate extent.

    Attributes
    final
    Definition Classes
    ClosedTopicSet
  23. def getTopicTermDistribution (topic: Int): Array[Double]

    Returns the distribution over terms for the given topic. The return value of this method is assumed to have already incorporated the corresponding getTermSmoothing to the appropriate extent.

    Definition Classes
    ClosedTopicSet
  24. def hashCode (): Int

    Definition Classes
    AnyRef → Any
  25. def infer (doc: CVB0LDADocument): Array[Double]

    Returns an array of per-topic probabilities. Loops while the largest difference between iterations in probabilities for any given topic is greater than delta (default 1e-5).

    Definition Classes
    CVB0LDA → LDA
  26. def infer (doc: CVB0LDADocument, delta: Double): Array[Double]

    Returns an array of per-topic probabilities. Loops while the largest difference between iterations in probabilities for any given topic is greater than delta (default 1e-5).
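
The stopping rule described for infer (loop while the largest per-topic change exceeds delta) can be sketched generically as below. The object ConvergenceSketch, the step function, and the maxIterations safety cap are assumptions added for illustration; only the delta test and its 1e-5 default come from the description above.

```scala
object ConvergenceSketch {
  /**
   * Repeats `step` until no entry changes by more than `delta` between iterations.
   * Assumes a non-empty array and that `step` preserves its length.
   */
  def iterateUntilStable(
      initial: Array[Double],
      delta: Double = 1e-5,
      maxIterations: Int = 1000)(step: Array[Double] => Array[Double]): Array[Double] = {
    var current = initial
    var largestChange = Double.PositiveInfinity
    var iterations = 0
    while (largestChange > delta && iterations < maxIterations) {
      val next = step(current)
      // Largest absolute difference for any single topic.
      largestChange = current.indices.map(k => math.abs(next(k) - current(k))).max
      current = next
      iterations += 1
    }
    current
  }
}
```

A single scalar threshold on the maximum change gives an unambiguous convergence test, which is harder to obtain when each iteration draws random hard assignments.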

  27. def infer (doc: String): Array[Double]

    Does inference on the given document until convergence.

    Definition Classes
    LDA
  28. def infer (doc: LDADocumentParams): Array[Double]

    Does inference on the given document until convergence.

    Definition Classes
    LDA
  29. def isInstanceOf [T0] : Boolean

    Attributes
    final
    Definition Classes
    Any
  30. val log : (String) ⇒ Unit

    Where log messages go. Defaults to System.err.println.

    Definition Classes
    CVB0LDA → TopicModel
  31. def ne (arg0: AnyRef): Boolean

    Attributes
    final
    Definition Classes
    AnyRef
  32. def notify (): Unit

    Attributes
    final
    Definition Classes
    AnyRef
  33. def notifyAll (): Unit

    Attributes
    final
    Definition Classes
    AnyRef
  34. val numTerms : Int

    The number of terms in the model.

    Definition Classes
    LDA → TopicModel
  35. val numTopics : Int

    The number of topics in the model.

    Definition Classes
    LDA → ClosedTopicSet
  36. def pTopicTerm (topic: Int, term: Int): Double

    Returns the probability of the given term in the given topic.

    Attributes
    final
    Definition Classes
    SoftAssignmentModel → ClosedTopicSet
  37. def pTopicTerm (topic: String, term: String): Double

    Returns the probability of the given term in the given topic.

    Definition Classes
    ClosedTopicSet
  38. val params : LDAModelParams

    The parameters used to create this model.

    Definition Classes
    CVB0LDA → LDA → TopicModel
  39. def registerCheck (check: Function0[_]): Unit

    Registers a function as a checker of invariants.

    Attributes
    protected
    Definition Classes
    RepCheck
  40. def reset (): Unit

    Resets to the default state.

    Definition Classes
    CVB0LDA → SoftAssignmentModel → Stateful
  41. val seed : Long

  42. def state : SoftAssignmentModelState

    Gets the current state of this object.

    Definition Classes
    SoftAssignmentModel → Stateful
  43. def state_= (state: SoftAssignmentModelState): Unit

    Sets the current state of this object.

    Definition Classes
    SoftAssignmentModel → Stateful
  44. def summary : Iterator[String]

    Returns human-readable summary of the current topic model.

    Definition Classes
    SoftAssignmentModel
  45. def synchronized [T0] (arg0: ⇒ T0): T0

    Attributes
    final
    Definition Classes
    AnyRef
  46. def termIndex : Option[Index[String]]

    The term index describing which terms are in the model.

    Attributes
    final
    Definition Classes
    TopicModel
  47. def termIndex_= (index: Option[Index[String]]): Unit

    Attributes
    protected final
    Definition Classes
    TopicModel
  48. def termSmoothDenom : Double

    Attributes
    protected
    Definition Classes
    DirichletTermSmoothing
  49. def termSmoothing : Array[Double]

    Add-k prior counts for each term (eta in the model formulation).

    Attributes
    final
    Definition Classes
    DirichletTermSmoothing
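
To show what add-k prior counts do to a topic's term distribution, the sketch below adds a per-term prior to raw counts and renormalises. Whether the toolkit computes getTopicTermDistribution exactly this way is an assumption; SmoothingSketch and smoothedDistribution are hypothetical names.

```scala
object SmoothingSketch {
  /**
   * Add-k smoothed distribution over terms for one topic.
   * termCounts(w) is the (possibly fractional) count of term w in the topic.
   * termSmoothing(w) is the prior count (eta) added for term w.
   */
  def smoothedDistribution(termCounts: Array[Double], termSmoothing: Array[Double]): Array[Double] = {
    require(termCounts.length == termSmoothing.length, "one prior count per term")
    // The denominator includes the total prior mass so the result sums to 1.
    val denominator = termCounts.sum + termSmoothing.sum
    Array.tabulate(termCounts.length)(w => (termCounts(w) + termSmoothing(w)) / denominator)
  }
}
```

With counts (3, 1, 0) and a prior of 1 per term, the unseen third term receives probability 1/7 rather than 0.
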
  50. def termSmoothing_= (smoothing: Array[Double]): Unit

    Attributes
    protected
    Definition Classes
    DirichletTermSmoothing
  51. def toString (): String

    Definition Classes
    AnyRef → Any
  52. def tokenize (document: String): Iterable[Int]

    Tokenizes the given input string using our stored tokenizer and term index, if available. Otherwise, throws an IllegalArgumentException.

    Attributes
    protected
    Definition Classes
    TopicModel
  53. def tokenizer : Option[Tokenizer]

    The tokenizer used to break input documents into terms.

    Attributes
    final
    Definition Classes
    TopicModel
  54. def tokenizer_= (tokenizer: Option[Tokenizer]): Unit

    Attributes
    protected final
    Definition Classes
    TopicModel
  55. var topicIndex : Option[Index[String]]

    The topic index describing which topics are in the model.

    Definition Classes
    ClosedTopicSet
  56. def topicName (topic: Int): String

    Gets the name for this topic.

    Definition Classes
    ClosedTopicSet
  57. def topicSmoothing : Array[Double]

    Prior counts for each topic (alpha in the model formulation).

    Attributes
    final
    Definition Classes
    DirichletTopicSmoothing
  58. def topicSmoothing_= (smoothing: Array[Double]): Unit

    Attributes
    protected
    Definition Classes
    DirichletTopicSmoothing
  59. def wait (): Unit

    Attributes
    final
    Definition Classes
    AnyRef
    Annotations
    @throws()
  60. def wait (arg0: Long, arg1: Int): Unit

    Attributes
    final
    Definition Classes
    AnyRef
    Annotations
    @throws()
  61. def wait (arg0: Long): Unit

    Attributes
    final
    Definition Classes
    AnyRef
    Annotations
    @throws()

Inherited from SoftAssignmentModel[LDAModelParams, LDADocumentParams, CVB0LDADocument]

Inherited from LDA[SoftAssignmentModelState, CVB0LDADocument, (String, Array[Array[Double]])]

Inherited from DirichletTopicSmoothing

Inherited from DirichletTermSmoothing

Inherited from ClosedTopicSet

Inherited from TopicModel[LDAModelParams, SoftAssignmentModelState, LDADocumentParams, CVB0LDADocument, (String, Array[Array[Double]])]

Inherited from RepCheck

Inherited from Stateful[SoftAssignmentModelState]

Inherited from AnyRef

Inherited from Any