CMMClassifier (Stanford JavaNLP API)

java.lang.Object
- edu.stanford.nlp.ie.AbstractSequenceClassifier<IN>
- - edu.stanford.nlp.ie.ner.CMMClassifier<IN>

All Implemented Interfaces:

java.util.function.Function<java.lang.String,java.lang.String>
```
public class CMMClassifier<IN extends CoreLabel>
extends AbstractSequenceClassifier<IN>
```
Does sequence classification using a Conditional Markov Model. It could be used for other purposes, but the provided features are aimed at Named Entity Recognition. The code supports different document encodings, but with the standard ColumnDocumentReader, input files are expected to have one word per line, with columns for things like the word, POS, chunk, and class.

Typical usage

For running a trained model with a provided serialized classifier:

`java -server -mx1000m edu.stanford.nlp.ie.ner.CMMClassifier -loadClassifier conll.ner.gz -textFile samplesentences.txt`

When specifying all parameters in a properties file (train, test, or runtime):

`java -mx1000m edu.stanford.nlp.ie.ner.CMMClassifier -prop propFile`

To train and test a model from the command line:

`java -mx1000m edu.stanford.nlp.ie.ner.CMMClassifier -trainFile trainFile -testFile testFile -goodCoNLL > output`

Features are defined by a FeatureFactory. The default FeatureFactory is NERFeatureFactory, and you should look there for feature templates. Features are specified either in a Properties file (the recommended method) or on the command line. They are read into a SeqClassifierFlags object, which users need not know much about unless they wish to add new features.

CMMClassifier may also be used programmatically. When creating a new instance, you must supply the properties, as a java.util.Properties object or as SeqClassifierFlags. The other way to get a CMMClassifier is to deserialize one via getClassifier(String), which returns a deserialized classifier. You may then tag sentences using the assorted classify methods listed in the method summary.
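The one-word-per-line input layout described above can be illustrated with a small, library-independent sketch. The column order (word, POS, chunk, class), the whitespace separator, the use of a blank line to end a sequence, and the name `ColumnFormatSketch` are all assumptions made for illustration; this is not the ColumnDocumentReader implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the Stanford API.
class ColumnFormatSketch {

    // Parses lines of the assumed form "word POS chunk class".
    // A blank line ends the current sequence (assumed convention).
    static List<List<String[]>> parse(String text) {
        List<List<String[]>> sequences = new ArrayList<>();
        List<String[]> current = new ArrayList<>();
        for (String line : text.split("\n", -1)) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                if (!current.isEmpty()) {
                    sequences.add(current);
                    current = new ArrayList<>();
                }
                continue;
            }
            String[] cols = trimmed.split("\\s+");
            if (cols.length != 4) {
                throw new IllegalArgumentException("Expected 4 columns: " + line);
            }
            current.add(cols);
        }
        if (!current.isEmpty()) {
            sequences.add(current);
        }
        return sequences;
    }

    public static void main(String[] args) {
        String sample = "John NNP B-NP PERSON\nsmiled VBD B-VP O\n\nParis NNP B-NP LOCATION\n";
        for (List<String[]> sequence : parse(sample)) {
            for (String[] token : sequence) {
                System.out.println(token[0] + " -> " + token[3]);
            }
            System.out.println("--");
        }
    }
}
```

Each inner list stands for one document or sentence, which mirrors the "collection of sequences" shape that the training methods on this page accept.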

Author:

Dan Klein, Jenny Finkel, Christopher Manning, Shipra Dingare, Huy Nguyen, Sarah Spikes (sdspikes@cs.stanford.edu) - cleanup and filling in types

Field Summary

Fields
Modifier and Type Field and Description

static java.lang.String DEFAULT_CLASSIFIER
Default place to look in Jar file for classifier.
- Fields inherited from class edu.stanford.nlp.ie.AbstractSequenceClassifier
  classIndex, featureFactories, flags, knownLCWords, pad, windowSize

Constructor Summary

Constructors
Modifier Constructor and Description

protected CMMClassifier()

CMMClassifier(java.util.Properties props)

CMMClassifier(SeqClassifierFlags flags)

Method Summary

Modifier and Type	Method and Description
`void`	`adapt(ObjectBank<java.util.List<IN>> featureLabels, Dataset<java.lang.String,java.lang.String> trainDataset)`
`void`	`adapt(java.lang.String filename, Dataset<java.lang.String,java.lang.String> trainDataset, DocumentReaderAndWriter<IN> readerWriter)`
`java.util.List<IN>`	`classify(java.util.List<IN> document)` Classify a `List` of `CoreLabel`s.
`java.util.List<IN>`	`classifyWithGlobalInformation(java.util.List<IN> tokenSeq, CoreMap doc, CoreMap sent)` Classify a `List` of something that extends `CoreMap` using as additional information whatever is stored in the document and sentence.
`protected java.lang.String`	`classOf(java.util.List<IN> lineInfos, int pos)` Returns the most likely class for the word at the given position.
`Dataset<java.lang.String,java.lang.String>`	`getBiasedDataset(ObjectBank<java.util.List<IN>> data, Index<java.lang.String> featureIndex, Index<java.lang.String> classIndex)`
`static CMMClassifier<? extends CoreLabel>`	`getClassifier(java.io.File file)`
`static CMMClassifier<? extends CoreLabel>`	`getClassifier(java.io.InputStream in)`
`static <INN extends CoreMap> CMMClassifier<? extends CoreLabel>`	`getClassifier(java.io.ObjectInputStream ois)`
`static <INN extends CoreMap> CMMClassifier<? extends CoreLabel>`	`getClassifier(java.io.ObjectInputStream ois, java.util.Properties props)`
`static CMMClassifier<? extends CoreLabel>`	`getClassifier(java.lang.String loadPath)`
`static CMMClassifier<? extends CoreLabel>`	`getClassifierNoExceptions(java.io.File file)`
`static CMMClassifier<? extends CoreLabel>`	`getClassifierNoExceptions(java.io.InputStream in)`
`static CMMClassifier<CoreLabel>`	`getClassifierNoExceptions(java.lang.String loadPath)`
`Dataset<java.lang.String,java.lang.String>`	`getDataset(java.util.Collection<java.util.List<IN>> data)` Build a Dataset from some data.
`Dataset<java.lang.String,java.lang.String>`	`getDataset(java.util.Collection<java.util.List<IN>> data, Index<java.lang.String> featureIndex, Index<java.lang.String> classIndex)` Build a Dataset from some data.
`Dataset<java.lang.String,java.lang.String>`	`getDataset(Dataset<java.lang.String,java.lang.String> oldData, Index<java.lang.String> goodFeatures)` Build a Dataset from some data.
`Dataset<java.lang.String,java.lang.String>`	`getDataset(ObjectBank<java.util.List<IN>> data, Dataset<java.lang.String,java.lang.String> origDataset)` Build a Dataset from some data.
`static CMMClassifier<? extends CoreLabel>`	`getDefaultClassifier()` Used to obtain the default classifier which is stored inside a jar file.
`SequenceModel`	`getSequenceModel(java.util.List<IN> document)`
`java.util.Set<java.lang.String>`	`getTags()` Returns the Set of entities recognized by this Classifier.
`void`	`loadClassifier(java.io.ObjectInputStream ois, java.util.Properties props)` Load a classifier from the given Stream.
`void`	`loadDefaultClassifier()` Used to load the default supplied classifier.
`double`	`loglikelihood(java.util.List<IN> lineInfos)` Returns the log conditional likelihood of the given dataset.
`static void`	`main(java.lang.String[] args)` Command-line version of the classifier.
`Datum<java.lang.String,java.lang.String>`	`makeDatum(java.util.List<IN> info, int loc, java.util.List<FeatureFactory<IN>> featureFactories)` Make an individual Datum out of the data list info, focused at position loc.
`Triple<Counter<java.lang.Integer>,Counter<java.lang.Integer>,TwoDimensionalCounter<java.lang.Integer,java.lang.String>>`	`printProbsDocument(java.util.List<IN> document)` Takes a `List` of `CoreLabel`s and prints the likelihood of each possible label at each point.
`void`	`retrain(ObjectBank<java.util.List<IN>> doc)`
`void`	`retrain(ObjectBank<java.util.List<IN>> featureLabels, Index<java.lang.String> featureIndex, Index<java.lang.String> labelIndex)`
`Counter<java.lang.String>`	`scoresOf(java.util.List<IN> lineInfos, int pos)`
`void`	`serializeClassifier(java.io.ObjectOutputStream oos)` Serialize a sequence classifier to an object output stream.
`void`	`serializeClassifier(java.lang.String serializePath)` Serialize a sequence classifier to a file on the given path.
`void`	`train(java.util.Collection<java.util.List<IN>> wordInfos, DocumentReaderAndWriter<IN> readerAndWriter)` Trains a classifier from a Collection of sequences.
`void`	`trainSemiSup()`
`double`	`weight(java.lang.String feature, java.lang.String label)`
`double[][]`	`weights()`

Methods inherited from class edu.stanford.nlp.ie.AbstractSequenceClassifier
apply, backgroundSymbol, classify, classifyAndWriteAnswers, classifyAndWriteAnswers, classifyAndWriteAnswers, classifyAndWriteAnswers, classifyAndWriteAnswers, classifyAndWriteAnswers, classifyAndWriteAnswers, classifyAndWriteAnswersKBest, classifyAndWriteAnswersKBest, classifyAndWriteViterbiSearchGraph, classifyFile, classifyFilesAndWriteAnswers, classifyFilesAndWriteAnswers, classifyKBest, classifyRaw, classifySentence, classifySentenceWithGlobalInformation, classifyStdin, classifyStdin, classifyToCharacterOffsets, classifyToString, classifyToString, classifyWithInlineXML, countResults, countResultsSegmenter, defaultReaderAndWriter, dumpFeatures, finalizeClassification, getKnownLCWords, getSampler, labels, loadClassifier, loadClassifier, loadClassifier, loadClassifier, loadClassifier, loadClassifier, loadClassifierNoExceptions, loadClassifierNoExceptions, loadClassifierNoExceptions, loadClassifierNoExceptions, loadClassifierNoExceptions, makeObjectBankFromFile, makeObjectBankFromFile, makeObjectBankFromFiles, makeObjectBankFromFiles, makeObjectBankFromFiles, makeObjectBankFromReader, makeObjectBankFromString, makePlainTextReaderAndWriter, makePlainTextReaderAndWriter, makeReaderAndWriter, plainTextReaderAndWriter, printFeatureLists, printFeatures, printProbs, printProbs, printProbsDocuments, printResults, reinit, segmentString, segmentString, train, train, train, train, train, train, windowSize, writeAnswers

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface java.util.function.Function
andThen, compose, identity

- Field Detail
  - DEFAULT_CLASSIFIER
```
public static final java.lang.String DEFAULT_CLASSIFIER
```
    Default place to look in Jar file for classifier.
    
    See Also:
    
    Constant Field Values
- Constructor Detail
  - CMMClassifier
```
protected CMMClassifier()
```
  - CMMClassifier
```
public CMMClassifier(java.util.Properties props)
```
  - CMMClassifier
```
public CMMClassifier(SeqClassifierFlags flags)
```
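    The Properties-based constructor pairs naturally with the properties-file usage described in the class comment. The sketch below builds such a Properties object and round-trips it through the properties-file text format using only the JDK. It assumes that the command-line flag names `trainFile` and `testFile` are also valid property keys; `PropsSketch` and its methods are hypothetical names, and the constructor call itself is left as a comment because it requires the Stanford library.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, not part of the Stanford API.
class PropsSketch {

    // Assumption: property keys match the command-line flag names.
    static Properties build(String trainFile, String testFile) {
        Properties props = new Properties();
        props.setProperty("trainFile", trainFile);
        props.setProperty("testFile", testFile);
        return props;
    }

    // Renders the properties in the text format of a .properties file.
    static String toFileText(Properties props) {
        StringWriter out = new StringWriter();
        try {
            props.store(out, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    // Reads properties back from .properties file text.
    static Properties fromFileText(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = build("trainFile", "testFile");
        System.out.print(toFileText(props));
        // With the library on the classpath, props could then be passed to
        // the CMMClassifier(java.util.Properties) constructor listed above.
    }
}
```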
- Method Detail
  - getTags
```
public java.util.Set<java.lang.String> getTags()
```
    Returns the Set of entities recognized by this Classifier.
    
    Returns:
    
    The Set of entities recognized by this Classifier.
  - classify
```
public java.util.List<IN> classify(java.util.List<IN> document)
```
    Classify a List of CoreLabels.
    
    Specified by:
    
    classify in class AbstractSequenceClassifier<IN extends CoreLabel>
    
    Parameters:
    
    document - A List of CoreLabels to be classified.
    
    Returns:
    
    The same List, but with the elements annotated with their answers (stored under the CoreAnnotations.AnswerAnnotation key). The answers will be the class labels defined by the classifier. They might be things like entity labels (in BIO notation or not) or something like "1" vs. "0" on whether to begin a new token here or not (in word segmentation).
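    When the answers are entity labels in BIO notation, a common follow-up step is to group them into entity spans. The sketch below assumes the answer strings have already been read out of the tokens into a plain list, and that labels look like `B-TYPE`, `I-TYPE`, and `O`; the class name `BioSpanSketch` and the output format are hypothetical, and nothing here is part of the classifier itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the Stanford API.
class BioSpanSketch {

    // Returns spans formatted as "TYPE[start,end)" over token indices.
    // An I- label with no open span of the same type starts a new span (lenient choice).
    static List<String> spans(List<String> labels) {
        List<String> out = new ArrayList<>();
        String type = null; // type of the span currently open, if any
        int start = -1;
        for (int i = 0; i <= labels.size(); i++) {
            // A virtual "O" after the last token closes any open span.
            String label = i < labels.size() ? labels.get(i) : "O";
            boolean inside = label.startsWith("I-");
            boolean continues = inside && type != null && label.substring(2).equals(type);
            if (type != null && !continues) {
                out.add(type + "[" + start + "," + i + ")");
                type = null;
            }
            if (label.startsWith("B-") || (inside && type == null)) {
                type = label.substring(2);
                start = i;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(spans(Arrays.asList("B-PER", "I-PER", "O", "B-LOC")));
    }
}
```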
  - classOf
```
protected java.lang.String classOf(java.util.List<IN> lineInfos,
                                   int pos)
```
    Returns the most likely class for the word at the given position.
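    One natural reading of "most likely class" is an argmax over per-label scores, such as the kind of scores a method like scoresOf returns. Whether classOf is actually implemented this way is an assumption. The sketch uses a plain `Map<String, Double>` in place of the library's Counter, and `ArgmaxSketch` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the Stanford API.
class ArgmaxSketch {

    // Returns the label with the highest score, or null for an empty map.
    // Ties go to the label encountered first in iteration order.
    static String argmax(Map<String, Double> scores) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> entry : scores.entrySet()) {
            if (best == null || entry.getValue() > bestScore) {
                best = entry.getKey();
                bestScore = entry.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("O", 0.2);
        scores.put("PERSON", 0.7);
        scores.put("LOCATION", 0.1);
        System.out.println(argmax(scores));
    }
}
```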
  - loglikelihood
```
public double loglikelihood(java.util.List<IN> lineInfos)
```
    Returns the log conditional likelihood of the given dataset.
    
    Returns:
    
    The log conditional likelihood of the given dataset.
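    As a rough illustration of the quantity involved: in a conditional Markov model, the log conditional likelihood of a labeled sequence is the sum, over positions, of the log probability the model assigns to the correct label given its context. The sketch below assumes those per-position distributions have already been computed and are passed in as plain maps; `LogLikelihoodSketch` is a hypothetical name, and the library's actual computation is not shown in this page.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the Stanford API.
class LogLikelihoodSketch {

    // probs.get(i) maps each label to its conditional probability at position i;
    // gold.get(i) is the correct label at position i.
    static double logLikelihood(List<Map<String, Double>> probs, List<String> gold) {
        if (probs.size() != gold.size()) {
            throw new IllegalArgumentException("probs and gold must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < gold.size(); i++) {
            Double p = probs.get(i).get(gold.get(i));
            if (p == null || p <= 0.0) {
                return Double.NEGATIVE_INFINITY; // gold label judged impossible
            }
            sum += Math.log(p);
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Double> first = new HashMap<>();
        first.put("PERSON", 0.5);
        first.put("O", 0.5);
        Map<String, Double> second = new HashMap<>();
        second.put("PERSON", 0.75);
        second.put("O", 0.25);
        System.out.println(logLikelihood(Arrays.asList(first, second), Arrays.asList("PERSON", "O")));
    }
}
```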
  - getSequenceModel
```
public SequenceModel getSequenceModel(java.util.List<IN> document)
```
    Overrides:
    
    getSequenceModel in class AbstractSequenceClassifier<IN extends CoreLabel>
  - adapt
```
public void adapt(java.lang.String filename,
                  Dataset<java.lang.String,java.lang.String> trainDataset,
                  DocumentReaderAndWriter<IN> readerWriter)
```
    Parameters:
    
    filename - adaptation file
    
    trainDataset - original dataset (used in training)
  - adapt
```
public void adapt(ObjectBank<java.util.List<IN>> featureLabels,
                  Dataset<java.lang.String,java.lang.String> trainDataset)
```
    Parameters:
    
    featureLabels - adaptation docs
    
    trainDataset - original dataset (used in training)
  - retrain
```
public void retrain(ObjectBank<java.util.List<IN>> featureLabels,
                    Index<java.lang.String> featureIndex,
                    Index<java.lang.String> labelIndex)
```
    Parameters:
    
    featureLabels - retrain docs
    
    featureIndex - featureIndex of original dataset (used in training)
    
    labelIndex - labelIndex of original dataset (used in training)
  - retrain
```
public void retrain(ObjectBank<java.util.List<IN>> doc)
```
  - train
```
public void train(java.util.Collection<java.util.List<IN>> wordInfos,
                  DocumentReaderAndWriter<IN> readerAndWriter)
```
    Description copied from class: AbstractSequenceClassifier
    
    Trains a classifier from a Collection of sequences. Note that the Collection can be (and usually is) an ObjectBank.
    
    Specified by:
    
    train in class AbstractSequenceClassifier<IN extends CoreLabel>
    
    Parameters:
    
    wordInfos - An ObjectBank or a collection of sequences of IN
    
    readerAndWriter - A DocumentReaderAndWriter to use when loading test files
  - getDataset
```
public Dataset<java.lang.String,java.lang.String> getDataset(java.util.Collection<java.util.List<IN>> data)
```
    Build a Dataset from some data. Used for training a classifier.
    
    Parameters:
    
    data - This variable is a list of lists of CoreLabel. That is, it is a collection of documents, each of which is represented as a sequence of CoreLabel objects.
    
    Returns:
    
    The Dataset which is an efficient encoding of the information in a List of Datums
  - getDataset
```
public Dataset<java.lang.String,java.lang.String> getDataset(java.util.Collection<java.util.List<IN>> data,
                                                             Index<java.lang.String> featureIndex,
                                                             Index<java.lang.String> classIndex)
```
    Build a Dataset from some data. Used for training a classifier. By passing in extra featureIndex and classIndex, you can get a Dataset based on featureIndex and classIndex.
    
    Parameters:
    
    data - This variable is a list of lists of CoreLabel. That is, it is a collection of documents, each of which is represented as a sequence of CoreLabel objects.
    
    classIndex - The class index to base the Dataset on, for example the one (together with featureIndex) from an existing origDataset
    
    Returns:
    
    The Dataset which is an efficient encoding of the information in a List of Datums
  - getBiasedDataset
```
public Dataset<java.lang.String,java.lang.String> getBiasedDataset(ObjectBank<java.util.List<IN>> data,
                                                                   Index<java.lang.String> featureIndex,
                                                                   Index<java.lang.String> classIndex)
```
  - getDataset
```
public Dataset<java.lang.String,java.lang.String> getDataset(ObjectBank<java.util.List<IN>> data,
                                                             Dataset<java.lang.String,java.lang.String> origDataset)
```
    Build a Dataset from some data. Used for training a classifier. By passing in an extra origDataset, you can get a Dataset based on featureIndex and classIndex in an existing origDataset.
    
    Parameters:
    
    data - This variable is a list of lists of CoreLabel. That is, it is a collection of documents, each of which is represented as a sequence of CoreLabel objects.
    
    origDataset - if you want to get a Dataset based on featureIndex and classIndex in an existing origDataset
    
    Returns:
    
    The Dataset which is an efficient encoding of the information in a List of Datums
  - getDataset
```
public Dataset<java.lang.String,java.lang.String> getDataset(Dataset<java.lang.String,java.lang.String> oldData,
                                                             Index<java.lang.String> goodFeatures)
```
    Build a Dataset from some data.
    
    Parameters:
    
    oldData - This Dataset represents data from which we wish to remove some features, specifically those features not in the Index goodFeatures.
    
    goodFeatures - An Index of features we wish to retain.
    
    Returns:
    
    A new Dataset where each data point contains only features which were in goodFeatures.
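    The filtering this overload describes can be pictured with plain collections. In the sketch below each data point is just a list of feature strings and goodFeatures is a Set rather than an Index; `FeatureFilterSketch` and the feature names are hypothetical, and the real method works on Dataset objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of the Stanford API.
class FeatureFilterSketch {

    // Keeps, for every data point, only the features that appear in goodFeatures.
    // The number and order of data points is unchanged.
    static List<List<String>> retain(List<List<String>> dataPoints, Set<String> goodFeatures) {
        List<List<String>> filtered = new ArrayList<>();
        for (List<String> features : dataPoints) {
            List<String> kept = new ArrayList<>();
            for (String feature : features) {
                if (goodFeatures.contains(feature)) {
                    kept.add(feature);
                }
            }
            filtered.add(kept);
        }
        return filtered;
    }

    public static void main(String[] args) {
        List<List<String>> data = Arrays.asList(
                Arrays.asList("W=John", "CAP", "RARE"),
                Arrays.asList("W=smiled", "RARE"));
        Set<String> good = new HashSet<>(Arrays.asList("W=John", "CAP", "W=smiled"));
        System.out.println(retain(data, good));
    }
}
```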
  - serializeClassifier
```
public void serializeClassifier(java.lang.String serializePath)
```
    Description copied from class: AbstractSequenceClassifier
    
    Serialize a sequence classifier to a file on the given path.
    
    Specified by:
    
    serializeClassifier in class AbstractSequenceClassifier<IN extends CoreLabel>
    
    Parameters:
    
    serializePath - The path/filename to write the classifier to.
  - serializeClassifier
```
public void serializeClassifier(java.io.ObjectOutputStream oos)
```
    Description copied from class: AbstractSequenceClassifier
    
    Serialize a sequence classifier to an object output stream.
    
    Specified by:
    
    serializeClassifier in class AbstractSequenceClassifier<IN extends CoreLabel>
  - loadDefaultClassifier
```
public void loadDefaultClassifier()
```
    Used to load the default supplied classifier. **THIS FUNCTION WILL ONLY WORK IF RUN INSIDE A JAR FILE**
  - getDefaultClassifier
```
public static CMMClassifier<? extends CoreLabel> getDefaultClassifier()
```
    Used to obtain the default classifier which is stored inside a jar file. THIS FUNCTION WILL ONLY WORK IF RUN INSIDE A JAR FILE.
    
    Returns:
    
    A Default CMMClassifier from a jar file
  - loadClassifier
```
public void loadClassifier(java.io.ObjectInputStream ois,
                           java.util.Properties props)
                    throws java.lang.ClassCastException,
                           java.io.IOException,
                           java.lang.ClassNotFoundException
```
    Load a classifier from the given Stream. Implementation note: This method does not close the Stream that it reads from.
    
    Specified by:
    
    loadClassifier in class AbstractSequenceClassifier<IN extends CoreLabel>
    
    Parameters:
    
    ois - The ObjectInputStream to load the serialized classifier from
    
    props - This Properties object will be used to update the SeqClassifierFlags which are read from the serialized classifier
    
    Throws:
    
    java.io.IOException - If there are problems accessing the input stream
    
    java.lang.ClassCastException - If there are problems interpreting the serialized data
    
    java.lang.ClassNotFoundException - If there are problems interpreting the serialized data
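    Because this method does not close the stream it reads from, closing is left to the caller. The JDK-only sketch below shows that ownership pattern with try-with-resources. `readWithoutClosing` is a hypothetical stand-in for a loader with the same contract, and a serialized String stands in for a serialized classifier.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of the Stanford API.
class StreamOwnershipSketch {

    // Stand-in for a loader that reads from the stream but leaves it open.
    static Object readWithoutClosing(ObjectInputStream ois)
            throws IOException, ClassNotFoundException {
        return ois.readObject();
    }

    // Serializes a value to bytes so the example needs no files.
    static byte[] serialize(Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
            oos.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // The caller opens the stream, so the caller closes it.
    static Object load(byte[] data) {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return readWithoutClosing(ois);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(load(serialize("stand-in for a serialized model")));
    }
}
```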
  - getClassifierNoExceptions
```
public static CMMClassifier<? extends CoreLabel> getClassifierNoExceptions(java.io.File file)
```
  - getClassifier
```
public static CMMClassifier<? extends CoreLabel> getClassifier(java.io.File file)
                                                        throws java.io.IOException,
                                                               java.lang.ClassCastException,
                                                               java.lang.ClassNotFoundException
```
    Throws:
    
    java.io.IOException
    
    java.lang.ClassCastException
    
    java.lang.ClassNotFoundException
  - getClassifierNoExceptions
```
public static CMMClassifier<CoreLabel> getClassifierNoExceptions(java.lang.String loadPath)
```
  - getClassifier
```
public static CMMClassifier<? extends CoreLabel> getClassifier(java.lang.String loadPath)
                                                        throws java.io.IOException,
                                                               java.lang.ClassCastException,
                                                               java.lang.ClassNotFoundException
```
    Throws:
    
    java.io.IOException
    
    java.lang.ClassCastException
    
    java.lang.ClassNotFoundException
  - getClassifierNoExceptions
```
public static CMMClassifier<? extends CoreLabel> getClassifierNoExceptions(java.io.InputStream in)
```
  - getClassifier
```
public static <INN extends CoreMap> CMMClassifier<? extends CoreLabel> getClassifier(java.io.ObjectInputStream ois)
                                                                              throws java.io.IOException,
                                                                                     java.lang.ClassCastException,
                                                                                     java.lang.ClassNotFoundException
```
    Throws:
    
    java.io.IOException
    
    java.lang.ClassCastException
    
    java.lang.ClassNotFoundException
  - getClassifier
```
public static <INN extends CoreMap> CMMClassifier<? extends CoreLabel> getClassifier(java.io.ObjectInputStream ois,
                                                                                     java.util.Properties props)
                                                                              throws java.io.IOException,
                                                                                     java.lang.ClassCastException,
                                                                                     java.lang.ClassNotFoundException
```
    Throws:
    
    java.io.IOException
    
    java.lang.ClassCastException
    
    java.lang.ClassNotFoundException
  - getClassifier
```
public static CMMClassifier<? extends CoreLabel> getClassifier(java.io.InputStream in)
                                                        throws java.io.IOException,
                                                               java.lang.ClassCastException,
                                                               java.lang.ClassNotFoundException
```
    Throws:
    
    java.io.IOException
    
    java.lang.ClassCastException
    
    java.lang.ClassNotFoundException
  - makeDatum
```
public Datum<java.lang.String,java.lang.String> makeDatum(java.util.List<IN> info,
                                                          int loc,
                                                          java.util.List<FeatureFactory<IN>> featureFactories)
```
    Make an individual Datum out of the data list info, focused at position loc.
    
    Parameters:
    
    info - A List of IN objects
    
    loc - The position in the info list to focus feature creation on
    
    featureFactories - The factory that constructs features out of the item
    
    Returns:
    
    A Datum (BasicDatum) representing this data instance
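    To make "focused at position loc" concrete, the sketch below builds a few string features for one position from the words around it. The feature templates (`W=`, `PW=`, `NW=`), the padding symbol, and the name `DatumFeatureSketch` are hypothetical; the real features come from the configured FeatureFactory instances, and the real result is a Datum rather than a list of strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the Stanford API.
class DatumFeatureSketch {

    static final String PAD = "<PAD>"; // hypothetical padding symbol for out-of-range positions

    // Returns the word at index i, or the padding symbol outside the sequence.
    static String wordAt(List<String> words, int i) {
        return (i >= 0 && i < words.size()) ? words.get(i) : PAD;
    }

    // Builds current-word, previous-word, and next-word features for position loc.
    static List<String> featuresAt(List<String> words, int loc) {
        if (loc < 0 || loc >= words.size()) {
            throw new IndexOutOfBoundsException("loc=" + loc);
        }
        List<String> features = new ArrayList<>();
        features.add("W=" + wordAt(words, loc));
        features.add("PW=" + wordAt(words, loc - 1));
        features.add("NW=" + wordAt(words, loc + 1));
        return features;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("John", "smiled");
        for (int loc = 0; loc < words.size(); loc++) {
            System.out.println(loc + ": " + featuresAt(words, loc));
        }
    }
}
```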
  - trainSemiSup
```
public void trainSemiSup()
```
  - weight
```
public double weight(java.lang.String feature,
                     java.lang.String label)
```
  - weights
```
public double[][] weights()
```
  - classifyWithGlobalInformation
```
public java.util.List<IN> classifyWithGlobalInformation(java.util.List<IN> tokenSeq,
                                                        CoreMap doc,
                                                        CoreMap sent)
```
    Description copied from class: AbstractSequenceClassifier
    
    Classify a List of something that extends CoreMap using as additional information whatever is stored in the document and sentence. This is needed for SUTime (NumberSequenceClassifier), which requires the document date to resolve relative dates.
    
    Specified by:
    
    classifyWithGlobalInformation in class AbstractSequenceClassifier<IN extends CoreLabel>
    
    Parameters:
    
    tokenSeq - A List of something that extends CoreMap
    
    Returns:
    
    Classified version of the input tokenSequence
  - scoresOf
```
public Counter<java.lang.String> scoresOf(java.util.List<IN> lineInfos,
                                          int pos)
```
  - printProbsDocument
```
public Triple<Counter<java.lang.Integer>,Counter<java.lang.Integer>,TwoDimensionalCounter<java.lang.Integer,java.lang.String>> printProbsDocument(java.util.List<IN> document)
```
    Takes a List of CoreLabels and prints the likelihood of each possible label at each point. TODO: Write this method!
    
    Overrides:
    
    printProbsDocument in class AbstractSequenceClassifier<IN extends CoreLabel>
    
    Parameters:
    
    document - A List of CoreLabels.
  - main
```
public static void main(java.lang.String[] args)
                 throws java.lang.Exception
```
    Command-line version of the classifier. See the class comments for examples of use, and SeqClassifierFlags for more information on supported flags.
    
    Throws:
    
    java.lang.Exception
