public class CRFClassifier<IN extends CoreMap> extends AbstractSequenceClassifier<IN>

When using ColumnDocumentReaderAndWriter for training or testing models, input files are expected to be one token per line, with the columns indicating things like the word, POS, chunk, and answer class. The default for ColumnDocumentReaderAndWriter training data is 3-column input, with the columns containing a word, its POS, and its gold class, but this can be specified via the map property.

When run on a file with -textFile, the file is assumed to be plain English text (or perhaps simple HTML/XML), and a reasonable attempt at English tokenization is made by PlainTextDocumentReaderAndWriter. The class used to read the text can be changed with -plainTextDocumentReaderAndWriter. Extra options can be supplied to the tokenizer using the -tokenizeOptions flag.

To read from stdin, use the -readStdin flag. The same reader/writer is used as for -textFile.
Typical command-line usage

For running a trained model with a provided serialized classifier on a text file:

    java -mx500m edu.stanford.nlp.ie.crf.CRFClassifier -loadClassifier conll.ner.gz -textFile samplesentences.txt

When specifying all parameters in a properties file (train, test, or runtime):

    java -mx1g edu.stanford.nlp.ie.crf.CRFClassifier -prop propFile

To train and test a simple NER model from the command line:

    java -mx1000m edu.stanford.nlp.ie.crf.CRFClassifier -trainFile trainFile -testFile testFile -macro > output

To train with multiple files:

    java -mx1000m edu.stanford.nlp.ie.crf.CRFClassifier -trainFileList file1,file2,... -testFile testFile -macro > output

To test on multiple files, use the -testFiles option with a comma-separated list.
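The same flags can be collected in the properties file passed with -prop. Below is a hedged sketch of such a file: the data and model paths are placeholders, and the feature flags shown are common NERFeatureFactory options (check NERFeatureFactory for the authoritative list).

```properties
# Data and model locations (placeholder paths)
trainFile = train.tsv
serializeTo = ner-model.ser.gz
# Column map: first column is the word, second the gold answer class
map = word=0,answer=1

# Common NERFeatureFactory feature flags (illustrative selection)
useClassFeature = true
useWord = true
useNGrams = true
noMidNGrams = true
maxNGramLeng = 6
usePrev = true
useNext = true
useSequences = true
usePrevSequences = true
useTypeSeqs = true
useTypeSeqs2 = true
useTypeySequences = true
wordShape = chris2useLC
```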
Features are defined by a FeatureFactory. NERFeatureFactory is used by default, and you should look there for feature templates and properties or flags that will cause certain features to be used when training an NER classifier. There are also various feature factories for Chinese word segmentation, such as ChineseSegmenterFeatureFactory. Features are specified either by a Properties file (which is the recommended method) or by flags on the command line. The flags are read into a SeqClassifierFlags object, which the user need not be concerned with unless wishing to add new features.

CRFClassifier may also be used programmatically. When creating a new instance, you must specify a Properties object. You may then call train methods to train a classifier, or load a classifier. The other way to get a CRFClassifier is to deserialize one via the static getClassifier(String) methods, which return a deserialized classifier. You may then tag (classify the items of) documents using the assorted classify() methods in AbstractSequenceClassifier.
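The programmatic path just described can be sketched as follows. The model path is a placeholder, and the Stanford CoreNLP jar (plus a serialized model) must be available; classifyToString is one of the inherited AbstractSequenceClassifier methods.

```java
import edu.stanford.nlp.ie.AbstractSequenceClassifier;
import edu.stanford.nlp.ie.crf.CRFClassifier;
import edu.stanford.nlp.ling.CoreLabel;

public class CRFDemo {
  public static void main(String[] args) throws Exception {
    // Deserialize a trained model via the static factory (placeholder path)
    AbstractSequenceClassifier<CoreLabel> classifier =
        CRFClassifier.getClassifier("english.all.3class.distsim.crf.ser.gz");
    // classifyToString tags raw text, inherited from AbstractSequenceClassifier
    String tagged =
        classifier.classifyToString("Jim bought 300 shares of Acme Corp. in 2006.");
    System.out.println(tagged);
  }
}
```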
Probabilities assigned by the CRF can be interrogated using either the printProbsDocument() or getCliqueTrees() methods.

Modifier and Type | Field and Description |
---|---|
static String |
DEFAULT_CLASSIFIER
Name of default serialized classifier resource to look for in a jar file.
|
Fields inherited from class AbstractSequenceClassifier:
classIndex, CUT_LABEL, featureFactories, flags, knownLCWords, pad, windowSize
Modifier | Constructor and Description |
---|---|
protected |
CRFClassifier() |
|
CRFClassifier(CRFClassifier<IN> crf)
Makes a copy of the crf classifier
|
|
CRFClassifier(Properties props) |
|
CRFClassifier(SeqClassifierFlags flags) |
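The Properties-based constructor above pairs with the train methods for programmatic training. A hedged sketch (property names as in the overview; file paths are placeholders):

```java
import java.util.Properties;
import edu.stanford.nlp.ie.crf.CRFClassifier;
import edu.stanford.nlp.ling.CoreLabel;

public class TrainDemo {
  public static void main(String[] args) throws Exception {
    Properties props = new Properties();
    props.setProperty("trainFile", "train.tsv");          // placeholder path
    props.setProperty("map", "word=0,answer=1");          // 2-column input
    props.setProperty("serializeTo", "ner-model.ser.gz"); // placeholder path

    // Construct from the Properties object, then train and serialize
    CRFClassifier<CoreLabel> crf = new CRFClassifier<>(props);
    crf.train();  // no-arg train() reads trainFile from the flags
    crf.serializeClassifier("ner-model.ser.gz");
  }
}
```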
Modifier and Type | Method and Description |
---|---|
protected void |
addProcessedData(List<List<CRFDatum<Collection<String>,String>>> processedData,
int[][][][] data,
int[][] labels,
double[][][][] featureVals,
int offset)
Adds the List of Lists of CRFDatums to the data and labels arrays, treating
each datum as if it were its own document.
|
protected static Index<CRFLabel> |
allLabels(int window,
Index<String> classIndex) |
List<IN> |
classify(List<IN> document)
|
List<IN> |
classifyGibbs(List<IN> document) |
List<IN> |
classifyGibbs(List<IN> document,
Triple<int[][][],int[],double[][][]> documentDataAndLabels) |
List<IN> |
classifyMaxEnt(List<IN> document)
Do standard sequence inference, using either Viterbi or Beam inference
depending on the value of flags.inferenceType. |
List<IN> |
classifyWithGlobalInformation(List<IN> tokenSeq,
CoreMap doc,
CoreMap sent)
|
void |
combine(CRFClassifier<IN> crf,
double weight)
Combines weighted crf with this crf
|
Triple<int[][][][],int[][],double[][][][]> |
documentsToDataAndLabels(Collection<List<IN>> documents)
Convert an ObjectBank to arrays of data features and labels.
|
List<Triple<int[][][],int[],double[][][]>> |
documentsToDataAndLabelsList(Collection<List<IN>> documents)
Convert an ObjectBank to corresponding collection of data features and
labels.
|
Triple<int[][][],int[],double[][][]> |
documentToDataAndLabels(List<IN> document)
Convert a document List into arrays storing the data features and labels.
|
void |
dropFeaturesBelowThreshold(double threshold) |
void |
dumpFeatures(Collection<List<IN>> docs)
Does nothing by default.
|
protected List<CRFDatum<? extends Collection<String>,? extends CharSequence>> |
extractDatumSequence(int[][][] allData,
int beginPosition,
int endPosition,
List<IN> labeledWordInfos)
Creates a new CRFDatum from the preprocessed allData format, given the
document number, position number, and a List of Object labels.
|
static <INN extends CoreMap> CRFClassifier<INN> |
getClassifier(File file)
Loads a CRF classifier from a filepath, and returns it.
|
static <INN extends CoreMap> CRFClassifier<INN> |
getClassifier(InputStream in)
Loads a CRF classifier from an InputStream, and returns it.
|
static CRFClassifier<CoreLabel> |
getClassifier(String loadPath) |
static <INN extends CoreMap> CRFClassifier<INN> |
getClassifier(String loadPath,
Properties props) |
static <INN extends CoreMap> CRFClassifier<INN> |
getClassifierNoExceptions(String loadPath) |
protected CliquePotentialFunction |
getCliquePotentialFunctionForTest() |
CRFCliqueTree<String> |
getCliqueTree(List<IN> document) |
CRFCliqueTree<String> |
getCliqueTree(Triple<int[][][],int[],double[][][]> p) |
List<CRFCliqueTree<String>> |
getCliqueTrees(String filename,
DocumentReaderAndWriter<IN> readerAndWriter)
Want to make arbitrary probability queries? Then this is the method for
you.
|
static <INN extends CoreMap> CRFClassifier<INN> |
getDefaultClassifier()
Used to get the default supplied classifier inside the jar file.
|
static <INN extends CoreMap> CRFClassifier<INN> |
getDefaultClassifier(Properties props)
Used to get the default supplied classifier inside the jar file.
|
static <INN extends CoreMap> CRFClassifier<INN> |
getJarClassifier(String resourceName,
Properties props)
Used to load a classifier stored as a resource inside a jar file.
|
Minimizer<DiffFunction> |
getMinimizer() |
Minimizer<DiffFunction> |
getMinimizer(int featurePruneIteration,
Evaluator[] evaluators) |
int |
getNumWeights()
Returns the total number of weights associated with this classifier.
|
protected CRFLogConditionalObjectiveFunction |
getObjectiveFunction(int[][][][] data,
int[][] labels) |
SequenceModel |
getSequenceModel(List<IN> doc) |
protected Collection<List<IN>> |
loadAuxiliaryData(Collection<List<IN>> docs,
DocumentReaderAndWriter<IN> readerAndWriter)
Load auxiliary data to be used in constructing features and labels
Intended to be overridden by subclasses
|
void |
loadClassifier(ObjectInputStream ois,
Properties props)
Loads a classifier from the specified InputStream.
|
static Index<String> |
loadClassIndexFromFile(String serializePath) |
void |
loadDefaultClassifier()
This is used to load the default supplied classifier stored within the jar
file.
|
void |
loadDefaultClassifier(Properties props)
This is used to load the default supplied classifier stored within the jar
file.
|
static Index<String> |
loadFeatureIndexFromFile(String serializePath) |
protected static List<List<CRFDatum<Collection<String>,String>>> |
loadProcessedData(String filename) |
void |
loadTagIndex() |
protected void |
loadTextClassifier(BufferedReader br) |
void |
loadTextClassifier(String text,
Properties props) |
static double[][] |
loadWeightsFromFile(String serializePath) |
static void |
main(String[] args)
The main method.
|
protected void |
makeAnswerArraysAndTagIndex(Collection<List<IN>> ob)
This routine builds the labelIndices, which give the empirically legal label sequences (of length (order) at most windowSize), and the classIndex, which indexes known answer classes. |
CRFDatum<List<String>,CRFLabel> |
makeDatum(List<IN> info,
int loc,
List<FeatureFactory<IN>> featureFactories)
Makes a CRFDatum by producing features and a label from input data at a
specific position, using the provided factory.
|
void |
printFactorTable(String filename,
DocumentReaderAndWriter<IN> readerAndWriter)
Takes the file, reads it in, and prints out the factor table at each position.
|
void |
printFactorTableDocument(List<IN> document)
|
void |
printFactorTableDocuments(ObjectBank<List<IN>> documents)
Takes a List of documents and prints the factor table at each point. |
protected void |
printFeatures() |
void |
printFirstOrderProbs(String filename,
DocumentReaderAndWriter<IN> readerAndWriter)
Takes the file, reads it in, and prints out the likelihood of each possible
label at each point.
|
void |
printFirstOrderProbsDocument(List<IN> document)
|
void |
printFirstOrderProbsDocuments(ObjectBank<List<IN>> documents)
Takes a List of documents and prints the likelihood of each possible label at each point. |
void |
printLabelInformation(String testFile,
DocumentReaderAndWriter<IN> readerAndWriter) |
void |
printLabelValue(List<IN> document) |
void |
printProbsDocument(List<IN> document)
|
protected void |
pruneNodeFeatureIndices(int totalNumOfFeatureSlices,
int numOfFeatureSlices) |
protected static void |
saveProcessedData(List datums,
String filename) |
void |
scaleWeights(double scale)
Scales the weights of this CRFClassifier by the specified weight.
|
void |
serializeClassifier(ObjectOutputStream oos)
Serialize the classifier to the given ObjectOutputStream.
|
void |
serializeClassifier(String serializePath)
Serialize a sequence classifier to a file on the given path.
|
void |
serializeClassIndex(String serializePath) |
void |
serializeFeatureIndex(String serializePath) |
protected void |
serializeTextClassifier(PrintWriter pw) |
void |
serializeTextClassifier(String serializePath)
Serialize the model to a human readable format.
|
void |
serializeWeights(String serializePath) |
double[][] |
to2D(double[] weights,
List<Index<CRFLabel>> labelIndices,
int[] map) |
Map<String,Counter<String>> |
topWeights() |
void |
train(Collection<List<IN>> objectBankWrapper,
DocumentReaderAndWriter<IN> readerAndWriter)
Trains a classifier from a Collection of sequences.
|
protected double[] |
trainWeights(int[][][][] data,
int[][] labels,
Evaluator[] evaluators,
int pruneFeatureItr,
double[][][][] featureVals) |
void |
updateWeightsForTest(double[] x) |
void |
writeWeights(PrintStream p) |
Methods inherited from class AbstractSequenceClassifier:
apply, backgroundSymbol, classify, classifyAndWriteAnswers, classifyAndWriteAnswers, classifyAndWriteAnswers, classifyAndWriteAnswers, classifyAndWriteAnswers, classifyAndWriteAnswers, classifyAndWriteAnswersKBest, classifyAndWriteAnswersKBest, classifyAndWriteViterbiSearchGraph, classifyFile, classifyFilesAndWriteAnswers, classifyFilesAndWriteAnswers, classifyKBest, classifyRaw, classifySentence, classifySentenceWithGlobalInformation, classifyStdin, classifyStdin, classifyToCharacterOffsets, classifyToString, classifyToString, classifyWithInlineXML, countResults, countResults, countResultsIOB, countResultsIOB2, countResultsSegmenter, defaultReaderAndWriter, finalizeClassification, getSampler, getViterbiSearchGraph, labels, loadClassifier, loadClassifier, loadClassifier, loadClassifier, loadClassifier, loadClassifier, loadClassifierNoExceptions, loadClassifierNoExceptions, loadClassifierNoExceptions, loadClassifierNoExceptions, loadClassifierNoExceptions, loadJarClassifier, makeObjectBankFromFile, makeObjectBankFromFile, makeObjectBankFromFiles, makeObjectBankFromFiles, makeObjectBankFromFiles, makeObjectBankFromReader, makeObjectBankFromString, makePlainTextReaderAndWriter, makeReaderAndWriter, plainTextReaderAndWriter, printFeatureLists, printFeatures, printProbs, printProbsDocuments, printResults, reinit, segmentString, segmentString, tallyOneEntityIOB, train, train, train, train, train, train, windowSize, writeAnswers
Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface java.util.function.Function:
andThen, compose, identity
public static final String DEFAULT_CLASSIFIER
protected CRFClassifier()
public CRFClassifier(Properties props)
public CRFClassifier(SeqClassifierFlags flags)
public CRFClassifier(CRFClassifier<IN> crf)
public int getNumWeights()
public void scaleWeights(double scale)
scale - The scale to multiply by

public void combine(CRFClassifier<IN> crf, double weight)
Combines weighted crf with this crf.
crf -
weight -

public void dropFeaturesBelowThreshold(double threshold)
public Triple<int[][][],int[],double[][][]> documentToDataAndLabels(List<IN> document)
document - Testing documents

public void printLabelInformation(String testFile, DocumentReaderAndWriter<IN> readerAndWriter) throws Exception
Throws: Exception
public Triple<int[][][][],int[][],double[][][][]> documentsToDataAndLabels(Collection<List<IN>> documents)
public List<Triple<int[][][],int[],double[][][]>> documentsToDataAndLabelsList(Collection<List<IN>> documents)
protected void printFeatures()
protected void makeAnswerArraysAndTagIndex(Collection<List<IN>> ob)
This routine builds the labelIndices, which give the empirically legal label sequences (of length (order) at most windowSize), and the classIndex, which indexes known answer classes.
ob - The training data: read from an ObjectBank, each item in it is a List<CoreLabel>.

public CRFDatum<List<String>,CRFLabel> makeDatum(List<IN> info, int loc, List<FeatureFactory<IN>> featureFactories)
info - The input data
loc - The position to build a datum at
featureFactories - The FeatureFactories to use to extract features

public void dumpFeatures(Collection<List<IN>> docs)
Does nothing by default.
Overrides: dumpFeatures in class AbstractSequenceClassifier<IN extends CoreMap>
public List<IN> classify(List<IN> document)
Classify a List of something that extends CoreMap. The classifications are added in place to the items of the document, which is also returned by this method.
Specified by: classify in class AbstractSequenceClassifier<IN extends CoreMap>
document - A List of something that extends CoreMap.
Returns: The same List, but with the elements annotated with their answers (stored under the CoreAnnotations.AnswerAnnotation key).

public SequenceModel getSequenceModel(List<IN> doc)
Specified by: getSequenceModel in class AbstractSequenceClassifier<IN extends CoreMap>
protected CliquePotentialFunction getCliquePotentialFunctionForTest()
public void updateWeightsForTest(double[] x)
public List<IN> classifyMaxEnt(List<IN> document)
Do standard sequence inference, using either Viterbi or Beam inference depending on the value of flags.inferenceType.
document - Document to classify. Classification happens in place. This document is modified.

public List<IN> classifyGibbs(List<IN> document) throws ClassNotFoundException, SecurityException, NoSuchMethodException, IllegalArgumentException, InstantiationException, IllegalAccessException, InvocationTargetException
public List<IN> classifyGibbs(List<IN> document, Triple<int[][][],int[],double[][][]> documentDataAndLabels) throws ClassNotFoundException, SecurityException, NoSuchMethodException, IllegalArgumentException, InstantiationException, IllegalAccessException, InvocationTargetException
public void printProbsDocument(List<IN> document)
Takes a List of something that extends CoreMap and prints the likelihood of each possible label at each point.
Specified by: printProbsDocument in class AbstractSequenceClassifier<IN extends CoreMap>
document - A List of something that extends CoreMap.

public void printFirstOrderProbs(String filename, DocumentReaderAndWriter<IN> readerAndWriter)
Takes the file, reads it in, and prints out the likelihood of each possible label at each point. See getCliqueTrees() for more.
filename - The path to the specified file

public void printFirstOrderProbsDocuments(ObjectBank<List<IN>> documents)
Takes a List of documents and prints the likelihood of each possible label at each point.

public void printFactorTable(String filename, DocumentReaderAndWriter<IN> readerAndWriter)
Takes the file, reads it in, and prints out the factor table at each position.
filename - The path to the specified file

public void printFactorTableDocuments(ObjectBank<List<IN>> documents)
Takes a List of documents and prints the factor table at each point.

public List<CRFCliqueTree<String>> getCliqueTrees(String filename, DocumentReaderAndWriter<IN> readerAndWriter)
Want to make arbitrary probability queries? Then this is the method for you.
public CRFCliqueTree<String> getCliqueTree(Triple<int[][][],int[],double[][][]> p)
public CRFCliqueTree<String> getCliqueTree(List<IN> document)
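As the overview notes, the clique-tree methods back arbitrary probability queries. The sketch below assumes (as suggested by printProbsDocument) that CRFCliqueTree exposes a per-position marginal via prob(position, labelIndex); verify that signature against the CRFCliqueTree API before relying on it.

```java
import java.util.List;
import edu.stanford.nlp.ie.crf.CRFClassifier;
import edu.stanford.nlp.ie.crf.CRFCliqueTree;
import edu.stanford.nlp.ling.CoreLabel;

public class MarginalDemo {
  // Prints, for each token position, the marginal probability of the label
  // indexed 0. The classifier and document are assumed to be set up elsewhere.
  static void showMarginals(CRFClassifier<CoreLabel> classifier,
                            List<CoreLabel> document) {
    CRFCliqueTree<String> tree = classifier.getCliqueTree(document);
    for (int i = 0; i < document.size(); i++) {
      // Assumption: prob(position, labelIndex) is the node marginal
      double p = tree.prob(i, 0);
      System.out.printf("position %d: p(label 0) = %.4f%n", i, p);
    }
  }
}
```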
protected Collection<List<IN>> loadAuxiliaryData(Collection<List<IN>> docs, DocumentReaderAndWriter<IN> readerAndWriter)
public void train(Collection<List<IN>> objectBankWrapper, DocumentReaderAndWriter<IN> readerAndWriter)
Trains a classifier from a Collection of sequences.
Specified by: train in class AbstractSequenceClassifier<IN extends CoreMap>
objectBankWrapper - An ObjectBank or a collection of sequences of IN
readerAndWriter - A DocumentReaderAndWriter to use when loading test files

protected void pruneNodeFeatureIndices(int totalNumOfFeatureSlices, int numOfFeatureSlices)
protected CRFLogConditionalObjectiveFunction getObjectiveFunction(int[][][][] data, int[][] labels)
protected double[] trainWeights(int[][][][] data, int[][] labels, Evaluator[] evaluators, int pruneFeatureItr, double[][][][] featureVals)
public Minimizer<DiffFunction> getMinimizer()
public Minimizer<DiffFunction> getMinimizer(int featurePruneIteration, Evaluator[] evaluators)
protected List<CRFDatum<? extends Collection<String>,? extends CharSequence>> extractDatumSequence(int[][][] allData, int beginPosition, int endPosition, List<IN> labeledWordInfos)
protected void addProcessedData(List<List<CRFDatum<Collection<String>,String>>> processedData, int[][][][] data, int[][] labels, double[][][][] featureVals, int offset)
processedData - a List of Lists of CRFDatums

protected static List<List<CRFDatum<Collection<String>,String>>> loadProcessedData(String filename)

protected void loadTextClassifier(BufferedReader br) throws Exception
Throws: Exception

public void loadTextClassifier(String text, Properties props) throws ClassCastException, IOException, ClassNotFoundException, InstantiationException, IllegalAccessException

protected void serializeTextClassifier(PrintWriter pw) throws Exception
Throws: Exception

public void serializeTextClassifier(String serializePath)
Serialize the model to a human readable format.
serializePath - File to write text format of classifier to.

public void serializeClassIndex(String serializePath)
public void serializeWeights(String serializePath)
public static double[][] loadWeightsFromFile(String serializePath)
public void serializeFeatureIndex(String serializePath)
public void serializeClassifier(String serializePath)
Serialize a sequence classifier to a file on the given path.
Specified by: serializeClassifier in class AbstractSequenceClassifier<IN extends CoreMap>
serializePath - The path/filename to write the classifier to.

public void serializeClassifier(ObjectOutputStream oos)
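A serialize-then-reload round trip with the methods documented here might look like the following sketch (the model path is a placeholder):

```java
import edu.stanford.nlp.ie.crf.CRFClassifier;
import edu.stanford.nlp.ling.CoreLabel;

public class RoundTrip {
  // Writes the classifier to disk and reads it back via the static factory.
  static CRFClassifier<CoreLabel> reload(CRFClassifier<CoreLabel> crf)
      throws Exception {
    crf.serializeClassifier("model.ser.gz");             // placeholder path
    return CRFClassifier.getClassifier("model.ser.gz");  // deserialize
  }
}
```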
public void loadClassifier(ObjectInputStream ois, Properties props) throws ClassCastException, IOException, ClassNotFoundException
Note: This method does not close the ObjectInputStream. (But earlier versions of the code used to, so beware....)
Specified by: loadClassifier in class AbstractSequenceClassifier<IN extends CoreMap>
ois - The InputStream to load the serialized classifier from
props - This Properties object will be used to update the SeqClassifierFlags which are read from the serialized classifier
Throws:
ClassCastException - If there are problems interpreting the serialized data
IOException - If there are problems accessing the input stream
ClassNotFoundException - If there are problems interpreting the serialized data

public void loadDefaultClassifier()
public void loadTagIndex()
public void loadDefaultClassifier(Properties props)
public static <INN extends CoreMap> CRFClassifier<INN> getDefaultClassifier()
public static <INN extends CoreMap> CRFClassifier<INN> getDefaultClassifier(Properties props)
public static <INN extends CoreMap> CRFClassifier<INN> getJarClassifier(String resourceName, Properties props)
resourceName - Name of classifier resource inside the jar file.

public static <INN extends CoreMap> CRFClassifier<INN> getClassifier(File file) throws IOException, ClassCastException, ClassNotFoundException
file - File to load classifier from
Throws:
IOException - If there are problems accessing the input stream
ClassCastException - If there are problems interpreting the serialized data
ClassNotFoundException - If there are problems interpreting the serialized data

public static <INN extends CoreMap> CRFClassifier<INN> getClassifier(InputStream in) throws IOException, ClassCastException, ClassNotFoundException
in - InputStream to load classifier from
Throws:
IOException - If there are problems accessing the input stream
ClassCastException - If there are problems interpreting the serialized data
ClassNotFoundException - If there are problems interpreting the serialized data

public static <INN extends CoreMap> CRFClassifier<INN> getClassifierNoExceptions(String loadPath)
public static CRFClassifier<CoreLabel> getClassifier(String loadPath) throws IOException, ClassCastException, ClassNotFoundException
public static <INN extends CoreMap> CRFClassifier<INN> getClassifier(String loadPath, Properties props) throws IOException, ClassCastException, ClassNotFoundException
public static void main(String[] args) throws Exception
Throws: Exception
public List<IN> classifyWithGlobalInformation(List<IN> tokenSeq, CoreMap doc, CoreMap sent)
Classify a List of something that extends CoreMap, using as additional information whatever is stored in the document and sentence. This is needed for SUTime (NumberSequenceClassifier), which requires the document date to resolve relative dates.
Specified by: classifyWithGlobalInformation in class AbstractSequenceClassifier<IN extends CoreMap>
public void writeWeights(PrintStream p)