public class CRFClassifier<IN extends CoreMap> extends AbstractSequenceClassifier<IN>
When ColumnDocumentReaderAndWriter is used for training or testing
models, input files are expected to be one token per line, with the
columns indicating things like the word, POS, chunk, and answer class.
The default for ColumnDocumentReaderAndWriter training data is 3-column
input, with the columns containing a word, its POS, and its gold class,
but this can be specified via the map property.
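As a rough illustration of the column format, the sketch below reads one-token-per-line input according to a map specification. The spec string word=0,tag=1,answer=2, the sample tokens, and the names ColumnFormatSketch, parseMap, and parseLine are assumptions made for this example; it is not the reader's actual implementation, and the exact syntax of the map property should be checked against its own documentation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ColumnFormatSketch {

    // Parse a spec such as "word=0,tag=1,answer=2" into column name -> column index.
    public static Map<String, Integer> parseMap(String spec) {
        Map<String, Integer> columns = new LinkedHashMap<>();
        for (String entry : spec.split(",")) {
            String[] kv = entry.trim().split("=");
            columns.put(kv[0].trim(), Integer.parseInt(kv[1].trim()));
        }
        return columns;
    }

    // Split one whitespace-separated token line into named fields.
    public static Map<String, String> parseLine(String line, Map<String, Integer> columns) {
        String[] cells = line.trim().split("\\s+");
        Map<String, String> token = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : columns.entrySet()) {
            token.put(e.getKey(), cells[e.getValue()]);
        }
        return token;
    }

    public static void main(String[] args) {
        // Assumed 3-column layout: word, POS tag, gold answer class.
        Map<String, Integer> columns = parseMap("word=0,tag=1,answer=2");
        String[] lines = { "Stanford\tNNP\tORGANIZATION", "is\tVBZ\tO" };
        for (String line : lines) {
            System.out.println(parseLine(line, columns));
        }
    }
}
```

Changing the spec string (for example to a two-column word and answer layout) changes which column is read for each field without touching the parsing code.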
When run on a file with -textFile or -textFiles, the file is assumed to
be plain English text (or perhaps simple HTML/XML), and a reasonable
attempt is made at English tokenization by
PlainTextDocumentReaderAndWriter. The class used to read the text can be
changed with -plainTextDocumentReaderAndWriter. Extra options can be
supplied to the tokenizer using the -tokenizerOptions flag.
To read from stdin, use the flag -readStdin. The same reader/writer will be used as for -textFile.
Typical command-line usage
For running a trained model with a provided serialized classifier on a text file:
java -mx500m edu.stanford.nlp.ie.crf.CRFClassifier -loadClassifier conll.ner.gz -textFile sampleSentences.txt
When specifying all parameters in a properties file (train, test, or runtime):
java -mx1g edu.stanford.nlp.ie.crf.CRFClassifier -prop propFile
To train and test a simple NER model from the command line:
java -mx1g edu.stanford.nlp.ie.crf.CRFClassifier -trainFile trainFile -testFile testFile -macro > output
To train with multiple files:
java -mx1g edu.stanford.nlp.ie.crf.CRFClassifier -trainFileList file1,file2,... -testFile testFile -macro > output
To test on multiple files, use the -testFiles option with a comma-separated list of files.
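For the -prop form above, the properties file can be written by hand or generated. The following sketch builds one with java.util.Properties. The keys trainFile, testFile, and map mirror the flags and the property named on this page; the values, including the map layout, and the class name PropFileSketch are placeholders assumed for illustration.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

public class PropFileSketch {

    // Build the properties in memory; all values are illustrative placeholders.
    public static Properties buildProps() {
        Properties props = new Properties();
        props.setProperty("trainFile", "trainFile");
        props.setProperty("testFile", "testFile");
        props.setProperty("map", "word=0,tag=1,answer=2"); // assumed column layout
        return props;
    }

    // Render the properties in the standard key=value file format.
    // Note: store() escapes '=' inside values with a backslash; load() reverses this.
    public static String render(Properties props) {
        StringWriter out = new StringWriter();
        try {
            props.store(out, "Sketch of a CRFClassifier properties file");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(render(buildProps()));
    }
}
```

Writing the rendered text to a file would give something that could be passed as the propFile argument of -prop.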
Features are defined by a FeatureFactory. NERFeatureFactory is used by
default, and you should look there for feature templates and properties
or flags that will cause certain features to be used when training an
NER classifier. There are also various feature factories for Chinese
word segmentation such as ChineseSegmenterFeatureFactory.
Features are specified either by a Properties file (which is the
recommended method) or by flags on the command line. The flags are read
into a SeqClassifierFlags object, which the user need not be concerned
with, unless wishing to add new features.
CRFClassifier may also be used programmatically. When creating a new
instance, you must specify a Properties object. You may then call train
methods to train a classifier, or load a classifier. The other way to
get a CRFClassifier is to deserialize one via the static
getClassifier(String) methods, which return a deserialized classifier.
You may then tag (classify the items of) documents using either the
assorted classify() methods here or the additional ones in
AbstractSequenceClassifier.
Probabilities assigned by the CRF can be interrogated using either the
printProbsDocument() or getCliqueTrees() methods.
Modifier and Type | Field and Description |
---|---|
static java.lang.String | DEFAULT_CLASSIFIER - Name of default serialized classifier resource to look for in a jar file. |
Fields inherited from class AbstractSequenceClassifier: classIndex, featureFactories, flags, knownLCWords, pad, windowSize
Modifier | Constructor and Description |
---|---|
protected | CRFClassifier() |
public | CRFClassifier(CRFClassifier<IN> crf) - Makes a copy of the crf classifier. |
public | CRFClassifier(java.util.Properties props) |
public | CRFClassifier(SeqClassifierFlags flags) |
Modifier and Type | Method and Description |
---|---|
protected void | addProcessedData(java.util.List<java.util.List<CRFDatum<java.util.Collection<java.lang.String>,java.lang.String>>> processedData, int[][][][] data, int[][] labels, double[][][][] featureVals, int offset) - Adds the List of Lists of CRFDatums to the data and labels arrays, treating each datum as if it were its own document. |
protected static Index<CRFLabel> | allLabels(int window, Index<java.lang.String> classIndex) |
java.util.List<IN> | classify(java.util.List<IN> document) - Classify a List of something that extends CoreMap. |
java.util.List<IN> | classifyGibbs(java.util.List<IN> document) |
java.util.List<IN> | classifyGibbs(java.util.List<IN> document, Triple<int[][][],int[],double[][][]> documentDataAndLabels) |
java.util.List<IN> | classifyMaxEnt(java.util.List<IN> document) - Do standard sequence inference, using either Viterbi or Beam inference depending on the value of flags.inferenceType. |
java.util.List<IN> | classifyWithGlobalInformation(java.util.List<IN> tokenSeq, CoreMap doc, CoreMap sent) - Classify a List of something that extends CoreMap using as additional information whatever is stored in the document and sentence. |
void | combine(CRFClassifier<IN> crf, double weight) - Combines weighted crf with this crf. |
Triple<int[][][][],int[][],double[][][][]> | documentsToDataAndLabels(java.util.Collection<java.util.List<IN>> documents) - Convert an ObjectBank to arrays of data features and labels. |
java.util.List<Triple<int[][][],int[],double[][][]>> | documentsToDataAndLabelsList(java.util.Collection<java.util.List<IN>> documents) - Convert an ObjectBank to corresponding collection of data features and labels. |
Triple<int[][][],int[],double[][][]> | documentToDataAndLabels(java.util.List<IN> document) - Convert a document List into arrays storing the data features and labels. |
void | dropFeaturesBelowThreshold(double threshold) |
void | dumpFeatures(java.util.Collection<java.util.List<IN>> docs) - Does nothing by default. |
protected java.util.List<CRFDatum<? extends java.util.Collection<java.lang.String>,? extends java.lang.CharSequence>> | extractDatumSequence(int[][][] allData, int beginPosition, int endPosition, java.util.List<IN> labeledWordInfos) - Creates a new CRFDatum from the preprocessed allData format, given the document number, position number, and a List of Object labels. |
static <INN extends CoreMap> CRFClassifier<INN> | getClassifier(java.io.File file) - Loads a CRF classifier from a filepath, and returns it. |
static <INN extends CoreMap> CRFClassifier<INN> | getClassifier(java.io.InputStream in) - Loads a CRF classifier from an InputStream, and returns it. |
static <INN extends CoreMap> CRFClassifier<INN> | getClassifier(java.io.ObjectInputStream ois) |
static <INN extends CoreMap> CRFClassifier<INN> | getClassifier(java.io.ObjectInputStream ois, java.util.Properties props) |
static CRFClassifier<CoreLabel> | getClassifier(java.lang.String loadPath) |
static <INN extends CoreMap> CRFClassifier<INN> | getClassifier(java.lang.String loadPath, java.util.Properties props) |
static <INN extends CoreMap> CRFClassifier<INN> | getClassifierNoExceptions(java.lang.String loadPath) |
protected CliquePotentialFunction | getCliquePotentialFunctionForTest() |
CRFCliqueTree<java.lang.String> | getCliqueTree(java.util.List<IN> document) |
CRFCliqueTree<java.lang.String> | getCliqueTree(Triple<int[][][],int[],double[][][]> p) |
java.util.List<CRFCliqueTree<java.lang.String>> | getCliqueTrees(java.lang.String filename, DocumentReaderAndWriter<IN> readerAndWriter) - Want to make arbitrary probability queries? Then this is the method for you. |
static <INN extends CoreMap> CRFClassifier<INN> | getDefaultClassifier() - Used to get the default supplied classifier inside the jar file. |
static <INN extends CoreMap> CRFClassifier<INN> | getDefaultClassifier(java.util.Properties props) - Used to get the default supplied classifier inside the jar file. |
Minimizer<DiffFunction> | getMinimizer() |
Minimizer<DiffFunction> | getMinimizer(int featurePruneIteration, Evaluator[] evaluators) |
int | getNumWeights() - Returns the total number of weights associated with this classifier. |
protected CRFLogConditionalObjectiveFunction | getObjectiveFunction(int[][][][] data, int[][] labels) |
SequenceModel | getSequenceModel(java.util.List<IN> doc) |
protected java.util.Collection<java.util.List<IN>> | loadAuxiliaryData(java.util.Collection<java.util.List<IN>> docs, DocumentReaderAndWriter<IN> readerAndWriter) - Load auxiliary data to be used in constructing features and labels. Intended to be overridden by subclasses. |
void | loadClassifier(java.io.ObjectInputStream ois, java.util.Properties props) - Loads a classifier from the specified InputStream. |
static Index<java.lang.String> | loadClassIndexFromFile(java.lang.String serializePath) |
void | loadDefaultClassifier() - This is used to load the default supplied classifier stored within the jar file. |
void | loadDefaultClassifier(java.util.Properties props) - This is used to load the default supplied classifier stored within the jar file. |
static Index<java.lang.String> | loadFeatureIndexFromFile(java.lang.String serializePath) |
protected static java.util.List<java.util.List<CRFDatum<java.util.Collection<java.lang.String>,java.lang.String>>> | loadProcessedData(java.lang.String filename) |
void | loadTagIndex() |
protected void | loadTextClassifier(java.io.BufferedReader br) |
void | loadTextClassifier(java.lang.String text, java.util.Properties props) |
static double[][] | loadWeightsFromFile(java.lang.String serializePath) |
static void | main(java.lang.String[] args) - The main method. |
protected void | makeAnswerArraysAndTagIndex(java.util.Collection<java.util.List<IN>> ob) - This routine builds the labelIndices which give the empirically legal label sequences (of length (order) at most windowSize) and the classIndex, which indexes known answer classes. |
CRFDatum<java.util.Collection<java.lang.String>,CRFLabel> | makeDatum(java.util.List<IN> info, int loc, java.util.List<FeatureFactory<IN>> featureFactories) - Makes a CRFDatum by producing features and a label from input data at a specific position, using the provided factory. |
void | printFactorTable(java.lang.String filename, DocumentReaderAndWriter<IN> readerAndWriter) - Takes the file, reads it in, and prints out the factor table at each position. |
void | printFactorTableDocument(java.util.List<IN> document) - Takes a List of something that extends CoreMap and prints the factor table at each point. |
void | printFactorTableDocuments(ObjectBank<java.util.List<IN>> documents) - Takes a List of documents and prints the factor table at each point. |
protected void | printFeatures() |
void | printFirstOrderProbs(java.lang.String filename, DocumentReaderAndWriter<IN> readerAndWriter) - Takes the file, reads it in, and prints out the likelihood of each possible label at each point. |
void | printFirstOrderProbsDocument(java.util.List<IN> document) - Takes a List of something that extends CoreMap and prints the likelihood of each possible label at each point. |
void | printFirstOrderProbsDocuments(ObjectBank<java.util.List<IN>> documents) - Takes a List of documents and prints the likelihood of each possible label at each point. |
void | printLabelInformation(java.lang.String testFile, DocumentReaderAndWriter<IN> readerAndWriter) |
void | printLabelValue(java.util.List<IN> document) |
Triple<Counter<java.lang.Integer>,Counter<java.lang.Integer>,TwoDimensionalCounter<java.lang.Integer,java.lang.String>> | printProbsDocument(java.util.List<IN> document) - Takes a List of something that extends CoreMap and prints the likelihood of each possible label at each point. |
protected void | pruneNodeFeatureIndices(int totalNumOfFeatureSlices, int numOfFeatureSlices) |
protected static void | saveProcessedData(java.util.List<?> datums, java.lang.String filename) |
void | scaleWeights(double scale) - Scales the weights of this CRFClassifier by the specified weight. |
void | serializeClassifier(java.io.ObjectOutputStream oos) - Serialize the classifier to the given ObjectOutputStream. |
void | serializeClassifier(java.lang.String serializePath) - Serialize a sequence classifier to a file on the given path. |
void | serializeClassIndex(java.lang.String serializePath) |
void | serializeFeatureIndex(java.lang.String serializePath) |
protected void | serializeTextClassifier(java.io.PrintWriter pw) |
void | serializeTextClassifier(java.lang.String serializePath) - Serialize the model to a human readable format. |
void | serializeWeights(java.lang.String serializePath) |
static float[][] | to2D(double[] weights, java.util.List<Index<CRFLabel>> labelIndices, int[] map) |
java.util.Map<java.lang.String,Counter<java.lang.String>> | topWeights() |
java.lang.String | toString() |
void | train(java.util.Collection<java.util.List<IN>> objectBankWrapper, DocumentReaderAndWriter<IN> readerAndWriter) - Trains a classifier from a Collection of sequences. |
protected double[] | trainWeights(int[][][][] data, int[][] labels, Evaluator[] evaluators, int pruneFeatureItr, double[][][][] featureVals) |
void | updateWeightsForTest(double[] x) |
void | writeWeights(java.io.PrintStream p) |
java.util.List<Counter<java.lang.String>> | zeroOrderProbabilities(java.util.List<IN> document) |
Methods inherited from class AbstractSequenceClassifier: apply, backgroundSymbol, classify, classifyAndWriteAnswers, classifyAndWriteAnswers, classifyAndWriteAnswers, classifyAndWriteAnswers, classifyAndWriteAnswers, classifyAndWriteAnswers, classifyAndWriteAnswers, classifyAndWriteAnswersKBest, classifyAndWriteAnswersKBest, classifyAndWriteViterbiSearchGraph, classifyFile, classifyFilesAndWriteAnswers, classifyFilesAndWriteAnswers, classifyKBest, classifyRaw, classifySentence, classifySentenceWithGlobalInformation, classifyStdin, classifyStdin, classifyToCharacterOffsets, classifyToString, classifyToString, classifyWithInlineXML, countResults, countResultsSegmenter, defaultReaderAndWriter, finalizeClassification, getKnownLCWords, getSampler, labels, loadClassifier, loadClassifier, loadClassifier, loadClassifier, loadClassifier, loadClassifier, loadClassifierNoExceptions, loadClassifierNoExceptions, loadClassifierNoExceptions, loadClassifierNoExceptions, loadClassifierNoExceptions, makeObjectBankFromFile, makeObjectBankFromFile, makeObjectBankFromFiles, makeObjectBankFromFiles, makeObjectBankFromFiles, makeObjectBankFromReader, makeObjectBankFromString, makePlainTextReaderAndWriter, makePlainTextReaderAndWriter, makeReaderAndWriter, plainTextReaderAndWriter, printFeatureLists, printFeatures, printProbs, printProbs, printProbsDocuments, printResults, reinit, segmentString, segmentString, train, train, train, train, train, train, windowSize, writeAnswers
public static final java.lang.String DEFAULT_CLASSIFIER
protected CRFClassifier()
public CRFClassifier(java.util.Properties props)
public CRFClassifier(SeqClassifierFlags flags)
public CRFClassifier(CRFClassifier<IN> crf)
public int getNumWeights()
public void scaleWeights(double scale)
Scales the weights of this CRFClassifier by the specified weight.
Parameters:
scale - The scale to multiply by
public void combine(CRFClassifier<IN> crf, double weight)
Combines weighted crf with this crf.
Parameters:
crf - Other CRF whose weights to combine into this CRF
weight - Amount to scale the other CRF's weights by
public void dropFeaturesBelowThreshold(double threshold)
public Triple<int[][][],int[],double[][][]> documentToDataAndLabels(java.util.List<IN> document)
Convert a document List into arrays storing the data features and labels.
Parameters:
document - Testing documents
public void printLabelInformation(java.lang.String testFile, DocumentReaderAndWriter<IN> readerAndWriter) throws java.lang.Exception
Throws:
java.lang.Exception
public void printLabelValue(java.util.List<IN> document)
public Triple<int[][][][],int[][],double[][][][]> documentsToDataAndLabels(java.util.Collection<java.util.List<IN>> documents)
public java.util.List<Triple<int[][][],int[],double[][][]>> documentsToDataAndLabelsList(java.util.Collection<java.util.List<IN>> documents)
protected void printFeatures()
protected void makeAnswerArraysAndTagIndex(java.util.Collection<java.util.List<IN>> ob)
This routine builds the labelIndices which give the empirically legal label sequences (of length (order) at most windowSize) and the classIndex, which indexes known answer classes.
Parameters:
ob - The training data: Read from an ObjectBank, each item in it is a List<CoreLabel>.
protected static Index<CRFLabel> allLabels(int window, Index<java.lang.String> classIndex)
public CRFDatum<java.util.Collection<java.lang.String>,CRFLabel> makeDatum(java.util.List<IN> info, int loc, java.util.List<FeatureFactory<IN>> featureFactories)
Makes a CRFDatum by producing features and a label from input data at a specific position, using the provided factory.
Parameters:
info - The input data. Particular feature factories might look for arbitrary keys in the IN items.
loc - The position to build a datum at
featureFactories - The FeatureFactories to use to extract features
public void dumpFeatures(java.util.Collection<java.util.List<IN>> docs)
Description copied from class: AbstractSequenceClassifier
Does nothing by default.
Overrides:
dumpFeatures in class AbstractSequenceClassifier<IN extends CoreMap>
public java.util.List<IN> classify(java.util.List<IN> document)
Description copied from class: AbstractSequenceClassifier
Classify a List of something that extends CoreMap.
The classifications are added in place to the items of the document,
which is also returned by this method.
Warning: In many circumstances, you should not call this method directly.
In particular, if you call this method directly, your document will not be preprocessed
to add things like word distributional similarity class or word shape features that your
classifier may rely on to work correctly. In such cases, you should call
classifySentence instead.
Overrides:
classify in class AbstractSequenceClassifier<IN extends CoreMap>
Parameters:
document - A List of something that extends CoreMap.
Returns:
The same List, but with the elements annotated with their
answers (stored under the CoreAnnotations.AnswerAnnotation key).
The answers will be the class labels defined by the CRF
Classifier. They might be things like entity labels (in BIO
notation or not) or something like "1" vs. "0" on whether to
begin a new token here or not (in word segmentation).
public SequenceModel getSequenceModel(java.util.List<IN> doc)
Overrides:
getSequenceModel in class AbstractSequenceClassifier<IN extends CoreMap>
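The return value of classify, described above, may carry entity labels in BIO notation. Purely as an illustration of how such labels can be read, the sketch below groups BIO-style labels into entity spans. The B-/I- prefix convention, the sample tokens and labels, and the names BioSpanSketch and toSpans are assumptions for this example and are not part of the CRFClassifier API.

```java
import java.util.ArrayList;
import java.util.List;

public class BioSpanSketch {

    // Collect "TYPE: token token ..." strings from parallel token/label arrays.
    public static List<String> toSpans(String[] tokens, String[] labels) {
        List<String> spans = new ArrayList<>();
        StringBuilder current = null;   // text of the span being built, if any
        String currentType = null;      // entity type of that span
        for (int i = 0; i < tokens.length; i++) {
            String label = labels[i];
            boolean continues = label.startsWith("I-")
                    && current != null
                    && label.substring(2).equals(currentType);
            if (continues) {
                current.append(' ').append(tokens[i]);
                continue;
            }
            // Any label that does not continue the open span closes it.
            if (current != null) {
                spans.add(currentType + ": " + current);
                current = null;
                currentType = null;
            }
            // A B- label (or a stray I- label) opens a new span; "O" opens nothing.
            if (label.startsWith("B-") || label.startsWith("I-")) {
                currentType = label.substring(2);
                current = new StringBuilder(tokens[i]);
            }
        }
        if (current != null) {
            spans.add(currentType + ": " + current);
        }
        return spans;
    }

    public static void main(String[] args) {
        String[] tokens = { "Jane", "Doe", "visited", "Palo", "Alto" };
        String[] labels = { "B-PER", "I-PER", "O", "B-LOC", "I-LOC" };
        System.out.println(toSpans(tokens, labels));
    }
}
```

In real use the labels would be read from each item's answer annotation rather than from a hand-written array.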
protected CliquePotentialFunction getCliquePotentialFunctionForTest()
public void updateWeightsForTest(double[] x)
public java.util.List<IN> classifyMaxEnt(java.util.List<IN> document)
Do standard sequence inference, using either Viterbi or Beam inference depending on the value of flags.inferenceType.
Parameters:
document - Document to classify. Classification happens in place. This document is modified.
public java.util.List<IN> classifyGibbs(java.util.List<IN> document) throws java.lang.ClassNotFoundException, java.lang.SecurityException, java.lang.NoSuchMethodException, java.lang.IllegalArgumentException, java.lang.InstantiationException, java.lang.IllegalAccessException, java.lang.reflect.InvocationTargetException
Throws:
java.lang.ClassNotFoundException
java.lang.SecurityException
java.lang.NoSuchMethodException
java.lang.IllegalArgumentException
java.lang.InstantiationException
java.lang.IllegalAccessException
java.lang.reflect.InvocationTargetException
public java.util.List<IN> classifyGibbs(java.util.List<IN> document, Triple<int[][][],int[],double[][][]> documentDataAndLabels) throws java.lang.ClassNotFoundException, java.lang.SecurityException, java.lang.NoSuchMethodException, java.lang.IllegalArgumentException, java.lang.InstantiationException, java.lang.IllegalAccessException, java.lang.reflect.InvocationTargetException
Throws:
java.lang.ClassNotFoundException
java.lang.SecurityException
java.lang.NoSuchMethodException
java.lang.IllegalArgumentException
java.lang.InstantiationException
java.lang.IllegalAccessException
java.lang.reflect.InvocationTargetException
public Triple<Counter<java.lang.Integer>,Counter<java.lang.Integer>,TwoDimensionalCounter<java.lang.Integer,java.lang.String>> printProbsDocument(java.util.List<IN> document)
Takes a List of something that extends CoreMap and prints the likelihood of each possible label at each point.
Overrides:
printProbsDocument in class AbstractSequenceClassifier<IN extends CoreMap>
Parameters:
document - A List of something that extends CoreMap.
public java.util.List<Counter<java.lang.String>> zeroOrderProbabilities(java.util.List<IN> document)
public void printFirstOrderProbs(java.lang.String filename, DocumentReaderAndWriter<IN> readerAndWriter)
Takes the file, reads it in, and prints out the likelihood of each possible label at each point. See getCliqueTrees() for more.
Parameters:
filename - The path to the specified file
public void printFirstOrderProbsDocuments(ObjectBank<java.util.List<IN>> documents)
Takes a List of documents and prints the likelihood of each possible label at each point.
Parameters:
documents - A List of List of INs.
public void printFactorTable(java.lang.String filename, DocumentReaderAndWriter<IN> readerAndWriter)
Takes the file, reads it in, and prints out the factor table at each position.
Parameters:
filename - The path to the specified file
public void printFactorTableDocuments(ObjectBank<java.util.List<IN>> documents)
Takes a List of documents and prints the factor table at each point.
Parameters:
documents - A List of List of INs.
public java.util.List<CRFCliqueTree<java.lang.String>> getCliqueTrees(java.lang.String filename, DocumentReaderAndWriter<IN> readerAndWriter)
Want to make arbitrary probability queries? Then this is the method for you.
public CRFCliqueTree<java.lang.String> getCliqueTree(Triple<int[][][],int[],double[][][]> p)
public CRFCliqueTree<java.lang.String> getCliqueTree(java.util.List<IN> document)
public void printFactorTableDocument(java.util.List<IN> document)
Takes a List of something that extends CoreMap and prints the factor table at each point.
Parameters:
document - A List of something that extends CoreMap.
public void printFirstOrderProbsDocument(java.util.List<IN> document)
Takes a List of something that extends CoreMap and prints the likelihood of each possible label at each point.
Parameters:
document - A List of something that extends CoreMap.
protected java.util.Collection<java.util.List<IN>> loadAuxiliaryData(java.util.Collection<java.util.List<IN>> docs, DocumentReaderAndWriter<IN> readerAndWriter)
Load auxiliary data to be used in constructing features and labels. Intended to be overridden by subclasses.
public void train(java.util.Collection<java.util.List<IN>> objectBankWrapper, DocumentReaderAndWriter<IN> readerAndWriter)
Trains a classifier from a Collection of sequences.
Overrides:
train in class AbstractSequenceClassifier<IN extends CoreMap>
Parameters:
objectBankWrapper - An ObjectBank or a collection of sequences of IN
readerAndWriter - A DocumentReaderAndWriter to use when loading test files
public static float[][] to2D(double[] weights, java.util.List<Index<CRFLabel>> labelIndices, int[] map)
protected void pruneNodeFeatureIndices(int totalNumOfFeatureSlices, int numOfFeatureSlices)
protected CRFLogConditionalObjectiveFunction getObjectiveFunction(int[][][][] data, int[][] labels)
protected double[] trainWeights(int[][][][] data, int[][] labels, Evaluator[] evaluators, int pruneFeatureItr, double[][][][] featureVals)
public Minimizer<DiffFunction> getMinimizer()
public Minimizer<DiffFunction> getMinimizer(int featurePruneIteration, Evaluator[] evaluators)
protected java.util.List<CRFDatum<? extends java.util.Collection<java.lang.String>,? extends java.lang.CharSequence>> extractDatumSequence(int[][][] allData, int beginPosition, int endPosition, java.util.List<IN> labeledWordInfos)
protected void addProcessedData(java.util.List<java.util.List<CRFDatum<java.util.Collection<java.lang.String>,java.lang.String>>> processedData, int[][][][] data, int[][] labels, double[][][][] featureVals, int offset)
Adds the List of Lists of CRFDatums to the data and labels arrays, treating each datum as if it were its own document.
Parameters:
processedData - A List of Lists of CRFDatums
protected static void saveProcessedData(java.util.List<?> datums, java.lang.String filename)
protected static java.util.List<java.util.List<CRFDatum<java.util.Collection<java.lang.String>,java.lang.String>>> loadProcessedData(java.lang.String filename)
protected void loadTextClassifier(java.io.BufferedReader br) throws java.lang.Exception
Throws:
java.lang.Exception
public void loadTextClassifier(java.lang.String text, java.util.Properties props) throws java.lang.ClassCastException, java.io.IOException, java.lang.ClassNotFoundException, java.lang.InstantiationException, java.lang.IllegalAccessException
Throws:
java.lang.ClassCastException
java.io.IOException
java.lang.ClassNotFoundException
java.lang.InstantiationException
java.lang.IllegalAccessException
protected void serializeTextClassifier(java.io.PrintWriter pw) throws java.lang.Exception
Throws:
java.lang.Exception
public void serializeTextClassifier(java.lang.String serializePath)
Serialize the model to a human readable format.
Parameters:
serializePath - File to write text format of classifier to.
public void serializeClassIndex(java.lang.String serializePath)
public static Index<java.lang.String> loadClassIndexFromFile(java.lang.String serializePath)
public void serializeWeights(java.lang.String serializePath)
public static double[][] loadWeightsFromFile(java.lang.String serializePath)
public void serializeFeatureIndex(java.lang.String serializePath)
public static Index<java.lang.String> loadFeatureIndexFromFile(java.lang.String serializePath)
public void serializeClassifier(java.lang.String serializePath)
Serialize a sequence classifier to a file on the given path.
Overrides:
serializeClassifier in class AbstractSequenceClassifier<IN extends CoreMap>
Parameters:
serializePath - The path/filename to write the classifier to.
public void serializeClassifier(java.io.ObjectOutputStream oos)
Serialize the classifier to the given ObjectOutputStream.
Overrides:
serializeClassifier in class AbstractSequenceClassifier<IN extends CoreMap>
public void loadClassifier(java.io.ObjectInputStream ois, java.util.Properties props) throws java.lang.ClassCastException, java.io.IOException, java.lang.ClassNotFoundException
Loads a classifier from the specified InputStream.
Note: This method does not close the ObjectInputStream. (But earlier versions of the code used to, so beware.)
Overrides:
loadClassifier in class AbstractSequenceClassifier<IN extends CoreMap>
Parameters:
ois - The InputStream to load the serialized classifier from
props - This Properties object will be used to update the SeqClassifierFlags which are read from the serialized classifier
Throws:
java.lang.ClassCastException - If there are problems interpreting the serialized data
java.io.IOException - If there are problems accessing the input stream
java.lang.ClassNotFoundException - If there are problems interpreting the serialized data
public void loadDefaultClassifier()
This is used to load the default supplied classifier stored within the jar file.
public void loadTagIndex()
public void writeWeights(java.io.PrintStream p)
public java.util.Map<java.lang.String,Counter<java.lang.String>> topWeights()
public java.util.List<IN> classifyWithGlobalInformation(java.util.List<IN> tokenSeq, CoreMap doc, CoreMap sent)
Description copied from class: AbstractSequenceClassifier
Classify a List of something that extends CoreMap using as
additional information whatever is stored in the document and sentence.
This is needed for SUTime (NumberSequenceClassifier), which requires
the document date to resolve relative dates.
Overrides:
classifyWithGlobalInformation in class AbstractSequenceClassifier<IN extends CoreMap>
Parameters:
tokenSeq - A List of something that extends CoreMap
public void loadDefaultClassifier(java.util.Properties props)
This is used to load the default supplied classifier stored within the jar file.
public static <INN extends CoreMap> CRFClassifier<INN> getDefaultClassifier()
public static <INN extends CoreMap> CRFClassifier<INN> getDefaultClassifier(java.util.Properties props)
public static <INN extends CoreMap> CRFClassifier<INN> getClassifier(java.io.File file) throws java.io.IOException, java.lang.ClassCastException, java.lang.ClassNotFoundException
Loads a CRF classifier from a filepath, and returns it.
Parameters:
file - File to load classifier from
Throws:
java.io.IOException - If there are problems accessing the input stream
java.lang.ClassCastException - If there are problems interpreting the serialized data
java.lang.ClassNotFoundException - If there are problems interpreting the serialized data
public static <INN extends CoreMap> CRFClassifier<INN> getClassifier(java.io.InputStream in) throws java.io.IOException, java.lang.ClassCastException, java.lang.ClassNotFoundException
Loads a CRF classifier from an InputStream, and returns it.
Parameters:
in - InputStream to load classifier from
Throws:
java.io.IOException - If there are problems accessing the input stream
java.lang.ClassCastException - If there are problems interpreting the serialized data
java.lang.ClassNotFoundException - If there are problems interpreting the serialized data
public static <INN extends CoreMap> CRFClassifier<INN> getClassifier(java.io.ObjectInputStream ois) throws java.io.IOException, java.lang.ClassCastException, java.lang.ClassNotFoundException
Throws:
java.io.IOException
java.lang.ClassCastException
java.lang.ClassNotFoundException
public static <INN extends CoreMap> CRFClassifier<INN> getClassifierNoExceptions(java.lang.String loadPath)
public static CRFClassifier<CoreLabel> getClassifier(java.lang.String loadPath) throws java.io.IOException, java.lang.ClassCastException, java.lang.ClassNotFoundException
Throws:
java.io.IOException
java.lang.ClassCastException
java.lang.ClassNotFoundException
public static <INN extends CoreMap> CRFClassifier<INN> getClassifier(java.lang.String loadPath, java.util.Properties props) throws java.io.IOException, java.lang.ClassCastException, java.lang.ClassNotFoundException
Throws:
java.io.IOException
java.lang.ClassCastException
java.lang.ClassNotFoundException
public static <INN extends CoreMap> CRFClassifier<INN> getClassifier(java.io.ObjectInputStream ois, java.util.Properties props) throws java.io.IOException, java.lang.ClassCastException, java.lang.ClassNotFoundException
Throws:
java.io.IOException
java.lang.ClassCastException
java.lang.ClassNotFoundException
public java.lang.String toString()
Overrides:
toString in class java.lang.Object
public static void main(java.lang.String[] args) throws java.lang.Exception
The main method.
Throws:
java.lang.Exception