edu.stanford.nlp.ie
Class ClassifierCombiner<IN extends CoreMap & HasWord>

java.lang.Object
  extended by edu.stanford.nlp.ie.AbstractSequenceClassifier<IN>
      extended by edu.stanford.nlp.ie.ClassifierCombiner<IN>
All Implemented Interfaces:
Function<String,String>

public class ClassifierCombiner<IN extends CoreMap & HasWord>
extends AbstractSequenceClassifier<IN>

Merges the outputs of two or more AbstractSequenceClassifiers according to a simple precedence scheme: each base classifier contributes only labels that do not exist in any base classifier specified before it, and only where those labels have no token overlap with labels assigned by higher-priority classifiers.
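
The precedence scheme described above can be illustrated with a small, library-free sketch. It is not the class's actual implementation: the class name PrecedenceMergeSketch, the merge method, the "O" background symbol, and the representation of each classifier as a label set plus one label per token are all assumptions made for illustration. A contiguous run of identical non-background labels is treated as one entity, which is a simplification.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not part of edu.stanford.nlp.
class PrecedenceMergeSketch {
    // Assumed background (non-entity) label.
    static final String BACKGROUND = "O";

    /**
     * labelSets.get(k) holds the labels classifier k can assign;
     * outputs.get(k) holds its per-token labels. Index 0 has the highest
     * priority. All outputs are assumed to have the same length.
     */
    static List<String> merge(List<Set<String>> labelSets, List<List<String>> outputs) {
        List<String> merged = new ArrayList<>(outputs.get(0));
        Set<String> seenLabels = new HashSet<>(labelSets.get(0));
        for (int k = 1; k < outputs.size(); k++) {
            List<String> aux = outputs.get(k);
            int i = 0;
            while (i < aux.size()) {
                String label = aux.get(i);
                // Find the end of the run of identical labels starting at i.
                int end = i + 1;
                while (end < aux.size() && aux.get(end).equals(label)) {
                    end++;
                }
                // Condition 1: the label is new to all higher-priority classifiers.
                boolean allowed = !label.equals(BACKGROUND) && !seenLabels.contains(label);
                // Condition 2: no token in the span is already labeled.
                boolean free = true;
                for (int j = i; j < end; j++) {
                    if (!merged.get(j).equals(BACKGROUND)) {
                        free = false;
                        break;
                    }
                }
                if (allowed && free) {
                    for (int j = i; j < end; j++) {
                        merged.set(j, label);
                    }
                }
                i = end;
            }
            // Labels of this classifier now count as "specified before".
            seenLabels.addAll(labelSets.get(k));
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Set<String>> labelSets = Arrays.asList(
            new HashSet<>(Arrays.asList("PERSON", "LOCATION")),
            new HashSet<>(Arrays.asList("LOCATION", "DATE")));
        List<List<String>> outputs = Arrays.asList(
            Arrays.asList("PERSON", "O", "O", "O", "O"),
            Arrays.asList("O", "LOCATION", "O", "DATE", "DATE"));
        // LOCATION is dropped because the first classifier already owns that label.
        System.out.println(merge(labelSets, outputs)); // [PERSON, O, O, DATE, DATE]
    }
}
```

In this sketch the second classifier's LOCATION span is discarded because the first classifier already covers that label, while its DATE span is kept because the label is new and the tokens are still unlabeled.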

This is a pure AbstractSequenceClassifier, i.e., it sets the AnswerAnnotation label. If you work with NER classifiers, use NERClassifierCombiner instead. That class inherits from ClassifierCombiner and ensures that all AnswerAnnotations are also copied to NERAnnotation.

You can specify up to 10 base classifiers using the -loadClassifier1 to -loadClassifier10 properties. The older usage, from when only two base classifiers were accepted, is still supported: specify them with -loadClassifier and -loadAuxClassifier.
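
As a rough sketch of how the two property styles might be read, the following library-free example collects classifier paths from a java.util.Properties object. The class name LoadPathsSketch and the method collectPaths are hypothetical. Preferring the numbered properties over the older pair, and skipping gaps in the numbering, are assumptions here rather than documented behavior of ClassifierCombiner.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical illustration only; not part of edu.stanford.nlp.
class LoadPathsSketch {
    static List<String> collectPaths(Properties p) {
        List<String> paths = new ArrayList<>();
        // Newer style: loadClassifier1 .. loadClassifier10, in priority order.
        for (int i = 1; i <= 10; i++) {
            String path = p.getProperty("loadClassifier" + i);
            if (path != null) {
                paths.add(path);
            }
        }
        // Older style (assumed fallback): loadClassifier, then loadAuxClassifier.
        if (paths.isEmpty()) {
            String main = p.getProperty("loadClassifier");
            String aux = p.getProperty("loadAuxClassifier");
            if (main != null) {
                paths.add(main);
            }
            if (aux != null) {
                paths.add(aux);
            }
        }
        return paths;
    }
}
```

The resulting list preserves priority order, so it could be passed on to something like the ClassifierCombiner(String... loadPaths) constructor listed below.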

ms 2009: removed all NER functionality (see NERClassifierCombiner), changed code so it accepts an arbitrary number of base classifiers, removed dead code.

Author:
Chris Cox, Mihai Surdeanu

Field Summary
 
Fields inherited from class edu.stanford.nlp.ie.AbstractSequenceClassifier
classIndex, featureFactory, flags, knownLCWords, pad, windowSize
 
Constructor Summary
ClassifierCombiner(AbstractSequenceClassifier<IN>... classifiers)
          Combines a series of base classifiers.
ClassifierCombiner(Properties p)
           
ClassifierCombiner(String... loadPaths)
          Loads a series of base classifiers from the paths specified.
 
Method Summary
 List<IN> classify(List<IN> tokens)
          Generates the AnswerAnnotation labels of the combined model for the given tokens, storing them in place in the tokens.
 List<IN> classifyWithGlobalInformation(List<IN> tokenSeq, CoreMap doc, CoreMap sent)
          Classify a List of something that extends CoreMap using as additional information whatever is stored in the document and sentence.
 Set<String> labels()
           
 void loadClassifier(ObjectInputStream in, Properties props)
          Load a classifier from the specified input stream.
static <INN extends CoreMap & HasWord> AbstractSequenceClassifier<INN> loadClassifierFromPath(String path)
           
static void main(String[] args)
          Some basic testing of the ClassifierCombiner.
 void printProbsDocument(List<IN> document)
           
 void serializeClassifier(String serializePath)
          Serialize a sequence classifier to a file on the given path.
 void train(Collection<List<IN>> docs, DocumentReaderAndWriter<IN> readerAndWriter)
          Trains a classifier from a Collection of sequences.
 
Methods inherited from class edu.stanford.nlp.ie.AbstractSequenceClassifier
apply, backgroundSymbol, classify, classifyAndWriteAnswers, classifyAndWriteAnswers, classifyAndWriteAnswers, classifyAndWriteAnswers, classifyAndWriteAnswers, classifyAndWriteAnswers, classifyAndWriteAnswers, classifyAndWriteAnswersKBest, classifyAndWriteAnswersKBest, classifyAndWriteViterbiSearchGraph, classifyFile, classifyKBest, classifyRaw, classifySentence, classifySentenceWithGlobalInformation, classifyStdin, classifyStdin, classifyToCharacterOffsets, classifyToString, classifyToString, classifyWithInlineXML, countResults, countResultsIOB, defaultReaderAndWriter, getSampler, getSequenceModel, getViterbiSearchGraph, loadClassifier, loadClassifier, loadClassifier, loadClassifier, loadClassifier, loadClassifier, loadClassifierNoExceptions, loadClassifierNoExceptions, loadClassifierNoExceptions, loadClassifierNoExceptions, loadClassifierNoExceptions, loadJarClassifier, makeObjectBankFromFile, makeObjectBankFromFiles, makeObjectBankFromFiles, makeObjectBankFromFiles, makeObjectBankFromReader, makeObjectBankFromString, makePlainTextReaderAndWriter, makeReaderAndWriter, plainTextReaderAndWriter, printFeatureLists, printFeatures, printProbs, printProbsDocuments, printResults, reinit, segmentString, segmentString, tallyOneEntity, train, train, train, train, train, train, writeAnswers
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ClassifierCombiner

public ClassifierCombiner(Properties p)
                   throws FileNotFoundException
Parameters:
p - Properties object that specifies the loadClassifier and loadAuxClassifier properties or, alternatively, the loadClassifier[1-10] properties.
Throws:
FileNotFoundException - If the classifier files are not found

ClassifierCombiner

public ClassifierCombiner(String... loadPaths)
                   throws FileNotFoundException
Loads a series of base classifiers from the paths specified.

Parameters:
loadPaths - Paths to the base classifiers
Throws:
FileNotFoundException - If the classifier files are not found

ClassifierCombiner

public ClassifierCombiner(AbstractSequenceClassifier<IN>... classifiers)
Combines a series of base classifiers.

Parameters:
classifiers - The base classifiers

Method Detail

loadClassifierFromPath

public static <INN extends CoreMap & HasWord> AbstractSequenceClassifier<INN> loadClassifierFromPath(String path)
                                                                                       throws FileNotFoundException
Throws:
FileNotFoundException

labels

public Set<String> labels()
Overrides:
labels in class AbstractSequenceClassifier<IN extends CoreMap & HasWord>

classify

public List<IN> classify(List<IN> tokens)
Generates the AnswerAnnotation labels of the combined model for the given tokens, storing them in place in the tokens.

Specified by:
classify in class AbstractSequenceClassifier<IN extends CoreMap & HasWord>
Parameters:
tokens - A List of IN
Returns:
The passed-in tokens, which will have the AnswerAnnotation field added or overwritten

train

public void train(Collection<List<IN>> docs,
                  DocumentReaderAndWriter<IN> readerAndWriter)
Description copied from class: AbstractSequenceClassifier
Trains a classifier from a Collection of sequences. Note that the Collection can be (and usually is) an ObjectBank.

Specified by:
train in class AbstractSequenceClassifier<IN extends CoreMap & HasWord>
Parameters:
docs - An ObjectBank or a collection of sequences of IN
readerAndWriter - A DocumentReaderAndWriter to use when loading test files

printProbsDocument

public void printProbsDocument(List<IN> document)
Specified by:
printProbsDocument in class AbstractSequenceClassifier<IN extends CoreMap & HasWord>

serializeClassifier

public void serializeClassifier(String serializePath)
Description copied from class: AbstractSequenceClassifier
Serialize a sequence classifier to a file on the given path.

Specified by:
serializeClassifier in class AbstractSequenceClassifier<IN extends CoreMap & HasWord>
Parameters:
serializePath - The path/filename to write the classifier to.

loadClassifier

public void loadClassifier(ObjectInputStream in,
                           Properties props)
                    throws IOException,
                           ClassCastException,
                           ClassNotFoundException
Description copied from class: AbstractSequenceClassifier
Load a classifier from the specified input stream. The classifier is reinitialized from the flags serialized in the classifier.

Specified by:
loadClassifier in class AbstractSequenceClassifier<IN extends CoreMap & HasWord>
Parameters:
in - The InputStream to load the serialized classifier from
props - This Properties object will be used to update the SeqClassifierFlags which are read from the serialized classifier
Throws:
IOException - If there are problems accessing the input stream
ClassCastException - If there are problems interpreting the serialized data
ClassNotFoundException - If there are problems interpreting the serialized data

classifyWithGlobalInformation

public List<IN> classifyWithGlobalInformation(List<IN> tokenSeq,
                                              CoreMap doc,
                                              CoreMap sent)
Description copied from class: AbstractSequenceClassifier
Classify a List of something that extends CoreMap using as additional information whatever is stored in the document and sentence. This is needed for SUTime (NumberSequenceClassifier), which requires the document date to resolve relative dates.

Specified by:
classifyWithGlobalInformation in class AbstractSequenceClassifier<IN extends CoreMap & HasWord>
Returns:
Classified version of the input tokenSeq

main

public static void main(String[] args)
                 throws Exception
Some basic testing of the ClassifierCombiner.

Parameters:
args - Command-line arguments as properties: -loadClassifier1 serializedFile -loadClassifier2 serializedFile
Throws:
Exception - If IO or serialization error loading classifiers
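
To make "command-line arguments as properties" concrete, here is a minimal, library-free sketch that turns "-key value" pairs into a java.util.Properties object. The class ArgsToPropertiesSketch and its parse method are hypothetical, and treating a flag without a value as "true" is an assumption. The library may well use its own utility for this conversion.

```java
import java.util.Properties;

// Hypothetical illustration only; not part of edu.stanford.nlp.
class ArgsToPropertiesSketch {
    static Properties parse(String[] args) {
        Properties props = new Properties();
        for (int i = 0; i < args.length; i++) {
            // Only arguments of the form "-key" start a new property.
            if (args[i].startsWith("-") && args[i].length() > 1) {
                String key = args[i].substring(1);
                if (i + 1 < args.length && !args[i + 1].startsWith("-")) {
                    // The next argument is this key's value; consume it.
                    props.setProperty(key, args[++i]);
                } else {
                    // Assumed convention: a flag with no value means "true".
                    props.setProperty(key, "true");
                }
            }
        }
        return props;
    }
}
```

With the arguments shown above, parse would yield a Properties object mapping loadClassifier1 and loadClassifier2 to their respective serialized files.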


Stanford NLP Group