edu.stanford.nlp.ie
Class AbstractSequenceClassifier

java.lang.Object
  extended by edu.stanford.nlp.ie.AbstractSequenceClassifier
All Implemented Interfaces:
Function, Serializable
Direct Known Subclasses:
CRFClassifier

public abstract class AbstractSequenceClassifier
extends Object
implements Function

This class provides common functionality for (probabilistic) sequence models. It is a superclass of our CMM and CRF sequence classifiers, and is even used in the (deterministic) NumberSequenceClassifier. See implementing classes for more information.

Author:
Jenny Finkel, Dan Klein, Christopher Manning, Dan Cer
See Also:
Serialized Form

Field Summary
 Index<String> classIndex
           
 FeatureFactory featureFactory
           
 SeqClassifierFlags flags
           
static String JAR_CLASSIFIER_PATH
           
protected  Set<String> knownLCWords
           
protected  FeatureLabel pad
           
protected  DocumentReaderAndWriter readerAndWriter
           
 int windowSize
           
 
Constructor Summary
AbstractSequenceClassifier()
          This does nothing.
 
Method Summary
 Object apply(Object in)
          Maps a String input to an XML-formatted rendition of applying NER to the String.
 String backgroundSymbol()
           
 Sampler<List<FeatureLabel>> getSampler(List<FeatureLabel> input)
           
 SequenceModel getSequenceModel(List<FeatureLabel> doc)
           
protected  void init(Properties props)
           
protected  void init(SeqClassifierFlags flags)
           
 Set<String> labels()
           
 void loadClassifier(File file)
           
 void loadClassifier(File file, Properties props)
          Loads a classifier from the specified file.
 void loadClassifier(InputStream in)
           
abstract  void loadClassifier(InputStream in, Properties props)
          Load a classifier from the specified input stream.
 void loadClassifier(String loadPath)
          Loads a classifier from the file specified by loadPath.
 void loadClassifierNoExceptions(BufferedInputStream in)
          Loads a classifier from the given input stream.
 void loadClassifierNoExceptions(File file)
           
 void loadClassifierNoExceptions(File file, Properties props)
           
 void loadClassifierNoExceptions(String loadPath)
           
 void loadClassifierNoExceptions(String loadPath, Properties props)
           
 void loadJarClassifier(String modelName, Properties props)
          This function will load a classifier that is stored inside a jar file (if it is so stored).
 ObjectBank<List<FeatureLabel>> makeObjectBank(BufferedReader in)
           
protected  ObjectBank<List<FeatureLabel>> makeObjectBank(BufferedReader in, boolean quietly)
          Set up an ObjectBank that will allow one to iterate over a collection of documents obtained from the passed in Reader.
 ObjectBank<List<FeatureLabel>> makeObjectBank(String filenameOrString)
           
 ObjectBank<List<FeatureLabel>> makeObjectBank(String filenameOrString, boolean quietly)
           
 void printProbs(String filename)
          Takes the file, reads it in, and prints out the likelihood of each possible label at each point.
abstract  void printProbsDocument(List<FeatureLabel> document)
           
 void printProbsDocuments(ObjectBank<List<FeatureLabel>> documents)
          Takes a List of documents and prints the likelihood of each possible label at each point.
protected  void reinit()
          This method should be called after there have been changes to the flags (SeqClassifierFlags) variable, such as after deserializing a classifier.
 List<String> segmentString(String sentence)
          Segments a String into words. Only use this if a Chinese word segmenter has been loaded.
abstract  void serializeClassifier(String serializePath)
           
abstract  List<FeatureLabel> test(List<FeatureLabel> document)
          Classify a List of FeatureLabels.
 void testAndWriteAnswers(String testFile)
          Load a test file, run the classifier on it, and then print the answers to stdout (with timing to stderr).
 void testAndWriteAnswersKBest(String testFile, int k)
          Load a test file, run the classifier on it, and then print the answers to stdout (with timing to stderr).
 List<List<FeatureLabel>> testFile(String filename)
          Classify the sentence(s) contained in a file.
 Counter<List<FeatureLabel>> testKBest(List<FeatureLabel> doc, String answerField, int k)
           
 List<FeatureLabel> testSentence(List<? extends HasWord> sentence)
          Classify a Sentence.
 List<List<FeatureLabel>> testSentences(String sentences)
          Classify the sentence(s) in a String.
 List<FeatureLabel> testSentenceWithCasing(List<FeatureLabel> sentence)
          Classify a List of FeatureLabels using a TrueCasingDocumentReader.
 String testString(String sentences)
          Classify the contents of a String.
 String testStringInlineXML(String sentences)
          Classify the contents of a String.
 String testStringXML(String sentences)
          Classify the contents of a String.
 void train()
           
abstract  void train(ObjectBank<List<FeatureLabel>> docs)
           
 void train(String filename)
           
 void writeAnswers(List<FeatureLabel> doc)
          Write the classifications of the Sequence classifier out in a format determined by the DocumentReaderAndWriter used.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

JAR_CLASSIFIER_PATH

public static final String JAR_CLASSIFIER_PATH
See Also:
Constant Field Values

flags

public SeqClassifierFlags flags

classIndex

public Index<String> classIndex

readerAndWriter

protected DocumentReaderAndWriter readerAndWriter

featureFactory

public FeatureFactory featureFactory

pad

protected FeatureLabel pad

windowSize

public int windowSize

knownLCWords

protected Set<String> knownLCWords
Constructor Detail

AbstractSequenceClassifier

public AbstractSequenceClassifier()
This does nothing. An implementing class should call init() in its constructor.

Method Detail

init

protected void init(Properties props)

init

protected void init(SeqClassifierFlags flags)

reinit

protected void reinit()
This method should be called after there have been changes to the flags (SeqClassifierFlags) variable, such as after deserializing a classifier. It is called inside the loadClassifier methods. It assumes that the flags variable and the pad variable exist, but reinitializes things like the pad variable, featureFactory and readerAndWriter based on the flags.

Implementation note: At the moment this method doesn't set windowSize or featureFactory, since they are serialized separately in the file, but we should probably stop serializing them and just reinitialize them from the flags.


backgroundSymbol

public String backgroundSymbol()

labels

public Set<String> labels()

testSentence

public List<FeatureLabel> testSentence(List<? extends HasWord> sentence)
Classify a Sentence.

Parameters:
sentence - The Sentence to be classified.
Returns:
The classified Sentence, where the classifier output for each token is stored in its "answer" field.

getSequenceModel

public SequenceModel getSequenceModel(List<FeatureLabel> doc)

getSampler

public Sampler<List<FeatureLabel>> getSampler(List<FeatureLabel> input)

testKBest

public Counter<List<FeatureLabel>> testKBest(List<FeatureLabel> doc,
                                             String answerField,
                                             int k)

testSentenceWithCasing

public List<FeatureLabel> testSentenceWithCasing(List<FeatureLabel> sentence)
Classify a List of FeatureLabels using a TrueCasingDocumentReader.

Parameters:
sentence - A List of FeatureLabels to be classified.
Returns:
The classified list.

testSentences

public List<List<FeatureLabel>> testSentences(String sentences)
Classify the sentence(s) in a String.

Parameters:
sentences - The sentence(s) to be classified.
Returns:
List of classified Sentences.

testFile

public List<List<FeatureLabel>> testFile(String filename)
Classify the sentence(s) contained in a file.

Parameters:
filename - Contains the sentence(s) to be classified.
Returns:
List of classified Sentences.

apply

public Object apply(Object in)
Maps a String input to an XML-formatted rendition of applying NER to the String. Implements the Function interface. Calls testStringInlineXML(String) [q.v.].

Specified by:
apply in interface Function

testStringInlineXML

public String testStringInlineXML(String sentences)
Classify the contents of a String. Plain text or XML is expected and the PlainTextDocumentReaderAndWriter is used. Output is in inline XML format (e.g. <PERSON>Bill Smith</PERSON> went to <LOCATION>Paris</LOCATION> .)

Parameters:
sentences - The string to be classified
Returns:
A String annotated with classification information.
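
A caller will often want the labelled spans back out of the inline XML output described above. The following is a minimal sketch using only the JDK; the class InlineXmlEntities and its extract method are hypothetical helpers, not part of this API. It assumes that entity tags are not nested and that the text contains no other markup that would match the pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of the Stanford API): pulls (label, text)
// pairs out of inline XML such as "<PERSON>Bill Smith</PERSON>".
final class InlineXmlEntities {

    // Matches <LABEL>text</LABEL>; the backreference \1 requires the closing
    // tag to carry the same name as the opening tag.
    private static final Pattern ENTITY =
            Pattern.compile("<([A-Za-z_][A-Za-z0-9_]*)>(.*?)</\\1>", Pattern.DOTALL);

    // Returns one {label, text} array per entity, in order of appearance.
    static List<String[]> extract(String inlineXml) {
        List<String[]> entities = new ArrayList<>();
        Matcher m = ENTITY.matcher(inlineXml);
        while (m.find()) {
            entities.add(new String[] { m.group(1), m.group(2) });
        }
        return entities;
    }

    public static void main(String[] args) {
        String tagged = "<PERSON>Bill Smith</PERSON> went to <LOCATION>Paris</LOCATION> .";
        for (String[] e : extract(tagged)) {
            System.out.println(e[0] + "\t" + e[1]);
        }
    }
}
```

A regular expression is used here only because the output is flat; input that already contains XML would be better handled with a real XML parser.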

testStringXML

public String testStringXML(String sentences)
Classify the contents of a String. Plain text or XML is expected and the PlainTextDocumentReaderAndWriter is used. Output is in XML format.

Parameters:
sentences - The string to be classified
Returns:
A String annotated with classification information.

testString

public String testString(String sentences)
Classify the contents of a String. Plain text or XML is expected and the PlainTextDocumentReaderAndWriter is used. Output looks like: My/O name/O is/O Bill/PERSON Smith/PERSON ./O

Parameters:
sentences - The string to be classified
Returns:
A String annotated with classification information.
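
The slash-delimited output shown above can be split back into word/label pairs by the caller. The following is a minimal sketch using only the JDK; the class SlashTagParser and its parse method are hypothetical helpers, not part of this API. It assumes that tokens are separated by whitespace and that the label follows the last '/' in each token.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of the Stanford API): splits "word/LABEL"
// output into (word, label) pairs.
final class SlashTagParser {

    static List<Map.Entry<String, String>> parse(String tagged) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        String trimmed = tagged.trim();
        if (trimmed.isEmpty()) {
            return pairs;
        }
        for (String token : trimmed.split("\\s+")) {
            // Split on the last slash so that words containing '/' stay intact.
            int slash = token.lastIndexOf('/');
            if (slash < 0) {
                // Assumption: a token without a slash carries no label.
                pairs.add(new SimpleEntry<>(token, ""));
            } else {
                pairs.add(new SimpleEntry<>(token.substring(0, slash),
                                            token.substring(slash + 1)));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        String tagged = "My/O name/O is/O Bill/PERSON Smith/PERSON ./O";
        for (Map.Entry<String, String> pair : parse(tagged)) {
            System.out.println(pair.getKey() + "\t" + pair.getValue());
        }
    }
}
```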

segmentString

public List<String> segmentString(String sentence)
Segments a String into words. Only use this if a Chinese word segmenter has been loaded.

Parameters:
sentence - The string to be segmented
Returns:
List of words

test

public abstract List<FeatureLabel> test(List<FeatureLabel> document)
Classify a List of FeatureLabels.

Parameters:
document - A List of FeatureLabels.
Returns:
the same List, but with the elements annotated with their answers (with setAnswer()).

train

public void train()

train

public void train(String filename)

train

public abstract void train(ObjectBank<List<FeatureLabel>> docs)

makeObjectBank

public ObjectBank<List<FeatureLabel>> makeObjectBank(String filenameOrString)

makeObjectBank

public ObjectBank<List<FeatureLabel>> makeObjectBank(String filenameOrString,
                                                     boolean quietly)

makeObjectBank

protected ObjectBank<List<FeatureLabel>> makeObjectBank(BufferedReader in,
                                                        boolean quietly)
Set up an ObjectBank that will allow one to iterate over a collection of documents obtained from the passed in Reader. Each document will be represented as a list of FeatureLabel. If the ObjectBank iterator() is called until hasNext() returns false, then the Reader will be read till end of file, but no reading is done at the time of this call. Reading is done using the reading method specified in flags.documentReader, and for some reader choices, the column mapping given in flags.map.

Parameters:
in - Input data
addNEWLCWords - do we add new lowercase words from this data to the word shape classifier
quietly - Print fewer messages if this is true (use when calling it repeatedly on small bits of text)
Returns:
The list of documents
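
The deferred-reading behaviour described above (nothing is read when the object is created; the Reader is consumed only as the iterator advances) can be illustrated with a small stand-alone sketch. LazyDocuments is a hypothetical class, not the ObjectBank implementation, and the document format is an assumption made purely for illustration: one token per line, with blank lines separating documents. The real format depends on flags.documentReader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical illustration of lazy document reading; not the ObjectBank class.
final class LazyDocuments implements Iterable<List<String>> {
    private final BufferedReader in;

    // Constructing the object reads nothing from the reader.
    LazyDocuments(BufferedReader in) {
        this.in = in;
    }

    // Single use: every iterator shares (and consumes) the same reader.
    @Override
    public Iterator<List<String>> iterator() {
        return new Iterator<List<String>>() {
            private List<String> pending;  // next document, if already read
            private boolean exhausted;     // true once end of input is seen

            @Override
            public boolean hasNext() {
                if (pending == null && !exhausted) {
                    pending = readDocument();
                }
                return pending != null;
            }

            @Override
            public List<String> next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                List<String> doc = pending;
                pending = null;
                return doc;
            }

            // Reads lines up to the next blank line or end of input.
            private List<String> readDocument() {
                List<String> doc = new ArrayList<>();
                try {
                    String line;
                    while ((line = in.readLine()) != null) {
                        if (line.trim().isEmpty()) {
                            if (!doc.isEmpty()) {
                                return doc;
                            }
                        } else {
                            doc.add(line);
                        }
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
                exhausted = true;
                return doc.isEmpty() ? null : doc;
            }
        };
    }
}
```

Iterating until hasNext() returns false reads the underlying reader to end of file, which matches the contract stated above.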

makeObjectBank

public ObjectBank<List<FeatureLabel>> makeObjectBank(BufferedReader in)

printProbs

public void printProbs(String filename)
Takes the file, reads it in, and prints out the likelihood of each possible label at each point.

Parameters:
filename - The path to the specified file

printProbsDocuments

public void printProbsDocuments(ObjectBank<List<FeatureLabel>> documents)
Takes a List of documents and prints the likelihood of each possible label at each point.

Parameters:
documents - A List of List of FeatureLabels.

printProbsDocument

public abstract void printProbsDocument(List<FeatureLabel> document)

testAndWriteAnswers

public void testAndWriteAnswers(String testFile)
                         throws Exception
Load a test file, run the classifier on it, and then print the answers to stdout (with timing to stderr). This uses the value of flags.documentReader to determine testFile format.

Parameters:
testFile - The file to test on.
Throws:
Exception

testAndWriteAnswersKBest

public void testAndWriteAnswersKBest(String testFile,
                                     int k)
                              throws Exception
Load a test file, run the classifier on it, and then print the answers to stdout (with timing to stderr). This uses the value of flags.documentReader to determine testFile format.

Parameters:
testFile - The file to test on.
Throws:
Exception

writeAnswers

public void writeAnswers(List<FeatureLabel> doc)
                  throws Exception
Write the classifications of the Sequence classifier out in a format determined by the DocumentReaderAndWriter used.

Throws:
Exception

serializeClassifier

public abstract void serializeClassifier(String serializePath)

loadClassifierNoExceptions

public void loadClassifierNoExceptions(BufferedInputStream in)
Loads a classifier from the given input stream.


loadClassifier

public void loadClassifier(InputStream in)
                    throws IOException,
                           ClassCastException,
                           ClassNotFoundException
Throws:
IOException
ClassCastException
ClassNotFoundException

loadClassifier

public abstract void loadClassifier(InputStream in,
                                    Properties props)
                             throws IOException,
                                    ClassCastException,
                                    ClassNotFoundException
Load a classifier from the specified input stream. The classifier is reinitialized from the flags serialized in the classifier.

Parameters:
in - The InputStream to load the serialized classifier from
props - This Properties object will be used to update the SeqClassifierFlags which are read from the serialized classifier
Throws:
IOException
ClassCastException
ClassNotFoundException

loadClassifier

public void loadClassifier(String loadPath)
                    throws ClassCastException,
                           IOException,
                           ClassNotFoundException
Loads a classifier from the file specified by loadPath. If loadPath ends in .gz, uses a GZIPInputStream, else uses a regular FileInputStream.

Throws:
ClassCastException
IOException
ClassNotFoundException
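
The suffix-based stream selection described above can be sketched as follows. This is not the library's code: ModelStreams and open are hypothetical names, and the buffering is an added assumption. It shows only the rule that a path ending in .gz is wrapped in a GZIPInputStream and any other path is read through a plain FileInputStream.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

// Hypothetical helper (not part of the Stanford API).
final class ModelStreams {

    // Opens loadPath, transparently decompressing it if the name ends in ".gz".
    static InputStream open(String loadPath) throws IOException {
        InputStream in = new BufferedInputStream(new FileInputStream(loadPath));
        if (!loadPath.endsWith(".gz")) {
            return in;
        }
        try {
            // The constructor reads the gzip header and fails on non-gzip data.
            return new GZIPInputStream(in);
        } catch (IOException e) {
            in.close();  // do not leak the file handle on a bad header
            throw e;
        }
    }
}
```

The returned stream could then be handed to loadClassifier(InputStream); the caller remains responsible for closing it.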

loadClassifierNoExceptions

public void loadClassifierNoExceptions(String loadPath)

loadClassifierNoExceptions

public void loadClassifierNoExceptions(String loadPath,
                                       Properties props)

loadClassifier

public void loadClassifier(File file)
                    throws ClassCastException,
                           IOException,
                           ClassNotFoundException
Throws:
ClassCastException
IOException
ClassNotFoundException

loadClassifier

public void loadClassifier(File file,
                           Properties props)
                    throws ClassCastException,
                           IOException,
                           ClassNotFoundException
Loads a classifier from the specified file. If the file name ends in .gz, uses a GZIPInputStream, else uses a regular FileInputStream.

Throws:
ClassCastException
IOException
ClassNotFoundException

loadClassifierNoExceptions

public void loadClassifierNoExceptions(File file)

loadClassifierNoExceptions

public void loadClassifierNoExceptions(File file,
                                       Properties props)

loadJarClassifier

public void loadJarClassifier(String modelName,
                              Properties props)
This function will load a classifier that is stored inside a jar file (if it is so stored). The classifier should be specified as its full filename, but the path in the jar file (/classifiers/) is coded in this class. If the classifier is not stored in the jar file or this is not run from inside a jar file, then this function will throw a RuntimeException.

Parameters:
modelName - The name of the model file. Iff it ends in .gz, then it is assumed to be gzip compressed.
props - A Properties object which can override certain properties in the serialized file, such as the DocumentReaderAndWriter. You can pass in null to override nothing.
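
The lookup behaviour described above can be approximated with standard classpath resource loading. The sketch below is an illustration under stated assumptions, not the library's implementation: JarModelStreams, JAR_PATH and open are hypothetical names, and the use of getResourceAsStream is an assumption about how such a lookup might be done. It resolves the model name against a fixed /classifiers/ prefix, throws a RuntimeException when the resource is absent, and decompresses names ending in .gz.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

// Hypothetical helper (not part of the Stanford API).
final class JarModelStreams {

    // Assumption: mirrors the "/classifiers/" path mentioned in the text above.
    static final String JAR_PATH = "/classifiers/";

    static InputStream open(String modelName) {
        String resource = JAR_PATH + modelName;
        InputStream raw = JarModelStreams.class.getResourceAsStream(resource);
        if (raw == null) {
            // Not packaged on the classpath, or not running from the jar.
            throw new RuntimeException("Classifier not found on classpath: " + resource);
        }
        InputStream in = new BufferedInputStream(raw);
        if (!modelName.endsWith(".gz")) {
            return in;
        }
        try {
            return new GZIPInputStream(in);
        } catch (IOException e) {
            try {
                in.close();
            } catch (IOException ignored) {
                // best effort; the original failure is reported below
            }
            throw new RuntimeException("Could not read gzip data from " + resource, e);
        }
    }
}
```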


Stanford NLP Group