edu.stanford.nlp.ie
Class AbstractSequenceClassifier

java.lang.Object
  extended by edu.stanford.nlp.ie.AbstractSequenceClassifier
All Implemented Interfaces:
Function<String,String>
Direct Known Subclasses:
CRFClassifier

public abstract class AbstractSequenceClassifier
extends Object
implements Function<String,String>

This class provides common functionality for (probabilistic) sequence models. It is a superclass of our CMM and CRF sequence classifiers, and is even used in the (deterministic) NumberSequenceClassifier. See implementing classes for more information.

Author:
Jenny Finkel, Dan Klein, Christopher Manning, Dan Cer

Field Summary
 Index<String> classIndex
           
 FeatureFactory featureFactory
           
 SeqClassifierFlags flags
           
static String JAR_CLASSIFIER_PATH
           
protected  Set<String> knownLCWords
           
protected  CoreLabel pad
           
protected  DocumentReaderAndWriter readerAndWriter
           
 int windowSize
           
 
Constructor Summary
AbstractSequenceClassifier()
          This does nothing.
 
Method Summary
 String apply(String in)
          Maps a String input to an XML-formatted rendition of applying NER to the String.
 String backgroundSymbol()
           
 Sampler<List<CoreLabel>> getSampler(List<? extends CoreLabel> input)
           
 SequenceModel getSequenceModel(List<? extends CoreLabel> doc)
           
 DFSA getViterbiSearchGraph(List<CoreLabel> doc, Class<? extends CoreAnnotation<String>> answerField)
           
protected  void init(Properties props)
           
protected  void init(SeqClassifierFlags flags)
           
 Set<String> labels()
           
 void loadClassifier(File file)
           
 void loadClassifier(File file, Properties props)
          Loads a classifier from the given file.
 void loadClassifier(InputStream in)
           
abstract  void loadClassifier(InputStream in, Properties props)
          Load a classifier from the specified input stream.
 void loadClassifier(String loadPath)
          Loads a classifier from the file specified by loadPath.
 void loadClassifierNoExceptions(BufferedInputStream in)
          Loads a classifier from the given input stream.
 void loadClassifierNoExceptions(File file)
           
 void loadClassifierNoExceptions(File file, Properties props)
           
 void loadClassifierNoExceptions(String loadPath)
           
 void loadClassifierNoExceptions(String loadPath, Properties props)
           
 void loadJarClassifier(String modelName, Properties props)
          This function will load a classifier that is stored inside a jar file (if it is so stored).
 ObjectBank<List<CoreLabel>> makeObjectBank(BufferedReader in)
           
protected  ObjectBank<List<CoreLabel>> makeObjectBank(BufferedReader in, boolean quietly)
          Set up an ObjectBank that will allow one to iterate over a collection of documents obtained from the passed in Reader.
 ObjectBank<List<CoreLabel>> makeObjectBank(Collection<File> files)
           
 ObjectBank<List<CoreLabel>> makeObjectBank(String filenameOrString)
           
 ObjectBank<List<CoreLabel>> makeObjectBank(String[] trainFileList, boolean quitely)
           
 ObjectBank<List<CoreLabel>> makeObjectBank(String filenameOrString, boolean quietly)
           
 ObjectBank<List<CoreLabel>> makeObjectBank(String baseDir, String filePattern, boolean quietly)
           
 void printProbs(String filename)
          Takes the file, reads it in, and prints out the likelihood of each possible label at each point.
abstract  void printProbsDocument(List<CoreLabel> document)
           
 void printProbsDocuments(ObjectBank<List<CoreLabel>> documents)
          Takes a List of documents and prints the likelihood of each possible label at each point.
protected  void reinit()
          This method should be called after there have been changes to the flags (SeqClassifierFlags) variable, such as after deserializing a classifier.
 List<String> segmentString(String sentence)
          Use only if a Chinese word segmenter has been loaded.
abstract  void serializeClassifier(String serializePath)
           
abstract  List<CoreLabel> test(List<CoreLabel> document)
          Classify a List of CoreLabels.
 void testAndWriteAnswers(Collection<File> testFiles)
           
 void testAndWriteAnswers(String testFile)
          Load a test file, run the classifier on it, and then print the answers to stdout (with timing to stderr).
 void testAndWriteAnswers(String baseDir, String filePattern)
           
 void testAndWriteAnswersKBest(String testFile, int k)
          Load a test file, run the classifier on it, and then print the answers to stdout (with timing to stderr).
 void testAndWriteViterbiSearchGraph(String testFile, String searchGraphPrefix)
          Load a test file, run the classifier on it, and then write a Viterbi search graph for each sequence.
 List<List<CoreLabel>> testFile(String filename)
          Classify the contents of a file.
 Counter<List<CoreLabel>> testKBest(List<CoreLabel> doc, Class<? extends CoreAnnotation<String>> answerField, int k)
           
 List<CoreLabel> testSentence(List<? extends HasWord> sentence)
          Classify a Sentence.
 List<List<CoreLabel>> testSentences(String sentences)
          Classify the sentence(s) in a String.
 List<CoreLabel> testSentenceWithCasing(List<CoreLabel> sentence)
          Classify a List of CoreLabels using a TrueCasingDocumentReader.
 String testString(String sentences)
          Classify the contents of a String.
 List<Triple<String,Integer,Integer>> testStringAndGetCharacterOffsets(String sentences)
          Classify the contents of a String.
 String testStringInlineXML(String sentences)
          Classify the contents of a String.
 String testStringXML(String sentences)
          Classify the contents of a String.
 void train()
           
abstract  void train(ObjectBank<List<CoreLabel>> docs)
           
 void train(String filename)
           
 void train(String[] trainFileList)
           
 void train(String baseTrainDir, String trainFiles)
           
 void writeAnswers(List<CoreLabel> doc)
          Write the classifications of the Sequence classifier out to stdout in a format determined by the DocumentReaderAndWriter used.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

JAR_CLASSIFIER_PATH

public static final String JAR_CLASSIFIER_PATH
See Also:
Constant Field Values

flags

public SeqClassifierFlags flags

classIndex

public Index<String> classIndex

readerAndWriter

protected DocumentReaderAndWriter readerAndWriter

featureFactory

public FeatureFactory featureFactory

pad

protected CoreLabel pad

windowSize

public int windowSize

knownLCWords

protected Set<String> knownLCWords

Constructor Detail

AbstractSequenceClassifier

public AbstractSequenceClassifier()
This does nothing. An implementing class should call init() in its constructor.

Method Detail

init

protected void init(Properties props)

init

protected void init(SeqClassifierFlags flags)

reinit

protected void reinit()
This method should be called after there have been changes to the flags (SeqClassifierFlags) variable, such as after deserializing a classifier. It is called inside the loadClassifier methods. It assumes that the flags variable and the pad variable exist, but reinitializes things like the pad variable, featureFactory and readerAndWriter based on the flags.

Implementation note: At the moment this method doesn't set windowSize or featureFactory, since they are serialized separately in the file. We should probably stop serializing them and just reinitialize them from the flags.


backgroundSymbol

public String backgroundSymbol()

labels

public Set<String> labels()

testSentence

public List<CoreLabel> testSentence(List<? extends HasWord> sentence)
Classify a Sentence.

Parameters:
sentence - The Sentence to be classified.
Returns:
The classified Sentence, where the classifier output for each token is stored in its "answer" field.

getSequenceModel

public SequenceModel getSequenceModel(List<? extends CoreLabel> doc)

getSampler

public Sampler<List<CoreLabel>> getSampler(List<? extends CoreLabel> input)

testKBest

public Counter<List<CoreLabel>> testKBest(List<CoreLabel> doc,
                                          Class<? extends CoreAnnotation<String>> answerField,
                                          int k)

getViterbiSearchGraph

public DFSA getViterbiSearchGraph(List<CoreLabel> doc,
                                  Class<? extends CoreAnnotation<String>> answerField)

testSentenceWithCasing

public List<CoreLabel> testSentenceWithCasing(List<CoreLabel> sentence)
Classify a List of CoreLabels using a TrueCasingDocumentReader.

Parameters:
sentence - A list of CoreLabels to be classified.
Returns:
The classified list.

testSentences

public List<List<CoreLabel>> testSentences(String sentences)
Classify the sentence(s) in a String.

Parameters:
sentences - The sentence(s) to be classified.
Returns:
List of classified Sentences.

testFile

public List<List<CoreLabel>> testFile(String filename)
Classify the contents of a file.

Parameters:
filename - The file containing the sentence(s) to be classified.
Returns:
List of classified Sentences.

apply

public String apply(String in)
Maps a String input to an XML-formatted rendition of applying NER to the String. Implements the Function interface. Calls testStringInlineXML(String) [q.v.].

Specified by:
apply in interface Function<String,String>
Parameters:
in - The function's argument
Returns:
The function's evaluated value

testStringInlineXML

public String testStringInlineXML(String sentences)
Classify the contents of a String. Plain text or XML is expected and the PlainTextDocumentReaderAndWriter is used. Output is in inline XML format (e.g. <PERSON>Bill Smith</PERSON> went to <LOCATION>Paris</LOCATION> .)

Parameters:
sentences - The string to be classified
Returns:
A String annotated with classification information.
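
The inline XML format can be illustrated with a small JDK-only sketch. It does not call the Stanford classes: the class InlineXmlSketch, its render method, and the choice of "O" as the background symbol are assumptions made for illustration, chosen to reproduce the example output shown in the description. No XML escaping is attempted.

```java
import java.util.List;

/** Hypothetical helper; not part of the Stanford NLP API. */
final class InlineXmlSketch {

    /**
     * Wraps each maximal run of tokens that share a non-background label
     * in a single tag named after that label. Tokens are joined by spaces.
     */
    static String render(List<String> words, List<String> labels, String background) {
        StringBuilder out = new StringBuilder();
        String open = background; // label of the currently open tag, or background if none
        for (int i = 0; i < words.size(); i++) {
            String label = labels.get(i);
            if (!label.equals(open)) {
                // The label changed: close the previous tag, then open a new one if needed.
                if (!open.equals(background)) {
                    out.append("</").append(open).append('>');
                }
                if (i > 0) {
                    out.append(' ');
                }
                if (!label.equals(background)) {
                    out.append('<').append(label).append('>');
                }
                open = label;
            } else if (i > 0) {
                out.append(' ');
            }
            out.append(words.get(i));
        }
        if (!open.equals(background)) {
            out.append("</").append(open).append('>');
        }
        return out.toString();
    }
}
```

Calling render with the words Bill, Smith, went, to, Paris, "." and the labels PERSON, PERSON, O, O, LOCATION, O yields the string given in the description.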

testStringXML

public String testStringXML(String sentences)
Classify the contents of a String. Plain text or XML is expected and the PlainTextDocumentReaderAndWriter is used. Output is in XML format.

Parameters:
sentences - The string to be classified
Returns:
A String annotated with classification information.

testString

public String testString(String sentences)
Classify the contents of a String. Plain text or XML is expected and the PlainTextDocumentReaderAndWriter is used. Output looks like: My/O name/O is/O Bill/PERSON Smith/PERSON ./O

Parameters:
sentences - The string to be classified
Returns:
A String annotated with classification information.
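
As a rough illustration of the word/LABEL output format, the following JDK-only sketch joins parallel lists of words and labels. SlashTagSketch and its render method are hypothetical names; the real method works on CoreLabels produced by the classifier.

```java
import java.util.List;
import java.util.StringJoiner;

/** Hypothetical helper; not part of the Stanford NLP API. */
final class SlashTagSketch {

    /** Renders each token as word/LABEL and separates tokens with single spaces. */
    static String render(List<String> words, List<String> labels) {
        if (words.size() != labels.size()) {
            throw new IllegalArgumentException("words and labels must have the same length");
        }
        StringJoiner joined = new StringJoiner(" ");
        for (int i = 0; i < words.size(); i++) {
            joined.add(words.get(i) + "/" + labels.get(i));
        }
        return joined.toString();
    }
}
```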

testStringAndGetCharacterOffsets

public List<Triple<String,Integer,Integer>> testStringAndGetCharacterOffsets(String sentences)
Classify the contents of a String. Plain text or XML is expected and the PlainTextDocumentReaderAndWriter is used. The result is a list of triples rather than an annotated String.

Parameters:
sentences - The string to be classified
Returns:
A List of Triples, each giving an entity's type and its beginning and ending character offsets in the input.
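
The declared return type is a list of Triple<String,Integer,Integer>. One plausible reading is (label, begin offset, end offset) for each labeled span. The JDK-only sketch below shows how such spans could be computed from tokens. Everything in it is an assumption for illustration: the class CharacterOffsetSketch, the Span stand-in for Triple, the "O" background symbol, and the convention that the end offset is exclusive.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; not part of the Stanford NLP API. */
final class CharacterOffsetSketch {

    /** Hypothetical stand-in for Triple<String,Integer,Integer>. */
    static final class Span {
        final String label;
        final int begin; // inclusive character offset
        final int end;   // exclusive character offset (an assumption)

        Span(String label, int begin, int end) {
            this.label = label;
            this.begin = begin;
            this.end = end;
        }
    }

    /** Locates each token in the text, left to right, and merges runs that share a label. */
    static List<Span> spans(String text, List<String> words, List<String> labels, String background) {
        List<Span> result = new ArrayList<>();
        int cursor = 0;
        String open = background;
        int openBegin = -1;
        int openEnd = -1;
        for (int i = 0; i < words.size(); i++) {
            int begin = text.indexOf(words.get(i), cursor);
            if (begin < 0) {
                throw new IllegalArgumentException("token not found in text: " + words.get(i));
            }
            int end = begin + words.get(i).length();
            cursor = end;
            String label = labels.get(i);
            if (label.equals(open)) {
                openEnd = end; // extend the current run
            } else {
                if (!open.equals(background)) {
                    result.add(new Span(open, openBegin, openEnd));
                }
                open = label;
                openBegin = begin;
                openEnd = end;
            }
        }
        if (!open.equals(background)) {
            result.add(new Span(open, openBegin, openEnd));
        }
        return result;
    }
}
```

With this convention, text.substring(span.begin, span.end) recovers the surface string of each span.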

segmentString

public List<String> segmentString(String sentence)
Use only if a Chinese word segmenter has been loaded.

Parameters:
sentence - The string to be segmented
Returns:
List of words

test

public abstract List<CoreLabel> test(List<CoreLabel> document)
Classify a List of CoreLabels.

Parameters:
document - A List of CoreLabels.
Returns:
the same List, but with the elements annotated with their answers (with setAnswer()).

train

public void train()

train

public void train(String filename)

train

public void train(String baseTrainDir,
                  String trainFiles)

train

public void train(String[] trainFileList)

train

public abstract void train(ObjectBank<List<CoreLabel>> docs)

makeObjectBank

public ObjectBank<List<CoreLabel>> makeObjectBank(String filenameOrString)

makeObjectBank

public ObjectBank<List<CoreLabel>> makeObjectBank(String filenameOrString,
                                                  boolean quietly)

makeObjectBank

public ObjectBank<List<CoreLabel>> makeObjectBank(String[] trainFileList,
                                                  boolean quitely)

makeObjectBank

public ObjectBank<List<CoreLabel>> makeObjectBank(String baseDir,
                                                  String filePattern,
                                                  boolean quietly)

makeObjectBank

public ObjectBank<List<CoreLabel>> makeObjectBank(Collection<File> files)

makeObjectBank

protected ObjectBank<List<CoreLabel>> makeObjectBank(BufferedReader in,
                                                     boolean quietly)
Set up an ObjectBank that will allow one to iterate over a collection of documents obtained from the passed in Reader. Each document will be represented as a list of CoreLabel. If the ObjectBank iterator() is called until hasNext() returns false, then the Reader will be read till end of file, but no reading is done at the time of this call. Reading is done using the reading method specified in flags.documentReader, and for some reader choices, the column mapping given in flags.map.

Parameters:
in - Input data
quietly - Print fewer messages if this is true (use when calling it repeatedly on small bits of text)
Returns:
The list of documents
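
The deferred-reading behavior described here (nothing is read when the bank is created; the Reader is consumed only as the iterator advances) can be sketched with the JDK alone. LazyDocumentBankSketch is a hypothetical class, not ObjectBank. It also assumes a simple input format in which each line is one token and blank lines separate documents, whereas the real format depends on flags.documentReader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical lazy document source; not the Stanford ObjectBank. */
final class LazyDocumentBankSketch implements Iterable<List<String>> {
    private final BufferedReader in;

    /** Stores the reader only; no input is consumed here. */
    LazyDocumentBankSketch(BufferedReader in) {
        this.in = in;
    }

    @Override
    public Iterator<List<String>> iterator() {
        return new Iterator<List<String>>() {
            private List<String> buffered; // next document, or null if none is buffered
            private boolean done;          // true once end of input has been reached

            @Override
            public boolean hasNext() {
                if (buffered == null && !done) {
                    buffered = readDocument(); // reading happens only on demand
                }
                return buffered != null;
            }

            @Override
            public List<String> next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                List<String> doc = buffered;
                buffered = null;
                return doc;
            }

            /** Reads lines up to the next blank line; returns null at end of input. */
            private List<String> readDocument() {
                List<String> doc = new ArrayList<>();
                try {
                    String line;
                    while ((line = in.readLine()) != null) {
                        if (line.trim().isEmpty()) {
                            if (!doc.isEmpty()) {
                                return doc;
                            }
                        } else {
                            doc.add(line.trim());
                        }
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
                done = true;
                return doc.isEmpty() ? null : doc;
            }
        };
    }
}
```

Because the reader is shared, this sketch supports a single pass only. Iterating until hasNext() returns false reads the input to end of file.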

makeObjectBank

public ObjectBank<List<CoreLabel>> makeObjectBank(BufferedReader in)

printProbs

public void printProbs(String filename)
Takes the file, reads it in, and prints out the likelihood of each possible label at each point.

Parameters:
filename - The path to the specified file

printProbsDocuments

public void printProbsDocuments(ObjectBank<List<CoreLabel>> documents)
Takes a List of documents and prints the likelihood of each possible label at each point.

Parameters:
documents - A List of List of CoreLabels.

printProbsDocument

public abstract void printProbsDocument(List<CoreLabel> document)

testAndWriteAnswers

public void testAndWriteAnswers(String testFile)
                         throws Exception
Load a test file, run the classifier on it, and then print the answers to stdout (with timing to stderr). This uses the value of flags.documentReader to determine testFile format.

Parameters:
testFile - The file to test on.
Throws:
Exception

testAndWriteAnswers

public void testAndWriteAnswers(String baseDir,
                                String filePattern)
                         throws Exception
Throws:
Exception

testAndWriteAnswers

public void testAndWriteAnswers(Collection<File> testFiles)
                         throws Exception
Throws:
Exception

testAndWriteAnswersKBest

public void testAndWriteAnswersKBest(String testFile,
                                     int k)
                              throws Exception
Load a test file, run the classifier on it, and then print the answers to stdout (with timing to stderr). This uses the value of flags.documentReader to determine testFile format.

Parameters:
testFile - The file to test on.
Throws:
Exception

testAndWriteViterbiSearchGraph

public void testAndWriteViterbiSearchGraph(String testFile,
                                           String searchGraphPrefix)
                                    throws Exception
Load a test file, run the classifier on it, and then write a Viterbi search graph for each sequence.

Parameters:
testFile - The file to test on.
Throws:
Exception

writeAnswers

public void writeAnswers(List<CoreLabel> doc)
                  throws Exception
Write the classifications of the Sequence classifier out to stdout in a format determined by the DocumentReaderAndWriter used. If the flag outputEncoding is defined, the output is written in that character encoding, otherwise in the system default character encoding.

Parameters:
doc - Documents to write out
Throws:
Exception - If an IO problem occurs
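
The encoding rule (use outputEncoding if it is defined, otherwise the system default) might look roughly like the following JDK-only sketch. OutputEncodingSketch and its methods are hypothetical, and a null argument stands in for "the flag is not defined".

```java
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;

/** Hypothetical helper; not part of the Stanford NLP API. */
final class OutputEncodingSketch {

    /** Returns the named charset, or the platform default when no name is given. */
    static Charset chooseCharset(String outputEncoding) {
        return outputEncoding == null
                ? Charset.defaultCharset()
                : Charset.forName(outputEncoding);
    }

    /** Wraps a stream (for example System.out) in an auto-flushing writer with the chosen charset. */
    static PrintWriter writerFor(OutputStream out, String outputEncoding) {
        return new PrintWriter(new OutputStreamWriter(out, chooseCharset(outputEncoding)), true);
    }
}
```

A caller would obtain a writer with writerFor(System.out, encodingOrNull) and print the formatted answers to it.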

serializeClassifier

public abstract void serializeClassifier(String serializePath)

loadClassifierNoExceptions

public void loadClassifierNoExceptions(BufferedInputStream in)
Loads a classifier from the given input stream.


loadClassifier

public void loadClassifier(InputStream in)
                    throws IOException,
                           ClassCastException,
                           ClassNotFoundException
Throws:
IOException
ClassCastException
ClassNotFoundException

loadClassifier

public abstract void loadClassifier(InputStream in,
                                    Properties props)
                             throws IOException,
                                    ClassCastException,
                                    ClassNotFoundException
Load a classifier from the specified input stream. The classifier is reinitialized from the flags serialized in the classifier.

Parameters:
in - The InputStream to load the serialized classifier from
props - This Properties object will be used to update the SeqClassifierFlags which are read from the serialized classifier
Throws:
IOException
ClassCastException
ClassNotFoundException
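
The way props updates the serialized flags can be pictured as a key-by-key override. The sketch below uses plain java.util.Properties on both sides as a stand-in for SeqClassifierFlags, which is an assumption; FlagOverrideSketch and merge are hypothetical names, and treating a null props as "override nothing" is borrowed from the loadJarClassifier description rather than stated for this method.

```java
import java.util.Properties;

/** Hypothetical helper; not part of the Stanford NLP API. */
final class FlagOverrideSketch {

    /**
     * Starts from the settings stored with the classifier and lets any
     * caller-supplied property replace the stored value for the same key.
     */
    static Properties merge(Properties serialized, Properties overrides) {
        Properties merged = new Properties();
        merged.putAll(serialized);
        if (overrides != null) {
            merged.putAll(overrides);
        }
        return merged;
    }
}
```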

loadClassifier

public void loadClassifier(String loadPath)
                    throws ClassCastException,
                           IOException,
                           ClassNotFoundException
Loads a classifier from the file specified by loadPath. If loadPath ends in .gz, uses a GZIPInputStream, else uses a regular FileInputStream.

Throws:
ClassCastException
IOException
ClassNotFoundException
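
The stream selection described here (GZIPInputStream when loadPath ends in .gz, a regular FileInputStream otherwise) can be sketched with the JDK alone. ClassifierStreamSketch and open are hypothetical names, and the added buffering is a choice made for this sketch; deserializing the classifier from the returned stream is left out.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

/** Hypothetical helper; not part of the Stanford NLP API. */
final class ClassifierStreamSketch {

    /** Opens loadPath, decompressing on the fly when the name ends in ".gz". */
    static InputStream open(String loadPath) throws IOException {
        InputStream raw = new FileInputStream(loadPath);
        try {
            InputStream in = loadPath.endsWith(".gz") ? new GZIPInputStream(raw) : raw;
            return new BufferedInputStream(in);
        } catch (IOException e) {
            raw.close(); // do not leak the file handle if the gzip header is invalid
            throw e;
        }
    }
}
```

The decision is based on the file name only, so a gzip file without the .gz suffix would be read as raw bytes.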

loadClassifierNoExceptions

public void loadClassifierNoExceptions(String loadPath)

loadClassifierNoExceptions

public void loadClassifierNoExceptions(String loadPath,
                                       Properties props)

loadClassifier

public void loadClassifier(File file)
                    throws ClassCastException,
                           IOException,
                           ClassNotFoundException
Throws:
ClassCastException
IOException
ClassNotFoundException

loadClassifier

public void loadClassifier(File file,
                           Properties props)
                    throws ClassCastException,
                           IOException,
                           ClassNotFoundException
Loads a classifier from the given file. If the file name ends in .gz, uses a GZIPInputStream, else uses a regular FileInputStream.

Throws:
ClassCastException
IOException
ClassNotFoundException

loadClassifierNoExceptions

public void loadClassifierNoExceptions(File file)

loadClassifierNoExceptions

public void loadClassifierNoExceptions(File file,
                                       Properties props)

loadJarClassifier

public void loadJarClassifier(String modelName,
                              Properties props)
This function will load a classifier that is stored inside a jar file (if it is so stored). The classifier should be specified as its full filename, but the path in the jar file (/classifiers/) is coded in this class. If the classifier is not stored in the jar file or this is not run from inside a jar file, then this function will throw a RuntimeException.

Parameters:
modelName - The name of the model file. Iff it ends in .gz, then it is assumed to be gzip compressed.
props - A Properties object which can override certain properties in the serialized file, such as the DocumentReaderAndWriter. You can pass in null to override nothing.
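
A minimal JDK-only sketch of the lookup described here: the model name is resolved under /classifiers/ on the classpath, a .gz suffix triggers decompression, and a missing resource results in a RuntimeException. JarResourceSketch, its RESOURCE_DIR constant, and the open method are hypothetical; the actual value of JAR_CLASSIFIER_PATH is not shown on this page, and the props handling is omitted.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

/** Hypothetical helper; not part of the Stanford NLP API. */
final class JarResourceSketch {

    /** Directory named in the description; assumed here, not read from the library. */
    static final String RESOURCE_DIR = "/classifiers/";

    /** Opens a model stored on the classpath (for example inside the running jar). */
    static InputStream open(String modelName) {
        String resource = RESOURCE_DIR + modelName;
        InputStream in = JarResourceSketch.class.getResourceAsStream(resource);
        if (in == null) {
            throw new RuntimeException("Classifier not found on the classpath: " + resource);
        }
        try {
            return modelName.endsWith(".gz") ? new GZIPInputStream(in) : in;
        } catch (IOException e) {
            throw new RuntimeException("Could not read " + resource, e);
        }
    }
}
```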


Stanford NLP Group