edu.stanford.nlp.ie
Class AbstractSequenceClassifier<IN extends CoreMap>

java.lang.Object
  extended by edu.stanford.nlp.ie.AbstractSequenceClassifier<IN>
All Implemented Interfaces:
Function<String,String>
Direct Known Subclasses:
ClassifierCombiner, CMMClassifier, CRFClassifier

public abstract class AbstractSequenceClassifier<IN extends CoreMap>
extends Object
implements Function<String,String>

This class provides common functionality for (probabilistic) sequence models. It is a superclass of our CMM and CRF sequence classifiers, and is even used in the (deterministic) NumberSequenceClassifier. See implementing classes for more information.

A full implementation should implement these 5 abstract methods:
List<CoreLabel> classify(List<CoreLabel> document);
void train(Collection<List<CoreLabel>> docs);
void printProbsDocument(List<CoreLabel> document);
void serializeClassifier(String serializePath);
void loadClassifier(ObjectInputStream in, Properties props) throws IOException, ClassCastException, ClassNotFoundException;
but a runtime (or rule-based) implementation can usefully implement just the first.

Author:
Jenny Finkel, Dan Klein, Christopher Manning, Dan Cer, sonalg (made the class generic)

Field Summary
 Index<String> classIndex
           
 FeatureFactory<IN> featureFactory
           
 SeqClassifierFlags flags
           
protected  Set<String> knownLCWords
           
protected  IN pad
           
protected  int windowSize
           
 
Constructor Summary
AbstractSequenceClassifier(Properties props)
          Construct a SeqClassifierFlags object based on the passed in properties, and then call the other constructor.
AbstractSequenceClassifier(SeqClassifierFlags flags)
          Initialize the featureFactory and other variables based on the passed in flags.
 
Method Summary
 String apply(String in)
          Maps a String input to an XML-formatted rendition of applying NER to the String.
 String backgroundSymbol()
          Returns the background class for the classifier.
abstract  List<IN> classify(List<IN> document)
          Classify a List of something that extends CoreMap.
 List<List<IN>> classify(String str)
          Classify the tokens in a String.
 void classifyAndWriteAnswers(Collection<File> testFiles)
           
 void classifyAndWriteAnswers(Collection<File> testFiles, DocumentReaderAndWriter<IN> readerWriter)
           
 void classifyAndWriteAnswers(ObjectBank<List<IN>> documents, PrintWriter printWriter, DocumentReaderAndWriter<IN> readerWriter)
           
 void classifyAndWriteAnswers(String testFile)
          Load a test file, run the classifier on it, and then print the answers to stdout (with timing to stderr).
 void classifyAndWriteAnswers(String testFile, DocumentReaderAndWriter<IN> readerWriter)
          Load a test file, run the classifier on it, and then print the answers to stdout (with timing to stderr).
 void classifyAndWriteAnswers(String testFile, OutputStream outStream, DocumentReaderAndWriter<IN> readerWriter)
          If the flag outputEncoding is defined, the output is written in that character encoding, otherwise in the system default character encoding.
 void classifyAndWriteAnswers(String baseDir, String filePattern, DocumentReaderAndWriter<IN> readerWriter)
           
 void classifyAndWriteAnswersKBest(ObjectBank<List<IN>> documents, int k, PrintWriter printWriter, DocumentReaderAndWriter<IN> readerAndWriter)
          Run the classifier on the documents in an ObjectBank, and print the answers to a given PrintWriter (with timing to stderr).
 void classifyAndWriteAnswersKBest(String testFile, int k, DocumentReaderAndWriter<IN> readerAndWriter)
          Load a test file, run the classifier on it, and then print the answers to stdout (with timing to stderr).
 void classifyAndWriteViterbiSearchGraph(String testFile, String searchGraphPrefix, DocumentReaderAndWriter<IN> readerAndWriter)
          Load a test file, run the classifier on it, and then write a Viterbi search graph for each sequence.
 List<List<IN>> classifyFile(String filename)
          Classify the contents of a file.
 Counter<List<IN>> classifyKBest(List<IN> doc, Class<? extends CoreAnnotation<String>> answerField, int k)
           
 List<List<IN>> classifyRaw(String str, DocumentReaderAndWriter<IN> readerAndWriter)
          Classify the tokens in a String.
 List<IN> classifySentence(List<? extends HasWord> sentence)
          Classify a List of IN.
 List<IN> classifySentenceWithGlobalInformation(List<? extends HasWord> tokenSequence, CoreMap doc, CoreMap sentence)
          Classify a List of IN using whatever additional information is passed in doc and sentence.
 void classifyStdin()
           
 void classifyStdin(DocumentReaderAndWriter<IN> readerWriter)
           
 List<Triple<String,Integer,Integer>> classifyToCharacterOffsets(String sentences)
          Classify the contents of a String to classified character offset spans.
 String classifyToString(String sentences)
          Classify the contents of a String to a tagged word/class String.
 String classifyToString(String sentences, String outputFormat, boolean preserveSpacing)
          Classify the contents of a String to one of several String representations that shows the classes.
abstract  List<IN> classifyWithGlobalInformation(List<IN> tokenSequence, CoreMap document, CoreMap sentence)
          Classify a List of something that extends CoreMap using as additional information whatever is stored in the document and sentence.
 String classifyWithInlineXML(String sentences)
          Classify the contents of a String.
static boolean countResults(List<? extends CoreMap> doc, Counter<String> entityTP, Counter<String> entityFP, Counter<String> entityFN)
          Count the successes and failures of the model on the given document.
static boolean countResultsIOB(List<? extends CoreMap> doc, Counter<String> entityTP, Counter<String> entityFP, Counter<String> entityFN)
           
 DocumentReaderAndWriter<IN> defaultReaderAndWriter()
           
 Sampler<List<IN>> getSampler(List<IN> input)
           
 SequenceModel getSequenceModel(List<IN> doc)
           
 DFSA<String,Integer> getViterbiSearchGraph(List<IN> doc, Class<? extends CoreAnnotation<String>> answerField)
           
 Set<String> labels()
           
 void loadClassifier(File file)
           
 void loadClassifier(File file, Properties props)
          Loads a classifier from the file specified.
 void loadClassifier(InputStream in)
          Load a classifier from the specified InputStream.
 void loadClassifier(InputStream in, Properties props)
          Load a classifier from the specified InputStream.
abstract  void loadClassifier(ObjectInputStream in, Properties props)
          Load a classifier from the specified input stream.
 void loadClassifier(String loadPath)
          Loads a classifier from the file specified by loadPath.
 void loadClassifier(String loadPath, Properties props)
          Loads a classifier from the file specified by loadPath.
 void loadClassifierNoExceptions(File file)
           
 void loadClassifierNoExceptions(File file, Properties props)
           
 void loadClassifierNoExceptions(InputStream in, Properties props)
          Loads a classifier from the given input stream.
 void loadClassifierNoExceptions(String loadPath)
           
 void loadClassifierNoExceptions(String loadPath, Properties props)
           
 void loadJarClassifier(String modelName, Properties props)
          This function will load a classifier that is stored inside a jar file (if it is so stored).
 ObjectBank<List<IN>> makeObjectBankFromFile(String filename, DocumentReaderAndWriter<IN> readerAndWriter)
           
 ObjectBank<List<IN>> makeObjectBankFromFiles(Collection<File> files, DocumentReaderAndWriter<IN> readerAndWriter)
           
 ObjectBank<List<IN>> makeObjectBankFromFiles(String[] trainFileList, DocumentReaderAndWriter<IN> readerAndWriter)
           
 ObjectBank<List<IN>> makeObjectBankFromFiles(String baseDir, String filePattern, DocumentReaderAndWriter<IN> readerAndWriter)
           
 ObjectBank<List<IN>> makeObjectBankFromReader(BufferedReader in, DocumentReaderAndWriter<IN> readerAndWriter)
          Set up an ObjectBank that will allow one to iterate over a collection of documents obtained from the passed in Reader.
 ObjectBank<List<IN>> makeObjectBankFromString(String string, DocumentReaderAndWriter<IN> readerAndWriter)
          Reads a String into an ObjectBank object.
 DocumentReaderAndWriter<IN> makePlainTextReaderAndWriter()
          Makes a DocumentReaderAndWriter based on flags.plainTextReaderAndWriter.
 DocumentReaderAndWriter<IN> makeReaderAndWriter()
          Makes a DocumentReaderAndWriter based on the flags the CRFClassifier was constructed with.
 DocumentReaderAndWriter<IN> plainTextReaderAndWriter()
           
protected  void printFeatureLists(IN wi, Collection<List<String>> features)
          Print the String features generated from a token.
protected  void printFeatures(IN wi, Collection<String> features)
          Print the String features generated from an IN.
 void printProbs(String filename, DocumentReaderAndWriter<IN> readerAndWriter)
          Takes the file, reads it in, and prints out the likelihood of each possible label at each point.
abstract  void printProbsDocument(List<IN> document)
           
 void printProbsDocuments(ObjectBank<List<IN>> documents)
          Takes a List of documents and prints the likelihood of each possible label at each point.
static void printResults(Counter<String> entityTP, Counter<String> entityFP, Counter<String> entityFN)
          Given counters of true positives, false positives, and false negatives, prints out precision, recall, and f1 for each key.
protected  void reinit()
          This method should be called after there have been changes to the flags (SeqClassifierFlags) variable, such as after deserializing a classifier.
 List<String> segmentString(String sentence)
          Use only if a Chinese word segmenter has been loaded.
 List<String> segmentString(String sentence, DocumentReaderAndWriter<IN> readerAndWriter)
           
abstract  void serializeClassifier(String serializePath)
          Serialize a sequence classifier to a file on the given path.
static int tallyOneEntity(List<? extends CoreMap> doc, int index, Class<? extends CoreAnnotation<String>> source, Class<? extends CoreAnnotation<String>> target, Counter<String> positive, Counter<String> negative)
           
 void train()
          Train the classifier based on values in flags.
 void train(Collection<List<IN>> docs)
          Trains a classifier from a Collection of sequences.
abstract  void train(Collection<List<IN>> docs, DocumentReaderAndWriter<IN> readerAndWriter)
          Trains a classifier from a Collection of sequences.
 void train(String filename)
           
 void train(String[] trainFileList, DocumentReaderAndWriter<IN> readerAndWriter)
           
 void train(String filename, DocumentReaderAndWriter<IN> readerAndWriter)
           
 void train(String baseTrainDir, String trainFiles, DocumentReaderAndWriter<IN> readerAndWriter)
           
 void writeAnswers(List<IN> doc, PrintWriter printWriter, DocumentReaderAndWriter<IN> readerAndWriter)
          Write the classifications of the Sequence classifier out to a writer in a format determined by the DocumentReaderAndWriter used.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

flags

public SeqClassifierFlags flags

classIndex

public Index<String> classIndex

featureFactory

public FeatureFactory<IN> featureFactory

pad

protected IN pad

windowSize

protected int windowSize

knownLCWords

protected Set<String> knownLCWords
Constructor Detail

AbstractSequenceClassifier

public AbstractSequenceClassifier(Properties props)
Construct a SeqClassifierFlags object based on the passed in properties, and then call the other constructor.

Parameters:
props - See SeqClassifierFlags for known properties.

AbstractSequenceClassifier

public AbstractSequenceClassifier(SeqClassifierFlags flags)
Initialize the featureFactory and other variables based on the passed in flags.

Parameters:
flags - A specification of the AbstractSequenceClassifier to construct.
Method Detail

defaultReaderAndWriter

public DocumentReaderAndWriter<IN> defaultReaderAndWriter()

plainTextReaderAndWriter

public DocumentReaderAndWriter<IN> plainTextReaderAndWriter()

reinit

protected final void reinit()
This method should be called after there have been changes to the flags (SeqClassifierFlags) variable, such as after deserializing a classifier. It is called inside the loadClassifier methods. It assumes that the flags variable and the pad variable exist, but reinitializes things like the pad variable, featureFactory and readerAndWriter based on the flags.

Implementation note: At the moment this method doesn't set windowSize or featureFactory, since they are being serialized separately in the file, but we should probably stop serializing them and just reinitialize them from the flags?


makeReaderAndWriter

public DocumentReaderAndWriter<IN> makeReaderAndWriter()
Makes a DocumentReaderAndWriter based on the flags the CRFClassifier was constructed with. Will create the flags.readerAndWriter and initialize it with the CRFClassifier's flags.


makePlainTextReaderAndWriter

public DocumentReaderAndWriter<IN> makePlainTextReaderAndWriter()
Makes a DocumentReaderAndWriter based on flags.plainTextReaderAndWriter. Useful for reading in untokenized text documents or reading plain text from the command line. An example of a way to use this would be to return an edu.stanford.nlp.wordseg.Sighan2005DocumentReaderAndWriter for the Chinese Segmenter.


backgroundSymbol

public String backgroundSymbol()
Returns the background class for the classifier.

Returns:
The background class name

labels

public Set<String> labels()

classifySentence

public List<IN> classifySentence(List<? extends HasWord> sentence)
Classify a List of IN. This method returns a new list of tokens, not the list of tokens passed in, and runs the new tokens through ObjectBankWrapper. (Both these behaviors are different from those of the classify(List) method.)

Parameters:
sentence - The List of IN to be classified.
Returns:
The classified List of IN, where the classifier output for each token is stored in its CoreAnnotations.AnswerAnnotation field.

classifySentenceWithGlobalInformation

public List<IN> classifySentenceWithGlobalInformation(List<? extends HasWord> tokenSequence,
                                                      CoreMap doc,
                                                      CoreMap sentence)
Classify a List of IN using whatever additional information is passed in doc and sentence. Used by SUTime (NumberSequenceClassifier), which requires the doc date to resolve relative dates.

Parameters:
tokenSequence - The List of IN to be classified.
Returns:
The classified List of IN, where the classifier output for each token is stored in its "answer" field.

getSequenceModel

public SequenceModel getSequenceModel(List<IN> doc)

getSampler

public Sampler<List<IN>> getSampler(List<IN> input)

classifyKBest

public Counter<List<IN>> classifyKBest(List<IN> doc,
                                       Class<? extends CoreAnnotation<String>> answerField,
                                       int k)

getViterbiSearchGraph

public DFSA<String,Integer> getViterbiSearchGraph(List<IN> doc,
                                                  Class<? extends CoreAnnotation<String>> answerField)

classify

public List<List<IN>> classify(String str)
Classify the tokens in a String. Each sentence becomes a separate document.

Parameters:
str - A String with tokens in one or more sentences of text to be classified.
Returns:
List of classified sentences (each a List of something that extends CoreMap).

classifyRaw

public List<List<IN>> classifyRaw(String str,
                                  DocumentReaderAndWriter<IN> readerAndWriter)
Classify the tokens in a String. Each sentence becomes a separate document. Doesn't override default readerAndWriter.

Parameters:
str - A String with tokens in one or more sentences of text to be classified.
Returns:
List of classified sentences (each a List of something that extends CoreMap).

classifyFile

public List<List<IN>> classifyFile(String filename)
Classify the contents of a file.

Parameters:
filename - Contains the sentence(s) to be classified.
Returns:
List of classified List of IN.

apply

public String apply(String in)
Maps a String input to an XML-formatted rendition of applying NER to the String. Implements the Function interface. Calls classifyWithInlineXML(String) [q.v.].

Specified by:
apply in interface Function<String,String>
Parameters:
in - The function's argument
Returns:
The function's evaluated value

classifyToString

public String classifyToString(String sentences,
                               String outputFormat,
                               boolean preserveSpacing)
Classify the contents of a String to one of several String representations that shows the classes. Plain text or XML input is expected and the PlainTextDocumentReaderAndWriter is used. The classifier will tokenize the text and treat each sentence as a separate document. The output can be specified to be in a choice of three formats: slashTags (e.g., Bill/PERSON Smith/PERSON died/O ./O), inlineXML (e.g., <PERSON>Bill Smith</PERSON> went to <LOCATION>Paris</LOCATION> .), or xml, for stand-off XML (e.g., <wi num="0" entity="PERSON">Sue</wi> <wi num="1" entity="O">shouted</wi> ). There is also a binary choice as to whether the spacing between tokens of the original is preserved or whether the (tagged) tokens are printed with a single space (for inlineXML or slashTags) or a single newline (for xml) between each one.

Fine points: The slashTags and xml formats show tokens as transformed by any normalization processes inside the tokenizer, while inlineXML shows the tokens exactly as they appeared in the source text. When a period counts as both part of an abbreviation and as an end of sentence marker, it is included twice in the output String for slashTags or xml, but only once for inlineXML, where it is not counted as part of the abbreviation (or any named entity it is part of). For slashTags with preserveSpacing=true, there will be two successive periods such as "Jr.." The tokenized (preserveSpacing=false) output will have a space or a newline after the last token.

Parameters:
sentences - The String to be classified. It will be tokenized and divided into documents according to (heuristically determined) sentence boundaries.
outputFormat - The format to put the output in: one of "slashTags", "xml", or "inlineXML"
preserveSpacing - Whether to preserve the input spacing between tokens, which may sometimes be none (true) or whether to tokenize the text and print it with one space between each token (false)
Returns:
A String annotated with classification information.
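
As a rough illustration of how the slashTags and inlineXML formats differ, the following sketch renders both from a hand-built token/label sequence with single-space separation. It is not the library's implementation: the class OutputFormatDemo, its methods, and the assumption that the background symbol is "O" are hypothetical, and it ignores preserveSpacing, token normalization, and the trailing space mentioned above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not part of the Stanford NLP API.
class OutputFormatDemo {
    // Assumed background symbol for tokens that are not part of any entity.
    static final String BACKGROUND = "O";

    // Render each token as token/LABEL, separated by single spaces.
    static String slashTags(List<String> tokens, List<String> labels) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokens.size(); i++) {
            if (i > 0) sb.append(' ');
            sb.append(tokens.get(i)).append('/').append(labels.get(i));
        }
        return sb.toString();
    }

    // Wrap each maximal run of identically labeled, non-background tokens in <LABEL>...</LABEL>.
    // Simplification: adjacent entities with the same label are merged into one element.
    static String inlineXml(List<String> tokens, List<String> labels) {
        StringBuilder sb = new StringBuilder();
        String open = BACKGROUND; // label of the run currently being written
        for (int i = 0; i < tokens.size(); i++) {
            String label = labels.get(i);
            if (!label.equals(open)) {
                if (!open.equals(BACKGROUND)) sb.append("</").append(open).append('>');
                if (i > 0) sb.append(' ');
                if (!label.equals(BACKGROUND)) sb.append('<').append(label).append('>');
                open = label;
            } else if (i > 0) {
                sb.append(' ');
            }
            sb.append(tokens.get(i));
        }
        if (!open.equals(BACKGROUND)) sb.append("</").append(open).append('>');
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("Bill", "Smith", "went", "to", "Paris", ".");
        List<String> labels = Arrays.asList("PERSON", "PERSON", "O", "O", "LOCATION", "O");
        System.out.println(slashTags(tokens, labels));
        System.out.println(inlineXml(tokens, labels));
    }
}
```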

classifyWithInlineXML

public String classifyWithInlineXML(String sentences)
Classify the contents of a String. Plain text or XML is expected and the PlainTextDocumentReaderAndWriter is used by default. The classifier will treat each sentence as a separate document. Output is in inline XML format (e.g., <PERSON>Bill Smith</PERSON> went to <LOCATION>Paris</LOCATION> .).

Parameters:
sentences - The string to be classified
Returns:
A String annotated with classification information.

classifyToString

public String classifyToString(String sentences)
Classify the contents of a String to a tagged word/class String. Plain text or XML input is expected and the PlainTextDocumentReaderAndWriter is used by default. Output looks like: My/O name/O is/O Bill/PERSON Smith/PERSON ./O

Parameters:
sentences - The String to be classified
Returns:
A String annotated with classification information.

classifyToCharacterOffsets

public List<Triple<String,Integer,Integer>> classifyToCharacterOffsets(String sentences)
Classify the contents of a String to classified character offset spans. Plain text or XML input text is expected and the PlainTextDocumentReaderAndWriter is used by default. Output is a (possibly empty, but not null) List of Triples. Each Triple is an entity name, followed by beginning and ending character offsets in the original String. Character offsets can be thought of as fenceposts between the characters, or, like certain methods in the Java String class, as character positions, numbered starting from 0, with the end index pointing to the position AFTER the entity ends. That is, end - start is the length of the entity in characters.

Fine points: Token offsets are true with respect to the source text, even though the tokenizer may internally normalize certain tokens to String representations of different lengths (e.g., " becoming `` or ''). When a period counts as both part of an abbreviation and as an end of sentence marker, and that abbreviation is part of a named entity, the reported entity string excludes the period.

Parameters:
sentences - The string to be classified
Returns:
A List of Triples, each of which gives an entity type and the beginning and ending character offsets.
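
The fencepost convention described above is the one used by String.substring(begin, end), so an entity's text can be recovered directly from its offsets. A minimal sketch, assuming a hypothetical Span class in place of Triple<String,Integer,Integer> and hand-written offsets rather than real classifier output:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; Span stands in for Triple<String,Integer,Integer>.
class CharacterOffsetDemo {
    static final class Span {
        final String label;
        final int begin; // index of the first character of the entity
        final int end;   // index one past the last character of the entity

        Span(String label, int begin, int end) {
            this.label = label;
            this.begin = begin;
            this.end = end;
        }
    }

    // Recover the entity text; end - begin is its length in characters.
    static String entityText(String source, Span span) {
        return source.substring(span.begin, span.end);
    }

    public static void main(String[] args) {
        String text = "Bill Smith went to Paris.";
        // Offsets written by hand for illustration only.
        List<Span> spans = Arrays.asList(new Span("PERSON", 0, 10), new Span("LOCATION", 19, 24));
        for (Span s : spans) {
            System.out.println(s.label + ": " + entityText(text, s)
                    + " (length " + (s.end - s.begin) + ")");
        }
    }
}
```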

segmentString

public List<String> segmentString(String sentence)
Use only if a Chinese word segmenter has been loaded.

Parameters:
sentence - The string to be classified
Returns:
List of words

segmentString

public List<String> segmentString(String sentence,
                                  DocumentReaderAndWriter<IN> readerAndWriter)

classify

public abstract List<IN> classify(List<IN> document)
Classify a List of something that extends CoreMap. The classifications are added in place to the items of the document, which is also returned by this method.

Parameters:
document - A List of something that extends CoreMap.
Returns:
The same List, but with the elements annotated with their answers (stored under the CoreAnnotations.AnswerAnnotation key).

classifyWithGlobalInformation

public abstract List<IN> classifyWithGlobalInformation(List<IN> tokenSequence,
                                                       CoreMap document,
                                                       CoreMap sentence)
Classify a List of something that extends CoreMap using as additional information whatever is stored in the document and sentence. This is needed for SUTime (NumberSequenceClassifier), which requires the document date to resolve relative dates.

Parameters:
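tokenSequence - The List of something that extends CoreMap to be classified.
document - A CoreMap holding document-level information (for example, the document date).
sentence - A CoreMap holding sentence-level information.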
Returns:
Classified version of the input tokenSequence

train

public void train()
Train the classifier based on values in flags. It will use the first of these variables that is defined: trainFiles (and baseTrainDir), trainFileList, trainFile.
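
The "first of these variables that is defined" rule can be pictured as a simple precedence check. This sketch is only an illustration: TrainSourceDemo and chooseSource are hypothetical names, "defined" is assumed to mean non-null, and what the real method does when none of the flags is set is not stated here.

```java
import java.util.Optional;

// Hypothetical sketch of the precedence order; not the library's code.
class TrainSourceDemo {
    // Returns the name of the flag that would be used, or empty if none is defined.
    static Optional<String> chooseSource(String trainFiles, String trainFileList, String trainFile) {
        if (trainFiles != null) return Optional.of("trainFiles");       // used together with baseTrainDir
        if (trainFileList != null) return Optional.of("trainFileList");
        if (trainFile != null) return Optional.of("trainFile");
        return Optional.empty();
    }

    public static void main(String[] args) {
        // trainFileList wins here because trainFiles is undefined.
        System.out.println(chooseSource(null, "a.tsv,b.tsv", "c.tsv").orElse("none"));
    }
}
```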


train

public void train(String filename)

train

public void train(String filename,
                  DocumentReaderAndWriter<IN> readerAndWriter)

train

public void train(String baseTrainDir,
                  String trainFiles,
                  DocumentReaderAndWriter<IN> readerAndWriter)

train

public void train(String[] trainFileList,
                  DocumentReaderAndWriter<IN> readerAndWriter)

train

public void train(Collection<List<IN>> docs)
Trains a classifier from a Collection of sequences. Note that the Collection can be (and usually is) an ObjectBank.

Parameters:
docs - An ObjectBank or a collection of sequences of IN

train

public abstract void train(Collection<List<IN>> docs,
                           DocumentReaderAndWriter<IN> readerAndWriter)
Trains a classifier from a Collection of sequences. Note that the Collection can be (and usually is) an ObjectBank.

Parameters:
docs - An ObjectBank or a collection of sequences of IN
readerAndWriter - A DocumentReaderAndWriter to use when loading test files

makeObjectBankFromString

public ObjectBank<List<IN>> makeObjectBankFromString(String string,
                                                     DocumentReaderAndWriter<IN> readerAndWriter)
Reads a String into an ObjectBank object. Note that the current implementation of ReaderIteratorFactory will first try to interpret each string as a filename, so this method will yield unwanted results if it is applied to a string that is also a filename. It prints out a warning, at least.

Parameters:
string - The String which will be the content of the ObjectBank
Returns:
The ObjectBank

makeObjectBankFromFile

public ObjectBank<List<IN>> makeObjectBankFromFile(String filename,
                                                   DocumentReaderAndWriter<IN> readerAndWriter)

makeObjectBankFromFiles

public ObjectBank<List<IN>> makeObjectBankFromFiles(String[] trainFileList,
                                                    DocumentReaderAndWriter<IN> readerAndWriter)

makeObjectBankFromFiles

public ObjectBank<List<IN>> makeObjectBankFromFiles(String baseDir,
                                                    String filePattern,
                                                    DocumentReaderAndWriter<IN> readerAndWriter)

makeObjectBankFromFiles

public ObjectBank<List<IN>> makeObjectBankFromFiles(Collection<File> files,
                                                    DocumentReaderAndWriter<IN> readerAndWriter)

makeObjectBankFromReader

public ObjectBank<List<IN>> makeObjectBankFromReader(BufferedReader in,
                                                     DocumentReaderAndWriter<IN> readerAndWriter)
Set up an ObjectBank that will allow one to iterate over a collection of documents obtained from the passed in Reader. Each document will be represented as a list of IN. If the ObjectBank iterator() is called until hasNext() returns false, then the Reader will be read till end of file, but no reading is done at the time of this call. Reading is done using the reading method specified in flags.documentReader, and for some reader choices, the column mapping given in flags.map.

Parameters:
in - Input data
Returns:
The list of documents
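
The deferred-reading behavior described above (nothing is read when the object is created; the Reader is consumed only as the iterator advances) can be sketched with plain java.io. This is an analogy, not the ObjectBank implementation: LazyDocumentsDemo is a hypothetical class, and the blank-line document separator is an assumption made for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch of lazy, iterator-driven reading; not the ObjectBank code.
class LazyDocumentsDemo {
    // Assumption: a document is a run of non-blank lines, and blank lines separate documents.
    static Iterable<List<String>> documents(BufferedReader in) {
        // Creating the Iterable reads nothing; reading starts on the first hasNext()/next().
        return () -> new Iterator<List<String>>() {
            private List<String> pending; // next document, or null if not read yet

            @Override
            public boolean hasNext() {
                if (pending == null) pending = readDocument();
                return pending != null;
            }

            @Override
            public List<String> next() {
                if (!hasNext()) throw new NoSuchElementException();
                List<String> doc = pending;
                pending = null;
                return doc;
            }

            // Reads lines up to the next blank line or end of input; null means no more documents.
            private List<String> readDocument() {
                try {
                    List<String> doc = new ArrayList<>();
                    String line;
                    while ((line = in.readLine()) != null) {
                        if (line.trim().isEmpty()) {
                            if (!doc.isEmpty()) return doc;
                        } else {
                            doc.add(line);
                        }
                    }
                    return doc.isEmpty() ? null : doc;
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
        };
    }

    public static void main(String[] args) {
        BufferedReader in = new BufferedReader(new StringReader("Bill\nSmith\n\nParis\n"));
        for (List<String> doc : documents(in)) {
            System.out.println(doc);
        }
    }
}
```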

printProbs

public void printProbs(String filename,
                       DocumentReaderAndWriter<IN> readerAndWriter)
Takes the file, reads it in, and prints out the likelihood of each possible label at each point.

Parameters:
filename - The path to the specified file

printProbsDocuments

public void printProbsDocuments(ObjectBank<List<IN>> documents)
Takes a List of documents and prints the likelihood of each possible label at each point.

Parameters:
documents - A List of List of something that extends CoreMap.

classifyStdin

public void classifyStdin()
                   throws IOException
Throws:
IOException

classifyStdin

public void classifyStdin(DocumentReaderAndWriter<IN> readerWriter)
                   throws IOException
Throws:
IOException

printProbsDocument

public abstract void printProbsDocument(List<IN> document)

classifyAndWriteAnswers

public void classifyAndWriteAnswers(String testFile)
                             throws IOException
Load a test file, run the classifier on it, and then print the answers to stdout (with timing to stderr). This uses the value of flags.documentReader to determine testFile format.

Parameters:
testFile - The file to test on.
Throws:
IOException

classifyAndWriteAnswers

public void classifyAndWriteAnswers(String testFile,
                                    DocumentReaderAndWriter<IN> readerWriter)
                             throws IOException
Load a test file, run the classifier on it, and then print the answers to stdout (with timing to stderr). This uses the value of flags.documentReader to determine testFile format.

Parameters:
testFile - The file to test on.
readerWriter - A reader and writer to use for the output
Throws:
IOException

classifyAndWriteAnswers

public void classifyAndWriteAnswers(String testFile,
                                    OutputStream outStream,
                                    DocumentReaderAndWriter<IN> readerWriter)
                             throws IOException
If the flag outputEncoding is defined, the output is written in that character encoding, otherwise in the system default character encoding.

Throws:
IOException
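
The encoding fallback described above is a common java.io pattern. A minimal sketch, assuming the flag value arrives as a possibly-null String; OutputEncodingDemo and writerFor are hypothetical names, and this is not the library's own code:

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;

// Hypothetical sketch of choosing an output charset; not the library's code.
class OutputEncodingDemo {
    // Wrap the stream in a PrintWriter that uses outputEncoding if given,
    // and the platform default charset otherwise.
    static PrintWriter writerFor(OutputStream out, String outputEncoding) {
        Charset charset = (outputEncoding != null)
                ? Charset.forName(outputEncoding)
                : Charset.defaultCharset();
        return new PrintWriter(new OutputStreamWriter(out, charset), true);
    }

    public static void main(String[] args) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        PrintWriter writer = writerFor(bytes, "UTF-8");
        writer.print("caf\u00e9/O");
        writer.flush();
        System.out.println(bytes.size() + " bytes written");
    }
}
```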

classifyAndWriteAnswers

public void classifyAndWriteAnswers(String baseDir,
                                    String filePattern,
                                    DocumentReaderAndWriter<IN> readerWriter)
                             throws IOException
Throws:
IOException

classifyAndWriteAnswers

public void classifyAndWriteAnswers(Collection<File> testFiles)
                             throws IOException
Throws:
IOException

classifyAndWriteAnswers

public void classifyAndWriteAnswers(Collection<File> testFiles,
                                    DocumentReaderAndWriter<IN> readerWriter)
                             throws IOException
Throws:
IOException

classifyAndWriteAnswers

public void classifyAndWriteAnswers(ObjectBank<List<IN>> documents,
                                    PrintWriter printWriter,
                                    DocumentReaderAndWriter<IN> readerWriter)
                             throws IOException
Throws:
IOException

classifyAndWriteAnswersKBest

public void classifyAndWriteAnswersKBest(String testFile,
                                         int k,
                                         DocumentReaderAndWriter<IN> readerAndWriter)
                                  throws IOException
Load a test file, run the classifier on it, and then print the answers to stdout (with timing to stderr). This uses the value of flags.documentReader to determine testFile format.

Parameters:
testFile - The filename to test on.
Throws:
IOException

classifyAndWriteAnswersKBest

public void classifyAndWriteAnswersKBest(ObjectBank<List<IN>> documents,
                                         int k,
                                         PrintWriter printWriter,
                                         DocumentReaderAndWriter<IN> readerAndWriter)
                                  throws IOException
Run the classifier on the documents in an ObjectBank, and print the answers to a given PrintWriter (with timing to stderr). The value of flags.documentReader is used to determine testFile format.

Parameters:
documents - The ObjectBank to test on.
Throws:
IOException

classifyAndWriteViterbiSearchGraph

public void classifyAndWriteViterbiSearchGraph(String testFile,
                                               String searchGraphPrefix,
                                               DocumentReaderAndWriter<IN> readerAndWriter)
                                        throws IOException
Load a test file, run the classifier on it, and then write a Viterbi search graph for each sequence.

Parameters:
testFile - The file to test on.
Throws:
IOException

writeAnswers

public void writeAnswers(List<IN> doc,
                         PrintWriter printWriter,
                         DocumentReaderAndWriter<IN> readerAndWriter)
                  throws IOException
Write the classifications of the Sequence classifier out to a writer in a format determined by the DocumentReaderAndWriter used.

Parameters:
doc - Documents to write out
printWriter - Writer to use for output
Throws:
IOException - If an I/O problem occurs

countResultsIOB

public static boolean countResultsIOB(List<? extends CoreMap> doc,
                                      Counter<String> entityTP,
                                      Counter<String> entityFP,
                                      Counter<String> entityFN)

tallyOneEntity

public static int tallyOneEntity(List<? extends CoreMap> doc,
                                 int index,
                                 Class<? extends CoreAnnotation<String>> source,
                                 Class<? extends CoreAnnotation<String>> target,
                                 Counter<String> positive,
                                 Counter<String> negative)

countResults

public static boolean countResults(List<? extends CoreMap> doc,
                                   Counter<String> entityTP,
                                   Counter<String> entityFP,
                                   Counter<String> entityFN)
Count the successes and failures of the model on the given document. Fills in the counters for true positives, false positives, and false negatives, and also keeps track of the entities seen.
Returns false if null is ever encountered for the gold or the guessed label.


printResults

public static void printResults(Counter<String> entityTP,
                                Counter<String> entityFP,
                                Counter<String> entityFN)
Given counters of true positives, false positives, and false negatives, prints out precision, recall, and F1 for each key.
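
The per-key scores follow the standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2PR / (P + R). The sketch below is illustrative only and is not this class's code. It assumes plain Maps in place of Counter<String>, treats a zero denominator as a score of 0, and uses a hypothetical class name (PrfSketch) and output layout; the actual format printed by printResults is not documented here.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: Map<String, Double> stands in for Counter<String>.
final class PrfSketch {

  // precision = TP / (TP + FP); assumed to be 0 when nothing was guessed.
  static double precision(double tp, double fp) {
    return (tp + fp) == 0.0 ? 0.0 : tp / (tp + fp);
  }

  // recall = TP / (TP + FN); assumed to be 0 when there is nothing to find.
  static double recall(double tp, double fn) {
    return (tp + fn) == 0.0 ? 0.0 : tp / (tp + fn);
  }

  // F1 is the harmonic mean of precision and recall.
  static double f1(double p, double r) {
    return (p + r) == 0.0 ? 0.0 : 2.0 * p * r / (p + r);
  }

  // Returns {precision, recall, F1} for every key seen in any of the three maps.
  static Map<String, double[]> score(Map<String, Double> tp,
                                     Map<String, Double> fp,
                                     Map<String, Double> fn) {
    Set<String> keys = new TreeSet<>(tp.keySet());
    keys.addAll(fp.keySet());
    keys.addAll(fn.keySet());
    Map<String, double[]> out = new TreeMap<>();
    for (String key : keys) {
      double p = precision(tp.getOrDefault(key, 0.0), fp.getOrDefault(key, 0.0));
      double r = recall(tp.getOrDefault(key, 0.0), fn.getOrDefault(key, 0.0));
      out.put(key, new double[] {p, r, f1(p, r)});
    }
    return out;
  }

  public static void main(String[] args) {
    // Made-up counts, purely for illustration.
    Map<String, Double> tp = Map.of("PERSON", 8.0, "LOCATION", 3.0);
    Map<String, Double> fp = Map.of("PERSON", 2.0);
    Map<String, Double> fn = Map.of("PERSON", 2.0, "LOCATION", 1.0);
    score(tp, fp, fn).forEach((key, v) ->
        System.out.printf("%s\tP=%.4f\tR=%.4f\tF1=%.4f%n", key, v[0], v[1], v[2]));
  }
}
```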


serializeClassifier

public abstract void serializeClassifier(String serializePath)
Serialize a sequence classifier to a file on the given path.

Parameters:
serializePath - The path/filename to write the classifier to.

loadClassifierNoExceptions

public void loadClassifierNoExceptions(InputStream in,
                                       Properties props)
Loads a classifier from the given input stream. The JVM shuts down (System.exit(1)) if there is an exception. This does not close the InputStream.

Parameters:
in - The InputStream to read from

loadClassifier

public void loadClassifier(InputStream in)
                    throws IOException,
                           ClassCastException,
                           ClassNotFoundException
Load a classifier from the specified InputStream. No extra properties are supplied. This does not close the InputStream.

Parameters:
in - The InputStream to load the serialized classifier from
Throws:
IOException - If there are problems accessing the input stream
ClassCastException - If there are problems interpreting the serialized data
ClassNotFoundException - If there are problems interpreting the serialized data

loadClassifier

public void loadClassifier(InputStream in,
                           Properties props)
                    throws IOException,
                           ClassCastException,
                           ClassNotFoundException
Load a classifier from the specified InputStream. The classifier is reinitialized from the flags serialized in the classifier. This does not close the InputStream.

Parameters:
in - The InputStream to load the serialized classifier from
props - This Properties object will be used to update the SeqClassifierFlags which are read from the serialized classifier
Throws:
IOException - If there are problems accessing the input stream
ClassCastException - If there are problems interpreting the serialized data
ClassNotFoundException - If there are problems interpreting the serialized data

loadClassifier

public abstract void loadClassifier(ObjectInputStream in,
                                    Properties props)
                             throws IOException,
                                    ClassCastException,
                                    ClassNotFoundException
Load a classifier from the specified input stream. The classifier is reinitialized from the flags serialized in the classifier.

Parameters:
in - The ObjectInputStream to load the serialized classifier from
props - This Properties object will be used to update the SeqClassifierFlags which are read from the serialized classifier
Throws:
IOException - If there are problems accessing the input stream
ClassCastException - If there are problems interpreting the serialized data
ClassNotFoundException - If there are problems interpreting the serialized data

loadClassifier

public void loadClassifier(String loadPath)
                    throws ClassCastException,
                           IOException,
                           ClassNotFoundException
Loads a classifier from the file specified by loadPath. If loadPath ends in .gz, uses a GZIPInputStream, else uses a regular FileInputStream.

Throws:
ClassCastException
IOException
ClassNotFoundException
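
The suffix-based choice of stream described above can be sketched as follows. This is a minimal illustration, not the class's actual implementation: the class name GzipAwareOpen and its methods are hypothetical, and the buffering is an added assumption.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

// Hypothetical helper illustrating ".gz means GZIPInputStream, otherwise FileInputStream".
final class GzipAwareOpen {

  // The decision is made on the path's suffix only, as the description states.
  static boolean looksGzipped(String path) {
    return path.endsWith(".gz");
  }

  // Opens the file, wrapping it in a GZIPInputStream when the name ends in .gz.
  // The caller is responsible for closing the returned stream.
  static InputStream open(String path) throws IOException {
    InputStream raw = new BufferedInputStream(new FileInputStream(path));
    try {
      return looksGzipped(path) ? new GZIPInputStream(raw) : raw;
    } catch (IOException e) {
      raw.close(); // do not leak the file handle if the gzip header is invalid
      throw e;
    }
  }
}
```

A caller would typically use the result in a try-with-resources statement, for example `try (InputStream in = GzipAwareOpen.open(loadPath)) { ... }`, so the stream is closed even if loading fails.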

loadClassifier

public void loadClassifier(String loadPath,
                           Properties props)
                    throws ClassCastException,
                           IOException,
                           ClassNotFoundException
Loads a classifier from the file specified by loadPath. If loadPath ends in .gz, uses a GZIPInputStream, else uses a regular FileInputStream.

Throws:
ClassCastException
IOException
ClassNotFoundException

loadClassifierNoExceptions

public void loadClassifierNoExceptions(String loadPath)

loadClassifierNoExceptions

public void loadClassifierNoExceptions(String loadPath,
                                       Properties props)

loadClassifier

public void loadClassifier(File file)
                    throws ClassCastException,
                           IOException,
                           ClassNotFoundException
Throws:
ClassCastException
IOException
ClassNotFoundException

loadClassifier

public void loadClassifier(File file,
                           Properties props)
                    throws ClassCastException,
                           IOException,
                           ClassNotFoundException
Loads a classifier from the specified file. If the file's name ends in .gz, uses a GZIPInputStream, else uses a regular FileInputStream. This method closes the stream it opens on the file when done.

Parameters:
file - The file to load the classifier from.
props - Properties in this object will be used to overwrite those specified in the serialized classifier
Throws:
IOException - If there are problems accessing the input stream
ClassCastException - If there are problems interpreting the serialized data
ClassNotFoundException - If there are problems interpreting the serialized data

loadClassifierNoExceptions

public void loadClassifierNoExceptions(File file)

loadClassifierNoExceptions

public void loadClassifierNoExceptions(File file,
                                       Properties props)

loadJarClassifier

public void loadJarClassifier(String modelName,
                              Properties props)
Loads a classifier that is stored inside a jar file. The classifier should be specified by its full filename; the path within the jar file (/classifiers/) is hard-coded in this class. If the classifier is not stored in the jar file, or the code is not being run from inside a jar file, this method throws a RuntimeException.

Parameters:
modelName - The name of the model file. It is assumed to be gzip compressed if and only if it ends in .gz.
props - A Properties object which can override certain properties in the serialized file, such as the DocumentReaderAndWriter. You can pass in null to override nothing.
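
One plausible way to realize this behavior is a classpath resource lookup under the fixed /classifiers/ prefix. The sketch below is an assumption about the mechanism, not a description of this class's code: the use of Class.getResourceAsStream, the class name JarModelOpen, and the exception messages are all hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

// Hypothetical sketch of loading a model bundled under /classifiers/ on the classpath.
final class JarModelOpen {

  // The fixed location inside the jar, as named in the description above.
  static final String PREFIX = "/classifiers/";

  static String resourcePath(String modelName) {
    return PREFIX + modelName;
  }

  // Returns a stream over the bundled model, or throws RuntimeException if it is absent.
  static InputStream open(String modelName) {
    String path = resourcePath(modelName);
    InputStream raw = JarModelOpen.class.getResourceAsStream(path);
    if (raw == null) {
      throw new RuntimeException("Model not found on classpath: " + path);
    }
    try {
      // Treated as gzip compressed if and only if the name ends in .gz.
      return modelName.endsWith(".gz") ? new GZIPInputStream(raw) : raw;
    } catch (IOException e) {
      try {
        raw.close();
      } catch (IOException ignored) {
        // nothing more can be done; the original failure is reported below
      }
      throw new RuntimeException("Could not read model: " + path, e);
    }
  }
}
```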

printFeatures

protected void printFeatures(IN wi,
                             Collection<String> features)
Print the String features generated from an IN (a token).


printFeatureLists

protected void printFeatureLists(IN wi,
                                 Collection<List<String>> features)
Print the lists of String features generated from a token.



Stanford NLP Group