java.lang.Object
  extended by edu.stanford.nlp.ie.AbstractSequenceClassifier
public abstract class AbstractSequenceClassifier
This class provides common functionality for (probabilistic) sequence models. It is a superclass of our CMM and CRF sequence classifiers, and is even used in the (deterministic) NumberSequenceClassifier. See implementing classes for more information.
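As a minimal sketch of the contract these classifiers share, assuming only that a sequence classifier returns exactly one label per token and uses "O" as the background label (as in the testString output My/O name/O ...), the hypothetical ToyNumberTagger below labels numeric tokens deterministically. It is not NumberSequenceClassifier and uses none of the Stanford API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical toy tagger, NOT the Stanford NumberSequenceClassifier: it only
// illustrates the one-label-per-token contract of a deterministic sequence classifier.
final class ToyNumberTagger {
    // Background label for tokens outside any class ("O", as in My/O name/O ...).
    static final String BACKGROUND = "O";
    // Digits, optionally grouped or with a decimal part, e.g. 42, 1,200, 3.5
    private static final Pattern NUMBER = Pattern.compile("\\d+(?:[.,]\\d+)*");

    private ToyNumberTagger() {}

    /** Returns exactly one label per input token, in input order. */
    static List<String> classify(List<String> tokens) {
        List<String> labels = new ArrayList<>(tokens.size());
        for (String token : tokens) {
            labels.add(NUMBER.matcher(token).matches() ? "NUMBER" : BACKGROUND);
        }
        return labels;
    }

    public static void main(String[] args) {
        // prints [O, O, NUMBER, O]
        System.out.println(classify(List.of("It", "costs", "1,200", "dollars")));
    }
}
```

A probabilistic CMM or CRF fulfils the same contract but chooses each label by scoring whole label sequences rather than by a fixed rule.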
Field Summary | |
---|---|
Index<String> | classIndex |
FeatureFactory | featureFactory |
SeqClassifierFlags | flags |
static String | JAR_CLASSIFIER_PATH |
protected Set<String> | knownLCWords |
protected CoreLabel | pad |
protected DocumentReaderAndWriter | readerAndWriter |
int | windowSize |
Constructor Summary | |
---|---|
AbstractSequenceClassifier() | This does nothing. |
Method Summary | | |
---|---|---|
String | apply(String in) | Maps a String input to an XML-formatted rendition of applying NER to the String. |
String | backgroundSymbol() | |
Sampler<List<CoreLabel>> | getSampler(List<? extends CoreLabel> input) | |
SequenceModel | getSequenceModel(List<? extends CoreLabel> doc) | |
DFSA | getViterbiSearchGraph(List<CoreLabel> doc, Class<? extends CoreAnnotation<String>> answerField) | |
protected void | init(Properties props) | |
protected void | init(SeqClassifierFlags flags) | |
Set<String> | labels() | |
void | loadClassifier(File file) | |
void | loadClassifier(File file, Properties props) | Loads a classifier from the specified file. |
void | loadClassifier(InputStream in) | |
abstract void | loadClassifier(InputStream in, Properties props) | Load a classifier from the specified input stream. |
void | loadClassifier(String loadPath) | Loads a classifier from the file specified by loadPath. |
void | loadClassifierNoExceptions(BufferedInputStream in) | Loads a classifier from the given input stream. |
void | loadClassifierNoExceptions(File file) | |
void | loadClassifierNoExceptions(File file, Properties props) | |
void | loadClassifierNoExceptions(String loadPath) | |
void | loadClassifierNoExceptions(String loadPath, Properties props) | |
void | loadJarClassifier(String modelName, Properties props) | This function will load a classifier that is stored inside a jar file (if it is so stored). |
ObjectBank<List<CoreLabel>> | makeObjectBank(BufferedReader in) | |
protected ObjectBank<List<CoreLabel>> | makeObjectBank(BufferedReader in, boolean quietly) | Set up an ObjectBank that will allow one to iterate over a collection of documents obtained from the passed in Reader. |
ObjectBank<List<CoreLabel>> | makeObjectBank(Collection<File> files) | |
ObjectBank<List<CoreLabel>> | makeObjectBank(String filenameOrString) | |
ObjectBank<List<CoreLabel>> | makeObjectBank(String[] trainFileList, boolean quitely) | |
ObjectBank<List<CoreLabel>> | makeObjectBank(String filenameOrString, boolean quietly) | |
ObjectBank<List<CoreLabel>> | makeObjectBank(String baseDir, String filePattern, boolean quietly) | |
void | printProbs(String filename) | Takes the file, reads it in, and prints out the likelihood of each possible label at each point. |
abstract void | printProbsDocument(List<CoreLabel> document) | |
void | printProbsDocuments(ObjectBank<List<CoreLabel>> documents) | Takes a List of documents and prints the likelihood of each possible label at each point. |
protected void | reinit() | This method should be called after there have been changes to the flags (SeqClassifierFlags) variable, such as after deserializing a classifier. |
List<String> | segmentString(String sentence) | Only use this if a Chinese word segmenter has been loaded. |
abstract void | serializeClassifier(String serializePath) | |
abstract List<CoreLabel> | test(List<CoreLabel> document) | Classify a List of CoreLabels. |
void | testAndWriteAnswers(Collection<File> testFiles) | |
void | testAndWriteAnswers(String testFile) | Load a test file, run the classifier on it, and then print the answers to stdout (with timing to stderr). |
void | testAndWriteAnswers(String baseDir, String filePattern) | |
void | testAndWriteAnswersKBest(String testFile, int k) | Load a test file, run the classifier on it, and then print the answers to stdout (with timing to stderr). |
void | testAndWriteViterbiSearchGraph(String testFile, String searchGraphPrefix) | Load a test file, run the classifier on it, and then write a Viterbi search graph for each sequence. |
List<List<CoreLabel>> | testFile(String filename) | Classify a Sentence. |
Counter<List<CoreLabel>> | testKBest(List<CoreLabel> doc, Class<? extends CoreAnnotation<String>> answerField, int k) | |
List<CoreLabel> | testSentence(List<? extends HasWord> sentence) | Classify a Sentence. |
List<List<CoreLabel>> | testSentences(String sentences) | Classify a Sentence. |
List<CoreLabel> | testSentenceWithCasing(List<CoreLabel> sentence) | Classify a List of CoreLabels using a TrueCasingDocumentReader. |
String | testString(String sentences) | Classify the contents of a String. |
List<Triple<String,Integer,Integer>> | testStringAndGetCharacterOffsets(String sentences) | Classify the contents of a String. |
String | testStringInlineXML(String sentences) | Classify the contents of a String. |
String | testStringXML(String sentences) | Classify the contents of a String. |
void | train() | |
abstract void | train(ObjectBank<List<CoreLabel>> docs) | |
void | train(String filename) | |
void | train(String[] trainFileList) | |
void | train(String baseTrainDir, String trainFiles) | |
void | writeAnswers(List<CoreLabel> doc) | Write the classifications of the Sequence classifier out to stdout in a format determined by the DocumentReaderAndWriter used. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final String JAR_CLASSIFIER_PATH
public SeqClassifierFlags flags
public Index<String> classIndex
protected DocumentReaderAndWriter readerAndWriter
public FeatureFactory featureFactory
protected CoreLabel pad
public int windowSize
protected Set<String> knownLCWords
Constructor Detail |
---|
public AbstractSequenceClassifier()
Method Detail |
---|
protected void init(Properties props)
protected void init(SeqClassifierFlags flags)
protected void reinit()
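This method should be called after there have been changes to the flags (SeqClassifierFlags) variable, such as after deserializing a classifier.
Implementation note: At the moment this method doesn't set windowSize or featureFactory, since they are being serialized separately in the file, but we should probably stop serializing them and just reinitialize them from the flags?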
public String backgroundSymbol()
public Set<String> labels()
public List<CoreLabel> testSentence(List<? extends HasWord> sentence)
Classify a Sentence.
Parameters:
sentence - The Sentence to be classified.
Returns:
The classified Sentence, where the classifier output for each token is stored in its "answer" field.
public SequenceModel getSequenceModel(List<? extends CoreLabel> doc)
public Sampler<List<CoreLabel>> getSampler(List<? extends CoreLabel> input)
public Counter<List<CoreLabel>> testKBest(List<CoreLabel> doc, Class<? extends CoreAnnotation<String>> answerField, int k)
public DFSA getViterbiSearchGraph(List<CoreLabel> doc, Class<? extends CoreAnnotation<String>> answerField)
public List<CoreLabel> testSentenceWithCasing(List<CoreLabel> sentence)
Classify a List of CoreLabels using a TrueCasingDocumentReader.
Parameters:
sentence - a list of CoreLabels to be classified
public List<List<CoreLabel>> testSentences(String sentences)
Classify a Sentence.
Parameters:
sentences - The sentence(s) to be classified.
Returns:
List of classified Sentences.
public List<List<CoreLabel>> testFile(String filename)
Classify a Sentence.
Parameters:
filename - Contains the sentence(s) to be classified.
Returns:
List of classified Sentences.
public String apply(String in)
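Maps a String input to an XML-formatted rendition of applying NER to the String.
Specified by:
apply in interface Function<String,String>
Parameters:
in - The function's argument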
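The inline XML style used by testStringInlineXML (documented below), e.g. <PERSON>Bill Smith</PERSON> went to <LOCATION>Paris</LOCATION> ., can be sketched from parallel token and label lists. InlineXml and render are hypothetical names; this only illustrates the shape of the output and is not how apply or testStringInlineXML are implemented.

```java
import java.util.List;

// Hypothetical helper, not part of the Stanford API: wraps each run of
// identical non-background labels in an XML element named after the label.
final class InlineXml {
    private InlineXml() {}

    static String render(List<String> tokens, List<String> labels, String background) {
        if (tokens.size() != labels.size()) {
            throw new IllegalArgumentException("need exactly one label per token");
        }
        StringBuilder out = new StringBuilder();
        String open = null; // label of the element currently open, or null
        for (int i = 0; i < tokens.size(); i++) {
            String label = labels.get(i);
            // Close the open element as soon as the label changes.
            if (open != null && !open.equals(label)) {
                out.append("</").append(open).append('>');
                open = null;
            }
            if (i > 0) out.append(' ');
            // Open a new element for an entity token when none is open.
            if (!label.equals(background) && open == null) {
                out.append('<').append(label).append('>');
                open = label;
            }
            out.append(escape(tokens.get(i)));
        }
        if (open != null) out.append("</").append(open).append('>');
        return out.toString();
    }

    // Escape characters that would otherwise break the XML.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        System.out.println(render(
            List.of("Bill", "Smith", "went", "to", "Paris", "."),
            List.of("PERSON", "PERSON", "O", "O", "LOCATION", "O"), "O"));
    }
}
```

Because runs of identical labels are merged, two adjacent entities of the same type come out as a single element; separating them would require explicit entity boundaries such as BIO-style labels.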
public String testStringInlineXML(String sentences)
Classify the contents of a String. Plain text or XML is expected and the PlainTextDocumentReaderAndWriter is used. Output is in inline XML format (e.g. <PERSON>Bill Smith</PERSON> went to <LOCATION>Paris</LOCATION> .)
Parameters:
sentences - The string to be classified
Returns:
A String annotated with classification information.
public String testStringXML(String sentences)
Classify the contents of a String. Plain text or XML is expected and the PlainTextDocumentReaderAndWriter is used. Output is in XML format.
Parameters:
sentences - The string to be classified
Returns:
A String annotated with classification information.
public String testString(String sentences)
Classify the contents of a String. Plain text or XML is expected and the PlainTextDocumentReaderAndWriter is used. Output looks like: My/O name/O is/O Bill/PERSON Smith/PERSON ./O
Parameters:
sentences - The string to be classified
Returns:
A String annotated with classification information.
public List<Triple<String,Integer,Integer>> testStringAndGetCharacterOffsets(String sentences)
Classify the contents of a String. Plain text or XML is expected and the PlainTextDocumentReaderAndWriter is used. Output looks like: My/O name/O is/O Bill/PERSON Smith/PERSON ./O
Parameters:
sentences - The string to be classified
Returns:
A List of Triple<String,Integer,Integer> values holding the classification information.
public List<String> segmentString(String sentence)
Only use this if a Chinese word segmenter has been loaded.
Parameters:
sentence - The string to be classified
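testString reports results as word/LABEL pairs, while testStringAndGetCharacterOffsets returns List<Triple<String,Integer,Integer>>. Assuming those triples are (label, begin, end) character offsets into the input, a hypothetical sketch that rebuilds such spans from the slash-tagged output looks like this; SlashTagSpans and Span are illustrative names, with Span standing in for Triple.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the Stanford implementation. Assumes triples mean
// (label, begin offset, end offset) and derives them from word/LABEL output.
final class SlashTagSpans {
    /** (label, begin, end) with end exclusive; stands in for Triple. */
    static final class Span {
        final String label;
        final int begin;
        final int end;
        Span(String label, int begin, int end) { this.label = label; this.begin = begin; this.end = end; }
        @Override public String toString() { return "(" + label + "," + begin + "," + end + ")"; }
    }

    private SlashTagSpans() {}

    static List<Span> spans(String text, String tagged, String background) {
        List<Span> result = new ArrayList<>();
        if (tagged.trim().isEmpty()) return result;
        int cursor = 0;          // where to resume searching in text
        String openLabel = null; // label of the span being built, if any
        int openBegin = -1, openEnd = -1;
        for (String item : tagged.trim().split("\\s+")) {
            int slash = item.lastIndexOf('/'); // words may themselves contain '/'
            if (slash < 0) throw new IllegalArgumentException("no label: " + item);
            String word = item.substring(0, slash);
            String label = item.substring(slash + 1);
            int begin = text.indexOf(word, cursor);
            if (begin < 0) throw new IllegalArgumentException("word not in text: " + word);
            int end = begin + word.length();
            cursor = end;
            // A label change closes the current span.
            if (openLabel != null && !openLabel.equals(label)) {
                result.add(new Span(openLabel, openBegin, openEnd));
                openLabel = null;
            }
            if (!label.equals(background)) {
                if (openLabel == null) { openLabel = label; openBegin = begin; }
                openEnd = end;
            }
        }
        if (openLabel != null) result.add(new Span(openLabel, openBegin, openEnd));
        return result;
    }

    public static void main(String[] args) {
        // prints [(PERSON,11,21)]
        System.out.println(spans("My name is Bill Smith.",
            "My/O name/O is/O Bill/PERSON Smith/PERSON ./O", "O"));
    }
}
```

The offset search is deliberately naive: it assumes every token appears verbatim in the input, which breaks if the tokenizer rewrites token text.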
public abstract List<CoreLabel> test(List<CoreLabel> document)
Classify a List of CoreLabels.
Parameters:
document - A List of CoreLabels.
Returns:
The same List, but with the elements annotated with their answers (with setAnswer()).
public void train()
public void train(String filename)
public void train(String baseTrainDir, String trainFiles)
public void train(String[] trainFileList)
public abstract void train(ObjectBank<List<CoreLabel>> docs)
public ObjectBank<List<CoreLabel>> makeObjectBank(String filenameOrString)
public ObjectBank<List<CoreLabel>> makeObjectBank(String filenameOrString, boolean quietly)
public ObjectBank<List<CoreLabel>> makeObjectBank(String[] trainFileList, boolean quitely)
public ObjectBank<List<CoreLabel>> makeObjectBank(String baseDir, String filePattern, boolean quietly)
public ObjectBank<List<CoreLabel>> makeObjectBank(Collection<File> files)
protected ObjectBank<List<CoreLabel>> makeObjectBank(BufferedReader in, boolean quietly)
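Set up an ObjectBank that will allow one to iterate over a collection of documents obtained from the passed in Reader. The documents are read using flags.documentReader and, for some reader choices, the column mapping given in flags.map.
Parameters:
in - Input data
addNEWLCWords - Whether to add new lowercase words from this data to the word shape classifier
quietly - Print fewer messages if this is true (use when calling it repeatedly on small bits of text)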
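For column-oriented input, reading depends on flags.documentReader and flags.map. The hypothetical ColumnReaderSketch below assumes a mapping string of the form "word=0,answer=1", whitespace-separated columns, and blank lines between documents; it returns plain maps rather than CoreLabels and does not use ObjectBank or any DocumentReaderAndWriter.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical column reader; the mapping syntax and blank-line document
// boundaries are assumptions made for illustration.
final class ColumnReaderSketch {
    private ColumnReaderSketch() {}

    /** Parses "name=index,name=index" into column index -> field name. */
    static Map<Integer, String> parseMap(String spec) {
        Map<Integer, String> columns = new HashMap<>();
        for (String pair : spec.split(",")) {
            String[] kv = pair.trim().split("=");
            if (kv.length != 2) throw new IllegalArgumentException("bad mapping entry: " + pair);
            columns.put(Integer.parseInt(kv[1].trim()), kv[0].trim());
        }
        return columns;
    }

    /** Each document is a list of tokens; each token maps field name -> value. */
    static List<List<Map<String, String>>> read(BufferedReader in, String spec) {
        Map<Integer, String> columns = parseMap(spec);
        List<List<Map<String, String>>> docs = new ArrayList<>();
        List<Map<String, String>> current = new ArrayList<>();
        try {
            for (String line = in.readLine(); line != null; line = in.readLine()) {
                if (line.trim().isEmpty()) { // blank line: document boundary
                    if (!current.isEmpty()) {
                        docs.add(current);
                        current = new ArrayList<>();
                    }
                    continue;
                }
                String[] cells = line.trim().split("\\s+");
                Map<String, String> token = new HashMap<>();
                for (Map.Entry<Integer, String> col : columns.entrySet()) {
                    if (col.getKey() < cells.length) token.put(col.getValue(), cells[col.getKey()]);
                }
                current.add(token);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (!current.isEmpty()) docs.add(current);
        return docs;
    }

    public static void main(String[] args) {
        String data = "Bill PERSON\nSmith PERSON\n\nParis LOCATION\n";
        // prints 2 (two documents)
        System.out.println(read(new BufferedReader(new StringReader(data)), "word=0,answer=1").size());
    }
}
```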
public ObjectBank<List<CoreLabel>> makeObjectBank(BufferedReader in)
public void printProbs(String filename)
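Takes the file, reads it in, and prints out the likelihood of each possible label at each point.
Parameters:
filename - The path to the specified file
public void printProbsDocuments(ObjectBank<List<CoreLabel>> documents)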
Takes a List of documents and prints the likelihood of each possible label at each point.
Parameters:
documents - A List of List of CoreLabels.
public abstract void printProbsDocument(List<CoreLabel> document)
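printProbs and printProbsDocuments print the likelihood of every possible label at each position. The hypothetical ProbTable formatter below assumes those probabilities are already available as a tokens-by-labels matrix; its tab-separated layout is an illustrative choice, not the library's actual output format.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical formatter, not printProbsDocument: one line per token listing
// each possible label with its (assumed precomputed) probability.
final class ProbTable {
    private ProbTable() {}

    static String format(List<String> tokens, List<String> labels, double[][] probs) {
        if (probs.length != tokens.size()) throw new IllegalArgumentException("one row per token");
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < tokens.size(); i++) {
            if (probs[i].length != labels.size()) throw new IllegalArgumentException("one column per label");
            out.append(tokens.get(i));
            for (int j = 0; j < labels.size(); j++) {
                // Locale.ROOT keeps '.' as the decimal separator on every system.
                out.append('\t').append(labels.get(j)).append('=')
                   .append(String.format(Locale.ROOT, "%.3f", probs[i][j]));
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(format(List.of("Bill", "went"), List.of("PERSON", "O"),
            new double[][] {{0.9, 0.1}, {0.05, 0.95}}));
    }
}
```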
public void testAndWriteAnswers(String testFile) throws Exception
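Load a test file, run the classifier on it, and then print the answers to stdout (with timing to stderr).
Parameters:
testFile - The file to test on.
Throws:
Exception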
public void testAndWriteAnswers(String baseDir, String filePattern) throws Exception
Throws:
Exception
public void testAndWriteAnswers(Collection<File> testFiles) throws Exception
Throws:
Exception
public void testAndWriteAnswersKBest(String testFile, int k) throws Exception
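Load a test file, run the classifier on it, and then print the answers to stdout (with timing to stderr).
Parameters:
testFile - The file to test on.
Throws:
Exception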
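testKBest returns a Counter<List<CoreLabel>>, i.e. candidate labelings paired with scores, and testAndWriteAnswersKBest prints the top k of them. Assuming the scores are comparable numbers where higher is better, selecting the k best can be sketched with a bounded min-heap. KBest is a hypothetical name, and a plain Map of label lists stands in for Counter.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical k-best selection over scored label sequences (higher is better).
final class KBest {
    private KBest() {}

    static List<Map.Entry<List<String>, Double>> top(Map<List<String>, Double> scored, int k) {
        List<Map.Entry<List<String>, Double>> best = new ArrayList<>();
        if (k <= 0) return best;
        // Min-heap of at most k entries: the root is the weakest of the current best.
        PriorityQueue<Map.Entry<List<String>, Double>> heap =
            new PriorityQueue<>(k + 1, Map.Entry.<List<String>, Double>comparingByValue());
        for (Map.Entry<List<String>, Double> e : scored.entrySet()) {
            heap.offer(e);
            if (heap.size() > k) heap.poll(); // drop the weakest candidate
        }
        best.addAll(heap);
        best.sort(Map.Entry.<List<String>, Double>comparingByValue().reversed());
        return best;
    }

    public static void main(String[] args) {
        Map<List<String>, Double> scored = new HashMap<>();
        scored.put(List.of("PERSON", "PERSON"), 0.6);
        scored.put(List.of("PERSON", "O"), 0.3);
        scored.put(List.of("O", "O"), 0.1);
        for (Map.Entry<List<String>, Double> e : top(scored, 2)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

The bounded heap keeps selection at O(n log k) over n candidates instead of sorting all of them.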
public void testAndWriteViterbiSearchGraph(String testFile, String searchGraphPrefix) throws Exception
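Load a test file, run the classifier on it, and then write a Viterbi search graph for each sequence.
Parameters:
testFile - The file to test on.
Throws:
Exception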
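A Viterbi search finds the highest-scoring label sequence under a model with scores for adjacent label pairs. As a generic illustration (textbook first-order Viterbi in log space, not the search-graph code behind getViterbiSearchGraph), assume emission scores emit[t][y] for label y at position t and transition scores trans[p][y] for moving from label p to label y:

```java
import java.util.Arrays;

// Textbook first-order Viterbi decoding in log space; a generic sketch, not the
// Stanford implementation. Returns the index of the best label at each position.
final class Viterbi {
    private Viterbi() {}

    static int[] decode(double[][] emit, double[][] trans) {
        int n = emit.length;
        if (n == 0) return new int[0];
        int labels = emit[0].length;
        double[][] best = new double[n][labels]; // best score of a path ending in y at t
        int[][] back = new int[n][labels];       // best predecessor, for backtracking
        best[0] = Arrays.copyOf(emit[0], labels);
        for (int t = 1; t < n; t++) {
            for (int y = 0; y < labels; y++) {
                double top = Double.NEGATIVE_INFINITY;
                int arg = 0;
                for (int p = 0; p < labels; p++) {
                    double s = best[t - 1][p] + trans[p][y];
                    if (s > top) { top = s; arg = p; }
                }
                best[t][y] = top + emit[t][y];
                back[t][y] = arg;
            }
        }
        // Pick the best final label, then follow back-pointers to the start.
        int[] path = new int[n];
        for (int y = 1; y < labels; y++) {
            if (best[n - 1][y] > best[n - 1][path[n - 1]]) path[n - 1] = y;
        }
        for (int t = n - 1; t > 0; t--) path[t - 1] = back[t][path[t]];
        return path;
    }

    public static void main(String[] args) {
        double[][] emit = {{0, -2}, {-1, 0}, {0, -3}}; // labels: 0 = O, 1 = PERSON
        double[][] free = {{0, 0}, {0, 0}};            // no transition preference
        System.out.println(Arrays.toString(decode(emit, free))); // [0, 1, 0]
    }
}
```

With strongly penalized label switches (for example trans = {{0, -5}, {-5, 0}}), the same emissions decode to [0, 0, 0], which is the point of searching over whole sequences rather than choosing each label independently.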
public void writeAnswers(List<CoreLabel> doc) throws Exception
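Write the classifications of the Sequence classifier out to stdout in a format determined by the DocumentReaderAndWriter used. If outputEncoding is defined, the output is written in that character encoding, otherwise in the system default character encoding.
Parameters:
doc - Documents to write out
Throws:
Exception - If an IO problem occurs
public abstract void serializeClassifier(String serializePath)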
public void loadClassifierNoExceptions(BufferedInputStream in)
Loads a classifier from the given input stream.
public void loadClassifier(InputStream in) throws IOException, ClassCastException, ClassNotFoundException
Throws:
IOException
ClassCastException
ClassNotFoundException
public abstract void loadClassifier(InputStream in, Properties props) throws IOException, ClassCastException, ClassNotFoundException
Load a classifier from the specified input stream.
Parameters:
in - The InputStream to load the serialized classifier from
props - This Properties object will be used to update the SeqClassifierFlags which are read from the serialized classifier
Throws:
IOException
ClassCastException
ClassNotFoundException
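The props argument updates the SeqClassifierFlags read from the serialized classifier. A minimal sketch of that override behaviour, assuming each key in props simply replaces the serialized value and treating null as "override nothing" (as loadJarClassifier documents), uses plain Properties in place of SeqClassifierFlags; FlagOverrides is a hypothetical name.

```java
import java.util.Properties;

// Hypothetical illustration: serialized settings first, then every key in
// overrides wins. Plain Properties stand in for SeqClassifierFlags.
final class FlagOverrides {
    private FlagOverrides() {}

    static Properties merge(Properties serialized, Properties overrides) {
        Properties merged = new Properties();
        merged.putAll(serialized);
        if (overrides != null) { // null: override nothing
            for (String key : overrides.stringPropertyNames()) {
                merged.setProperty(key, overrides.getProperty(key));
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Properties serialized = new Properties();
        serialized.setProperty("outputEncoding", "UTF-8");
        Properties overrides = new Properties();
        overrides.setProperty("outputEncoding", "ISO-8859-1");
        // prints ISO-8859-1
        System.out.println(merge(serialized, overrides).getProperty("outputEncoding"));
    }
}
```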
public void loadClassifier(String loadPath) throws ClassCastException, IOException, ClassNotFoundException
Loads a classifier from the file specified by loadPath.
Throws:
ClassCastException
IOException
ClassNotFoundException
public void loadClassifierNoExceptions(String loadPath)
public void loadClassifierNoExceptions(String loadPath, Properties props)
public void loadClassifier(File file) throws ClassCastException, IOException, ClassNotFoundException
Throws:
ClassCastException
IOException
ClassNotFoundException
public void loadClassifier(File file, Properties props) throws ClassCastException, IOException, ClassNotFoundException
Loads a classifier from the specified file.
Throws:
ClassCastException
IOException
ClassNotFoundException
public void loadClassifierNoExceptions(File file)
public void loadClassifierNoExceptions(File file, Properties props)
public void loadJarClassifier(String modelName, Properties props)
This function will load a classifier that is stored inside a jar file (if it is so stored). The location within the jar file (/classifiers/) is coded in this class. If the classifier is not stored in the jar file or this is not run from inside a jar file, then this function will throw a RuntimeException.
Parameters:
modelName - The name of the model file. Iff it ends in .gz, then it is assumed to be gzip compressed.
props - A Properties object which can override certain properties in the serialized file, such as the DocumentReaderAndWriter. You can pass in null to override nothing.
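The lookup pattern described for loadJarClassifier can be sketched with standard classpath resources. JarModelStreams is hypothetical; it assumes the model lives under /classifiers/ on the classpath and, unlike the behaviour described above, it also succeeds when classes are loaded from a directory rather than from a jar.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;

// Hypothetical sketch of the described loading pattern, not the library source:
// find the model under /classifiers/, fail with a RuntimeException if it is
// absent, and gunzip iff the name ends in ".gz".
final class JarModelStreams {
    static final String CLASSIFIER_DIR = "/classifiers/"; // path named in the Javadoc text

    private JarModelStreams() {}

    static InputStream open(String modelName) {
        // A leading '/' makes the resource name absolute on the classpath.
        InputStream raw = JarModelStreams.class.getResourceAsStream(CLASSIFIER_DIR + modelName);
        if (raw == null) {
            throw new RuntimeException("Model not found on classpath: " + CLASSIFIER_DIR + modelName);
        }
        return maybeGunzip(modelName, raw);
    }

    /** Wraps the stream in a GZIPInputStream iff the name ends in ".gz". */
    static InputStream maybeGunzip(String name, InputStream raw) {
        try {
            InputStream buffered = new BufferedInputStream(raw);
            return name.endsWith(".gz") ? new GZIPInputStream(buffered) : buffered;
        } catch (IOException e) { // GZIPInputStream reads the header eagerly
            throw new UncheckedIOException(e);
        }
    }
}
```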