java.lang.Object
  edu.stanford.nlp.ie.AbstractSequenceClassifier
    edu.stanford.nlp.ie.crf.CRFClassifier
public class CRFClassifier
Does sequence classification using a Conditional Random Field (CRF) model. The code has functionality for different document encodings. When the standard ColumnDocumentReaderAndWriter is used for training or testing models, input files are expected to have one word per line, with columns for things like the word, POS tag, chunk, and class. When run on a file with -textFile, the file is assumed to be plain English text (or perhaps HTML/XML), and PlainTextDocumentReaderAndWriter makes a reasonable attempt at tokenization.
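Here is a minimal sketch that reads the column format. It assumes whitespace-separated columns in the fixed order word, POS, chunk, class, with a blank line between sentences. The ColumnReaderSketch class, its Token record, and readColumns are hypothetical illustrations. They are not ColumnDocumentReaderAndWriter's implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class ColumnReaderSketch {

    // Hypothetical token type: one non-blank line of a column-format file.
    record Token(String word, String pos, String chunk, String answer) {}

    // Groups lines into sentences; a blank line ends the current sentence.
    // The column order word, POS, chunk, class is an assumption of this sketch.
    static List<List<Token>> readColumns(String text) {
        List<List<Token>> sentences = new ArrayList<>();
        List<Token> current = new ArrayList<>();
        for (String raw : text.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty()) {
                if (!current.isEmpty()) {
                    sentences.add(current);
                    current = new ArrayList<>();
                }
                continue;
            }
            String[] cols = line.split("\\s+");
            if (cols.length < 4) {
                throw new IllegalArgumentException("Expected 4 columns: " + line);
            }
            current.add(new Token(cols[0], cols[1], cols[2], cols[3]));
        }
        if (!current.isEmpty()) {
            sentences.add(current);  // the last sentence may not end with a blank line
        }
        return sentences;
    }

    public static void main(String[] args) {
        String data = String.join("\n",
            "John NNP B-NP PERSON",
            "lives VBZ B-VP O",
            "in IN B-PP O",
            "Paris NNP B-NP LOCATION",
            "",
            "Thanks NNS B-NP O");
        for (List<Token> sentence : readColumns(data)) {
            StringBuilder sb = new StringBuilder();
            for (Token t : sentence) {
                sb.append(t.word()).append('/').append(t.answer()).append(' ');
            }
            // Prints "John/PERSON lives/O in/O Paris/LOCATION", then "Thanks/O"
            System.out.println(sb.toString().trim());
        }
    }
}
```

A real reader would stream lines from a file. Taking a String here only keeps the sketch self-contained.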
For running a trained model with a provided serialized classifier on a text file:
    java -server -mx500m edu.stanford.nlp.ie.crf.CRFClassifier -loadClassifier conll.ner.gz -textFile samplesentences.txt
When specifying all parameters in a properties file (train, test, or runtime):
    java -server -mx1000m edu.stanford.nlp.ie.crf.CRFClassifier -prop propFile
To train and test a model from the command line:
    java -mx1000m edu.stanford.nlp.ie.crf.CRFClassifier -trainFile trainFile -testFile testFile -macro > output
Features are defined by a FeatureFactory. NERFeatureFactory is used by default. Look there for the feature templates, and for the properties or flags that turn on particular features when training an NER classifier. There is also a ChineseFeatureFactory, which is used for Chinese word segmentation.
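As a rough illustration of what a feature template produces, the sketch below maps each token to a set of string features built from the word and its neighbours. The template names (WORD, PREV_WORD, NEXT_WORD, SUFFIX3, SHAPE_CAP) and the <S> boundary symbol are invented for this example. They are not NERFeatureFactory's actual features or flags.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class FeatureTemplateSketch {

    // Hypothetical boundary symbol used when a neighbour falls outside the sentence.
    static final String PAD = "<S>";

    // Extracts hypothetical string features for the token at position i.
    static Set<String> features(List<String> words, int i) {
        Set<String> feats = new LinkedHashSet<>();
        String w = words.get(i);
        String prev = i > 0 ? words.get(i - 1) : PAD;
        String next = i + 1 < words.size() ? words.get(i + 1) : PAD;
        feats.add("WORD=" + w);
        feats.add("PREV_WORD=" + prev);
        feats.add("NEXT_WORD=" + next);
        if (w.length() >= 3) {
            feats.add("SUFFIX3=" + w.substring(w.length() - 3));
        }
        if (!w.isEmpty() && Character.isUpperCase(w.charAt(0))) {
            feats.add("SHAPE_CAP");  // capitalised word
        }
        return feats;
    }

    public static void main(String[] args) {
        List<String> sentence = List.of("John", "lives", "in", "Paris");
        List<Set<String>> all = new ArrayList<>();
        for (int i = 0; i < sentence.size(); i++) {
            all.add(features(sentence, i));
        }
        // Prints [WORD=Paris, PREV_WORD=in, NEXT_WORD=<S>, SUFFIX3=ris, SHAPE_CAP]
        System.out.println(all.get(3));
    }
}
```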
Features are specified either in a Properties file (the recommended method) or on the command line. The features are read into a SeqClassifierFlags object. Users need not deal with it unless they want to add new features.
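Flags can come either from a Properties file or from the command line. One way to picture this is a helper that lays `-key value` pairs on top of properties loaded from a file, with command-line values taking precedence. The argsToProperties method below is a hypothetical sketch, not the library's actual argument parser. The flag names come from the commands above. Recording a flag with no following value as "true" is an assumption.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class FlagsSketch {

    // Hypothetical parser: "-key value" becomes key=value; a flag followed by another
    // flag (or by nothing) becomes key=true. Values starting with "-" (e.g. negative
    // numbers) are therefore not supported in this sketch.
    static Properties argsToProperties(String[] args, Properties fromFile) {
        Properties props = new Properties();
        props.putAll(fromFile);  // start from the properties-file values
        for (int i = 0; i < args.length; i++) {
            if (!args[i].startsWith("-")) {
                throw new IllegalArgumentException("Unexpected argument: " + args[i]);
            }
            String key = args[i].substring(1);
            if (i + 1 < args.length && !args[i + 1].startsWith("-")) {
                props.setProperty(key, args[++i]);  // command line overrides the file
            } else {
                props.setProperty(key, "true");
            }
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        Properties fromFile = new Properties();
        // Stand-in for the contents of a file passed with -prop.
        fromFile.load(new StringReader("trainFile = train.tsv\ntestFile = test.tsv\n"));
        String[] cmd = {"-testFile", "other.tsv", "-macro"};
        Properties merged = argsToProperties(cmd, fromFile);
        System.out.println(merged.getProperty("trainFile"));  // train.tsv
        System.out.println(merged.getProperty("testFile"));   // other.tsv
        System.out.println(merged.getProperty("macro"));      // true
    }
}
```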
CRFClassifier may also be used programmatically. To create a new instance, pass a Properties object that specifies the flags to CRFClassifier(Properties props). The other way to get a CRFClassifier is to deserialize one with getClassifier(String). You can then tag sentences with either the assorted test methods or the testSentence methods.
Nested Class Summary | |
---|---|
static class | CRFClassifier.TestSequenceModel |
Field Summary | |
---|---|
static String | DEFAULT_CLASSIFIER |
Fields inherited from class edu.stanford.nlp.ie.AbstractSequenceClassifier |
---|
classIndex, featureFactory, flags, JAR_CLASSIFIER_PATH, knownLCWords, pad, readerAndWriter, windowSize |
Constructor Summary | |
---|---|
protected | CRFClassifier() |
public | CRFClassifier(Properties props) |
Method Summary | | |
---|---|---|
protected void | addProcessedData(List processedData, int[][][][] data, int[][] labels, int offset) | Adds the List of Lists of CRFDatums to the data and labels arrays, treating each datum as if it were its own document. |
protected Index | allLabels(int window, Index classIndex) | |
Pair<int[][][][],int[][]> | documentsToDataAndLabels(ObjectBank<List<FeatureLabel>> documents) | Convert an ObjectBank to arrays of data features and labels. |
Pair<int[][][],int[]> | documentToDataAndLabels(List<FeatureLabel> document) | Convert a document List into arrays storing the data features and labels. |
void | dropFeaturesBelowThreshold(double threshold) | |
protected List<CRFDatum> | extractDatumSequence(int[][][] allData, int beginPosition, int endPosition, List labeledWordInfos) | Creates a new CRFDatum from the preprocessed allData format, given the document number, position number, and a List of Object labels. |
static CRFClassifier | getClassifier(File file) | |
static CRFClassifier | getClassifier(InputStream in) | |
static CRFClassifier | getClassifier(String loadPath) | |
static CRFClassifier | getClassifierNoExceptions(File file) | |
static CRFClassifier | getClassifierNoExceptions(InputStream in) | |
static CRFClassifier | getClassifierNoExceptions(String loadPath) | |
static CRFClassifier | getDefaultClassifier() | Used to get the default supplied classifier. |
static CRFClassifier | getJarClassifier(String resourceName, Properties props) | Used to load a classifier stored as a resource inside a jar file. |
SequenceModel | getSequenceModel(List<FeatureLabel> doc) | |
void | loadClassifier(InputStream in, Properties props) | Loads a classifier from the specified InputStream. |
void | loadDefaultClassifier() | Loads the default supplied classifier stored within the jar file. |
protected List | loadProcessedData(String filename) | |
static void | main(String[] args) | The main method. |
CRFDatum | makeDatum(List<FeatureLabel> info, int loc) | |
CRFDatum | makeDatum(List<FeatureLabel> info, int loc, FeatureFactory featureFactory) | |
void | printFirstOrderProbs(String filename) | Takes the file, reads it in, and prints out the likelihood of each possible label at each point. |
void | printFirstOrderProbsDocument(List<FeatureLabel> document) | Takes a List of FeatureLabels and prints the likelihood of each possible label at each point. |
void | printFirstOrderProbsDocuments(ObjectBank<List<FeatureLabel>> documents) | Takes an ObjectBank of documents and prints the likelihood of each possible label at each point. |
void | printProbsDocument(List<FeatureLabel> document) | Takes a List of FeatureLabels and prints the likelihood of each possible label at each point. |
protected void | saveProcessedData(List datums, String filename) | |
void | serializeClassifier(String serializePath) | |
List<FeatureLabel> | test(List<FeatureLabel> document) | Classify a List of FeatureLabels. |
List<FeatureLabel> | testGibbs(List<FeatureLabel> document) | |
List<FeatureLabel> | testMaxEnt(List<FeatureLabel> document) | |
void | train(ObjectBank<List<FeatureLabel>> docs) | Train a classifier. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final String DEFAULT_CLASSIFIER
Constructor Detail |
---|
protected CRFClassifier()
public CRFClassifier(Properties props)
Method Detail |
---|
public void dropFeaturesBelowThreshold(double threshold)
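public Pair<int[][][],int[]> documentToDataAndLabels(List<FeatureLabel> document)
Convert a document List into arrays storing the data features and labels.
Parameters: document - a document, as a List of FeatureLabels
public Pair<int[][][][],int[][]> documentsToDataAndLabels(ObjectBank<List<FeatureLabel>> documents)
Convert an ObjectBank to arrays of data features and labels.
Parameters: documents - an ObjectBank of documents, each a List of FeatureLabels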
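documentToDataAndLabels returns a Pair<int[][][],int[]>. This page does not confirm the layout, so a reading of that shape is assumed here: data[position][clique][feature] holds the feature indices active at each position, grouped by clique, and labels[position] holds the gold class index. The sketch builds arrays of that shape from string features. Hypothetical HashMap-based indexes stand in for the library's Index class, and each position has only a single clique.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DataAndLabelsSketch {

    // Result holder mirroring the shape of Pair<int[][][], int[]>.
    record DataAndLabels(int[][][] data, int[] labels) {}

    // Hypothetical stand-in for an Index: gives each new key the next free integer id.
    static int indexOf(Map<String, Integer> index, String key) {
        return index.computeIfAbsent(key, k -> index.size());
    }

    // Converts one document into data[position][clique][feature] and labels[position].
    // Only clique 0 (features of the current token) is filled in this sketch.
    static DataAndLabels toDataAndLabels(List<List<String>> featuresPerToken,
                                         List<String> goldLabels,
                                         Map<String, Integer> featureIndex,
                                         Map<String, Integer> classIndex) {
        int n = featuresPerToken.size();
        if (goldLabels.size() != n) {
            throw new IllegalArgumentException("Expected one gold label per token");
        }
        int[][][] data = new int[n][1][];
        int[] labels = new int[n];
        for (int i = 0; i < n; i++) {
            List<String> feats = featuresPerToken.get(i);
            int[] ids = new int[feats.size()];
            for (int f = 0; f < ids.length; f++) {
                ids[f] = indexOf(featureIndex, feats.get(f));
            }
            data[i][0] = ids;
            labels[i] = indexOf(classIndex, goldLabels.get(i));
        }
        return new DataAndLabels(data, labels);
    }

    public static void main(String[] args) {
        Map<String, Integer> featureIndex = new HashMap<>();
        Map<String, Integer> classIndex = new HashMap<>();
        DataAndLabels d = toDataAndLabels(
            List.of(List.of("WORD=John", "SHAPE_CAP"), List.of("WORD=lives")),
            List.of("PERSON", "O"),
            featureIndex, classIndex);
        System.out.println(Arrays.deepToString(d.data()));  // [[[0, 1]], [[2]]]
        System.out.println(Arrays.toString(d.labels()));    // [0, 1]
    }
}
```

The shared indexes are passed in rather than created inside the method. That way, several documents converted in turn (as documentsToDataAndLabels does for an ObjectBank) end up with consistent feature and label ids.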
protected Index allLabels(int window, Index classIndex)
public CRFDatum makeDatum(List<FeatureLabel> info, int loc)
public CRFDatum makeDatum(List<FeatureLabel> info, int loc, FeatureFactory featureFactory)
public List<FeatureLabel> test(List<FeatureLabel> document)
Description copied from class: AbstractSequenceClassifier
Classify a List of FeatureLabels.
test in class AbstractSequenceClassifier
Parameters: document - A List of FeatureLabels.
Returns: the List, but with the elements annotated with their answers (with setAnswer()).
public SequenceModel getSequenceModel(List<FeatureLabel> doc)
getSequenceModel in class AbstractSequenceClassifier
public List<FeatureLabel> testMaxEnt(List<FeatureLabel> document)
public List<FeatureLabel> testGibbs(List<FeatureLabel> document) throws ClassNotFoundException, SecurityException, NoSuchMethodException, IllegalArgumentException, InstantiationException, IllegalAccessException, InvocationTargetException
Throws: ClassNotFoundException, SecurityException, NoSuchMethodException, IllegalArgumentException, InstantiationException, IllegalAccessException, InvocationTargetException
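test returns the document with each element annotated with its answer. A linear-chain CRF conventionally finds the best label sequence with the Viterbi algorithm. The sketch below shows that technique on precomputed scores. It is not necessarily the exact inference routine this class uses. The sketch also compares Viterbi with choosing each position's label independently, which shows why transition scores matter. The inputs are hypothetical log-space scores: node[i][y] is the score of label y at position i, and edge[y][z] is the score of label y followed by z. The label names O, B-PER, and I-PER are illustrative.

```java
import java.util.Arrays;

public class ViterbiSketch {

    // Returns the highest-scoring label sequence for a linear-chain model.
    static int[] viterbi(double[][] node, double[][] edge) {
        int n = node.length;
        if (n == 0) {
            return new int[0];
        }
        int k = node[0].length;
        double[][] best = new double[n][k];  // best score of any path ending in (i, y)
        int[][] back = new int[n][k];        // predecessor label on that path
        best[0] = node[0].clone();
        for (int i = 1; i < n; i++) {
            for (int y = 0; y < k; y++) {
                int arg = 0;
                double max = Double.NEGATIVE_INFINITY;
                for (int prev = 0; prev < k; prev++) {
                    double s = best[i - 1][prev] + edge[prev][y];
                    if (s > max) {
                        max = s;
                        arg = prev;
                    }
                }
                best[i][y] = max + node[i][y];
                back[i][y] = arg;
            }
        }
        int[] path = new int[n];
        for (int y = 1; y < k; y++) {
            if (best[n - 1][y] > best[n - 1][path[n - 1]]) {
                path[n - 1] = y;
            }
        }
        for (int i = n - 1; i > 0; i--) {
            path[i - 1] = back[i][path[i]];  // follow back-pointers
        }
        return path;
    }

    // Picks each position's best label on its own, ignoring transitions.
    static int[] independentArgmax(double[][] node) {
        int[] path = new int[node.length];
        for (int i = 0; i < node.length; i++) {
            for (int y = 1; y < node[i].length; y++) {
                if (node[i][y] > node[i][path[i]]) {
                    path[i] = y;
                }
            }
        }
        return path;
    }

    public static void main(String[] args) {
        // Labels: 0 = O, 1 = B-PER, 2 = I-PER (hypothetical).
        double[][] node = {
            {0.0, 2.0, 0.0},  // "John"
            {1.0, 0.0, 0.9},  // "Smith": O narrowly beats I-PER locally
            {2.0, 0.0, 0.0}   // "said"
        };
        // edge[prev][next]: B-PER -> I-PER is favoured, O -> I-PER heavily penalised.
        double[][] edge = {
            {0.0, 0.0, -5.0},
            {0.0, -1.0, 1.0},
            {0.0, 0.0, 0.5}
        };
        System.out.println(Arrays.toString(independentArgmax(node)));  // [1, 0, 0]
        System.out.println(Arrays.toString(viterbi(node, edge)));      // [1, 2, 0]
    }
}
```

In the example, the strong B-PER to I-PER transition makes Viterbi label "Smith" as I-PER, even though O scores slightly higher at that position alone.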
public void printProbsDocument(List<FeatureLabel> document)
Takes a List of FeatureLabels and prints the likelihood of each possible label at each point.
printProbsDocument in class AbstractSequenceClassifier
Parameters: document - A List of FeatureLabels.
public void printFirstOrderProbs(String filename)
Takes the file, reads it in, and prints out the likelihood of each possible label at each point.
Parameters: filename - The path to the file to read
public void printFirstOrderProbsDocuments(ObjectBank<List<FeatureLabel>> documents)
Takes an ObjectBank of documents and prints the likelihood of each possible label at each point.
Parameters: documents - An ObjectBank of documents, each a List of FeatureLabels.
public void printFirstOrderProbsDocument(List<FeatureLabel> document)
Takes a List of FeatureLabels and prints the likelihood of each possible label at each point.
Parameters: document - A List of FeatureLabels.
public void train(ObjectBank<List<FeatureLabel>> docs)
train in class AbstractSequenceClassifier
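printProbsDocument and the printFirstOrderProbs* methods print a probability for each possible label at each position. In a linear-chain model, such per-position probabilities are usually marginals computed with the forward-backward algorithm. The sketch below computes them from hypothetical node and edge log-scores, the same kind a Viterbi decoder would use. It illustrates the general technique and is not taken from this class's code. This page does not say exactly what "first order" refers to in these method names.

```java
public class MarginalsSketch {

    // Numerically stable log(sum(exp(x))).
    static double logSumExp(double[] xs) {
        double max = Double.NEGATIVE_INFINITY;
        for (double x : xs) {
            max = Math.max(max, x);
        }
        if (max == Double.NEGATIVE_INFINITY) {
            return max;
        }
        double sum = 0.0;
        for (double x : xs) {
            sum += Math.exp(x - max);
        }
        return max + Math.log(sum);
    }

    // Returns p[i][y] = P(label at position i is y | sentence) for a linear-chain model
    // with log-space scores node[i][y] and edge[y][z].
    static double[][] marginals(double[][] node, double[][] edge) {
        int n = node.length;
        if (n == 0) {
            return new double[0][0];
        }
        int k = node[0].length;
        double[][] alpha = new double[n][k];  // log mass of all prefixes ending in (i, y)
        double[][] beta = new double[n][k];   // log mass of all suffixes after (i, y); last row stays 0
        double[] terms = new double[k];
        alpha[0] = node[0].clone();
        for (int i = 1; i < n; i++) {
            for (int y = 0; y < k; y++) {
                for (int prev = 0; prev < k; prev++) {
                    terms[prev] = alpha[i - 1][prev] + edge[prev][y];
                }
                alpha[i][y] = logSumExp(terms) + node[i][y];
            }
        }
        for (int i = n - 2; i >= 0; i--) {
            for (int y = 0; y < k; y++) {
                for (int next = 0; next < k; next++) {
                    terms[next] = edge[y][next] + node[i + 1][next] + beta[i + 1][next];
                }
                beta[i][y] = logSumExp(terms);
            }
        }
        double logZ = logSumExp(alpha[n - 1]);  // log partition function
        double[][] p = new double[n][k];
        for (int i = 0; i < n; i++) {
            for (int y = 0; y < k; y++) {
                p[i][y] = Math.exp(alpha[i][y] + beta[i][y] - logZ);
            }
        }
        return p;
    }

    public static void main(String[] args) {
        double[][] node = {{0.0, 2.0, 0.0}, {1.0, 0.0, 0.9}, {2.0, 0.0, 0.0}};
        double[][] edge = {{0.0, 0.0, -5.0}, {0.0, -1.0, 1.0}, {0.0, 0.0, 0.5}};
        double[][] p = marginals(node, edge);
        for (int i = 0; i < p.length; i++) {
            System.out.printf("position %d: O=%.3f B-PER=%.3f I-PER=%.3f%n",
                i, p[i][0], p[i][1], p[i][2]);
        }
    }
}
```

Working in log space with logSumExp avoids the overflow and underflow that multiplying raw exponentiated scores would cause on long sentences.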
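protected List<CRFDatum> extractDatumSequence(int[][][] allData, int beginPosition, int endPosition, List labeledWordInfos)
Creates a new CRFDatum from the preprocessed allData format, given the document number, position number, and a List of Object labels.
Parameters: allData, beginPosition, endPosition, labeledWordInfos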
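protected void addProcessedData(List processedData, int[][][][] data, int[][] labels, int offset)
Adds the List of Lists of CRFDatums to the data and labels arrays, treating each datum as if it were its own document.
Parameters: processedData - a List of Lists of CRFDatums; data; labels; offset
protected void saveProcessedData(List datums, String filename)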
protected List loadProcessedData(String filename)
public void serializeClassifier(String serializePath)
serializeClassifier in class AbstractSequenceClassifier
public void loadClassifier(InputStream in, Properties props) throws ClassCastException, IOException, ClassNotFoundException
Loads a classifier from the specified InputStream.
loadClassifier in class AbstractSequenceClassifier
Parameters:
in - The InputStream to load the serialized classifier from
props - This Properties object will be used to update the SeqClassifierFlags which are read from the serialized classifier
Throws: ClassCastException, IOException, ClassNotFoundException
public void loadDefaultClassifier()
Loads the default supplied classifier stored within the jar file.
public static CRFClassifier getDefaultClassifier()
Used to get the default supplied classifier.
public static CRFClassifier getJarClassifier(String resourceName, Properties props)
Used to load a classifier stored as a resource inside a jar file.
public static CRFClassifier getClassifierNoExceptions(File file)
public static CRFClassifier getClassifier(File file) throws IOException, ClassCastException, ClassNotFoundException
Throws: IOException, ClassCastException, ClassNotFoundException
public static CRFClassifier getClassifierNoExceptions(String loadPath)
public static CRFClassifier getClassifier(String loadPath) throws IOException, ClassCastException, ClassNotFoundException
Throws: IOException, ClassCastException, ClassNotFoundException
public static CRFClassifier getClassifierNoExceptions(InputStream in)
public static CRFClassifier getClassifier(InputStream in) throws IOException, ClassCastException, ClassNotFoundException
Throws: IOException, ClassCastException, ClassNotFoundException
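The getClassifier methods declare IOException, ClassCastException, and ClassNotFoundException. These are the usual failure modes when reading a serialized Java object:

* I/O can fail.
* The stream may name a class that is missing from the classpath.
* The object read may not be of the expected type.

The sketch below shows that pattern using standard java.io serialization with GZIP compression. The .gz suffix of conll.ner.gz suggests compressed files, but whether and how CRFClassifier compresses its output is an assumption here. The sketch also includes a NoExceptions-style wrapper that rethrows checked exceptions as unchecked ones. This page does not say what getClassifierNoExceptions actually does on failure. ToyModel, save, load, and loadNoExceptions are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class SerializationSketch {

    // Hypothetical stand-in for a trained model.
    static class ToyModel implements Serializable {
        private static final long serialVersionUID = 1L;
        final double[][] weights;
        final String labelScheme;

        ToyModel(double[][] weights, String labelScheme) {
            this.weights = weights;
            this.labelScheme = labelScheme;
        }
    }

    // Writes the model as a gzip-compressed Java serialization stream.
    static void save(ToyModel model, Path path) throws IOException {
        try (OutputStream out = Files.newOutputStream(path);
             ObjectOutputStream oos = new ObjectOutputStream(new GZIPOutputStream(out))) {
            oos.writeObject(model);
        }
    }

    // Declares the same checked exceptions as getClassifier(...).
    static ToyModel load(InputStream in) throws IOException, ClassCastException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new GZIPInputStream(in))) {
            return (ToyModel) ois.readObject();  // ClassCastException if it is another type
        }
    }

    // Hypothetical "NoExceptions" variant: converts checked exceptions to unchecked ones.
    static ToyModel loadNoExceptions(Path path) {
        try (InputStream in = Files.newInputStream(path)) {
            return load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Serialized class not on classpath", e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("toy-model", ".ser.gz");
        try {
            save(new ToyModel(new double[][] {{0.5, -1.0}, {2.0, 0.0}}, "IOB2"), file);
            ToyModel restored = loadNoExceptions(file);
            // Prints "IOB2 [[0.5, -1.0], [2.0, 0.0]]"
            System.out.println(restored.labelScheme + " " + Arrays.deepToString(restored.weights));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```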
public static void main(String[] args) throws Exception
The main method.
Throws: Exception