public class ChineseNumberSequenceClassifier extends AbstractSequenceClassifier<CoreLabel>
A rule-based counterpart of NumberSequenceClassifier (without using SUTime) that works on Chinese token sequences.
TODO: An interface needs to be used to reuse code for NumberSequenceClassifier
TODO: Ideally a Chinese version of SUTime needs to be used to provide more flexibility and accuracy.

Modifier and Type | Field and Description |
---|---|
static java.util.regex.Pattern | CURRENCY_WORD_PATTERN |
static java.lang.String[] | CURRENCY_WORDS_VALUES |
static java.util.regex.Pattern | DATE_PATTERN1 |
static java.util.regex.Pattern | DATE_PATTERN2 |
static java.util.regex.Pattern | DATE_PATTERN3 |
static java.util.regex.Pattern | DATE_PATTERN4 |
static java.util.regex.Pattern | DATE_PATTERN5 |
static java.lang.String | DATE_TAG |
static java.util.HashSet<java.lang.String> | DATE_WORDS |
static java.lang.String[] | DATE_WORDS_VALUES |
static java.lang.String | MONEY_TAG |
static java.lang.String | NUMBER_TAG |
static java.lang.String | ORDINAL_TAG |
static java.lang.String | PERCENT_TAG |
static java.util.regex.Pattern | PERCENT_WORD_PATTERN1 |
static java.util.regex.Pattern | PERCENT_WORD_PATTERN2 |
static java.lang.String | SUTIME_PROPERTY |
static java.util.regex.Pattern | TIME_PATTERN1 |
static java.lang.String | TIME_TAG |
static java.util.HashSet<java.lang.String> | TIME_WORDS |
static java.lang.String[] | TIME_WORDS_VALUES |
static boolean | USE_SUTIME_DEFAULT |
static java.lang.String | USE_SUTIME_PROPERTY |
static java.lang.String | USE_SUTIME_PROPERTY_BASE |
Fields inherited from class AbstractSequenceClassifier: classIndex, featureFactories, flags, knownLCWords, pad, windowSize
Constructor and Description |
---|
ChineseNumberSequenceClassifier() |
ChineseNumberSequenceClassifier(boolean useSUTime) |
ChineseNumberSequenceClassifier(java.util.Properties props, boolean useSUTime, java.util.Properties sutimeProps) |
Modifier and Type | Method and Description |
---|---|
java.util.List<CoreLabel> | classify(java.util.List<CoreLabel> document): Use a set of heuristic rules to assign NER tags to tokens. |
java.util.List<CoreLabel> | classifyWithGlobalInformation(java.util.List<CoreLabel> tokenSequence, CoreMap document, CoreMap sentence): Classify a List of something that extends CoreMap using as additional information whatever is stored in the document and sentence. |
void | loadClassifier(java.io.ObjectInputStream in, java.util.Properties props): Load a classifier from the specified input stream. |
static void | main(java.lang.String[] args) |
void | serializeClassifier(java.io.ObjectOutputStream oos): Serialize a sequence classifier to an object output stream. |
void | serializeClassifier(java.lang.String serializePath): Serialize a sequence classifier to a file on the given path. |
void | train(java.util.Collection<java.util.List<CoreLabel>> docs, DocumentReaderAndWriter<CoreLabel> readerAndWriter): Trains a classifier from a Collection of sequences. |
Methods inherited from class AbstractSequenceClassifier: apply, backgroundSymbol, classify, classifyAndWriteAnswers, classifyAndWriteAnswers, classifyAndWriteAnswers, classifyAndWriteAnswers, classifyAndWriteAnswers, classifyAndWriteAnswers, classifyAndWriteAnswers, classifyAndWriteAnswersKBest, classifyAndWriteAnswersKBest, classifyAndWriteViterbiSearchGraph, classifyFile, classifyFilesAndWriteAnswers, classifyFilesAndWriteAnswers, classifyKBest, classifyRaw, classifySentence, classifySentenceWithGlobalInformation, classifyStdin, classifyStdin, classifyToCharacterOffsets, classifyToString, classifyToString, classifyWithInlineXML, countResults, countResultsSegmenter, defaultReaderAndWriter, dumpFeatures, finalizeClassification, getKnownLCWords, getSampler, getSequenceModel, labels, loadClassifier, loadClassifier, loadClassifier, loadClassifier, loadClassifier, loadClassifier, loadClassifierNoExceptions, loadClassifierNoExceptions, loadClassifierNoExceptions, loadClassifierNoExceptions, loadClassifierNoExceptions, makeObjectBankFromFile, makeObjectBankFromFile, makeObjectBankFromFiles, makeObjectBankFromFiles, makeObjectBankFromFiles, makeObjectBankFromReader, makeObjectBankFromString, makePlainTextReaderAndWriter, makePlainTextReaderAndWriter, makeReaderAndWriter, plainTextReaderAndWriter, printFeatureLists, printFeatures, printProbs, printProbs, printProbsDocument, printProbsDocuments, printResults, reinit, segmentString, segmentString, train, train, train, train, train, train, windowSize, writeAnswers
public static final boolean USE_SUTIME_DEFAULT
public static final java.lang.String USE_SUTIME_PROPERTY
public static final java.lang.String USE_SUTIME_PROPERTY_BASE
public static final java.lang.String SUTIME_PROPERTY
public static final java.lang.String NUMBER_TAG
public static final java.lang.String DATE_TAG
public static final java.lang.String TIME_TAG
public static final java.lang.String MONEY_TAG
public static final java.lang.String ORDINAL_TAG
public static final java.lang.String PERCENT_TAG
public static final java.util.regex.Pattern CURRENCY_WORD_PATTERN
public static final java.util.regex.Pattern PERCENT_WORD_PATTERN1
public static final java.util.regex.Pattern PERCENT_WORD_PATTERN2
public static final java.util.regex.Pattern DATE_PATTERN1
public static final java.util.regex.Pattern DATE_PATTERN2
public static final java.util.regex.Pattern DATE_PATTERN3
public static final java.util.regex.Pattern DATE_PATTERN4
public static final java.util.regex.Pattern DATE_PATTERN5
public static final java.util.regex.Pattern TIME_PATTERN1
public static final java.lang.String[] CURRENCY_WORDS_VALUES
public static final java.lang.String[] DATE_WORDS_VALUES
public static final java.util.HashSet<java.lang.String> DATE_WORDS
public static final java.lang.String[] TIME_WORDS_VALUES
public static final java.util.HashSet<java.lang.String> TIME_WORDS
public ChineseNumberSequenceClassifier()
public ChineseNumberSequenceClassifier(boolean useSUTime)
public ChineseNumberSequenceClassifier(java.util.Properties props, boolean useSUTime, java.util.Properties sutimeProps)
public java.util.List<CoreLabel> classify(java.util.List<CoreLabel> document)
Use a set of heuristic rules to assign NER tags to tokens.
Specified by: classify in class AbstractSequenceClassifier<CoreLabel>
Parameters:
document - A List of something that extends CoreMap.

public java.util.List<CoreLabel> classifyWithGlobalInformation(java.util.List<CoreLabel> tokenSequence, CoreMap document, CoreMap sentence)
Description copied from class: AbstractSequenceClassifier
Classify a List of something that extends CoreMap, using whatever is stored in the document and sentence as additional information. This is needed for SUTime (NumberSequenceClassifier), which requires the document date to resolve relative dates.
Specified by: classifyWithGlobalInformation in class AbstractSequenceClassifier<CoreLabel>
Parameters:
tokenSequence - A List of something that extends CoreMap
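The note about the document date can be illustrated with a small, self-contained sketch. It shows only the idea that a relative expression such as 昨天 ("yesterday") cannot be resolved without a reference date. The class RelativeDateSketch and its resolve helper are hypothetical names invented for this illustration; they are not part of SUTime or of this classifier, and the three words handled are an arbitrary choice.

```java
import java.time.LocalDate;
import java.util.Optional;

// Hypothetical illustration, not SUTime: resolves a few relative-day words
// against a document date supplied by the caller.
class RelativeDateSketch {
    static Optional<LocalDate> resolve(String word, LocalDate documentDate) {
        switch (word) {
            case "\u6628\u5929": // zuotian, "yesterday" (U+6628 U+5929)
                return Optional.of(documentDate.minusDays(1));
            case "\u4ECA\u5929": // jintian, "today" (U+4ECA U+5929)
                return Optional.of(documentDate);
            case "\u660E\u5929": // mingtian, "tomorrow" (U+660E U+5929)
                return Optional.of(documentDate.plusDays(1));
            default:
                // Not a relative-day word this sketch knows about.
                return Optional.empty();
        }
    }

    public static void main(String[] args) {
        LocalDate documentDate = LocalDate.of(2015, 3, 1);
        // "yesterday" relative to 2015-03-01
        System.out.println(resolve("\u6628\u5929", documentDate));
    }
}
```

With a document date of 2015-03-01, the word for "yesterday" resolves to 2015-02-28; the same token with a different document date yields a different answer, which is why the date has to be passed in alongside the tokens.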
public void train(java.util.Collection<java.util.List<CoreLabel>> docs, DocumentReaderAndWriter<CoreLabel> readerAndWriter)
Description copied from class: AbstractSequenceClassifier
Trains a classifier from a Collection of sequences.
Specified by: train in class AbstractSequenceClassifier<CoreLabel>
Parameters:
docs - An ObjectBank or a collection of sequences of IN
readerAndWriter - A DocumentReaderAndWriter to use when loading test files

public void serializeClassifier(java.lang.String serializePath)
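Description copied from class: AbstractSequenceClassifier
Serialize a sequence classifier to a file on the given path.
Specified by: serializeClassifier in class AbstractSequenceClassifier<CoreLabel>
Parameters:
serializePath - The path/filename to write the classifier to.

public void serializeClassifier(java.io.ObjectOutputStream oos)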
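Description copied from class: AbstractSequenceClassifier
Serialize a sequence classifier to an object output stream.
Specified by: serializeClassifier in class AbstractSequenceClassifier<CoreLabel>
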
public void loadClassifier(java.io.ObjectInputStream in, java.util.Properties props) throws java.io.IOException, java.lang.ClassCastException, java.lang.ClassNotFoundException
Description copied from class: AbstractSequenceClassifier
Load a classifier from the specified input stream.
Specified by: loadClassifier in class AbstractSequenceClassifier<CoreLabel>
Parameters:
in - The InputStream to load the serialized classifier from
props - This Properties object will be used to update the SeqClassifierFlags which are read from the serialized classifier
Throws:
java.io.IOException - If there are problems accessing the input stream
java.lang.ClassCastException - If there are problems interpreting the serialized data
java.lang.ClassNotFoundException - If there are problems interpreting the serialized data

public static void main(java.lang.String[] args) throws java.io.IOException
Throws:
java.io.IOException
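
The classify method is described as using a set of heuristic rules to assign NER tags, and the class exposes regex fields such as DATE_PATTERN1 and PERCENT_WORD_PATTERN1. As a rough sketch of what pattern-based tagging of Chinese number tokens can look like, the example below uses only the Java standard library. Everything in it is an assumption made for illustration: the class RuleTaggerSketch, its four patterns, the tag strings, and the "O" background label are not taken from the real classifier, whose pattern and tag values are not shown on this page. Chinese characters are written as Unicode escapes so the source stays ASCII-only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Sketch only: the patterns and tag strings below are assumptions made for
// illustration; they are not the values of the classifier's real fields.
class RuleTaggerSketch {
    // ASCII digits plus the Chinese numerals for 1-10 and zero (U+96F6).
    static final String DIGITS =
        "0-9\u4E00\u4E8C\u4E09\u56DB\u4E94\u516D\u4E03\u516B\u4E5D\u5341\u96F6";
    // Numerals followed by year (U+5E74), month (U+6708) or day (U+65E5).
    static final Pattern DATE =
        Pattern.compile("[" + DIGITS + "]+[\u5E74\u6708\u65E5]");
    // "bai fen zhi" (U+767E U+5206 U+4E4B) plus numerals, or ASCII digits plus '%'.
    static final Pattern PERCENT =
        Pattern.compile("\u767E\u5206\u4E4B[" + DIGITS + "]+|[0-9.]+%");
    // Ordinal prefix "di" (U+7B2C) plus numerals.
    static final Pattern ORDINAL = Pattern.compile("\u7B2C[" + DIGITS + "]+");
    // Numerals, optionally mixed with hundred, thousand, ten-thousand.
    static final Pattern NUMBER =
        Pattern.compile("[" + DIGITS + "\u767E\u5343\u4E07]+");

    // Returns the tag of the first rule that matches the whole token;
    // more specific rules are tried before the generic NUMBER rule.
    static String tag(String token) {
        if (DATE.matcher(token).matches()) return "DATE";
        if (PERCENT.matcher(token).matches()) return "PERCENT";
        if (ORDINAL.matcher(token).matches()) return "ORDINAL";
        if (NUMBER.matcher(token).matches()) return "NUMBER";
        return "O"; // assumed background label for untagged tokens
    }

    static List<String> tagAll(List<String> tokens) {
        List<String> tags = new ArrayList<>();
        for (String token : tokens) {
            tags.add(tag(token));
        }
        return tags;
    }

    public static void main(String[] args) {
        // Tokens: "the year 2015", "five percent", "third", "100", "ok"
        List<String> tokens = Arrays.asList(
            "\u4E8C\u96F6\u4E00\u4E94\u5E74",
            "\u767E\u5206\u4E4B\u4E94",
            "\u7B2C\u4E09",
            "100",
            "ok");
        System.out.println(tagAll(tokens));
    }
}
```

Running main prints [DATE, PERCENT, ORDINAL, NUMBER, O]. Rule order matters in a scheme like this: the generic NUMBER rule is tried last so that a token carrying a date or ordinal marker is not claimed by it first.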