ConstantsAndVariables (Stanford JavaNLP API)

java.lang.Object
- edu.stanford.nlp.patterns.ConstantsAndVariables

All Implemented Interfaces:: java.io.Serializable

public class ConstantsAndVariables
extends java.lang.Object
implements java.io.Serializable

See Also:: Serialized Form

Nested Class Summary

Nested Classes
Modifier and Type	Class and Description
`static class`	`ConstantsAndVariables.DataSentsIterator`
`static class`	`ConstantsAndVariables.PatternForEachTokenWay`
`static class`	`ConstantsAndVariables.PatternIndexWay`
`static class`	`ConstantsAndVariables.ScorePhraseMeasures`

Field Summary

Fields
Modifier and Type	Field and Description
`boolean`	`addIndvWordsFromPhrasesExceptLastAsNeg` Add the individual words of multi-word seed phrases, except the last word, as negatives.
`java.util.Map<java.lang.String,java.util.Set<java.lang.String>>`	`allowedNERsforLabels`
`java.util.Map<java.lang.String,java.util.Set<java.lang.String>>`	`allowedTagsInitials`
`java.lang.String`	`allPatternsDir` Cached file of all patterns for all tokens
`static java.lang.String`	`backgroundSymbol`
`boolean`	`batchProcessSents` Use this option if you are limited by memory; ignored if fileFormat is ser.
`boolean`	`clubNeighboringLabeledWords`
`java.lang.String`	`commonWordsPatternFiles` Words to be ignored when learning phrases if `removePhrasesWithStopWords` or `removeStopWordsFromSelectedPhrases` is true.
`boolean`	`computeAllPatterns` Whether all patterns should be computed.
`int`	`debug` Debug flag for learning patterns.
`java.util.Map<java.lang.String,Counter<CandidatePhrase>>`	`dictOddsWeights`
`java.util.Map<java.lang.String,Counter<java.lang.Integer>>`	`distSimWeights`
`boolean`	`doNotApplyPatterns`
`boolean`	`doNotExtractPhraseAnyWordLabeledOtherClass` Especially useful for multi word phrase extraction.
`java.lang.String`	`englishWordsFiles` English words that are not labeled when labeling using seed dictionaries
`java.util.Map<java.lang.String,Env>`	`env` Environment for `TokenSequencePattern`
`boolean`	`evaluate`
`boolean`	`expandNegativesWhenSampling`
`int`	`expandPhrasesNumTopSimilar`
`boolean`	`expandPositivesWhenSampling`
`java.lang.String`	`externalFeatureWeightsDir`
`static java.lang.String`	`extremedebug`
`int`	`featureCountThreshold`
`java.util.List<java.lang.String>`	`functionWords`
`boolean`	`fuzzyMatch` Whether to use fuzzy matching when matching seeds to text.
`static Env`	`globalEnv`
`java.lang.String`	`goldEntitiesEvalFiles`
`java.lang.String`	`identifier` Save this run as ...
`java.util.Map<java.lang.String,java.lang.String>`	`ignoreCaseSeedMatch` Ignore case when matching seed words.
`SentenceIndex`	`invertedIndex`
`java.lang.Class<? extends SentenceIndex>`	`invertedIndexClass`
`java.lang.String`	`invertedIndexDirectory` Where the inverted index (either in memory or Lucene) is stored
`boolean`	`justify`
`boolean`	`learn`
`boolean`	`loadInvertedIndex` You can load the inverted index using this file.
`double`	`LRSigma` Sigma for L2 regularization in logistic regression, if a classifier is used to score phrases
`static boolean`	`matchLowerCaseContext` Lowercase the context words/lemmas
`int`	`maxExtractNumWords` Maximum number of words to learn
`static java.lang.String`	`minimaldebug`
`int`	`minLen4FuzzyForPattern` Minimum length of words that can be matched fuzzily
`int`	`minPosPhraseSupportForPat` Remove patterns whose number of positive words is less than this.
`int`	`minUnlabPhraseSupportForPat` Remove patterns whose number of unlabeled words is less than this.
`java.lang.Integer`	`numIterationsForPatterns` Maximum number of iterations to run
`int`	`numPatterns` Maximum number of patterns learned in each iteration
`int`	`numThreads` Number of threads
`int`	`numWordsToAdd` Number of words to learn in each iteration
`java.lang.String`	`otherSemanticClassesFiles` List of dictionary phrases that are negative for all labels to be learned.
`java.lang.String`	`outDir` The output directory where the justifications of learning patterns and phrases would be saved.
`GetPatternsFromDataMultiClass.PatternScoring`	`patternScoring` Pattern Scoring mechanism.
`PatternFactory.PatternType`	`patternType`
`double`	`perSelectNeg` These are used to learn weights for features if using logistic regression.
`double`	`perSelectRand` These are used to learn weights for features if using logistic regression.
`double`	`positiveSimilarityThresholdLowPrecision`
`boolean`	`removeOverLappingLabelsFromSeed` Keeps only one label for each token, whichever has the longest
`boolean`	`removePhrasesWithStopWords`
`boolean`	`removeStopWordsFromSelectedPhrases`
`boolean`	`restrictToMatched` Currently does not work correctly.
`boolean`	`saveInvertedIndex` You can save the inverted index.
`boolean`	`savePatternsWordsDir`
`java.lang.String`	`sentsOutFile`
`double`	`similarityThresholdHighPrecision`
`boolean`	`sqrtPatScore` Whether the score of a pattern is square-rooted
`java.lang.String`	`stopWordsPatternFiles` Words that are not learned.
`ConstantsAndVariables.PatternForEachTokenWay`	`storePatsForEachToken`
`boolean`	`subsampleUnkAsNegUsingSim`
`java.lang.String`	`targetAllowedNERs` Allowed NERs for labels.
`java.lang.String`	`targetAllowedTagsInitialsStr` Initials of all POS tags to use if `usePOS4Pattern` is true, separated by commas.
`double`	`thresholdNumPatternsApplied`
`double`	`thresholdSelectPattern` Threshold for learning a pattern
`double`	`thresholdWordExtract`
`boolean`	`tuneThresholdKeepRunning` Reduce pattern threshold (=0.8*current_value) to extract as many patterns as possible (still restricted by `numPatterns`)
`boolean`	`useMatchingPhrase` Use the actual dictionary matching phrase(s) instead of the token word or lemma in calculating the stats
`boolean`	`useOtherLabelsWordsasNegative` Use the seed dictionaries and the new words learned for the other labels in the previous iterations as negatives
`boolean`	`usePatternEvalBOW` Use bag of words
`boolean`	`usePatternEvalDomainNgram` Used only if `patternScoring` is `PhEvalInPat` or `PhEvalInPatLogP`.
`boolean`	`usePatternEvalEditDistOther` Used only if `patternScoring` is `PhEvalInPat` or `PhEvalInPatLogP`.
`boolean`	`usePatternEvalEditDistSame` Used only if `patternScoring` is `PhEvalInPat` or `PhEvalInPatLogP`.
`boolean`	`usePatternEvalFirstCapital`
`boolean`	`usePatternEvalGoogleNgram` Used only if `patternScoring` is `PhEvalInPat` or `PhEvalInPatLogP`.
`boolean`	`usePatternEvalSemanticOdds` Used only if `patternScoring` is `PhEvalInPat` or `PhEvalInPatLogP`.
`boolean`	`usePatternEvalWordClass` Used only if `patternScoring` is `PhEvalInPat` or `PhEvalInPatLogP`.
`boolean`	`usePatternEvalWordShape` Used only if `patternScoring` is `PhEvalInPat` or `PhEvalInPatLogP`.
`boolean`	`usePatternEvalWordShapeStr`
`boolean`	`usePatternResultAsLabel` Label words that are learned so that in further iterations we have more information
`boolean`	`usePhraseEvalBOW` Use bag of words
`boolean`	`usePhraseEvalDomainNgram` Use domain tf-idf for learning phrases
`boolean`	`usePhraseEvalEditDistOther` Edit distance between this phrase and other phrases in other dictionaries
`boolean`	`usePhraseEvalEditDistSame` Edit distance between this phrase and the other phrases in the label dictionary
`boolean`	`usePhraseEvalFirstCapital`
`boolean`	`usePhraseEvalGoogleNgram` Use Google tf-idf for learning phrases.
`boolean`	`usePhraseEvalPatWtByFreq` Use the sum, over all patterns that extracted the phrase, of pattern weight divided by phrase frequency for learning phrases
`boolean`	`usePhraseEvalSemanticOdds` Odds of the phrase frequency in the label dictionary vs. other dictionaries
`boolean`	`usePhraseEvalWordClass` Only works if you have a single label.
`boolean`	`usePhraseEvalWordShape`
`boolean`	`usePhraseEvalWordShapeStr`
`boolean`	`usePhraseEvalWordVector` Only works if you have a single label.
`boolean`	`useWordVectorsToComputeSim`
`java.lang.String`	`wordIgnoreRegex` Do not learn phrases that match this regex.
`edu.stanford.nlp.patterns.GetPatternsFromDataMultiClass.WordScoring`	`wordScoring`
`java.lang.String`	`wordVectorFile`
`boolean`	`writeMatchedTokensFiles`
`boolean`	`writeMatchedTokensIdsForEachPhrase`

Constructor Summary

Constructors
Constructor and Description
`ConstantsAndVariables(java.util.Properties props, java.util.Map<java.lang.String,java.util.Set<CandidatePhrase>> labelDictionary, java.util.Map<java.lang.String,java.lang.Class<? extends TypesafeMap.Key<java.lang.String>>> answerClass, java.util.Map<java.lang.String,java.lang.Class> generalizeClasses, java.util.Map<java.lang.String,java.util.Map<java.lang.Class,java.lang.Object>> ignoreClasses)`
`ConstantsAndVariables(java.util.Properties props, java.util.Set<java.lang.String> labels, java.util.Map<java.lang.String,java.lang.Class<? extends TypesafeMap.Key<java.lang.String>>> answerClass)`
`ConstantsAndVariables(java.util.Properties props, java.util.Set<java.lang.String> labels, java.util.Map<java.lang.String,java.lang.Class<? extends TypesafeMap.Key<java.lang.String>>> answerClass, java.util.Map<java.lang.String,java.lang.Class> generalizeClasses)`
`ConstantsAndVariables(java.util.Properties props, java.util.Set<java.lang.String> labels, java.util.Map<java.lang.String,java.lang.Class<? extends TypesafeMap.Key<java.lang.String>>> answerClass, java.util.Map<java.lang.String,java.lang.Class> generalizeClasses, java.util.Map<java.lang.String,java.util.Map<java.lang.Class,java.lang.Object>> ignoreClasses)`
`ConstantsAndVariables(java.util.Properties props, java.lang.String label, java.lang.Class<? extends TypesafeMap.Key<java.lang.String>> answerClass)`
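Every constructor takes a java.util.Properties object. Judging from the @ArgumentParser.Option annotations in the field details, options appear to be read from these properties by name. The sketch below only builds such an object; the keys are option names documented on this page, while the values are illustrative placeholders, not recommended settings. PatternPropsExample is a hypothetical name.

```java
import java.util.Properties;

class PatternPropsExample {
  // Keys are option names from the @ArgumentParser.Option annotations on
  // this page; the values are illustrative placeholders only.
  static Properties buildProps() {
    Properties props = new Properties();
    props.setProperty("numIterationsForPatterns", "5"); // max iterations
    props.setProperty("numPatterns", "20");             // patterns per iteration
    props.setProperty("numWordsToAdd", "10");           // words per iteration
    props.setProperty("patternScoring", "PhEvalInPatLogP");
    props.setProperty("fuzzyMatch", "true");
    props.setProperty("minLen4FuzzyForPattern", "6");
    props.setProperty("numThreads", "4");
    return props;
  }
}
```

The resulting object would then be passed as the props argument of one of the constructors above; that call is not shown because it needs CoreNLP on the classpath and a label-specific answer class.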

Method Summary

Modifier and Type	Method and Description
`void`	`addSeedWords(java.lang.String label, java.util.Collection<CandidatePhrase> seeds)`
`void`	`addWordShapes(java.lang.String label, java.util.Set<CandidatePhrase> words)`
`static CandidatePhrase`	`containsFuzzy(java.util.Set<CandidatePhrase> words, CandidatePhrase w, int minLen4Fuzzy)`
`java.util.Map<java.lang.String,java.lang.String>`	`getAllOptions()`
`java.util.Map<java.lang.String,java.lang.Class<? extends TypesafeMap.Key<java.lang.String>>>`	`getAnswerClass()`
`java.util.Set<java.lang.String>`	`getCommonEngWords()`
`java.util.concurrent.ConcurrentHashMap<java.lang.String,java.lang.Double>`	`getEditDistanceFromEnglishWords()`
`java.util.concurrent.ConcurrentHashMap<java.lang.String,java.lang.String>`	`getEditDistanceFromEnglishWordsMatches()`
`Pair<java.lang.String,java.lang.Double>`	`getEditDistanceFromOtherClasses(java.lang.String label, java.lang.String ph, int minLen)`
`Pair<java.lang.String,java.lang.Double>`	`getEditDistanceFromThisClass(java.lang.String label, java.lang.String ph, int minLen)`
`double`	`getEditDistanceScoresOtherClass(java.lang.String label, java.lang.String g)`
`double`	`getEditDistanceScoresOtherClassThreshold(java.lang.String label, java.lang.String g)` 1 if g lies within the edit-distance threshold, 0 if it is not close to any words
`double`	`getEditDistanceScoresThisClass(java.lang.String label, java.lang.String g)`
`double`	`getEditDistanceScoresThisClassThreshold(java.lang.String label, java.lang.String g)`
`java.util.Set<java.lang.String>`	`getEnglishWords()`
`static java.util.Map<java.lang.String,java.lang.Class>`	`getGeneralizeClasses()`
`java.util.Map<java.lang.String,java.lang.Integer>`	`getGeneralWordClassClusters()`
`java.util.Map<java.lang.String,java.util.Map<java.lang.Class,java.lang.Object>>`	`getIgnoreWordswithClassesDuringSelection()`
`java.util.Set<java.lang.String>`	`getLabels()`
`Counter<CandidatePhrase>`	`getLearnedWords(java.lang.String label)`
`java.lang.String`	`getLearnedWordsAsJson()`
`java.lang.String`	`getLearnedWordsAsJsonLastIteration()`
`java.util.Map<java.lang.String,java.util.TreeMap<java.lang.Integer,Counter<CandidatePhrase>>>`	`getLearnedWordsEachIter()`
`java.util.TreeMap<java.lang.Integer,Counter<CandidatePhrase>>`	`getLearnedWordsEachIter(java.lang.String label)`
`java.util.Set<CandidatePhrase>`	`getOtherSemanticClassesWords()`
`java.util.Map<java.lang.String,java.util.Set<CandidatePhrase>>`	`getSeedLabelDictionary()`
`java.lang.String`	`getSetWordsAsJson(java.util.Map<java.lang.String,Counter<CandidatePhrase>> words)`
`static java.util.Set<CandidatePhrase>`	`getStopWords()`
`java.util.Map<java.lang.String,java.lang.Integer>`	`getWordClassClusters()`
`java.util.Map<java.lang.String,java.lang.String>`	`getWordShapeCache()`
`java.util.Map<java.lang.String,Counter<java.lang.String>>`	`getWordShapesForLabels()`
`boolean`	`hasSeedWordOrOtherSem(CandidatePhrase p)`
`static boolean`	`isFuzzyMatch(java.lang.String w1, java.lang.String w2, int minLen4Fuzzy)`
`static java.lang.Iterable<java.io.File>`	`listFileIncludingItself(java.lang.String file)`
`void`	`setGeneralWordClassClusters(java.util.Map<java.lang.String,java.lang.Integer> generalWordClassClusters)`
`void`	`setLearnedWordsEachIter(java.util.TreeMap<java.lang.Integer,Counter<CandidatePhrase>> words, java.lang.String label)`
`void`	`setOtherSemanticClassesWords(java.util.Set<CandidatePhrase> other)`
`void`	`setUp(java.util.Properties props)`

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail

numIterationsForPatterns

@ArgumentParser.Option(name="numIterationsForPatterns")
public java.lang.Integer numIterationsForPatterns

Maximum number of iterations to run

numPatterns

@ArgumentParser.Option(name="numPatterns")
public int numPatterns

Maximum number of patterns learned in each iteration

outDir
```
@ArgumentParser.Option(name="outDir")
public java.lang.String outDir
```
The output directory where the justifications of learning patterns and phrases are saved. These are needed for visualization.

allPatternsDir

@ArgumentParser.Option(name="allPatternsDir")
public java.lang.String allPatternsDir

Cached file of all patterns for all tokens

computeAllPatterns
```
@ArgumentParser.Option(name="computeAllPatterns")
public boolean computeAllPatterns
```
Whether all patterns should be computed. Otherwise, patterns are read from the cached file (see allPatternsDir).

patternScoring

@ArgumentParser.Option(name="patternScoring")
public GetPatternsFromDataMultiClass.PatternScoring patternScoring

Pattern Scoring mechanism. See GetPatternsFromDataMultiClass.PatternScoring for options.

thresholdSelectPattern

@ArgumentParser.Option(name="thresholdSelectPattern")
public double thresholdSelectPattern

Threshold for learning a pattern

restrictToMatched
```
@ArgumentParser.Option(name="restrictToMatched")
public boolean restrictToMatched
```
Currently does not work correctly (TODO: make this work). Ideally, this would label words only when they occur in the context of a learned pattern. This note may be out of date, so test the behavior before relying on it.

usePatternResultAsLabel
```
@ArgumentParser.Option(name="usePatternResultAsLabel")
public boolean usePatternResultAsLabel
```
Label words that are learned so that in further iterations we have more information

debug
```
@ArgumentParser.Option(name="debug")
public int debug
```
Debug flag for learning patterns: 0 means no output, 1 means necessary output only, 2 means necessary output plus some justification, and 3 means extreme debug output.

identifier

@ArgumentParser.Option(name="identifier")
public java.lang.String identifier

Save this run as ...

useMatchingPhrase
```
@ArgumentParser.Option(name="useMatchingPhrase")
public boolean useMatchingPhrase
```
Use the actual dictionary matching phrase(s) instead of the token word or lemma in calculating the stats

tuneThresholdKeepRunning
```
@ArgumentParser.Option(name="tuneThresholdKeepRunning")
public boolean tuneThresholdKeepRunning
```
Reduce the pattern threshold (to 0.8 times its current value) to extract as many patterns as possible (still limited by numPatterns).
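A minimal sketch of the threshold decay described for tuneThresholdKeepRunning, assuming patterns are taken in descending score order. ThresholdDecaySketch, select and minThreshold are hypothetical names; the floor minThreshold is an assumption added only so the loop is guaranteed to terminate, and the library's actual stopping rule may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ThresholdDecaySketch {
  // Returns up to numPatterns scores that pass the threshold. If fewer than
  // numPatterns pass, the threshold is multiplied by 0.8 and selection is
  // retried, until enough pass or the next threshold would fall below
  // minThreshold (a hypothetical floor that guarantees termination).
  static List<Double> select(List<Double> scores, double threshold,
                             int numPatterns, double minThreshold) {
    if (minThreshold <= 0.0) {
      throw new IllegalArgumentException("minThreshold must be positive");
    }
    List<Double> sorted = new ArrayList<>(scores);
    sorted.sort(Collections.reverseOrder()); // best-scoring patterns first
    while (true) {
      List<Double> selected = new ArrayList<>();
      for (double s : sorted) {
        if (s < threshold || selected.size() == numPatterns) break;
        selected.add(s);
      }
      if (selected.size() == numPatterns || threshold * 0.8 < minThreshold) {
        return selected;
      }
      threshold *= 0.8; // same reduction factor as tuneThresholdKeepRunning
    }
  }
}
```

With scores 0.9, 0.7, 0.6, 0.3, a starting threshold of 1.0 and numPatterns = 3, the threshold drops to 0.8, 0.64 and then 0.512, at which point the top three scores pass.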

maxExtractNumWords

@ArgumentParser.Option(name="maxExtractNumWords")
public int maxExtractNumWords

Maximum number of words to learn

useOtherLabelsWordsasNegative
```
@ArgumentParser.Option(name="useOtherLabelsWordsasNegative")
public boolean useOtherLabelsWordsasNegative
```
Use the seed dictionaries and the new words learned for the other labels in the previous iterations as negatives.

matchLowerCaseContext

@ArgumentParser.Option(name="matchLowerCaseContext")
public static boolean matchLowerCaseContext

Lowercase the context words/lemmas

targetAllowedTagsInitialsStr

@ArgumentParser.Option(name="targetAllowedTagsInitialsStr")
public java.lang.String targetAllowedTagsInitialsStr

Initials of all POS tags to use if usePOS4Pattern is true, separated by commas.

allowedTagsInitials

public java.util.Map<java.lang.String,java.util.Set<java.lang.String>> allowedTagsInitials

targetAllowedNERs
```
@ArgumentParser.Option(name="targetAllowedNERs")
public java.lang.String targetAllowedNERs
```
Allowed NERs for labels. Format: label1,NER1,NER11;label2,NER2,NER21,NER22;label3,... The useTargetNERRestriction flag must also be true.
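The targetAllowedNERs string has the same shape as the allowedNERsforLabels map (a label mapped to a set of NER tags). A minimal parsing sketch, assuming the format is exactly as stated above; AllowedNerParser is a hypothetical helper, not the library's own parser, and the whitespace trimming is an added assumption.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class AllowedNerParser {
  // Parses "label1,NER1,NER11;label2,NER2,..." into label -> set of NER tags.
  static Map<String, Set<String>> parse(String spec) {
    Map<String, Set<String>> result = new HashMap<>();
    for (String group : spec.split(";")) {      // one group per label
      String[] parts = group.trim().split(",");
      String label = parts[0].trim();           // first field is the label
      if (label.isEmpty()) continue;
      Set<String> ners = result.computeIfAbsent(label, k -> new HashSet<>());
      for (int i = 1; i < parts.length; i++) {  // remaining fields are NER tags
        String ner = parts[i].trim();
        if (!ner.isEmpty()) ners.add(ner);
      }
    }
    return result;
  }
}
```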

allowedNERsforLabels

public java.util.Map<java.lang.String,java.util.Set<java.lang.String>> allowedNERsforLabels

numWordsToAdd

@ArgumentParser.Option(name="numWordsToAdd")
public int numWordsToAdd

Number of words to learn in each iteration

thresholdNumPatternsApplied

@ArgumentParser.Option(name="thresholdNumPatternsApplied")
public double thresholdNumPatternsApplied

wordScoring

@ArgumentParser.Option(name="wordScoring")
public edu.stanford.nlp.patterns.GetPatternsFromDataMultiClass.WordScoring wordScoring

thresholdWordExtract

@ArgumentParser.Option(name="thresholdWordExtract")
public double thresholdWordExtract

justify
```
public boolean justify
```

LRSigma
```
@ArgumentParser.Option(name="LRSigma")
public double LRSigma
```
Sigma for L2 regularization in logistic regression, if a classifier is used to score phrases

englishWordsFiles

@ArgumentParser.Option(name="englishWordsFiles")
public java.lang.String englishWordsFiles

English words that are not labeled when labeling using seed dictionaries

commonWordsPatternFiles
```
@ArgumentParser.Option(name="commonWordsPatternFiles")
public java.lang.String commonWordsPatternFiles
```
Words to be ignored when learning phrases if removePhrasesWithStopWords or removeStopWordsFromSelectedPhrases is true. These words are also treated as negatives when scoring a pattern (similar to otherSemanticClassesFiles).

otherSemanticClassesFiles
```
@ArgumentParser.Option(name="otherSemanticClassesFiles")
public java.lang.String otherSemanticClassesFiles
```
List of dictionary phrases that are negative for all labels to be learned. Format: file_1,file_2,..., where each file_i contains one phrase per line.
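A sketch of reading a list given in the file_1,file_2,... format, assuming UTF-8 files with one phrase per line. PhraseListLoader is a hypothetical name; skipping blank lines and trimming whitespace are assumptions, and the library itself stores such phrases as CandidatePhrase objects rather than plain strings.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

class PhraseListLoader {
  // Reads every file in a comma-separated list and collects one phrase per line.
  static Set<String> load(String commaSeparatedFiles) {
    Set<String> phrases = new HashSet<>();
    for (String file : commaSeparatedFiles.split(",")) {
      String path = file.trim();
      if (path.isEmpty()) continue;
      try {
        for (String line : Files.readAllLines(Paths.get(path), StandardCharsets.UTF_8)) {
          String phrase = line.trim();
          if (!phrase.isEmpty()) phrases.add(phrase); // skip blank lines
        }
      } catch (IOException e) {
        throw new UncheckedIOException(e); // surface unreadable files
      }
    }
    return phrases;
  }
}
```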

minLen4FuzzyForPattern

@ArgumentParser.Option(name="minLen4FuzzyForPattern")
public int minLen4FuzzyForPattern

Minimum length of words that can be matched fuzzily

wordIgnoreRegex

@ArgumentParser.Option(name="wordIgnoreRegex")
public java.lang.String wordIgnoreRegex

Do not learn phrases that match this regex.
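A sketch of how wordIgnoreRegex could be applied to candidate phrases. Whether the library requires the regex to match the whole phrase (Matcher.matches) or any substring (Matcher.find) is not stated here; this sketch assumes a whole-phrase match, and WordIgnoreRegexSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class WordIgnoreRegexSketch {
  // Keeps only the phrases that do NOT fully match the ignore regex.
  static List<String> filter(List<String> phrases, String wordIgnoreRegex) {
    Pattern ignore = Pattern.compile(wordIgnoreRegex);
    List<String> kept = new ArrayList<>();
    for (String phrase : phrases) {
      if (!ignore.matcher(phrase).matches()) kept.add(phrase);
    }
    return kept;
  }
}
```

For example, with the regex [0-9]+, the phrase "123" is dropped while "tumor" and "4a" are kept.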

numThreads

@ArgumentParser.Option(name="numThreads")
public int numThreads

Number of threads

stopWordsPatternFiles

@ArgumentParser.Option(name="stopWordsPatternFiles",
                       gloss="stop words")
public java.lang.String stopWordsPatternFiles

Words that are not learned. Patterns are not created around these words. And, if useStopWordsBeforeTerm in CreatePatterns is true.

env

public java.util.Map<java.lang.String,Env> env

Environment for TokenSequencePattern

globalEnv
```
public static Env globalEnv
```

removeStopWordsFromSelectedPhrases

@ArgumentParser.Option(name="removeStopWordsFromSelectedPhrases")
public boolean removeStopWordsFromSelectedPhrases

removePhrasesWithStopWords

@ArgumentParser.Option(name="removePhrasesWithStopWords")
public boolean removePhrasesWithStopWords

externalFeatureWeightsDir

@ArgumentParser.Option(name="externalFeatureWeightsFile")
public java.lang.String externalFeatureWeightsDir

doNotApplyPatterns

@ArgumentParser.Option(name="doNotApplyPatterns")
public boolean doNotApplyPatterns

sqrtPatScore

@ArgumentParser.Option(name="sqrtPatScore")
public boolean sqrtPatScore

Whether the score of a pattern is square-rooted.

minUnlabPhraseSupportForPat

@ArgumentParser.Option(name="minUnlabPhraseSupportForPat")
public int minUnlabPhraseSupportForPat

Remove patterns whose number of unlabeled words is less than this.

minPosPhraseSupportForPat

@ArgumentParser.Option(name="minPosPhraseSupportForPat")
public int minPosPhraseSupportForPat

Remove patterns whose number of positive words is less than this.

addIndvWordsFromPhrasesExceptLastAsNeg
```
@ArgumentParser.Option(name="addIndvWordsFromPhrasesExceptLastAsNeg")
public boolean addIndvWordsFromPhrasesExceptLastAsNeg
```
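Add the individual words of multi-word seed phrases, except the last word, as negatives. For example, if the positive seed dictionary contains "cancer" and "breast cancer", then "breast" is included as a negative.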
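The rule in the example can be sketched as follows, assuming phrases are split on whitespace. PhrasePrefixNegatives is a hypothetical name, and the sketch does not try to exclude words that are themselves positive seeds, which the real option may or may not do.

```java
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

class PhrasePrefixNegatives {
  // Collects every word of a multi-word phrase except its last word.
  static Set<String> negativesFrom(Collection<String> seedPhrases) {
    Set<String> negatives = new TreeSet<>();
    for (String phrase : seedPhrases) {
      String[] words = phrase.trim().split("\\s+");
      for (int i = 0; i < words.length - 1; i++) { // skip the last (head) word
        negatives.add(words[i]);
      }
    }
    return negatives;
  }
}
```

With the seeds "cancer" and "breast cancer", the result is the single word "breast".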

distSimWeights

public java.util.Map<java.lang.String,Counter<java.lang.Integer>> distSimWeights

dictOddsWeights

public java.util.Map<java.lang.String,Counter<CandidatePhrase>> dictOddsWeights

invertedIndexClass

@ArgumentParser.Option(name="invertedIndexClass",
                       gloss="another option is Lucene backed, which is not included in the CoreNLP release. Contact us to get a copy (distributed under Apache License).")
public java.lang.Class<? extends SentenceIndex> invertedIndexClass

invertedIndexDirectory

@ArgumentParser.Option(name="invertedIndexDirectory")
public java.lang.String invertedIndexDirectory

Where the inverted index (either in memory or Lucene) is stored

clubNeighboringLabeledWords

@ArgumentParser.Option(name="clubNeighboringLabeledWords")
public boolean clubNeighboringLabeledWords

patternType

@ArgumentParser.Option(name="patternType")
public PatternFactory.PatternType patternType

subsampleUnkAsNegUsingSim

@ArgumentParser.Option(name="subsampleUnkAsNegUsingSim",
                       gloss="When learning a classifier, remove phrases from unknown phrases that are too close to the positive phrases")
public boolean subsampleUnkAsNegUsingSim

expandPositivesWhenSampling

@ArgumentParser.Option(name="expandPositivesWhenSampling",
                       gloss="when sampling for learning feature wts for learning phrases, expand the positives")
public boolean expandPositivesWhenSampling

expandNegativesWhenSampling

@ArgumentParser.Option(name="expandNegativesWhenSampling",
                       gloss="when sampling for learning feature wts for learning phrases, expand the negatives")
public boolean expandNegativesWhenSampling

similarityThresholdHighPrecision

@ArgumentParser.Option(name="similarityThresholdHighPrecision",
                       gloss="used for expanding positives")
public double similarityThresholdHighPrecision

positiveSimilarityThresholdLowPrecision

@ArgumentParser.Option(name="positiveSimilarityThresholdLowPrecision",
                       gloss="used for not choosing close unknowns as positives")
public double positiveSimilarityThresholdLowPrecision

wordVectorFile

@ArgumentParser.Option(name="wordVectorFile",
                       gloss="if using word vectors for computing similarities")
public java.lang.String wordVectorFile

useWordVectorsToComputeSim

@ArgumentParser.Option(name="useWordVectorsToComputeSim",
                       gloss="use vectors directly instead of word classes for computing similarity")
public boolean useWordVectorsToComputeSim

goldEntitiesEvalFiles

@ArgumentParser.Option(name="goldEntitiesEvalFiles",
                       gloss="label1,gold_list_of_entities_file;label2,...")
public java.lang.String goldEntitiesEvalFiles

evaluate

@ArgumentParser.Option(name="evaluate")
public boolean evaluate

featureCountThreshold

@ArgumentParser.Option(name="featureCountThreshold")
public int featureCountThreshold

expandPhrasesNumTopSimilar

@ArgumentParser.Option(name="expandPhrasesNumTopSimilar",
                       gloss="k in kNN")
public int expandPhrasesNumTopSimilar

fuzzyMatch
```
@ArgumentParser.Option(name="fuzzyMatch")
public boolean fuzzyMatch
```
Whether to use fuzzy matching when matching seeds to text. The minimum word length for fuzzy matching can be tuned with the minLen4FuzzyForPattern parameter.
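The class exposes a static isFuzzyMatch(String, String, int) method, but its matching rule is not documented on this page. The sketch below uses one plausible rule, stated as an assumption: words shorter than minLen4Fuzzy must match exactly, and longer words may differ by at most one Levenshtein edit. FuzzyMatchSketch is a hypothetical name and is not the library's implementation.

```java
class FuzzyMatchSketch {
  // Standard Levenshtein distance (insertions, deletions, substitutions).
  static int editDistance(String a, String b) {
    int[] prev = new int[b.length() + 1];
    int[] curr = new int[b.length() + 1];
    for (int j = 0; j <= b.length(); j++) prev[j] = j;
    for (int i = 1; i <= a.length(); i++) {
      curr[0] = i;
      for (int j = 1; j <= b.length(); j++) {
        int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
        curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
      }
      int[] tmp = prev; prev = curr; curr = tmp; // previous row for next i
    }
    return prev[b.length()];
  }

  // Assumed rule: exact match below minLen4Fuzzy, else at most one edit apart.
  static boolean isFuzzyMatch(String w1, String w2, int minLen4Fuzzy) {
    if (w1.equals(w2)) return true;
    if (w1.length() < minLen4Fuzzy || w2.length() < minLen4Fuzzy) return false;
    return editDistance(w1, w2) <= 1;
  }
}
```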

ignoreCaseSeedMatch

@ArgumentParser.Option(name="ignoreCaseSeedMatch")
public java.util.Map<java.lang.String,java.lang.String> ignoreCaseSeedMatch

Ignore case when matching seed words. This is a map from label to flag, for example {name->true,place->false}.
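A sketch of per-label case handling with a map shaped like ignoreCaseSeedMatch (label mapped to "true" or "false"). SeedCaseMatchSketch is a hypothetical name, and treating a label missing from the map as case-sensitive is an assumption.

```java
import java.util.Map;

class SeedCaseMatchSketch {
  // Compares a seed to a token, ignoring case only if the label's flag is "true".
  static boolean seedMatches(String label, String seed, String token,
                             Map<String, String> ignoreCaseSeedMatch) {
    boolean ignoreCase = Boolean.parseBoolean(ignoreCaseSeedMatch.getOrDefault(label, "false"));
    return ignoreCase ? seed.equalsIgnoreCase(token) : seed.equals(token);
  }
}
```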

sentsOutFile

@ArgumentParser.Option(name="sentsOutFile")
public java.lang.String sentsOutFile

savePatternsWordsDir

@ArgumentParser.Option(name="savePatternsWordsDir")
public boolean savePatternsWordsDir

learn

@ArgumentParser.Option(name="learn")
public boolean learn

removeOverLappingLabelsFromSeed

@ArgumentParser.Option(name="removeOverLappingLabelsFromSeed")
public boolean removeOverLappingLabelsFromSeed

Keeps only one label for each token, whichever has the longest

usePhraseEvalWordClass

@ArgumentParser.Option(name="usePhraseEvalWordClass")
public boolean usePhraseEvalWordClass

Only works if you have a single label and the word classes are given.

usePhraseEvalWordVector

@ArgumentParser.Option(name="usePhraseEvalWordVector")
public boolean usePhraseEvalWordVector

Only works if you have a single label and the word vectors are given.

usePhraseEvalGoogleNgram
```
@ArgumentParser.Option(name="usePhraseEvalGoogleNgram")
public boolean usePhraseEvalGoogleNgram
```
Use Google tf-idf for learning phrases. You must also provide googleNgram_dbname, googleNgram_username, and googleNgram_host.

usePhraseEvalDomainNgram

@ArgumentParser.Option(name="usePhraseEvalDomainNgram")
public boolean usePhraseEvalDomainNgram

Use domain tf-idf for learning phrases.

usePhraseEvalPatWtByFreq
```
@ArgumentParser.Option(name="usePhraseEvalPatWtByFreq")
public boolean usePhraseEvalPatWtByFreq
```
Use the sum, over all patterns that extracted the phrase, of the pattern weight divided by the phrase frequency (sum over p of weight(p) / phraseFreq) for learning phrases.
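Read literally, usePhraseEvalPatWtByFreq scores a phrase as the sum, over the patterns that extracted it, of each pattern's weight divided by the phrase's frequency. A sketch under that reading; PatWtByFreqSketch is a hypothetical name, and how the library computes the pattern weights themselves is out of scope here.

```java
import java.util.List;

class PatWtByFreqSketch {
  // Sums weight(p) / phraseFreq over the patterns p that extracted the phrase.
  static double score(List<Double> weightsOfExtractingPatterns, int phraseFreq) {
    if (phraseFreq <= 0) {
      throw new IllegalArgumentException("phraseFreq must be positive");
    }
    double sum = 0.0;
    for (double w : weightsOfExtractingPatterns) {
      sum += w / phraseFreq;
    }
    return sum;
  }
}
```

For example, two extracting patterns with weights 0.5 and 1.5 and a phrase frequency of 4 give 0.125 + 0.375 = 0.5.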

usePhraseEvalSemanticOdds

@ArgumentParser.Option(name="usePhraseEvalSemanticOdds")
public boolean usePhraseEvalSemanticOdds

Odds of the phrase frequency in the label dictionary vs. other dictionaries.

usePhraseEvalEditDistSame
```
@ArgumentParser.Option(name="usePhraseEvalEditDistSame")
public boolean usePhraseEvalEditDistSame
```
Edit distance between this phrase and the other phrases in the label dictionary

usePhraseEvalEditDistOther

@ArgumentParser.Option(name="usePhraseEvalEditDistOther")
public boolean usePhraseEvalEditDistOther

Edit distance between this phrase and other phrases in other dictionaries

usePhraseEvalWordShape

@ArgumentParser.Option(name="usePhraseEvalWordShape",
                       gloss="% of phrases of that label that have the same word shape")
public boolean usePhraseEvalWordShape

usePhraseEvalWordShapeStr

@ArgumentParser.Option(name="usePhraseEvalWordShapeStr",
                       gloss="uses the word shape str as a feature")
public boolean usePhraseEvalWordShapeStr

usePhraseEvalFirstCapital

@ArgumentParser.Option(name="usePhraseEvalFirstCapital",
                       gloss="words starts with a capital letter")
public boolean usePhraseEvalFirstCapital

usePhraseEvalBOW

@ArgumentParser.Option(name="usePhraseEvalBOW")
public boolean usePhraseEvalBOW

Use bag of words.

usePatternEvalWordClass
```
@ArgumentParser.Option(name="usePatternEvalWordClass")
public boolean usePatternEvalWordClass
```
Used only if patternScoring is PhEvalInPat or PhEvalInPatLogP. See usePhrase* for meanings.

usePatternEvalWordShape
```
@ArgumentParser.Option(name="usePatternEvalWordShape")
public boolean usePatternEvalWordShape
```
Used only if patternScoring is PhEvalInPat or PhEvalInPatLogP. See usePhrase* for meanings.

usePatternEvalWordShapeStr

@ArgumentParser.Option(name="usePatternEvalWordShapeStr",
                       gloss="uses the word shape str as a feature")
public boolean usePatternEvalWordShapeStr

usePatternEvalFirstCapital

@ArgumentParser.Option(name="usePatternEvalFirstCapital",
                       gloss="words starts with a capital letter")
public boolean usePatternEvalFirstCapital

usePatternEvalGoogleNgram
```
@ArgumentParser.Option(name="usePatternEvalGoogleNgram")
public boolean usePatternEvalGoogleNgram
```
Used only if patternScoring is PhEvalInPat or PhEvalInPatLogP. See usePhrase* for meanings.

usePatternEvalDomainNgram
```
@ArgumentParser.Option(name="usePatternEvalDomainNgram")
public boolean usePatternEvalDomainNgram
```
Used only if patternScoring is PhEvalInPat or PhEvalInPatLogP. See usePhrase* for meanings. You must also provide googleNgram_dbname, googleNgram_username, and googleNgram_host.

usePatternEvalSemanticOdds
```
@ArgumentParser.Option(name="usePatternEvalSemanticOdds")
public boolean usePatternEvalSemanticOdds
```
Used only if patternScoring is PhEvalInPat or PhEvalInPatLogP. See usePhrase* for meanings.

usePatternEvalEditDistSame
```
@ArgumentParser.Option(name="usePatternEvalEditDistSame")
public boolean usePatternEvalEditDistSame
```
Used only if patternScoring is PhEvalInPat or PhEvalInPatLogP. See usePhrase* for meanings.

usePatternEvalEditDistOther
```
@ArgumentParser.Option(name="usePatternEvalEditDistOther")
public boolean usePatternEvalEditDistOther
```
Used only if patternScoring is PhEvalInPat or PhEvalInPatLogP. See usePhrase* for meanings.

usePatternEvalBOW

@ArgumentParser.Option(name="usePatternEvalBOW")
public boolean usePatternEvalBOW

Use bag of words.

perSelectRand
```
@ArgumentParser.Option(name="perSelectRand")
public double perSelectRand
```
These are used to learn weights for features if using logistic regression. Percentage of non-labeled tokens selected as negative.

perSelectNeg
```
@ArgumentParser.Option(name="perSelectNeg")
public double perSelectNeg
```
These are used to learn weights for features if using logistic regression. Percentage of negative tokens selected as negative.
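A sketch of the subsampling that perSelectRand and perSelectNeg describe, assuming the value is a fraction in [0, 1] (the page says "percentage" without giving a scale) and that each candidate is kept independently at random. NegativeSamplingSketch is a hypothetical name, not the library's sampler.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class NegativeSamplingSketch {
  // Keeps each candidate independently with probability `fraction`.
  static <T> List<T> sample(List<T> candidates, double fraction, Random rng) {
    if (fraction < 0.0 || fraction > 1.0) {
      throw new IllegalArgumentException("fraction must be in [0, 1]");
    }
    List<T> kept = new ArrayList<>();
    for (T candidate : candidates) {
      if (rng.nextDouble() < fraction) kept.add(candidate); // nextDouble is in [0, 1)
    }
    return kept;
  }
}
```

Under these assumptions, one would call sample(unlabeledTokens, perSelectRand, rng) and sample(negativeTokens, perSelectNeg, rng) to build the negative examples; passing a seeded Random makes the sample reproducible.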

doNotExtractPhraseAnyWordLabeledOtherClass
```
@ArgumentParser.Option(name="doNotExtractPhraseAnyWordLabeledOtherClass")
public boolean doNotExtractPhraseAnyWordLabeledOtherClass
```
Especially useful for multi word phrase extraction. Do not extract a phrase if any word is labeled with any other class.

saveInvertedIndex
```
@ArgumentParser.Option(name="saveInvertedIndex")
public boolean saveInvertedIndex
```
You can save the inverted index. Lucene index is saved by default to invertedIndexDirectory if given.

loadInvertedIndex
```
@ArgumentParser.Option(name="loadInvertedIndex")
public boolean loadInvertedIndex
```
You can load the inverted index using this file. If false and using lucene index, the existing directory is deleted and new index is made.

storePatsForEachToken

@ArgumentParser.Option(name="storePatsForEachToken",
                       gloss="used for storing patterns in PSQL/MEMORY/LUCENE")
public ConstantsAndVariables.PatternForEachTokenWay storePatsForEachToken

backgroundSymbol

public static java.lang.String backgroundSymbol

invertedIndex
```
public SentenceIndex invertedIndex
```

extremedebug

public static java.lang.String extremedebug

minimaldebug

public static java.lang.String minimaldebug

functionWords

public java.util.List<java.lang.String> functionWords

batchProcessSents
```
@ArgumentParser.Option(name="batchProcessSents")
public boolean batchProcessSents
```
Use this option if you are limited by memory; ignored if fileFormat is ser.

writeMatchedTokensFiles

@ArgumentParser.Option(name="writeMatchedTokensFiles")
public boolean writeMatchedTokensFiles

writeMatchedTokensIdsForEachPhrase

@ArgumentParser.Option(name="writeMatchedTokensIdsForEachPhrase")
public boolean writeMatchedTokensIdsForEachPhrase

Constructor Detail

ConstantsAndVariables

public ConstantsAndVariables(java.util.Properties props,
                             java.util.Set<java.lang.String> labels,
                             java.util.Map<java.lang.String,java.lang.Class<? extends TypesafeMap.Key<java.lang.String>>> answerClass,
                             java.util.Map<java.lang.String,java.lang.Class> generalizeClasses,
                             java.util.Map<java.lang.String,java.util.Map<java.lang.Class,java.lang.Object>> ignoreClasses)
                      throws java.io.IOException

Throws:: java.io.IOException

ConstantsAndVariables

public ConstantsAndVariables(java.util.Properties props,
                             java.util.Map<java.lang.String,java.util.Set<CandidatePhrase>> labelDictionary,
                             java.util.Map<java.lang.String,java.lang.Class<? extends TypesafeMap.Key<java.lang.String>>> answerClass,
                             java.util.Map<java.lang.String,java.lang.Class> generalizeClasses,
                             java.util.Map<java.lang.String,java.util.Map<java.lang.Class,java.lang.Object>> ignoreClasses)
                      throws java.io.IOException

Throws:: java.io.IOException

ConstantsAndVariables

public ConstantsAndVariables(java.util.Properties props,
                             java.util.Set<java.lang.String> labels,
                             java.util.Map<java.lang.String,java.lang.Class<? extends TypesafeMap.Key<java.lang.String>>> answerClass)
                      throws java.io.IOException

Throws:: java.io.IOException

ConstantsAndVariables

public ConstantsAndVariables(java.util.Properties props,
                             java.lang.String label,
                             java.lang.Class<? extends TypesafeMap.Key<java.lang.String>> answerClass)
                      throws java.io.IOException

Throws:: java.io.IOException

ConstantsAndVariables

public ConstantsAndVariables(java.util.Properties props,
                             java.util.Set<java.lang.String> labels,
                             java.util.Map<java.lang.String,java.lang.Class<? extends TypesafeMap.Key<java.lang.String>>> answerClass,
                             java.util.Map<java.lang.String,java.lang.Class> generalizeClasses)
                      throws java.io.IOException

Throws:: java.io.IOException

Method Detail

getLabels

public java.util.Set<java.lang.String> getLabels()

getAllOptions

public java.util.Map<java.lang.String,java.lang.String> getAllOptions()

hasSeedWordOrOtherSem

public boolean hasSeedWordOrOtherSem(CandidatePhrase p)

getLearnedWordsEachIter

public java.util.TreeMap<java.lang.Integer,Counter<CandidatePhrase>> getLearnedWordsEachIter(java.lang.String label)

getLearnedWordsEachIter

public java.util.Map<java.lang.String,java.util.TreeMap<java.lang.Integer,Counter<CandidatePhrase>>> getLearnedWordsEachIter()

setLearnedWordsEachIter

public void setLearnedWordsEachIter(java.util.TreeMap<java.lang.Integer,Counter<CandidatePhrase>> words,
                                    java.lang.String label)
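Learned words are stored per label and per iteration (a TreeMap from iteration number to a counter). The sketch below models that nesting with plain collections. Map<String, Double> stands in for Counter<CandidatePhrase>, and the TreeMap keeps iterations ordered, so the latest one is a lastEntry() call away. The class name, the getLearnedWordsLastIteration helper, and the summing of scores across iterations are assumptions for illustration. They are not the library's code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of the label -> iteration -> phrase-score structure.
final class LearnedWordsSketch {

    // label -> (iteration -> phrase scores); TreeMap keeps iterations in order.
    private final Map<String, TreeMap<Integer, Map<String, Double>>> learnedWordsEachIter =
        new HashMap<>();

    void setLearnedWordsEachIter(TreeMap<Integer, Map<String, Double>> words, String label) {
        learnedWordsEachIter.put(label, words);
    }

    // Returns an empty TreeMap (without storing it) for an unknown label.
    TreeMap<Integer, Map<String, Double>> getLearnedWordsEachIter(String label) {
        return learnedWordsEachIter.getOrDefault(label, new TreeMap<>());
    }

    // All phrases learned for a label; scores are summed across iterations (an assumption).
    Map<String, Double> getLearnedWords(String label) {
        Map<String, Double> all = new HashMap<>();
        for (Map<String, Double> iteration : getLearnedWordsEachIter(label).values()) {
            iteration.forEach((phrase, score) -> all.merge(phrase, score, Double::sum));
        }
        return all;
    }

    // Phrases from the most recent iteration only, or an empty map if there are none.
    Map<String, Double> getLearnedWordsLastIteration(String label) {
        Map.Entry<Integer, Map<String, Double>> last = getLearnedWordsEachIter(label).lastEntry();
        return last == null ? new HashMap<String, Double>() : last.getValue();
    }

    public static void main(String[] args) {
        TreeMap<Integer, Map<String, Double>> iterations = new TreeMap<>();
        iterations.put(0, Map.of("breast cancer", 1.0));
        iterations.put(1, Map.of("lung cancer", 0.8, "breast cancer", 0.5));

        LearnedWordsSketch s = new LearnedWordsSketch();
        s.setLearnedWordsEachIter(iterations, "DISEASE");
        System.out.println(s.getLearnedWordsLastIteration("DISEASE"));
        System.out.println(s.getLearnedWords("DISEASE"));
    }
}
```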

setUp

public void setUp(java.util.Properties props)
           throws java.io.IOException

Throws:: java.io.IOException

listFileIncludingItself

public static java.lang.Iterable<java.io.File> listFileIncludingItself(java.lang.String file)
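Judging by its name, listFileIncludingItself returns the file itself when given a plain file and the files under it when given a directory. The sketch below implements that assumed behavior with java.nio.file.Files.walk. The recursion into subdirectories and the sorted order are further assumptions. ListFilesSketch is a hypothetical class name.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class ListFilesSketch {

    // Assumed behavior: a non-directory path yields just itself; a directory
    // yields every regular file beneath it, recursively, in sorted order.
    static Iterable<File> listFileIncludingItself(String file) {
        File f = new File(file);
        if (!f.isDirectory()) {
            return Collections.singletonList(f);
        }
        // Files.walk holds open directory handles, so close it with try-with-resources.
        try (Stream<Path> paths = Files.walk(f.toPath())) {
            return paths.filter(Files::isRegularFile)
                        .map(Path::toFile)
                        .sorted()
                        .collect(Collectors.toList());
        } catch (IOException e) {
            // Keep the documented signature, which declares no checked exception.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String root = args.length > 0 ? args[0] : ".";
        for (File f : listFileIncludingItself(root)) {
            System.out.println(f);
        }
    }
}
```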

getWordShapesForLabels

public java.util.Map<java.lang.String,Counter<java.lang.String>> getWordShapesForLabels()

getGeneralizeClasses

public static java.util.Map<java.lang.String,java.lang.Class> getGeneralizeClasses()

getStopWords

public static java.util.Set<CandidatePhrase> getStopWords()

addWordShapes

public void addWordShapes(java.lang.String label,
                          java.util.Set<CandidatePhrase> words)
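getWordShapesForLabels returns a counter over strings for each label. That suggests addWordShapes records the word shapes of a label's phrases. The sketch below assumes exactly that. Its shape function is a simplified, hypothetical stand-in (X for uppercase, x for lowercase, d for digits, repeats collapsed), not CoreNLP's own word-shape classifier. String replaces CandidatePhrase, and a Map<String, Integer> replaces Counter<String>.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class WordShapeSketch {

    // label -> (shape -> count), mirroring the type of getWordShapesForLabels().
    final Map<String, Map<String, Integer>> wordShapesForLabels = new HashMap<>();

    // Hypothetical, simplified word shape: uppercase -> X, lowercase -> x,
    // digit -> d, other characters kept; consecutive repeated symbols collapse.
    static String shape(String word) {
        StringBuilder sb = new StringBuilder();
        char last = 0;
        for (char c : word.toCharArray()) {
            char s = Character.isUpperCase(c) ? 'X'
                   : Character.isLowerCase(c) ? 'x'
                   : Character.isDigit(c) ? 'd'
                   : c;
            if (s != last) {
                sb.append(s);
            }
            last = s;
        }
        return sb.toString();
    }

    // Count the shape of each phrase under the given label.
    void addWordShapes(String label, Set<String> words) {
        Map<String, Integer> counts =
            wordShapesForLabels.computeIfAbsent(label, k -> new HashMap<>());
        for (String w : words) {
            counts.merge(shape(w), 1, Integer::sum);
        }
    }

    public static void main(String[] args) {
        WordShapeSketch s = new WordShapeSketch();
        s.addWordShapes("DRUG", new LinkedHashSet<>(List.of("Aspirin", "Tylenol", "IL-2")));
        System.out.println(s.wordShapesForLabels.get("DRUG"));
    }
}
```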

getSeedLabelDictionary

public java.util.Map<java.lang.String,java.util.Set<CandidatePhrase>> getSeedLabelDictionary()

getLearnedWords

public Counter<CandidatePhrase> getLearnedWords(java.lang.String label)

getLearnedWordsAsJson

public java.lang.String getLearnedWordsAsJson()

getLearnedWordsAsJsonLastIteration

public java.lang.String getLearnedWordsAsJsonLastIteration()

getSetWordsAsJson

public java.lang.String getSetWordsAsJson(java.util.Map<java.lang.String,Counter<CandidatePhrase>> words)

getEnglishWords

public java.util.Set<java.lang.String> getEnglishWords()

getCommonEngWords

public java.util.Set<java.lang.String> getCommonEngWords()

getOtherSemanticClassesWords

public java.util.Set<CandidatePhrase> getOtherSemanticClassesWords()

setOtherSemanticClassesWords

public void setOtherSemanticClassesWords(java.util.Set<CandidatePhrase> other)

getWordClassClusters

public java.util.Map<java.lang.String,java.lang.Integer> getWordClassClusters()

getEditDistanceFromThisClass

public Pair<java.lang.String,java.lang.Double> getEditDistanceFromThisClass(java.lang.String label,
                                                                            java.lang.String ph,
                                                                            int minLen)

getEditDistanceFromOtherClasses

public Pair<java.lang.String,java.lang.Double> getEditDistanceFromOtherClasses(java.lang.String label,
                                                                               java.lang.String ph,
                                                                               int minLen)

getEditDistanceFromEnglishWords

public java.util.concurrent.ConcurrentHashMap<java.lang.String,java.lang.Double> getEditDistanceFromEnglishWords()

getEditDistanceFromEnglishWordsMatches

public java.util.concurrent.ConcurrentHashMap<java.lang.String,java.lang.String> getEditDistanceFromEnglishWordsMatches()

getEditDistanceScoresOtherClass

public double getEditDistanceScoresOtherClass(java.lang.String label,
                                              java.lang.String g)

getEditDistanceScoresOtherClassThreshold

public double getEditDistanceScoresOtherClassThreshold(java.lang.String label,
                                                       java.lang.String g)

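1 if g lies within the edit-distance threshold of a word from another class; 0 if it is not close to any such word.

Parameters:: g - the phrase to check
Returns:: 1.0 if g is within the edit-distance threshold, 0.0 otherwise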
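The following sketch makes the 1/0 semantics concrete. It scores a phrase against words from other classes using plain Levenshtein distance and a hypothetical absolute threshold, maxDist. This page does not document the library's actual distance measure, normalization, or threshold, so treat the sketch as an illustration only. The class name and parameters are hypothetical.

```java
import java.util.Collection;
import java.util.List;

final class EditDistanceThresholdSketch {

    // Classic two-row dynamic-programming Levenshtein distance
    // (insertion, deletion and substitution each cost 1).
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // the finished row becomes the previous row
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // 1.0 if g is within maxDist edits of any word from the other classes, else 0.0.
    static double thresholdScore(String g, Collection<String> otherClassWords, int maxDist) {
        for (String w : otherClassWords) {
            if (levenshtein(g, w) <= maxDist) {
                return 1.0;
            }
        }
        return 0.0;
    }

    public static void main(String[] args) {
        List<String> otherClass = List.of("aspirin", "ibuprofen");
        System.out.println(thresholdScore("asprin", otherClass, 1)); // prints 1.0
        System.out.println(thresholdScore("cancer", otherClass, 1)); // prints 0.0
    }
}
```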

getEditDistanceScoresThisClassThreshold

public double getEditDistanceScoresThisClassThreshold(java.lang.String label,
                                                      java.lang.String g)

getEditDistanceScoresThisClass

public double getEditDistanceScoresThisClass(java.lang.String label,
                                             java.lang.String g)

isFuzzyMatch

public static boolean isFuzzyMatch(java.lang.String w1,
                                   java.lang.String w2,
                                   int minLen4Fuzzy)

containsFuzzy

public static CandidatePhrase containsFuzzy(java.util.Set<CandidatePhrase> words,
                                            CandidatePhrase w,
                                            int minLen4Fuzzy)
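isFuzzyMatch and containsFuzzy are not documented on this page. The minimal sketch below rests on three assumptions: identical strings always match, fuzzy matching is only attempted when both strings are longer than minLen4Fuzzy, and a fuzzy match tolerates a single edit. String replaces CandidatePhrase to keep the example self-contained. The class name and the withinOneEdit helper are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class FuzzyMatchSketch {

    // Assumed rule: exact matches always count; otherwise both strings must be
    // longer than minLen4Fuzzy and differ by at most one edit.
    static boolean isFuzzyMatch(String w1, String w2, int minLen4Fuzzy) {
        if (w1.equals(w2)) {
            return true;
        }
        if (w1.length() <= minLen4Fuzzy || w2.length() <= minLen4Fuzzy) {
            return false;
        }
        return withinOneEdit(w1, w2);
    }

    // True if a and b differ by at most one insertion, deletion or substitution.
    static boolean withinOneEdit(String a, String b) {
        if (Math.abs(a.length() - b.length()) > 1) {
            return false;
        }
        if (a.length() > b.length()) { // make a the shorter (or equal-length) string
            String t = a;
            a = b;
            b = t;
        }
        int i = 0;
        int j = 0;
        boolean edited = false;
        while (i < a.length() && j < b.length()) {
            if (a.charAt(i) == b.charAt(j)) {
                i++;
                j++;
                continue;
            }
            if (edited) {
                return false; // a second difference
            }
            edited = true;
            if (a.length() == b.length()) {
                i++; // substitution: advance both strings
            }
            j++; // insertion into a: advance only the longer string
        }
        return true; // at most one trailing character of b can remain, and only if no edit was used
    }

    // Returns the first word in words that fuzzily matches w, or null if none does.
    static String containsFuzzy(Set<String> words, String w, int minLen4Fuzzy) {
        for (String word : words) {
            if (isFuzzyMatch(word, w, minLen4Fuzzy)) {
                return word;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Set<String> dict = new LinkedHashSet<>(List.of("aspirin", "ibuprofen"));
        System.out.println(containsFuzzy(dict, "asprin", 4)); // prints aspirin
        System.out.println(containsFuzzy(dict, "cancer", 4)); // prints null
    }
}
```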

getGeneralWordClassClusters

public java.util.Map<java.lang.String,java.lang.Integer> getGeneralWordClassClusters()

setGeneralWordClassClusters

public void setGeneralWordClassClusters(java.util.Map<java.lang.String,java.lang.Integer> generalWordClassClusters)

getWordShapeCache

public java.util.Map<java.lang.String,java.lang.String> getWordShapeCache()

getAnswerClass

public java.util.Map<java.lang.String,java.lang.Class<? extends TypesafeMap.Key<java.lang.String>>> getAnswerClass()

getIgnoreWordswithClassesDuringSelection

public java.util.Map<java.lang.String,java.util.Map<java.lang.Class,java.lang.Object>> getIgnoreWordswithClassesDuringSelection()

addSeedWords

public void addSeedWords(java.lang.String label,
                         java.util.Collection<CandidatePhrase> seeds)
                  throws java.lang.Exception

Throws:: java.lang.Exception
