ConstantsAndVariables (Stanford CoreNLP API)

java.lang.Object
- edu.stanford.nlp.patterns.surface.ConstantsAndVariables

All Implemented Interfaces: Serializable

public class ConstantsAndVariables
extends Object
implements Serializable

See Also: Serialized Form

Nested Class Summary

Nested Classes
Modifier and Type	Class and Description
`static class`	`ConstantsAndVariables.DataSentsIterator`
`static class`	`ConstantsAndVariables.PatternForEachTokenWay`
`static class`	`ConstantsAndVariables.ScorePhraseMeasures`

Field Summary

Fields
Modifier and Type	Field and Description
`boolean`	`addIndvWordsFromPhrasesExceptLastAsNeg` For example, if the positive seed dictionary contains "cancer" and "breast cancer", then "breast" is included as a negative
`Map<String,Set<String>>`	`allowedNERsforLabels`
`Map<String,Set<String>>`	`allowedTagsInitials`
`String`	`allPatternsDir` Cached file of all patterns for all tokens
`String`	`backgroundSymbol`
`boolean`	`batchProcessSents` Use this option if you are limited by memory; ignored if fileFormat is ser.
`boolean`	`clubNeighboringLabeledWords`
`String`	`commonWordsPatternFiles` Words to be ignored when learning phrases if `removePhrasesWithStopWords` or `removeStopWordsFromSelectedPhrases` is true.
`boolean`	`computeAllPatterns` Whether all patterns should be computed.
`int`	`debug` Debug flag for learning patterns.
`Map<String,Counter<String>>`	`dictOddsWeights`
`Map<String,Counter<Integer>>`	`distSimWeights`
`boolean`	`doNotApplyPatterns`
`boolean`	`doNotExtractPhraseAnyWordLabeledOtherClass` Especially useful for multi word phrase extraction.
`String`	`englishWordsFiles` English words that are not labeled when labeling using seed dictionaries
`Map<String,Env>`	`env` Environment for `TokenSequencePattern`
`String`	`externalFeatureWeightsFile`
`static String`	`extremedebug`
`List<String>`	`fillerWords`
`String`	`identifier` Save this run as ...
`Pattern`	`ignoreWordRegex` By default, nothing is ignored.
`boolean`	`includeExternalFeatures`
`SentenceIndex`	`invertedIndex`
`Class<? extends SentenceIndex>`	`invertedIndexClass`
`String`	`invertedIndexDirectory` Where the inverted index (either in memory or lucene) is stored
`boolean`	`justify`
`boolean`	`loadInvertedIndex` You can load the inverted index using this file.
`double`	`LRSigma` Sigma for L2 regularization in logistic regression, if a classifier is used to score phrases
`boolean`	`matchLowerCaseContext` Lowercase the context words/lemmas
`int`	`maxExtractNumWords` Maximum number of words to learn
`static String`	`minimaldebug`
`int`	`minLen4FuzzyForPattern` Minimum length of words that can be matched fuzzily
`int`	`minPosPhraseSupportForPat` Remove patterns for which the number of positive words is less than this.
`int`	`minUnlabPhraseSupportForPat` Remove patterns for which the number of unlabeled words is less than this.
`Integer`	`numIterationsForPatterns` Maximum number of iterations to run
`int`	`numPatterns` Maximum number of patterns learned in each iteration
`int`	`numThreads` Number of threads
`int`	`numWordsCompound`
`int`	`numWordsToAdd` Number of words to learn in each iteration
`String`	`otherSemanticClassesFiles` List of dictionary phrases that are negative for all labels to be learned.
`String`	`outDir` The output directory where the justifications of learning patterns and phrases would be saved.
`ConcurrentHashIndex<SurfacePattern>`	`patternIndex`
`GetPatternsFromDataMultiClass.PatternScoring`	`patternScoring` Pattern Scoring mechanism.
`double`	`perSelectNeg` These are used to learn weights for features if using logistic regression.
`double`	`perSelectRand` These are used to learn weights for features if using logistic regression.
`boolean`	`removeOverLappingLabelsFromSeed` Keeps only one label for each token, whichever has the longest
`boolean`	`removePhrasesWithStopWords`
`boolean`	`removeStopWordsFromSelectedPhrases`
`boolean`	`restrictToMatched` Currently, does not work correctly.
`boolean`	`saveInvertedIndex` You can save the inverted index.
`boolean`	`sqrtPatScore` Whether to take the square root of a pattern's score
`String`	`stopWordsPatternFiles` Words that are not learned.
`ConstantsAndVariables.PatternForEachTokenWay`	`storePatsForEachToken`
`String`	`targetAllowedNERs` Allowed NERs for labels.
`String`	`targetAllowedTagsInitialsStr` Initials of all POS tags to use if `usePOS4Pattern` is true, separated by comma.
`double`	`thresholdNumPatternsApplied`
`double`	`thresholdSelectPattern` Threshold for learning a pattern
`double`	`thresholdWordExtract`
`boolean`	`tuneThresholdKeepRunning` Reduce pattern threshold (=0.8*current_value) to extract as many patterns as possible (still restricted by `numPatterns`)
`boolean`	`useContextNERRestriction` If the NER tag of the context tokens is not the background symbol, generalize the token with the NER tag
`boolean`	`useLemmaContextTokens` Use lemma instead of words for the context tokens
`boolean`	`useMatchingPhrase` Use the actual dictionary matching phrase(s) instead of the token word or lemma in calculating the stats
`boolean`	`useOtherLabelsWordsasNegative` Use the seed dictionaries and the new words learned for the other labels in previous iterations as negatives
`boolean`	`usePatternEvalDomainNgram` Used only if `patternScoring` is `PhEvalInPat` or `PhEvalInPatLogP`.
`boolean`	`usePatternEvalEditDistOther` Used only if `patternScoring` is `PhEvalInPat` or `PhEvalInPatLogP`.
`boolean`	`usePatternEvalEditDistSame` Used only if `patternScoring` is `PhEvalInPat` or `PhEvalInPatLogP`.
`boolean`	`usePatternEvalGoogleNgram` Used only if `patternScoring` is `PhEvalInPat` or `PhEvalInPatLogP`.
`boolean`	`usePatternEvalSemanticOdds` Used only if `patternScoring` is `PhEvalInPat` or `PhEvalInPatLogP`.
`boolean`	`usePatternEvalWordClass` Used only if `patternScoring` is `PhEvalInPat` or `PhEvalInPatLogP`.
`boolean`	`usePatternEvalWordShape` Used only if `patternScoring` is `PhEvalInPat` or `PhEvalInPatLogP`.
`boolean`	`usePatternResultAsLabel` Label words that are learned so that in further iterations we have more information
`boolean`	`usePhraseEvalDomainNgram` Use domain tf-idf for learning phrases
`boolean`	`usePhraseEvalEditDistOther` Edit distance between this phrase and other phrases in other dictionaries
`boolean`	`usePhraseEvalEditDistSame` Edit distance between this phrase and the other phrases in the label dictionary
`boolean`	`usePhraseEvalGoogleNgram` Use Google tf-idf for learning phrases
`boolean`	`usePhraseEvalPatWtByFreq` Use the sum, over all patterns that extracted the phrase, of pattern weight divided by phrase frequency for learning phrases
`boolean`	`usePhraseEvalSemanticOdds` Odds of the phrase frequency in the label dictionary versus other dictionaries
`boolean`	`usePhraseEvalWordClass` Only works if you have a single label.
`boolean`	`usePhraseEvalWordShape`
`boolean`	`useTargetNERRestriction` Add NER restriction to the target phrase in the patterns
`boolean`	`useTargetParserParentRestriction` Adds the parent's tag from the parse tree to the target phrase in the patterns
`String`	`wordIgnoreRegex` Do not learn phrases that match this regex.
`edu.stanford.nlp.patterns.surface.GetPatternsFromDataMultiClass.WordScoring`	`wordScoring`
`boolean`	`writeMatchedTokensFiles`

Constructor Summary

Constructors
Constructor and Description
`ConstantsAndVariables(Properties props, Map<String,Set<String>> labelDictionary, Map<String,Class<? extends TypesafeMap.Key<String>>> answerClass, Map<String,Class> generalizeClasses, Map<String,Map<Class,Object>> ignoreClasses)`
`ConstantsAndVariables(Properties props, Set<String> labels, Map<String,Class<? extends TypesafeMap.Key<String>>> answerClass)`
`ConstantsAndVariables(Properties props, Set<String> labels, Map<String,Class<? extends TypesafeMap.Key<String>>> answerClass, Map<String,Class> generalizeClasses)`
`ConstantsAndVariables(Properties props, Set<String> labels, Map<String,Class<? extends TypesafeMap.Key<String>>> answerClass, Map<String,Class> generalizeClasses, Map<String,Map<Class,Object>> ignoreClasses)`
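Every constructor takes a `java.util.Properties` object, and most fields in the Field Detail section carry an `@Execution.Option(name=...)` annotation, which suggests that they are populated from properties with the same names. The following is a minimal sketch of building such a `Properties` object using only the standard library. The class name `PatternOptionsSketch`, the values, and the output path are illustrative assumptions, not documented defaults.

```java
import java.util.Properties;

class PatternOptionsSketch {
    // Builds a Properties object keyed by option names from the Field Summary.
    // All values are illustrative placeholders, not documented defaults.
    static Properties buildOptions() {
        Properties props = new Properties();
        props.setProperty("numIterationsForPatterns", "10");
        props.setProperty("numPatterns", "50");
        props.setProperty("numWordsToAdd", "20");
        props.setProperty("debug", "1");
        props.setProperty("outDir", "out/patterns"); // hypothetical path
        return props;
    }

    public static void main(String[] args) {
        Properties props = buildOptions();
        // With CoreNLP on the classpath, props would then be handed to one of
        // the constructors listed above, for example:
        //   new ConstantsAndVariables(props, labels, answerClass);
        props.stringPropertyNames().stream().sorted()
             .forEach(k -> System.out.println(k + " = " + props.getProperty(k)));
    }
}
```

The same keys could equally be loaded from a `.properties` file with `Properties.load`.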

Method Summary

Modifier and Type	Method and Description
`void`	`addGeneralizeClasses(Map<String,Class> gen)`
`void`	`addLabelDictionary(String label, Set<String> words)`
`void`	`addWordShapes(String label, Set<String> words)`
`static String`	`containsFuzzy(Set<String> words, String w, int minLen4Fuzzy)`
`Map<String,Class<? extends TypesafeMap.Key<String>>>`	`getAnswerClass()`
`Set<String>`	`getCommonEngWords()`
`double`	`getEditDistanceFromEng(String ph, int minLen)`
`ConcurrentHashMap<String,Double>`	`getEditDistanceFromEnglishWords()`
`ConcurrentHashMap<String,String>`	`getEditDistanceFromEnglishWordsMatches()`
`Pair<String,Double>`	`getEditDistanceFromOtherSemanticClasses(String ph, int minLen)`
`Pair<String,Double>`	`getEditDistanceFromThisClass(String label, String ph, int minLen)`
`double`	`getEditDistanceScoresOtherClass(String g)`
`double`	`getEditDistanceScoresOtherClassThreshold(String g)` 1 if lies in edit distance, 0 if not close to any words
`double`	`getEditDistanceScoresThisClass(String label, String g)`
`double`	`getEditDistanceScoresThisClassThreshold(String label, String g)`
`Set<String>`	`getEnglishWords()`
`Map<String,Class>`	`getGeneralizeClasses()`
`Map<String,Integer>`	`getGeneralWordClassClusters()`
`Map<String,Map<Class,Object>>`	`getIgnoreWordswithClassesDuringSelection()`
`Map<String,Set<String>>`	`getLabelDictionary()`
`Set<String>`	`getOtherSemanticClassesWords()`
`ConcurrentHashIndex<SurfacePattern>`	`getPatternIndex()`
`Set<String>`	`getStopWords()`
`Map<String,Integer>`	`getWordClassClusters()`
`Map<String,String>`	`getWordShapeCache()`
`Map<String,Counter<String>>`	`getWordShapesForLabels()`
`static boolean`	`isFuzzyMatch(String w1, String w2, int minLen4Fuzzy)`
`void`	`setGeneralWordClassClusters(Map<String,Integer> generalWordClassClusters)`
`void`	`setLabelDictionary(Map<String,Set<String>> seedSets)`
`void`	`setOtherSemanticClassesWords(Set<String> other)`
`void`	`setPatternIndex(ConcurrentHashIndex<SurfacePattern> patternIndex)`
`void`	`setUp(Properties props)`
`void`	`setWordShapesForLabels(Map<String,Counter<String>> wordShapesForLabels)`
`Counter<Integer>`	`transformPatternsToIndex(Counter<SurfacePattern> pats)`
`Counter<SurfacePattern>`	`transformPatternsToSurface(Counter<Integer> pats)`
`Integer`	`transformPatternToIndex(SurfacePattern pat)`

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail

numIterationsForPatterns

@Execution.Option(name="numIterationsForPatterns")
public Integer numIterationsForPatterns

Maximum number of iterations to run

numPatterns
```
@Execution.Option(name="numPatterns")
public int numPatterns
```
Maximum number of patterns learned in each iteration

outDir
```
@Execution.Option(name="outDir")
public String outDir
```
The output directory where the justifications of learning patterns and phrases would be saved. These are needed for visualization

allPatternsDir

@Execution.Option(name="allPatternsDir")
public String allPatternsDir

Cached file of all patterns for all tokens

computeAllPatterns
```
@Execution.Option(name="computeAllPatterns")
public boolean computeAllPatterns
```
Whether all patterns should be computed. Otherwise, patterns are read from allPatternsFile.

patternScoring

@Execution.Option(name="patternScoring")
public GetPatternsFromDataMultiClass.PatternScoring patternScoring

Pattern Scoring mechanism. See GetPatternsFromDataMultiClass.PatternScoring for options.

thresholdSelectPattern

@Execution.Option(name="thresholdSelectPattern")
public double thresholdSelectPattern

Threshold for learning a pattern

restrictToMatched
```
@Execution.Option(name="restrictToMatched")
public boolean restrictToMatched
```
Currently does not work correctly (TODO: make this work). Ideally, this would label words only when they occur in the context of a learned pattern. This note may be outdated and should be tested.

usePatternResultAsLabel
```
@Execution.Option(name="usePatternResultAsLabel")
public boolean usePatternResultAsLabel
```
Label words that are learned so that in further iterations we have more information

debug
```
@Execution.Option(name="debug")
public int debug
```
Debug flag for learning patterns: 0 means no output, 1 means necessary output, 2 means necessary output plus some justification, and 3 means extreme debug output.

identifier

@Execution.Option(name="identifier")
public String identifier

Save this run as ...

useMatchingPhrase
```
@Execution.Option(name="useMatchingPhrase")
public boolean useMatchingPhrase
```
Use the actual dictionary matching phrase(s) instead of the token word or lemma in calculating the stats

tuneThresholdKeepRunning
```
@Execution.Option(name="tuneThresholdKeepRunning")
public boolean tuneThresholdKeepRunning
```
Reduce pattern threshold (=0.8*current_value) to extract as many patterns as possible (still restricted by numPatterns)
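The threshold reduction described above can be sketched as a loop that multiplies the threshold by 0.8 until enough patterns qualify. This is only an illustration of the idea, not the library's implementation: the class `ThresholdTuningSketch`, the `minThreshold` floor that ends the loop, and the use of a plain score map are all assumptions.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class ThresholdTuningSketch {
    // Selects up to numPatterns patterns scoring at least `threshold`.
    // While fewer than numPatterns qualify, the threshold is multiplied by 0.8,
    // stopping once it would drop below minThreshold (a hypothetical floor).
    static List<String> select(Map<String, Double> scores, double threshold,
                               int numPatterns, double minThreshold) {
        List<String> selected = pick(scores, threshold, numPatterns);
        while (selected.size() < numPatterns && threshold * 0.8 >= minThreshold) {
            threshold *= 0.8;
            selected = pick(scores, threshold, numPatterns);
        }
        return selected;
    }

    // Returns the highest-scoring patterns at or above the threshold, capped at limit.
    private static List<String> pick(Map<String, Double> scores, double threshold, int limit) {
        return scores.entrySet().stream()
                .filter(e -> e.getValue() >= threshold)
                .sorted((a, b) -> Double.compare(b.getValue(), a.getValue()))
                .limit(limit)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

The cap in `pick` mirrors the note that selection is still restricted by numPatterns even after the threshold has been lowered.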

maxExtractNumWords

@Execution.Option(name="maxExtractNumWords")
public int maxExtractNumWords

Maximum number of words to learn

useOtherLabelsWordsasNegative
```
@Execution.Option(name="useOtherLabelsWordsasNegative")
public boolean useOtherLabelsWordsasNegative
```
Use the seed dictionaries and the new words learned for the other labels in previous iterations as negatives

useLemmaContextTokens

@Execution.Option(name="useLemmaContextTokens")
public boolean useLemmaContextTokens

Use lemma instead of words for the context tokens

matchLowerCaseContext

@Execution.Option(name="matchLowerCaseContext")
public boolean matchLowerCaseContext

Lowercase the context words/lemmas

useTargetNERRestriction

@Execution.Option(name="useTargetNERRestriction")
public boolean useTargetNERRestriction

Add NER restriction to the target phrase in the patterns

targetAllowedTagsInitialsStr
```
@Execution.Option(name="targetAllowedTagsInitialsStr")
public String targetAllowedTagsInitialsStr
```
Initials of all POS tags to use if usePOS4Pattern is true, separated by comma.

allowedTagsInitials

public Map<String,Set<String>> allowedTagsInitials

targetAllowedNERs
```
@Execution.Option(name="targetAllowedNERs")
public String targetAllowedNERs
```
Allowed NERs for labels. Format is `label1,NER1,NER11;label2,NER2,NER21,NER22;label3,...`. The useTargetNERRestriction flag should be true.
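To make the option format concrete, here is a small sketch that parses such a string into a map from label to allowed NER tags, the same shape as the `allowedNERsforLabels` field. The class `AllowedNersSketch` and its handling of malformed groups are assumptions for illustration, not the library's parser.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

class AllowedNersSketch {
    // Parses "label1,NER1,NER11;label2,NER2" into {label1=[NER1, NER11], label2=[NER2]}.
    static Map<String, Set<String>> parse(String spec) {
        Map<String, Set<String>> allowed = new LinkedHashMap<>();
        for (String group : spec.split(";")) {
            String[] parts = group.trim().split(",");
            // Skip groups without a label and at least one NER tag (assumed behavior).
            if (parts.length < 2 || parts[0].trim().isEmpty()) {
                continue;
            }
            Set<String> ners = new LinkedHashSet<>();
            for (int i = 1; i < parts.length; i++) {
                ners.add(parts[i].trim());
            }
            allowed.put(parts[0].trim(), ners);
        }
        return allowed;
    }
}
```

For example, `parse("DRUG,O,MISC;DISEASE,O")` maps `DRUG` to the tags `O` and `MISC`, and `DISEASE` to `O`; the label names here are made up.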

allowedNERsforLabels

public Map<String,Set<String>> allowedNERsforLabels

useTargetParserParentRestriction
```
@Execution.Option(name="useTargetParserParentRestriction")
public boolean useTargetParserParentRestriction
```
Adds the parent's tag from the parse tree to the target phrase in the patterns

useContextNERRestriction
```
@Execution.Option(name="useContextNERRestriction")
public boolean useContextNERRestriction
```
If the NER tag of the context tokens is not the background symbol, generalize the token with the NER tag

numWordsToAdd

@Execution.Option(name="numWordsToAdd")
public int numWordsToAdd

Number of words to learn in each iteration

thresholdNumPatternsApplied

@Execution.Option(name="thresholdNumPatternsApplied")
public double thresholdNumPatternsApplied

wordScoring

@Execution.Option(name="wordScoring")
public edu.stanford.nlp.patterns.surface.GetPatternsFromDataMultiClass.WordScoring wordScoring

thresholdWordExtract

@Execution.Option(name="thresholdWordExtract")
public double thresholdWordExtract

justify
```
public boolean justify
```

LRSigma
```
@Execution.Option(name="LRSigma")
public double LRSigma
```
Sigma for L2 regularization in logistic regression, if a classifier is used to score phrases

englishWordsFiles
```
@Execution.Option(name="englishWordsFiles")
public String englishWordsFiles
```
English words that are not labeled when labeling using seed dictionaries

commonWordsPatternFiles
```
@Execution.Option(name="commonWordsPatternFiles")
public String commonWordsPatternFiles
```
Words to be ignored when learning phrases if removePhrasesWithStopWords or removeStopWordsFromSelectedPhrases is true. Also, these words are considered negative when scoring a pattern (similar to the other semantic classes).

otherSemanticClassesFiles
```
@Execution.Option(name="otherSemanticClassesFiles")
public String otherSemanticClassesFiles
```
List of dictionary phrases that are negative for all labels to be learned. Format is file_1,file_2,... where file_i has each phrase on a separate line
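As an illustration of this format, the sketch below reads a comma-separated list of file names and collects one phrase per non-empty line. The class `PhraseFilesSketch`, the UTF-8 encoding, and the trimming of whitespace are assumptions; the library may read these files differently.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashSet;
import java.util.Set;

class PhraseFilesSketch {
    // Reads every file named in "file_1,file_2,..." and returns the set of
    // phrases found, one phrase per non-empty line.
    static Set<String> readPhrases(String commaSeparatedFiles) throws IOException {
        Set<String> phrases = new LinkedHashSet<>();
        for (String file : commaSeparatedFiles.split(",")) {
            String name = file.trim();
            if (name.isEmpty()) {
                continue;
            }
            for (String line : Files.readAllLines(Paths.get(name), StandardCharsets.UTF_8)) {
                String phrase = line.trim();
                if (!phrase.isEmpty()) {
                    phrases.add(phrase);
                }
            }
        }
        return phrases;
    }
}
```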

minLen4FuzzyForPattern

@Execution.Option(name="minLen4FuzzyForPattern")
public int minLen4FuzzyForPattern

Minimum length of words that can be matched fuzzily
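A minimum length guards against short words matching each other by accident, since a single edit changes a large share of a short word. The sketch below shows one plausible reading: words match exactly, or, if both are at least the minimum length, when their Levenshtein distance is at most 1. The class `FuzzyMatchSketch` and the tolerance of one edit are assumptions, not the behavior of `isFuzzyMatch` as implemented in the library.

```java
class FuzzyMatchSketch {
    // Classic dynamic-programming Levenshtein distance using two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Exact matches always succeed; fuzzy matching is attempted only when
    // both words have at least minLen characters.
    static boolean fuzzyMatch(String w1, String w2, int minLen) {
        if (w1.equals(w2)) {
            return true;
        }
        if (w1.length() < minLen || w2.length() < minLen) {
            return false;
        }
        return editDistance(w1, w2) <= 1; // assumed tolerance of one edit
    }
}
```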

wordIgnoreRegex

@Execution.Option(name="wordIgnoreRegex")
public String wordIgnoreRegex

Do not learn phrases that match this regex.

numThreads

@Execution.Option(name="numThreads")
public int numThreads

Number of threads

stopWordsPatternFiles
```
@Execution.Option(name="stopWordsPatternFiles",
                  gloss="stop words")
public String stopWordsPatternFiles
```
Words that are not learned. Patterns are not created around these words. And, if useStopWordsBeforeTerm in CreatePatterns is true.

fillerWords
```
public List<String> fillerWords
```

env
```
public Map<String,Env> env
```
Environment for TokenSequencePattern

ignoreWordRegex
```
public Pattern ignoreWordRegex
```
By default, nothing is ignored. Specifies which phrases to ignore.

removeStopWordsFromSelectedPhrases

@Execution.Option(name="removeStopWordsFromSelectedPhrases")
public boolean removeStopWordsFromSelectedPhrases

removePhrasesWithStopWords

@Execution.Option(name="removePhrasesWithStopWords")
public boolean removePhrasesWithStopWords

includeExternalFeatures

@Execution.Option(name="includeExternalFeatures")
public boolean includeExternalFeatures

externalFeatureWeightsFile

@Execution.Option(name="externalFeatureWeightsFile")
public String externalFeatureWeightsFile

doNotApplyPatterns

@Execution.Option(name="doNotApplyPatterns")
public boolean doNotApplyPatterns

numWordsCompound

@Execution.Option(name="numWordsCompound")
public int numWordsCompound

sqrtPatScore

@Execution.Option(name="sqrtPatScore")
public boolean sqrtPatScore

Whether to take the square root of a pattern's score

minUnlabPhraseSupportForPat
```
@Execution.Option(name="minUnlabPhraseSupportForPat")
public int minUnlabPhraseSupportForPat
```
Remove patterns for which the number of unlabeled words is less than this.

minPosPhraseSupportForPat
```
@Execution.Option(name="minPosPhraseSupportForPat")
public int minPosPhraseSupportForPat
```
Remove patterns for which the number of positive words is less than this.
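Both support thresholds (minUnlabPhraseSupportForPat and minPosPhraseSupportForPat) describe the same kind of filter: a pattern is dropped when it extracts too few distinct phrases of the relevant kind. A minimal sketch, assuming patterns are represented as strings mapped to the phrases they extracted; the class `SupportFilterSketch` and this representation are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class SupportFilterSketch {
    // Keeps only patterns that extracted at least minSupport distinct phrases.
    static Map<String, Set<String>> filter(Map<String, Set<String>> phrasesByPattern,
                                           int minSupport) {
        Map<String, Set<String>> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> e : phrasesByPattern.entrySet()) {
            if (e.getValue().size() >= minSupport) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }
}
```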

addIndvWordsFromPhrasesExceptLastAsNeg
```
@Execution.Option(name="addIndvWordsFromPhrasesExceptLastAsNeg")
public boolean addIndvWordsFromPhrasesExceptLastAsNeg
```
For example, if the positive seed dictionary contains "cancer" and "breast cancer", then "breast" is included as a negative
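The example can be sketched as follows: for each multi-word positive phrase, every word except the last becomes a negative. Skipping words that are themselves positive phrases is an added assumption, as is the class name `NegativeWordsSketch`; the library's exact rule may differ.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class NegativeWordsSketch {
    // Collects all words of each positive phrase except the last one,
    // leaving out words that are themselves positive phrases (assumed rule).
    static Set<String> negativesFrom(Set<String> positivePhrases) {
        Set<String> negatives = new LinkedHashSet<>();
        for (String phrase : positivePhrases) {
            String[] words = phrase.trim().split("\\s+");
            for (int i = 0; i < words.length - 1; i++) {
                if (!positivePhrases.contains(words[i])) {
                    negatives.add(words[i]);
                }
            }
        }
        return negatives;
    }
}
```

With the positive phrases "cancer" and "breast cancer", the only negative produced is "breast".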

distSimWeights

public Map<String,Counter<Integer>> distSimWeights

dictOddsWeights

public Map<String,Counter<String>> dictOddsWeights

invertedIndexClass

@Execution.Option(name="invertedIndexClass",
                  gloss="another option is Lucene backed, which is not included in the CoreNLP release. Contact us to get a copy (distributed under Apache License).")
public Class<? extends SentenceIndex> invertedIndexClass

invertedIndexDirectory

@Execution.Option(name="invertedIndexDirectory")
public String invertedIndexDirectory

Where the inverted index (either in memory or lucene) is stored

clubNeighboringLabeledWords

@Execution.Option(name="clubNeighboringLabeledWords")
public boolean clubNeighboringLabeledWords

removeOverLappingLabelsFromSeed

@Execution.Option(name="removeOverLappingLabelsFromSeed")
public boolean removeOverLappingLabelsFromSeed

Keeps only one label for each token, whichever has the longest

usePhraseEvalWordClass
```
@Execution.Option(name="usePhraseEvalWordClass")
public boolean usePhraseEvalWordClass
```
Only works if you have a single label and the word classes are given.

usePhraseEvalGoogleNgram

@Execution.Option(name="usePhraseEvalGoogleNgram")
public boolean usePhraseEvalGoogleNgram

Use Google tf-idf for learning phrases

usePhraseEvalDomainNgram

@Execution.Option(name="usePhraseEvalDomainNgram")
public boolean usePhraseEvalDomainNgram

Use domain tf-idf for learning phrases

usePhraseEvalPatWtByFreq
```
@Execution.Option(name="usePhraseEvalPatWtByFreq")
public boolean usePhraseEvalPatWtByFreq
```
Use the sum, over all patterns that extracted the phrase, of pattern weight divided by phrase frequency for learning phrases
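Read literally, this feature sums the weights of the patterns that extracted a phrase and divides by the phrase's frequency. A minimal sketch of that arithmetic, assuming patterns are identified by strings and that patterns without a known weight count as 0; the class `PatWtByFreqSketch` is hypothetical.

```java
import java.util.Map;
import java.util.Set;

class PatWtByFreqSketch {
    // Sum of the weights of all patterns that extracted the phrase,
    // divided by how often the phrase occurs.
    static double score(Set<String> extractingPatterns,
                        Map<String, Double> patternWeights,
                        double phraseFreq) {
        if (phraseFreq <= 0) {
            throw new IllegalArgumentException("phraseFreq must be positive");
        }
        double sum = 0.0;
        for (String pattern : extractingPatterns) {
            sum += patternWeights.getOrDefault(pattern, 0.0); // unknown patterns add 0 (assumption)
        }
        return sum / phraseFreq;
    }
}
```

Dividing by frequency means a phrase extracted by a few strong patterns can outscore a very frequent phrase matched by the same patterns.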

usePhraseEvalSemanticOdds
```
@Execution.Option(name="usePhraseEvalSemanticOdds")
public boolean usePhraseEvalSemanticOdds
```
Odds of the phrase frequency in the label dictionary versus other dictionaries

usePhraseEvalEditDistSame
```
@Execution.Option(name="usePhraseEvalEditDistSame")
public boolean usePhraseEvalEditDistSame
```
Edit distance between this phrase and the other phrases in the label dictionary

usePhraseEvalEditDistOther
```
@Execution.Option(name="usePhraseEvalEditDistOther")
public boolean usePhraseEvalEditDistOther
```
Edit distance between this phrase and other phrases in other dictionaries

usePhraseEvalWordShape

@Execution.Option(name="usePhraseEvalWordShape")
public boolean usePhraseEvalWordShape

usePatternEvalWordClass
```
@Execution.Option(name="usePatternEvalWordClass")
public boolean usePatternEvalWordClass
```
Used only if patternScoring is PhEvalInPat or PhEvalInPatLogP. See usePhrase* for meanings.

usePatternEvalWordShape
```
@Execution.Option(name="usePatternEvalWordShape")
public boolean usePatternEvalWordShape
```
Used only if patternScoring is PhEvalInPat or PhEvalInPatLogP. See usePhrase* for meanings.

usePatternEvalGoogleNgram
```
@Execution.Option(name="usePatternEvalGoogleNgram")
public boolean usePatternEvalGoogleNgram
```
Used only if patternScoring is PhEvalInPat or PhEvalInPatLogP. See usePhrase* for meanings.

usePatternEvalDomainNgram
```
@Execution.Option(name="usePatternEvalDomainNgram")
public boolean usePatternEvalDomainNgram
```
Used only if patternScoring is PhEvalInPat or PhEvalInPatLogP. See usePhrase* for meanings.

usePatternEvalSemanticOdds
```
@Execution.Option(name="usePatternEvalSemanticOdds")
public boolean usePatternEvalSemanticOdds
```
Used only if patternScoring is PhEvalInPat or PhEvalInPatLogP. See usePhrase* for meanings.

usePatternEvalEditDistSame
```
@Execution.Option(name="usePatternEvalEditDistSame")
public boolean usePatternEvalEditDistSame
```
Used only if patternScoring is PhEvalInPat or PhEvalInPatLogP. See usePhrase* for meanings.

usePatternEvalEditDistOther
```
@Execution.Option(name="usePatternEvalEditDistOther")
public boolean usePatternEvalEditDistOther
```
Used only if patternScoring is PhEvalInPat or PhEvalInPatLogP. See usePhrase* for meanings.

perSelectRand
```
@Execution.Option(name="perSelectRand")
public double perSelectRand
```
These are used to learn weights for features if using logistic regression. Percentage of non-labeled tokens selected as negative.

perSelectNeg
```
@Execution.Option(name="perSelectNeg")
public double perSelectNeg
```
These are used to learn weights for features if using logistic regression. Percentage of negative tokens selected as negative.
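Both perSelectRand and perSelectNeg describe sampling a share of candidate tokens to serve as negative training examples. The sketch below illustrates that idea with a seeded shuffle. It assumes the value is a fraction between 0 and 1 (the source says "percentage", so the actual scale is uncertain), and the class `NegativeSamplingSketch` and its `seed` parameter are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class NegativeSamplingSketch {
    // Returns a random subset containing roughly `fraction` of the candidates.
    // A fixed seed keeps the selection reproducible.
    static <T> List<T> sample(List<T> candidates, double fraction, long seed) {
        if (fraction < 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be between 0 and 1");
        }
        List<T> shuffled = new ArrayList<>(candidates);
        Collections.shuffle(shuffled, new Random(seed));
        int n = (int) Math.round(fraction * shuffled.size());
        return new ArrayList<>(shuffled.subList(0, n));
    }
}
```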

doNotExtractPhraseAnyWordLabeledOtherClass
```
@Execution.Option(name="doNotExtractPhraseAnyWordLabeledOtherClass")
public boolean doNotExtractPhraseAnyWordLabeledOtherClass
```
Especially useful for multi word phrase extraction. Do not extract a phrase if any word is labeled with any other class.

saveInvertedIndex
```
@Execution.Option(name="saveInvertedIndex")
public boolean saveInvertedIndex
```
You can save the inverted index. Lucene index is saved by default to invertedIndexDirectory if given.

loadInvertedIndex
```
@Execution.Option(name="loadInvertedIndex")
public boolean loadInvertedIndex
```
You can load the inverted index using this file. If false and a Lucene index is in use, the existing directory is deleted and a new index is built.

storePatsForEachToken

@Execution.Option(name="storePatsForEachToken",
                  gloss="used for storing patterns in PSQL")
public ConstantsAndVariables.PatternForEachTokenWay storePatsForEachToken

backgroundSymbol
```
public String backgroundSymbol
```

invertedIndex
```
public SentenceIndex invertedIndex
```

extremedebug
```
public static String extremedebug
```

minimaldebug
```
public static String minimaldebug
```

patternIndex

public ConcurrentHashIndex<SurfacePattern> patternIndex

batchProcessSents
```
@Execution.Option(name="batchProcessSents")
public boolean batchProcessSents
```
Use this option if you are limited by memory; ignored if fileFormat is ser.

writeMatchedTokensFiles

@Execution.Option(name="writeMatchedTokensFiles")
public boolean writeMatchedTokensFiles

Constructor Detail

ConstantsAndVariables

public ConstantsAndVariables(Properties props,
                             Set<String> labels,
                             Map<String,Class<? extends TypesafeMap.Key<String>>> answerClass,
                             Map<String,Class> generalizeClasses,
                             Map<String,Map<Class,Object>> ignoreClasses)
                      throws IOException

Throws: IOException

ConstantsAndVariables

public ConstantsAndVariables(Properties props,
                             Map<String,Set<String>> labelDictionary,
                             Map<String,Class<? extends TypesafeMap.Key<String>>> answerClass,
                             Map<String,Class> generalizeClasses,
                             Map<String,Map<Class,Object>> ignoreClasses)
                      throws IOException

Throws: IOException

ConstantsAndVariables

public ConstantsAndVariables(Properties props,
                             Set<String> labels,
                             Map<String,Class<? extends TypesafeMap.Key<String>>> answerClass)
                      throws IOException

Throws: IOException

ConstantsAndVariables

public ConstantsAndVariables(Properties props,
                             Set<String> labels,
                             Map<String,Class<? extends TypesafeMap.Key<String>>> answerClass,
                             Map<String,Class> generalizeClasses)
                      throws IOException

Throws: IOException

Method Detail

getPatternIndex

public ConcurrentHashIndex<SurfacePattern> getPatternIndex()

setPatternIndex

public void setPatternIndex(ConcurrentHashIndex<SurfacePattern> patternIndex)

setUp

public void setUp(Properties props)
           throws IOException

Throws: IOException

getWordShapesForLabels

public Map<String,Counter<String>> getWordShapesForLabels()

setWordShapesForLabels

public void setWordShapesForLabels(Map<String,Counter<String>> wordShapesForLabels)

addGeneralizeClasses

public void addGeneralizeClasses(Map<String,Class> gen)

getGeneralizeClasses

public Map<String,Class> getGeneralizeClasses()

getStopWords
```
public Set<String> getStopWords()
```

addWordShapes

public void addWordShapes(String label,
                          Set<String> words)

setLabelDictionary

public void setLabelDictionary(Map<String,Set<String>> seedSets)

getLabelDictionary

public Map<String,Set<String>> getLabelDictionary()

addLabelDictionary

public void addLabelDictionary(String label,
                               Set<String> words)

getEnglishWords
```
public Set<String> getEnglishWords()
```

getCommonEngWords

public Set<String> getCommonEngWords()

getOtherSemanticClassesWords

public Set<String> getOtherSemanticClassesWords()

setOtherSemanticClassesWords

public void setOtherSemanticClassesWords(Set<String> other)

getWordClassClusters

public Map<String,Integer> getWordClassClusters()

getEditDistanceFromThisClass

public Pair<String,Double> getEditDistanceFromThisClass(String label,
                                                        String ph,
                                                        int minLen)

getEditDistanceFromOtherSemanticClasses

public Pair<String,Double> getEditDistanceFromOtherSemanticClasses(String ph,
                                                                   int minLen)

getEditDistanceFromEng

public double getEditDistanceFromEng(String ph,
                                     int minLen)

getEditDistanceFromEnglishWords

public ConcurrentHashMap<String,Double> getEditDistanceFromEnglishWords()

getEditDistanceFromEnglishWordsMatches

public ConcurrentHashMap<String,String> getEditDistanceFromEnglishWordsMatches()

getEditDistanceScoresOtherClass

public double getEditDistanceScoresOtherClass(String g)

getEditDistanceScoresOtherClassThreshold
```
public double getEditDistanceScoresOtherClassThreshold(String g)
```
Returns 1 if g lies within edit distance of some word, 0 if it is not close to any word.

getEditDistanceScoresThisClassThreshold

public double getEditDistanceScoresThisClassThreshold(String label,
                                                      String g)

getEditDistanceScoresThisClass

public double getEditDistanceScoresThisClass(String label,
                                             String g)

isFuzzyMatch

public static boolean isFuzzyMatch(String w1,
                                   String w2,
                                   int minLen4Fuzzy)

containsFuzzy

public static String containsFuzzy(Set<String> words,
                                   String w,
                                   int minLen4Fuzzy)

getGeneralWordClassClusters

public Map<String,Integer> getGeneralWordClassClusters()

setGeneralWordClassClusters

public void setGeneralWordClassClusters(Map<String,Integer> generalWordClassClusters)

getWordShapeCache

public Map<String,String> getWordShapeCache()

getAnswerClass

public Map<String,Class<? extends TypesafeMap.Key<String>>> getAnswerClass()

getIgnoreWordswithClassesDuringSelection

public Map<String,Map<Class,Object>> getIgnoreWordswithClassesDuringSelection()

transformPatternsToSurface

public Counter<SurfacePattern> transformPatternsToSurface(Counter<Integer> pats)

transformPatternsToIndex

public Counter<Integer> transformPatternsToIndex(Counter<SurfacePattern> pats)

transformPatternToIndex

public Integer transformPatternToIndex(SurfacePattern pat)
