java.lang.Object
  edu.stanford.nlp.classify.AbstractLinearClassifierFactory<L,F>
    edu.stanford.nlp.classify.LinearClassifierFactory<L,F>
public class LinearClassifierFactory<L,F>
Builds various types of linear classifiers, with functionality for
setting objective function, optimization method, and other parameters.
Classifiers can be defined with passed constructor arguments or using setter methods.
Defaults to quasi-Newton optimization of a LogConditionalObjectiveFunction.
(Merges the old classes CGLinearClassifierFactory, QNLinearClassifierFactory, and MaxEntClassifierFactory.)
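As a hedged usage sketch only (the toy features and labels are made up; `Dataset` and `BasicDatum` are the standard companion classes in this library):

```java
import java.util.Arrays;

import edu.stanford.nlp.classify.Dataset;
import edu.stanford.nlp.classify.LinearClassifier;
import edu.stanford.nlp.classify.LinearClassifierFactory;
import edu.stanford.nlp.ling.BasicDatum;

public class FactoryExample {
  public static void main(String[] args) {
    // Toy training data: each datum is a bag of String features plus a label.
    Dataset<String, String> train = new Dataset<String, String>();
    train.add(new BasicDatum<String, String>(Arrays.asList("fuzzy", "claws"), "cat"));
    train.add(new BasicDatum<String, String>(Arrays.asList("fuzzy", "barks"), "dog"));

    // Default configuration: quasi-Newton (QNMinimizer) optimization of a
    // LogConditionalObjectiveFunction with a quadratic prior.
    LinearClassifierFactory<String, String> factory =
        new LinearClassifierFactory<String, String>();
    factory.setSigma(1.0);  // prior strength; smaller is a stronger prior

    LinearClassifier<String, String> classifier = factory.trainClassifier(train);
    String guess = classifier.classOf(
        new BasicDatum<String, String>(Arrays.asList("fuzzy", "claws")));
  }
}
```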
(See also the trainSemiSupGE(GeneralDataset, List<? extends Datum>, List, double) methods.)

Nested Class Summary | |
---|---|
static class |
LinearClassifierFactory.LinearClassifierCreator<L,F>
|
Field Summary | |
---|---|
protected static double[] |
sigmasToTry
|
Constructor Summary | |
---|---|
LinearClassifierFactory()
|
|
LinearClassifierFactory(boolean useSum)
|
|
LinearClassifierFactory(double tol)
|
|
LinearClassifierFactory(double tol,
boolean useSum,
double sigma)
|
|
LinearClassifierFactory(double tol,
boolean useSum,
int prior,
double sigma,
double epsilon)
|
|
LinearClassifierFactory(double tol,
boolean useSum,
int prior,
double sigma,
double epsilon,
int mem)
|
|
LinearClassifierFactory(Minimizer<DiffFunction> min)
|
|
LinearClassifierFactory(Minimizer<DiffFunction> min,
boolean useSum)
|
|
LinearClassifierFactory(Minimizer<DiffFunction> min,
double tol,
boolean useSum)
|
|
LinearClassifierFactory(Minimizer<DiffFunction> min,
double tol,
boolean useSum,
double sigma)
|
|
LinearClassifierFactory(Minimizer<DiffFunction> min,
double tol,
boolean useSum,
int prior,
double sigma)
|
|
LinearClassifierFactory(Minimizer<DiffFunction> min,
double tol,
boolean useSum,
int prior,
double sigma,
double epsilon)
Create a factory that builds linear classifiers from training data. |
|
LinearClassifierFactory(Minimizer<DiffFunction> min,
double tol,
boolean useSum,
LogPrior logPrior)
|
Method Summary | |
---|---|
double[][] |
adaptWeights(double[][] origWeights,
GeneralDataset<L,F> adaptDataset)
Adapts a classifier by adjusting the mean of its Gaussian prior (under construction -pichuan). |
void |
crossValidateSetSigma(GeneralDataset<L,F> dataset)
Calls the method crossValidateSetSigma(GeneralDataset, int) with 5-fold cross-validation. |
void |
crossValidateSetSigma(GeneralDataset<L,F> dataset,
int kfold)
Calls the method crossValidateSetSigma(GeneralDataset, int, Scorer, LineSearcher) with
multi-class log-likelihood scoring (see MultiClassAccuracyStats ) and golden-section line search
(see GoldenSectionLineSearch ). |
void |
crossValidateSetSigma(GeneralDataset<L,F> dataset,
int kfold,
LineSearcher minimizer)
|
void |
crossValidateSetSigma(GeneralDataset<L,F> dataset,
int kfold,
Scorer<L> scorer)
|
void |
crossValidateSetSigma(GeneralDataset<L,F> dataset,
int kfold,
Scorer<L> scorer,
LineSearcher minimizer)
Sets the sigma parameter to a value that optimizes the cross-validation score given by scorer . |
LinearClassifierFactory.LinearClassifierCreator<L,F> |
getClassifierCreator(GeneralDataset<L,F> dataset)
|
double |
getSigma()
|
double[] |
heldOutSetSigma(GeneralDataset<L,F> train)
|
double[] |
heldOutSetSigma(GeneralDataset<L,F> train,
GeneralDataset<L,F> dev)
|
double[] |
heldOutSetSigma(GeneralDataset<L,F> train,
GeneralDataset<L,F> dev,
LineSearcher minimizer)
|
double[] |
heldOutSetSigma(GeneralDataset<L,F> train,
GeneralDataset<L,F> dev,
Scorer<L> scorer)
|
double[] |
heldOutSetSigma(GeneralDataset<L,F> trainSet,
GeneralDataset<L,F> devSet,
Scorer<L> scorer,
LineSearcher minimizer)
Sets the sigma parameter to a value that optimizes the held-out score given by scorer . |
double[] |
heldOutSetSigma(GeneralDataset<L,F> train,
Scorer<L> scorer)
|
Classifier<String,String> |
loadFromFilename(String file)
Given the path to a file representing the text based serialization of a Linear Classifier, reconstitutes and returns that LinearClassifier. |
void |
resetWeight()
NOTE: Nothing is actually done with this value. |
void |
setEpsilon(double eps)
Sets the epsilon value for LogConditionalObjectiveFunction . |
boolean |
setEvaluators(int iters,
Evaluator[] evaluators)
|
void |
setHeldOutSearcher(LineSearcher heldOutSearcher)
Set the LineSearcher to be used in heldOutSetSigma(GeneralDataset, GeneralDataset) . |
void |
setMem(int mem)
Set the mem value for QNMinimizer . |
void |
setMinimizer(Minimizer<DiffFunction> min)
Sets the minimizer. |
void |
setPrior(LogPrior logPrior)
Set the prior. |
void |
setRetrainFromScratchAfterSigmaTuning(boolean retrainFromScratchAfterSigmaTuning)
If set to true, then when training a classifier, after an optimal sigma is chosen a model is relearned from scratch. |
void |
setSigma(double sigma)
|
void |
setTol(double tol)
Set the tolerance. |
void |
setTuneSigmaCV(int folds)
setTuneSigmaCV sets the tuneSigmaCV flag: when turned on,
the sigma is tuned by cross-validation. |
void |
setTuneSigmaHeldOut()
setTuneSigmaHeldOut sets the tuneSigmaHeldOut flag: when turned on,
sigma is tuned on a held-out set (70%-30% split).
void |
setUseSum(boolean useSum)
NOTE: nothing is actually done with this value! setUseSum sets the useSum flag: when turned on,
the Summed Conditional Objective Function is used.
void |
setVerbose(boolean verbose)
Set the verbose flag for CGMinimizer . |
LinearClassifier<L,F> |
trainClassifier(GeneralDataset<L,F> dataset)
Trains a Classifier on a Dataset . |
LinearClassifier<L,F> |
trainClassifier(GeneralDataset<L,F> dataset,
double[] initial)
|
Classifier<L,F> |
trainClassifier(GeneralDataset<L,F> dataset,
float[] dataWeights,
LogPrior prior)
|
Classifier<L,F> |
trainClassifier(Iterable<Datum<L,F>> dataIterable)
|
LinearClassifier<L,F> |
trainClassifier(List<RVFDatum<L,F>> examples)
Deprecated. |
Classifier<L,F> |
trainClassifierSemiSup(GeneralDataset<L,F> data,
GeneralDataset<L,F> biasedData,
double[][] confusionMatrix,
double[] initial)
IMPORTANT: dataset and biasedDataset must have the same featureIndex and labelIndex. |
LinearClassifier<L,F> |
trainClassifierV(GeneralDataset<L,F> train,
double min,
double max,
boolean accuracy)
Train a classifier with a sigma tuned on a validation set. |
LinearClassifier<L,F> |
trainClassifierV(GeneralDataset<L,F> train,
GeneralDataset<L,F> validation,
double min,
double max,
boolean accuracy)
Train a classifier with a sigma tuned on a validation set. |
LinearClassifier<L,F> |
trainSemiSupGE(GeneralDataset<L,F> labeledDataset,
List<? extends Datum<L,F>> unlabeledDataList)
Trains the linear classifier using Generalized Expectation criteria as described in Generalized Expectation Criteria for Semi Supervised Learning of Conditional Random Fields, Mann and McCallum, ACL 2008. |
LinearClassifier<L,F> |
trainSemiSupGE(GeneralDataset<L,F> labeledDataset,
List<? extends Datum<L,F>> unlabeledDataList,
double convexComboCoeff)
|
LinearClassifier<L,F> |
trainSemiSupGE(GeneralDataset<L,F> labeledDataset,
List<? extends Datum<L,F>> unlabeledDataList,
List<F> GEFeatures,
double convexComboCoeff)
Trains the linear classifier using Generalized Expectation criteria as described in Generalized Expectation Criteria for Semi Supervised Learning of Conditional Random Fields, Mann and McCallum, ACL 2008. |
double[][] |
trainWeights(GeneralDataset<L,F> dataset)
|
double[][] |
trainWeights(GeneralDataset<L,F> dataset,
double[] initial)
|
double[][] |
trainWeights(GeneralDataset<L,F> dataset,
double[] initial,
boolean bypassTuneSigma)
|
double[][] |
trainWeightsSemiSup(GeneralDataset<L,F> data,
GeneralDataset<L,F> biasedData,
double[][] confusionMatrix,
double[] initial)
|
void |
useConjugateGradientAscent()
Sets the minimizer to CGMinimizer . |
void |
useConjugateGradientAscent(boolean verbose)
Sets the minimizer to CGMinimizer , with the passed verbose flag. |
void |
useHybridMinimizer()
|
void |
useHybridMinimizer(double initialSMDGain,
int stochasticBatchSize,
StochasticCalculateMethods stochasticMethod,
int cutoffIteration)
|
void |
useHybridMinimizerWithInPlaceSGD(int SGDPasses,
int tuneSampleSize,
double sigma)
|
void |
useInPlaceStochasticGradientDescent()
|
void |
useInPlaceStochasticGradientDescent(int SGDPasses,
int tuneSampleSize,
double sigma)
|
void |
useQuasiNewton()
Sets the minimizer to quasi-Newton (QNMinimizer).
void |
useQuasiNewton(boolean useRobust)
|
void |
useStochasticGradientDescent()
|
void |
useStochasticGradientDescent(double gainSGD,
int stochasticBatchSize)
|
void |
useStochasticGradientDescentToQuasiNewton(double SGDGain,
int batchSize,
int sgdPasses,
int qnPasses,
int hessSamples,
int QNMem,
boolean outputToFile)
|
void |
useStochasticMetaDescent()
|
void |
useStochasticMetaDescent(double initialSMDGain,
int stochasticBatchSize,
StochasticCalculateMethods stochasticMethod,
int passes)
|
void |
useStochasticQN(double initialSMDGain,
int stochasticBatchSize)
|
Methods inherited from class edu.stanford.nlp.classify.AbstractLinearClassifierFactory |
---|
trainClassifier, trainClassifier |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
protected static double[] sigmasToTry
Constructor Detail |
---|
public LinearClassifierFactory()
public LinearClassifierFactory(Minimizer<DiffFunction> min)
public LinearClassifierFactory(boolean useSum)
public LinearClassifierFactory(double tol)
public LinearClassifierFactory(Minimizer<DiffFunction> min, boolean useSum)
public LinearClassifierFactory(Minimizer<DiffFunction> min, double tol, boolean useSum)
public LinearClassifierFactory(double tol, boolean useSum, double sigma)
public LinearClassifierFactory(Minimizer<DiffFunction> min, double tol, boolean useSum, double sigma)
public LinearClassifierFactory(Minimizer<DiffFunction> min, double tol, boolean useSum, int prior, double sigma)
public LinearClassifierFactory(double tol, boolean useSum, int prior, double sigma, double epsilon)
public LinearClassifierFactory(double tol, boolean useSum, int prior, double sigma, double epsilon, int mem)
public LinearClassifierFactory(Minimizer<DiffFunction> min, double tol, boolean useSum, int prior, double sigma, double epsilon)
Parameters:
min - The method to be used for optimization (minimization) (default: QNMinimizer)
tol - The convergence threshold for the minimization (default: 1e-4)
useSum - Asks the optimizer to minimize the sum of the likelihoods of individual data items rather than their product (default: false). NOTE: this is currently ignored!
prior - What kind of prior to use, as an enum constant from class LogPrior
sigma - The strength of the prior (smaller is stronger for most standard priors) (default: 1.0)
epsilon - A second parameter to the prior (currently only used by the Huber prior)

public LinearClassifierFactory(Minimizer<DiffFunction> min, double tol, boolean useSum, LogPrior logPrior)
Method Detail |
---|
public double[][] adaptWeights(double[][] origWeights, GeneralDataset<L,F> adaptDataset)
Parameters:
origWeights - the original weights trained from the training data
adaptDataset - the dataset used to adapt the trained weights
public double[][] trainWeights(GeneralDataset<L,F> dataset)
Overrides:
trainWeights in class AbstractLinearClassifierFactory<L,F>
public double[][] trainWeights(GeneralDataset<L,F> dataset, double[] initial)
public double[][] trainWeights(GeneralDataset<L,F> dataset, double[] initial, boolean bypassTuneSigma)
public Classifier<L,F> trainClassifierSemiSup(GeneralDataset<L,F> data, GeneralDataset<L,F> biasedData, double[][] confusionMatrix, double[] initial)
public double[][] trainWeightsSemiSup(GeneralDataset<L,F> data, GeneralDataset<L,F> biasedData, double[][] confusionMatrix, double[] initial)
public LinearClassifier<L,F> trainSemiSupGE(GeneralDataset<L,F> labeledDataset, List<? extends Datum<L,F>> unlabeledDataList, List<F> GEFeatures, double convexComboCoeff)
public LinearClassifier<L,F> trainSemiSupGE(GeneralDataset<L,F> labeledDataset, List<? extends Datum<L,F>> unlabeledDataList)
public LinearClassifier<L,F> trainSemiSupGE(GeneralDataset<L,F> labeledDataset, List<? extends Datum<L,F>> unlabeledDataList, double convexComboCoeff)
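A hedged sketch of calling the GE trainer (Mann and McCallum, ACL 2008), assuming a labeled GeneralDataset and a list of unlabeled Datum objects built elsewhere; `labeled` and `unlabeled` are placeholder names:

```java
// convexComboCoeff trades off the supervised objective against the GE term.
LinearClassifierFactory<String, String> factory =
    new LinearClassifierFactory<String, String>();
LinearClassifier<String, String> classifier =
    factory.trainSemiSupGE(labeled, unlabeled, 0.5);
```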
public LinearClassifier<L,F> trainClassifierV(GeneralDataset<L,F> train, GeneralDataset<L,F> validation, double min, double max, boolean accuracy)
public LinearClassifier<L,F> trainClassifierV(GeneralDataset<L,F> train, double min, double max, boolean accuracy)
Parameters:
train - The data to train (and validate) on.
public void setTol(double tol)
public void setPrior(LogPrior logPrior)
Parameters:
logPrior - One of the priors defined in LogConditionalObjectiveFunction. LogPrior.QUADRATIC is the default.

public void setVerbose(boolean verbose)
Set the verbose flag for CGMinimizer. Only used with conjugate-gradient minimization. false is the default.
public void setMinimizer(Minimizer<DiffFunction> min)
Sets the minimizer. QNMinimizer is the default.
public void setEpsilon(double eps)
Sets the epsilon value for LogConditionalObjectiveFunction.
public void setSigma(double sigma)
public double getSigma()
public void useQuasiNewton()
Sets the minimizer to QNMinimizer, the default.
public void useQuasiNewton(boolean useRobust)
public void useStochasticQN(double initialSMDGain, int stochasticBatchSize)
public void useStochasticMetaDescent()
public void useStochasticMetaDescent(double initialSMDGain, int stochasticBatchSize, StochasticCalculateMethods stochasticMethod, int passes)
public void useStochasticGradientDescent()
public void useStochasticGradientDescent(double gainSGD, int stochasticBatchSize)
public void useInPlaceStochasticGradientDescent()
public void useInPlaceStochasticGradientDescent(int SGDPasses, int tuneSampleSize, double sigma)
public void useHybridMinimizerWithInPlaceSGD(int SGDPasses, int tuneSampleSize, double sigma)
public void useStochasticGradientDescentToQuasiNewton(double SGDGain, int batchSize, int sgdPasses, int qnPasses, int hessSamples, int QNMem, boolean outputToFile)
public void useHybridMinimizer()
public void useHybridMinimizer(double initialSMDGain, int stochasticBatchSize, StochasticCalculateMethods stochasticMethod, int cutoffIteration)
public void setMem(int mem)
Set the mem value for QNMinimizer. Only used with quasi-Newton minimization. 15 is the default.
Parameters:
mem - Number of previous function/derivative evaluations to store to estimate the second derivative. Storing more previous evaluations improves training convergence speed. This number can be very small, if memory conservation is the priority. For large optimization systems (of 100,000-1,000,000 dimensions), setting this to 15 produces quite good results, but setting it to 50 can decrease the iteration count by about 20% over a value of 15.

public void useConjugateGradientAscent(boolean verbose)
Sets the minimizer to CGMinimizer, with the passed verbose flag.

public void useConjugateGradientAscent()
Sets the minimizer to CGMinimizer.
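The optimizer-selection methods above are alternative setters on one factory; a brief hedged sketch of switching between them before training (pick one minimizer per factory):

```java
LinearClassifierFactory<String, String> factory =
    new LinearClassifierFactory<String, String>();

factory.useQuasiNewton();          // QNMinimizer, the default
factory.setMem(15);                // history size for the second-derivative estimate

// Alternatively, conjugate gradient (optionally verbose):
factory.useConjugateGradientAscent(true);

// Or a stochastic method, e.g. for very large datasets:
factory.useInPlaceStochasticGradientDescent();
```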
public void setUseSum(boolean useSum)
setUseSum sets the useSum flag: when turned on, the Summed Conditional Objective Function is used. Otherwise, the LogConditionalObjectiveFunction is used. The default is false. NOTE: nothing is actually done with this value!
public void setTuneSigmaHeldOut()
setTuneSigmaHeldOut sets the tuneSigmaHeldOut flag: when turned on, sigma is tuned on a held-out set (70%-30% split). Otherwise no tuning of sigma is done. The default is false.
public void setTuneSigmaCV(int folds)
setTuneSigmaCV sets the tuneSigmaCV flag: when turned on, sigma is tuned by cross-validation. The number of folds is the parameter. If there is less data than the number of folds, leave-one-out is used. The default is false.
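As a hedged usage sketch (assuming a GeneralDataset named `train` built elsewhere):

```java
LinearClassifierFactory<String, String> factory =
    new LinearClassifierFactory<String, String>();

// Either ask the factory to tune sigma internally during trainClassifier:
factory.setTuneSigmaCV(10);

// ...or tune explicitly on a dataset, then read back the chosen value:
factory.crossValidateSetSigma(train, 5);
double sigma = factory.getSigma();
```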
public void resetWeight()
resetWeight sets the resetWeight flag. This flag makes sense only if sigma is tuned: when turned on, the weights output by the tuneSigma method will be reset to zero when training the classifier. The default is false. NOTE: Nothing is actually done with this value.
public void crossValidateSetSigma(GeneralDataset<L,F> dataset)
Calls the method crossValidateSetSigma(GeneralDataset, int) with 5-fold cross-validation.
Parameters:
dataset - the data set to optimize sigma on.

public void crossValidateSetSigma(GeneralDataset<L,F> dataset, int kfold)
Calls the method crossValidateSetSigma(GeneralDataset, int, Scorer, LineSearcher) with multi-class log-likelihood scoring (see MultiClassAccuracyStats) and golden-section line search (see GoldenSectionLineSearch).
Parameters:
dataset - the data set to optimize sigma on.

public void crossValidateSetSigma(GeneralDataset<L,F> dataset, int kfold, Scorer<L> scorer)
public void crossValidateSetSigma(GeneralDataset<L,F> dataset, int kfold, LineSearcher minimizer)
public void crossValidateSetSigma(GeneralDataset<L,F> dataset, int kfold, Scorer<L> scorer, LineSearcher minimizer)
Sets the sigma parameter to a value that optimizes the cross-validation score given by scorer. The search for an optimal value is carried out by minimizer.
Parameters:
dataset - the data set to optimize sigma on.

public void setHeldOutSearcher(LineSearcher heldOutSearcher)
Set the LineSearcher to be used in heldOutSetSigma(GeneralDataset, GeneralDataset).
public double[] heldOutSetSigma(GeneralDataset<L,F> train)
public double[] heldOutSetSigma(GeneralDataset<L,F> train, Scorer<L> scorer)
public double[] heldOutSetSigma(GeneralDataset<L,F> train, GeneralDataset<L,F> dev)
public double[] heldOutSetSigma(GeneralDataset<L,F> train, GeneralDataset<L,F> dev, Scorer<L> scorer)
public double[] heldOutSetSigma(GeneralDataset<L,F> train, GeneralDataset<L,F> dev, LineSearcher minimizer)
public double[] heldOutSetSigma(GeneralDataset<L,F> trainSet, GeneralDataset<L,F> devSet, Scorer<L> scorer, LineSearcher minimizer)
Sets the sigma parameter to a value that optimizes the held-out score given by scorer. The search for an optimal value is carried out by minimizer.
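A hedged sketch of held-out tuning, assuming `trainData` and `devData` are GeneralDataset instances built elsewhere:

```java
LinearClassifierFactory<String, String> factory =
    new LinearClassifierFactory<String, String>();

// Automatic 70%-30% split inside trainClassifier:
factory.setTuneSigmaHeldOut();

// Or tune against an explicit dev set:
double[] result = factory.heldOutSetSigma(trainData, devData);
```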
public void setRetrainFromScratchAfterSigmaTuning(boolean retrainFromScratchAfterSigmaTuning)
public Classifier<L,F> trainClassifier(Iterable<Datum<L,F>> dataIterable)
public Classifier<L,F> trainClassifier(GeneralDataset<L,F> dataset, float[] dataWeights, LogPrior prior)
public LinearClassifier<L,F> trainClassifier(GeneralDataset<L,F> dataset)
Description copied from class: AbstractLinearClassifierFactory
Trains a Classifier on a Dataset.
Specified by:
trainClassifier in interface ClassifierFactory<L,F,Classifier<L,F>>
Overrides:
trainClassifier in class AbstractLinearClassifierFactory<L,F>
Returns:
A Classifier trained on the data.

public LinearClassifier<L,F> trainClassifier(GeneralDataset<L,F> dataset, double[] initial)
public Classifier<String,String> loadFromFilename(String file)
@Deprecated public LinearClassifier<L,F> trainClassifier(List<RVFDatum<L,F>> examples)
Deprecated.
Specified by:
trainClassifier in interface ClassifierFactory<L,F,Classifier<L,F>>
Overrides:
trainClassifier in class AbstractLinearClassifierFactory<L,F>
public boolean setEvaluators(int iters, Evaluator[] evaluators)
public LinearClassifierFactory.LinearClassifierCreator<L,F> getClassifierCreator(GeneralDataset<L,F> dataset)