edu.stanford.nlp.classify
Class SVMLightClassifierFactory<L,F>

java.lang.Object
  extended by edu.stanford.nlp.classify.SVMLightClassifierFactory<L,F>
All Implemented Interfaces:
ClassifierFactory<L,F,SVMLightClassifier<L,F>>, Serializable

public class SVMLightClassifierFactory<L,F>
extends Object
implements ClassifierFactory<L,F,SVMLightClassifier<L,F>>

This class trains SVMs (SVMLightClassifiers). It calls SVM Light, or SVM Struct for multiclass SVMs, on the command line, reads in the resulting model file, and creates a linear classifier. Unless otherwise specified, a Platt model is also trained on top of the SVM so that probabilities can be produced.
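
The command-line step described above can be pictured with a small sketch. SvmLearnCommand is a hypothetical helper, not part of this package; it only assembles an argument list of the form "svm_learn [options] example_file model_file". The -c (trade-off) and -v (verbosity) options are documented SVM Light flags, while the file names and the choice to omit -c when C is 0 are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Stanford NLP): assembles an svm_learn command line. */
class SvmLearnCommand {

    /**
     * Builds the argument list for: svm_learn [options] example_file model_file.
     * All paths are placeholders supplied by the caller.
     */
    static List<String> build(String svmLearnPath, double c, int verbosity,
                              String exampleFile, String modelFile) {
        List<String> cmd = new ArrayList<>();
        cmd.add(svmLearnPath);
        cmd.add("-v");
        cmd.add(Integer.toString(verbosity));
        if (c > 0.0) {
            // Assumption: C == 0 means "omit -c and let svm_learn use its default".
            cmd.add("-c");
            cmd.add(Double.toString(c));
        }
        cmd.add(exampleFile);
        cmd.add(modelFile);
        return cmd;
    }
}
```

The resulting list could be handed to java.lang.ProcessBuilder to launch the trainer, after which the model file would be read back in.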

Author:
Jenny Finkel, Aria Haghighi, Sarah Spikes (sdspikes@cs.stanford.edu) (templatization)
See Also:
Serialized Form

Field Summary
protected  File alphaFile
           
protected  double C
          C can be tuned using a held-out set or cross-validation. For a binary SVM, if C=0, svmlight uses its default of 1/(avg x*x).
protected  boolean verbose
           
 
Constructor Summary
SVMLightClassifierFactory()
           
SVMLightClassifierFactory(String svmLightLearn, String svmStructLearn)
           
 
Method Summary
 void crossValidateSetC(GeneralDataset<L,F> dataset, int numFolds, Scorer<L> scorer, LineSearcher minimizer)
          This method will cross validate on the given data and number of folds to find the optimal C.
 double getC()
          Get the C parameter (for the slack variables) for training the SVM.
 boolean getDeleteTempFilesOnExitFlag()
           
 int getFolds()
           
 double getHeldOutPercent()
           
 Scorer getScorer()
           
 int getSvmLightVerbosity()
           
 boolean getTuneCV()
           
 boolean getTuneHeldOut()
           
 LineSearcher getTuneMinimizer()
           
 boolean getUseSigma()
          Get whether or not to train an overlying Platt (sigmoid) model for producing meaningful probabilities.
 void heldOutSetC(GeneralDataset<L,F> train, double percentHeldOut, Scorer<L> scorer, LineSearcher minimizer)
           
 void heldOutSetC(GeneralDataset<L,F> trainSet, GeneralDataset<L,F> devSet, Scorer<L> scorer, LineSearcher minimizer)
          This method will train on trainSet and score on the held-out devSet to find the optimal C.
 void setC(double C)
          Set the C parameter (for the slack variables) for training the SVM.
 void setDeleteTempFilesOnExitFlag(boolean deleteTempFilesOnExit)
           
 void setFolds(int folds)
           
 void setHeldOutPercent(double heldOutPercent)
           
 void setScorer(Scorer<L> scorer)
           
 void setSvmLightVerbosity(int svmLightVerbosity)
           
 void setTuneCV(boolean tuneCV)
           
 void setTuneHeldOut(boolean tuneHeldOut)
           
 void setTuneMinimizer(LineSearcher minimizer)
           
 void setUseSigmoid(boolean useSigmoid)
          Specify whether or not to train an overlying Platt (sigmoid) model for producing meaningful probabilities.
 SVMLightClassifier<L,F> trainClassifier(GeneralDataset<L,F> dataset)
           
 SVMLightClassifier<L,F> trainClassifier(List<RVFDatum<L,F>> examples)
          Deprecated. 
 SVMLightClassifier<L,F> trainClassifierBasic(GeneralDataset<L,F> dataset)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

C

protected double C
C can be tuned using a held-out set or cross-validation. For a binary SVM, if C=0, svmlight uses its default of 1/(avg x*x).
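
As a worked illustration of the 1/(avg x*x) default, the sketch below computes the reciprocal of the mean squared norm of the training vectors. DefaultC is a hypothetical helper operating on dense arrays; it is not how this class or svmlight stores data, only the arithmetic spelled out.

```java
/** Hypothetical helper (not part of the library): the 1/(avg x*x) arithmetic on dense vectors. */
class DefaultC {

    /** Returns 1 / mean(x . x) over the rows of a dense feature matrix. */
    static double defaultC(double[][] examples) {
        if (examples.length == 0) {
            throw new IllegalArgumentException("no examples");
        }
        double sumOfDots = 0.0;
        for (double[] x : examples) {
            double dot = 0.0;
            for (double v : x) {
                dot += v * v;   // x . x for one example
            }
            sumOfDots += dot;
        }
        double avg = sumOfDots / examples.length;
        return 1.0 / avg;       // infinite if every vector is all zeros
    }
}
```

For the two vectors (1, 0) and (0, 2), x*x is 1 and 4, the average is 2.5, and the default works out to 0.4.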


verbose

protected boolean verbose

alphaFile

protected File alphaFile
Constructor Detail

SVMLightClassifierFactory

public SVMLightClassifierFactory(String svmLightLearn,
                                 String svmStructLearn)
Parameters:
svmLightLearn - the full pathname of the svmLight training program; the default value is "/u/nlp/packages/svm_light/svm_learn"
svmStructLearn - the full pathname of the svmMultiClass training program; the default value is "/u/nlp/packages/svm_multiclass/svm_multiclass_learn"

SVMLightClassifierFactory

public SVMLightClassifierFactory()
Method Detail

setC

public void setC(double C)
Set the C parameter (for the slack variables) for training the SVM.


getC

public double getC()
Get the C parameter (for the slack variables) for training the SVM.


setUseSigmoid

public void setUseSigmoid(boolean useSigmoid)
Specify whether or not to train an overlying Platt (sigmoid) model for producing meaningful probabilities.
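
For readers unfamiliar with Platt scaling, the sketch below shows the standard form of the sigmoid, P(y=1 | f) = 1 / (1 + exp(A*f + B)), applied to a raw SVM score f. PlattSigmoid is a hypothetical class; the parameters A and B are assumed to have been fitted elsewhere, and the parameterization used internally by this package may differ.

```java
/** Hypothetical illustration of a Platt sigmoid; A and B are assumed to be already fitted. */
class PlattSigmoid {
    private final double a;
    private final double b;

    PlattSigmoid(double a, double b) {
        this.a = a;
        this.b = b;
    }

    /** Maps a raw SVM decision value to P(y = 1 | score) = 1 / (1 + exp(a * score + b)). */
    double probability(double svmScore) {
        return 1.0 / (1.0 + Math.exp(a * svmScore + b));
    }
}
```

With a negative A, larger decision values map to probabilities closer to 1, and a score of 0 with B = 0 maps to exactly 0.5.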


getUseSigma

public boolean getUseSigma()
Get whether or not to train an overlying Platt (sigmoid) model for producing meaningful probabilities.


getDeleteTempFilesOnExitFlag

public boolean getDeleteTempFilesOnExitFlag()

setDeleteTempFilesOnExitFlag

public void setDeleteTempFilesOnExitFlag(boolean deleteTempFilesOnExit)

crossValidateSetC

public void crossValidateSetC(GeneralDataset<L,F> dataset,
                              int numFolds,
                              Scorer<L> scorer,
                              LineSearcher minimizer)
This method cross-validates on the given data, using the given number of folds, to find the optimal C. The scorer determines what to optimize for (F-score, accuracy, etc.). The resulting C is then saved, so a classifier trained after calling this method will use that C.
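
The idea of cross-validating to choose C can be sketched independently of this class. The code below is a simplification: it tries a fixed list of candidate values instead of using a LineSearcher, and FoldScorer is a hypothetical callback standing in for "train with this C on the training indices, then score on the test indices". None of these names come from the library.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of k-fold selection of C over a fixed candidate list. */
class CrossValidateC {

    /** Hypothetical callback: train with c on trainIdx, return a score (higher is better) on testIdx. */
    interface FoldScorer {
        double score(double c, int[] trainIdx, int[] testIdx);
    }

    /** Returns {trainIdx, testIdx} for one fold; example i belongs to fold i % numFolds. */
    static int[][] split(int n, int numFolds, int fold) {
        List<Integer> train = new ArrayList<>();
        List<Integer> test = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            (i % numFolds == fold ? test : train).add(i);
        }
        return new int[][] { toArray(train), toArray(test) };
    }

    private static int[] toArray(List<Integer> xs) {
        int[] out = new int[xs.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = xs.get(i);
        }
        return out;
    }

    /** Returns the candidate C with the highest mean score across the folds. */
    static double bestC(int n, int numFolds, double[] candidates, FoldScorer scorer) {
        double best = Double.NaN;
        double bestMean = Double.NEGATIVE_INFINITY;
        for (double c : candidates) {
            double total = 0.0;
            for (int fold = 0; fold < numFolds; fold++) {
                int[][] parts = split(n, numFolds, fold);
                total += scorer.score(c, parts[0], parts[1]);
            }
            double mean = total / numFolds;
            if (mean > bestMean) {
                bestMean = mean;
                best = c;
            }
        }
        return best;
    }
}
```

Averaging over folds gives each example one turn in the test set, which makes the chosen C less sensitive to a single unlucky split.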


heldOutSetC

public void heldOutSetC(GeneralDataset<L,F> train,
                        double percentHeldOut,
                        Scorer<L> scorer,
                        LineSearcher minimizer)

heldOutSetC

public void heldOutSetC(GeneralDataset<L,F> trainSet,
                        GeneralDataset<L,F> devSet,
                        Scorer<L> scorer,
                        LineSearcher minimizer)
This method tunes C on a held-out set: classifiers are trained on trainSet and scored on devSet to find the optimal C. The scorer determines what to optimize for (F-score, accuracy, etc.). The resulting C is then saved, so a classifier trained after calling this method will use that C.
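
One way a line search over C against a held-out score might look is sketched below. It uses golden-section search over log10(C), which is a common one-dimensional strategy but is chosen here only for illustration; whether a given LineSearcher behaves this way is an assumption. HeldOutLineSearch and the devLoss function (mapping C to a held-out loss such as 1 - accuracy) are hypothetical names.

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical sketch: golden-section search over log10(C) against a held-out loss. */
class HeldOutLineSearch {
    private static final double INV_PHI = (Math.sqrt(5.0) - 1.0) / 2.0; // about 0.618

    /** Minimizes a unimodal function on [lo, hi] by golden-section search. */
    static double minimize(DoubleUnaryOperator f, double lo, double hi, double tol) {
        double x1 = hi - INV_PHI * (hi - lo);
        double x2 = lo + INV_PHI * (hi - lo);
        double f1 = f.applyAsDouble(x1);
        double f2 = f.applyAsDouble(x2);
        while (hi - lo > tol) {
            if (f1 < f2) {
                // The minimum lies in [lo, x2]; the old x1 becomes the new x2.
                hi = x2;
                x2 = x1;
                f2 = f1;
                x1 = hi - INV_PHI * (hi - lo);
                f1 = f.applyAsDouble(x1);
            } else {
                // The minimum lies in [x1, hi]; the old x2 becomes the new x1.
                lo = x1;
                x1 = x2;
                f1 = f2;
                x2 = lo + INV_PHI * (hi - lo);
                f2 = f.applyAsDouble(x2);
            }
        }
        return (lo + hi) / 2.0;
    }

    /** Searches log10(C) in [minLogC, maxLogC]; devLoss maps C to a held-out loss (lower is better). */
    static double tuneC(DoubleUnaryOperator devLoss, double minLogC, double maxLogC) {
        double bestLogC = minimize(
            logC -> devLoss.applyAsDouble(Math.pow(10.0, logC)), minLogC, maxLogC, 1e-6);
        return Math.pow(10.0, bestLogC);
    }
}
```

Searching on a logarithmic scale reflects the usual practice of varying C over orders of magnitude; the method assumes the held-out loss has a single minimum in the searched range.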


trainClassifier

@Deprecated
public SVMLightClassifier<L,F> trainClassifier(List<RVFDatum<L,F>> examples)
Deprecated. 

Specified by:
trainClassifier in interface ClassifierFactory<L,F,SVMLightClassifier<L,F>>

getHeldOutPercent

public double getHeldOutPercent()

setHeldOutPercent

public void setHeldOutPercent(double heldOutPercent)

getFolds

public int getFolds()

setFolds

public void setFolds(int folds)

getTuneMinimizer

public LineSearcher getTuneMinimizer()

setTuneMinimizer

public void setTuneMinimizer(LineSearcher minimizer)

getScorer

public Scorer getScorer()

setScorer

public void setScorer(Scorer<L> scorer)

getTuneCV

public boolean getTuneCV()

setTuneCV

public void setTuneCV(boolean tuneCV)

getTuneHeldOut

public boolean getTuneHeldOut()

setTuneHeldOut

public void setTuneHeldOut(boolean tuneHeldOut)

getSvmLightVerbosity

public int getSvmLightVerbosity()

setSvmLightVerbosity

public void setSvmLightVerbosity(int svmLightVerbosity)

trainClassifier

public SVMLightClassifier<L,F> trainClassifier(GeneralDataset<L,F> dataset)
Specified by:
trainClassifier in interface ClassifierFactory<L,F,SVMLightClassifier<L,F>>

trainClassifierBasic

public SVMLightClassifier<L,F> trainClassifierBasic(GeneralDataset<L,F> dataset)


Stanford NLP Group