edu.stanford.nlp.optimization
Class AbstractStochasticCachingDiffFunction

java.lang.Object
  extended by edu.stanford.nlp.optimization.AbstractCachingDiffFunction
      extended by edu.stanford.nlp.optimization.AbstractStochasticCachingDiffFunction
All Implemented Interfaces:
DiffFunction, Function, HasInitial
Direct Known Subclasses:
AbstractStochasticCachingDiffUpdateFunction

public abstract class AbstractStochasticCachingDiffFunction
extends AbstractCachingDiffFunction

Author:
Alex Kleeman

Nested Class Summary
static class AbstractStochasticCachingDiffFunction.SamplingMethod
           
 
Field Summary
protected  List<Integer> allIndices
           
protected  int curElement
           
protected  double finiteDifferenceStepSize
          finiteDifferenceStepSize is the fixed step size for the finite difference approximation.
protected  double[] gradPerturbed
           
 boolean hasNewVals
           
protected  double[] HdotV
           
protected  int[] lastBatch
           
protected  int lastBatchSize
           
protected  int lastElement
           
protected  double[] lastVBatch
           
protected  double[] lastXBatch
           
 StochasticCalculateMethods method
           
protected  Random randGenerator
           
 boolean recalculatePrevBatch
           
 boolean returnPreviousValues
           
 AbstractStochasticCachingDiffFunction.SamplingMethod sampleMethod
           
protected  boolean scaleUp
           
protected  int[] thisBatch
           
protected  double[] xPerturbed
           
 
Fields inherited from class edu.stanford.nlp.optimization.AbstractCachingDiffFunction
derivative, value
 
Constructor Summary
AbstractStochasticCachingDiffFunction()
           
 
Method Summary
abstract  void calculateStochastic(double[] x, double[] v, int[] batch)
          calculateStochastic needs to calculate a stochastic approximation to the derivative and value of a function for a given batch of the data.
protected  void clearCache()
          Clears the cache in a way that doesn't require reallocation :-)
abstract  int dataDimension()
          Data dimension must return the size of the data used by the function.
 void decrementBatch(int batchSize)
          decrementBatch - This decrements the curElement variable by the amount batchSize.
 double[] derivativeAt(double[] x, double[] v, int batchSize)
           
 double[] derivativeAt(double[] x, int batchSize)
           
protected  void getBatch(int batchSize)
          getBatch is used to generate the next sequence of indices to be passed to the actual function.
 double[] HdotVAt(double[] x, double[] v)
           
 double[] HdotVAt(double[] x, double[] v, double[] curDerivative, int batchSize)
           
 double[] HdotVAt(double[] x, double[] v, int batchSize)
          HdotVAt will return the Hessian vector product H.v at the point x for a batchSize subset of the data. There are several ways to perform this calculation; as of now, Finite Difference and Algorithmic Differentiation are the methods that have been used.
 void incrementBatch(int batchSize)
          incrementBatch will shift the curElement variable to mark the next batch.
 void incrementRandom(int numTimes)
           
 double[] initial()
          Returns the initial point in the domain (but not necessarily a feasible one).
 double[] lastDerivative()
           
 double lastValue()
           
 void scaleUp(boolean toScaleUp)
           
 void setValue(double v)
           
 double valueAt(double[] x, double[] v, int batchSize)
          This function will return the stochastic approximation at the point x.
 double valueAt(double[] x, int batchSize)
          valueAt(x, batchSize) and derivativeAt(x, batchSize) invoke the calculateStochastic function to get the current value at x for the next batchSize data points.
 
Methods inherited from class edu.stanford.nlp.optimization.AbstractCachingDiffFunction
calculate, copy, derivativeAt, domainDimension, valueAt
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

hasNewVals

public boolean hasNewVals

recalculatePrevBatch

public boolean recalculatePrevBatch

returnPreviousValues

public boolean returnPreviousValues

lastBatchSize

protected int lastBatchSize

lastBatch

protected int[] lastBatch

thisBatch

protected int[] thisBatch

lastXBatch

protected double[] lastXBatch

lastVBatch

protected double[] lastVBatch

lastElement

protected int lastElement

HdotV

protected double[] HdotV

gradPerturbed

protected double[] gradPerturbed

xPerturbed

protected double[] xPerturbed

curElement

protected int curElement

allIndices

protected List<Integer> allIndices

randGenerator

protected Random randGenerator

scaleUp

protected boolean scaleUp

method

public StochasticCalculateMethods method

sampleMethod

public AbstractStochasticCachingDiffFunction.SamplingMethod sampleMethod

finiteDifferenceStepSize

protected double finiteDifferenceStepSize
finiteDifferenceStepSize is the fixed step size for the finite difference approximation. A few tests were run using the SMD minimizer, and step sizes of 1e-4 to 1e-3 seemed to be ideal. (akleeman)
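For reference (standard finite difference reasoning, not taken from the source), this step size plays the role of h in the one-sided approximation

    H.v ~ (grad f(x + h*v) - grad f(x)) / h

where h must be small enough that the linearization is accurate, yet large enough to avoid floating point round-off, which is consistent with the 1e-4 to 1e-3 range reported above.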

Constructor Detail

AbstractStochasticCachingDiffFunction

public AbstractStochasticCachingDiffFunction()
Method Detail

incrementRandom

public void incrementRandom(int numTimes)

scaleUp

public void scaleUp(boolean toScaleUp)

calculateStochastic

public abstract void calculateStochastic(double[] x,
                                         double[] v,
                                         int[] batch)
calculateStochastic needs to calculate a stochastic approximation to the derivative and value of a function for a given batch of the data. The approximation to the derivative must be stored in the array derivative, the approximation to the value in value, and the approximation to the Hessian vector product H.v in the array HdotV. Note that the Hessian vector product is used primarily with the Stochastic Meta Descent optimization routine SMDMinimizer. Important: the stochastic approximation must be such that the sum of all stochastic calculations over each of the batches equals the full calculation. For example, for a data set of size 100, the sum of the gradients for batches 1-10, 11-20, 21-30, ..., 91-100 must be the same as the gradient for the full calculation (at the very least in expectation). Be sure to take the priors into account.

Parameters:
x - the point at which to evaluate
v - the vector for the Hessian vector product H.v
batch - an array containing the indices of the data to use in the calculation; this array is generated internally by the abstract class, so the implementation only needs to consume it, not generate it.
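As an illustration only (not part of the library), a minimal subclass for a hypothetical least-squares objective might look like the following. The class name, data fields, and the null guard on the derivative array are assumptions; normally the caching superclass allocates derivative before this method is invoked.

import edu.stanford.nlp.optimization.AbstractStochasticCachingDiffFunction;
import java.util.Arrays;

// Hypothetical least-squares objective f(x) = 0.5 * sum_i (a_i . x - b_i)^2.
// Each data point contributes one term, so summing per-batch gradients over a
// full pass reproduces the full-data gradient, as the contract requires.
public class LeastSquaresStochasticFunction extends AbstractStochasticCachingDiffFunction {

  private final double[][] a; // one feature row per data point (illustrative)
  private final double[] b;   // one target per data point (illustrative)

  public LeastSquaresStochasticFunction(double[][] a, double[] b) {
    this.a = a;
    this.b = b;
  }

  @Override
  public int domainDimension() { return a[0].length; }

  @Override
  public int dataDimension() { return a.length; }

  @Override
  public void calculate(double[] x) {
    // Full-data calculation: one stochastic call over every index.
    int[] all = new int[dataDimension()];
    for (int i = 0; i < all.length; i++) { all[i] = i; }
    calculateStochastic(x, null, all);
  }

  @Override
  public void calculateStochastic(double[] x, double[] v, int[] batch) {
    if (derivative == null) { derivative = new double[domainDimension()]; }
    value = 0.0;
    Arrays.fill(derivative, 0.0);
    for (int i : batch) { // use only the indices handed to us
      double residual = -b[i];
      for (int j = 0; j < x.length; j++) {
        residual += a[i][j] * x[j];
      }
      value += 0.5 * residual * residual;
      for (int j = 0; j < x.length; j++) {
        derivative[j] += residual * a[i][j];
      }
    }
    // If v != null, an approximation to H.v for this batch would be stored
    // in HdotV; omitted here for brevity.
  }
}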

dataDimension

public abstract int dataDimension()
Data dimension must return the size of the data used by the function.


clearCache

protected void clearCache()
Clears the cache in a way that doesn't require reallocation :-)

Overrides:
clearCache in class AbstractCachingDiffFunction

initial

public double[] initial()
Description copied from interface: HasInitial
Returns the initial point in the domain (but not necessarily a feasible one).

Specified by:
initial in interface HasInitial
Overrides:
initial in class AbstractCachingDiffFunction
Returns:
a domain point

decrementBatch

public void decrementBatch(int batchSize)
decrementBatch decrements the curElement variable by the amount batchSize. By decrementing the batch and then calling calculate, you can recalculate over the previous batch.
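For example (a hypothetical usage sketch; func, x, and xTrial are illustrative names for a concrete subclass instance and two domain points):

int batchSize = 10;
double before = func.valueAt(x, batchSize);       // consumes the next batch, advancing curElement
func.decrementBatch(batchSize);                   // rewind curElement to the start of that batch
double after = func.valueAt(xTrial, batchSize);   // recomputes over the same indices at a new point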


incrementBatch

public void incrementBatch(int batchSize)
incrementBatch will shift the curElement variable to mark the next batch. It also resets the flags hasNewVals, recalculatePrevBatch, and returnPreviousValues.


getBatch

protected void getBatch(int batchSize)
getBatch is used to generate the next sequence of indices to be passed to the actual function. Depending on the current sample method, this is done by:
Ordered - simply generates the indices 1, 2, 3, 4, ...
RandomWithReplacement - samples uniformly from the set of possible indices
RandomWithoutReplacement - samples from the set of possible indices, removing each used index and restarting after each pass
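A stand-alone sketch of what the three modes amount to (not the library's actual code; the real getBatch operates on the class's internal fields, while this version takes them as parameters and the mode as a string for readability):

import java.util.Collections;
import java.util.List;
import java.util.Random;

static int[] sampleBatch(String mode, int batchSize, int dataDim,
                         int curElement, List<Integer> allIndices, Random rng) {
  int[] batch = new int[batchSize];
  if (mode.equals("Ordered")) {
    for (int i = 0; i < batchSize; i++) {
      batch[i] = (curElement + i) % dataDim;   // 1, 2, 3, ... wrapping around
    }
  } else if (mode.equals("RandomWithReplacement")) {
    for (int i = 0; i < batchSize; i++) {
      batch[i] = rng.nextInt(dataDim);         // independent uniform draws
    }
  } else { // RandomWithoutReplacement
    if (curElement + batchSize > allIndices.size()) {
      Collections.shuffle(allIndices, rng);    // restart after a full pass
      curElement = 0;
    }
    for (int i = 0; i < batchSize; i++) {
      batch[i] = allIndices.get(curElement + i);
    }
  }
  return batch;
}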


valueAt

public double valueAt(double[] x,
                      int batchSize)
valueAt(x, batchSize) and derivativeAt(x, batchSize) invoke the calculateStochastic function to get the current value at x for the next batchSize data points. This will not return an H.v product, since it passes v = null.
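A hypothetical full pass over the data in batches (func is any concrete subclass and x a point in its domain, both illustrative):

int batchSize = 20;
double total = 0.0;
for (int b = 0; b < func.dataDimension() / batchSize; b++) {
  total += func.valueAt(x, batchSize);  // each call consumes the next batch
}
// By the contract of calculateStochastic, total should match the full-data
// value, at least in expectation under random sampling.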


derivativeAt

public double[] derivativeAt(double[] x,
                             int batchSize)

valueAt

public double valueAt(double[] x,
                      double[] v,
                      int batchSize)
This function will return the stochastic approximation at the point x. The vector v is the vector to be used in the Hessian vector product H.v. Passing v = null will simply revert to a calculation without the Hessian vector product.


derivativeAt

public double[] derivativeAt(double[] x,
                             double[] v,
                             int batchSize)

HdotVAt

public double[] HdotVAt(double[] x,
                        double[] v,
                        int batchSize)
HdotVAt will return the Hessian vector product H.v at the point x for a batchSize subset of the data. There are several ways to perform this calculation; as of now, Finite Difference and Algorithmic Differentiation are the methods that have been used. To use this function, calculateStochastic must also fill the array HdotV with the Hessian vector product. Alternative: use the function getHdotVFiniteDifference, which will simply make two calls to the function and come up with an approximation to this value.
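As a sketch of the finite difference idea (illustrative names; the library's getHdotVFiniteDifference may differ in detail):

import java.util.function.UnaryOperator;

// Finite difference approximation to H.v from two gradient evaluations:
//   H.v ~ (grad(x + h*v) - grad(x)) / h
// Note: grad must return a fresh array each call; a caching function reuses
// its internal derivative array, so the first result is cloned defensively.
static double[] hDotVFiniteDifference(double[] x, double[] v, double h,
                                      UnaryOperator<double[]> grad) {
  double[] g0 = grad.apply(x).clone();   // gradient at x (defensive copy)
  double[] xPerturbed = new double[x.length];
  for (int i = 0; i < x.length; i++) {
    xPerturbed[i] = x[i] + h * v[i];     // perturb along v
  }
  double[] g1 = grad.apply(xPerturbed);
  double[] hv = new double[x.length];
  for (int i = 0; i < x.length; i++) {
    hv[i] = (g1[i] - g0[i]) / h;
  }
  return hv;
}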


HdotVAt

public double[] HdotVAt(double[] x,
                        double[] v,
                        double[] curDerivative,
                        int batchSize)

HdotVAt

public double[] HdotVAt(double[] x,
                        double[] v)

lastDerivative

public double[] lastDerivative()

lastValue

public double lastValue()
Overrides:
lastValue in class AbstractCachingDiffFunction

setValue

public void setValue(double v)
Overrides:
setValue in class AbstractCachingDiffFunction

