edu.stanford.nlp.ie.crf
Class CRFFeatureExporter<IN extends CoreMap>

java.lang.Object
  extended by edu.stanford.nlp.ie.crf.CRFFeatureExporter<IN>

public class CRFFeatureExporter<IN extends CoreMap>
extends Object

Exports CRF features for use with other programs.
- Usage: CRFFeatureExporter -prop -trainFile -exportFeatures
- The output file is automatically gzipped or bzip2-compressed if its name ends in .gz or .bz2.
- bzip2 compression requires that bzip2 is available via the command line.
- Features are currently exported in a format that can be read by a modified crfsgd (crfsgd assumes the features are gzipped).
TODO: Support other formats (like crfsuite).
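The description above says the compression of the export file is chosen from its extension. The following is a minimal sketch of that idea, not the exporter's actual code: it assumes java.util.zip for .gz and an external bzip2 process (found on the PATH) for .bz2. The class ExportStreams and the method openExportWriter are hypothetical names.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: picks a compression scheme from the file extension.
public class ExportStreams {

    public static Writer openExportWriter(String exportFile) throws IOException {
        OutputStream raw;
        if (exportFile.endsWith(".gz")) {
            // gzip is handled in-process by java.util.zip.
            raw = new GZIPOutputStream(new FileOutputStream(exportFile));
        } else if (exportFile.endsWith(".bz2")) {
            // Assumption: a bzip2 executable is available on the PATH.
            // Its stdout goes straight to the export file; we feed its stdin.
            ProcessBuilder pb = new ProcessBuilder("bzip2", "-c");
            pb.redirectOutput(new File(exportFile));
            pb.redirectError(ProcessBuilder.Redirect.INHERIT);
            Process p = pb.start();
            raw = new FilterOutputStream(p.getOutputStream()) {
                @Override
                public void write(byte[] b, int off, int len) throws IOException {
                    this.out.write(b, off, len); // avoid byte-by-byte writes
                }

                @Override
                public void close() throws IOException {
                    super.close(); // closing stdin tells bzip2 the input is complete
                    try {
                        // Wait so the .bz2 file is fully written when close() returns.
                        if (p.waitFor() != 0) {
                            throw new IOException("bzip2 exited with an error");
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        throw new IOException("Interrupted while waiting for bzip2", e);
                    }
                }
            };
        } else {
            // Any other extension: plain, uncompressed text.
            raw = new FileOutputStream(exportFile);
        }
        return new BufferedWriter(new OutputStreamWriter(raw, StandardCharsets.UTF_8));
    }
}
```

Waiting for the bzip2 process in close() is a design choice of this sketch: without it, a caller could read the .bz2 file before the external process has finished writing it.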

Author:
Angel Chang

Constructor Summary
CRFFeatureExporter(CRFClassifier<IN> classifier)
           
 
Method Summary
static void main(String[] args)
           
 void printFeatures(String exportFile, Collection<List<IN>> documents)
          Outputs features from a collection of documents to a file. The format has one line per token, as follows: word label feat1 feat2 ...
 void printFeatures(String exportFile, int[][][][] docsData, int[][] labels)
          Outputs features that have already been extracted (using documentToDataAndLabels) in a format suitable for CRFSuite. The format has one line per token, as follows: label feat1 feat2 ...
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

CRFFeatureExporter

public CRFFeatureExporter(CRFClassifier<IN> classifier)
Method Detail

printFeatures

public void printFeatures(String exportFile,
                          int[][][][] docsData,
                          int[][] labels)
Outputs features that have already been extracted (using documentToDataAndLabels) in a format suitable for CRFSuite. The format has one line per token, as follows: label feat1 feat2 ... (where each space is actually a tab). Documents are separated by an empty line.

Parameters:
exportFile - file to export the features to
docsData - feature data for each document, as produced by documentToDataAndLabels
labels - correct labels indexed by document, and position within document
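The line format described for this overload can be sketched as follows. This is an illustration under stated assumptions, not the exporter's implementation: docsData is assumed to be indexed as [document][position][clique][feature index], and the hypothetical labelName and featureName functions stand in for whatever index-to-string mapping the real exporter uses.

```java
import java.util.StringJoiner;
import java.util.function.IntFunction;

// Hypothetical sketch of the "label feat1 feat2 ..." export format.
public class IndexedFeatureFormat {

    // Assumes docsData[doc][position][clique][k] holds feature indices and
    // labels[doc][position] holds the gold label index for that token.
    public static String format(int[][][][] docsData, int[][] labels,
                                IntFunction<String> labelName,
                                IntFunction<String> featureName) {
        StringBuilder sb = new StringBuilder();
        for (int d = 0; d < docsData.length; d++) {
            int[][][] doc = docsData[d];
            for (int pos = 0; pos < doc.length; pos++) {
                // One tab-separated line per token, label first.
                StringJoiner line = new StringJoiner("\t");
                line.add(labelName.apply(labels[d][pos]));
                // Flatten the features of every clique at this position.
                for (int[] cliqueFeatures : doc[pos]) {
                    for (int f : cliqueFeatures) {
                        line.add(featureName.apply(f));
                    }
                }
                sb.append(line).append('\n');
            }
            sb.append('\n'); // empty line marks the end of a document
        }
        return sb.toString();
    }
}
```

For example, a single document with two tokens whose positions carry the cliques {0, 1}, {2} and {1} yields two lines, "O\tw=a\tw=b\tprev=a" and "PER\tw=b", followed by an empty line (given label names O and PER and feature names w=a, w=b, prev=a). This sketch writes the separating empty line after every document, including the last.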

printFeatures

public void printFeatures(String exportFile,
                          Collection<List<IN>> documents)
Outputs features from a collection of documents to a file. The format has one line per token, as follows: word label feat1 feat2 ... (where each space is actually a tab). Documents are separated by an empty line. This format is suitable for a modified crfsgd.

Parameters:
exportFile - file to export the features to
documents - input collection of documents
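The "word label feat1 feat2 ..." layout for this overload can be sketched in the same way. To keep the example self-contained, the hypothetical Token class replaces the CoreMap tokens (IN) the real method takes, and the hypothetical featuresAt function replaces the classifier's own feature extraction; neither is part of the exporter's API.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical sketch of the crfsgd-style "word label feat1 feat2 ..." format.
public class WordLabelFeatureFormat {

    // Hypothetical stand-in for a CoreMap token: just a word and its gold label.
    public static final class Token {
        public final String word;
        public final String label;

        public Token(String word, String label) {
            this.word = word;
            this.label = label;
        }
    }

    // featuresAt(doc, i) is assumed to return the feature strings for token i.
    public static String format(Collection<List<Token>> documents,
                                BiFunction<List<Token>, Integer, List<String>> featuresAt) {
        StringBuilder sb = new StringBuilder();
        for (List<Token> doc : documents) {
            for (int i = 0; i < doc.size(); i++) {
                Token tok = doc.get(i);
                // Columns: word, gold label, then every feature, tab-separated.
                List<String> fields = new ArrayList<>();
                fields.add(tok.word);
                fields.add(tok.label);
                fields.addAll(featuresAt.apply(doc, i));
                sb.append(String.join("\t", fields)).append('\n');
            }
            sb.append('\n'); // empty line marks the end of a document
        }
        return sb.toString();
    }
}
```

Passing the whole document to featuresAt, rather than a single token, lets an extractor look at neighbouring words, as CRF features commonly do.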

main

public static void main(String[] args)
                 throws Exception
Throws:
Exception


Stanford NLP Group