edu.stanford.nlp.international.process
Class TreebankPreprocessor

java.lang.Object
  extended by edu.stanford.nlp.international.process.TreebankPreprocessor

public final class TreebankPreprocessor
extends Object

A data preparation pipeline for treebanks

A simple framework for preparing various kinds of treebank data. The original goal was to prepare Penn Arabic Treebank (PATB) trees for parsing. The pipeline grew out of the need to prepare different data sets in a uniform manner for experiments that require multiple tools. The design objectives are:

These objectives are realized through three features:

The process for preparing an arbitrary data set X is as follows:

  1. Add parameters to ConfigParser as necessary
  2. Implement the Dataset interface for the new data set (or use one of the existing classes)
  3. Implement Mapper classes as needed
  4. Specify the data set parameters in a plain text file
  5. Run TreebankPreprocessor using the plain text file as the argument
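
The steps above can be illustrated with a small, self-contained sketch. It does not use the actual Stanford classes: the nested Dataset and Mapper interfaces below are simplified, hypothetical stand-ins whose method names are assumptions, and the KEY=VALUE parameter layout and the LOWERCASE option are likewise assumed for illustration only. The point is the shape of the workflow (parse parameters, configure a data set, apply a mapper), not the real signatures.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: these types are NOT the real Stanford interfaces.
final class PreprocessSketch {

  /** Stand-in for step 3: rewrites a single token. */
  interface Mapper {
    String map(String token);
  }

  /** Stand-in for step 2: a data set that is configured, then built. */
  interface Dataset {
    void setOptions(Map<String, String> options);
    List<String> build(List<String> rawLines);
  }

  /** Stand-in for step 4: reads assumed KEY=VALUE lines, skipping blanks and '#' comments. */
  static Map<String, String> parseParams(List<String> lines) {
    Map<String, String> params = new LinkedHashMap<>();
    for (String line : lines) {
      String trimmed = line.trim();
      if (trimmed.isEmpty() || trimmed.startsWith("#")) {
        continue;
      }
      int eq = trimmed.indexOf('=');
      if (eq < 0) {
        throw new IllegalArgumentException("Expected KEY=VALUE: " + line);
      }
      params.put(trimmed.substring(0, eq).trim(), trimmed.substring(eq + 1).trim());
    }
    return params;
  }

  /** Toy data set that applies its Mapper to every whitespace-separated token. */
  static final class ToyDataset implements Dataset {
    private Mapper mapper = token -> token; // identity by default

    @Override
    public void setOptions(Map<String, String> options) {
      // Hypothetical option name: LOWERCASE=true selects a lowercasing mapper.
      if ("true".equals(options.get("LOWERCASE"))) {
        mapper = token -> token.toLowerCase(Locale.ROOT);
      }
    }

    @Override
    public List<String> build(List<String> rawLines) {
      List<String> out = new ArrayList<>();
      for (String line : rawLines) {
        StringBuilder sb = new StringBuilder();
        for (String token : line.trim().split("\\s+")) {
          if (sb.length() > 0) {
            sb.append(' ');
          }
          sb.append(mapper.map(token));
        }
        out.add(sb.toString());
      }
      return out;
    }
  }

  /** Stand-in for step 5: run the toy pipeline on in-memory input. */
  public static void main(String[] args) {
    Map<String, String> params =
        parseParams(Arrays.asList("# toy parameters", "NAME=Sample", "LOWERCASE=true"));
    Dataset dataset = new ToyDataset();
    dataset.setOptions(params);
    System.out.println(dataset.build(Arrays.asList("The  Cat sat")));
  }
}
```

In the real pipeline the parameters would come from the plain text file passed to TreebankPreprocessor, and the data set and mapper classes would implement the library's own interfaces rather than these stand-ins.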

Author:
Spence Green

Field Summary
static Map<String,Integer> optionArgDefs
           
 
Method Summary
static void main(String[] args)
          Execute with no arguments for usage.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

optionArgDefs

public static final Map<String,Integer> optionArgDefs

Method Detail

main

public static void main(String[] args)
Execute with no arguments for usage.



Stanford NLP Group