edu.stanford.nlp.process
Class PTBEscapingProcessor<IN extends HasWord,L,F>

java.lang.Object
  extended by edu.stanford.nlp.process.AbstractListProcessor<IN,HasWord,L,F>
      extended by edu.stanford.nlp.process.PTBEscapingProcessor<IN,L,F>
Type Parameters:
IN - The type of the input tokens, which must extend HasWord
L - The type of the labels
F - The type of the features
All Implemented Interfaces:
DocumentProcessor<IN,HasWord,L,F>, ListProcessor<IN,HasWord>, Function<List<IN>,List<HasWord>>

public class PTBEscapingProcessor<IN extends HasWord,L,F>
extends AbstractListProcessor<IN,HasWord,L,F>
implements Function<List<IN>,List<HasWord>>

Produces a new Document of Words in which characters that are special in the Penn Treebank (PTB) format have been properly escaped.
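
To make "properly escaped" concrete, the following is a minimal stand-alone sketch of Penn Treebank style bracket escaping over plain strings. It is not this class's implementation: the class name PtbEscapeSketch and the methods escapeToken and escapeAll are hypothetical, and only the six conventional bracket tokens are covered.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone sketch; not part of edu.stanford.nlp.
class PtbEscapeSketch {
    // Conventional Penn Treebank replacements for bracket tokens.
    static final Map<String, String> BRACKETS = new LinkedHashMap<>();
    static {
        BRACKETS.put("(", "-LRB-");
        BRACKETS.put(")", "-RRB-");
        BRACKETS.put("[", "-LSB-");
        BRACKETS.put("]", "-RSB-");
        BRACKETS.put("{", "-LCB-");
        BRACKETS.put("}", "-RCB-");
    }

    // Replace a token only when the whole token is a bracket.
    static String escapeToken(String token) {
        return BRACKETS.getOrDefault(token, token);
    }

    // Escape every token, returning a new list and leaving the input untouched.
    static List<String> escapeAll(List<String> tokens) {
        List<String> out = new ArrayList<>(tokens.size());
        for (String t : tokens) {
            out.add(escapeToken(t));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(escapeAll(Arrays.asList("He", "said", "(", "quietly", ")", ".")));
    }
}
```

Matching on whole tokens keeps ordinary words that merely contain a bracket character unchanged in this sketch; the real class may treat such cases differently.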

Author:
Teg Grenager (grenager@stanford.edu), Sarah Spikes (sdspikes@cs.stanford.edu) (Templatization)

Field Summary
protected static char[] defaultOldChars
           
protected  boolean fixQuotes
           
protected static String[] newStrings
           
protected  char[] oldChars
           
protected static String[] oldStrings
           
protected  Map<String,String> stringSubs
           
 
Constructor Summary
PTBEscapingProcessor()
           
PTBEscapingProcessor(Map<String,String> stringSubs, char[] oldChars, boolean fixQuotes)
           
 
Method Summary
 List<HasWord> apply(List<IN> hasWordsList)
          Escape a List of HasWords.
static void main(String[] args)
          This will do the escaping on an input file.
protected static Map<String,String> makeStringMap()
           
 List<HasWord> process(List<? extends IN> input)
          Take a List (including a Sentence) of input, and return a List that has been processed in some way.
static String unprocess(String s)
           
 
Methods inherited from class edu.stanford.nlp.process.AbstractListProcessor
processDocument, processLists
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

stringSubs

protected Map<String,String> stringSubs

oldChars

protected char[] oldChars

oldStrings

protected static final String[] oldStrings

newStrings

protected static final String[] newStrings

defaultOldChars

protected static final char[] defaultOldChars

fixQuotes

protected boolean fixQuotes

Constructor Detail

PTBEscapingProcessor

public PTBEscapingProcessor()

PTBEscapingProcessor

public PTBEscapingProcessor(Map<String,String> stringSubs,
                            char[] oldChars,
                            boolean fixQuotes)
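
The three constructor parameters are not described on this page. One plausible reading, offered purely as an assumption, is that stringSubs maps whole tokens to replacements, oldChars lists characters to be prefixed with a backslash, and fixQuotes turns straight double-quote tokens into alternating PTB-style opening and closing quotes. The sketch below illustrates that reading on plain strings; EscapeConfigSketch and its members are hypothetical and do not reproduce the library's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the three constructor parameters.
class EscapeConfigSketch {
    final Map<String, String> stringSubs; // whole-token substitutions (assumed meaning)
    final char[] oldChars;                // characters to backslash-escape (assumed meaning)
    final boolean fixQuotes;              // convert " tokens to `` and '' (assumed meaning)

    EscapeConfigSketch(Map<String, String> stringSubs, char[] oldChars, boolean fixQuotes) {
        this.stringSubs = stringSubs;
        this.oldChars = oldChars;
        this.fixQuotes = fixQuotes;
    }

    // Prefix each occurrence of a listed character with a backslash.
    String escapeChars(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            for (char old : oldChars) {
                if (c == old) {
                    sb.append('\\');
                    break;
                }
            }
            sb.append(c);
        }
        return sb.toString();
    }

    List<String> process(List<String> tokens) {
        List<String> out = new ArrayList<>();
        boolean open = true; // the next straight quote is treated as an opening quote
        for (String t : tokens) {
            if (fixQuotes && t.equals("\"")) {
                out.add(open ? "``" : "''");
                open = !open;
            } else if (stringSubs.containsKey(t)) {
                out.add(stringSubs.get(t));
            } else {
                out.add(escapeChars(t));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        EscapeConfigSketch p = new EscapeConfigSketch(
                Collections.singletonMap("(", "-LRB-"), new char[] {'/'}, true);
        System.out.println(p.process(Arrays.asList("\"", "a/b", "(", "\"")));
    }
}
```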

Method Detail

makeStringMap

protected static Map<String,String> makeStringMap()

apply

public List<HasWord> apply(List<IN> hasWordsList)
Escape a List of HasWords. Implements the Function<List<IN>, List<HasWord>> interface.

Specified by:
apply in interface Function<List<IN extends HasWord>,List<HasWord>>
Parameters:
hasWordsList - The function's argument
Returns:
The function's evaluated value

unprocess

public static String unprocess(String s)
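
No description is given for unprocess; its name and signature suggest that it reverses the escaping on a single string. Under that assumption, the reverse mapping for the conventional PTB bracket tokens could be sketched as follows. PtbUnescapeSketch and unescape are hypothetical names, and the library method may handle more cases than shown.

```java
// Hypothetical stand-alone sketch; not the library's unprocess method.
class PtbUnescapeSketch {
    static final String[] ESCAPED = {"-LRB-", "-RRB-", "-LSB-", "-RSB-", "-LCB-", "-RCB-"};
    static final String[] PLAIN = {"(", ")", "[", "]", "{", "}"};

    // Replace every escaped bracket token in s with its plain character.
    static String unescape(String s) {
        for (int i = 0; i < ESCAPED.length; i++) {
            // String.replace is a literal replacement, so no regex quoting is needed.
            s = s.replace(ESCAPED[i], PLAIN[i]);
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(unescape("-LRB- see -LSB- 1 -RSB- -RRB-"));
    }
}
```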

process

public List<HasWord> process(List<? extends IN> input)
Description copied from interface: ListProcessor
Take a List (including a Sentence) of input, and return a List that has been processed in some way.

Specified by:
process in interface ListProcessor<IN extends HasWord,HasWord>
Parameters:
input - must be a List of objects of type HasWord
Returns:
a List of HasWord objects whose words have been escaped

main

public static void main(String[] args)
Escapes the contents of an input file. The input file must already be tokenized, with tokens separated by whitespace.
Usage: java edu.stanford.nlp.process.PTBEscapingProcessor fileOrUrl

Parameters:
args - Command line argument: a file or URL
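
The requirement that input be pre-tokenized on whitespace can be illustrated with a small stand-alone sketch that reads lines, splits them on whitespace, escapes each token, and rejoins them with single spaces. This is an assumption about the general shape of such a tool, not the behavior of main itself; EscapeFileSketch, escapeToken, and escapeLines are hypothetical names, and only round brackets are escaped to keep the example short.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch; not the library's main method.
class EscapeFileSketch {
    // Escape only round brackets, for brevity.
    static String escapeToken(String t) {
        switch (t) {
            case "(": return "-LRB-";
            case ")": return "-RRB-";
            default: return t;
        }
    }

    // Read whitespace-tokenized lines and return one escaped line per input line.
    static List<String> escapeLines(Reader in) {
        List<String> out = new ArrayList<>();
        try {
            BufferedReader br = new BufferedReader(in);
            String line;
            while ((line = br.readLine()) != null) {
                String trimmed = line.trim();
                if (trimmed.isEmpty()) {
                    out.add(""); // keep blank lines as they are
                    continue;
                }
                String[] tokens = trimmed.split("\\s+");
                StringBuilder sb = new StringBuilder();
                for (int i = 0; i < tokens.length; i++) {
                    if (i > 0) {
                        sb.append(' ');
                    }
                    sb.append(escapeToken(tokens[i]));
                }
                out.add(sb.toString());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // With a file argument, escape that file; otherwise run a small demo.
        try (Reader in = args.length > 0
                ? Files.newBufferedReader(Paths.get(args[0]))
                : new StringReader("a ( b )\nc")) {
            for (String line : escapeLines(in)) {
                System.out.println(line);
            }
        }
    }
}
```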


Stanford NLP Group