edu.stanford.nlp.sequences
Class PlainTextDocumentReaderAndWriter<IN extends CoreMap>

java.lang.Object
  extended by edu.stanford.nlp.sequences.PlainTextDocumentReaderAndWriter<IN>
All Implemented Interfaces:
IteratorFromReaderFactory<List<IN>>, DocumentReaderAndWriter<IN>, Serializable

public class PlainTextDocumentReaderAndWriter<IN extends CoreMap>
extends Object
implements DocumentReaderAndWriter<IN>

This class provides methods for reading plain text documents and for writing out those documents, once classified, in several different formats.

Implementation note: see itest/src/edu/stanford/nlp/ie/crf/CRFClassifierITest.java for examples and test cases covering the output options. The class is generic over any type that extends CoreMap; the default is CoreLabel.

Author:
Jenny Finkel, Christopher Manning (new output options organization), Sonal Gupta (made the class generic)
See Also:
Serialized Form

Nested Class Summary
static class PlainTextDocumentReaderAndWriter.OutputStyle
           
 
Constructor Summary
PlainTextDocumentReaderAndWriter()
          Construct a PlainTextDocumentReaderAndWriter.
 
Method Summary
 String getAnswers(List<IN> l, PlainTextDocumentReaderAndWriter.OutputStyle outputStyle, boolean preserveSpacing)
           
 Iterator<List<IN>> getIterator(Reader r)
           
 void init(SeqClassifierFlags flags)
          Will be called immediately after construction.
 void init(SeqClassifierFlags flags, TokenizerFactory<IN> tokenizerFactory)
           
 void init(SeqClassifierFlags flags, TokenizerFactory<IN> tokenizerFactory, CoreTokenFactory<IN> tokenFactory)
           
 void printAnswers(List<IN> list, PrintWriter out)
          Print the classifications for the document to the given Writer.
 void printAnswers(List<IN> l, PrintWriter out, PlainTextDocumentReaderAndWriter.OutputStyle outputStyle, boolean preserveSpacing)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

PlainTextDocumentReaderAndWriter

public PlainTextDocumentReaderAndWriter()
Construct a PlainTextDocumentReaderAndWriter. You should call init() after using the constructor.

Method Detail

init

public void init(SeqClassifierFlags flags)
Description copied from interface: DocumentReaderAndWriter
Will be called immediately after construction. Having an init() method is easier because DocumentReaderAndWriter objects are usually created using reflection, so configuration cannot conveniently be passed to a constructor.

Specified by:
init in interface DocumentReaderAndWriter<IN extends CoreMap>
Parameters:
flags - Flags specifying behavior
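
The construct-then-init pattern described above can be sketched with standard-library reflection alone. This is a minimal illustration, not the library's own loading code: Flags, Configurable, EchoReader, and ReflectiveInitSketch are hypothetical stand-ins for SeqClassifierFlags, DocumentReaderAndWriter, and a concrete implementation.

```java
// Hypothetical stand-in for a flags object such as SeqClassifierFlags.
final class Flags {
  final String outputFormat;

  Flags(String outputFormat) {
    this.outputFormat = outputFormat;
  }
}

// Hypothetical stand-in for an interface that exposes an init() hook.
interface Configurable {
  void init(Flags flags);
}

// Has a public no-argument constructor, so reflection can instantiate it.
class EchoReader implements Configurable {
  private String outputFormat = "unset";

  public EchoReader() { }

  @Override
  public void init(Flags flags) {
    this.outputFormat = flags.outputFormat;
  }

  String outputFormat() {
    return outputFormat;
  }
}

final class ReflectiveInitSketch {
  // Instantiate the named class reflectively, then configure it via init().
  static Configurable create(String className, Flags flags) {
    try {
      Configurable instance =
          (Configurable) Class.forName(className).getDeclaredConstructor().newInstance();
      instance.init(flags);
      return instance;
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Cannot create " + className, e);
    }
  }

  public static void main(String[] args) {
    Configurable c = create(EchoReader.class.getName(), new Flags("slashTags"));
    System.out.println(((EchoReader) c).outputFormat()); // prints "slashTags"
  }
}
```

Because the object is built from a class name, the caller only needs a no-argument constructor to exist; all configuration arrives through init().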

init

public void init(SeqClassifierFlags flags,
                 TokenizerFactory<IN> tokenizerFactory)

init

public void init(SeqClassifierFlags flags,
                 TokenizerFactory<IN> tokenizerFactory,
                 CoreTokenFactory<IN> tokenFactory)

getIterator

public Iterator<List<IN>> getIterator(Reader r)
Specified by:
getIterator in interface IteratorFromReaderFactory<List<IN extends CoreMap>>

printAnswers

public void printAnswers(List<IN> list,
                         PrintWriter out)
Print the classifications for the document to the given Writer. This method checks the outputFormat property and can print in slashTags, inlineXML, or xml (stand-off XML) format. For both XML output formats it preserves the original spacing; for the slashTags format it prints the tokenized text, since preserveSpacing output is somewhat dysfunctional with the slashTags format.

Specified by:
printAnswers in interface DocumentReaderAndWriter<IN extends CoreMap>
Parameters:
list - List of tokens with classifier answers
out - Where to print the output to
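
To make the difference between two of the output styles concrete, here is a minimal standalone sketch. It does not call this class: Token, BACKGROUND, and both formatting methods are hypothetical. It assumes the conventional word/LABEL shape for slashTags, inline element tags around runs of same-labeled tokens for inlineXML, and "O" as the background label. It joins tokens with single spaces rather than preserving the original spacing, and it does no XML escaping, so the class's exact output may differ.

```java
import java.util.Arrays;
import java.util.List;

final class OutputStyleSketch {
  // Hypothetical token: a word plus the classifier's answer label.
  static final class Token {
    final String word;
    final String answer;

    Token(String word, String answer) {
      this.word = word;
      this.answer = answer;
    }
  }

  // Assumed label for tokens that belong to no entity.
  static final String BACKGROUND = "O";

  // Every token is printed as word/LABEL, separated by single spaces.
  static String slashTags(List<Token> tokens) {
    StringBuilder sb = new StringBuilder();
    for (Token t : tokens) {
      if (sb.length() > 0) {
        sb.append(' ');
      }
      sb.append(t.word).append('/').append(t.answer);
    }
    return sb.toString();
  }

  // Runs of tokens sharing a non-background label are wrapped in one element.
  static String inlineXml(List<Token> tokens) {
    StringBuilder sb = new StringBuilder();
    String open = BACKGROUND; // label of the currently open element, if any
    for (Token t : tokens) {
      if (!t.answer.equals(open)) {
        if (!open.equals(BACKGROUND)) {
          sb.append("</").append(open).append('>');
        }
        if (sb.length() > 0) {
          sb.append(' ');
        }
        if (!t.answer.equals(BACKGROUND)) {
          sb.append('<').append(t.answer).append('>');
        }
        open = t.answer;
      } else if (sb.length() > 0) {
        sb.append(' ');
      }
      sb.append(t.word);
    }
    if (!open.equals(BACKGROUND)) {
      sb.append("</").append(open).append('>');
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    List<Token> doc = Arrays.asList(
        new Token("Barack", "PERSON"), new Token("Obama", "PERSON"),
        new Token("visited", "O"), new Token("Paris", "LOCATION"));
    System.out.println(slashTags(doc));
    // Barack/PERSON Obama/PERSON visited/O Paris/LOCATION
    System.out.println(inlineXml(doc));
    // <PERSON>Barack Obama</PERSON> visited <LOCATION>Paris</LOCATION>
  }
}
```

The slashTags form labels every token individually, which is why it pairs naturally with tokenized output, while the inline form only marks entity spans and leaves the surrounding text readable.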

getAnswers

public String getAnswers(List<IN> l,
                         PlainTextDocumentReaderAndWriter.OutputStyle outputStyle,
                         boolean preserveSpacing)

printAnswers

public void printAnswers(List<IN> l,
                         PrintWriter out,
                         PlainTextDocumentReaderAndWriter.OutputStyle outputStyle,
                         boolean preserveSpacing)


Stanford NLP Group