edu.stanford.nlp.sequences
Class PlainTextDocumentReaderAndWriter

java.lang.Object
  extended by edu.stanford.nlp.sequences.PlainTextDocumentReaderAndWriter
All Implemented Interfaces:
IteratorFromReaderFactory<List<CoreLabel>>, DocumentReaderAndWriter, Serializable

public class PlainTextDocumentReaderAndWriter
extends Object
implements DocumentReaderAndWriter

Provides methods for reading plain text documents and, once they have been classified, writing them out in one of several formats.

Author:
Jenny Finkel, Christopher Manning (new output options organization)
See Also:
Serialized Form

Field Summary
static int OUTPUT_STYLE_INLINE_XML
           
static int OUTPUT_STYLE_SLASH_TAGS
           
static int OUTPUT_STYLE_XML
           
 
Constructor Summary
PlainTextDocumentReaderAndWriter()
          Construct a PlainTextDocumentReaderAndWriter.
 
Method Summary
static int asIntOutputFormat(String outputFormat)
           
static String getAnswers(List<CoreLabel> l)
          Deprecated. This has been left in since it is still called in the version of the tagger that we currently distribute, but it will be removed.
 String getAnswers(List<CoreLabel> l, int outputStyle, boolean preserveSpacing)
           
 Iterator<List<CoreLabel>> getIterator(Reader r)
           
 void init(SeqClassifierFlags flags)
          Will be called immediately after construction.
 void printAnswers(List<CoreLabel> list, PrintWriter out)
          Print the classifications for the document to the given Writer.
 void printAnswers(List<CoreLabel> l, PrintWriter out, int outputStyle, boolean preserveSpacing)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

OUTPUT_STYLE_SLASH_TAGS

public static final int OUTPUT_STYLE_SLASH_TAGS
See Also:
Constant Field Values

OUTPUT_STYLE_XML

public static final int OUTPUT_STYLE_XML
See Also:
Constant Field Values

OUTPUT_STYLE_INLINE_XML

public static final int OUTPUT_STYLE_INLINE_XML
See Also:
Constant Field Values

Constructor Detail

PlainTextDocumentReaderAndWriter

public PlainTextDocumentReaderAndWriter()
Construct a PlainTextDocumentReaderAndWriter. Call init() after constructing the object.

Method Detail

init

public void init(SeqClassifierFlags flags)
Description copied from interface: DocumentReaderAndWriter
Will be called immediately after construction. An init() method is used instead of constructor arguments because DocumentReaderAndWriter objects are usually created using reflection.

Specified by:
init in interface DocumentReaderAndWriter
Parameters:
flags - Flags specifying behavior
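
The construct-then-init pattern described above can be sketched in a self-contained form. This is only an illustration: the ReaderWriter interface, the PlainReaderWriter class, and the use of java.util.Properties in place of SeqClassifierFlags are hypothetical stand-ins, and the "slashTags" fallback is an assumption rather than documented library behavior.

```java
import java.util.Properties;

public class ReflectiveInitSketch {
    /** Hypothetical stand-in for DocumentReaderAndWriter; not the library interface. */
    public interface ReaderWriter {
        void init(Properties flags);
        String describe();
    }

    /** Hypothetical implementation with the public no-arg constructor that reflection needs. */
    public static class PlainReaderWriter implements ReaderWriter {
        private String outputFormat = "unset";

        public PlainReaderWriter() { }

        @Override
        public void init(Properties flags) {
            // Assumed default; the real default is not stated on this page.
            outputFormat = flags.getProperty("outputFormat", "slashTags");
        }

        @Override
        public String describe() {
            return "outputFormat=" + outputFormat;
        }
    }

    /** Instantiate by class name, then configure the new object through init(). */
    public static ReaderWriter create(String className, Properties flags)
            throws ReflectiveOperationException {
        Object instance = Class.forName(className).getDeclaredConstructor().newInstance();
        ReaderWriter rw = (ReaderWriter) instance;
        rw.init(flags);
        return rw;
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        Properties flags = new Properties();
        flags.setProperty("outputFormat", "inlineXML");
        ReaderWriter rw = create(PlainReaderWriter.class.getName(), flags);
        System.out.println(rw.describe());
    }
}
```

Because the object is built from a class name alone, configuration cannot be passed to the constructor and has to arrive through a separate call, which is the role init() plays here.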

getIterator

public Iterator<List<CoreLabel>> getIterator(Reader r)
Specified by:
getIterator in interface IteratorFromReaderFactory<List<CoreLabel>>

getAnswers

public static String getAnswers(List<CoreLabel> l)
Deprecated. This has been left in since it is still called in the version of the tagger that we currently distribute, but it will be removed.


asIntOutputFormat

public static int asIntOutputFormat(String outputFormat)

printAnswers

public void printAnswers(List<CoreLabel> list,
                         PrintWriter out)
Print the classifications for the document to the given Writer. This method checks the outputFormat property and can print in the slashTags, inlineXML, or xml (stand-off XML) format. Both XML output formats preserve the original spacing, while the slashTags format prints tokenized text (since preserveSpacing output is somewhat dysfunctional with the slashTags format).

Specified by:
printAnswers in interface DocumentReaderAndWriter
Parameters:
list - List of tokens with classifier answers
out - Where to print the output to
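
To make the difference between tokenized and spacing-preserving output concrete, here is a minimal standalone sketch of two of the styles. It does not use CoreLabel or any library code: the Token class, the "O" background label, and the exact markup are assumptions chosen for illustration, and the stand-off xml style is omitted because its element layout is not described on this page.

```java
import java.util.Arrays;
import java.util.List;

public class OutputStyleSketch {
    /** Hypothetical token: preceding whitespace, the word, and the classifier's label. */
    public static final class Token {
        final String before;
        final String word;
        final String answer;

        public Token(String before, String word, String answer) {
            this.before = before;
            this.word = word;
            this.answer = answer;
        }
    }

    /** Assumed label for tokens that belong to no entity. */
    static final String BACKGROUND = "O";

    /** Tokenized output: word/LABEL pairs joined by single spaces; original spacing is dropped. */
    public static String slashTags(List<Token> tokens) {
        StringBuilder sb = new StringBuilder();
        for (Token t : tokens) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(t.word).append('/').append(t.answer);
        }
        return sb.toString();
    }

    /** Inline XML: wraps each run of same-labelled tokens in a tag and keeps original whitespace.
     *  Words are not XML-escaped in this sketch. */
    public static String inlineXml(List<Token> tokens) {
        StringBuilder sb = new StringBuilder();
        String open = BACKGROUND;
        for (Token t : tokens) {
            boolean changed = !t.answer.equals(open);
            if (changed && !open.equals(BACKGROUND)) {
                sb.append("</").append(open).append('>');
            }
            sb.append(t.before);
            if (changed && !t.answer.equals(BACKGROUND)) {
                sb.append('<').append(t.answer).append('>');
            }
            sb.append(t.word);
            open = t.answer;
        }
        if (!open.equals(BACKGROUND)) {
            sb.append("</").append(open).append('>');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Token> doc = Arrays.asList(
            new Token("", "Jim", "PERSON"),
            new Token("  ", "joined", "O"),
            new Token(" ", "Acme", "ORGANIZATION"),
            new Token(" ", "Corp", "ORGANIZATION"),
            new Token("", ".", "O"));
        System.out.println(slashTags(doc));
        System.out.println(inlineXml(doc));
    }
}
```

For the sample document, the first line printed is `Jim/PERSON joined/O Acme/ORGANIZATION Corp/ORGANIZATION ./O` and the second is `<PERSON>Jim</PERSON>  joined <ORGANIZATION>Acme Corp</ORGANIZATION>.`; note that the double space after the first entity survives only in the inline XML form.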

getAnswers

public String getAnswers(List<CoreLabel> l,
                         int outputStyle,
                         boolean preserveSpacing)

printAnswers

public void printAnswers(List<CoreLabel> l,
                         PrintWriter out,
                         int outputStyle,
                         boolean preserveSpacing)


Stanford NLP Group