edu.stanford.nlp.process
Class WordToTaggedWordProcessor<IN extends HasWord,L,F>

java.lang.Object
  extended by edu.stanford.nlp.process.AbstractListProcessor<IN,HasWord,L,F>
      extended by edu.stanford.nlp.process.WordToTaggedWordProcessor<IN,L,F>
Type Parameters:
IN - The type of the input words, which must implement HasWord
L - The type of the labels
F - The type of the features
All Implemented Interfaces:
DocumentProcessor<IN,HasWord,L,F>, ListProcessor<IN,HasWord>

public class WordToTaggedWordProcessor<IN extends HasWord,L,F>
extends AbstractListProcessor<IN,HasWord,L,F>

Transforms a Document of Words into a Document consisting wholly or partly of TaggedWords by splitting each word on a tag divider character.

Author:
Teg Grenager (grenager@stanford.edu), Christopher Manning, Sarah Spikes (sdspikes@cs.stanford.edu) (Templatization)

Field Summary
protected  char splitChar
          The char that we will split on.
 
Constructor Summary
WordToTaggedWordProcessor()
          Create a WordToTaggedWordProcessor using the default forward slash character to split on.
WordToTaggedWordProcessor(char splitChar)
          Create a WordToTaggedWordProcessor that splits tags off at the given character.
 
Method Summary
static void main(java.lang.String[] args)
          This will print out some text, recognizing tags.
 java.util.List<HasWord> process(java.util.List<? extends IN> words)
          Returns a new Document where each Word with a tag has been converted to a TaggedWord.
 
Methods inherited from class edu.stanford.nlp.process.AbstractListProcessor
processDocument, processLists
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

splitChar

protected char splitChar
The char that we will split on.

Constructor Detail

WordToTaggedWordProcessor

public WordToTaggedWordProcessor()
Create a WordToTaggedWordProcessor using the default forward slash character to split on.


WordToTaggedWordProcessor

public WordToTaggedWordProcessor(char splitChar)
Create a WordToTaggedWordProcessor that splits tags off at the given character. A splitChar of 0 means that a tag is never split off.

Parameters:
splitChar - The character at which to split
Method Detail

process

public java.util.List<HasWord> process(java.util.List<? extends IN> words)
Returns a new Document in which each Word carrying a tag has been converted to a TaggedWord. Items in the input that don't implement HasWord are dropped from the output. Items that do are checked for the form word + splitChar + tag. Those that match are split and inserted as TaggedWords; the rest are added to the document with their current type. More precisely, each item is split on the last instance of splitChar with index above 0. This gives the correct split, regardless of escaping, provided that tags don't contain the splitChar, and it never produces an empty or null word: you can think of the first character as always being escaped.

Parameters:
words - The input Document (should be of HasWords)
Returns:
A new Document in which some of the items may be TaggedWords
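
The split rule described above can be illustrated with a small standalone sketch. It is not the library's implementation: the class TagSplitSketch and its split method are hypothetical names, and plain Strings stand in for Word and TaggedWord so that the example runs without the Stanford NLP jar. It assumes only what the description states: split on the last splitChar whose index is above 0, and never split when splitChar is 0.

```java
import java.util.Arrays;

public class TagSplitSketch {

    /**
     * Hypothetical helper: returns {word, tag} when the token splits,
     * or a one-element array holding the unchanged token otherwise.
     */
    public static String[] split(String token, char splitChar) {
        // A splitChar of 0 means "never split off a tag".
        if (splitChar == 0) {
            return new String[] {token};
        }
        // Use the LAST occurrence, so earlier splitChars stay in the word.
        int idx = token.lastIndexOf(splitChar);
        // idx == -1: no splitChar at all; idx == 0: the word would be empty.
        if (idx <= 0) {
            return new String[] {token};
        }
        return new String[] {token.substring(0, idx), token.substring(idx + 1)};
    }

    public static void main(String[] args) {
        String[] tokens = {"dog/NN", "1/2/CD", "/SYM", "plain"};
        for (String t : tokens) {
            System.out.println(t + " -> " + Arrays.toString(split(t, '/')));
        }
        // With splitChar 0 the token is passed through unchanged.
        System.out.println(Arrays.toString(split("dog/NN", (char) 0)));
    }
}
```

In this sketch "1/2/CD" yields the word "1/2" and the tag "CD", while "/SYM" is left unsplit because its only splitChar sits at index 0, which matches the remark that the first character behaves as if it were always escaped.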

main

public static void main(java.lang.String[] args)
This will print out some text, recognizing tags. It can be used to test tag breaking.
Usage: java edu.stanford.nlp.process.WordToTaggedWordProcessor fileOrUrl

Parameters:
args - Command line argument: a file or URL


Stanford NLP Group