Create a WordToTaggedWordProcessor using the default
forward slash character to split on.
public WordToTaggedWordProcessor(char splitChar)
Flexibly set the character at which tags are split off. A splitChar of 0
is interpreted to mean never split off a tag.
splitChar - The character at which to split
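The effect of the splitChar setting can be illustrated with a small standalone sketch. It does not use CoreNLP: `SplitCharDemo` and its `split` helper are hypothetical names, and the logic is an assumption based on the rule documented for `process` below (split on the last splitChar at an index above 0, and treat a splitChar of 0 as "never split").

```java
import java.util.Arrays;

// Hypothetical illustration of the documented splitChar contract; not the CoreNLP class.
class SplitCharDemo {

    // Mirrors the documented default: a forward slash.
    static final char DEFAULT_SPLIT_CHAR = '/';

    // Returns {word, tag} if the token should be split, or null if it is kept whole.
    static String[] split(String token, char splitChar) {
        if (splitChar == 0) {
            return null; // 0 means "never split off a tag"
        }
        int idx = token.lastIndexOf(splitChar);
        if (idx <= 0) {
            return null; // no splitChar, or only at index 0 (the word would be empty)
        }
        return new String[] {token.substring(0, idx), token.substring(idx + 1)};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("dog/NN", DEFAULT_SPLIT_CHAR))); // [dog, NN]
        System.out.println(Arrays.toString(split("1/2/CD", DEFAULT_SPLIT_CHAR))); // [1/2, CD]
        System.out.println(Arrays.toString(split("dog_NN", '_')));               // [dog, NN]
        System.out.println(Arrays.toString(split("dog/NN", (char) 0)));          // null
    }
}
```

Because the last occurrence is used, a word such as "1/2" keeps its internal slash as long as the tag itself contains no slash.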
public java.util.List<HasWord> process(java.util.List<? extends IN> words)
Returns a new list where each Word with a tag has been converted
to a TaggedWord. Items in the input that don't implement HasWord
are dropped from the output. Items that do are checked for the form
word + splitChar + tag. If they match, they are split and inserted
as TaggedWords; otherwise they are added to the output with their
current type. More precisely, each word is split at the last
occurrence of splitChar with an index above 0. This gives the
correct split provided that tags don't contain the splitChar,
regardless of escaping, and never produces an empty or null word:
you can think of the first character as always being escaped.
words - The input list (its elements should implement HasWord)
A new list in which some of the elements may have been converted to TaggedWords
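The list-level behaviour described for `process` can be sketched in plain Java. This is an illustration under stated assumptions, not the CoreNLP implementation: `ProcessSketch` and its minimal `HasWord`, `Word` and `TaggedWord` types are hypothetical stand-ins, and the input is typed `List<?>` only so the sketch can show what happens to elements that don't implement `HasWord`. It needs Java 16 or later (records, pattern matching for `instanceof`).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the documented process() behaviour; not the CoreNLP implementation.
class ProcessSketch {

    // Minimal stand-ins for the CoreNLP HasWord, Word and TaggedWord types (assumptions).
    interface HasWord { String word(); }
    record Word(String word) implements HasWord { }
    record TaggedWord(String word, String tag) implements HasWord { }

    static List<HasWord> process(List<?> input, char splitChar) {
        List<HasWord> out = new ArrayList<>();
        for (Object item : input) {
            if (!(item instanceof HasWord w)) {
                continue; // items that don't implement HasWord are dropped
            }
            String s = w.word();
            // splitChar 0 disables splitting; otherwise use the last occurrence
            int idx = (splitChar == 0) ? -1 : s.lastIndexOf(splitChar);
            if (idx > 0) {
                out.add(new TaggedWord(s.substring(0, idx), s.substring(idx + 1)));
            } else {
                out.add(w); // no usable split point: keep the item with its current type
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Object> input = List.of(new Word("The/DT"), "not a word", new Word("1/2/CD"), new Word("/"));
        // "The/DT" and "1/2/CD" become TaggedWords, the String is dropped,
        // and "/" stays a Word because its only '/' is at index 0.
        for (HasWord w : process(input, '/')) {
            System.out.println(w);
        }
    }
}
```

Returning a fresh list rather than mutating the input matches the documented contract that `process` returns a new list.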
public static void main(java.lang.String[] args)
Prints out the text of the given file or URL, recognizing tags. It can
be used to test tag splitting. Usage:
java edu.stanford.nlp.process.WordToTaggedWordProcessor fileOrUrl