java.lang.Object
  |
  +--edu.stanford.nlp.process.AbstractTokenizer
        |
        +--edu.stanford.nlp.process.PTBTokenizer
Tokenizer implementation that conforms to the Penn Treebank tokenization conventions. This tokenizer is a Java implementation of Professor Chris Manning's Flex tokenizer, pgtt-treebank.l. It reads raw text and outputs tokens as edu.stanford.nlp.trees.Word objects in the Penn Treebank format. It can optionally return carriage returns as tokens of their own.
Constructor Summary
PTBTokenizer()
Constructs a new PTBTokenizer that treats carriage returns as normal whitespace.
PTBTokenizer(boolean tokenizeCRs)
Constructs a new PTBTokenizer that optionally returns carriage returns as their own token.
PTBTokenizer(Reader r)
Constructs a new PTBTokenizer that treats carriage returns as normal whitespace.
PTBTokenizer(Reader r, boolean tokenizeCRs)
Constructs a new PTBTokenizer that optionally returns carriage returns as their own token.
Method Summary
boolean hasNext()
Returns true if this Tokenizer has more elements.
static void main(String[] args)
Reads a file from the argument and prints its tokens one per line.
Object next()
Returns the next Word token, or null if there is none.
void setSource(Reader r)
Sets the source of this Tokenizer to be the Reader r.
Methods inherited from class edu.stanford.nlp.process.AbstractTokenizer
pushBack, remove, tokenize
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Constructor Detail
public PTBTokenizer()
public PTBTokenizer(boolean tokenizeCRs)
See PTBLexer#cr.
public PTBTokenizer(Reader r)
public PTBTokenizer(Reader r, boolean tokenizeCRs)
See PTBLexer#cr.
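The effect of the tokenizeCRs flag can be illustrated without the library itself. What follows is a minimal sketch, not the PTBTokenizer implementation: CrSketch is a hypothetical stand-in that splits on whitespace only (it applies no Penn Treebank rules), it assumes a line break means '\n', and the "*CR*" marker is an assumed placeholder for whatever text PTBLexer#cr actually holds.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, NOT the real PTBTokenizer: it splits on whitespace
// only and exists to show what a tokenizeCRs-style flag changes.
class CrSketch {
    // Assumed marker text; the real value lives in PTBLexer#cr and is not shown here.
    static final String CR = "*CR*";

    static List<String> tokenize(String text, boolean tokenizeCRs) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                // Whitespace always ends the token being built.
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
                // With the flag set, a line break also becomes a token of its own.
                if (tokenizeCRs && c == '\n') {
                    tokens.add(CR);
                }
            } else {
                current.append(c);
            }
        }
        // Flush the last token if the text did not end in whitespace.
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "First line\nSecond line";
        System.out.println(tokenize(text, false)); // [First, line, Second, line]
        System.out.println(tokenize(text, true));  // [First, line, *CR*, Second, line]
    }
}
```

With the flag off, the line break is discarded like any other whitespace; with it on, a caller can recover line boundaries from the token stream.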
Method Detail
public boolean hasNext()
hasNext in interface Tokenizer
hasNext in class AbstractTokenizer
public Object next()
next in interface Tokenizer
next in class AbstractTokenizer
public static void main(String[] args) throws IOException
Usage: java edu.stanford.nlp.process.PTBTokenizer filename
Parameters:
args - Command line arguments
Throws:
IOException
public void setSource(Reader r)
setSource in interface Tokenizer
setSource in class AbstractTokenizer
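The calling pattern implied by hasNext(), next() and setSource(Reader) can be sketched as follows. This is an illustration under stated assumptions, not the library's code: TokenizerLoopSketch is a hypothetical stand-in that splits on whitespace only, and its one-token lookahead is an assumed design, since the page does not describe how PTBTokenizer buffers input.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical stand-in, NOT the real PTBTokenizer: whitespace splitting only.
// It mirrors the documented method shapes: setSource(Reader), hasNext(), next().
class TokenizerLoopSketch {
    private Reader source;
    private String lookahead; // next token, or null once the source is exhausted

    TokenizerLoopSketch(Reader r) {
        setSource(r);
    }

    // Re-points the tokenizer at a new Reader and reads ahead one token.
    public void setSource(Reader r) {
        source = r;
        lookahead = readToken();
    }

    public boolean hasNext() {
        return lookahead != null;
    }

    // Returns the next token, or null if there is none (as documented for next()).
    public Object next() {
        String token = lookahead;
        if (token != null) {
            lookahead = readToken();
        }
        return token;
    }

    // Reads one whitespace-delimited token; returns null at end of input.
    private String readToken() {
        StringBuilder sb = new StringBuilder();
        try {
            int c;
            while ((c = source.read()) != -1) {
                if (Character.isWhitespace((char) c)) {
                    if (sb.length() > 0) {
                        break; // token complete
                    }
                    // otherwise skip leading whitespace
                } else {
                    sb.append((char) c);
                }
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return sb.length() > 0 ? sb.toString() : null;
    }

    public static void main(String[] args) {
        TokenizerLoopSketch tok = new TokenizerLoopSketch(new StringReader("Hello there"));
        while (tok.hasNext()) {
            System.out.println(tok.next()); // one token per line
        }
        System.out.println(tok.next());     // null: no tokens remain
        tok.setSource(new StringReader("again"));
        System.out.println(tok.next());     // tokens come from the new source
    }
}
```

Because next() is declared to return Object and may return null, callers that skip the hasNext() check need a null test before casting the result.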