edu.stanford.nlp.process
Class PTBTokenizer.PTBTokenizerFactory<T extends HasWord>

java.lang.Object
  extended by edu.stanford.nlp.process.PTBTokenizer.PTBTokenizerFactory<T>
All Implemented Interfaces:
IteratorFromReaderFactory<T>, TokenizerFactory<T>
Enclosing class:
PTBTokenizer<T extends HasWord>

public static class PTBTokenizer.PTBTokenizerFactory<T extends HasWord>
extends Object
implements TokenizerFactory<T>


Field Summary
protected  LexedTokenFactory<T> factory
           
protected  boolean invertible
           
protected  boolean suppressEscaping
           
protected  boolean tokenizeCRs
           
 
Constructor Summary
PTBTokenizer.PTBTokenizerFactory(boolean tokenizeCRs, LexedTokenFactory<T> factory)
           
 
Method Summary
 Iterator<T> getIterator(Reader r)
           
 Tokenizer<T> getTokenizer(Reader r)
           
static PTBTokenizer.PTBTokenizerFactory<Word> newPTBTokenizerFactory()
          Constructs a new PTBTokenizerFactory that treats carriage returns as normal whitespace and returns Word objects.
static PTBTokenizer.PTBTokenizerFactory<Word> newPTBTokenizerFactory(boolean tokenizeCRs)
          Constructs a new PTBTokenizerFactory that optionally returns carriage returns as their own tokens.
static PTBTokenizer.PTBTokenizerFactory<CoreLabel> newPTBTokenizerFactory(boolean tokenizeCRs, boolean invertible)
           
static PTBTokenizer.PTBTokenizerFactory<Word> newPTBTokenizerFactory(boolean tokenizeCRs, boolean invertible, boolean suppressEscaping)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

tokenizeCRs

protected boolean tokenizeCRs

invertible

protected boolean invertible

suppressEscaping

protected boolean suppressEscaping

factory

protected LexedTokenFactory<T> factory

Constructor Detail

PTBTokenizer.PTBTokenizerFactory

public PTBTokenizer.PTBTokenizerFactory(boolean tokenizeCRs,
                                        LexedTokenFactory<T> factory)

Method Detail

newPTBTokenizerFactory

public static PTBTokenizer.PTBTokenizerFactory<Word> newPTBTokenizerFactory()
Constructs a new PTBTokenizerFactory that treats carriage returns as normal whitespace and returns Word objects.

Returns:
A TokenizerFactory that returns Word objects

newPTBTokenizerFactory

public static PTBTokenizer.PTBTokenizerFactory<Word> newPTBTokenizerFactory(boolean tokenizeCRs)
Constructs a new PTBTokenizerFactory that optionally returns carriage returns as their own tokens.

Parameters:
tokenizeCRs - If true, carriage returns are returned as Word tokens whose text is the value of PTBLexer.cr; if false, they are treated as normal whitespace.
Returns:
A TokenizerFactory that returns Word objects
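
To make the effect of tokenizeCRs concrete, here is a minimal, self-contained sketch. It does not use the Stanford classes: SketchFactory is a hypothetical stand-in that splits on whitespace only, and CR_PLACEHOLDER is an invented marker standing in for whatever PTBLexer.cr actually holds. Only the on/off behaviour of the flag is meant to mirror the description above.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class TokenizeCRsSketch {
    // Invented marker; the real token text is the value of PTBLexer.cr.
    static final String CR_PLACEHOLDER = "<CR>";

    /** Hypothetical stand-in for a tokenizer factory; NOT the Stanford class. */
    static final class SketchFactory {
        private final boolean tokenizeCRs;

        SketchFactory(boolean tokenizeCRs) {
            this.tokenizeCRs = tokenizeCRs;
        }

        /** Splits on whitespace; optionally emits a token for each line break. */
        List<String> tokenize(Reader r) throws IOException {
            List<String> tokens = new ArrayList<>();
            StringBuilder current = new StringBuilder();
            int c;
            while ((c = r.read()) != -1) {
                char ch = (char) c;
                if (Character.isWhitespace(ch)) {
                    flush(current, tokens);
                    if (tokenizeCRs && ch == '\n') {
                        tokens.add(CR_PLACEHOLDER);
                    }
                } else {
                    current.append(ch);
                }
            }
            flush(current, tokens);
            return tokens;
        }

        // Moves any pending characters into the token list.
        private static void flush(StringBuilder sb, List<String> out) {
            if (sb.length() > 0) {
                out.add(sb.toString());
                sb.setLength(0);
            }
        }
    }

    /** Convenience wrapper: tokenizes a string with the given flag. */
    static List<String> run(String text, boolean tokenizeCRs) {
        try {
            return new SketchFactory(tokenizeCRs).tokenize(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("a b\nc", false)); // [a, b, c]
        System.out.println(run("a b\nc", true));  // [a, b, <CR>, c]
    }
}
```

With the flag off, the line break is swallowed like any other whitespace; with it on, the break survives as its own token, which is useful when downstream code needs line boundaries.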

newPTBTokenizerFactory

public static PTBTokenizer.PTBTokenizerFactory<CoreLabel> newPTBTokenizerFactory(boolean tokenizeCRs,
                                                                                 boolean invertible)

newPTBTokenizerFactory

public static PTBTokenizer.PTBTokenizerFactory<Word> newPTBTokenizerFactory(boolean tokenizeCRs,
                                                                            boolean invertible,
                                                                            boolean suppressEscaping)

getIterator

public Iterator<T> getIterator(Reader r)
Specified by:
getIterator in interface IteratorFromReaderFactory<T extends HasWord>

getTokenizer

public Tokenizer<T> getTokenizer(Reader r)
Specified by:
getTokenizer in interface TokenizerFactory<T extends HasWord>
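
The overall shape of this class (a factory configured once with a token-building object, then asked for a fresh tokenizer per Reader) can be sketched with simplified stand-in types. Everything below is hypothetical and independent of the Stanford library: TokenMaker loosely plays the role of LexedTokenFactory<T>, Span is an invented token type, and the whitespace splitting is far simpler than real PTB tokenization. The sketch also assumes that getIterator simply delegates to getTokenizer, which this page does not state.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class ReaderFactorySketch {
    /** Hypothetical callback that builds a token of type T from text and offsets. */
    interface TokenMaker<T> {
        T make(String text, int begin, int length);
    }

    /** Invented token type that remembers where it started in the input. */
    static final class Span {
        final String text;
        final int begin;
        final int length;

        Span(String text, int begin, int length) {
            this.text = text;
            this.begin = begin;
            this.length = length;
        }

        @Override
        public String toString() {
            return text + "@" + begin;
        }
    }

    /** Stand-in factory: one configured object, one new tokenizer per Reader. */
    static final class Factory<T> {
        private final TokenMaker<T> maker;

        Factory(TokenMaker<T> maker) {
            this.maker = maker;
        }

        Iterator<T> getTokenizer(Reader r) {
            return new WhitespaceTokenizer<>(r, maker);
        }

        // Assumption: the iterator view is just the tokenizer itself.
        Iterator<T> getIterator(Reader r) {
            return getTokenizer(r);
        }
    }

    /** Lazy whitespace tokenizer that reads one token ahead. */
    static final class WhitespaceTokenizer<T> implements Iterator<T> {
        private final Reader reader;
        private final TokenMaker<T> maker;
        private int offset = 0; // characters consumed so far
        private T lookahead;    // buffered token, or null if none

        WhitespaceTokenizer(Reader reader, TokenMaker<T> maker) {
            this.reader = reader;
            this.maker = maker;
        }

        @Override
        public boolean hasNext() {
            if (lookahead == null) {
                lookahead = readToken();
            }
            return lookahead != null;
        }

        @Override
        public T next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            T token = lookahead;
            lookahead = null;
            return token;
        }

        // Reads the next whitespace-delimited token, or returns null at end of input.
        private T readToken() {
            try {
                StringBuilder sb = new StringBuilder();
                int start = offset;
                int c;
                while ((c = reader.read()) != -1) {
                    offset++;
                    if (Character.isWhitespace((char) c)) {
                        if (sb.length() > 0) {
                            break;       // token finished
                        }
                        start = offset;  // still skipping leading whitespace
                    } else {
                        sb.append((char) c);
                    }
                }
                return sb.length() == 0 ? null : maker.make(sb.toString(), start, sb.length());
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** Tokenizes a string through the factory and renders each token as text@offset. */
    static List<String> collect(String text) {
        Factory<Span> factory = new Factory<>(Span::new);
        Iterator<Span> it = factory.getIterator(new StringReader(text));
        List<String> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(it.next().toString());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collect("Hello  world !")); // [Hello@0, world@7, !@13]
    }
}
```

Separating the token-building callback from the factory is what lets the same tokenization logic yield different token types, in the same way the static methods above return factories for Word or CoreLabel.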


Stanford NLP Group