edu.stanford.nlp.process
Class PTBTokenizer.PTBTokenizerFactory<T extends HasWord>

java.lang.Object
  extended by edu.stanford.nlp.process.PTBTokenizer.PTBTokenizerFactory<T>
All Implemented Interfaces:
IteratorFromReaderFactory<T>, TokenizerFactory<T>
Enclosing class:
PTBTokenizer<T extends HasWord>

public static class PTBTokenizer.PTBTokenizerFactory<T extends HasWord>
extends Object
implements TokenizerFactory<T>


Field Summary
protected  LexedTokenFactory<T> factory
           
protected  boolean invertible
           
protected  boolean suppressEscaping
           
protected  boolean tokenizeCRs
           
 
Constructor Summary
PTBTokenizer.PTBTokenizerFactory()
          Constructs a PTBTokenizerFactory which returns Word objects.
PTBTokenizer.PTBTokenizerFactory(boolean tokenizeCRs, LexedTokenFactory<T> factory)
           
 
Method Summary
 Iterator<T> getIterator(Reader r)
           
 Tokenizer<T> getTokenizer(Reader r)
           
static PTBTokenizer.PTBTokenizerFactory<Word> newPTBTokenizerFactory()
          Constructs a new PTBTokenizerFactory that treats carriage returns as normal whitespace and returns Word objects.
static PTBTokenizer.PTBTokenizerFactory<Word> newPTBTokenizerFactory(boolean tokenizeCRs)
          Constructs a new PTBTokenizerFactory that optionally returns carriage returns as their own tokens and returns Word objects.
static PTBTokenizer.PTBTokenizerFactory<CoreLabel> newPTBTokenizerFactory(boolean tokenizeCRs, boolean invertible)
           
static PTBTokenizer.PTBTokenizerFactory<Word> newPTBTokenizerFactory(boolean tokenizeCRs, boolean invertible, boolean suppressEscaping)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

tokenizeCRs

protected boolean tokenizeCRs

invertible

protected boolean invertible

suppressEscaping

protected boolean suppressEscaping

factory

protected LexedTokenFactory<T> factory
Constructor Detail

PTBTokenizer.PTBTokenizerFactory

public PTBTokenizer.PTBTokenizerFactory()
Constructs a PTBTokenizerFactory which returns Word objects. Now that PTBTokenizer is typesafe, this constructor shouldn't really be here; indeed, you need to add extra lines to erase types to get it to compile. But it is still needed at present: a no-argument constructor is required when a TokenizerFactory is created by newInstance(), e.g., in the parser. We should probably change things by extending the TokenizerFactory interface so that it has a static factory method defined for constructing tokenizers. TODO: Clean up this mess.
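The description above says the no-argument constructor survives because code such as the parser may create a TokenizerFactory reflectively with newInstance(). The JDK-only sketch below illustrates that constraint. `SimpleTokenizerFactory`, `WhitespaceTokenizerFactory`, `ConfiguredTokenizerFactory`, and `ReflectiveFactoryLoader` are hypothetical names, not part of the Stanford API, and the parser's actual loading code may differ.

```java
import java.lang.reflect.Constructor;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a TokenizerFactory (not the Stanford interface).
interface SimpleTokenizerFactory {
    List<String> tokenize(String text);
}

// Has a public no-argument constructor, so it can be created reflectively.
class WhitespaceTokenizerFactory implements SimpleTokenizerFactory {
    public WhitespaceTokenizerFactory() { }

    @Override
    public List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String s : text.trim().split("\\s+")) {
            if (!s.isEmpty()) out.add(s);
        }
        return out;
    }
}

// Only has a configured constructor, so no-argument reflective creation fails.
class ConfiguredTokenizerFactory implements SimpleTokenizerFactory {
    private final boolean lowercase;

    public ConfiguredTokenizerFactory(boolean lowercase) { this.lowercase = lowercase; }

    @Override
    public List<String> tokenize(String text) {
        return new WhitespaceTokenizerFactory().tokenize(lowercase ? text.toLowerCase() : text);
    }
}

class ReflectiveFactoryLoader {
    // Creates a factory from a class name, as a tool reading the name from a
    // command-line option might; this requires a no-argument constructor.
    static SimpleTokenizerFactory load(String className) throws ReflectiveOperationException {
        Class<?> cls = Class.forName(className);
        Constructor<?> ctor = cls.getDeclaredConstructor(); // NoSuchMethodException if absent
        return (SimpleTokenizerFactory) ctor.newInstance();
    }

    public static void main(String[] args) throws Exception {
        SimpleTokenizerFactory tf = load(WhitespaceTokenizerFactory.class.getName());
        System.out.println(tf.tokenize("  a  b c ")); // prints [a, b, c]
    }
}
```

Loading `ConfiguredTokenizerFactory` the same way throws NoSuchMethodException, which is the failure the no-argument constructor avoids.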


PTBTokenizer.PTBTokenizerFactory

public PTBTokenizer.PTBTokenizerFactory(boolean tokenizeCRs,
                                        LexedTokenFactory<T> factory)
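This constructor separates lexing from token construction: the LexedTokenFactory<T> argument decides which object type each lexed string becomes. The JDK-only sketch below illustrates that split. `TokenMaker<T>` is a hypothetical stand-in loosely modeled on LexedTokenFactory (its `makeToken(text, begin, length)` shape is an assumption), `PluggableTokenizerFactory` and `Span` are hypothetical, and whitespace splitting is a placeholder for the real PTB lexer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for LexedTokenFactory<T>: turns a lexed span into a token object.
interface TokenMaker<T> {
    T makeToken(String text, int begin, int length);
}

// Minimal token type that records its character offsets.
final class Span {
    final String text;
    final int begin;
    final int end;

    Span(String text, int begin, int end) {
        this.text = text;
        this.begin = begin;
        this.end = end;
    }

    @Override
    public String toString() { return text + "[" + begin + "," + end + ")"; }
}

// Hypothetical generic factory: the lexing loop is fixed, token construction is pluggable.
final class PluggableTokenizerFactory<T> {
    private final TokenMaker<T> maker;

    PluggableTokenizerFactory(TokenMaker<T> maker) { this.maker = maker; }

    // Splits on whitespace and hands each (text, begin, length) triple to the maker.
    List<T> tokenize(String input) {
        List<T> out = new ArrayList<>();
        int i = 0;
        int n = input.length();
        while (i < n) {
            while (i < n && Character.isWhitespace(input.charAt(i))) i++;
            int start = i;
            while (i < n && !Character.isWhitespace(input.charAt(i))) i++;
            if (i > start) out.add(maker.makeToken(input.substring(start, i), start, i - start));
        }
        return out;
    }
}
```

The same lexing loop yields plain strings with `(t, b, len) -> t` or offset-carrying tokens with `(t, b, len) -> new Span(t, b, b + len)`, which is the flexibility the generic parameter T provides.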
Method Detail

newPTBTokenizerFactory

public static PTBTokenizer.PTBTokenizerFactory<Word> newPTBTokenizerFactory()
Constructs a new PTBTokenizerFactory that treats carriage returns as normal whitespace and returns Word objects.

Returns:
A TokenizerFactory that returns Word objects

newPTBTokenizerFactory

public static PTBTokenizer.PTBTokenizerFactory<Word> newPTBTokenizerFactory(boolean tokenizeCRs)
Constructs a new PTBTokenizerFactory that optionally returns carriage returns as their own tokens.

Parameters:
tokenizeCRs - If true, carriage returns come back as Words whose text is the value of PTBLexer.cr; if false, they are treated as normal whitespace.
Returns:
A TokenizerFactory that returns Word objects
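A minimal sketch of what tokenizeCRs controls, assuming a simple whitespace tokenizer in place of the PTB lexer. The marker string `*CR*` in `CR_MARKER` is a hypothetical placeholder (the library uses the value of PTBLexer.cr, which this sketch does not assume), and treating a "\r\n" pair as a single line break is a choice of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

final class CrAwareTokenizer {
    // Hypothetical placeholder for the value of PTBLexer.cr.
    static final String CR_MARKER = "*CR*";

    // Splits on whitespace; line breaks become CR_MARKER tokens only if tokenizeCRs is true.
    static List<String> tokenize(String text, boolean tokenizeCRs) {
        List<String> out = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\r' || c == '\n') {
                flush(word, out);
                if (tokenizeCRs) out.add(CR_MARKER);
                // Treat a Windows "\r\n" pair as one line break.
                if (c == '\r' && i + 1 < text.length() && text.charAt(i + 1) == '\n') i++;
            } else if (Character.isWhitespace(c)) {
                flush(word, out);
            } else {
                word.append(c);
            }
        }
        flush(word, out);
        return out;
    }

    // Emits the pending word, if any, and clears the buffer.
    private static void flush(StringBuilder word, List<String> out) {
        if (word.length() > 0) {
            out.add(word.toString());
            word.setLength(0);
        }
    }
}
```

With tokenizeCRs false, "a b\r\nc" yields [a, b, c]; with it true, the line break survives as a token between "b" and "c", which is useful when sentence or line boundaries matter downstream.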

newPTBTokenizerFactory

public static PTBTokenizer.PTBTokenizerFactory<CoreLabel> newPTBTokenizerFactory(boolean tokenizeCRs,
                                                                                 boolean invertible)
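This overload is not described here. In later CoreNLP documentation, invertible means storing enough information about each token's original text and surrounding whitespace that a token list can be converted back to the original string; reading it that way here is an assumption. The JDK-only sketch below shows the idea with a hypothetical `InvToken` class (not CoreLabel) that keeps `original`, `before`, and `after` alongside a possibly normalized `word`; `InvertibleSketch` is likewise hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical token type (not CoreLabel) that keeps what is needed to undo tokenization.
final class InvToken {
    final String word;     // form used downstream, possibly normalized
    final String original; // exact characters taken from the input
    final String before;   // whitespace preceding the token
    final String after;    // whitespace following the token

    InvToken(String word, String original, String before, String after) {
        this.word = word;
        this.original = original;
        this.before = before;
        this.after = after;
    }
}

final class InvertibleSketch {
    // Splits on whitespace, recording the surrounding whitespace of every token.
    static List<InvToken> tokenize(String text) {
        List<InvToken> out = new ArrayList<>();
        int n = text.length();
        int i = skipWhitespace(text, 0);
        String before = text.substring(0, i);
        while (i < n) {
            int start = i;
            while (i < n && !Character.isWhitespace(text.charAt(i))) i++;
            String original = text.substring(start, i);
            int wsStart = i;
            i = skipWhitespace(text, i);
            String after = text.substring(wsStart, i);
            out.add(new InvToken(normalize(original), original, before, after));
            before = after; // the next token's "before" is this token's "after"
        }
        return out;
    }

    // Rebuilds the input from original forms and whitespace. Input with no
    // tokens at all (e.g. only spaces) is not recoverable in this sketch.
    static String detokenize(List<InvToken> tokens) {
        if (tokens.isEmpty()) return "";
        StringBuilder sb = new StringBuilder(tokens.get(0).before);
        for (InvToken t : tokens) sb.append(t.original).append(t.after);
        return sb.toString();
    }

    private static int skipWhitespace(String text, int i) {
        while (i < text.length() && Character.isWhitespace(text.charAt(i))) i++;
        return i;
    }

    // Example normalization that would lose information if 'original' were not kept.
    private static String normalize(String s) {
        if (s.equals("(")) return "-LRB-";
        if (s.equals(")")) return "-RRB-";
        return s;
    }
}
```

Because `word` may differ from `original`, the round trip relies on the stored original text and whitespace rather than on the normalized forms.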

newPTBTokenizerFactory

public static PTBTokenizer.PTBTokenizerFactory<Word> newPTBTokenizerFactory(boolean tokenizeCRs,
                                                                            boolean invertible,
                                                                            boolean suppressEscaping)
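Penn Treebank conventions rewrite bracket characters as tokens such as -LRB- and -RRB-. Judging by its name, suppressEscaping turns this rewriting off; that reading, and the escape table below (the six bracket characters only), are assumptions, since this page does not list what the library escapes. `PtbEscaper` is a hypothetical helper, not a Stanford class.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PtbEscaper {
    // Assumed subset of PTB bracket escapes; the library's real table may be larger.
    private static final Map<String, String> ESCAPES = new HashMap<>();

    static {
        ESCAPES.put("(", "-LRB-");
        ESCAPES.put(")", "-RRB-");
        ESCAPES.put("[", "-LSB-");
        ESCAPES.put("]", "-RSB-");
        ESCAPES.put("{", "-LCB-");
        ESCAPES.put("}", "-RCB-");
    }

    // Returns escaped tokens, or an unchanged copy when suppressEscaping is true.
    static List<String> escapeAll(List<String> tokens, boolean suppressEscaping) {
        if (suppressEscaping) return new ArrayList<>(tokens);
        List<String> out = new ArrayList<>(tokens.size());
        for (String t : tokens) out.add(ESCAPES.getOrDefault(t, t));
        return out;
    }
}
```

Escaping keeps bracket tokens from colliding with the parentheses of bracketed tree notation; suppressing it is convenient when the tokens are shown to users or passed to tools that expect raw text.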

getIterator

public Iterator<T> getIterator(Reader r)
Specified by:
getIterator in interface IteratorFromReaderFactory<T extends HasWord>
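getIterator satisfies IteratorFromReaderFactory, whose contract only requires a plain Iterator over tokens read from a Reader. The JDK-only sketch below illustrates that contract with hypothetical `ReaderIteratorFactory` and `LazyWordIteratorFactory` types, using whitespace splitting in place of PTB tokenization. The iterator reads from the Reader one token ahead rather than loading the whole input.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical mirror of IteratorFromReaderFactory<T>.
interface ReaderIteratorFactory<T> {
    Iterator<T> getIterator(Reader r);
}

// Produces whitespace-delimited tokens, reading only one token ahead.
final class LazyWordIteratorFactory implements ReaderIteratorFactory<String> {
    @Override
    public Iterator<String> getIterator(Reader r) {
        return new Iterator<String>() {
            private String nextToken = advance();

            // Reads characters until a complete token is found; null at end of input.
            private String advance() {
                try {
                    StringBuilder sb = new StringBuilder();
                    int c;
                    while ((c = r.read()) != -1) {
                        if (Character.isWhitespace(c)) {
                            if (sb.length() > 0) return sb.toString();
                        } else {
                            sb.append((char) c);
                        }
                    }
                    return sb.length() > 0 ? sb.toString() : null;
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }

            @Override
            public boolean hasNext() { return nextToken != null; }

            @Override
            public String next() {
                if (nextToken == null) throw new NoSuchElementException();
                String t = nextToken;
                nextToken = advance();
                return t;
            }
        };
    }
}
```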

getTokenizer

public Tokenizer<T> getTokenizer(Reader r)
Specified by:
getTokenizer in interface TokenizerFactory<T extends HasWord>
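getTokenizer returns a Tokenizer<T> for one Reader. With the library on the classpath, the call implied by the signatures on this page would be `PTBTokenizer.PTBTokenizerFactory.newPTBTokenizerFactory().getTokenizer(reader)`. The JDK-only sketch below does not use the real interface; it assumes a tokenizer shaped as an Iterator plus a `tokenize()` method that drains the remaining tokens into a List. `MiniTokenizer` and `MiniTokenizerFactory` are hypothetical, and the sketch reads the whole Reader up front for brevity.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical tokenizer interface: an Iterator plus a method that drains the rest into a List.
interface MiniTokenizer<T> extends Iterator<T> {
    default List<T> tokenize() {
        List<T> out = new ArrayList<>();
        while (hasNext()) out.add(next());
        return out;
    }
}

// Hypothetical factory: each call to getTokenizer wraps a fresh Reader in a new tokenizer.
final class MiniTokenizerFactory {
    MiniTokenizer<String> getTokenizer(Reader r) {
        List<String> words = new ArrayList<>();
        for (String w : readAll(r).trim().split("\\s+")) {
            if (!w.isEmpty()) words.add(w);
        }
        Iterator<String> it = words.iterator();
        return new MiniTokenizer<String>() {
            @Override
            public boolean hasNext() { return it.hasNext(); }

            @Override
            public String next() { return it.next(); }
        };
    }

    // Reads the entire Reader into a String.
    private static String readAll(Reader r) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        try {
            int n;
            while ((n = r.read(buf)) != -1) sb.append(buf, 0, n);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

Callers can mix both styles: consume a few tokens with next(), then collect whatever remains with tokenize().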


Stanford NLP Group