edu.stanford.nlp.process
Class PTBTokenizer.PTBTokenizerFactory<T extends HasWord>

java.lang.Object
  extended by edu.stanford.nlp.process.PTBTokenizer.PTBTokenizerFactory<T>
All Implemented Interfaces:
IteratorFromReaderFactory<T>, TokenizerFactory<T>
Enclosing class:
PTBTokenizer<T extends HasWord>

public static class PTBTokenizer.PTBTokenizerFactory<T extends HasWord>
extends Object
implements TokenizerFactory<T>


Field Summary
protected  LexedTokenFactory<T> factory
           
protected  String options
           
 
Method Summary
 Iterator<T> getIterator(Reader r)
           
 Tokenizer<T> getTokenizer(Reader r)
           
static PTBTokenizer.PTBTokenizerFactory<CoreLabel> newCoreLabelTokenizerFactory(String options)
          Constructs a new PTBTokenizerFactory whose tokenizers return CoreLabel objects and use the options passed in.
static PTBTokenizer.PTBTokenizerFactory<Word> newPTBTokenizerFactory(boolean tokenizeNLs)
          Constructs a new PTBTokenizerFactory whose tokenizers optionally return newlines as their own token.
static PTBTokenizer.PTBTokenizerFactory<CoreLabel> newPTBTokenizerFactory(boolean tokenizeNLs, boolean invertible)
           
static
<T extends HasWord>
PTBTokenizer.PTBTokenizerFactory<T>
newPTBTokenizerFactory(LexedTokenFactory<T> tokenFactory, String options)
          Constructs a new PTBTokenizerFactory that uses the LexedTokenFactory and options passed in.
static TokenizerFactory<Word> newTokenizerFactory()
          Constructs a new TokenizerFactory that returns Word objects and treats newlines as normal whitespace.
static PTBTokenizer.PTBTokenizerFactory<Word> newWordTokenizerFactory(String options)
          Constructs a new PTBTokenizerFactory whose tokenizers return Word objects and use the options passed in.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

factory

protected LexedTokenFactory<T extends HasWord> factory

options

protected String options
Method Detail

newTokenizerFactory

public static TokenizerFactory<Word> newTokenizerFactory()
Constructs a new TokenizerFactory that returns Word objects and treats newlines as normal whitespace. Note: some JavaNLP code invokes this method by reflection to load a tokenizer factory, so every TokenizerFactory implementation should provide it.

Returns:
A TokenizerFactory that returns Word objects
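
The reflective lookup mentioned above can be sketched as follows. This is a minimal, self-contained illustration and not the JavaNLP loading code: DemoWhitespaceFactory and ReflectiveFactoryLoader are hypothetical stand-ins, and only the convention of a public static, no-argument newTokenizerFactory() method is taken from this page.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a tokenizer factory; NOT part of Stanford NLP.
class DemoWhitespaceFactory {
    // Follows the convention described above: public, static, no arguments.
    public static DemoWhitespaceFactory newTokenizerFactory() {
        return new DemoWhitespaceFactory();
    }

    // Toy tokenization: split on runs of whitespace.
    public List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }
}

// Hypothetical loader showing how a caller might obtain a factory by class name.
class ReflectiveFactoryLoader {
    static Object load(String className) {
        try {
            Class<?> cls = Class.forName(className);
            // Look up the static no-argument method by its conventional name.
            Method method = cls.getMethod("newTokenizerFactory");
            // The receiver is null because the method is static.
            return method.invoke(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load factory " + className, e);
        }
    }

    public static void main(String[] args) {
        Object factory = load(DemoWhitespaceFactory.class.getName());
        System.out.println(((DemoWhitespaceFactory) factory).tokenize("Hello big  world"));
        // prints [Hello, big, world]
    }
}
```

A class that lacks the static method would fail at the getMethod call, which is presumably why the note asks every TokenizerFactory implementation to provide it.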

newPTBTokenizerFactory

public static PTBTokenizer.PTBTokenizerFactory<Word> newPTBTokenizerFactory(boolean tokenizeNLs)
Constructs a new PTBTokenizerFactory whose tokenizers optionally return newlines as their own token.

Parameters:
tokenizeNLs - If true, newlines come back as Words whose text is the value of PTBLexer.NEWLINE_TOKEN.
Returns:
A TokenizerFactory that returns Word objects
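
The effect of the tokenizeNLs flag can be illustrated with a toy whitespace tokenizer. This is only a sketch of the behavior described above, not the PTBTokenizer implementation: NewlineTokenDemo is a hypothetical class, and the placeholder string "*NL*" is an assumption standing in for whatever value PTBLexer.NEWLINE_TOKEN holds.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; NOT part of Stanford NLP.
class NewlineTokenDemo {
    // Assumed placeholder text; the real value is defined by PTBLexer.NEWLINE_TOKEN.
    static final String NEWLINE_TOKEN = "*NL*";

    // Splits on whitespace; when tokenizeNLs is true, each '\n' also yields a token.
    static List<String> tokenize(String text, boolean tokenizeNLs) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                // Whitespace ends the token collected so far.
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
                if (c == '\n' && tokenizeNLs) {
                    tokens.add(NEWLINE_TOKEN);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("one two\nthree", true));  // prints [one, two, *NL*, three]
        System.out.println(tokenize("one two\nthree", false)); // prints [one, two, three]
    }
}
```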

newWordTokenizerFactory

public static PTBTokenizer.PTBTokenizerFactory<Word> newWordTokenizerFactory(String options)
Constructs a new PTBTokenizerFactory whose tokenizers return Word objects and use the options passed in.

Parameters:
options - A String of options
Returns:
A TokenizerFactory that returns Word objects

newCoreLabelTokenizerFactory

public static PTBTokenizer.PTBTokenizerFactory<CoreLabel> newCoreLabelTokenizerFactory(String options)
Constructs a new PTBTokenizerFactory whose tokenizers return CoreLabel objects and use the options passed in.

Parameters:
options - A String of options
Returns:
A TokenizerFactory that returns CoreLabel objects

newPTBTokenizerFactory

public static <T extends HasWord> PTBTokenizer.PTBTokenizerFactory<T> newPTBTokenizerFactory(LexedTokenFactory<T> tokenFactory,
                                                                                             String options)
Constructs a new PTBTokenizerFactory that uses the LexedTokenFactory and options passed in.

Parameters:
tokenFactory - The LexedTokenFactory
options - A String of options
Returns:
A TokenizerFactory that returns objects of the type of the LexedTokenFactory

newPTBTokenizerFactory

public static PTBTokenizer.PTBTokenizerFactory<CoreLabel> newPTBTokenizerFactory(boolean tokenizeNLs,
                                                                                 boolean invertible)

getIterator

public Iterator<T> getIterator(Reader r)
Specified by:
getIterator in interface IteratorFromReaderFactory<T extends HasWord>

getTokenizer

public Tokenizer<T> getTokenizer(Reader r)
Specified by:
getTokenizer in interface TokenizerFactory<T extends HasWord>


Stanford NLP Group