edu.stanford.nlp.process
Class PTBTokenizer.PTBTokenizerFactory<T extends HasWord>

java.lang.Object
  extended by edu.stanford.nlp.process.PTBTokenizer.PTBTokenizerFactory<T>
Type Parameters:
T - The class of the returned tokens
All Implemented Interfaces:
IteratorFromReaderFactory<T>, TokenizerFactory<T>
Enclosing class:
PTBTokenizer<T extends HasWord>

public static class PTBTokenizer.PTBTokenizerFactory<T extends HasWord>
extends Object
implements TokenizerFactory<T>

This class provides a factory that vends PTBTokenizer instances, each wrapping a provided Reader. See the documentation for PTBTokenizer for details of the parameters and options.
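A minimal usage sketch of the factory pattern described above, assuming the Stanford NLP classes are on the classpath under the package names shown in the imports (these may differ between releases). The class name FactoryUsageSketch and the helper tokenize are hypothetical names used only for illustration, and the empty options string is assumed to select the default behavior.

```java
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.process.PTBTokenizer;

import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of the library.
class FactoryUsageSketch {

    // Tokenizes the text and returns the word form of each token.
    static List<String> tokenize(String text) {
        // Build a factory whose tokenizers return CoreLabel objects.
        // The empty options string is assumed to mean "use the defaults".
        PTBTokenizer.PTBTokenizerFactory<CoreLabel> factory =
                PTBTokenizer.PTBTokenizerFactory.newCoreLabelTokenizerFactory("");

        // The factory wraps the Reader in a tokenizer; tokenize() drains it into a list.
        List<CoreLabel> tokens = factory.getTokenizer(new StringReader(text)).tokenize();

        List<String> words = new ArrayList<>();
        for (CoreLabel token : tokens) {
            words.add(token.word());
        }
        return words;
    }

    public static void main(String[] args) {
        for (String word : tokenize("Hello, world.")) {
            System.out.println(word);
        }
    }
}
```

The same factory can be reused for any number of Readers; each call to getTokenizer produces a fresh tokenizer over the Reader it is given.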

See Also:
PTBTokenizer

Field Summary
protected  LexedTokenFactory<T> factory
           
protected  String options
           
 
Method Summary
 Iterator<T> getIterator(Reader r)
          Returns a tokenizer wrapping the given Reader.
 Tokenizer<T> getTokenizer(Reader r)
          Returns a tokenizer wrapping the given Reader.
static PTBTokenizer.PTBTokenizerFactory<CoreLabel> newCoreLabelTokenizerFactory(String options)
          Constructs a new factory for PTBTokenizers that return CoreLabel objects and use the options passed in.
static PTBTokenizer.PTBTokenizerFactory<Word> newPTBTokenizerFactory(boolean tokenizeNLs)
          Constructs a new factory for PTBTokenizers that optionally return newlines as their own tokens.
static PTBTokenizer.PTBTokenizerFactory<CoreLabel> newPTBTokenizerFactory(boolean tokenizeNLs, boolean invertible)
           
static <T extends HasWord> PTBTokenizer.PTBTokenizerFactory<T> newPTBTokenizerFactory(LexedTokenFactory<T> tokenFactory, String options)
          Constructs a new factory for PTBTokenizers that use the LexedTokenFactory and options passed in.
static TokenizerFactory<Word> newTokenizerFactory()
          Constructs a new TokenizerFactory that returns Word objects and treats carriage returns as normal whitespace.
static PTBTokenizer.PTBTokenizerFactory<Word> newWordTokenizerFactory(String options)
          Constructs a new factory for PTBTokenizers that return Word objects and use the options passed in.
 void setOptions(String options)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

factory

protected LexedTokenFactory<T extends HasWord> factory

options

protected String options
Method Detail

newTokenizerFactory

public static TokenizerFactory<Word> newTokenizerFactory()
Constructs a new TokenizerFactory that returns Word objects and treats carriage returns as normal whitespace. Note: this method is invoked by reflection by some of the JavaNLP code to load a tokenizer factory. It should be present in every TokenizerFactory.

Returns:
A TokenizerFactory that returns Word objects

newPTBTokenizerFactory

public static PTBTokenizer.PTBTokenizerFactory<Word> newPTBTokenizerFactory(boolean tokenizeNLs)
Constructs a new factory for PTBTokenizers that optionally return newlines as their own tokens.

Parameters:
tokenizeNLs - If true, newlines come back as Words whose text is the value of PTBLexer.NEWLINE_TOKEN.
Returns:
A TokenizerFactory that returns Word objects

newWordTokenizerFactory

public static PTBTokenizer.PTBTokenizerFactory<Word> newWordTokenizerFactory(String options)
Constructs a new factory for PTBTokenizers that return Word objects and use the options passed in.

Parameters:
options - A String of options
Returns:
A TokenizerFactory that returns Word objects

newCoreLabelTokenizerFactory

public static PTBTokenizer.PTBTokenizerFactory<CoreLabel> newCoreLabelTokenizerFactory(String options)
Constructs a new factory for PTBTokenizers that return CoreLabel objects and use the options passed in.

Parameters:
options - A String of options
Returns:
A TokenizerFactory that returns CoreLabel objects

newPTBTokenizerFactory

public static <T extends HasWord> PTBTokenizer.PTBTokenizerFactory<T> newPTBTokenizerFactory(LexedTokenFactory<T> tokenFactory,
                                                                                             String options)
Constructs a new factory for PTBTokenizers that use the LexedTokenFactory and options passed in.

Parameters:
tokenFactory - The LexedTokenFactory
options - A String of options
Returns:
A TokenizerFactory that returns objects of the type of the LexedTokenFactory
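The generic overload lets the caller choose the token type by supplying the LexedTokenFactory. The following is a sketch under stated assumptions: WordTokenFactory and CoreLabelTokenFactory are assumed to be available in edu.stanford.nlp.process as LexedTokenFactory implementations for Word and CoreLabel, and the empty options string is assumed to select defaults. The class TokenTypeSketch and its helper words are hypothetical names.

```java
import edu.stanford.nlp.ling.HasWord;
import edu.stanford.nlp.process.CoreLabelTokenFactory;
import edu.stanford.nlp.process.LexedTokenFactory;
import edu.stanford.nlp.process.PTBTokenizer;
import edu.stanford.nlp.process.WordTokenFactory;

import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of the library.
class TokenTypeSketch {

    // Tokenizes text with whatever token type the supplied LexedTokenFactory creates,
    // then reduces each token to its word string via the HasWord interface.
    static <T extends HasWord> List<String> words(LexedTokenFactory<T> tokenFactory,
                                                  String options,
                                                  String text) {
        PTBTokenizer.PTBTokenizerFactory<T> factory =
                PTBTokenizer.PTBTokenizerFactory.newPTBTokenizerFactory(tokenFactory, options);

        List<String> out = new ArrayList<>();
        for (T token : factory.getTokenizer(new StringReader(text)).tokenize()) {
            out.add(token.word());
        }
        return out;
    }

    public static void main(String[] args) {
        // Same text, two token types; the word strings should agree.
        System.out.println(words(new WordTokenFactory(), "", "Hello, world."));
        System.out.println(words(new CoreLabelTokenFactory(), "", "Hello, world."));
    }
}
```

Because the type parameter T flows from the LexedTokenFactory through the factory to the tokenizer, no casts are needed at the call site.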

newPTBTokenizerFactory

public static PTBTokenizer.PTBTokenizerFactory<CoreLabel> newPTBTokenizerFactory(boolean tokenizeNLs,
                                                                                 boolean invertible)

getIterator

public Iterator<T> getIterator(Reader r)
Returns a tokenizer wrapping the given Reader.

Specified by:
getIterator in interface IteratorFromReaderFactory<T extends HasWord>
Parameters:
r - Where to read objects from
Returns:
An Iterator over the objects
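When only sequential access is needed, getIterator can be used in place of getTokenizer. A minimal sketch, assuming the library is on the classpath with Word in edu.stanford.nlp.ling; the class IteratorSketch and its helper countTokens are hypothetical names.

```java
import edu.stanford.nlp.ling.Word;
import edu.stanford.nlp.process.PTBTokenizer;

import java.io.StringReader;
import java.util.Iterator;

// Hypothetical illustration class; not part of the library.
class IteratorSketch {

    // Counts the tokens in the text by walking the Iterator returned by the factory.
    static int countTokens(String text) {
        // newTokenizerFactory() returns a factory of Word tokens with default settings.
        Iterator<Word> it = PTBTokenizer.PTBTokenizerFactory
                .newTokenizerFactory()
                .getIterator(new StringReader(text));

        int count = 0;
        while (it.hasNext()) {
            it.next();
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countTokens("It works."));
    }
}
```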

getTokenizer

public Tokenizer<T> getTokenizer(Reader r)
Returns a tokenizer wrapping the given Reader.

Specified by:
getTokenizer in interface TokenizerFactory<T extends HasWord>

setOptions

public void setOptions(String options)
Specified by:
setOptions in interface TokenizerFactory<T extends HasWord>


Stanford NLP Group