public static class PTBTokenizer.PTBTokenizerFactory<T extends HasWord> extends Object implements TokenizerFactory<T>
Type Parameters: T - The class of the returned tokens
See PTBTokenizer for details of the parameters and options.
Modifier and Type | Field and Description |
---|---|
protected LexedTokenFactory<T> | factory |
protected String | options |
Modifier and Type | Method and Description |
---|---|
Iterator<T> | getIterator(Reader r): Returns a tokenizer wrapping the given Reader. |
Tokenizer<T> | getTokenizer(Reader r): Returns a tokenizer wrapping the given Reader. |
Tokenizer<T> | getTokenizer(Reader r, String extraOptions) |
static PTBTokenizer.PTBTokenizerFactory<CoreLabel> | newCoreLabelTokenizerFactory(String options): Constructs a new PTBTokenizerFactory whose tokenizers return CoreLabel objects and use the options passed in. |
static PTBTokenizer.PTBTokenizerFactory<CoreLabel> | newPTBTokenizerFactory(boolean tokenizeNLs, boolean invertible) |
static <T extends HasWord> PTBTokenizer.PTBTokenizerFactory<T> | newPTBTokenizerFactory(LexedTokenFactory<T> tokenFactory, String options): Constructs a new PTBTokenizerFactory whose tokenizers use the LexedTokenFactory and options passed in. |
static TokenizerFactory<Word> | newTokenizerFactory(): Constructs a new TokenizerFactory that returns Word objects and treats carriage returns as normal whitespace. |
static PTBTokenizer.PTBTokenizerFactory<Word> | newWordTokenizerFactory(String options): Constructs a new PTBTokenizerFactory whose tokenizers return Word objects and use the options passed in. |
void | setOptions(String options) |
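
The factory contract summarized above (a factory that hands out tokenizers wrapping a Reader, where the tokenizer can also be used as a plain Iterator) can be sketched without the library on the classpath. Everything below is a stand-in written for illustration: the nested `HasWord`, `Tokenizer`, and `TokenizerFactory` interfaces only mimic the shapes listed in the table, and `SimpleWord`, `WhitespaceFactory`, and `words` are hypothetical names. Splitting on whitespace replaces the actual PTB tokenization rules, which are far richer.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Scanner;

public class FactorySketch {
    // Stand-ins for the library types named in the tables (NOT the real classes).
    interface HasWord { String word(); }
    interface Tokenizer<T> extends Iterator<T> { }
    interface TokenizerFactory<T> {
        Iterator<T> getIterator(Reader r);
        Tokenizer<T> getTokenizer(Reader r);
    }

    // Hypothetical token type that only carries its text.
    static final class SimpleWord implements HasWord {
        private final String text;
        SimpleWord(String text) { this.text = text; }
        @Override public String word() { return text; }
    }

    // Hypothetical factory: vends whitespace tokenizers instead of PTB tokenizers.
    static final class WhitespaceFactory implements TokenizerFactory<SimpleWord> {
        @Override public Tokenizer<SimpleWord> getTokenizer(Reader r) {
            final Scanner scanner = new Scanner(r); // splits on whitespace by default
            return new Tokenizer<SimpleWord>() {
                @Override public boolean hasNext() { return scanner.hasNext(); }
                @Override public SimpleWord next() { return new SimpleWord(scanner.next()); }
            };
        }

        @Override public Iterator<SimpleWord> getIterator(Reader r) {
            return getTokenizer(r); // a tokenizer is itself an iterator over tokens
        }
    }

    // Drains the iterator vended by the factory into a list of token strings.
    static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        Iterator<SimpleWord> it = new WhitespaceFactory().getIterator(new StringReader(text));
        while (it.hasNext()) {
            out.add(it.next().word());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(words("The quick brown fox")); // prints [The, quick, brown, fox]
    }
}
```

With the real library, the call shape would presumably be the same: obtain a factory from one of the static methods in the table, then pass a Reader to getTokenizer or getIterator.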
protected final LexedTokenFactory<T extends HasWord> factory
protected String options
public static TokenizerFactory<Word> newTokenizerFactory()
public static PTBTokenizer.PTBTokenizerFactory<Word> newWordTokenizerFactory(String options)
Parameters:
options - A String of options
public static PTBTokenizer.PTBTokenizerFactory<CoreLabel> newCoreLabelTokenizerFactory(String options)
Parameters:
options - A String of options
public static <T extends HasWord> PTBTokenizer.PTBTokenizerFactory<T> newPTBTokenizerFactory(LexedTokenFactory<T> tokenFactory, String options)
Parameters:
tokenFactory - The LexedTokenFactory
options - A String of options
public static PTBTokenizer.PTBTokenizerFactory<CoreLabel> newPTBTokenizerFactory(boolean tokenizeNLs, boolean invertible)
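
The boolean overload carries no description on this page. One plausible reading, offered only as an assumption, is that its two flags mirror the `tokenizeNLs` and `invertible` keys that can appear in the options String; the PTBTokenizer documentation is the authority on the option names and their meaning. Under that assumption, the equivalent comma-separated option string could be built as follows (`optionsFor` is a hypothetical helper, not part of the library):

```java
public class OptionStringSketch {
    // Hypothetical helper: renders the two flags as a comma-separated
    // key=value option string. Assumes the option keys are spelled exactly
    // like the parameter names of the boolean overload.
    static String optionsFor(boolean tokenizeNLs, boolean invertible) {
        return "tokenizeNLs=" + tokenizeNLs + ",invertible=" + invertible;
    }

    public static void main(String[] args) {
        System.out.println(optionsFor(true, false)); // prints tokenizeNLs=true,invertible=false
    }
}
```

A string of this form is the kind of value the `options` parameter of the String-based factory methods accepts.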
public Iterator<T> getIterator(Reader r)
Specified by: getIterator in interface IteratorFromReaderFactory<T extends HasWord>
Parameters:
r - Where to read objects from
public Tokenizer<T> getTokenizer(Reader r)
Specified by: getTokenizer in interface TokenizerFactory<T extends HasWord>
public Tokenizer<T> getTokenizer(Reader r, String extraOptions)
Specified by: getTokenizer in interface TokenizerFactory<T extends HasWord>
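
This page does not say how `extraOptions` interacts with the options already stored in the factory. A minimal sketch of one likely behaviour, assuming the extra options are appended to the factory's own options with a comma (and used alone when the factory has none), is shown below. `mergeOptions` is a hypothetical helper for illustration and is not a library method.

```java
public class ExtraOptionsSketch {
    // Hypothetical helper: combines a factory's stored options with the
    // per-call extra options. Assumption: both are comma-separated
    // key=value lists, so joining them with a comma yields a valid list.
    static String mergeOptions(String options, String extraOptions) {
        if (options == null || options.isEmpty()) {
            return extraOptions; // nothing stored: use the extra options alone
        }
        if (extraOptions == null || extraOptions.isEmpty()) {
            return options;      // nothing extra: keep the stored options
        }
        return options + "," + extraOptions;
    }

    public static void main(String[] args) {
        // prints invertible=true,tokenizeNLs=true
        System.out.println(mergeOptions("invertible=true", "tokenizeNLs=true"));
    }
}
```

If the same key appears in both strings, which value wins depends on how the tokenizer parses its options, so avoid relying on a particular precedence without checking the PTBTokenizer documentation.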
public void setOptions(String options)
Specified by: setOptions in interface TokenizerFactory<T extends HasWord>