edu.stanford.nlp.process
Class PTBTokenizer.PTBTokenizerFactory<T extends HasWord>
java.lang.Object
edu.stanford.nlp.process.PTBTokenizer.PTBTokenizerFactory<T>
- All Implemented Interfaces:
- IteratorFromReaderFactory<T>, TokenizerFactory<T>
- Enclosing class:
- PTBTokenizer<T extends HasWord>
public static class PTBTokenizer.PTBTokenizerFactory<T extends HasWord>
- extends Object
- implements TokenizerFactory<T>
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
tokenizeCRs
protected boolean tokenizeCRs
invertible
protected boolean invertible
suppressEscaping
protected boolean suppressEscaping
factory
protected LexedTokenFactory<T extends HasWord> factory
PTBTokenizer.PTBTokenizerFactory
public PTBTokenizer.PTBTokenizerFactory()
- Constructs a PTBTokenizerFactory that returns Word objects.
Now that PTBTokenizer is typesafe, this constructor does not really
belong here, and it compiles only with extra lines that erase types.
It is still needed at present, because a TokenizerFactory that is
created by newInstance(), e.g., in the parser, requires a no-argument
constructor. We should probably change things by extending the
TokenizerFactory interface so that it has a static factory method
defined for constructing tokenizers.
TODO: Clean up this mess.
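The reflective-construction constraint described above can be illustrated with a minimal, self-contained sketch. It does not use the Stanford classes: SimpleTokenizerFactory and WhitespaceFactory are hypothetical stand-ins, and the sketch uses getDeclaredConstructor().newInstance(), the current replacement for the deprecated Class.newInstance(). Both routes fail when the class has no no-argument constructor.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Scanner;

public class ReflectiveFactoryDemo {

    /** Hypothetical stand-in for TokenizerFactory<T>; not the Stanford interface. */
    public interface SimpleTokenizerFactory<T> {
        Iterator<T> getTokenizer(Reader r);
    }

    /** Hypothetical factory that splits on whitespace. */
    public static class WhitespaceFactory implements SimpleTokenizerFactory<String> {

        /** Reflection needs this public no-argument constructor. */
        public WhitespaceFactory() { }

        @Override
        public Iterator<String> getTokenizer(Reader r) {
            List<String> tokens = new ArrayList<>();
            Scanner scanner = new Scanner(r);
            while (scanner.hasNext()) {
                tokens.add(scanner.next());
            }
            return tokens.iterator();
        }
    }

    /** Creates a factory from its class name alone, then tokenizes the text with it. */
    public static List<String> tokenize(String factoryClassName, String text) {
        try {
            Class<?> cls = Class.forName(factoryClassName);
            // Throws NoSuchMethodException if the class has no no-argument constructor.
            Object instance = cls.getDeclaredConstructor().newInstance();
            SimpleTokenizerFactory<?> factory = (SimpleTokenizerFactory<?>) instance;
            List<String> out = new ArrayList<>();
            Iterator<?> it = factory.getTokenizer(new StringReader(text));
            while (it.hasNext()) {
                out.add(String.valueOf(it.next()));
            }
            return out;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + factoryClassName, e);
        }
    }

    public static void main(String[] args) {
        String name = WhitespaceFactory.class.getName();
        System.out.println(tokenize(name, "The quick  brown fox"));
    }
}
```

Because the caller holds only a class name, it cannot pass constructor arguments. That is the situation the note above describes for the parser.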
PTBTokenizer.PTBTokenizerFactory
public PTBTokenizer.PTBTokenizerFactory(boolean tokenizeCRs,
LexedTokenFactory<T> factory)
newPTBTokenizerFactory
public static PTBTokenizer.PTBTokenizerFactory<Word> newPTBTokenizerFactory()
- Constructs a new PTBTokenizerFactory that treats carriage returns as
normal whitespace and returns Word objects.
- Returns:
- A TokenizerFactory that returns Word objects
newPTBTokenizerFactory
public static PTBTokenizer.PTBTokenizerFactory<Word> newPTBTokenizerFactory(boolean tokenizeCRs)
- Constructs a new PTBTokenizerFactory that optionally returns carriage
returns as their own tokens.
- Parameters:
tokenizeCRs - If true, CRs come back as Words whose text is
the value of PTBLexer.cr.
- Returns:
- A TokenizerFactory that returns Word objects
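As a rough illustration of what the tokenizeCRs flag changes, the sketch below contrasts the two modes with a plain whitespace splitter. It is not the PTBTokenizer implementation. The class TokenizeCRsSketch and the CR_MARKER placeholder are hypothetical, and it assumes that "\n", "\r", and "\r\n" all count as line breaks. In the real tokenizer the token text is the value of PTBLexer.cr.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenizeCRsSketch {

    // Hypothetical placeholder; the real token text is the value of PTBLexer.cr.
    public static final String CR_MARKER = "<CR>";

    /**
     * Splits text on whitespace. When tokenizeCRs is true, each line break
     * becomes its own token; otherwise it is treated like any other whitespace.
     */
    public static List<String> tokenize(String text, boolean tokenizeCRs) {
        List<String> tokens = new ArrayList<>();
        // Split into lines first; limit -1 keeps trailing empty lines.
        String[] lines = text.split("\r\n|\r|\n", -1);
        for (int i = 0; i < lines.length; i++) {
            for (String piece : lines[i].trim().split("\\s+")) {
                if (!piece.isEmpty()) {
                    tokens.add(piece);
                }
            }
            // A line break sits between consecutive lines, not after the last one.
            if (tokenizeCRs && i < lines.length - 1) {
                tokens.add(CR_MARKER);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Hello world\nBye", false)); // [Hello, world, Bye]
        System.out.println(tokenize("Hello world\nBye", true));  // [Hello, world, <CR>, Bye]
    }
}
```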
newPTBTokenizerFactory
public static PTBTokenizer.PTBTokenizerFactory<CoreLabel> newPTBTokenizerFactory(boolean tokenizeCRs,
boolean invertible)
newPTBTokenizerFactory
public static PTBTokenizer.PTBTokenizerFactory<Word> newPTBTokenizerFactory(boolean tokenizeCRs,
boolean invertible,
boolean suppressEscaping)
getIterator
public Iterator<T> getIterator(Reader r)
- Specified by:
getIterator
in interface IteratorFromReaderFactory<T extends HasWord>
getTokenizer
public Tokenizer<T> getTokenizer(Reader r)
- Specified by:
getTokenizer
in interface TokenizerFactory<T extends HasWord>
Stanford NLP Group