edu.stanford.nlp.trees.international.arabic
Class ArabicTokenizer
java.lang.Object
edu.stanford.nlp.process.AbstractTokenizer<Word>
edu.stanford.nlp.trees.international.arabic.ArabicTokenizer
- All Implemented Interfaces:
- Tokenizer<Word>, java.util.Iterator<Word>
public class ArabicTokenizer
- extends AbstractTokenizer<Word>
An ArabicTokenizer is a simple tokenizer that splits off a few punctuation characters,
and otherwise just splits on and discards whitespace characters.
This implementation returns Word objects. The eolIsSignificant constructor parameter
controls whether an end of line (EOL) is returned as a token or treated as whitespace.
When EOL is a token, the class returns it as a Word with String value "\n".
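The behaviour described above can be illustrated with a small sketch that does not depend on the Stanford classes. It is not the ArabicTokenizer implementation: the class name SimpleTokenizerSketch, the method tokenize, and above all the punctuation set are assumptions made for illustration, since this page does not say which punctuation characters are split off. Tokens are plain Strings here rather than Word objects.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the Stanford ArabicTokenizer source.
class SimpleTokenizerSketch {

    // ASSUMPTION: an illustrative punctuation set (Latin marks plus the Arabic
    // comma, semicolon and question mark). The real set is not documented here.
    private static final String PUNCTUATION = ".,!?\u060C\u061B\u061F";

    // Splits the input on whitespace, emits each assumed punctuation character
    // as its own token, and optionally emits "\n" as an end-of-line token.
    static List<String> tokenize(Reader reader, boolean eolIsSignificant) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        try {
            int c;
            while ((c = reader.read()) != -1) {
                char ch = (char) c;
                if (ch == '\n' && eolIsSignificant) {
                    flush(current, tokens);
                    tokens.add("\n");               // EOL returned as a token
                } else if (Character.isWhitespace(ch)) {
                    flush(current, tokens);         // whitespace is discarded
                } else if (PUNCTUATION.indexOf(ch) >= 0) {
                    flush(current, tokens);
                    tokens.add(String.valueOf(ch)); // punctuation split off
                } else {
                    current.append(ch);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        flush(current, tokens);
        return tokens;
    }

    // Moves any pending characters into the token list.
    private static void flush(StringBuilder current, List<String> tokens) {
        if (current.length() > 0) {
            tokens.add(current.toString());
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        for (String t : tokenize(new StringReader("ab cd.\nef"), true)) {
            System.out.println(t.equals("\n") ? "<EOL>" : t);
        }
    }
}
```

Calling tokenize with eolIsSignificant set to false drops the "\n" token and leaves the other tokens unchanged, which mirrors the two modes described for the class.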
- Author:
- Christopher Manning
Constructor Summary
ArabicTokenizer(java.io.Reader r)
Constructs a new ArabicTokenizer
ArabicTokenizer(java.io.Reader r,
boolean eolIsSignificant)
Constructs a new ArabicTokenizer
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
ArabicTokenizer
public ArabicTokenizer(java.io.Reader r)
- Constructs a new ArabicTokenizer
- Parameters:
r
- The Reader r that is its source.
ArabicTokenizer
public ArabicTokenizer(java.io.Reader r,
boolean eolIsSignificant)
- Constructs a new ArabicTokenizer
- Parameters:
r
- The Reader that is its source.
eolIsSignificant
- Whether EOL tokens should be returned.
getNext
protected Word getNext()
- Internally fetches the next token.
- Specified by:
getNext
in class AbstractTokenizer<Word>
- Returns:
- the next token in the token stream, or null if none exists.
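A getNext() method that returns null at the end of the stream is a convenient base for the Iterator<Word> interface the class implements. The sketch below shows one common way to layer hasNext() and next() over such a method using a single buffered token. Whether AbstractTokenizer does exactly this is not stated on this page; the class LookaheadIteratorSketch and its drain helper are hypothetical, and tokens are plain Strings split on whitespace only.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical illustration of the "getNext() returns null at the end" pattern.
class LookaheadIteratorSketch implements Iterator<String> {
    private final String[] parts;
    private int position = 0;
    private String lookahead; // token fetched but not yet handed out, or null

    LookaheadIteratorSketch(String text) {
        String trimmed = text.trim();
        this.parts = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // Plays the role of getNext(): the next token, or null if none exists.
    protected String getNext() {
        return position < parts.length ? parts[position++] : null;
    }

    @Override
    public boolean hasNext() {
        if (lookahead == null) {
            lookahead = getNext(); // fetch at most one token ahead
        }
        return lookahead != null;
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        String result = lookahead;
        lookahead = null;
        return result;
    }

    // Convenience helper: collects all remaining tokens into a list.
    static List<String> drain(String text) {
        List<String> out = new ArrayList<>();
        Iterator<String> it = new LookaheadIteratorSketch(text);
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        for (String token : drain("one two  three")) {
            System.out.println(token);
        }
    }
}
```

With this arrangement a subclass only has to supply the token-fetching method, and repeated calls to hasNext() do not consume input.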
factory
public static TokenizerFactory<Word> factory()
factory
public static TokenizerFactory<Word> factory(boolean eolIsSignificant)
main
public static void main(java.lang.String[] args)
throws java.io.IOException
- Reads the file named by the argument and prints its tokens one per line.
This is mainly a testing aid, but it can also be quite useful
standalone to turn a corpus into a file with one token per line.
Usage:
java edu.stanford.nlp.trees.international.arabic.ArabicTokenizer filename
- Parameters:
args
- Command line arguments
- Throws:
java.io.IOException
- If the file cannot be opened or read.
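The one-token-per-line conversion that main performs can be sketched without the Stanford classes. The version below is an approximation under stated assumptions: it splits on whitespace only (no punctuation handling and no EOL tokens), it assumes the input file is UTF-8, and the names OneTokenPerLineSketch and writeTokens are hypothetical.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Scanner;

// Hypothetical illustration; whitespace splitting stands in for the tokenizer.
class OneTokenPerLineSketch {

    // Writes every whitespace-delimited token from 'in' on its own line.
    static void writeTokens(Reader in, PrintWriter out) {
        Scanner scanner = new Scanner(in); // default delimiter is whitespace
        while (scanner.hasNext()) {
            out.print(scanner.next());
            out.print('\n');
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: java OneTokenPerLineSketch filename");
            return;
        }
        // ASSUMPTION: the corpus file is encoded as UTF-8.
        try (Reader in = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)) {
            writeTokens(in, new PrintWriter(System.out));
        }
    }
}
```

Keeping the conversion in a method that takes a Reader and a PrintWriter makes it easy to exercise with in-memory strings before pointing it at a corpus file.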
Stanford NLP Group