edu.stanford.nlp.objectbank
Class XMLBeginEndIterator<E>

java.lang.Object
  extended by edu.stanford.nlp.util.AbstractIterator<E>
      extended by edu.stanford.nlp.objectbank.XMLBeginEndIterator<E>
All Implemented Interfaces:
Tokenizer<E>, Iterator<E>

public class XMLBeginEndIterator<E>
extends AbstractIterator<E>
implements Tokenizer<E>

A class which iterates over Strings occurring between the begin and end of a selected tag or tags. The element is specified by a regexp, which is matched against the name of the element (i.e., excluding the angle bracket characters) using matches(). The class ignores all other characters in the input Reader.
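
The matching rule above can be illustrated with java.util.regex alone. The following is a minimal sketch, not the source of this class: TagTextSketch and extract are hypothetical names, the whole input is assumed to be already in memory as a String, and self-closing tags, comments, and CDATA sections are not handled. It shows that the regexp must match the whole element name, as String.matches() requires.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only; this is not the XMLBeginEndIterator source.
class TagTextSketch {
    // Matches an opening or closing tag; group 1 is the optional slash, group 2 the element name.
    private static final Pattern TAG = Pattern.compile("<(/?)([^\\s<>/]+)[^<>]*>");

    /** Collects the raw text between the begin and end tags of elements whose name fully matches tagNameRegexp. */
    static List<String> extract(String xml, String tagNameRegexp) {
        List<String> pieces = new ArrayList<>();
        Matcher m = TAG.matcher(xml);
        int start = -1; // index just past the begin tag, or -1 while outside a selected element
        while (m.find()) {
            boolean closing = !m.group(1).isEmpty();
            // matches() succeeds only if the regexp covers the entire element name.
            if (!m.group(2).matches(tagNameRegexp)) {
                continue;
            }
            if (!closing && start < 0) {
                start = m.end();
            } else if (closing && start >= 0) {
                pieces.add(xml.substring(start, m.start()));
                start = -1;
            }
        }
        return pieces;
    }
}
```

For the input `<doc><text>hello world</text><textual>no</textual></doc>`, extract(xml, "text") yields only "hello world", because "textual" does not fully match the regexp "text". The regexp "text.*" would select both elements.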

Author:
Teg Grenager (grenager@stanford.edu)

Constructor Summary
XMLBeginEndIterator(Reader in, String tagNameRegexp)
           
XMLBeginEndIterator(Reader in, String tagNameRegexp, boolean keepInternalTags)
           
XMLBeginEndIterator(Reader in, String tagNameRegexp, boolean keepInternalTags, boolean keepDelimitingTags)
           
XMLBeginEndIterator(Reader in, String tagNameRegexp, Function<String,E> op, boolean keepInternalTags)
           
XMLBeginEndIterator(Reader in, String tagNameRegexp, Function<String,E> op, boolean keepInternalTags, boolean keepDelimitingTags)
           
 
Method Summary
static IteratorFromReaderFactory<String> getFactory(String tag)
          Returns a factory that vends XMLBeginEndIterators, each of which reads the contents of the given Reader, extracts the text between the begin and end of the specified tag, and returns the result.
static IteratorFromReaderFactory<String> getFactory(String tag, boolean keepInternalTags, boolean keepDelimitingTags)
           
static
<E> IteratorFromReaderFactory<E>
getFactory(String tag, Function<String,E> op)
           
static
<E> IteratorFromReaderFactory<E>
getFactory(String tag, Function<String,E> op, boolean keepInternalTags, boolean keepDelimitingTags)
           
 boolean hasNext()
          Returns true if and only if this Tokenizer has more elements.
static void main(String[] args)
           
 E next()
          Returns the next token from this Tokenizer.
protected  E parseString(String s)
           
 E peek()
          Returns the next token from the Tokenizer without removing it, so that the same token will be returned again on the next call to next() or peek().
 List<E> tokenize()
          Returns the pieces of text found in the matching elements as a List of tokens.
 
Methods inherited from class edu.stanford.nlp.util.AbstractIterator
remove
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface edu.stanford.nlp.process.Tokenizer
remove
 

Constructor Detail

XMLBeginEndIterator

public XMLBeginEndIterator(Reader in,
                           String tagNameRegexp)

XMLBeginEndIterator

public XMLBeginEndIterator(Reader in,
                           String tagNameRegexp,
                           boolean keepInternalTags)

XMLBeginEndIterator

public XMLBeginEndIterator(Reader in,
                           String tagNameRegexp,
                           Function<String,E> op,
                           boolean keepInternalTags)

XMLBeginEndIterator

public XMLBeginEndIterator(Reader in,
                           String tagNameRegexp,
                           boolean keepInternalTags,
                           boolean keepDelimitingTags)

XMLBeginEndIterator

public XMLBeginEndIterator(Reader in,
                           String tagNameRegexp,
                           Function<String,E> op,
                           boolean keepInternalTags,
                           boolean keepDelimitingTags)
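
This page does not describe the keepInternalTags and keepDelimitingTags parameters. The sketch below shows one plausible reading, assumed purely from the parameter names: keepInternalTags controls whether tags nested inside a matched element stay in the returned text, and keepDelimitingTags controls whether the begin and end tags of the element themselves are included. TagFlagsSketch and render are hypothetical names, and the behavior should be checked against the library before relying on it.

```java
import java.util.regex.Pattern;

// Illustrative sketch only; the flag semantics are assumptions based on the parameter names.
class TagFlagsSketch {
    private static final Pattern ANY_TAG = Pattern.compile("<[^<>]*>");

    /**
     * Renders one matched element. beginTag and endTag are the delimiting tags,
     * and body is the raw text found between them.
     */
    static String render(String beginTag, String body, String endTag,
                         boolean keepInternalTags, boolean keepDelimitingTags) {
        // Assumption: with keepInternalTags == false, tags nested in the body are dropped.
        String inner = keepInternalTags ? body : ANY_TAG.matcher(body).replaceAll("");
        // Assumption: with keepDelimitingTags == true, the begin and end tags are kept.
        return keepDelimitingTags ? beginTag + inner + endTag : inner;
    }
}
```

Under these assumptions, render("<p>", "a <b>bold</b> word", "</p>", false, false) gives "a bold word", while passing true for both flags gives "<p>a <b>bold</b> word</p>".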
Method Detail

parseString

protected E parseString(String s)

hasNext

public boolean hasNext()
Description copied from interface: Tokenizer
Returns true if and only if this Tokenizer has more elements.

Specified by:
hasNext in interface Tokenizer<E>
Specified by:
hasNext in interface Iterator<E>
Specified by:
hasNext in class AbstractIterator<E>

next

public E next()
Description copied from interface: Tokenizer
Returns the next token from this Tokenizer.

Specified by:
next in interface Tokenizer<E>
Specified by:
next in interface Iterator<E>
Specified by:
next in class AbstractIterator<E>
Returns:
the next token in the token stream.

peek

public E peek()
Description copied from interface: Tokenizer
Returns the next token from the Tokenizer without removing it, so that the same token will be returned again on the next call to next() or peek().

Specified by:
peek in interface Tokenizer<E>
Returns:
the next token in the token stream.
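
The peek() contract can be illustrated with a one-element lookahead buffer over any java.util.Iterator. This is a minimal sketch of the general technique, not this class's implementation. PeekableSketch is a hypothetical name.

```java
import java.util.Iterator;

// Illustrative sketch only; PeekableSketch is a hypothetical name, not a library class.
class PeekableSketch<E> implements Iterator<E> {
    private final Iterator<E> source;
    private E buffered;          // token fetched ahead of time, if any
    private boolean hasBuffered; // true while buffered holds a pending token

    PeekableSketch(Iterator<E> source) {
        this.source = source;
    }

    @Override
    public boolean hasNext() {
        return hasBuffered || source.hasNext();
    }

    /** Returns the next token without consuming it. */
    public E peek() {
        if (!hasBuffered) {
            buffered = source.next(); // throws NoSuchElementException when exhausted
            hasBuffered = true;
        }
        return buffered;
    }

    @Override
    public E next() {
        E result = peek();
        buffered = null;
        hasBuffered = false;
        return result;
    }
}
```

Calling peek() any number of times returns the same token. Only next() advances the stream.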

tokenize

public List<E> tokenize()
Returns the pieces of text found in the matching elements as a List of tokens.

Specified by:
tokenize in interface Tokenizer<E>
Returns:
A list of all tokens remaining in the underlying Reader
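
Because the returned list holds only the tokens remaining in the stream, tokens already consumed through next() are not included. A minimal sketch of that draining behavior over a plain java.util.Iterator, using the hypothetical names DrainSketch and drain:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Illustrative sketch only; DrainSketch is a hypothetical name, not a library class.
class DrainSketch {
    /** Consumes every remaining element of the iterator and returns them in order. */
    static <E> List<E> drain(Iterator<E> it) {
        List<E> result = new ArrayList<>();
        while (it.hasNext()) {
            result.add(it.next());
        }
        return result;
    }
}
```

If one element of an iterator over "a", "b", "c" has already been taken with next(), drain returns the list ["b", "c"], and the iterator is exhausted afterwards.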

getFactory

public static IteratorFromReaderFactory<String> getFactory(String tag)
Returns a factory that vends XMLBeginEndIterators, each of which reads the contents of the given Reader, extracts the text between the begin and end of the specified tag, and returns the result.

Parameters:
tag - The tag the XMLBeginEndIterator will match on
Returns:
The IteratorFromReaderFactory
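
The factory idea is that the tag is configured once and the resulting object can then produce an iterator for any number of Readers. The sketch below illustrates that pattern with the standard library only. ReaderIteratorFactorySketch and TagFactorySketch are hypothetical stand-ins, not the library's IteratorFromReaderFactory, and for brevity the tag is treated as a literal name without attributes rather than as a regexp.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for a factory interface; not the library's IteratorFromReaderFactory.
interface ReaderIteratorFactorySketch<T> {
    Iterator<T> getIterator(Reader r);
}

// Illustrative sketch only.
class TagFactorySketch {
    /** Returns a reusable factory whose iterators yield the text inside each <tag>...</tag> pair. */
    static ReaderIteratorFactorySketch<String> forTag(String tag) {
        // Simplification: the tag is quoted, so it is matched literally here.
        String name = Pattern.quote(tag);
        Pattern element = Pattern.compile("<" + name + ">(.*?)</" + name + ">", Pattern.DOTALL);
        return reader -> {
            StringBuilder text = new StringBuilder();
            try {
                int c;
                while ((c = reader.read()) != -1) {
                    text.append((char) c);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            List<String> pieces = new ArrayList<>();
            Matcher m = element.matcher(text);
            while (m.find()) {
                pieces.add(m.group(1));
            }
            return pieces.iterator();
        };
    }
}
```

A single factory from forTag("s") can be applied first to a StringReader over "<s>one</s> x <s>two</s>" and then to another over "<t>no</t><s>three</s>", yielding "one", "two" from the first and "three" from the second.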

getFactory

public static IteratorFromReaderFactory<String> getFactory(String tag,
                                                           boolean keepInternalTags,
                                                           boolean keepDelimitingTags)

getFactory

public static <E> IteratorFromReaderFactory<E> getFactory(String tag,
                                                          Function<String,E> op)

getFactory

public static <E> IteratorFromReaderFactory<E> getFactory(String tag,
                                                          Function<String,E> op,
                                                          boolean keepInternalTags,
                                                          boolean keepDelimitingTags)

main

public static void main(String[] args)
                 throws IOException
Throws:
IOException


Stanford NLP Group