edu.stanford.nlp.objectbank
Class XMLBeginEndIterator<E>

java.lang.Object
  extended by edu.stanford.nlp.util.AbstractIterator<E>
      extended by edu.stanford.nlp.objectbank.XMLBeginEndIterator<E>
All Implemented Interfaces:
Tokenizer<E>, java.util.Iterator<E>

public class XMLBeginEndIterator<E>
extends AbstractIterator<E>
implements Tokenizer<E>

Iterates over the Strings that occur between the begin and end tags of a selected XML element or elements. The element is specified by a regexp that is matched against the element name (i.e., excluding the angle bracket characters) using matches(), so the regexp must match the whole name. All other characters in the input Reader are ignored.

The output of the XMLBeginEndIterator can be adjusted in three ways. If keepInternalTags is set, tags nested inside the selected element are kept, so <text>A<foo/>B</text> is returned as A<foo/>B. If keepDelimitingTags is set, the delimiting tags are kept as well, so in the same example the enclosing <text> tags also appear in the output. If countDepth is set, the iterator keeps track of the nesting depth of the selected element. By default it does not: a single close tag closes all open tags. This is incorrect XML behavior, but is kept in case any code relies on it.
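
The effect of keepInternalTags and keepDelimitingTags can be illustrated with a small standalone model. This is a sketch using only java.util.regex, not the library's implementation: the class name TagTextSketch, the extract method, and the simplified TAG pattern are all hypothetical, and comments, CDATA, and attribute values containing angle brackets are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model of the documented behavior; not the Stanford NLP source.
final class TagTextSketch {
    // Simplified tag pattern: group 1 is "/" for a close tag, group 2 is the element name.
    private static final Pattern TAG = Pattern.compile("<(/?)([A-Za-z_][\\w.:-]*)[^<>]*>");

    static List<String> extract(String xml, String tagNameRegexp,
                                boolean keepInternalTags, boolean keepDelimitingTags) {
        Pattern name = Pattern.compile(tagNameRegexp);
        Matcher m = TAG.matcher(xml);
        List<String> out = new ArrayList<>();
        StringBuilder current = null; // non-null while inside a selected element
        int pos = 0;                  // end of the previous tag
        while (m.find()) {
            if (current != null) {
                current.append(xml, pos, m.start()); // text between tags
            }
            pos = m.end();
            boolean closing = !m.group(1).isEmpty();
            boolean selfClosing = m.group().endsWith("/>");
            // matches() requires the regexp to cover the whole element name.
            boolean selected = name.matcher(m.group(2)).matches();
            if (current == null) {
                if (selected && !closing && !selfClosing) {
                    current = new StringBuilder();
                    if (keepDelimitingTags) current.append(m.group());
                }
            } else if (selected && closing) {
                if (keepDelimitingTags) current.append(m.group());
                out.add(current.toString());
                current = null;
            } else if (keepInternalTags) {
                current.append(m.group());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String xml = "<doc><text>A<foo/>B</text> junk <text>C</text></doc>";
        System.out.println(extract(xml, "text", false, false)); // [AB, C]
        System.out.println(extract(xml, "text", true, false));  // [A<foo/>B, C]
        System.out.println(extract(xml, "text", true, true));   // [<text>A<foo/>B</text>, <text>C</text>]
    }
}
```

Because the name is tested with matches(), the regexp text selects <text> but not <textual>, while text.* selects both.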

Author:
Teg Grenager (grenager@stanford.edu)

Constructor Summary
XMLBeginEndIterator(java.io.Reader in, java.lang.String tagNameRegexp)
           
XMLBeginEndIterator(java.io.Reader in, java.lang.String tagNameRegexp, boolean keepInternalTags)
           
XMLBeginEndIterator(java.io.Reader in, java.lang.String tagNameRegexp, boolean keepInternalTags, boolean keepDelimitingTags)
           
XMLBeginEndIterator(java.io.Reader in, java.lang.String tagNameRegexp, boolean keepInternalTags, boolean keepDelimitingTags, boolean countDepth)
           
XMLBeginEndIterator(java.io.Reader in, java.lang.String tagNameRegexp, Function<java.lang.String,E> op, boolean keepInternalTags)
           
XMLBeginEndIterator(java.io.Reader in, java.lang.String tagNameRegexp, Function<java.lang.String,E> op, boolean keepInternalTags, boolean keepDelimitingTags)
           
XMLBeginEndIterator(java.io.Reader in, java.lang.String tagNameRegexp, Function<java.lang.String,E> op, boolean keepInternalTags, boolean keepDelimitingTags, boolean countDepth)
           
 
Method Summary
static IteratorFromReaderFactory<java.lang.String> getFactory(java.lang.String tag)
          Returns a factory that vends XMLBeginEndIterators that read the contents of the given Reader, extract the text between the specified tags, then return the result.
static IteratorFromReaderFactory<java.lang.String> getFactory(java.lang.String tag, boolean keepInternalTags, boolean keepDelimitingTags)
           
static
<E> IteratorFromReaderFactory<E>
getFactory(java.lang.String tag, Function<java.lang.String,E> op)
           
static
<E> IteratorFromReaderFactory<E>
getFactory(java.lang.String tag, Function<java.lang.String,E> op, boolean keepInternalTags, boolean keepDelimitingTags)
           
 boolean hasNext()
          Returns true if and only if this Tokenizer has more elements.
static void main(java.lang.String[] args)
           
 E next()
          Returns the next token from this Tokenizer.
protected  E parseString(java.lang.String s)
           
 E peek()
          Returns the next token from the Tokenizer without removing it, so that the same token will be returned again on the next call to next() or peek().
 java.util.List<E> tokenize()
          Returns pieces of text in element as a List of tokens.
 
Methods inherited from class edu.stanford.nlp.util.AbstractIterator
remove
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface edu.stanford.nlp.process.Tokenizer
remove
 

Constructor Detail

XMLBeginEndIterator

public XMLBeginEndIterator(java.io.Reader in,
                           java.lang.String tagNameRegexp)

XMLBeginEndIterator

public XMLBeginEndIterator(java.io.Reader in,
                           java.lang.String tagNameRegexp,
                           boolean keepInternalTags)

XMLBeginEndIterator

public XMLBeginEndIterator(java.io.Reader in,
                           java.lang.String tagNameRegexp,
                           Function<java.lang.String,E> op,
                           boolean keepInternalTags)

XMLBeginEndIterator

public XMLBeginEndIterator(java.io.Reader in,
                           java.lang.String tagNameRegexp,
                           boolean keepInternalTags,
                           boolean keepDelimitingTags)

XMLBeginEndIterator

public XMLBeginEndIterator(java.io.Reader in,
                           java.lang.String tagNameRegexp,
                           boolean keepInternalTags,
                           boolean keepDelimitingTags,
                           boolean countDepth)
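
The difference that countDepth makes can be sketched with a standalone model. This is an illustration of the behavior described in the class comment, not the library's code: DepthCountingSketch, its extract method, and the simplified TAG pattern are hypothetical, and internal tags are simply dropped here (as if keepInternalTags and keepDelimitingTags were both false).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model of nesting-depth tracking; not the Stanford NLP source.
final class DepthCountingSketch {
    // Simplified tag pattern: group 1 is "/" for a close tag, group 2 is the element name.
    private static final Pattern TAG = Pattern.compile("<(/?)([A-Za-z_][\\w.:-]*)[^<>]*>");

    static List<String> extract(String xml, String tagNameRegexp, boolean countDepth) {
        Pattern name = Pattern.compile(tagNameRegexp);
        Matcher m = TAG.matcher(xml);
        List<String> out = new ArrayList<>();
        StringBuilder current = null; // non-null while inside a selected element
        int depth = 0;                // open selected elements (only meaningful with countDepth)
        int pos = 0;
        while (m.find()) {
            if (current != null) {
                current.append(xml, pos, m.start());
            }
            pos = m.end();
            // Skip tags that are not selected, and self-closing tags.
            if (!name.matcher(m.group(2)).matches() || m.group().endsWith("/>")) {
                continue;
            }
            boolean closing = !m.group(1).isEmpty();
            if (!closing) {
                if (current == null) {
                    current = new StringBuilder();
                    depth = 1;
                } else if (countDepth) {
                    depth++; // nested open tag of the selected element
                }
            } else if (current != null) {
                // Without countDepth, one close tag closes everything.
                depth = countDepth ? depth - 1 : 0;
                if (depth == 0) {
                    out.add(current.toString());
                    current = null;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String xml = "<text>A<text>B</text>C</text>";
        System.out.println(extract(xml, "text", false)); // [AB]
        System.out.println(extract(xml, "text", true));  // [ABC]
    }
}
```

In this model, the default mode ends the element at the first close tag, so the trailing C falls outside any element and is ignored; with countDepth the element ends only at the matching close tag.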

XMLBeginEndIterator

public XMLBeginEndIterator(java.io.Reader in,
                           java.lang.String tagNameRegexp,
                           Function<java.lang.String,E> op,
                           boolean keepInternalTags,
                           boolean keepDelimitingTags)

XMLBeginEndIterator

public XMLBeginEndIterator(java.io.Reader in,
                           java.lang.String tagNameRegexp,
                           Function<java.lang.String,E> op,
                           boolean keepInternalTags,
                           boolean keepDelimitingTags,
                           boolean countDepth)
Method Detail

parseString

protected E parseString(java.lang.String s)

hasNext

public boolean hasNext()
Description copied from interface: Tokenizer
Returns true if and only if this Tokenizer has more elements.

Specified by:
hasNext in interface Tokenizer<E>
Specified by:
hasNext in interface java.util.Iterator<E>
Specified by:
hasNext in class AbstractIterator<E>

next

public E next()
Description copied from interface: Tokenizer
Returns the next token from this Tokenizer.

Specified by:
next in interface Tokenizer<E>
Specified by:
next in interface java.util.Iterator<E>
Specified by:
next in class AbstractIterator<E>
Returns:
the next token in the token stream.

peek

public E peek()
Description copied from interface: Tokenizer
Returns the next token from the Tokenizer without removing it, so that the same token will be returned again on the next call to next() or peek().

Specified by:
peek in interface Tokenizer<E>
Returns:
the next token in the token stream.
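
The peek() contract can be modeled with a one-element lookahead buffer over any Iterator. The wrapper below is a generic sketch of that contract, assuming nothing about how XMLBeginEndIterator implements it; the class name PeekingIteratorSketch is hypothetical.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical wrapper illustrating peek() semantics; not the Stanford NLP source.
final class PeekingIteratorSketch<E> implements Iterator<E> {
    private final Iterator<E> source;
    private E buffered;          // the token fetched by peek() but not yet consumed
    private boolean hasBuffered; // true while buffered holds an unconsumed token

    PeekingIteratorSketch(Iterator<E> source) {
        this.source = source;
    }

    /** Returns the next token without consuming it. */
    public E peek() {
        if (!hasBuffered) {
            if (!source.hasNext()) {
                throw new NoSuchElementException();
            }
            buffered = source.next();
            hasBuffered = true;
        }
        return buffered;
    }

    @Override
    public boolean hasNext() {
        return hasBuffered || source.hasNext();
    }

    /** Returns the next token and consumes it. */
    @Override
    public E next() {
        E result = peek();
        hasBuffered = false;
        buffered = null;
        return result;
    }
}
```

Repeated calls to peek() return the same token until next() consumes it.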

tokenize

public java.util.List<E> tokenize()
Returns pieces of text in element as a List of tokens.

Specified by:
tokenize in interface Tokenizer<E>
Returns:
A list of all tokens remaining in the underlying Reader
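
Since the list contains the tokens remaining in the Reader, tokens already consumed through next() would not be expected to appear in it. A minimal sketch of that draining behavior over a plain Iterator, with the hypothetical names DrainSketch and drain:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper illustrating "all tokens remaining"; not the Stanford NLP source.
final class DrainSketch {
    /** Collects every token the iterator has not yet returned. */
    static <E> List<E> drain(Iterator<E> it) {
        List<E> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }
}
```

After draining, the iterator is exhausted, so a second call returns an empty list.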

getFactory

public static IteratorFromReaderFactory<java.lang.String> getFactory(java.lang.String tag)
Returns a factory that vends XMLBeginEndIterators that read the contents of the given Reader, extract the text between the specified tags, then return the result.

Parameters:
tag - The tag the XMLBeginEndIterator will match on
Returns:
The IteratorFromReaderFactory
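
The point of a factory is that the tag is configured once and a fresh iterator is then created for each Reader. The sketch below models that pattern with standard-library types only. The ReaderIteratorFactory interface and the TagIteratorFactorySketch class are hypothetical stand-ins, and as a simplifying assumption the tag is treated as a literal name with no attributes rather than as a regexp.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model of a factory that vends iterators; not the Stanford NLP source.
final class TagIteratorFactorySketch {
    /** Stand-in for a factory interface: one iterator per Reader. */
    interface ReaderIteratorFactory<T> {
        Iterator<T> getIterator(Reader r);
    }

    /** Configures the tag once; the returned factory can be reused for many Readers. */
    static ReaderIteratorFactory<String> getFactory(String tag) {
        String quoted = Pattern.quote(tag); // assumption: literal tag name, no attributes
        Pattern element = Pattern.compile("<" + quoted + ">(.*?)</" + quoted + ">", Pattern.DOTALL);
        return reader -> {
            List<String> found = new ArrayList<>();
            Matcher m = element.matcher(readAll(reader));
            while (m.find()) {
                found.add(m.group(1));
            }
            return found.iterator();
        };
    }

    // Reads the whole Reader into memory; adequate for a sketch, not for large inputs.
    private static String readAll(Reader r) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        try {
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

A caller would obtain the factory once, for example getFactory("s"), and then call getIterator with a new Reader for each document.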

getFactory

public static IteratorFromReaderFactory<java.lang.String> getFactory(java.lang.String tag,
                                                                     boolean keepInternalTags,
                                                                     boolean keepDelimitingTags)

getFactory

public static <E> IteratorFromReaderFactory<E> getFactory(java.lang.String tag,
                                                          Function<java.lang.String,E> op)

getFactory

public static <E> IteratorFromReaderFactory<E> getFactory(java.lang.String tag,
                                                          Function<java.lang.String,E> op,
                                                          boolean keepInternalTags,
                                                          boolean keepDelimitingTags)

main

public static void main(java.lang.String[] args)
                 throws java.io.IOException
Throws:
java.io.IOException


Stanford NLP Group