edu.stanford.nlp.objectbank
Class ObjectBank<E>

java.lang.Object
  extended by edu.stanford.nlp.objectbank.ObjectBank<E>
All Implemented Interfaces:
java.io.Serializable, java.lang.Iterable<E>, java.util.Collection<E>

public class ObjectBank<E>
extends java.lang.Object
implements java.util.Collection<E>, java.io.Serializable

The ObjectBank class is designed to make it easy to change the format or source of data read in by other classes, and to standardize how data is read in JavaNLP classes. This makes existing code easier to reuse by people other than its authors: they only need to create a new ObjectBank that knows where to look for the data and how to turn it into Objects, and then use that ObjectBank in the class. It also makes it easier to reuse code that reads in the same data.

An ObjectBank is a Collection of Objects. These objects are taken from input sources and then tokenized and parsed into the desired kind of Object. An ObjectBank requires a ReaderIteratorFactory and an IteratorFromReaderFactory. The ReaderIteratorFactory is used to get an Iterator over java.io.Readers which contain representations of the Objects. A ReaderIteratorFactory resembles a collection that takes input sources and dispenses Iterators over java.io.Readers of those sources. An IteratorFromReaderFactory is used to turn a single java.io.Reader into an Iterator over Objects. The IteratorFromReaderFactory splits the contents of the java.io.Reader into Strings and then parses them into appropriate Objects.
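
The two-stage design described above can be illustrated with a minimal sketch that uses only JDK types. The class and method names here (TwoStagePipelineSketch, readers, objects, readAll) are hypothetical and are not part of the Stanford API. The sketch also reads everything eagerly, which is an assumption made to keep the example short.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for the two-stage design; not the Stanford classes.
final class TwoStagePipelineSketch {

  // Stage 1 (the role of a ReaderIteratorFactory): one Reader per input source.
  // Each source here is simply an in-memory String.
  static Iterator<Reader> readers(List<String> sources) {
    List<Reader> out = new ArrayList<>();
    for (String s : sources) {
      out.add(new StringReader(s));
    }
    return out.iterator();
  }

  // Stage 2 (the role of an IteratorFromReaderFactory): split one Reader into
  // Strings (here: lines) and parse each String into an object.
  static <T> Iterator<T> objects(Reader reader, Function<String, T> parse) {
    List<T> out = new ArrayList<>();
    try (BufferedReader br = new BufferedReader(reader)) {
      String line;
      while ((line = br.readLine()) != null) {
        out.add(parse.apply(line));
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return out.iterator();
  }

  // Chains both stages: every object from every source, in source order.
  static <T> List<T> readAll(List<String> sources, Function<String, T> parse) {
    List<T> all = new ArrayList<>();
    Iterator<Reader> it = readers(sources);
    while (it.hasNext()) {
      Iterator<T> objs = objects(it.next(), parse);
      while (objs.hasNext()) {
        all.add(objs.next());
      }
    }
    return all;
  }

  public static void main(String[] args) {
    List<Integer> numbers = readAll(Arrays.asList("1\n2", "3"), Integer::valueOf);
    System.out.println(numbers);
  }
}
```

Changing where the data comes from only touches stage 1, and changing its format only touches stage 2, which is the separation the class description relies on.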

Example Usage:

You have a collection of files in the directory /u/nlp/data/gre/questions. Each file contains several Puzzle documents which look like:

 <puzzle>
    <preamble> some text </preamble>
    <question> some intro text
      <answer> answer1 </answer>
      <answer> answer2 </answer>
      <answer> answer3 </answer>
      <answer> answer4 </answer>
    </question>
    <question> another question
      <answer> answer1 </answer>
      <answer> answer2 </answer>
      <answer> answer3 </answer>
      <answer> answer4 </answer>
    </question>
 </puzzle>
 

First you need to build a ReaderIteratorFactory which will provide java.io.Readers over all the files in your directory:

 Collection c = new FileSequentialCollection("/u/nlp/data/gre/questions/", "", false);
 ReaderIteratorFactory rif = new ReaderIteratorFactory(c);
 

Next you need to make an IteratorFromReaderFactory which will take the java.io.Readers vended by the ReaderIteratorFactory, split them up into documents (Strings), and then convert the Strings into Objects. In this case we want to keep everything between each pair of <puzzle> and </puzzle> tags, so we would use a BeginEndTokenizerFactory. You would also need to write a class which implements Function and whose apply method converts the String between the tags into a Puzzle object.

 public class PuzzleParser implements Function {
   public Object apply (Object o) {
     String s = (String)o;
     ...
     Puzzle p = new Puzzle(...);
     ...
     return p;
   }
 }
 

Now to build the IteratorFromReaderFactory:

 IteratorFromReaderFactory rtif = new BeginEndTokenizerFactory("<puzzle>", "</puzzle>", new PuzzleParser());
 

Now, to create your ObjectBank you just give it the ReaderIteratorFactory and IteratorFromReaderFactory that you just created:

 ObjectBank puzzles = new ObjectBank(rif, rtif);
 

Now, if you get a new set of puzzles that are located elsewhere and formatted differently, you create a new ObjectBank for reading them in and use that ObjectBank instead, with only trivial changes to your code (or possibly none at all if the ObjectBank is passed in through a constructor). Even better, if someone else wants to use your code to evaluate their puzzles, which are located elsewhere and formatted differently, they already know what they have to do to make your code work for them.

Author:
Jenny Finkel
See Also:
Serialized Form

Field Summary
protected  IteratorFromReaderFactory<E> ifrf
           
protected  ReaderIteratorFactory rif
           
 
Constructor Summary
ObjectBank(ReaderIteratorFactory rif, IteratorFromReaderFactory<E> ifrf)
          This creates a new ObjectBank with the given ReaderIteratorFactory and IteratorFromReaderFactory.
 
Method Summary
 boolean add(E o)
          Unsupported Operation.
 boolean addAll(java.util.Collection<? extends E> c)
          Unsupported Operation.
 void clear()
           
 void clearMemory()
          If you are keeping the contents in memory, this will clear the memory, and they will be recomputed the next time iterator() is called.
 boolean contains(java.lang.Object o)
          Can be slow.
 boolean containsAll(java.util.Collection<?> c)
          Can be slow.
static <X> ObjectBank<X> getLineIterator(java.util.Collection<?> filesStringsAndReaders, Function<java.lang.String,X> op)
           
static <X> ObjectBank<X> getLineIterator(java.util.Collection<?> filesStringsAndReaders, Function<java.lang.String,X> op, java.lang.String encoding)
           
static ObjectBank<java.lang.String> getLineIterator(java.io.File file)
           
static <X> ObjectBank<X> getLineIterator(java.io.File file, Function<java.lang.String,X> op)
           
static <X> ObjectBank<X> getLineIterator(java.io.File file, Function<java.lang.String,X> op, java.lang.String encoding)
           
static ObjectBank<java.lang.String> getLineIterator(java.io.File file, java.lang.String encoding)
           
static ObjectBank<java.lang.String> getLineIterator(java.io.Reader reader)
           
static <X> ObjectBank<X> getLineIterator(java.io.Reader reader, Function<java.lang.String,X> op)
           
static ObjectBank<java.lang.String> getLineIterator(java.lang.String filename)
           
static <X> ObjectBank<X> getLineIterator(java.lang.String filename, Function<java.lang.String,X> op)
           
 boolean isEmpty()
           
 java.util.Iterator<E> iterator()
           
 void keepInMemory(boolean keep)
          Tells the ObjectBank to store all of its contents in memory so that it doesn't have to be recomputed each time you iterate through it.
 boolean remove(java.lang.Object o)
          Unsupported Operation.
 boolean removeAll(java.util.Collection<?> c)
          Unsupported Operation.
 boolean retainAll(java.util.Collection<?> c)
          Unsupported Operation.
 int size()
          Can be slow.
 java.lang.Object[] toArray()
          Can be slow.
 <T> T[] toArray(T[] o)
          Can be slow.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface java.util.Collection
equals, hashCode
 

Field Detail

rif

protected ReaderIteratorFactory rif

ifrf

protected IteratorFromReaderFactory<E> ifrf
Constructor Detail

ObjectBank

public ObjectBank(ReaderIteratorFactory rif,
                  IteratorFromReaderFactory<E> ifrf)
This creates a new ObjectBank with the given ReaderIteratorFactory and IteratorFromReaderFactory.

Parameters:
rif - The ReaderIteratorFactory from which to get Readers
ifrf - The IteratorFromReaderFactory which turns java.io.Readers into Iterators of Objects
Method Detail

getLineIterator

public static ObjectBank<java.lang.String> getLineIterator(java.lang.String filename)

getLineIterator

public static <X> ObjectBank<X> getLineIterator(java.lang.String filename,
                                                Function<java.lang.String,X> op)

getLineIterator

public static ObjectBank<java.lang.String> getLineIterator(java.io.Reader reader)

getLineIterator

public static <X> ObjectBank<X> getLineIterator(java.io.Reader reader,
                                                Function<java.lang.String,X> op)

getLineIterator

public static ObjectBank<java.lang.String> getLineIterator(java.io.File file)

getLineIterator

public static <X> ObjectBank<X> getLineIterator(java.io.File file,
                                                Function<java.lang.String,X> op)

getLineIterator

public static ObjectBank<java.lang.String> getLineIterator(java.io.File file,
                                                           java.lang.String encoding)

getLineIterator

public static <X> ObjectBank<X> getLineIterator(java.io.File file,
                                                Function<java.lang.String,X> op,
                                                java.lang.String encoding)

getLineIterator

public static <X> ObjectBank<X> getLineIterator(java.util.Collection<?> filesStringsAndReaders,
                                                Function<java.lang.String,X> op)

getLineIterator

public static <X> ObjectBank<X> getLineIterator(java.util.Collection<?> filesStringsAndReaders,
                                                Function<java.lang.String,X> op,
                                                java.lang.String encoding)

iterator

public java.util.Iterator<E> iterator()
Specified by:
iterator in interface java.lang.Iterable<E>
Specified by:
iterator in interface java.util.Collection<E>

keepInMemory

public void keepInMemory(boolean keep)
Tells the ObjectBank to store all of its contents in memory so that it doesn't have to be recomputed each time you iterate through it. This is useful when the data is small enough that it can be kept in memory, but reading/processing it is expensive/slow. Defaults to false.

Parameters:
keep - Whether to keep contents in memory

clearMemory

public void clearMemory()
If you are keeping the contents in memory, this will clear the memory, and they will be recomputed the next time iterator() is called.
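
The interaction between keepInMemory, clearMemory, and iterator() can be pictured with the following minimal sketch. It is an illustration of the documented behavior only: CachingBankSketch, its loader argument, and the loads counter are hypothetical names, and the actual ObjectBank internals may be organized differently.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical illustration of the keepInMemory/clearMemory contract;
// the real ObjectBank internals may differ.
final class CachingBankSketch<E> implements Iterable<E> {
  private final Supplier<List<E>> loader; // stands in for the expensive read/parse step
  private boolean keep = false;           // mirrors the documented default of false
  private List<E> cache = null;           // contents held in memory, if any
  int loads = 0;                          // counts how often the loader actually ran

  CachingBankSketch(Supplier<List<E>> loader) {
    this.loader = loader;
  }

  void keepInMemory(boolean keep) {
    this.keep = keep;
  }

  // Drops the stored contents; the next iterator() call reloads them.
  void clearMemory() {
    cache = null;
  }

  @Override
  public Iterator<E> iterator() {
    // Serve from memory only when caching is on and something is stored.
    if (keep && cache != null) {
      return cache.iterator();
    }
    loads++;
    List<E> fresh = new ArrayList<>(loader.get());
    if (keep) {
      cache = fresh;
    }
    return fresh.iterator();
  }
}
```

With caching off, every call to iterator() in this sketch runs the loader again. With caching on, the loader runs once and later iterations reuse the stored list until clearMemory() is called.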


isEmpty

public boolean isEmpty()
Specified by:
isEmpty in interface java.util.Collection<E>

contains

public boolean contains(java.lang.Object o)
Can be slow. Usage not recommended.

Specified by:
contains in interface java.util.Collection<E>

containsAll

public boolean containsAll(java.util.Collection<?> c)
Can be slow. Usage not recommended.

Specified by:
containsAll in interface java.util.Collection<E>

size

public int size()
Can be slow. Usage not recommended.

Specified by:
size in interface java.util.Collection<E>

clear

public void clear()
Specified by:
clear in interface java.util.Collection<E>

toArray

public java.lang.Object[] toArray()
Can be slow. Usage not recommended.

Specified by:
toArray in interface java.util.Collection<E>

toArray

public <T> T[] toArray(T[] o)
Can be slow. Usage not recommended.

Specified by:
toArray in interface java.util.Collection<E>

add

public boolean add(E o)
Unsupported Operation. If you wish to add a new data source, do so in the underlying ReaderIteratorFactory.

Specified by:
add in interface java.util.Collection<E>

remove

public boolean remove(java.lang.Object o)
Unsupported Operation. If you wish to remove a data source, do so in the underlying ReaderIteratorFactory.

Specified by:
remove in interface java.util.Collection<E>

addAll

public boolean addAll(java.util.Collection<? extends E> c)
Unsupported Operation. If you wish to add new data sources, do so in the underlying ReaderIteratorFactory.

Specified by:
addAll in interface java.util.Collection<E>

removeAll

public boolean removeAll(java.util.Collection<?> c)
Unsupported Operation. If you wish to remove data sources, do so in the underlying ReaderIteratorFactory.

Specified by:
removeAll in interface java.util.Collection<E>

retainAll

public boolean retainAll(java.util.Collection<?> c)
Unsupported Operation. If you wish to retain only certain data sources, do so in the underlying ReaderIteratorFactory.

Specified by:
retainAll in interface java.util.Collection<E>
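
The method details above share a pattern: a Collection that is backed only by an iterator rejects modification, and can answer size() only by walking all of its contents. The sketch below shows that pattern with java.util.AbstractCollection. ReadOnlyBankSketch and addIsRejected are hypothetical names, and this is not the ObjectBank source.

```java
import java.util.AbstractCollection;
import java.util.Iterator;
import java.util.List;

// Hypothetical read-only, iterator-backed collection; not the ObjectBank source.
final class ReadOnlyBankSketch<E> extends AbstractCollection<E> {
  private final List<E> source; // stands in for data that is re-read on every iteration

  ReadOnlyBankSketch(List<E> source) {
    this.source = source;
  }

  @Override
  public Iterator<E> iterator() {
    final Iterator<E> it = source.iterator();
    // Wrap the iterator so remove() is not forwarded to the source;
    // Iterator's default remove() throws UnsupportedOperationException.
    return new Iterator<E>() {
      @Override public boolean hasNext() { return it.hasNext(); }
      @Override public E next() { return it.next(); }
    };
  }

  // With no stored count, size() has to walk the whole iterator,
  // which is why it can be slow on large or expensive sources.
  @Override
  public int size() {
    int n = 0;
    for (Iterator<E> it = iterator(); it.hasNext(); it.next()) {
      n++;
    }
    return n;
  }

  // Returns true when add() is rejected. add() is inherited from
  // AbstractCollection, whose implementation always throws.
  static boolean addIsRejected(ReadOnlyBankSketch<String> bank) {
    try {
      bank.add("x");
      return false;
    } catch (UnsupportedOperationException e) {
      return true;
    }
  }
}
```

In this sketch contains() is also inherited from AbstractCollection and scans the iterator element by element, which matches the "Can be slow" notes on the query methods.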


Stanford NLP Group