edu.stanford.nlp.objectbank
Class ObjectBank<E>

java.lang.Object
  extended by edu.stanford.nlp.objectbank.ObjectBank<E>
All Implemented Interfaces:
Serializable, Iterable<E>, Collection<E>
Direct Known Subclasses:
ObjectBankWrapper

public class ObjectBank<E>
extends Object
implements Collection<E>, Serializable

The ObjectBank class is designed to make it easy to change the format or source of data read in by other classes, and to standardize how data is read in JavaNLP classes. This should make existing code easier to reuse by people who did not write it: one just creates a new ObjectBank that knows where to look for the data and how to turn it into Objects, and then uses the new ObjectBank in the class. It also makes it easier to reuse the code that reads in the same data.

An ObjectBank is a Collection of Objects. These objects are taken from input sources and then tokenized and parsed into the desired kind of Object. An ObjectBank requires a ReaderIteratorFactory and an IteratorFromReaderFactory. The ReaderIteratorFactory is used to get an Iterator over java.io.Readers which contain representations of the Objects. A ReaderIteratorFactory resembles a collection that takes input sources and dispenses Iterators over java.io.Readers of those sources. An IteratorFromReaderFactory is used to turn a single java.io.Reader into an Iterator over Objects. The IteratorFromReaderFactory splits the contents of the java.io.Reader into Strings and then parses them into appropriate Objects.
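The two-stage design described above can be sketched with only the JDK. This is a minimal illustration, not the Stanford implementation: MiniBank, lineLengths and demo are hypothetical names, and java.util.function.Supplier and Function stand in for ReaderIteratorFactory and IteratorFromReaderFactory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical stand-in for the ObjectBank idea; not the Stanford class. */
class MiniBank<E> implements Iterable<E> {
  // Stage 1 (assumed analogue of ReaderIteratorFactory): a fresh Iterator over Readers.
  private final Supplier<Iterator<Reader>> readers;
  // Stage 2 (assumed analogue of IteratorFromReaderFactory): one Reader to many objects.
  private final Function<Reader, Iterator<E>> parser;

  MiniBank(Supplier<Iterator<Reader>> readers, Function<Reader, Iterator<E>> parser) {
    this.readers = readers;
    this.parser = parser;
  }

  @Override
  public Iterator<E> iterator() {
    final Iterator<Reader> sources = readers.get();
    return new Iterator<E>() {
      private Iterator<E> current = null;

      @Override
      public boolean hasNext() {
        // Move on to the next source until one still has an object to give.
        while ((current == null || !current.hasNext()) && sources.hasNext()) {
          current = parser.apply(sources.next());
        }
        return current != null && current.hasNext();
      }

      @Override
      public E next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        return current.next();
      }
    };
  }

  /** Example stage 2: one object per line, here simply the line's length. */
  static Iterator<Integer> lineLengths(Reader r) {
    List<Integer> out = new ArrayList<>();
    try (BufferedReader br = new BufferedReader(r)) {
      String line;
      while ((line = br.readLine()) != null) {
        out.add(line.length());
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return out.iterator();
  }

  /** Reads two in-memory "files" and collects every parsed object in order. */
  static List<Integer> demo() {
    List<String> sources = Arrays.asList("ab\ncde\n", "f\n");
    MiniBank<Integer> bank = new MiniBank<>(
        () -> sources.stream().map(s -> (Reader) new StringReader(s)).iterator(),
        MiniBank::lineLengths);
    List<Integer> result = new ArrayList<>();
    for (Integer n : bank) {
      result.add(n);
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

Swapping the data source only changes the first constructor argument, and swapping the format only changes the second, which is the separation the class description argues for.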

Example Usage:

You have a collection of files in the directory /u/nlp/data/gre/questions. Each file contains several Puzzle documents which look like:

 <puzzle>
    <preamble> some text </preamble>
    <question> some intro text
      <answer> answer1 </answer>
      <answer> answer2 </answer>
      <answer> answer3 </answer>
      <answer> answer4 </answer>
    </question>
    <question> another question
      <answer> answer1 </answer>
      <answer> answer2 </answer>
      <answer> answer3 </answer>
      <answer> answer4 </answer>
    </question>
 </puzzle>
 

First you need to build a ReaderIteratorFactory which will provide java.io.Readers over all the files in your directory:

 Collection c = new FileSequentialCollection("/u/nlp/data/gre/questions/", "", false);
 ReaderIteratorFactory rif = new ReaderIteratorFactory(c);
 

Next you need to make an IteratorFromReaderFactory which will take the java.io.Readers vended by the ReaderIteratorFactory, split them up into documents (Strings) and then convert the Strings into Objects. In this case we want to keep everything between each set of tags, so we would use a BeginEndTokenizerFactory. You would also need to write a class which implements Function and whose apply method converts the String between the tags into a Puzzle object.

 public class PuzzleParser implements Function {
   public Object apply (Object o) {
     String s = (String)o;
     ...
     Puzzle p = new Puzzle(...);
     ...
     return p;
   }
 }
 

Now to build the IteratorFromReaderFactory:

 IteratorFromReaderFactory rtif = new BeginEndTokenizerFactory("", "", new PuzzleParser());
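To make the begin/end splitting step concrete, here is a rough JDK-only sketch of the idea. It is not BeginEndTokenizerFactory itself: BeginEndSplitSketch, split and demo are hypothetical names, and keeping the delimiters inside each chunk is an assumption of this sketch rather than documented behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical illustration of begin/end chunking; not the Stanford class. */
class BeginEndSplitSketch {

  /**
   * Finds every chunk that starts with begin and ends with end, and passes
   * each chunk (delimiters included, by assumption) to the parse function.
   */
  static <T> List<T> split(String text, String begin, String end, Function<String, T> parse) {
    if (begin.isEmpty() || end.isEmpty()) {
      throw new IllegalArgumentException("delimiters must be non-empty");
    }
    List<T> out = new ArrayList<>();
    int from = 0;
    while (true) {
      int start = text.indexOf(begin, from);
      if (start < 0) {
        break; // no further chunk
      }
      int stop = text.indexOf(end, start + begin.length());
      if (stop < 0) {
        break; // unterminated chunk: ignored in this sketch
      }
      stop += end.length();
      out.add(parse.apply(text.substring(start, stop)));
      from = stop;
    }
    return out;
  }

  /** Splits a small in-memory document; the identity function stands in for a parser. */
  static List<String> demo() {
    String text = "<puzzle>a</puzzle> ignored <puzzle>bc</puzzle>";
    return split(text, "<puzzle>", "</puzzle>", s -> s);
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

In the example from the text, the parse argument would play the role of PuzzleParser, turning each chunk into a Puzzle.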
 

Now, to create your ObjectBank you just give it the ReaderIteratorFactory and IteratorFromReaderFactory that you just created:

 ObjectBank puzzles = new ObjectBank(rif, rtif);
 

Now, if you get a new set of puzzles that are located elsewhere and formatted differently, you create a new ObjectBank for reading them in and use that ObjectBank instead, with only trivial changes to your code (or possibly none at all if the ObjectBank is passed in through a constructor). Or even better, if someone else wants to use your code to evaluate their puzzles, which are located elsewhere and formatted differently, they already know what they have to do to make your code work for them.

TODO: There's still tricky generic stuff to get right here: toArray should take an arg of a different generic type if we follow the Collections API, and the OBIterator doesn't seem to do the generic typing right. Should it rather be F extends E? [cdm notes, Sep 2007]

Author:
Jenny Finkel
See Also:
Serialized Form

Field Summary
protected  IteratorFromReaderFactory ifrf
           
protected  ReaderIteratorFactory rif
           
 
Constructor Summary
ObjectBank(ReaderIteratorFactory rif, IteratorFromReaderFactory<E> ifrf)
          This creates a new ObjectBank with the given ReaderIteratorFactory and IteratorFromReaderFactory.
 
Method Summary
 boolean add(E o)
          Unsupported Operation.
 boolean addAll(Collection<? extends E> c)
          Unsupported Operation.
 void clear()
           
 void clearMemory()
          If you are keeping the contents in memory, this will clear the memory, and they will be recomputed the next time iterator() is called.
 boolean contains(Object o)
          Can be slow.
 boolean containsAll(Collection<?> c)
          Can be slow.
static
<X> ObjectBank<X>
getLineIteratorObjectBank(Collection files, Function<String,X> op)
           
static
<X> ObjectBank<X>
getLineIteratorObjectBank(Collection files, Function<String,X> op, String encoding)
           
static ObjectBank<String> getLineIteratorObjectBank(String filename)
           
static
<X> ObjectBank<X>
getLineIteratorObjectBank(String fileOrString, Function<String,X> op)
           
static ObjectBank<String> getLineIteratorObjectBank(String filename, String encoding)
           
 boolean isEmpty()
           
 Iterator<E> iterator()
           
 void keepInMemory(boolean keep)
          Tells the ObjectBank to store all of its contents in memory so that it doesn't have to be recomputed each time you iterate through it.
 boolean remove(Object o)
          Unsupported Operation.
 boolean removeAll(Collection<?> c)
          Unsupported Operation.
 boolean retainAll(Collection<?> c)
          Unsupported Operation.
 int size()
          Can be slow.
 Object[] toArray()
          Can be slow.
<E> E[]
toArray(E[] o)
          Can be slow.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface java.util.Collection
equals, hashCode
 

Field Detail

rif

protected ReaderIteratorFactory rif

ifrf

protected IteratorFromReaderFactory ifrf
Constructor Detail

ObjectBank

public ObjectBank(ReaderIteratorFactory rif,
                  IteratorFromReaderFactory<E> ifrf)
This creates a new ObjectBank with the given ReaderIteratorFactory and IteratorFromReaderFactory.

Parameters:
rif - The ReaderIteratorFactory from which to get Readers
ifrf - The IteratorFromReaderFactory which turns java.io.Readers into Iterators of Objects
Method Detail

getLineIteratorObjectBank

public static <X> ObjectBank<X> getLineIteratorObjectBank(String fileOrString,
                                                          Function<String,X> op)

getLineIteratorObjectBank

public static <X> ObjectBank<X> getLineIteratorObjectBank(Collection files,
                                                          Function<String,X> op)

getLineIteratorObjectBank

public static <X> ObjectBank<X> getLineIteratorObjectBank(Collection files,
                                                          Function<String,X> op,
                                                          String encoding)

getLineIteratorObjectBank

public static ObjectBank<String> getLineIteratorObjectBank(String filename,
                                                           String encoding)

getLineIteratorObjectBank

public static ObjectBank<String> getLineIteratorObjectBank(String filename)
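The signatures above carry no description, so the following is only a guess at the pattern they suggest: decode a source with a named encoding, treat each line as one item, and pass each line through op. The sketch uses the JDK alone; LineMapSketch, mapLines and demo are hypothetical names, and java.util.function.Function stands in for whichever Function type the library uses.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical line-per-object reader; not the Stanford implementation. */
class LineMapSketch {

  /** Decodes the stream with the named encoding and applies op to every line. */
  static <X> List<X> mapLines(InputStream in, String encoding, Function<String, X> op)
      throws IOException {
    List<X> out = new ArrayList<>();
    try (BufferedReader br = new BufferedReader(new InputStreamReader(in, encoding))) {
      String line;
      while ((line = br.readLine()) != null) {
        out.add(op.apply(line));
      }
    }
    return out;
  }

  /** Maps three UTF-8 encoded lines to their lengths in characters. */
  static List<Integer> demo() {
    byte[] bytes = "1\n22\n\u00fcber\n".getBytes(StandardCharsets.UTF_8);
    try {
      return mapLines(new ByteArrayInputStream(bytes), "UTF-8", String::length);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```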

iterator

public Iterator<E> iterator()
Specified by:
iterator in interface Iterable<E>
Specified by:
iterator in interface Collection<E>

keepInMemory

public void keepInMemory(boolean keep)
Tells the ObjectBank to store all of its contents in memory so that it doesn't have to be recomputed each time you iterate through it. This is useful when the data is small enough that it can be kept in memory, but reading/processing it is expensive/slow. Defaults to false.


clearMemory

public void clearMemory()
If you are keeping the contents in memory, this will clear the memory, and they will be recomputed the next time iterator() is called.
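The interplay of keepInMemory and clearMemory can be pictured with a small cache in front of a recomputing source. This is a sketch under assumptions, not the actual ObjectBank code: CachingIterableSketch, drain and demo are hypothetical, and the choice to keep an existing cache when keepInMemory(false) is called is arbitrary here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical cache-on-demand Iterable; not the Stanford implementation. */
class CachingIterableSketch<E> implements Iterable<E> {
  private final Supplier<Iterator<E>> source; // recomputes the contents on every call
  private boolean keep = false;               // mirrors the documented default of false
  private List<E> cache = null;

  CachingIterableSketch(Supplier<Iterator<E>> source) {
    this.source = source;
  }

  void keepInMemory(boolean keep) {
    this.keep = keep;
  }

  void clearMemory() {
    cache = null; // the next iterator() call recomputes the contents
  }

  @Override
  public Iterator<E> iterator() {
    if (!keep) {
      return source.get();
    }
    if (cache == null) {
      cache = new ArrayList<>();
      source.get().forEachRemaining(cache::add);
    }
    return Collections.unmodifiableList(cache).iterator();
  }

  private static void drain(Iterable<?> items) {
    for (Object ignored : items) {
      // consume every element
    }
  }

  /** Returns how many times the source had been read after each phase. */
  static List<Integer> demo() {
    int[] calls = {0};
    CachingIterableSketch<String> bank = new CachingIterableSketch<>(() -> {
      calls[0]++;
      return Arrays.asList("a", "b").iterator();
    });
    List<Integer> counts = new ArrayList<>();
    drain(bank);
    drain(bank);
    counts.add(calls[0]); // uncached: one read per pass
    bank.keepInMemory(true);
    drain(bank);
    drain(bank);
    counts.add(calls[0]); // cached: one extra read fills the cache
    bank.clearMemory();
    drain(bank);
    counts.add(calls[0]); // cache cleared: contents recomputed once more
    return counts;
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```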


isEmpty

public boolean isEmpty()
Specified by:
isEmpty in interface Collection<E>

contains

public boolean contains(Object o)
Can be slow. Usage not recommended.

Specified by:
contains in interface Collection<E>

containsAll

public boolean containsAll(Collection<?> c)
Can be slow. Usage not recommended.

Specified by:
containsAll in interface Collection<E>

size

public int size()
Can be slow. Usage not recommended.

Specified by:
size in interface Collection<E>

clear

public void clear()
Specified by:
clear in interface Collection<E>

toArray

public Object[] toArray()
Can be slow. Usage not recommended.

Specified by:
toArray in interface Collection<E>

toArray

public <E> E[] toArray(E[] o)
Can be slow. Usage not recommended.

Specified by:
toArray in interface Collection<E>
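A plausible reason for the "can be slow" warnings, stated here as an assumption: when contents are produced by iteration rather than stored, answering size or contains needs a pass over the sources. The JDK-only sketch below shows the cost on any Iterable; ScanCostSketch and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

/** Hypothetical helpers showing why whole-collection queries need a scan. */
class ScanCostSketch {

  /** Counts elements by walking the whole Iterable: linear in the data read. */
  static <E> int size(Iterable<E> items) {
    int n = 0;
    for (E ignored : items) {
      n++;
    }
    return n;
  }

  /** Scans until a match is found; a miss costs a complete pass. */
  static <E> boolean contains(Iterable<E> items, Object target) {
    for (E item : items) {
      if (Objects.equals(item, target)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    List<String> items = Arrays.asList("a", "b", "c");
    System.out.println(size(items));
    System.out.println(contains(items, "b"));
  }
}
```

When several such answers are needed, collecting them in a single for-each pass avoids re-reading the sources for each question.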

add

public boolean add(E o)
Unsupported Operation. If you wish to add a new data source, do so in the underlying ReaderIteratorFactory.

Specified by:
add in interface Collection<E>

remove

public boolean remove(Object o)
Unsupported Operation. If you wish to remove a data source, do so in the underlying ReaderIteratorFactory.

Specified by:
remove in interface Collection<E>

addAll

public boolean addAll(Collection<? extends E> c)
Unsupported Operation. If you wish to add new data sources, do so in the underlying ReaderIteratorFactory.

Specified by:
addAll in interface Collection<E>

removeAll

public boolean removeAll(Collection<?> c)
Unsupported Operation. If you wish to remove data sources, do so in the underlying ReaderIteratorFactory.

Specified by:
removeAll in interface Collection<E>

retainAll

public boolean retainAll(Collection<?> c)
Unsupported Operation. If you wish to retain only certain data sources, do so in the underlying ReaderIteratorFactory.

Specified by:
retainAll in interface Collection<E>
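For callers, a read-only collection of this kind behaves like the JDK's own unmodifiable views. The sketch below assumes the unsupported operations signal failure with UnsupportedOperationException, as the java.util.Collection contract describes for optional operations; the page itself does not say which exception is thrown. ReadOnlyBankSketch and tryAdd are hypothetical names.

```java
import java.util.AbstractCollection;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

/** Hypothetical read-only collection; not the Stanford implementation. */
class ReadOnlyBankSketch<E> extends AbstractCollection<E> {
  private final List<E> data;

  ReadOnlyBankSketch(List<E> data) {
    this.data = data;
  }

  @Override
  public Iterator<E> iterator() {
    // The unmodifiable view also rejects Iterator.remove().
    return Collections.unmodifiableList(data).iterator();
  }

  @Override
  public int size() {
    return data.size();
  }

  // add(E) is inherited from AbstractCollection, which throws
  // UnsupportedOperationException unless a subclass overrides it.

  /** Returns true if the element was added, false if the collection refuses additions. */
  static <T> boolean tryAdd(Collection<T> target, T item) {
    try {
      return target.add(item);
    } catch (UnsupportedOperationException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    Collection<String> bank = new ReadOnlyBankSketch<>(Arrays.asList("a"));
    System.out.println(tryAdd(bank, "b"));
  }
}
```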


Stanford NLP Group