edu.stanford.nlp.objectbank
Class ObjectBank<E>

java.lang.Object
  extended by edu.stanford.nlp.objectbank.ObjectBank<E>
All Implemented Interfaces:
Serializable, Iterable<E>, Collection<E>

public class ObjectBank<E>
extends Object
implements Collection<E>, Serializable

The ObjectBank class is designed to make it easy to change the format or source of data read in by other classes, and to standardize how data is read in javaNLP classes. This should make it easier for people other than the original authors to reuse existing code: one only has to create a new ObjectBank that knows where to look for the data and how to turn it into Objects, and then use the new ObjectBank in the class. It also makes it easier to reuse code for reading in the same data.

An ObjectBank is a Collection of Objects. These objects are taken from input sources and then tokenized and parsed into the desired kind of Object. An ObjectBank requires a ReaderIteratorFactory and an IteratorFromReaderFactory. The ReaderIteratorFactory is used to get an Iterator over java.io.Readers which contain representations of the Objects. A ReaderIteratorFactory resembles a collection that takes input sources and dispenses Iterators over java.io.Readers of those sources. An IteratorFromReaderFactory is used to turn a single java.io.Reader into an Iterator over Objects. The IteratorFromReaderFactory splits the contents of the java.io.Reader into Strings and then parses them into appropriate Objects.
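The two-stage design described above can be sketched with JDK types only. This is a hypothetical stand-in, not the javaNLP implementation: the class name TwoStageBankSketch, the use of Supplier<Reader> in place of a ReaderIteratorFactory, and the use of java.util.function.Function in place of an IteratorFromReaderFactory are all assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical stand-in for the two-factory design; not the javaNLP classes.
class TwoStageBankSketch<E> implements Iterable<E> {
  // Stage 1: each supplier opens a fresh Reader over one input source.
  private final List<Supplier<Reader>> sources;
  // Stage 2: turns a single Reader into an Iterator over parsed objects.
  private final Function<Reader, Iterator<E>> readerToObjects;

  TwoStageBankSketch(List<Supplier<Reader>> sources,
                     Function<Reader, Iterator<E>> readerToObjects) {
    this.sources = sources;
    this.readerToObjects = readerToObjects;
  }

  @Override
  public Iterator<E> iterator() {
    // Collected eagerly to keep the sketch short; a lazy chain is also possible.
    List<E> out = new ArrayList<>();
    for (Supplier<Reader> source : sources) {
      Iterator<E> it = readerToObjects.apply(source.get());
      while (it.hasNext()) {
        out.add(it.next());
      }
    }
    return out.iterator();
  }

  // Example stage 2: every non-empty line is parsed into one Integer.
  static Iterator<Integer> intsPerLine(Reader r) {
    return new BufferedReader(r).lines()
        .map(String::trim)
        .filter(s -> !s.isEmpty())
        .map(Integer::valueOf)
        .iterator();
  }

  static List<Integer> demo() {
    List<Supplier<Reader>> sources = new ArrayList<>();
    sources.add(() -> new StringReader("1\n2\n"));
    sources.add(() -> new StringReader("3\n"));
    TwoStageBankSketch<Integer> bank =
        new TwoStageBankSketch<Integer>(sources, TwoStageBankSketch::intsPerLine);
    List<Integer> result = new ArrayList<>();
    for (Integer i : bank) {
      result.add(i);
    }
    return result;
  }
}
```

Because each source is opened afresh inside iterator(), the sketch can be iterated more than once, which is the property a Collection-style view over files needs.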

Example Usages:

The general case is covered below, but the most common thing people actually want to do is read lines from a file. The getLineIterator methods make this easy. In its simplest use, getLineIterator returns an ObjectBank<String>, which implements Collection<String>. So, statements like these work:
for (String str : ObjectBank.getLineIterator(filename)) {
  System.out.println(str);
}

String[] strings = ObjectBank.getLineIterator(filename).toArray(new String[0]);

String[] strings = ObjectBank.getLineIterator(filename, "GB18030").toArray(new String[0]);
More complex uses of getLineIterator let you interpret each line of a file as an object of arbitrary type via a transformer Function.
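The idea of a per-line transformer can be illustrated without the library. The sketch below is an assumption-laden stand-in: LineTransformSketch and readLines are hypothetical names, and java.util.function.Function is used in place of whichever Function type getLineIterator actually accepts in your javaNLP version.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration of "one line in, one object out".
class LineTransformSketch {
  // Reads every line from the reader and applies op to each one.
  static <X> List<X> readLines(Reader reader, Function<String, X> op) {
    List<X> out = new ArrayList<>();
    try (BufferedReader br = new BufferedReader(reader)) {
      String line;
      while ((line = br.readLine()) != null) {
        out.add(op.apply(line));
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return out;
  }

  // Turns each line into its whitespace-separated token count.
  static List<Integer> demo() {
    return readLines(new StringReader("alpha\nbeta gamma\n"),
                     s -> s.split("\\s+").length);
  }
}
```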

As an example of the general power of this class, suppose you have a collection of files in the directory /u/nlp/data/gre/questions. Each file contains several Puzzle documents which look like:

 <puzzle>
    <preamble> some text </preamble>
    <question> some intro text
      <answer> answer1 </answer>
      <answer> answer2 </answer>
      <answer> answer3 </answer>
      <answer> answer4 </answer>
    </question>
    <question> another question
      <answer> answer1 </answer>
      <answer> answer2 </answer>
      <answer> answer3 </answer>
      <answer> answer4 </answer>
    </question>
 </puzzle>
 

First you need to build a ReaderIteratorFactory which will provide java.io.Readers over all the files in your directory:

 Collection c = new FileSequentialCollection("/u/nlp/data/gre/questions/", "", false);
 ReaderIteratorFactory rif = new ReaderIteratorFactory(c);
 

Next you need to make an IteratorFromReaderFactory which will take the java.io.Readers vended by the ReaderIteratorFactory, split them up into documents (Strings) and then convert the Strings into Objects. In this case we want to keep everything between each pair of <puzzle> and </puzzle> tags, so we would use a BeginEndTokenizerFactory. You would also need to write a class which implements Function and whose apply method converts the String between the tags into a Puzzle object.

 public class PuzzleParser implements Function<String,Puzzle> {
   // Converts the text of one puzzle document into a Puzzle object.
   public Puzzle apply(String s) {
     ...
     Puzzle p = new Puzzle(...);
     ...
     return p;
   }
 }
 

Now to build the IteratorFromReaderFactory:

 IteratorFromReaderFactory rtif = new BeginEndTokenizerFactory("<puzzle>", "</puzzle>", new PuzzleParser());
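To make the begin/end splitting step concrete, here is a minimal JDK-only sketch. It is not BeginEndTokenizerFactory: the class BeginEndSplitSketch and its split method are hypothetical, and this page does not say whether the real factory keeps the delimiters in each String, so dropping them here is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration of splitting text on begin/end markers.
class BeginEndSplitSketch {
  // Applies op to the text found between each begin/end pair, in order.
  static <T> List<T> split(String text, String begin, String end,
                           Function<String, T> op) {
    List<T> out = new ArrayList<>();
    int from = 0;
    while (true) {
      int start = text.indexOf(begin, from);
      if (start < 0) {
        break;
      }
      int stop = text.indexOf(end, start + begin.length());
      if (stop < 0) {
        break; // unterminated block: ignored in this sketch
      }
      out.add(op.apply(text.substring(start + begin.length(), stop)));
      from = stop + end.length();
    }
    return out;
  }

  static List<String> demo() {
    String text = "<puzzle>first</puzzle> junk <puzzle> second </puzzle>";
    return split(text, "<puzzle>", "</puzzle>", String::trim);
  }
}
```

Text outside the markers ("junk" above) is discarded, which matches the stated goal of keeping only what lies between the tags.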
 

Now, to create your ObjectBank you just give it the ReaderIteratorFactory and IteratorFromReaderFactory that you just created:

 ObjectBank puzzles = new ObjectBank(rif, rtif);
 

Now, if you get a new set of puzzles that are located elsewhere and formatted differently, you create a new ObjectBank for reading them in and use that ObjectBank instead, with only trivial changes to your code (or possibly none at all if the ObjectBank is passed in through a constructor). Even better, if someone else wants to use your code to evaluate their puzzles, which are located elsewhere and formatted differently, they already know what they have to do to make your code work for them.

Author:
Jenny Finkel
See Also:
Serialized Form

Nested Class Summary
static class ObjectBank.PathToFileFunction
          This is handy for having getLineIterator return a collection of files for feeding into another ObjectBank.
 
Field Summary
protected  IteratorFromReaderFactory<E> ifrf
           
protected  ReaderIteratorFactory rif
           
 
Constructor Summary
ObjectBank(ReaderIteratorFactory rif, IteratorFromReaderFactory<E> ifrf)
          This creates a new ObjectBank with the given ReaderIteratorFactory and IteratorFromReaderFactory.
 
Method Summary
 boolean add(E o)
          Unsupported Operation.
 boolean addAll(Collection<? extends E> c)
          Unsupported Operation.
 void clear()
           
 void clearMemory()
          If you are keeping the contents in memory, this will clear the memory, and they will be recomputed the next time iterator() is called.
 boolean contains(Object o)
          Can be slow.
 boolean containsAll(Collection<?> c)
          Can be slow.
static
<X> ObjectBank<X>
getLineIterator(Collection<?> filesStringsAndReaders, Function<String,X> op)
           
static
<X> ObjectBank<X>
getLineIterator(Collection<?> filesStringsAndReaders, Function<String,X> op, String encoding)
           
static ObjectBank<String> getLineIterator(Collection<?> filesStringsAndReaders, String encoding)
           
static ObjectBank<String> getLineIterator(File file)
           
static
<X> ObjectBank<X>
getLineIterator(File file, Function<String,X> op)
           
static
<X> ObjectBank<X>
getLineIterator(File file, Function<String,X> op, String encoding)
           
static ObjectBank<String> getLineIterator(File file, String encoding)
           
static ObjectBank<String> getLineIterator(Reader reader)
           
static
<X> ObjectBank<X>
getLineIterator(Reader reader, Function<String,X> op)
           
static ObjectBank<String> getLineIterator(String filename)
           
static
<X> ObjectBank<X>
getLineIterator(String filename, Function<String,X> op)
           
static ObjectBank<String> getLineIterator(String filename, String encoding)
           
 boolean isEmpty()
           
 Iterator<E> iterator()
           
 void keepInMemory(boolean keep)
          Tells the ObjectBank to store all of its contents in memory so that it doesn't have to be recomputed each time you iterate through it.
 boolean remove(Object o)
          Unsupported Operation.
 boolean removeAll(Collection<?> c)
          Unsupported Operation.
 boolean retainAll(Collection<?> c)
          Unsupported Operation.
 int size()
          Can be slow.
 Object[] toArray()
          Can be slow.
<T> T[]
toArray(T[] o)
          Can be slow.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface java.util.Collection
equals, hashCode
 

Field Detail

rif

protected ReaderIteratorFactory rif

ifrf

protected IteratorFromReaderFactory<E> ifrf
Constructor Detail

ObjectBank

public ObjectBank(ReaderIteratorFactory rif,
                  IteratorFromReaderFactory<E> ifrf)
This creates a new ObjectBank with the given ReaderIteratorFactory and IteratorFromReaderFactory.

Parameters:
rif - The ReaderIteratorFactory from which to get Readers
ifrf - The IteratorFromReaderFactory which turns java.io.Readers into Iterators of Objects
Method Detail

getLineIterator

public static ObjectBank<String> getLineIterator(String filename)

getLineIterator

public static <X> ObjectBank<X> getLineIterator(String filename,
                                                Function<String,X> op)

getLineIterator

public static ObjectBank<String> getLineIterator(String filename,
                                                 String encoding)

getLineIterator

public static ObjectBank<String> getLineIterator(Reader reader)

getLineIterator

public static <X> ObjectBank<X> getLineIterator(Reader reader,
                                                Function<String,X> op)

getLineIterator

public static ObjectBank<String> getLineIterator(File file)

getLineIterator

public static <X> ObjectBank<X> getLineIterator(File file,
                                                Function<String,X> op)

getLineIterator

public static ObjectBank<String> getLineIterator(File file,
                                                 String encoding)

getLineIterator

public static <X> ObjectBank<X> getLineIterator(File file,
                                                Function<String,X> op,
                                                String encoding)

getLineIterator

public static <X> ObjectBank<X> getLineIterator(Collection<?> filesStringsAndReaders,
                                                Function<String,X> op)

getLineIterator

public static ObjectBank<String> getLineIterator(Collection<?> filesStringsAndReaders,
                                                 String encoding)

getLineIterator

public static <X> ObjectBank<X> getLineIterator(Collection<?> filesStringsAndReaders,
                                                Function<String,X> op,
                                                String encoding)

iterator

public Iterator<E> iterator()
Specified by:
iterator in interface Iterable<E>
Specified by:
iterator in interface Collection<E>

keepInMemory

public void keepInMemory(boolean keep)
Tells the ObjectBank to store all of its contents in memory so that it doesn't have to be recomputed each time you iterate through it. This is useful when the data is small enough that it can be kept in memory, but reading/processing it is expensive/slow. Defaults to false.

Parameters:
keep - Whether to keep contents in memory

clearMemory

public void clearMemory()
If you are keeping the contents in memory, this will clear the memory, and they will be recomputed the next time iterator() is called.
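
The interaction between keepInMemory and clearMemory can be sketched as a small caching Iterable. This is only an illustration of the documented behavior under stated assumptions: CachingBankSketch, its loader field, and the loads counter are hypothetical and say nothing about how ObjectBank stores its contents internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical illustration of optional in-memory caching.
class CachingBankSketch<E> implements Iterable<E> {
  private final Supplier<Iterator<E>> loader; // re-reads the underlying source
  private boolean keep = false;               // mirrors the documented default
  private List<E> cache = null;
  int loads = 0;                              // counts source reads, for the demo

  CachingBankSketch(Supplier<Iterator<E>> loader) {
    this.loader = loader;
  }

  void keepInMemory(boolean keep) {
    this.keep = keep;
    if (!keep) {
      cache = null; // nothing should stay cached once caching is off
    }
  }

  void clearMemory() {
    cache = null; // the next iterator() call recomputes the contents
  }

  @Override
  public Iterator<E> iterator() {
    if (keep && cache != null) {
      return cache.iterator();
    }
    loads++;
    Iterator<E> fresh = loader.get();
    if (!keep) {
      return fresh;
    }
    cache = new ArrayList<>();
    while (fresh.hasNext()) {
      cache.add(fresh.next());
    }
    return cache.iterator();
  }

  private static void drain(Iterable<?> items) {
    for (Object ignored : items) {
      // iterate only for the side effect of loading
    }
  }

  // Returns the load count after each phase: no cache, cached, after clear.
  static int[] demo() {
    List<String> data = Arrays.asList("a", "b");
    CachingBankSketch<String> bank = new CachingBankSketch<String>(data::iterator);
    drain(bank);
    drain(bank);
    int withoutCache = bank.loads;
    bank.keepInMemory(true);
    drain(bank);
    drain(bank);
    int withCache = bank.loads;
    bank.clearMemory();
    drain(bank);
    int afterClear = bank.loads;
    return new int[] {withoutCache, withCache, afterClear};
  }
}
```

In the demo, the source is read on every pass while caching is off, once while it is on, and once more after clearMemory.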


isEmpty

public boolean isEmpty()
Specified by:
isEmpty in interface Collection<E>

contains

public boolean contains(Object o)
Can be slow. Usage not recommended.

Specified by:
contains in interface Collection<E>

containsAll

public boolean containsAll(Collection<?> c)
Can be slow. Usage not recommended.

Specified by:
containsAll in interface Collection<E>

size

public int size()
Can be slow. Usage not recommended.

Specified by:
size in interface Collection<E>
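
One plausible reason for the "can be slow" notes on contains and size is that a collection backed by input sources has no stored element count or index, so answering either question means walking the data. The sketch below shows that cost with JDK types only; SlowSizeSketch and its methods are hypothetical, and whether ObjectBank works exactly this way is an assumption.

```java
import java.util.Iterator;
import java.util.Objects;

// Hypothetical illustration of size/contains computed by full iteration.
class SlowSizeSketch {
  // Counts elements by visiting every one of them: O(n) per call.
  static int sizeByIteration(Iterable<?> source) {
    int n = 0;
    for (Iterator<?> it = source.iterator(); it.hasNext(); it.next()) {
      n++;
    }
    return n;
  }

  // Scans until a match is found; a miss reads the whole source.
  static boolean containsByIteration(Iterable<?> source, Object target) {
    for (Object o : source) {
      if (Objects.equals(o, target)) {
        return true;
      }
    }
    return false;
  }
}
```

When the elements come from files that must be re-read and re-parsed, each such call repeats that work unless the contents are kept in memory.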

clear

public void clear()
Specified by:
clear in interface Collection<E>

toArray

public Object[] toArray()
Can be slow. Usage not recommended.

Specified by:
toArray in interface Collection<E>

toArray

public <T> T[] toArray(T[] o)
Can be slow. Usage not recommended.

Specified by:
toArray in interface Collection<E>

add

public boolean add(E o)
Unsupported Operation. If you wish to add a new data source, do so in the underlying ReaderIteratorFactory.

Specified by:
add in interface Collection<E>

remove

public boolean remove(Object o)
Unsupported Operation. If you wish to remove a data source, do so in the underlying ReaderIteratorFactory.

Specified by:
remove in interface Collection<E>

addAll

public boolean addAll(Collection<? extends E> c)
Unsupported Operation. If you wish to add new data sources, do so in the underlying ReaderIteratorFactory.

Specified by:
addAll in interface Collection<E>

removeAll

public boolean removeAll(Collection<?> c)
Unsupported Operation. If you wish to remove data sources, do so in the underlying ReaderIteratorFactory.

Specified by:
removeAll in interface Collection<E>

retainAll

public boolean retainAll(Collection<?> c)
Unsupported Operation. If you wish to retain only certain data sources, do so in the underlying ReaderIteratorFactory.

Specified by:
retainAll in interface Collection<E>
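
This page does not name the exception thrown by the unsupported operations. The java.util.Collection contract uses UnsupportedOperationException for optional operations, so callers may want to code defensively for that case. The sketch below shows the convention with a hypothetical read-only collection, ReadOnlyBankSketch, built on java.util.AbstractCollection; it is not ObjectBank itself.

```java
import java.util.AbstractCollection;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical read-only view, used only to show the JDK convention.
class ReadOnlyBankSketch<E> extends AbstractCollection<E> {
  private final List<E> backing;

  ReadOnlyBankSketch(List<E> backing) {
    this.backing = backing;
  }

  @Override
  public Iterator<E> iterator() {
    return backing.iterator();
  }

  @Override
  public int size() {
    return backing.size();
  }

  // AbstractCollection.add throws UnsupportedOperationException unless overridden.
  static boolean addIsRejected() {
    ReadOnlyBankSketch<String> bank =
        new ReadOnlyBankSketch<String>(Arrays.asList("a", "b"));
    try {
      bank.add("c");
      return false;
    } catch (UnsupportedOperationException expected) {
      return true;
    }
  }
}
```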


Stanford NLP Group