java.lang.Object
  edu.stanford.nlp.objectbank.ObjectBank<E>
public class ObjectBank<E>
The ObjectBank class is designed to make it easy to change the format/source of data read in by other classes and to standardize how data is read in javaNLP classes. This should make reuse of existing code (by non-authors of the code) easier because one has to just create a new ObjectBank which knows where to look for the data and how to turn it into Objects, and then use the new ObjectBank in the class. This will also make it easier to reuse code for reading in the same data.
An ObjectBank is a Collection of Objects. These objects are taken from input sources and then tokenized and parsed into the desired kind of Object. An ObjectBank requires a ReaderIteratorFactory and an IteratorFromReaderFactory. The ReaderIteratorFactory is used to get an Iterator over java.io.Readers which contain representations of the Objects; it resembles a collection that takes input sources and dispenses Iterators over java.io.Readers of those sources. An IteratorFromReaderFactory is used to turn a single java.io.Reader into an Iterator over Objects: it splits the contents of the java.io.Reader into Strings and then parses them into appropriate Objects.
For the common case of reading a file line by line, use the static getLineIterator method.
In its simplest use, it returns an ObjectBank<String>, which is a subtype of
Collection<String>. So, statements like these work:
for (String str : ObjectBank.getLineIterator(filename)) {
    System.out.println(str);
}
String[] strings = ObjectBank.getLineIterator(filename).toArray(new String[0]);
String[] strings = ObjectBank.getLineIterator(filename, "GB18030").toArray(new String[0]);
More complex uses of getLineIterator let you interpret each line of a file
as an object of arbitrary type via a transformer Function.
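The transformer idea can be sketched with the standard library alone. The class `LineMapper` and its method `mapLines` below are hypothetical stand-ins, not part of ObjectBank, and `java.util.function.Function` is assumed here in place of the library's own Function type. The sketch reads a Reader line by line and applies the function to each line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for the idea behind getLineIterator(reader, op);
// this is not the library's implementation.
class LineMapper {

    // Reads every line from the reader and converts it with op.
    static <X> List<X> mapLines(Reader reader, Function<String, X> op) {
        List<X> result = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(reader)) {
            String line;
            while ((line = in.readLine()) != null) {
                result.add(op.apply(line));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        // Interpret each line as an Integer instead of a String.
        List<Integer> numbers =
            mapLines(new StringReader("1\n22\n333\n"), line -> Integer.valueOf(line.trim()));
        System.out.println(numbers);
    }
}
```

Unlike this eager sketch, an ObjectBank is a Collection whose contents are produced when it is iterated, as described in the method summary for keepInMemory.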
As an example of the general power of this class, suppose you have
a collection of files in the directory /u/nlp/data/gre/questions. Each file
contains several Puzzle documents which look like:
<puzzle>
<preamble> some text </preamble>
<question> some intro text
<answer> answer1 </answer>
<answer> answer2 </answer>
<answer> answer3 </answer>
<answer> answer4 </answer>
</question>
<question> another question
<answer> answer1 </answer>
<answer> answer2 </answer>
<answer> answer3 </answer>
<answer> answer4 </answer>
</question>
</puzzle>
First you need to build a ReaderIteratorFactory which will provide java.io.Readers
over all the files in your directory:
Collection c = new FileSequentialCollection("/u/nlp/data/gre/questions/", "", false);
ReaderIteratorFactory rif = new ReaderIteratorFactory(c);
Next you need to make an IteratorFromReaderFactory which will take the
java.io.Readers vended by the ReaderIteratorFactory, split them up into
documents (Strings) and then convert the Strings into Objects. In this case
we want to keep everything between each <puzzle> and </puzzle> tag pair, so a
BeginEndTokenizerFactory fits. The conversion from String to Puzzle is done by
a Function such as:
public class PuzzleParser implements Function {
    public Object apply(Object o) {
        String s = (String) o;
        ...
        Puzzle p = new Puzzle(...);
        ...
        return p;
    }
}
Now to build the IteratorFromReaderFactory:
IteratorFromReaderFactory rtif = new BeginEndTokenizerFactory("<puzzle>", "</puzzle>", new PuzzleParser());
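As a rough illustration of begin/end splitting, the following standard-library sketch cuts a Reader's contents into chunks delimited by a begin and an end token and passes each chunk to a parser function. `BeginEndSplitter` and `split` are hypothetical names, not the library's BeginEndTokenizerFactory. The sketch assumes that the delimiters are kept in each chunk and that an unterminated trailing document is dropped; the real class may behave differently.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration of begin/end token splitting; not the
// library's BeginEndTokenizerFactory.
class BeginEndSplitter {

    // Applies op to each chunk that starts with begin and ends with end.
    static <T> List<T> split(Reader reader, String begin, String end, Function<String, T> op) {
        String text = readAll(reader);
        List<T> result = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = text.indexOf(begin, from);
            if (start < 0) break;
            int stop = text.indexOf(end, start + begin.length());
            if (stop < 0) break; // assumption: an unterminated document is ignored
            stop += end.length();
            result.add(op.apply(text.substring(start, stop)));
            from = stop;
        }
        return result;
    }

    // Reads the whole Reader into a String and closes it.
    private static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        try (Reader in = reader) {
            int n;
            while ((n = in.read(buf)) != -1) sb.append(buf, 0, n);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String data = "junk <puzzle>first</puzzle> more junk <puzzle>second</puzzle>";
        // The parser here only measures each document; a real one would build a Puzzle.
        List<Integer> lengths =
            split(new StringReader(data), "<puzzle>", "</puzzle>", String::length);
        System.out.println(lengths);
    }
}
```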
Now, to create your ObjectBank you just give it the ReaderIteratorFactory and
IteratorFromReaderFactory that you just created:
ObjectBank puzzles = new ObjectBank(rif, rtif);
Now, if you get a new set of puzzles that are located elsewhere and formatted differently, you create a new ObjectBank for reading them in and use that ObjectBank instead, with only trivial changes to your code (or possibly none at all if the ObjectBank is passed in through a constructor). Or even better, if someone else wants to use your code to evaluate their puzzles, which are located elsewhere and formatted differently, they already know what they have to do to make your code work for them.
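The two-factory design in this walk-through can be sketched in miniature with the standard library. `MiniBank` is a hypothetical class, not the library's ObjectBank: a list of Reader suppliers plays the ReaderIteratorFactory role, and a function from Reader to Iterator plays the IteratorFromReaderFactory role. Extending `java.util.AbstractCollection` makes the collection read-only, because its default `add` throws UnsupportedOperationException.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.AbstractCollection;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical miniature of the two-factory design; not the library's ObjectBank.
class MiniBank<E> extends AbstractCollection<E> {
    private final List<Supplier<Reader>> sources;              // ReaderIteratorFactory role
    private final Function<Reader, Iterator<E>> readerToItems; // IteratorFromReaderFactory role

    MiniBank(List<Supplier<Reader>> sources, Function<Reader, Iterator<E>> readerToItems) {
        this.sources = sources;
        this.readerToItems = readerToItems;
    }

    // Re-reads the sources on each call, opening one Reader at a time.
    @Override
    public Iterator<E> iterator() {
        final Iterator<Supplier<Reader>> readers = sources.iterator();
        return new Iterator<E>() {
            private Iterator<E> current = Collections.emptyIterator();

            @Override
            public boolean hasNext() {
                // Advance to the next source that still has items.
                while (!current.hasNext() && readers.hasNext()) {
                    current = readerToItems.apply(readers.next().get());
                }
                return current.hasNext();
            }

            @Override
            public E next() {
                if (!hasNext()) throw new NoSuchElementException();
                return current.next();
            }
        };
    }

    // Counting needs a full pass over all sources, so it can be slow.
    @Override
    public int size() {
        int n = 0;
        for (Iterator<E> it = iterator(); it.hasNext(); it.next()) n++;
        return n;
    }

    public static void main(String[] args) {
        List<Supplier<Reader>> sources = Arrays.asList(
            () -> new StringReader("a\nb\n"),
            () -> new StringReader("c\n"));
        MiniBank<String> bank =
            new MiniBank<>(sources, r -> new BufferedReader(r).lines().iterator());
        for (String s : bank) System.out.println(s);
        System.out.println(bank.size());
    }
}
```

Code that consumes the bank only sees a Collection, so pointing it at different sources or a different format means constructing it with different arguments and leaving the consuming loop unchanged.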
| Nested Class Summary | |
|---|---|
| static class | ObjectBank.PathToFileFunction: This is handy for having getLineIterator return a collection of files for feeding into another ObjectBank. |
| Field Summary | |
|---|---|
| protected IteratorFromReaderFactory<E> | ifrf |
| protected ReaderIteratorFactory | rif |
| Constructor Summary | |
|---|---|
| ObjectBank(ReaderIteratorFactory rif, IteratorFromReaderFactory<E> ifrf) | This creates a new ObjectBank with the given ReaderIteratorFactory and IteratorFromReaderFactory. |
| Method Summary | | |
|---|---|---|
| boolean | add(E o) | Unsupported Operation. |
| boolean | addAll(java.util.Collection<? extends E> c) | Unsupported Operation. |
| void | clear() | |
| void | clearMemory() | If you are keeping the contents in memory, this will clear the memory, and they will be recomputed the next time iterator() is called. |
| boolean | contains(java.lang.Object o) | Can be slow. |
| boolean | containsAll(java.util.Collection<?> c) | Can be slow. |
| static <X> ObjectBank<X> | getLineIterator(java.util.Collection<?> filesStringsAndReaders, Function<java.lang.String,X> op) | |
| static <X> ObjectBank<X> | getLineIterator(java.util.Collection<?> filesStringsAndReaders, Function<java.lang.String,X> op, java.lang.String encoding) | |
| static ObjectBank<java.lang.String> | getLineIterator(java.util.Collection<?> filesStringsAndReaders, java.lang.String encoding) | |
| static ObjectBank<java.lang.String> | getLineIterator(java.io.File file) | |
| static <X> ObjectBank<X> | getLineIterator(java.io.File file, Function<java.lang.String,X> op) | |
| static <X> ObjectBank<X> | getLineIterator(java.io.File file, Function<java.lang.String,X> op, java.lang.String encoding) | |
| static ObjectBank<java.lang.String> | getLineIterator(java.io.File file, java.lang.String encoding) | |
| static ObjectBank<java.lang.String> | getLineIterator(java.io.Reader reader) | |
| static <X> ObjectBank<X> | getLineIterator(java.io.Reader reader, Function<java.lang.String,X> op) | |
| static ObjectBank<java.lang.String> | getLineIterator(java.lang.String filename) | |
| static <X> ObjectBank<X> | getLineIterator(java.lang.String filename, Function<java.lang.String,X> op) | |
| static ObjectBank<java.lang.String> | getLineIterator(java.lang.String filename, java.lang.String encoding) | |
| boolean | isEmpty() | |
| java.util.Iterator<E> | iterator() | |
| void | keepInMemory(boolean keep) | Tells the ObjectBank to store all of its contents in memory so that it doesn't have to be recomputed each time you iterate through it. |
| boolean | remove(java.lang.Object o) | Unsupported Operation. |
| boolean | removeAll(java.util.Collection<?> c) | Unsupported Operation. |
| boolean | retainAll(java.util.Collection<?> c) | Unsupported Operation. |
| int | size() | Can be slow. |
| java.lang.Object[] | toArray() | |
| <T> T[] | toArray(T[] o) | Can be slow. |
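The keepInMemory and clearMemory descriptions above amount to an optional cache in front of a recomputing iterator. The following sketch shows one way such behaviour could be arranged; `CachingBank` and its `loader` field are hypothetical and do not reflect the library's actual implementation. The `size()` method also shows why that operation can be slow: without a cache it needs a full pass over the data.

```java
import java.util.AbstractCollection;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch of the keepInMemory/clearMemory behaviour described
// in the method summary; not the library's implementation.
class CachingBank<E> extends AbstractCollection<E> {
    private final Supplier<Iterator<E>> loader; // recomputes the contents from the sources
    private boolean keep = false;
    private List<E> cache = null;

    CachingBank(Supplier<Iterator<E>> loader) {
        this.loader = loader;
    }

    // When keep is true, the contents are stored on the next iterator() call.
    void keepInMemory(boolean keep) {
        this.keep = keep;
    }

    // Drops the stored contents; they are recomputed on the next iterator() call.
    void clearMemory() {
        cache = null;
    }

    @Override
    public Iterator<E> iterator() {
        if (!keep) return loader.get(); // recompute on every pass
        if (cache == null) {
            cache = new ArrayList<>();
            loader.get().forEachRemaining(cache::add);
        }
        return cache.iterator();
    }

    // Counts by iterating, which re-reads the sources unless they are cached.
    @Override
    public int size() {
        int n = 0;
        for (Iterator<E> it = iterator(); it.hasNext(); it.next()) n++;
        return n;
    }

    public static void main(String[] args) {
        int[] loads = {0}; // counts how often the sources are re-read
        CachingBank<String> bank = new CachingBank<>(() -> {
            loads[0]++;
            return Arrays.asList("x", "y").iterator();
        });
        bank.size();
        bank.size();             // two loads so far
        bank.keepInMemory(true);
        bank.size();
        bank.size();             // one more load, then served from memory
        System.out.println(loads[0]);
    }
}
```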
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Methods inherited from interface java.util.Collection |
|---|
equals, hashCode |
| Field Detail |
|---|
protected ReaderIteratorFactory rif
protected IteratorFromReaderFactory<E> ifrf
| Constructor Detail |
|---|
public ObjectBank(ReaderIteratorFactory rif,
IteratorFromReaderFactory<E> ifrf)
rif - The ReaderIteratorFactory from which to get Readers
ifrf - The IteratorFromReaderFactory which turns java.io.Readers into Iterators of Objects
| Method Detail |
|---|
public static ObjectBank<java.lang.String> getLineIterator(java.lang.String filename)
public static <X> ObjectBank<X> getLineIterator(java.lang.String filename,
Function<java.lang.String,X> op)
public static ObjectBank<java.lang.String> getLineIterator(java.lang.String filename,
java.lang.String encoding)
public static ObjectBank<java.lang.String> getLineIterator(java.io.Reader reader)
public static <X> ObjectBank<X> getLineIterator(java.io.Reader reader,
Function<java.lang.String,X> op)
public static ObjectBank<java.lang.String> getLineIterator(java.io.File file)
public static <X> ObjectBank<X> getLineIterator(java.io.File file,
Function<java.lang.String,X> op)
public static ObjectBank<java.lang.String> getLineIterator(java.io.File file,
java.lang.String encoding)
public static <X> ObjectBank<X> getLineIterator(java.io.File file,
Function<java.lang.String,X> op,
java.lang.String encoding)
public static <X> ObjectBank<X> getLineIterator(java.util.Collection<?> filesStringsAndReaders,
Function<java.lang.String,X> op)
public static ObjectBank<java.lang.String> getLineIterator(java.util.Collection<?> filesStringsAndReaders,
java.lang.String encoding)
public static <X> ObjectBank<X> getLineIterator(java.util.Collection<?> filesStringsAndReaders,
Function<java.lang.String,X> op,
java.lang.String encoding)
public java.util.Iterator<E> iterator()
iterator in interface java.lang.Iterable<E>
iterator in interface java.util.Collection<E>
public void keepInMemory(boolean keep)
keep - Whether to keep contents in memory
public void clearMemory()
public boolean isEmpty()
isEmpty in interface java.util.Collection<E>
public boolean contains(java.lang.Object o)
contains in interface java.util.Collection<E>
public boolean containsAll(java.util.Collection<?> c)
containsAll in interface java.util.Collection<E>
public int size()
size in interface java.util.Collection<E>
public void clear()
clear in interface java.util.Collection<E>
public java.lang.Object[] toArray()
toArray in interface java.util.Collection<E>
public <T> T[] toArray(T[] o)
toArray in interface java.util.Collection<E>
public boolean add(E o)
add in interface java.util.Collection<E>
public boolean remove(java.lang.Object o)
remove in interface java.util.Collection<E>
public boolean addAll(java.util.Collection<? extends E> c)
addAll in interface java.util.Collection<E>
public boolean removeAll(java.util.Collection<?> c)
removeAll in interface java.util.Collection<E>
public boolean retainAll(java.util.Collection<?> c)
retainAll in interface java.util.Collection<E>