Package edu.stanford.nlp.util

A collection of useful general-purpose utility classes.

Interface Summary
CoreMap Base type for all annotatable core objects.
Factory<T> A generified factory class which creates instances of a particular type.
FileProcessor Interface for a Visitor pattern for Files.
Filter<T> Filter is an interface for predicate objects which respond to the accept method.
Function<T1,T2> An interface for classes that act as a function transforming one object to another.
HasInterval<E extends Comparable<E>> HasInterval interface
Heap<E> Heap interface.
Index<E> Minimalist interface for implementations of Index.
PriorityQueue<E> A Set that also represents an ordering of its elements, and responds quickly to add(), changePriority(), removeFirst(), and getFirst() method calls.
Scored Scored: This is a simple interface that says that an object can answer requests for the score, or goodness of the object.
TypesafeMap<BASE> Type signature for a class that supports the basic operations required of a typesafe heterogeneous map.
TypesafeMap.Key<BASE,VALUE> Base type of keys for the map.
 

Class Summary
AbstractIterator<E> Iterator with remove() defined to throw an UnsupportedOperationException.
ArrayCoreMap Base implementation of CoreMap backed by Java Arrays.
ArrayHeap<E> Implements a heap as an ArrayList.
ArrayMap<K,V> Map backed by an Array.
ArrayUtils Static utility methods for operating on arrays.
Beam<T> Implements a finite beam, taking a comparator (default is ScoredComparator.ASCENDING_COMPARATOR, the MAX object according to the comparator is the one to be removed) and a beam size on construction (default is 100).
BinaryHeapPriorityQueue<E> PriorityQueue with explicit double priority values.
ByteStreamGobbler Stream gobbler that reads and writes bytes (can be used to gobble byte-based stdout from a process.exec into a file).
CollectionFactory<T> Factory for vending Collections.
CollectionFactory.ArrayListFactory<T>  
CollectionFactory.HashSetFactory<T>  
CollectionFactory.LinkedListFactory<T>  
CollectionFactory.SizedArrayListFactory<T>  
CollectionFactory.TreeSetFactory<T>  
CollectionUtils Collection of useful static methods for working with Collections.
CollectionValuedMap<K,V> Map from keys to Collections.
ConcatenationIterator<T> Iterator that represents the concatenation of two other iterators.
ConcurrentHashSet<E> A thin wrapper on a ConcurrentHashMap, turning it into a ConcurrentHashSet.
DeltaCollectionValuedMap<K,V> Implementation of CollectionValuedMap that appears to store an "original" map and changes to that map.
DeltaMap<K,V> A Map which wraps an original Map, and only stores the changes (deltas) from the original Map.
ErasureUtils Class to gather unsafe operations into one place.
FilePathProcessor The FilePathProcessor traverses a directory structure and applies the processFile method to files meeting some criterion.
FilteredIterator<T> Iterator that suppresses items in another iterator based on a filter function.
Filters Some simple implementations of the Filter interface.
FixedPrioritiesPriorityQueue<E> A priority queue based on a binary heap.
Generics A collection of utilities to make dealing with Java generics less painful and verbose.
HashableCoreMap An extension of ArrayCoreMap with an immutable set of key,value pairs that is used for equality and hashcode comparisons.
HashIndex<E> An Index is a collection that maps between an Object vocabulary and a contiguous non-negative integer index series beginning (inclusively) at 0.
IdentityHashSet<E> This class provides an IdentityHashMap-backed implementation of the Set interface.
Interner<T> For interning (canonicalizing) things.
Interval<E extends Comparable<E>> Represents an interval of a generic type E that is comparable.
IntPair  
IntQuadruple  
IntTriple  
IntTuple A tuple of int.
IntUni Just a single integer
MapFactory<K,V> A factory class for vending different sorts of Maps.
Maps Utilities for Maps, including inverting, composing, and support for list/set values.
MemoryMonitor Utilities for monitoring memory use, including peak memory use.
MemoryMonitor.PeakMemoryMonitor This class offers a simple way to track the peak memory used by a program.
MetaClass A meta class using Java's reflection library.
MetaClass.ClassFactory<T>  
MutableDouble A class for Double objects that you can change.
MutableInteger A class for Integer objects that you can change.
PaddedList<E> A PaddedList wraps another list, presenting an apparently infinite list by padding outside the real confines of the list with a default value.
Pair<T1,T2> Pair is a Class for holding mutable pairs of objects.
PropertiesUtils  
ReflectionLoading The goal of this class is to make it easier to load stuff by reflection.
ScoredComparator ScoredComparator allows one to compare Scored things.
ScoredObject<T> Wrapper class for holding a scored object
Sets Utilities for sets.
StreamGobbler Reads the output of a process started by Process.exec(). Adapted from: http://www.velocityreviews.com/forums/t130884-process-runtimeexec-causes-subprocess-hang.html
StringUtils StringUtils is a class for random String things, including output formatting and command line argument parsing.
SystemUtils Useful methods for running shell commands, getting the process ID, checking memory usage, etc.
SystemUtils.ProcessOutputStream Helper class that acts as an output stream to a process.
Timing A class for measuring how long things take.
Triple<T1,T2,T3> Class representing an ordered triple of objects, possibly typed.
XMLUtils Provides some utilities for dealing with XML files, both by properly parsing them and by using the methods of a desperate Perl hacker.
XMLUtils.XMLTag  
 

Enum Summary
Interval.RelType RelType gives the basic types of relations between two intervals
 

Exception Summary
HashableCoreMap.HashableCoreMapException An exception thrown when attempting to change the value associated with an (immutable) hash key in a HashableCoreMap.
MetaClass.ClassCreationException  
MetaClass.ConstructorNotFoundException  
ReflectionLoading.ReflectionLoadingException This class encapsulates all of the exceptions that can be thrown when loading something by reflection.
SystemUtils.ProcessException Runtime exception thrown by execute.
 

Package edu.stanford.nlp.util Description

A collection of useful general-purpose utility classes. Below is a selection of some of the most useful utility classes, along with a brief description and sample use. Consult the class comments for more details on any of these classes.

edu.stanford.nlp.util.Counter

Specialized Map for storing numeric counts for objects. Makes it easy to get/set/increment the count of an object and find the max/argmax. Also makes it easy to prune counts above/below a threshold and get the total count of all or a subset of the objects. Exposes a Comparator that can sort the keySet or entrySet by count.

Some useful methods: argmax, averageCount, comparator, incrementCount, keysAbove(threshold), max, normalize, totalCount

Example: generate a unigram language model for a Document, pruning words that occur fewer than 3 times:

Counter wordProbs = new Counter();
for(int i = 0; i < document.size(); i++) {
    wordProbs.incrementCount(document.get(i));
}

wordProbs.removeAll(wordProbs.keysBelow(3)); // prune low counts
wordProbs.normalize(); // convert to probability distribution

Example: find the Integer param that yields the best value of some computeScore method (that returns an int or double):

Counter paramScores = new Counter();
for(int param=0; param<10; param++) {
    paramScores.setCount(Integer.valueOf(param), computeScore(param));
}

Integer bestParam=(Integer)paramScores.argmax();
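
The counting pattern in the two examples above can be tried without the library. The following is a minimal sketch that uses only java.util; the class CountSketch and its method names are hypothetical stand-ins chosen for illustration and are not the Counter API itself.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the counting pattern; not the library's Counter.
class CountSketch {
    private final Map<String, Double> counts = new HashMap<>();

    // Add 1.0 to the count of key, starting from 0.0 if the key is new.
    void incrementCount(String key) {
        counts.merge(key, 1.0, Double::sum);
    }

    // Current count of key, or 0.0 if the key has never been counted.
    double getCount(String key) {
        return counts.getOrDefault(key, 0.0);
    }

    // Remove every key whose count is strictly below the threshold.
    void pruneBelow(double threshold) {
        counts.values().removeIf(c -> c < threshold);
    }

    // Divide each count by the total so that the remaining counts sum to 1.
    void normalize() {
        double sum = 0.0;
        for (double c : counts.values()) {
            sum += c;
        }
        if (sum == 0.0) {
            return; // nothing to normalize
        }
        final double total = sum;
        counts.replaceAll((k, v) -> v / total);
    }

    // Key with the largest count, or null if nothing has been counted.
    String argmax() {
        String best = null;
        double bestCount = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }
}
```

Pruning before normalizing, as in the first example, means the surviving words share the whole probability mass.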

edu.stanford.nlp.util.Filter

Interface to accept or reject Objects. A Filter implements boolean accept(Object obj). This can represent any binary predicate, such as "lowercase Strings", "numbers above a threshold", "trees where a VP dominates an NP and PP", and so on. Particularly useful in conjunction with Filters, which contains some basic filters as well as a method for filtering an array of Objects or a Collection. Another example is Counter's totalCount(Filter), which returns the sum of all counts in the Counter whose keys pass the filter.

edu.stanford.nlp.util.Filters

Static class with some useful Filter implementations and utility methods for working with Filters. Contains Filters that always accept or reject, Filters that accept or reject an Object if it's in a given Collection, as well as several composite Filters. Contains methods for creating a new Filter that is the AND/OR of two Filters, or the NOT of a Filter. You can make a Filter that runs a given Appliable on all Objects before comparing them--this is useful when you have a collection of complex objects and you want to accept/reject based on one of their sub-objects or method values. Finally, you can filter an Object[] through a Filter to return a new Object[] with only the accepted values, or retainAll elements in a Collection that pass a Filter.

Some useful methods: andFilter(Filter, Filter), collectionAcceptFilter(Collection), filter(Object[], Filter), retainAll(Collection, Filter), transformedFilter(Filter, Appliable)

Example: Filter an array of Strings to retain only those with length less than 10:

Filter filter = new Filter() {
    public boolean accept(Object obj) {
        return ((String) obj).length() < 10;
    }
};

String[] shortStrings = (String[])Filters.filter(allStrings, filter);
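
The composite and transformed filters described above can be illustrated with the JDK alone. The sketch below assumes java.util.function.Predicate as a stand-in for Filter and java.util.function.Function as a stand-in for Appliable; the class FilterSketch and its methods are hypothetical and are not part of Filters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical helper; Predicate stands in for Filter, Function for Appliable.
class FilterSketch {
    // Return a new list holding only the items the predicate accepts, in order.
    static <T> List<T> filter(List<T> items, Predicate<? super T> accept) {
        List<T> kept = new ArrayList<>();
        for (T item : items) {
            if (accept.test(item)) {
                kept.add(item);
            }
        }
        return kept;
    }

    // Apply f to each item before testing it, so that a predicate on a
    // sub-object (type R) can be used to filter the enclosing objects (type T).
    static <T, R> Predicate<T> transformed(Predicate<? super R> accept,
                                           Function<? super T, ? extends R> f) {
        return item -> accept.test(f.apply(item));
    }
}
```

AND, OR and NOT need no extra code in this sketch, since Predicate already provides and(), or() and negate().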

edu.stanford.nlp.util.EntryValueComparator

Comparator for sorting Map keys and entries. If you use the empty Constructor, this Comparator will compare Map.Entry objects by comparing their values. If you pass a Map into the constructor, the Comparator can sort either the Map's keySet or entrySet. You can also pass an ascending flag to optionally reverse natural sorting order.

Sort a Map's keys by their values (descending order):

List keys = new ArrayList(map.keySet());
Collections.sort(keys, new EntryValueComparator(map, false));

Sort a Map's entries by their values (normal order):

List entries = new ArrayList(map.entrySet());
Collections.sort(entries, new EntryValueComparator());
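
For comparison, the same two orderings can be produced with the JDK alone. This is a sketch under the assumption that the map's values are Comparable; the class ValueSortSketch and its method names are hypothetical and do not reproduce EntryValueComparator itself.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical helper showing value-based ordering of keys and entries.
class ValueSortSketch {
    // Keys of the map, ordered by their associated values.
    static <K, V extends Comparable<? super V>> List<K> keysByValue(Map<K, V> map,
                                                                    boolean ascending) {
        // Compare two keys by looking up and comparing their values.
        Comparator<K> byValue = (a, b) -> map.get(a).compareTo(map.get(b));
        List<K> keys = new ArrayList<>(map.keySet());
        keys.sort(ascending ? byValue : byValue.reversed());
        return keys;
    }

    // Entries of the map, ordered by value in natural (ascending) order.
    static <K, V extends Comparable<? super V>> List<Map.Entry<K, V>> entriesByValue(Map<K, V> map) {
        List<Map.Entry<K, V>> entries = new ArrayList<>(map.entrySet());
        entries.sort(Map.Entry.comparingByValue());
        return entries;
    }
}
```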

edu.stanford.nlp.util.Index

List that also maintains a constant-time reverse-lookup of indices for its Objects. Often one uses a List to associate a unique index with each Object (e.g. controlled vocabulary, feature map, etc.). Index offers constant-time performance for both index -> Object (get) and Object -> index (indexOf) as well as for contains(Object). Otherwise it behaves like a normal list. Index also supports lock() and unlock() to ensure that it's only modified when desired. Another useful method is int[] indices(List elems), which maps each elem to its index.

Some useful methods: add(Object), contains(Object), get(index), indexOf(Object), lock()
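
The two-way lookup described above is usually built from a list paired with a hash map. The following is a minimal sketch of that idea, not the library's implementation; the class IndexSketch is hypothetical, and the choices that add returns false on a locked index and that indexOf returns -1 for unknown objects are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical two-way index: a list for index -> object, a map for object -> index.
class IndexSketch<E> {
    private final List<E> objects = new ArrayList<>();
    private final Map<E, Integer> indices = new HashMap<>();
    private boolean locked = false;

    // Add the object if it is new and the index is unlocked.
    // Returns true if the object is in the index after the call.
    boolean add(E o) {
        if (indices.containsKey(o)) {
            return true;
        }
        if (locked) {
            return false; // assumption: a locked index silently refuses new objects
        }
        indices.put(o, objects.size());
        objects.add(o);
        return true;
    }

    // Object stored at position i.
    E get(int i) {
        return objects.get(i);
    }

    // Position of the object, or -1 if it is not in the index (assumed convention).
    int indexOf(E o) {
        Integer i = indices.get(o);
        return i == null ? -1 : i;
    }

    boolean contains(E o) {
        return indices.containsKey(o);
    }

    int size() {
        return objects.size();
    }

    void lock() {
        locked = true;
    }

    void unlock() {
        locked = false;
    }
}
```

Locking is handy once a vocabulary has been built from training data: looking up unseen items at test time then cannot grow the index by accident.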

edu.stanford.nlp.util.StringUtils

Static class with lots of useful String manipulation and formatting methods. Many of these methods will be familiar to Perl users: join, split, trim, find, lookingAt, and matches. There are also useful methods for padding Strings/Objects with spaces on the right or left for printing even-width table columns: leftPad, pad. Finally, there are convenience methods for reading in all the text in a File or at a URL: slurpFile, slurpURL, as well as a method for making a "clean" filename from a String (where all spaces are turned into hyphens and non-alphanumeric characters become underscores): fileNameClean.

Example: print a comma-separated list of numbers:

System.out.println(StringUtils.join(nums, ", "));

Example: print a 2D array of numbers with 8-char cells:

for(int i = 0; i < nums.length; i++) {
    for(int j = 0; j < nums[i].length; j++) {
        System.out.print(StringUtils.leftPad(nums[i][j], 8));
    }
    System.out.println();
}

Example: get a List of lines in a file (ignoring blank lines):

String fileContents = StringUtils.slurpFile(new File("filename"));
List lines = StringUtils.split(fileContents, "[\r\n]+");
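
The "clean" filename rule mentioned above (spaces become hyphens, other non-alphanumeric characters become underscores) is simple enough to sketch directly. The class below is a hypothetical illustration of that rule as stated, restricted to ASCII letters and digits as an assumption; the library's fileNameClean may treat some characters differently.

```java
// Hypothetical illustration of the stated cleaning rule; not the library method.
class FileNameCleanSketch {
    static String clean(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean alphanumeric = (c >= 'a' && c <= 'z')
                    || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9');
            if (alphanumeric) {
                sb.append(c);   // keep ASCII letters and digits as they are
            } else if (c == ' ') {
                sb.append('-'); // spaces become hyphens
            } else {
                sb.append('_'); // everything else becomes an underscore
            }
        }
        return sb.toString();
    }
}
```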

edu.stanford.nlp.util.Timing

Static class for measuring how long something takes to execute. To use, call startTime before running the code in question. Call tick to print an intermediate update, and endTime to finish the timing and print the result. You can optionally pass a descriptive string and PrintStream to tick and endTime for more control over what gets printed where.

Example: time reading in a big file and transforming it:

Timing.startTime();
String bigFileContents = StringUtils.slurpFile(bigFile);
Timing.tick("read in big file", System.err);
String output = costlyTransform(bigFileContents);
Timing.endTime("transformed big file", System.err);
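
The start/tick/end pattern can be approximated with System.nanoTime. The sketch below is hypothetical and is not the Timing class: the message format is invented, and it assumes that tick reports the time elapsed since startTime without resetting the clock, which the description above does not specify.

```java
import java.io.PrintStream;

// Hypothetical stopwatch following the start/tick/end pattern.
class TimingSketch {
    private static long startNanos = System.nanoTime();

    // Begin (or restart) timing.
    static void startTime() {
        startNanos = System.nanoTime();
    }

    // Milliseconds elapsed since the last call to startTime.
    static long elapsedMillis() {
        return (System.nanoTime() - startNanos) / 1_000_000L;
    }

    // Print an intermediate update; the clock keeps running (assumption).
    static long tick(String label, PrintStream out) {
        long ms = elapsedMillis();
        out.println(label + ": " + ms + " ms so far");
        return ms;
    }

    // Print the final elapsed time.
    static long endTime(String label, PrintStream out) {
        long ms = elapsedMillis();
        out.println(label + ": " + ms + " ms total");
        return ms;
    }
}
```

System.nanoTime is used rather than System.currentTimeMillis because it is meant for measuring elapsed intervals and is not affected by wall-clock adjustments.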

Other packages with some useful utilities

edu.stanford.nlp.io
Contains some useful classes for traversing file systems to get lists of files, writing encoded output, and so on.
edu.stanford.nlp.process
Contains many useful text-filtering classes (they work on Documents from the dbm package).
edu.stanford.nlp.stats
Contains some useful classes for tracking statistics (counts) and performing various calculations (e.g. precision/recall).
edu.stanford.nlp.swing
Contains utilities for working with Swing GUIs, e.g. adding icons to your buttons, representing a GUI for properties, adding undo/redo support, adding smart text selection, etc.
edu.stanford.nlp.web
Contains some classes for doing programmatic web searches and parsing web pages.

Questionable classes in util

Numberer: this largely duplicates Index, but adds a level of namespaces on top. It is widely used, so removing it does not seem worthwhile.



Stanford NLP Group