edu.stanford.nlp.util
Class StringUtils

java.lang.Object
  |
  +--edu.stanford.nlp.util.StringUtils

public class StringUtils
extends Object

StringUtils is a collection of static utility methods for miscellaneous String operations.

Author:
Dan Klein, Christopher Manning, Tim Grow (grow@stanford.edu)

Method Summary
static String exactlyN(Object obj, int totalChars)
          Pad or trim the toString value of the given Object.
static String exactlyN(String inStr, int num)
          Pad or trim so as to produce a string of exactly a certain length.
static String fileNameClean(String s)
          Returns a "clean" version of the given filename in which spaces have been converted to dashes and all other non-alphanumeric chars to underscores.
static boolean find(String str, String regex)
          Say whether this regular expression can be found inside this String.
static String join(List l)
          Joins the elements with a single space.
static String join(List l, String glue)
          Joins each element in the List with the given glue.
static String join(Object[] elements)
          Joins the elements with a single space.
static String join(Object[] elements, String glue)
          Joins each element in the array with the given glue.
static String leftPad(double d, int totalChars)
          Pads the String form of the given double to the left with spaces.
static String leftPad(int i, int totalChars)
          Pads the String form of the given int to the left with spaces.
static String leftPad(Object obj, int totalChars)
          Pads the toString value of the given Object to the left with spaces.
static String leftPad(String str, int totalChars)
          Pads the given String to the left with spaces to ensure that it's at least totalChars long.
static boolean lookingAt(String str, String regex)
          Say whether this regular expression can be found at the beginning of this String.
static boolean matches(String str, String regex)
          Say whether this regular expression matches this String.
static String pad(Object obj, int totalChars)
          Pads the toString value of the given Object.
static String pad(String str, int totalChars)
          Returns a String at least totalChars characters long by padding the input String str with spaces on the right.
static String ptb2Text(List ptbWords)
          Returns a presentable version of the given PTB-tokenized words.
static String ptb2Text(String ptbText)
          Returns a presentable version of the given PTB-tokenized text.
static String slurpFile(File file)
          Returns all the text in the given File.
static String slurpURL(URL u)
          Returns all the text at the given URL.
static List split(String s)
          Splits on whitespace (\\s+).
static List split(String str, String regex)
          Splits the given string using the given regex as delimiters.
static String trim(Object obj, int maxWidth)
          Trims the toString value of the given Object to at most maxWidth chars.
static String trim(String s, int maxWidth)
          Returns s if it is at most maxWidth chars long; otherwise chops characters off the right side to fit.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

find

public static boolean find(String str,
                           String regex)
Say whether this regular expression can be found inside this String. This method provides one of the two "missing" convenience methods for regular expressions in the String class in JDK1.4. This is the one you'll want to use all the time if you're used to Perl. What were they smoking?

Parameters:
str - String to search for match in
regex - String to compile as the regular expression
Returns:
Whether the regex can be found in str
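The three regex helpers differ only in which java.util.regex.Matcher method they correspond to. As a minimal sketch, assuming find() is a thin wrapper over Pattern.compile(regex).matcher(str).find() (this page describes the behavior, not the source), the hypothetical class FindSketch below contrasts it with String.matches(), which requires the whole string to match:

```java
import java.util.regex.Pattern;

/** Hypothetical stand-in illustrating the documented behavior of StringUtils.find. */
final class FindSketch {
  private FindSketch() {}

  /** True if regex matches any substring of str, like Perl's m// operator. */
  static boolean find(String str, String regex) {
    return Pattern.compile(regex).matcher(str).find();
  }

  public static void main(String[] args) {
    System.out.println(find("the quick brown fox", "qu.ck"));    // true
    System.out.println("the quick brown fox".matches("qu.ck"));  // false: needs a whole-string match
  }
}
```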

lookingAt

public static boolean lookingAt(String str,
                                String regex)
Say whether this regular expression can be found at the beginning of this String. This method provides one of the two "missing" convenience methods for regular expressions in the String class in JDK1.4.

Parameters:
str - String to search for match at start of
regex - String to compile as the regular expression
Returns:
Whether the regex can be found at the start of str
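A minimal sketch, assuming lookingAt() corresponds to Matcher.lookingAt(): the match must begin at the first character but, unlike a full match, may stop before the end of the string. The class name LookingAtSketch is hypothetical.

```java
import java.util.regex.Pattern;

/** Hypothetical stand-in illustrating the documented behavior of StringUtils.lookingAt. */
final class LookingAtSketch {
  private LookingAtSketch() {}

  /** True if regex matches a prefix of str; whatever follows the prefix is ignored. */
  static boolean lookingAt(String str, String regex) {
    return Pattern.compile(regex).matcher(str).lookingAt();
  }

  public static void main(String[] args) {
    System.out.println(lookingAt("http://nlp.stanford.edu", "https?:"));      // true
    System.out.println(lookingAt("see http://nlp.stanford.edu", "https?:"));  // false: not at the start
  }
}
```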

matches

public static boolean matches(String str,
                              String regex)
Say whether this regular expression matches this String. This method is the same as the String.matches() method, and is included just to give a call that is parallel to the other static regex methods in this class.

Parameters:
str - String to match against
regex - String to compile as the regular expression
Returns:
Whether the regex matches the whole of str
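To make the contrast between the three helpers concrete, here is a minimal sketch, assuming matches() corresponds to Matcher.matches() (equivalently String.matches()). It runs one date string through the find, lookingAt, and full-match checks; the class name MatchesSketch is hypothetical.

```java
import java.util.regex.Pattern;

/** Hypothetical stand-in illustrating the documented behavior of StringUtils.matches. */
final class MatchesSketch {
  private MatchesSketch() {}

  /** True only if regex matches the entire str, exactly as String.matches does. */
  static boolean matches(String str, String regex) {
    return Pattern.compile(regex).matcher(str).matches();
  }

  public static void main(String[] args) {
    String s = "1999-12-31";
    String re = "\\d{4}";
    System.out.println(Pattern.compile(re).matcher(s).find());       // true: "1999" occurs in s
    System.out.println(Pattern.compile(re).matcher(s).lookingAt());  // true: s starts with 4 digits
    System.out.println(matches(s, re));                              // false: s is more than 4 digits
    System.out.println(matches(s, "\\d{4}-\\d{2}-\\d{2}"));          // true
  }
}
```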

slurpFile

public static String slurpFile(File file)
                        throws IOException
Returns all the text in the given File.

Throws:
IOException
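A minimal sketch of reading a whole file into one String with java.nio.file.Files. The page does not say which character encoding slurpFile uses, so UTF-8 here is an assumption, as is the hypothetical class name SlurpFileSketch.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

/** Hypothetical stand-in for StringUtils.slurpFile; the UTF-8 decoding is an assumption. */
final class SlurpFileSketch {
  private SlurpFileSketch() {}

  /** Reads every byte of the file and decodes it as UTF-8; line endings are kept as-is. */
  static String slurpFile(File file) throws IOException {
    byte[] bytes = Files.readAllBytes(file.toPath());
    return new String(bytes, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws IOException {
    File tmp = File.createTempFile("slurp", ".txt");
    tmp.deleteOnExit();
    Files.write(tmp.toPath(), "line one\nline two\n".getBytes(StandardCharsets.UTF_8));
    System.out.print(slurpFile(tmp)); // prints both lines, newlines preserved
  }
}
```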

slurpURL

public static String slurpURL(URL u)
                       throws IOException
Returns all the text at the given URL.

Throws:
IOException
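A minimal sketch of the same idea for a URL: open the URL's stream, copy it into a buffer, and decode it. As above, UTF-8 decoding and the class name SlurpUrlSketch are assumptions. The demo uses a file: URL so it runs offline; an http URL would go through the same URL.openStream() call.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

/** Hypothetical stand-in for StringUtils.slurpURL; the UTF-8 decoding is an assumption. */
final class SlurpUrlSketch {
  private SlurpUrlSketch() {}

  /** Copies everything the URL's stream yields into memory, then decodes it. */
  static String slurpURL(URL u) throws IOException {
    try (InputStream in = u.openStream()) {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[8192];
      int n;
      while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n);
      }
      return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
  }

  public static void main(String[] args) throws IOException {
    File tmp = File.createTempFile("slurl", ".txt");
    tmp.deleteOnExit();
    Files.write(tmp.toPath(), "hello".getBytes(StandardCharsets.UTF_8));
    System.out.println(slurpURL(tmp.toURI().toURL())); // hello
  }
}
```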

join

public static String join(List l,
                          String glue)
Joins each element in the List with the given glue. For example, given a List of Integers, you can create a comma-separated list by calling join(numbers, ", ").


join

public static String join(Object[] elements,
                          String glue)
Joins each element in the array with the given glue. For example, given an array of Integers, you can create a comma-separated list by calling join(numbers, ", ").


join

public static String join(List l)
Joins the elements with a single space.


join

public static String join(Object[] elements)
Joins the elements with a single space.
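The four join overloads can be sketched as one core method plus three delegates. This is a minimal sketch, not the library's source: it assumes each element contributes its String.valueOf form, uses List<?> in place of the raw List shown above, and the class name JoinSketch is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for the four StringUtils.join overloads. */
final class JoinSketch {
  private JoinSketch() {}

  /** Puts glue between the string forms of consecutive elements (no leading or trailing glue). */
  static String join(List<?> l, String glue) {
    StringBuilder sb = new StringBuilder();
    boolean first = true;
    for (Object o : l) {
      if (!first) {
        sb.append(glue);
      }
      first = false;
      sb.append(o); // append(Object) uses String.valueOf, so null becomes "null"
    }
    return sb.toString();
  }

  static String join(Object[] elements, String glue) {
    return join(Arrays.asList(elements), glue);
  }

  /** The one-argument overloads use a single space as the glue. */
  static String join(List<?> l) {
    return join(l, " ");
  }

  static String join(Object[] elements) {
    return join(elements, " ");
  }

  public static void main(String[] args) {
    List<Integer> numbers = Arrays.asList(1, 2, 3);
    System.out.println(join(numbers, ", "));                 // 1, 2, 3
    System.out.println(join(new Object[] {"a", "b", "c"}));  // a b c
  }
}
```

On Java 8 and later, String.join covers the same ground for elements that are already CharSequences; an Object-based join like this one also accepts Integers and other non-string elements directly.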


split

public static List split(String s)
Splits on whitespace (\\s+).


split

public static List split(String str,
                         String regex)
Splits the given string using the given regex as delimiters. This method is the same as the String.split() method (except that it returns the results in a List), and is included just to give a call that is parallel to the other static regex methods in this class.

Parameters:
str - String to split up
regex - String to compile as the regular expression
Returns:
List of Strings resulting from splitting on the regex
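A minimal sketch of both split overloads, assuming they simply wrap String.split as the description says. The class name SplitSketch is hypothetical, and copying into an ArrayList (so the result can be modified) is a choice made for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for the two StringUtils.split overloads. */
final class SplitSketch {
  private SplitSketch() {}

  /** Same as String.split(regex), but the pieces come back in a modifiable List. */
  static List<String> split(String str, String regex) {
    return new ArrayList<>(Arrays.asList(str.split(regex)));
  }

  /** Whitespace version: splits on runs of whitespace, i.e. the regex \s+. */
  static List<String> split(String s) {
    return split(s, "\\s+");
  }

  public static void main(String[] args) {
    System.out.println(split("a,b,,c", ","));     // [a, b, , c]
    System.out.println(split("the  cat\tsat"));   // [the, cat, sat]
    System.out.println(split(" leading space"));  // [, leading, space]
  }
}
```

Because the sketch mirrors String.split exactly, it keeps a leading empty string when the input starts with a delimiter and drops trailing empty strings; the page does not say whether the library trims its input first.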

pad

public static String pad(String str,
                         int totalChars)
Returns a String at least totalChars characters long by padding the input String str with spaces on the right. If str is already at least totalChars long, it is returned unchanged.


pad

public static String pad(Object obj,
                         int totalChars)
Pads the toString value of the given Object.
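A minimal sketch of the two pad overloads, assuming (by contrast with leftPad) that pad appends spaces on the right. The class name PadSketch is hypothetical, and using String.valueOf so that a null Object pads as "null" is a choice made for this sketch.

```java
/** Hypothetical stand-in for the StringUtils.pad overloads. */
final class PadSketch {
  private PadSketch() {}

  /** Appends spaces until str is totalChars long; longer input is returned unchanged. */
  static String pad(String str, int totalChars) {
    StringBuilder sb = new StringBuilder(str);
    while (sb.length() < totalChars) {
      sb.append(' ');
    }
    return sb.toString();
  }

  /** Object overload: pads the object's string form (null-safe in this sketch). */
  static String pad(Object obj, int totalChars) {
    return pad(String.valueOf(obj), totalChars);
  }

  public static void main(String[] args) {
    System.out.println("[" + pad("NN", 5) + "]");       // [NN   ]
    System.out.println("[" + pad("toolong", 3) + "]");  // [toolong]
  }
}
```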


exactlyN

public static String exactlyN(String inStr,
                              int num)
Pad or trim so as to produce a string of exactly a certain length.

Parameters:
inStr - The String to be padded or truncated
num - The desired length
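exactlyN combines pad and trim so the output width is fixed, which is what makes it handy for aligned columns. A minimal sketch, assuming it pads on the right and truncates from the right like pad and trim in this class; the class name ExactlyNSketch is hypothetical.

```java
/** Hypothetical stand-in for StringUtils.exactlyN(String, int). */
final class ExactlyNSketch {
  private ExactlyNSketch() {}

  /** Right-pads with spaces or truncates so the result is exactly num chars long. */
  static String exactlyN(String inStr, int num) {
    if (inStr.length() >= num) {
      return inStr.substring(0, num);
    }
    StringBuilder sb = new StringBuilder(inStr);
    while (sb.length() < num) {
      sb.append(' ');
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // Fixed-width columns: every cell is exactly 6 characters wide.
    String[] cells = {"DT", "adjective", "NNPS"};
    StringBuilder row = new StringBuilder();
    for (String c : cells) {
      row.append(exactlyN(c, 6)).append('|');
    }
    System.out.println(row); // DT    |adject|NNPS  |
  }
}
```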

exactlyN

public static String exactlyN(Object obj,
                              int totalChars)
Pad or trim the toString value of the given Object.


leftPad

public static String leftPad(String str,
                             int totalChars)
Pads the given String to the left with spaces to ensure that it's at least totalChars long.


leftPad

public static String leftPad(Object obj,
                             int totalChars)

leftPad

public static String leftPad(int i,
                             int totalChars)

leftPad

public static String leftPad(double d,
                             int totalChars)
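A minimal sketch of the leftPad overloads, typically used to right-align numbers in a column. It assumes the int and double versions pad Integer.toString and Double.toString respectively; the page gives no description for them, and the class name LeftPadSketch is hypothetical.

```java
/** Hypothetical stand-in for the StringUtils.leftPad overloads. */
final class LeftPadSketch {
  private LeftPadSketch() {}

  /** Prepends spaces until str is at least totalChars long. */
  static String leftPad(String str, int totalChars) {
    StringBuilder sb = new StringBuilder();
    for (int i = str.length(); i < totalChars; i++) {
      sb.append(' ');
    }
    return sb.append(str).toString();
  }

  static String leftPad(int i, int totalChars) {
    return leftPad(Integer.toString(i), totalChars);
  }

  static String leftPad(double d, int totalChars) {
    return leftPad(Double.toString(d), totalChars);
  }

  public static void main(String[] args) {
    // Right-aligned numeric column: prints "    7", "   42", " 1234".
    int[] counts = {7, 42, 1234};
    for (int c : counts) {
      System.out.println(leftPad(c, 5));
    }
  }
}
```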

trim

public static String trim(String s,
                          int maxWidth)
Returns s if it is at most maxWidth chars long; otherwise chops characters off the right side to fit.


trim

public static String trim(Object obj,
                          int maxWidth)
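Despite the name, this trim is about width, not whitespace: unlike String.trim(), it never removes spaces, it only truncates. A minimal sketch of both overloads follows; the class name TrimSketch is hypothetical, and the Object version using String.valueOf is an assumption since the page does not describe it.

```java
/** Hypothetical stand-in for the StringUtils.trim overloads. */
final class TrimSketch {
  private TrimSketch() {}

  /** Returns s unchanged if it fits in maxWidth chars, else keeps only the leftmost maxWidth. */
  static String trim(String s, int maxWidth) {
    return s.length() <= maxWidth ? s : s.substring(0, maxWidth);
  }

  static String trim(Object obj, int maxWidth) {
    return trim(String.valueOf(obj), maxWidth);
  }

  public static void main(String[] args) {
    System.out.println(trim("internationalization", 5)); // inter
    System.out.println(trim("short", 10));               // short
  }
}
```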

fileNameClean

public static String fileNameClean(String s)
Returns a "clean" version of the given filename in which spaces have been converted to dashes and all other non-alphanumeric chars to underscores.
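A minimal sketch of the mapping as described: spaces become dashes, letters and digits are kept, and every other character becomes an underscore. Treating "alphanumeric" as ASCII letters and digits only is an assumption (Character.isLetterOrDigit would also keep non-ASCII letters), and the class name FileNameCleanSketch is hypothetical.

```java
/** Hypothetical stand-in for StringUtils.fileNameClean, following the description above. */
final class FileNameCleanSketch {
  private FileNameCleanSketch() {}

  /** Spaces become '-', ASCII letters and digits are kept, everything else becomes '_'. */
  static String fileNameClean(String s) {
    StringBuilder sb = new StringBuilder(s.length());
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c == ' ') {
        sb.append('-');
      } else if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')) {
        sb.append(c);
      } else {
        sb.append('_');
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(fileNameClean("WSJ section 02.mrg")); // WSJ-section-02_mrg
  }
}
```

Read literally, the description also turns the dot before a file extension into an underscore, as the example shows.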


ptb2Text

public static String ptb2Text(String ptbText)
Returns a presentable version of the given PTB-tokenized text. PTB tokenization splits up punctuation and does various other things that make simply joining the tokens with spaces look bad. So join the tokens with spaces and run the result through this method to produce nice-looking text. It's not perfect, but it works pretty well.
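To show what "undoing" PTB tokenization involves, here is a deliberately small detokenizer built from a few regex rewrites: restoring -LRB-/-RRB- brackets, reattaching split-off clitics such as n't and 's, and removing the space before closing punctuation. This is a simplified illustration under those assumptions, not the library's rule set (which is not described here); the class name Ptb2TextSketch is hypothetical.

```java
/** Hypothetical, much-simplified illustration of PTB detokenization; not the library's rules. */
final class Ptb2TextSketch {
  private Ptb2TextSketch() {}

  /** Undoes a few common PTB tokenization conventions; far from complete. */
  static String ptb2Text(String ptbText) {
    String s = ptbText;
    // PTB escapes round brackets as -LRB- / -RRB-; restore them and close up the inner spaces.
    s = s.replaceAll("-LRB-\\s*", "(");
    s = s.replaceAll("\\s*-RRB-", ")");
    // Reattach clitics that PTB splits off: "do n't" -> "don't", "John 's" -> "John's".
    s = s.replaceAll("(?i) (n't|'s|'re|'ll|'ve|'d|'m)(?=\\s|$)", "$1");
    // Remove the space left before closing punctuation: "know ," -> "know,".
    s = s.replaceAll(" ([.,;:!?%])", "$1");
    return s;
  }

  public static void main(String[] args) {
    String tokens = "I do n't know -LRB- really -RRB- , John 's dog ran .";
    System.out.println(ptb2Text(tokens)); // I don't know (really), John's dog ran.
  }
}
```

A real detokenizer also has to handle quotes (`` and ''), hyphen and slash escapes, and currency symbols, which is part of why the description above calls the result "not perfect".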


ptb2Text

public static String ptb2Text(List ptbWords)
Returns a presentable version of the given PTB-tokenized words. Pass in a List of Words or Strings, or a Document, and this method will join the words with spaces and call ptb2Text(String) on the output. This method checks whether the elements in the list are subtypes of Word, and if so, it takes their word() values to prevent additional text (e.g., POS tags) from creeping in. Otherwise the toString value is used.
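The joining step described above matters when tokens carry annotations in their toString output. A minimal sketch of that step only, using a hypothetical Word class (not the library's) whose toString appends a POS tag and whose word() returns the bare token; the class name PtbWordsJoinSketch and method name joinTokens are also hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of the joining step performed before ptb2Text(String). */
final class PtbWordsJoinSketch {
  private PtbWordsJoinSketch() {}

  /** Hypothetical stand-in for a tagged token; toString() includes the POS tag. */
  static final class Word {
    private final String word;
    private final String tag;

    Word(String word, String tag) {
      this.word = word;
      this.tag = tag;
    }

    String word() {
      return word;
    }

    @Override
    public String toString() {
      return word + "/" + tag;
    }
  }

  /** Joins tokens with spaces, using word() for Word elements and toString() otherwise. */
  static String joinTokens(List<?> ptbWords) {
    StringBuilder sb = new StringBuilder();
    boolean first = true;
    for (Object o : ptbWords) {
      if (!first) {
        sb.append(' ');
      }
      first = false;
      // Prefer the bare word so annotations such as POS tags stay out of the text.
      sb.append(o instanceof Word ? ((Word) o).word() : String.valueOf(o));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    List<Object> tokens = Arrays.<Object>asList(new Word("Dogs", "NNS"), new Word("bark", "VBP"), ".");
    System.out.println(joinTokens(tokens)); // Dogs bark .
    System.out.println(tokens);             // [Dogs/NNS, bark/VBP, .]
  }
}
```

The joined string is what would then be handed to the String version of ptb2Text for punctuation cleanup; joining with toString instead would have leaked the /NNS and /VBP tags into the output.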



Stanford NLP Group