edu.stanford.nlp.process
Class WordShapeClassifier

java.lang.Object
  extended by edu.stanford.nlp.process.WordShapeClassifier

public class WordShapeClassifier
extends Object

Provides static methods that map any String to another String indicative of its "word shape" (e.g., whether it is capitalized, numeric, etc.). Different implementations may embody quite different, normally language-specific, ideas of which word shapes are useful.
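To make the idea of a "word shape" concrete, here is a minimal illustrative sketch. It is NOT one of this class's shapers (dan1, chris1, and so on); the class name ShapeSketch and the mapping it uses (uppercase to X, lowercase to x, digits to d, other characters kept, consecutive repeats collapsed) are assumptions chosen only to show the general technique of reducing a token to a coarse shape string.

```java
// Illustrative only: a simple "collapsed character class" word shape.
// This is NOT one of WordShapeClassifier's shapers; the class name and
// the character classes below are hypothetical.
final class ShapeSketch {

    // Map each character to a class symbol and collapse consecutive repeats.
    static String shape(String word) {
        StringBuilder sb = new StringBuilder();
        char last = 0;
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            char cls;
            if (Character.isUpperCase(c)) {
                cls = 'X';
            } else if (Character.isLowerCase(c)) {
                cls = 'x';
            } else if (Character.isDigit(c)) {
                cls = 'd';
            } else {
                cls = c; // any other character (e.g. '-' or '.') is kept as-is
            }
            if (cls != last) { // collapse runs of the same class
                sb.append(cls);
                last = cls;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (String w : new String[] {"Stanford", "NLP", "1999", "e-mail", "U.S."}) {
            System.out.println(w + " -> " + shape(w));
        }
    }
}
```

With this sketch, "Stanford" maps to "Xx", "1999" to "d", and "U.S." to "X.X.". Collapsing runs keeps the number of distinct shapes small, which is one common design choice; the real shapers may differ considerably.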

Author:
Christopher Manning, Dan Klein

Field Summary
static int NOWORDSHAPE
           
static int WORDSHAPECHINESE
           
static int WORDSHAPECHRIS1
           
static int WORDSHAPECHRIS2
           
static int WORDSHAPECHRIS2USELC
           
static int WORDSHAPECHRIS3
           
static int WORDSHAPECHRIS3USELC
           
static int WORDSHAPECHRIS4
           
static int WORDSHAPEDAN1
           
static int WORDSHAPEDAN2
           
static int WORDSHAPEDAN2BIO
           
static int WORDSHAPEDAN2BIOUSELC
           
static int WORDSHAPEDAN2USELC
           
static int WORDSHAPEDIGITS
           
static int WORDSHAPEJENNY1
           
static int WORDSHAPEJENNY1USELC
           
 
Method Summary
static int lookupShaper(String name)
          Look up a shaper by a short String name.
static void main(String[] args)
          Usage: java edu.stanford.nlp.process.WordShapeClassifier [-wordShape name] string+
where name is an argument to lookupShaper.
static String wordShape(String inStr, int wordShaper)
          Returns the result of applying the word shaper identified by the int constant to the String.
static String wordShape(String inStr, int wordShaper, Collection<String> knownLCWords)
          Returns the result of applying the word shaper identified by the int constant to the String, optionally using a collection of known lowercase words.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

NOWORDSHAPE

public static final int NOWORDSHAPE
See Also:
Constant Field Values

WORDSHAPEDAN1

public static final int WORDSHAPEDAN1
See Also:
Constant Field Values

WORDSHAPECHRIS1

public static final int WORDSHAPECHRIS1
See Also:
Constant Field Values

WORDSHAPEDAN2

public static final int WORDSHAPEDAN2
See Also:
Constant Field Values

WORDSHAPEDAN2USELC

public static final int WORDSHAPEDAN2USELC
See Also:
Constant Field Values

WORDSHAPEDAN2BIO

public static final int WORDSHAPEDAN2BIO
See Also:
Constant Field Values

WORDSHAPEDAN2BIOUSELC

public static final int WORDSHAPEDAN2BIOUSELC
See Also:
Constant Field Values

WORDSHAPEJENNY1

public static final int WORDSHAPEJENNY1
See Also:
Constant Field Values

WORDSHAPEJENNY1USELC

public static final int WORDSHAPEJENNY1USELC
See Also:
Constant Field Values

WORDSHAPECHRIS2

public static final int WORDSHAPECHRIS2
See Also:
Constant Field Values

WORDSHAPECHRIS2USELC

public static final int WORDSHAPECHRIS2USELC
See Also:
Constant Field Values

WORDSHAPECHRIS3

public static final int WORDSHAPECHRIS3
See Also:
Constant Field Values

WORDSHAPECHRIS3USELC

public static final int WORDSHAPECHRIS3USELC
See Also:
Constant Field Values

WORDSHAPECHRIS4

public static final int WORDSHAPECHRIS4
See Also:
Constant Field Values

WORDSHAPEDIGITS

public static final int WORDSHAPEDIGITS
See Also:
Constant Field Values

WORDSHAPECHINESE

public static final int WORDSHAPECHINESE
See Also:
Constant Field Values
Method Detail

lookupShaper

public static int lookupShaper(String name)
Look up a shaper by a short String name.

Parameters:
name - Shaper name. Known names have patterns along the lines of: dan[12](bio)?(UseLC)?, jenny1(useLC)?, chris[1234](useLC)?.
Returns:
An integer constant for the shaper
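The name patterns above are written in regular-expression notation. The following sketch checks a candidate name against exactly those quoted patterns. It is a hypothetical helper (ShaperNameSketch and looksLikeDocumentedName are not part of this class); case-insensitive matching is an assumption, made because the documentation writes both "UseLC" and "useLC". Since the patterns are only "along the lines of" the accepted names, a match here does not guarantee that lookupShaper recognizes the name.

```java
import java.util.regex.Pattern;

// Hypothetical helper: tests a name against the patterns quoted in the
// lookupShaper documentation. Case-insensitivity is an assumption.
final class ShaperNameSketch {

    private static final Pattern DOCUMENTED_NAMES = Pattern.compile(
        "dan[12](bio)?(uselc)?|jenny1(uselc)?|chris[1234](uselc)?",
        Pattern.CASE_INSENSITIVE);

    // matches() requires the whole name to match one of the alternatives.
    static boolean looksLikeDocumentedName(String name) {
        return name != null && DOCUMENTED_NAMES.matcher(name).matches();
    }

    public static void main(String[] args) {
        for (String n : new String[] {"dan2bioUseLC", "chris4useLC", "jenny1", "chris5"}) {
            System.out.println(n + " -> " + looksLikeDocumentedName(n));
        }
    }
}
```

Here "dan2bioUseLC", "chris4useLC", and "jenny1" fit the documented patterns, while "chris5" does not.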

wordShape

public static String wordShape(String inStr,
                               int wordShaper)
Returns the result of applying the word shaper identified by wordShaper to inStr.

Parameters:
inStr - The String whose word shape is computed
wordShaper - Constant identifying which shaping formula to use, such as a value returned by lookupShaper
Returns:
The wordshape String

wordShape

public static String wordShape(String inStr,
                               int wordShaper,
                               Collection<String> knownLCWords)
Returns the result of applying the word shaper identified by wordShaper to inStr, optionally taking into account a collection of known lowercase words.

Parameters:
inStr - The String whose word shape is computed
wordShaper - Constant identifying which shaping formula to use, such as a value returned by lookupShaper
knownLCWords - A Collection of known lowercase words, which some shapers use to decide the class of capitalized words. Note: while this code works with any Collection, you should provide a Set for decent performance. If this parameter is null or empty, the option is not used: all capitalized words are treated the same, regardless of whether their lowercased form has been seen.
Returns:
The wordshape String
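The knownLCWords behavior described above can be sketched as follows. This is a hypothetical illustration, not this class's implementation: the class name KnownLCSketch and the output labels "Cap", "Cap-knownLC", and "other" are invented for the example. It shows the two documented points: a null or empty collection disables the distinction, and a Set is preferable because contains() on a HashSet is constant time on average, whereas on a List it is a linear scan.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of the knownLCWords idea. The labels "Cap",
// "Cap-knownLC" and "other" are illustrative, not real shaper output.
final class KnownLCSketch {

    static String shapeWithKnownLC(String word, Collection<String> knownLCWords) {
        boolean capitalized = !word.isEmpty() && Character.isUpperCase(word.charAt(0));
        if (!capitalized) {
            return "other";
        }
        // As documented: a null or empty collection disables the option,
        // so every capitalized word gets the same class.
        if (knownLCWords != null && !knownLCWords.isEmpty()
                && knownLCWords.contains(word.toLowerCase(Locale.ROOT))) {
            return "Cap-knownLC"; // lowercased form has been seen
        }
        return "Cap";
    }

    public static void main(String[] args) {
        // A HashSet gives average O(1) contains(); a List would be O(n).
        Set<String> lc = new HashSet<>(Arrays.asList("the", "apple"));
        System.out.println(shapeWithKnownLC("The", lc));
        System.out.println(shapeWithKnownLC("Stanford", lc));
        System.out.println(shapeWithKnownLC("The", null));
    }
}
```

In this sketch, "The" is distinguished from "Stanford" only when the set of lowercase words is supplied and contains "the".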

main

public static void main(String[] args)
Usage: java edu.stanford.nlp.process.WordShapeClassifier [-wordShape name] string+
where name is an argument to lookupShaper. Known names have patterns along the lines of: dan[12](bio)?(UseLC)?, jenny1(useLC)?, chris[1234](useLC)?. If no word shape function is specified, chris1 is used.

Parameters:
args - Command-line arguments, as above.
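The documented usage, [-wordShape name] string+, with chris1 as the default, can be parsed along the lines of the sketch below. UsageSketch, ParsedArgs, and parse are hypothetical names; this is not the code of WordShapeClassifier.main, and it stops short of calling the library so that it stays self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of parsing "[-wordShape name] string+".
// Not the actual WordShapeClassifier.main implementation.
final class UsageSketch {

    static final String USAGE = "Usage: [-wordShape name] string+";

    static final class ParsedArgs {
        final String shaperName;
        final List<String> words;

        ParsedArgs(String shaperName, List<String> words) {
            this.shaperName = shaperName;
            this.words = words;
        }
    }

    static ParsedArgs parse(String[] args) {
        String shaperName = "chris1"; // documented default
        int i = 0;
        if (args.length > 0 && args[0].equals("-wordShape")) {
            if (args.length < 2) {
                throw new IllegalArgumentException(USAGE); // flag without a name
            }
            shaperName = args[1];
            i = 2;
        }
        // Everything after the optional flag is a string to be shaped.
        List<String> words = new ArrayList<>(Arrays.asList(args).subList(i, args.length));
        if (words.isEmpty()) {
            throw new IllegalArgumentException(USAGE); // "string+" needs at least one
        }
        return new ParsedArgs(shaperName, words);
    }

    public static void main(String[] args) {
        ParsedArgs p = parse(new String[] {"-wordShape", "dan2", "Stanford", "1999"});
        System.out.println(p.shaperName + " " + p.words);
    }
}
```

Running this sketch prints "dan2 [Stanford, 1999]"; omitting the flag leaves the shaper name as chris1. In a real program, the name would then be passed to lookupShaper and each string to wordShape.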


Stanford NLP Group