java.lang.Object
  extended by edu.stanford.nlp.process.WordShapeClassifier
public class WordShapeClassifier
Provides static methods that map any String to another String indicative of its "word shape" -- e.g., whether it is capitalized, numeric, etc. Different shapers implement quite different, normally language-specific, ideas of which word shapes are useful.
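To make the idea of a "word shape" concrete, here is a minimal sketch of one possible shape function. It is a hypothetical illustration, not one of the shapers in this class: the class name `BasicShapeSketch` and its character classes (`X` for upper case, `x` for lower case, `d` for digits, everything else kept as-is) are assumptions chosen for the example.

```java
/**
 * Hypothetical illustration of a word-shape function: each character is
 * replaced by a symbol for its class. Not the WordShapeClassifier implementation.
 */
final class BasicShapeSketch {
  private BasicShapeSketch() {}

  /** Maps upper case to 'X', lower case to 'x', digits to 'd'; other characters are kept. */
  static String shape(String s) {
    StringBuilder sb = new StringBuilder(s.length());
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (Character.isUpperCase(c)) {
        sb.append('X');
      } else if (Character.isLowerCase(c)) {
        sb.append('x');
      } else if (Character.isDigit(c)) {
        sb.append('d');
      } else {
        sb.append(c);  // punctuation and symbols pass through unchanged
      }
    }
    return sb.toString();
  }
}
```

For example, `BasicShapeSketch.shape("Hello-42")` returns `"Xxxxx-dd"`, so "Hello-42" and "World-17" share a shape even though they share no letters.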
Field Summary | |
---|---|
static int | NOWORDSHAPE |
static int | WORDSHAPECHRIS1 |
static int | WORDSHAPECHRIS2 |
static int | WORDSHAPECHRIS2USELC |
static int | WORDSHAPECHRIS3 |
static int | WORDSHAPECHRIS3USELC |
static int | WORDSHAPECHRIS4 |
static int | WORDSHAPEDAN1 |
static int | WORDSHAPEDAN2 |
static int | WORDSHAPEDAN2BIO |
static int | WORDSHAPEDAN2BIOUSELC |
static int | WORDSHAPEDAN2USELC |
static int | WORDSHAPEJENNY1 |
static int | WORDSHAPEJENNY1USELC |
Method Summary | |
---|---|
static void | addKnownLowerCaseWords(Collection words) |
static Set | getKnownLowerCaseWords() |
static int | lookupShaper(String name) |
static void | main(String[] args): Usage: java edu.stanford.nlp.process.WordShapeClassifier [-wordShape name] string+ where name is an argument to lookupShaper. |
static void | setKnownLowerCaseWords(Set words) |
static boolean | usesLC(int shape): Returns true if the specified word shaper uses known lower case words. |
static String | wordShape(String inStr, int wordShaper): Returns the result of applying the word shaper identified by wordShaper to inStr. |
static String | wordShape(String inStr, int wordShaper, boolean markKnownLC) |
static String | wordShape(String inStr, int wordShaper, boolean markKnownLC, Set knownLCWords) |
static String | wordShape(String inStr, int wordShaper, Set knownLCWords): Returns the result of applying the word shaper identified by wordShaper to inStr. |
static String | wordShapeChris1(String s) |
static String | wordShapeChris2(String s, boolean markKnownLC, boolean omitIfInBoundary): Builds on the Dan2 ideas, but makes fewer distinctions in the middle of long words by sorting the character classes found there, while keeping extra distinctions for short words. |
static String | wordShapeChris4(String s, boolean markKnownLC, boolean omitIfInBoundary): Builds on the Dan2 ideas, but makes fewer distinctions in the middle of long words by sorting the character classes found there, while keeping extra distinctions for short words by always recording the class of the first and last two characters of the word. |
static String | wordShapeDan1(String s): A fairly basic 5-way classifier that notes digits, upper case, lower case, mixed, and non-alphanumeric words. |
static String | wordShapeDan2(String s, boolean markKnownLC): A fine-grained word shape classifier that groups characters into equivalence classes. |
static String | wordShapeDan2Bio(String s, boolean useKnownLC): Returns a fine-grained word shape that puts lower case letters, upper case letters, and digits into equivalence classes and collapses sequences of the same type, but keeps all punctuation. |
static String | wordShapeJenny1(String s, boolean markKnownLC) |
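The boundary idea in the wordShapeChris2 and wordShapeChris4 summaries (exact classes for the first and last two characters, a coarser summary of the middle of long words, and the omitIfInBoundary option) can be sketched as follows. This is a hypothetical simplification, not the library's code: the class name, the `X`/`x`/`d` character classes, the short-word cutoff of four characters, and the output layout (first two classes, then the sorted set of middle classes, then the last two classes) are all assumptions.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical sketch of a boundary-preserving word shape, loosely following
 * the wordShapeChris2/wordShapeChris4 summaries. Not the library implementation.
 */
final class BoundaryShapeSketch {
  private static final int BOUNDARY_SIZE = 2;                   // first and last two characters
  private static final int SHORT_WORD_MAX = 2 * BOUNDARY_SIZE;  // assumed cutoff for "short" words

  private BoundaryShapeSketch() {}

  /** Assumed classes: 'X' upper case, 'x' lower case, 'd' digit, otherwise the character itself. */
  static char classOf(char c) {
    if (Character.isUpperCase(c)) return 'X';
    if (Character.isLowerCase(c)) return 'x';
    if (Character.isDigit(c)) return 'd';
    return c;
  }

  static String shape(String s, boolean omitIfInBoundary) {
    int len = s.length();
    StringBuilder sb = new StringBuilder();
    if (len <= SHORT_WORD_MAX) {
      // Short words keep one class per character, so they retain more distinctions.
      for (int i = 0; i < len; i++) {
        sb.append(classOf(s.charAt(i)));
      }
      return sb.toString();
    }
    Set<Character> boundary = new HashSet<>();
    StringBuilder end = new StringBuilder();
    for (int i = 0; i < BOUNDARY_SIZE; i++) {
      char m = classOf(s.charAt(i));
      sb.append(m);
      boundary.add(m);
    }
    for (int i = len - BOUNDARY_SIZE; i < len; i++) {
      char m = classOf(s.charAt(i));
      end.append(m);
      boundary.add(m);
    }
    // The middle is reduced to a sorted set of classes, so order and repetition are ignored.
    Set<Character> middle = new TreeSet<>();
    for (int i = BOUNDARY_SIZE; i < len - BOUNDARY_SIZE; i++) {
      char m = classOf(s.charAt(i));
      if (!omitIfInBoundary || !boundary.contains(m)) {
        middle.add(m);
      }
    }
    for (char m : middle) {
      sb.append(m);
    }
    return sb.append(end).toString();
  }
}
```

Under these assumptions "ab12CDxy" and "abCD12xy" both map to `"xxXdxx"`, because only the set of middle classes survives; "Washington" maps to `"Xxxxx"`, or to `"Xxxx"` when omitIfInBoundary drops the middle `x` that already appears at the boundaries.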
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final int NOWORDSHAPE
public static final int WORDSHAPEDAN1
public static final int WORDSHAPECHRIS1
public static final int WORDSHAPEDAN2
public static final int WORDSHAPEDAN2USELC
public static final int WORDSHAPEDAN2BIO
public static final int WORDSHAPEDAN2BIOUSELC
public static final int WORDSHAPEJENNY1
public static final int WORDSHAPEJENNY1USELC
public static final int WORDSHAPECHRIS2
public static final int WORDSHAPECHRIS2USELC
public static final int WORDSHAPECHRIS3
public static final int WORDSHAPECHRIS3USELC
public static final int WORDSHAPECHRIS4
Method Detail |
---|
public static int lookupShaper(String name)
public static boolean usesLC(int shape)
public static String wordShape(String inStr, int wordShaper)
public static String wordShape(String inStr, int wordShaper, Set knownLCWords)
public static String wordShape(String inStr, int wordShaper, boolean markKnownLC)
public static String wordShape(String inStr, int wordShaper, boolean markKnownLC, Set knownLCWords)
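Taken together, lookupShaper, usesLC, and the wordShape overloads follow a common pattern: a shaper is named, the name is turned into an int id, and the id selects the shaping routine, some of which consult a set of known lower case words. The sketch below shows that pattern with an entirely hypothetical class; its ids, names ("none", "case", "caseUseLC"), case-insensitive lookup, exception on unknown names, and `-k` mark are assumptions, not the values or behavior of the real constants.

```java
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical sketch of name-to-id lookup plus int-based dispatch.
 * All ids, names, and output formats are illustrative assumptions.
 */
final class ShaperDispatchSketch {
  static final int NO_SHAPE = 0;
  static final int CASE = 1;         // hypothetical: upper -> 'X', lower -> 'x'
  static final int CASE_USE_LC = 2;  // hypothetical: CASE plus a "-k" mark for known words

  private ShaperDispatchSketch() {}

  /** Case-insensitive name lookup; unknown names are rejected (an assumption). */
  static int lookup(String name) {
    switch (name.toLowerCase(Locale.ROOT)) {
      case "none": return NO_SHAPE;
      case "case": return CASE;
      case "caseuselc": return CASE_USE_LC;
      default: throw new IllegalArgumentException("unknown shaper: " + name);
    }
  }

  /** True if the shaper with this id consults the known lower case words. */
  static boolean usesLC(int shaper) {
    return shaper == CASE_USE_LC;
  }

  /** Applies the shaper identified by the id to s. */
  static String shape(String s, int shaper, Set<String> knownLC) {
    switch (shaper) {
      case NO_SHAPE:
        return s;  // assumption: "no shape" returns the input unchanged
      case CASE:
      case CASE_USE_LC: {
        StringBuilder sb = new StringBuilder(s.length() + 2);
        for (int i = 0; i < s.length(); i++) {
          char c = s.charAt(i);
          sb.append(Character.isUpperCase(c) ? 'X' : Character.isLowerCase(c) ? 'x' : c);
        }
        if (usesLC(shaper) && knownLC != null && knownLC.contains(s.toLowerCase(Locale.ROOT))) {
          sb.append("-k");
        }
        return sb.toString();
      }
      default:
        throw new IllegalArgumentException("unknown shaper id: " + shaper);
    }
  }
}
```

Passing an int rather than an object keeps the per-token call cheap and lets a feature extractor store its chosen shaper as a plain configuration value.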
public static String wordShapeDan1(String s)
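The wordShapeDan1 summary describes a 5-way classifier over digits, upper case, lower case, mixed, and non-alphanumeric words. A sketch of such a whole-word classifier follows; the enum labels, the precedence of the rules, and the choice to treat letters combined with digits as "mixed" are assumptions, not the library's definitions or output strings.

```java
/**
 * Hypothetical 5-way word classifier in the spirit of the wordShapeDan1 summary.
 * Labels and rule precedence are illustrative assumptions.
 */
final class FiveWayShapeSketch {
  enum Shape { ALL_DIGITS, ALL_UPPER, ALL_LOWER, MIXED, NON_ALPHANUMERIC }

  private FiveWayShapeSketch() {}

  static Shape classify(String s) {
    if (s.isEmpty()) {
      return Shape.NON_ALPHANUMERIC;  // assumption: no alphanumerics at all
    }
    boolean digit = false;
    boolean upper = false;
    boolean lower = false;
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (Character.isDigit(c)) {
        digit = true;
      } else if (Character.isUpperCase(c)) {
        upper = true;
      } else if (Character.isLowerCase(c)) {
        lower = true;
      } else {
        // Punctuation, symbols, and (in this sketch) caseless letters.
        return Shape.NON_ALPHANUMERIC;
      }
    }
    if (digit && !upper && !lower) return Shape.ALL_DIGITS;
    if (upper && !digit && !lower) return Shape.ALL_UPPER;
    if (lower && !digit && !upper) return Shape.ALL_LOWER;
    return Shape.MIXED;  // mixed case, or letters together with digits
  }
}
```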
public static String wordShapeDan2(String s, boolean markKnownLC)
Parameters:
s - The String whose shape is to be returned
markKnownLC - Whether to mark words whose lower case form is found in the previously initialized list of known lower case words
See Also:
addKnownLowerCaseWords(Collection)
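The markKnownLC parameter relies on a list of known lower case words that has been filled in beforehand, for example through addKnownLowerCaseWords. The sketch below shows that "register first, mark later" pattern with a hypothetical class: the registry, the `addKnown` and `known` helpers, the thread-safe set, the `-k` suffix, and the `X`/`x`/`d` classes are illustrative assumptions, not the library's API or output format.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.Locale;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch of marking words whose lower case form was registered
 * in advance. Not the WordShapeClassifier implementation.
 */
final class KnownLowerCaseSketch {
  // Shared, thread-safe registry of known lower case words (an assumed design).
  private static final Set<String> KNOWN_LC = ConcurrentHashMap.newKeySet();

  private KnownLowerCaseSketch() {}

  /** Adds words, expected to already be lower case, to the shared registry. */
  static void addKnown(Collection<String> words) {
    KNOWN_LC.addAll(words);
  }

  /** Read-only view of the registry. */
  static Set<String> known() {
    return Collections.unmodifiableSet(KNOWN_LC);
  }

  static String shape(String s, boolean markKnownLC) {
    StringBuilder sb = new StringBuilder(s.length() + 2);
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (Character.isUpperCase(c)) {
        sb.append('X');
      } else if (Character.isLowerCase(c)) {
        sb.append('x');
      } else if (Character.isDigit(c)) {
        sb.append('d');
      } else {
        sb.append(c);
      }
    }
    // Flag the word only if its lower case form was registered beforehand.
    if (markKnownLC && KNOWN_LC.contains(s.toLowerCase(Locale.ROOT))) {
      sb.append("-k");
    }
    return sb.toString();
  }
}
```

The mark lets a model tell a sentence-initial "Apple" (a known common word) apart from a capitalized word that never appears in lower case.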
public static String wordShapeJenny1(String s, boolean markKnownLC)
public static String wordShapeChris2(String s, boolean markKnownLC, boolean omitIfInBoundary)
Parameters:
omitIfInBoundary - If true, character classes present in the first or last two letters of the word are not also registered as classes that appear in the middle of the word.
public static String wordShapeChris4(String s, boolean markKnownLC, boolean omitIfInBoundary)
Parameters:
omitIfInBoundary - If true, character classes present in the first or last two letters of the word are not also registered as classes that appear in the middle of the word.
public static Set getKnownLowerCaseWords()
public static void setKnownLowerCaseWords(Set words)
public static void addKnownLowerCaseWords(Collection words)
public static String wordShapeDan2Bio(String s, boolean useKnownLC)
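The wordShapeDan2Bio summary describes putting lower case letters, upper case letters, and digits into equivalence classes, collapsing sequences of the same type, and keeping all punctuation. A minimal sketch of that idea follows. The class name and the `X`/`x`/`d` symbols are assumptions, and because the summary does not say whether repeated punctuation is collapsed, this sketch keeps every punctuation character; the library may behave differently.

```java
/**
 * Hypothetical run-collapsing word shape in the spirit of the
 * wordShapeDan2Bio summary. Not the library implementation.
 */
final class CollapsingShapeSketch {
  private CollapsingShapeSketch() {}

  static String shape(String s) {
    StringBuilder sb = new StringBuilder();
    char last = '\0';  // symbol appended for the previous character; '\0' means none yet
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      char m;
      if (Character.isUpperCase(c)) {
        m = 'X';
      } else if (Character.isLowerCase(c)) {
        m = 'x';
      } else if (Character.isDigit(c)) {
        m = 'd';
      } else {
        m = c;  // punctuation is kept verbatim
      }
      boolean isClass = m == 'X' || m == 'x' || m == 'd';
      if (isClass && m == last) {
        continue;  // collapse a run of the same class into one symbol
      }
      sb.append(m);
      last = m;
    }
    return sb.toString();
  }
}
```

Under these assumptions "Hello-World" becomes `"Xx-Xx"` and "IL--2" becomes `"X--d"`, so word length no longer separates shapes but punctuation still does.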
public static String wordShapeChris1(String s)
public static void main(String[] args)
Usage: java edu.stanford.nlp.process.WordShapeClassifier [-wordShape name] string+
where name is an argument to lookupShaper.
Known names have patterns along the lines of: dan[12](bio)?(UseLC)?, jenny1(useLC)?, chris[1234](useLC)?.
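The name patterns above are written in regular-expression notation. The sketch below turns them into a `java.util.regex.Pattern` to check candidate names before passing them on. Matching case-insensitively is an assumption prompted by the mixed "UseLC"/"useLC" spelling, and the class and method names are hypothetical. A match does not guarantee that lookupShaper accepts the name: "dan1bio", for instance, fits the pattern, but the Field Summary lists no WORDSHAPEDAN1BIO constant.

```java
import java.util.regex.Pattern;

/** Hypothetical check of shaper names against the documented patterns; not the library's validation. */
final class ShaperNamePatternSketch {
  // Case-insensitive because the documentation spells the suffix both "UseLC" and "useLC" (assumption).
  private static final Pattern KNOWN_NAME = Pattern.compile(
      "dan[12](bio)?(uselc)?|jenny1(uselc)?|chris[1234](uselc)?",
      Pattern.CASE_INSENSITIVE);

  private ShaperNamePatternSketch() {}

  /** True if the whole name matches one of the documented patterns. */
  static boolean looksLikeShaperName(String name) {
    return KNOWN_NAME.matcher(name).matches();
  }
}
```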