Java Access to WordNet
java.lang.Object
  |
  +--danbikel.wordnet.Morphy
Implementation of morphological analyzer Morphy that is part of WordNet.
| Field Summary | |
protected static String[][] |
adjSuffixes
Array of two-element String arrays, where the first element of each two-element array is a potential adjective suffix and the second element is a replacement suffix to canonicalize the word form. |
protected boolean[] |
allCached
|
protected HashMap[] |
cache
|
protected WNFile[] |
excFiles
Array of WNFile objects, representing exception lists for each of the three morphable parts of speech: nouns, verbs and adjectives. |
protected static String |
extension
Filename extension for WordNet exception list files. |
protected static String |
fileSep
Cache of the file separator obtained by calling System.getProperty(java.lang.String) with "file.separator". |
protected static String[][] |
nounSuffixes
Array of two-element String arrays, where the first element of each two-element array is a potential noun suffix and the second element is a replacement suffix to canonicalize the word form. |
protected static String[] |
prepArr
Array of prepositions. |
protected static HashSet |
prepositions
Hashed set of String objects in prepArr, used
for determining if a collocation contains a preposition. |
protected static String[][][] |
suffixArr
Array containing references to suffix arrays nounSuffixes, verbSuffixes and
adjSuffixes. |
protected static String[][] |
verbSuffixes
Array of two-element String arrays, where the first element of each two-element array is a potential verb suffix and the second element is a replacement suffix to canonicalize the word form. |
protected WordNet |
wn
Reference to WordNet object. |
| Constructor Summary | |
Morphy(WordNet wn)
Initialize a new Morphy object for finding lemmas for instances of words or collocations. |
|
| Method Summary | |
void |
cacheAll()
|
void |
cacheAll(int posIdx)
|
void |
cacheAll(String pos)
|
protected static boolean |
containsNonAlnum(String s)
Predicate for determining if any characters in s
are non-alphanumeric. |
static int |
countWords(String s,
char separator)
Same as countWords, but
with word-delimiters specifically set to separator,
' ' and '_'. |
static int |
countWords(String s,
String delim)
Counts number of tokens in s separated by
the delimiter characters specified by delim. |
protected String[] |
exceptionLookup(String str,
String pos)
Looks up exceptions for str in the WordNet
exception file for pos |
protected static String[][] |
getSuffix(int posIdx)
Gets the suffix array for a particular part of speech, either nounSuffixes, verbSuffixes or
adjSuffixes. |
protected static String[][] |
getSuffix(String pos)
Gets the suffix array for a particular part of speech, either nounSuffixes, verbSuffixes or
adjSuffixes. |
protected boolean |
hasPrep(String str)
Returns true if any word in (assumed collocated) str other than first or last contains a
preposition. |
static void |
main(String[] args)
Simple test driver for Morphy, reading one word/collocation per line from System.in, and printing possible morphs for each of the three morphable parts of speech: nouns, verbs and adjectives. |
protected String[] |
morphPrep(String str)
Tries to find morphs for str, which is assumed to be a
collocation containing a verb, a preposition and a noun. |
String[] |
morphStr(String origstr,
String pos)
Tries several techniques on origstr to find
possible base forms (lemmas). |
protected String |
morphWord(String word,
String pos)
Tries to find morph of a single word |
protected String |
wordBase(String word,
String[][] suffixes,
int sufIdx)
Tries to form the base form of a word using information in a suffix lookup table. |
| Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
protected static String fileSep
Cache of the file separator obtained by calling System.getProperty(java.lang.String) with "file.separator".
protected static String[][] nounSuffixes
protected static String[][] verbSuffixes
protected static String[][] adjSuffixes
protected static String[][][] suffixArr
Array containing references to suffix arrays nounSuffixes, verbSuffixes and adjSuffixes.
Each of these suffix arrays is an array of two-element String arrays, where
the first element is a potential suffix and the second element is a
possible replacement suffix when attempting to find a base form of a word.
See Also: getSuffix(int), getSuffix(String), wordBase(String, String[][], int)
protected static final String[] prepArr
Array of prepositions.
protected static final HashSet prepositions
Hashed set of String objects in prepArr, used for determining if a collocation contains a preposition.
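To illustrate how a hashed set of prepositions can be used to test a collocation, here is a minimal sketch. It is not the class's actual code: the preposition list, the class name PrepositionSketch and the method name hasInteriorPreposition are assumptions made for the example, and the check follows the hasPrep summary (a preposition counts only if it is neither the first nor the last word).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class PrepositionSketch {
    // Hypothetical sample list; the real contents of prepArr are not shown in this page.
    static final String[] SAMPLE_PREPS = {
        "to", "at", "of", "on", "off", "in", "out", "up", "from", "with", "for"
    };

    // Hashed copy of the array, so that each membership test is a constant-time lookup.
    static final Set<String> PREPOSITIONS = new HashSet<>(Arrays.asList(SAMPLE_PREPS));

    // Returns true if any word other than the first or last is in the preposition set.
    // Words are assumed to be separated by spaces or underscores.
    static boolean hasInteriorPreposition(String collocation) {
        String[] words = collocation.split("[ _]+");
        for (int i = 1; i < words.length - 1; i++) {
            if (PREPOSITIONS.contains(words[i])) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasInteriorPreposition("asking_for_it")); // true
        System.out.println(hasInteriorPreposition("look_up"));       // false: "up" is the last word
    }
}
```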
protected static String extension
protected WordNet wn
protected WNFile[] excFiles
protected HashMap[] cache
protected boolean[] allCached
| Constructor Detail |
public Morphy(WordNet wn)
Parameters: wn - reference to a WordNet object
| Method Detail |
public void cacheAll()
public void cacheAll(String pos)
public void cacheAll(int posIdx)
protected static final String[][] getSuffix(String pos)
Gets the suffix array for a particular part of speech, either nounSuffixes, verbSuffixes or adjSuffixes.
Parameters: pos - the part of speech
protected static final String[][] getSuffix(int posIdx)
Gets the suffix array for a particular part of speech, either nounSuffixes, verbSuffixes or adjSuffixes.
Parameters: posIdx - the part of speech
public static final int countWords(String s,
char separator)
Same as countWords(String, String), but with word-delimiters specifically set to separator, ' ' and '_'.
Parameters:
s - string in which to count words
separator - word-separator (in addition to ' ' and '_')
Returns: the number of words in s
public static final int countWords(String s,
String delim)
Counts number of tokens in s separated by the delimiter characters specified by delim.
Parameters:
s - string in which to count words
delim - String object whose characters are treated as delimiters when finding words in s
Returns: the number of tokens in s
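The two countWords overloads can be pictured with the following sketch. It assumes the count is a plain token count as produced by java.util.StringTokenizer, which treats runs of adjacent delimiters as a single separator. The actual implementation may count differently, and the class name CountWordsSketch is made up for the example.

```java
import java.util.StringTokenizer;

final class CountWordsSketch {
    // Counts the tokens in s, treating every character of delim as a delimiter.
    static int countWords(String s, String delim) {
        return new StringTokenizer(s, delim).countTokens();
    }

    // Same count, with the delimiters fixed to ' ', '_' and the given separator.
    static int countWords(String s, char separator) {
        return countWords(s, " _" + separator);
    }

    public static void main(String[] args) {
        System.out.println(countWords("look_up the-word", '-')); // 4
        System.out.println(countWords("a__b", "_"));             // 2
    }
}
```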
protected String[] exceptionLookup(String str,
String pos)
Looks up exceptions for str in the WordNet exception file for pos.
Parameters:
str - word or collocation to look up
pos - part of speech of str
protected boolean hasPrep(String str)
Returns true if any word in (assumed collocated) str other than first or last contains a preposition.
Parameters: str - collocation
protected static boolean containsNonAlnum(String s)
Predicate for determining if any characters in s are non-alphanumeric.
Parameters: s - the string to examine.
Returns: true if at least one character is not a letter or digit; false otherwise.
See Also: Character.isLetterOrDigit(char)
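A predicate with this contract could be written as below. This is a sketch based only on the description and the reference to Character.isLetterOrDigit(char), not the class's actual source. The class name NonAlnumSketch is an assumption.

```java
final class NonAlnumSketch {
    // Returns true as soon as one character is neither a letter nor a digit.
    static boolean containsNonAlnum(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isLetterOrDigit(s.charAt(i))) {
                return true;
            }
        }
        return false; // every character is alphanumeric (also the case for an empty string)
    }

    public static void main(String[] args) {
        System.out.println(containsNonAlnum("abc123"));  // false
        System.out.println(containsNonAlnum("look_up")); // true: '_' is not a letter or digit
    }
}
```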
protected String wordBase(String word,
String[][] suffixes,
int sufIdx)
Tries to form the base form of a word using information in a suffix lookup table.
Parameters:
word - word with possible suffix to be replaced
suffixes - suffix array: array of two-element String arrays
sufIdx - index into suffixes
Returns: if word ends with suffixes[sufIdx][0], strips off this suffix and returns the stem concatenated with suffixes[sufIdx][1]; otherwise returns null.
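The suffix-replacement rule described for wordBase can be sketched as follows. The table SAMPLE_NOUN_SUFFIXES and the class name WordBaseSketch are illustrative assumptions. The actual contents of nounSuffixes, verbSuffixes and adjSuffixes are not listed in this page.

```java
final class WordBaseSketch {
    // Hypothetical suffix table in the documented shape: {suffix, replacement}.
    static final String[][] SAMPLE_NOUN_SUFFIXES = {
        {"s", ""},
        {"ses", "s"},
        {"ies", "y"},
    };

    // If word ends with suffixes[sufIdx][0], replaces that suffix with
    // suffixes[sufIdx][1]; otherwise returns null.
    static String wordBase(String word, String[][] suffixes, int sufIdx) {
        String suffix = suffixes[sufIdx][0];
        if (!word.endsWith(suffix)) {
            return null;
        }
        String stem = word.substring(0, word.length() - suffix.length());
        return stem + suffixes[sufIdx][1];
    }

    public static void main(String[] args) {
        System.out.println(wordBase("ponies", SAMPLE_NOUN_SUFFIXES, 0)); // ponie
        System.out.println(wordBase("ponies", SAMPLE_NOUN_SUFFIXES, 1)); // null
        System.out.println(wordBase("ponies", SAMPLE_NOUN_SUFFIXES, 2)); // pony
    }
}
```

As the first output line shows, applying a rule does not guarantee a real word, so a caller would still need to check each candidate against the WordNet index.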
protected String morphWord(String word,
String pos)
Tries to find morph of a single word.
Parameters:
word - word to be morphed
pos - part of speech of word to be morphed
Returns: the morph of word
protected String[] morphPrep(String str)
Tries to find morphs for str, which is assumed to be a collocation containing a verb, a preposition and a noun. Assumes str's first word is a verb and its last word a noun, with an intervening word a preposition.
The following steps are tried, in this order:
Parameters: str - collocation to morph containing verb, preposition and noun
Returns: the morphs of str
public String[] morphStr(String origstr,
String pos)
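Tries several techniques on origstr to find possible base forms (lemmas).
Specified by: morphStr in interface MorphyRemote
Parameters:
origstr - word or collocation, separated either by whitespace, '_' or '-', to find lemma of
pos - part of speech of origstr
Returns: an array of possible lemmas of origstr, possibly of length 0 if no lemmas could be found
public static void main(String[] args)
Simple test driver for Morphy, reading one word/collocation per line from System.in, and printing possible morphs for each of the three morphable parts of speech: nouns, verbs and adjectives.
Parameters: args -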