Java Access to WordNet
java.lang.Object
  |
  +--danbikel.wordnet.Morphy
Implementation of morphological analyzer Morphy that is part of WordNet.
Field Summary | |
protected static String[][] |
adjSuffixes
Array of two-element String arrays, where the first element of each two-element array is a potential adjective suffix and the second element is a replacement suffix to canonicalize the word form. |
protected boolean[] |
allCached
|
protected HashMap[] |
cache
|
protected WNFile[] |
excFiles
Array of WNFile objects, representing exception lists for each of the three morphable parts of speech: nouns, verbs and adjectives. |
protected static String |
extension
Filename extension for WordNet exception list files. |
protected static String |
fileSep
Cache of the file separator obtained by calling System.getProperty(java.lang.String) with "file.separator". |
protected static String[][] |
nounSuffixes
Array of two-element String arrays, where the first element of each two-element array is a potential noun suffix and the second element is a replacement suffix to canonicalize the word form. |
protected static String[] |
prepArr
Array of prepositions. |
protected static HashSet |
prepositions
Hashed set of String objects in prepArr, used for determining if a collocation contains a preposition. |
protected static String[][][] |
suffixArr
Array containing references to suffix arrays nounSuffixes, verbSuffixes and adjSuffixes. |
protected static String[][] |
verbSuffixes
Array of two-element String arrays, where the first element of each two-element array is a potential verb suffix and the second element is a replacement suffix to canonicalize the word form. |
protected WordNet |
wn
Reference to WordNet object. |
Constructor Summary | |
Morphy(WordNet wn)
Initialize a new Morphy object for finding lemmas for instances of words or collocations. |
Method Summary | |
void |
cacheAll()
|
void |
cacheAll(int posIdx)
|
void |
cacheAll(String pos)
|
protected static boolean |
containsNonAlnum(String s)
Predicate for determining if any characters in s are non-alphanumeric. |
static int |
countWords(String s,
char separator)
Same as countWords(String, String), but with word delimiters specifically set to separator, ' ' and '_'. |
static int |
countWords(String s,
String delim)
Counts the number of tokens in s separated by the delimiter characters specified by delim. |
protected String[] |
exceptionLookup(String str,
String pos)
Looks up exceptions for str in the WordNet exception file for pos. |
protected static String[][] |
getSuffix(int posIdx)
Gets the suffix array for a particular part of speech, either nounSuffixes, verbSuffixes or adjSuffixes. |
protected static String[][] |
getSuffix(String pos)
Gets the suffix array for a particular part of speech, either nounSuffixes, verbSuffixes or adjSuffixes. |
protected boolean |
hasPrep(String str)
Returns true if any word in str (assumed to be a collocation), other than the first or last, is a preposition. |
static void |
main(String[] args)
Simple test driver for Morphy, reading one word/collocation per line from System.in, and printing possible morphs for each of the three morphable parts of speech: nouns, verbs and adjectives. |
protected String[] |
morphPrep(String str)
Tries to find morphs for str, which is assumed to be a collocation containing a verb, a preposition and a noun. |
String[] |
morphStr(String origstr,
String pos)
Tries several techniques on origstr to find possible base forms (lemmas). |
protected String |
morphWord(String word,
String pos)
Tries to find the morph of a single word. |
protected String |
wordBase(String word,
String[][] suffixes,
int sufIdx)
Tries to form the base form of a word using information in a suffix lookup table. |
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
protected static String fileSep
Cache of the file separator obtained by calling System.getProperty(java.lang.String) with "file.separator".
protected static String[][] nounSuffixes
protected static String[][] verbSuffixes
protected static String[][] adjSuffixes
protected static String[][][] suffixArr
Array containing references to suffix arrays nounSuffixes, verbSuffixes and adjSuffixes. Each of these suffix arrays is an array of two-element String arrays, where the first element is a potential suffix and the second element is a possible replacement suffix when attempting to find a base form of a word.
See Also:
getSuffix(int), getSuffix(String), wordBase(String, String[][], int)
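The suffix-table layout described above can be sketched as follows. This is only an illustration of the documented {suffix, replacement} structure and of the wordBase contract, not the actual Morphy source: the class name SuffixTableSketch, the helper candidates, and the table contents are all assumptions, since the real contents of nounSuffixes are not listed on this page.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; not the danbikel.wordnet.Morphy implementation. */
final class SuffixTableSketch {
    // Hypothetical noun suffix table: each row is {suffix, replacement}.
    // The real nounSuffixes contents are not shown in this documentation.
    static final String[][] NOUN_SUFFIXES = {
        {"s", ""}, {"ses", "s"}, {"ches", "ch"}, {"ies", "y"}
    };

    /** Follows the documented wordBase contract: swap one suffix, or return null. */
    static String wordBase(String word, String[][] suffixes, int sufIdx) {
        String suffix = suffixes[sufIdx][0];
        if (!word.endsWith(suffix)) {
            return null;
        }
        String stem = word.substring(0, word.length() - suffix.length());
        return stem + suffixes[sufIdx][1];
    }

    /** Hypothetical helper: collects every candidate base form the table yields. */
    static List<String> candidates(String word, String[][] suffixes) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < suffixes.length; i++) {
            String base = wordBase(word, suffixes, i);
            if (base != null) {
                result.add(base);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(candidates("churches", NOUN_SUFFIXES));
    }
}
```

A table like this over-generates (for example, "churches" yields both "churche" and "church"), so a caller would presumably still have to check each candidate against the lexicon before accepting it.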
protected static final String[] prepArr
protected static final HashSet prepositions
Hashed set of String objects in prepArr, used for determining if a collocation contains a preposition.
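One way such a hashed set could be used to test a collocation is sketched below. The class name PrepositionSketch and the preposition list are hypothetical (the real contents of prepArr are not shown here), and the assumption that words are separated by ' ' or '_' is taken from the countWords documentation rather than from the Morphy source.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.StringTokenizer;

/** Illustrative sketch only; not the danbikel.wordnet.Morphy implementation. */
final class PrepositionSketch {
    // Hypothetical preposition list; the real prepArr contents are not documented here.
    static final String[] PREP_ARR = {
        "to", "at", "of", "on", "off", "in", "out", "up", "into", "for", "about", "between"
    };

    // Hashed copy of the array, giving constant-time membership tests.
    static final Set<String> PREPOSITIONS = new HashSet<>(Arrays.asList(PREP_ARR));

    /** Returns true if any word other than the first or last is a preposition. */
    static boolean hasPrep(String str) {
        StringTokenizer tok = new StringTokenizer(str, " _"); // assumed word separators
        int n = tok.countTokens();
        for (int i = 0; i < n; i++) {
            String word = tok.nextToken();
            boolean interior = i > 0 && i < n - 1;
            if (interior && PREPOSITIONS.contains(word)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasPrep("asking_for_it"));
    }
}
```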
protected static String extension
protected WordNet wn
protected WNFile[] excFiles
protected HashMap[] cache
protected boolean[] allCached
Constructor Detail |
public Morphy(WordNet wn)
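Initialize a new Morphy object for finding lemmas for instances of words or collocations.
Parameters:
wn - reference to a WordNet object
Method Detail |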
public void cacheAll()
public void cacheAll(String pos)
public void cacheAll(int posIdx)
protected static final String[][] getSuffix(String pos)
Gets the suffix array for a particular part of speech, either nounSuffixes, verbSuffixes or adjSuffixes.
Parameters:
pos - the part of speech
protected static final String[][] getSuffix(int posIdx)
Gets the suffix array for a particular part of speech, either nounSuffixes, verbSuffixes or adjSuffixes.
Parameters:
posIdx - the part of speech
public static final int countWords(String s, char separator)
Same as countWords(String, String), but with word delimiters specifically set to separator, ' ' and '_'.
Parameters:
s - string in which to count words
separator - word-separator (in addition to ' ' and '_')
Returns:
the number of words in s
public static final int countWords(String s, String delim)
Counts the number of tokens in s separated by the delimiter characters specified by delim.
Parameters:
s - string in which to count words
delim - String object whose characters are treated as delimiters when finding words in s
Returns:
the number of words in s
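The relationship between the two countWords overloads can be illustrated with a short sketch built on java.util.StringTokenizer. Whether Morphy itself uses StringTokenizer is not stated on this page, so treat that choice, along with the class name CountWordsSketch, as an assumption.

```java
import java.util.StringTokenizer;

/** Illustrative sketch only; not the danbikel.wordnet.Morphy implementation. */
final class CountWordsSketch {
    /** Counts tokens in s, treating every character of delim as a delimiter. */
    static int countWords(String s, String delim) {
        return new StringTokenizer(s, delim).countTokens();
    }

    /** Same count, with the delimiters fixed to separator, ' ' and '_'. */
    static int countWords(String s, char separator) {
        return countWords(s, " _" + separator);
    }

    public static void main(String[] args) {
        System.out.println(countWords("look_up the-word", '-'));
    }
}
```

With StringTokenizer, runs of consecutive delimiters do not produce empty tokens; the real implementation may or may not behave the same way.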
protected String[] exceptionLookup(String str, String pos)
Looks up exceptions for str in the WordNet exception file for pos.
Parameters:
str - word or collocation to look up
pos - part of speech of str
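As a rough illustration of what an exception lookup involves, the sketch below parses exception-list lines into a map and queries it. It assumes the usual WordNet exception-file layout, in which each line holds an inflected form followed by one or more base forms separated by whitespace. The class ExceptionListSketch and its methods are hypothetical, and returning an empty array for a miss is an assumption; this page does not say what exceptionLookup returns when nothing is found.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only; not the danbikel.wordnet.Morphy implementation. */
final class ExceptionListSketch {
    /** Builds a table from lines of the assumed form "inflected base1 [base2 ...]". */
    static Map<String, String[]> parse(Iterable<String> lines) {
        Map<String, String[]> table = new HashMap<>();
        for (String line : lines) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 2) {
                continue; // skip blank or malformed lines
            }
            table.put(fields[0], Arrays.copyOfRange(fields, 1, fields.length));
        }
        return table;
    }

    /** Returns the base forms listed for str, or an empty array (assumed) if none. */
    static String[] lookup(Map<String, String[]> table, String str) {
        String[] bases = table.get(str);
        return bases == null ? new String[0] : bases;
    }

    public static void main(String[] args) {
        Map<String, String[]> table = parse(Arrays.asList("geese goose", "axes ax axis"));
        System.out.println(Arrays.toString(lookup(table, "axes")));
    }
}
```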
protected boolean hasPrep(String str)
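Returns true if any word in str (assumed to be a collocation), other than the first or last, is a preposition.
Parameters:
str - collocation
protected static boolean containsNonAlnum(String s)
Predicate for determining if any characters in s are non-alphanumeric.
Parameters:
s - the string to examine.
Returns:
true if at least one character is not a letter or digit; false otherwise.
See Also:
Character.isLetterOrDigit(char)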
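A predicate with this contract can be written directly on top of Character.isLetterOrDigit(char), as the cross-reference suggests. The sketch below is one plausible form, under the assumption that the check is a simple per-character scan; the class name AlnumSketch is hypothetical.

```java
/** Illustrative sketch only; not the danbikel.wordnet.Morphy implementation. */
final class AlnumSketch {
    /** Returns true if at least one character of s is not a letter or digit. */
    static boolean containsNonAlnum(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isLetterOrDigit(s.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(containsNonAlnum("dog-house"));
    }
}
```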
protected String wordBase(String word, String[][] suffixes, int sufIdx)
Tries to form the base form of a word using information in a suffix lookup table.
Parameters:
word - word with possible suffix to be replaced
suffixes - suffix array: array of two-element String arrays
sufIdx - index into suffixes
Returns:
if word ends with suffixes[sufIdx][0], strips off this suffix and returns the stem concatenated with suffixes[sufIdx][1]; otherwise returns null.
protected String morphWord(String word, String pos)
Tries to find the morph of a single word.
Parameters:
word - word to be morphed
pos - part of speech of word to be morphed
Returns:
the morph of word
protected String[] morphPrep(String str)
Tries to find morphs for str, which is assumed to be a collocation containing a verb, a preposition and a noun. Assumes str's first word is a verb and its last word a noun, with an intervening word a preposition. The following steps are tried, in this order:
Parameters:
str - collocation to morph containing verb, preposition and noun
Returns:
the morphs found for str
public String[] morphStr(String origstr, String pos)
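Tries several techniques on origstr to find possible base forms (lemmas).
Specified by:
morphStr in interface MorphyRemote
Parameters:
origstr - word or collocation whose lemma is to be found, with words separated by whitespace, '_' or '-'
pos - part of speech of origstr
Returns:
an array of possible base forms (lemmas) of origstr, possibly of length 0 if no lemmas could be found
public static void main(String[] args)
Simple test driver for Morphy, reading one word/collocation per line from System.in, and printing possible morphs for each of the three morphable parts of speech: nouns, verbs and adjectives.
Parameters:
args -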