Java Access to WordNet

Class Morphy

All Implemented Interfaces:
MorphyRemote, Remote, Serializable

public class Morphy
extends Object
implements Serializable, MorphyRemote

Implementation of Morphy, the morphological analyzer that is part of WordNet.

See Also:
Serialized Form

Field Summary
protected static String[][] adjSuffixes
          Array of two-element String arrays, where the first element of each two-element array is a potential adjective suffix and the second element is a replacement suffix to canonicalize the word form.
protected  boolean[] allCached
protected  HashMap[] cache
protected  WNFile[] excFiles
          Array of WNFile objects, representing exception lists for each of the three morphable parts of speech: nouns, verbs and adjectives.
protected static String extension
          Filename extension for WordNet exception list files.
protected static String fileSep
          Cache of the file separator obtained by calling System.getProperty(java.lang.String) with "file.separator".
protected static String[][] nounSuffixes
          Array of two-element String arrays, where the first element of each two-element array is a potential noun suffix and the second element is a replacement suffix to canonicalize the word form.
protected static String[] prepArr
          Array of prepositions.
protected static HashSet prepositions
          Hashed set of String objects in prepArr, used for determining if a collocation contains a preposition.
protected static String[][][] suffixArr
          Array containing references to suffix arrays nounSuffixes, verbSuffixes and adjSuffixes.
protected static String[][] verbSuffixes
          Array of two-element String arrays, where the first element of each two-element array is a potential verb suffix and the second element is a replacement suffix to canonicalize the word form.
protected  WordNet wn
          Reference to WordNet object.
Constructor Summary
Morphy(WordNet wn)
          Initialize a new Morphy object for finding lemmas for instances of words or collocations.
Method Summary
 void cacheAll()
 void cacheAll(int posIdx)
 void cacheAll(String pos)
protected static boolean containsNonAlnum(String s)
          Predicate for determining if any characters in s are non-alphanumeric.
static int countWords(String s, char separator)
          Same as countWords(String, String), but with the word delimiters set to separator, ' ' and '_'.
static int countWords(String s, String delim)
          Counts number of tokens in s separated by the delimiter characters specified by delim.
protected  String[] exceptionLookup(String str, String pos)
          Looks up exceptions for str in the WordNet exception file for pos.
protected static String[][] getSuffix(int posIdx)
          Gets the suffix array for a particular part of speech, either nounSuffixes, verbSuffixes or adjSuffixes.
protected static String[][] getSuffix(String pos)
          Gets the suffix array for a particular part of speech, either nounSuffixes, verbSuffixes or adjSuffixes.
protected  boolean hasPrep(String str)
          Returns true if any word in str (assumed to be a collocation), other than the first or last, is a preposition.
static void main(String[] args)
          Simple test driver for Morphy, reading one word or collocation per line of input and printing possible morphs for each of the three morphable parts of speech: nouns, verbs and adjectives.
protected  String[] morphPrep(String str)
          Tries to find morphs for str, which is assumed to be a collocation containing a verb, a preposition and a noun.
 String[] morphStr(String origstr, String pos)
          Tries several techniques on origstr to find possible base forms (lemmas).
protected  String morphWord(String word, String pos)
          Tries to find the morph of a single word.
protected  String wordBase(String word, String[][] suffixes, int sufIdx)
          Tries to form the base form of a word using information in a suffix lookup table.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail


protected static String fileSep
Cache of the file separator obtained by calling System.getProperty(java.lang.String) with "file.separator".


protected static String[][] nounSuffixes
Array of two-element String arrays, where the first element of each two-element array is a potential noun suffix and the second element is a replacement suffix to canonicalize the word form. These arrays are accessed by getSuffix.


protected static String[][] verbSuffixes
Array of two-element String arrays, where the first element of each two-element array is a potential verb suffix and the second element is a replacement suffix to canonicalize the word form. These arrays are accessed by getSuffix.


protected static String[][] adjSuffixes
Array of two-element String arrays, where the first element of each two-element array is a potential adjective suffix and the second element is a replacement suffix to canonicalize the word form. These arrays are accessed by getSuffix.


protected static String[][][] suffixArr
Array containing references to suffix arrays nounSuffixes, verbSuffixes and adjSuffixes. Each of these suffix arrays is an array of two-element String arrays, where the first element is a potential suffix and the second element is a possible replacement suffix when attempting to find a base form of a word.

See Also:
getSuffix(int), getSuffix(String), wordBase(String, String[][], int)
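
To make the shape of these tables concrete, here is a minimal sketch. The entries are illustrative only (this page does not list the real table contents), and the class name, constant names and the index order noun = 0, verb = 1, adjective = 2 are assumptions made for the example.

```java
// Hypothetical stand-in for the suffix tables described above.
final class SuffixTableSketch {
    // Each row is {potential suffix, replacement suffix}. Illustrative rows only.
    static final String[][] NOUN_SUFFIXES = { {"s", ""}, {"ses", "s"}, {"ies", "y"} };
    static final String[][] VERB_SUFFIXES = { {"s", ""}, {"ies", "y"}, {"ing", ""} };
    static final String[][] ADJ_SUFFIXES  = { {"er", ""}, {"est", ""} };

    // Array of references to the three tables; the index order is an assumption.
    static final String[][][] SUFFIX_ARR = { NOUN_SUFFIXES, VERB_SUFFIXES, ADJ_SUFFIXES };

    // Returns the table for a part-of-speech index.
    static String[][] getSuffix(int posIdx) {
        return SUFFIX_ARR[posIdx];
    }
}
```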


protected static final String[] prepArr
Array of prepositions.


protected static final HashSet prepositions
Hashed set of String objects in prepArr, used for determining if a collocation contains a preposition.
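
A minimal sketch of how such a set can be built from an array. The preposition list, the class name and the isPrep helper are hypothetical; the actual contents of prepArr are not shown on this page.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of a preposition array and its hashed set.
final class PrepSetSketch {
    // Illustrative list only.
    static final String[] PREP_ARR = { "to", "at", "of", "on", "off", "in", "out", "up", "for", "with" };

    // Hashing the array once makes each membership test a constant-time lookup.
    static final Set<String> PREPOSITIONS = new HashSet<>(Arrays.asList(PREP_ARR));

    static boolean isPrep(String word) {
        return PREPOSITIONS.contains(word);
    }
}
```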


protected static String extension
Filename extension for WordNet exception list files.


protected WordNet wn
Reference to WordNet object.


protected WNFile[] excFiles
Array of WNFile objects, representing exception lists for each of the three morphable parts of speech: nouns, verbs and adjectives.


protected HashMap[] cache


protected boolean[] allCached
Constructor Detail


public Morphy(WordNet wn)
Initialize a new Morphy object for finding lemmas for instances of words or collocations.

wn - reference to a WordNet object
Method Detail


public void cacheAll()


public void cacheAll(String pos)


public void cacheAll(int posIdx)


protected static final String[][] getSuffix(String pos)
Gets the suffix array for a particular part of speech, either nounSuffixes, verbSuffixes or adjSuffixes.

pos - the part of speech
an array of two-element String arrays


protected static final String[][] getSuffix(int posIdx)
Gets the suffix array for a particular part of speech, either nounSuffixes, verbSuffixes or adjSuffixes.

posIdx - the part of speech
an array of two-element String arrays


public static final int countWords(String s,
                                   char separator)
Same as countWords(String, String), but with the word delimiters set to separator, ' ' and '_'.

s - string in which to count words
separator - word-separator (in addition to ' ' and '_')
number of words in s


public static final int countWords(String s,
                                   String delim)
Counts number of tokens in s separated by the delimiter characters specified by delim.

s - string in which to count words
delim - String object whose characters are treated as delimiters when finding words in s
number of words in s
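
The two countWords overloads could be sketched as follows. This is one possible implementation, assuming java.util.StringTokenizer does the counting; the page does not say how Morphy actually does it, and the class name is hypothetical.

```java
import java.util.StringTokenizer;

// Hypothetical sketch of the two countWords overloads.
final class CountWordsSketch {
    // Counts the tokens in s, treating every character of delim as a delimiter.
    static int countWords(String s, String delim) {
        return new StringTokenizer(s, delim).countTokens();
    }

    // Same, with the delimiters fixed to separator, ' ' and '_'.
    static int countWords(String s, char separator) {
        return countWords(s, separator + " _");
    }
}
```

With these definitions, countWords("kick_the bucket", '-') counts three words, because both '_' and ' ' are treated as delimiters.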


protected String[] exceptionLookup(String str,
                                   String pos)
Looks up exceptions for str in the WordNet exception file for pos.

str - word or collocation to look up
pos - part of speech of str
array of words found in exception list, or null if none exists
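
A rough sketch of the lookup, assuming the usual WordNet exception list layout in which each line holds an inflected form followed by one or more base forms, separated by spaces. It works on in-memory lines rather than a WNFile, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for an exception list lookup.
final class ExceptionLookupSketch {
    private final Map<String, String[]> table = new HashMap<>();

    // Each line is assumed to look like "inflected base1 [base2 ...]".
    ExceptionLookupSketch(List<String> lines) {
        for (String line : lines) {
            String[] tokens = line.trim().split("\\s+");
            if (tokens.length < 2) {
                continue; // skip blank or malformed lines
            }
            table.put(tokens[0], Arrays.copyOfRange(tokens, 1, tokens.length));
        }
    }

    // Returns the base forms listed for str, or null if str has no entry.
    String[] lookup(String str) {
        return table.get(str);
    }
}
```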


protected boolean hasPrep(String str)
Returns true if any word in str (assumed to be a collocation), other than the first or last, is a preposition.

str - collocation
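
The check could look roughly like this. It is a sketch under assumptions: words are taken to be separated by ' ' or '_', the preposition set is a small illustrative one, and the class name is hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the interior-preposition test.
final class HasPrepSketch {
    // Illustrative preposition set only.
    static final Set<String> PREPS =
        new HashSet<>(Arrays.asList("of", "in", "to", "on", "for", "with", "up"));

    // True if any word other than the first or last is a preposition.
    static boolean hasPrep(String str) {
        String[] words = str.split("[ _]+");
        for (int i = 1; i < words.length - 1; i++) {
            if (PREPS.contains(words[i])) {
                return true;
            }
        }
        return false;
    }
}
```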


protected static boolean containsNonAlnum(String s)
Predicate for determining if any characters in s are non-alphanumeric.

s - the string to examine.
true if at least one character is not a letter or digit; false otherwise.
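
A minimal sketch of such a predicate, assuming Character.isLetterOrDigit defines what counts as alphanumeric (the page does not say which test Morphy uses). The class name is hypothetical.

```java
// Hypothetical sketch of the non-alphanumeric predicate.
final class AlnumSketch {
    // True if at least one character of s is neither a letter nor a digit.
    static boolean containsNonAlnum(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isLetterOrDigit(s.charAt(i))) {
                return true;
            }
        }
        return false;
    }
}
```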


protected String wordBase(String word,
                          String[][] suffixes,
                          int sufIdx)
Tries to form the base form of a word using information in a suffix lookup table.

word - word with possible suffix to be replaced
suffixes - suffix array: array of two-element String arrays
sufIdx - index into suffixes
If word ends with suffixes[sufIdx][0], strips off this suffix and returns the stem concatenated with suffixes[sufIdx][1]; otherwise returns null.
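
The string manipulation described above can be sketched as follows. This covers only the suffix replacement as documented; the method is made static for the example, and any additional checks the real method performs are not modeled. The class name is hypothetical.

```java
// Hypothetical sketch of the suffix replacement step.
final class WordBaseSketch {
    // suffixes[sufIdx] is a {suffix, replacement} pair.
    static String wordBase(String word, String[][] suffixes, int sufIdx) {
        String suffix = suffixes[sufIdx][0];
        if (!word.endsWith(suffix)) {
            return null; // this rule does not apply to word
        }
        String stem = word.substring(0, word.length() - suffix.length());
        return stem + suffixes[sufIdx][1];
    }
}
```

For example, with the table { {"ies", "y"}, {"s", ""} }, index 0 turns "ponies" into "pony" and index 1 turns "dogs" into "dog".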


protected String morphWord(String word,
                           String pos)
Tries to find the morph of a single word.

word - word to be morphed
pos - part of speech of word to be morphed
possible morph of word


protected String[] morphPrep(String str)
Tries to find morphs for str, which is assumed to be a collocation containing a verb, a preposition and a noun. Assumes that the first word of str is a verb, that its last word is a noun, and that an intervening word is a preposition. The following steps are tried, in this order:
  1. try to morph the final word if there are more than two words; save the result
  2. if the first word contains non-alphanumeric characters, return null
  3. return exception-list-morphed verb + rest of collocation, if in WN
  4. return exception-list-morphed verb + rest of collocation with morphed final word, if in WN
  5. for each possible suffix for the first word (assumed to be a verb), if the suffix is found, replace it with an alternative suffix (table lookup) and try the following two things, in this order:
    1. return verb with new ending + rest of collocation, if in WN
    2. return verb with new ending + rest of collocation with morphed final word, if in WN
  6. return first word + rest if different from original string (???)
  7. return first word + rest of collocation with morphed final word

str - collocation to morph containing verb, preposition and noun
morph of str
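
Steps 1 through 5 above can be sketched as below. This is an illustration under several assumptions, not the actual implementation: WordNet membership, the verb exception list and final-word morphing are passed in as hypothetical functions, words are assumed to be separated by single underscores, the verb suffix table is illustrative, and a single String is returned instead of an array. Steps 6 and 7 are left out because the list itself marks their intent as uncertain.

```java
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical sketch of steps 1 to 5 of the collocation procedure.
final class MorphPrepSketch {
    // Illustrative {suffix, replacement} rows for verbs.
    static final String[][] VERB_SUFFIXES = {
        {"s", ""}, {"ies", "y"}, {"es", "e"}, {"es", ""},
        {"ed", "e"}, {"ed", ""}, {"ing", "e"}, {"ing", ""}
    };

    static String morphPrep(String str,
                            Predicate<String> inWordNet,
                            Function<String, String> verbException,
                            Function<String, String> morphFinalWord) {
        String[] words = str.split("_");
        if (words.length < 2) {
            return null;
        }
        String first = words[0];
        String last = words[words.length - 1];

        // Step 1: morph the final word if there are more than two words; save it.
        String morphedLast = words.length > 2 ? morphFinalWord.apply(last) : null;

        // Step 2: give up if the first word contains non-alphanumeric characters.
        for (char c : first.toCharArray()) {
            if (!Character.isLetterOrDigit(c)) {
                return null;
            }
        }

        // Rest of the collocation, with and without the morphed final word.
        String rest = str.substring(first.length());
        String restMorphed = morphedLast == null
            ? null
            : rest.substring(0, rest.length() - last.length()) + morphedLast;

        // Steps 3 and 4: try the exception-list form of the verb.
        String excVerb = verbException.apply(first);
        if (excVerb != null) {
            String hit = firstInWordNet(excVerb, rest, restMorphed, inWordNet);
            if (hit != null) {
                return hit;
            }
        }

        // Step 5: try each suffix rule that matches the verb.
        for (String[] rule : VERB_SUFFIXES) {
            if (!first.endsWith(rule[0])) {
                continue;
            }
            String verb = first.substring(0, first.length() - rule[0].length()) + rule[1];
            String hit = firstInWordNet(verb, rest, restMorphed, inWordNet);
            if (hit != null) {
                return hit;
            }
        }
        return null; // steps 6 and 7 are not modeled in this sketch
    }

    // Tries verb + rest, then verb + rest with the morphed final word.
    private static String firstInWordNet(String verb, String rest, String restMorphed,
                                         Predicate<String> inWordNet) {
        if (inWordNet.test(verb + rest)) {
            return verb + rest;
        }
        if (restMorphed != null && inWordNet.test(verb + restMorphed)) {
            return verb + restMorphed;
        }
        return null;
    }
}
```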


public String[] morphStr(String origstr,
                         String pos)
Tries several techniques on origstr to find possible base forms (lemmas).

Specified by:
morphStr in interface MorphyRemote
origstr - word or collocation, separated either by whitespace, '_' or '-', to find lemma of
pos - part of speech of origstr
array of possible lemmas for origstr, possibly of length 0 if no lemmas could be found


public static void main(String[] args)
Simple test driver for Morphy, reading one word or collocation per line of input and printing possible morphs for each of the three morphable parts of speech: nouns, verbs and adjectives.

args -

Author: Dan Bikel