Java Access to WordNet

danbikel.wordnet
Class Morphy

java.lang.Object
  |
  +--danbikel.wordnet.Morphy
All Implemented Interfaces:
MorphyRemote, Remote, Serializable

public class Morphy
extends Object
implements Serializable, MorphyRemote

Implementation of Morphy, the morphological analyzer that is part of WordNet.

See Also:
Serialized Form

Field Summary
protected static String[][] adjSuffixes
          Array of two-element String arrays, where the first element of each two-element array is a potential adjective suffix and the second element is a replacement suffix to canonicalize the word form.
protected  boolean[] allCached
           
protected  HashMap[] cache
           
protected  WNFile[] excFiles
          Array of WNFile objects, representing exception lists for each of the three morphable parts of speech: nouns, verbs and adjectives.
protected static String extension
          Filename extension for WordNet exception list files.
protected static String fileSep
          Cache of the file separator obtained by calling System.getProperty(java.lang.String) with "file.separator".
protected static String[][] nounSuffixes
          Array of two-element String arrays, where the first element of each two-element array is a potential noun suffix and the second element is a replacement suffix to canonicalize the word form.
protected static String[] prepArr
          Array of prepositions.
protected static HashSet prepositions
          Hashed set of String objects in prepArr, used for determining if a collocation contains a preposition.
protected static String[][][] suffixArr
          Array containing references to suffix arrays nounSuffixes, verbSuffixes and adjSuffixes.
protected static String[][] verbSuffixes
          Array of two-element String arrays, where the first element of each two-element array is a potential verb suffix and the second element is a replacement suffix to canonicalize the word form.
protected  WordNet wn
          Reference to WordNet object.
 
Constructor Summary
Morphy(WordNet wn)
          Initialize a new Morphy object for finding lemmas for instances of words or collocations.
 
Method Summary
 void cacheAll()
           
 void cacheAll(int posIdx)
           
 void cacheAll(String pos)
           
protected static boolean containsNonAlnum(String s)
          Predicate for determining if any characters in s are non-alphanumeric.
static int countWords(String s, char separator)
          Same as countWords(String, String), but with the word delimiters set to separator, ' ' and '_'.
static int countWords(String s, String delim)
          Counts number of tokens in s separated by the delimiter characters specified by delim.
protected  String[] exceptionLookup(String str, String pos)
          Looks up exceptions for str in the WordNet exception file for pos.
protected static String[][] getSuffix(int posIdx)
          Gets the suffix array for a particular part of speech, either nounSuffixes, verbSuffixes or adjSuffixes.
protected static String[][] getSuffix(String pos)
          Gets the suffix array for a particular part of speech, either nounSuffixes, verbSuffixes or adjSuffixes.
protected  boolean hasPrep(String str)
          Returns true if any word of str (assumed to be a collocation), other than the first or last, is a preposition.
static void main(String[] args)
          Simple test driver for Morphy, reading one word/collocation per line from System.in and printing possible morphs for each of the three morphable parts of speech: nouns, verbs and adjectives.
protected  String[] morphPrep(String str)
          Tries to find morphs for str, which is assumed to be a collocation containing a verb, a preposition and a noun.
 String[] morphStr(String origstr, String pos)
          Tries several techniques on origstr to find possible base forms (lemmas).
protected  String morphWord(String word, String pos)
          Tries to find the morph of a single word.
protected  String wordBase(String word, String[][] suffixes, int sufIdx)
          Tries to form the base form of a word using information in a suffix lookup table.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

fileSep

protected static String fileSep
Cache of the file separator obtained by calling System.getProperty(java.lang.String) with "file.separator".


nounSuffixes

protected static String[][] nounSuffixes
Array of two-element String arrays, where the first element of each two-element array is a potential noun suffix and the second element is a replacement suffix to canonicalize the word form. These arrays are accessed by getSuffix.


verbSuffixes

protected static String[][] verbSuffixes
Array of two-element String arrays, where the first element of each two-element array is a potential verb suffix and the second element is a replacement suffix to canonicalize the word form. These arrays are accessed by getSuffix.


adjSuffixes

protected static String[][] adjSuffixes
Array of two-element String arrays, where the first element of each two-element array is a potential adjective suffix and the second element is a replacement suffix to canonicalize the word form. These arrays are accessed by getSuffix.


suffixArr

protected static String[][][] suffixArr
Array containing references to suffix arrays nounSuffixes, verbSuffixes and adjSuffixes. Each of these suffix arrays is an array of two-element String arrays, where the first element is a potential suffix and the second element is a possible replacement suffix when attempting to find a base form of a word.

See Also:
getSuffix(int), getSuffix(String), wordBase(String, String[][], int)

prepArr

protected static final String[] prepArr
Array of prepositions.


prepositions

protected static final HashSet prepositions
Hashed set of String objects in prepArr, used for determining if a collocation contains a preposition.


extension

protected static String extension
Filename extension for WordNet exception list files.


wn

protected WordNet wn
Reference to WordNet object.


excFiles

protected WNFile[] excFiles
Array of WNFile objects, representing exception lists for each of the three morphable parts of speech: nouns, verbs and adjectives.


cache

protected HashMap[] cache

allCached

protected boolean[] allCached

Constructor Detail

Morphy

public Morphy(WordNet wn)
Initialize a new Morphy object for finding lemmas for instances of words or collocations.

Parameters:
wn - reference to a WordNet object

Method Detail

cacheAll

public void cacheAll()

cacheAll

public void cacheAll(String pos)

cacheAll

public void cacheAll(int posIdx)

getSuffix

protected static final String[][] getSuffix(String pos)
Gets the suffix array for a particular part of speech, either nounSuffixes, verbSuffixes or adjSuffixes.

Parameters:
pos - the part of speech
Returns:
an array of two-element String arrays

getSuffix

protected static final String[][] getSuffix(int posIdx)
Gets the suffix array for a particular part of speech, either nounSuffixes, verbSuffixes or adjSuffixes.

Parameters:
posIdx - the part of speech
Returns:
an array of two-element String arrays
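
As a rough illustration of how the suffix tables and the two getSuffix overloads might fit together, here is a minimal sketch. The table contents follow the detachment rules listed in the WordNet morphy documentation; the actual contents of nounSuffixes, verbSuffixes and adjSuffixes in this class are not shown on this page and may differ. The index order (0 = noun, 1 = verb, 2 = adjective) and the part-of-speech strings "noun", "verb" and "adj" are assumptions made for the example.

```java
import java.util.Arrays;

// Sketch only: table contents and part-of-speech names are assumptions,
// not the documented contents of danbikel.wordnet.Morphy.
final class SuffixTableSketch {
    // Each entry is {suffix to detach, replacement suffix}.
    static final String[][] NOUN_SUFFIXES = {
        {"s", ""}, {"ses", "s"}, {"xes", "x"}, {"zes", "z"},
        {"ches", "ch"}, {"shes", "sh"}, {"men", "man"}, {"ies", "y"}
    };
    static final String[][] VERB_SUFFIXES = {
        {"s", ""}, {"ies", "y"}, {"es", "e"}, {"es", ""},
        {"ed", "e"}, {"ed", ""}, {"ing", "e"}, {"ing", ""}
    };
    static final String[][] ADJ_SUFFIXES = {
        {"er", ""}, {"est", ""}, {"er", "e"}, {"est", "e"}
    };

    // Hypothetical index order: 0 = noun, 1 = verb, 2 = adjective.
    static final String[][][] SUFFIX_ARR = {
        NOUN_SUFFIXES, VERB_SUFFIXES, ADJ_SUFFIXES
    };

    // Lookup by part-of-speech index.
    static String[][] getSuffix(int posIdx) {
        return SUFFIX_ARR[posIdx];
    }

    // Lookup by part-of-speech name (names are hypothetical).
    static String[][] getSuffix(String pos) {
        switch (pos) {
            case "noun": return getSuffix(0);
            case "verb": return getSuffix(1);
            case "adj":  return getSuffix(2);
            default:
                throw new IllegalArgumentException(
                    "not a morphable part of speech: " + pos);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.deepToString(getSuffix("adj")));
    }
}
```

Keeping the three tables in one array lets both overloads share a single lookup path, which matches the role this page gives to suffixArr.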

countWords

public static final int countWords(String s,
                                   char separator)
Same as countWords(String, String), but with the word delimiters set to separator, ' ' and '_'.

Parameters:
s - string in which to count words
separator - word-separator (in addition to ' ' and '_')
Returns:
number of words in s

countWords

public static final int countWords(String s,
                                   String delim)
Counts number of tokens in s separated by the delimiter characters specified by delim.

Parameters:
s - string in which to count words
delim - String object whose characters are treated as delimiters when finding words in s
Returns:
number of words in s
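
The two countWords overloads could plausibly be written on top of java.util.StringTokenizer, as in the sketch below. This is an assumption about the approach, not the class's actual source; note that StringTokenizer skips empty tokens, so consecutive delimiters do not add to the count.

```java
import java.util.StringTokenizer;

// Sketch only: one possible way to count delimiter-separated words.
final class CountWordsSketch {
    // Counts tokens in s, treating every character of delim as a delimiter.
    static int countWords(String s, String delim) {
        return new StringTokenizer(s, delim).countTokens();
    }

    // Same, with the delimiters fixed to ' ', '_' and the given separator.
    static int countWords(String s, char separator) {
        return countWords(s, " _" + separator);
    }

    public static void main(String[] args) {
        System.out.println(countWords("run_out of-gas", '-'));
    }
}
```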

exceptionLookup

protected String[] exceptionLookup(String str,
                                   String pos)
Looks up exceptions for str in the WordNet exception file for pos.

Parameters:
str - word or collocation to look up
pos - part of speech of str
Returns:
array of words found in exception list, or null if none exists
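
To make the return contract concrete, the following sketch models an exception list as an in-memory map. It assumes the exception list format described in the WordNet documentation, in which each line holds an inflected form followed by one or more base forms separated by spaces. The class name, the constructor taking sample lines, and the sample lines themselves are hypothetical; the real method reads from the WNFile objects in excFiles.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Sketch only: an in-memory stand-in for one exception list file.
final class ExceptionListSketch {
    private final Map<String, String[]> entries = new HashMap<>();

    // Each line: inflected form, then one or more base forms.
    ExceptionListSketch(String... lines) {
        for (String line : lines) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 2) {
                continue; // skip blank or malformed lines
            }
            entries.put(fields[0],
                        Arrays.copyOfRange(fields, 1, fields.length));
        }
    }

    // Returns the base forms listed for str, or null if there is no entry.
    String[] lookup(String str) {
        return entries.get(str);
    }

    public static void main(String[] args) {
        ExceptionListSketch nouns =
            new ExceptionListSketch("geese goose", "axes ax axis");
        System.out.println(Arrays.toString(nouns.lookup("axes")));
        System.out.println(Arrays.toString(nouns.lookup("dog")));
    }
}
```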

hasPrep

protected boolean hasPrep(String str)
Returns true if any word of str (assumed to be a collocation), other than the first or last, is a preposition.

Parameters:
str - collocation
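
A minimal sketch of this check follows. The preposition list is an illustrative subset rather than the contents of prepArr, and the sketch assumes that words are separated by ' ' or '_' and that "contains a preposition" means the word is itself a member of the preposition set.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.StringTokenizer;

// Sketch only: the preposition set and delimiters are assumptions.
final class HasPrepSketch {
    static final Set<String> PREPOSITIONS = new HashSet<>(Arrays.asList(
        "to", "at", "of", "on", "off", "in", "out", "up", "down",
        "from", "with", "into", "for", "about", "between"));

    // True if any word other than the first or last is a preposition.
    static boolean hasPrep(String str) {
        StringTokenizer st = new StringTokenizer(str, " _");
        int numWords = st.countTokens();
        for (int i = 0; i < numWords; i++) {
            String word = st.nextToken();
            boolean interior = i > 0 && i < numWords - 1;
            if (interior && PREPOSITIONS.contains(word)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasPrep("asking_for_it"));
        System.out.println(hasPrep("up_and_down"));
    }
}
```

Because only interior words are tested, a collocation of one or two words can never satisfy the predicate.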

containsNonAlnum

protected static boolean containsNonAlnum(String s)
Predicate for determining if any characters in s are non-alphanumeric.

Parameters:
s - the string to examine.
Returns:
true if at least one character is not a letter or digit; false otherwise.
See Also:
Character.isLetterOrDigit(char)
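
Given the reference to Character.isLetterOrDigit(char), the predicate can be sketched as a simple scan over the characters of s. This is an illustration of the documented behavior, not the class's actual source.

```java
// Sketch only: a character scan based on Character.isLetterOrDigit.
final class NonAlnumSketch {
    // True if at least one character of s is neither a letter nor a digit.
    static boolean containsNonAlnum(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isLetterOrDigit(s.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(containsNonAlnum("x-ray"));
        System.out.println(containsNonAlnum("abc123"));
    }
}
```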

wordBase

protected String wordBase(String word,
                          String[][] suffixes,
                          int sufIdx)
Tries to form the base form of a word using information in a suffix lookup table.

Parameters:
word - word with possible suffix to be replaced
suffixes - suffix array: array of two-element String arrays
sufIdx - index into suffixes
Returns:
If word ends with suffixes[sufIdx][0], strips off this suffix and returns the stem concatenated with suffixes[sufIdx][1]; otherwise returns null.
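
The return contract above can be sketched in a few lines. The suffix table used in main is a small hypothetical example, not one of the class's own arrays, and the sketch makes the method static for convenience.

```java
// Sketch only: suffix replacement as described by the Returns clause.
final class WordBaseSketch {
    // If word ends with suffixes[sufIdx][0], replaces that suffix with
    // suffixes[sufIdx][1]; otherwise returns null.
    static String wordBase(String word, String[][] suffixes, int sufIdx) {
        String suffix = suffixes[sufIdx][0];
        if (!word.endsWith(suffix)) {
            return null;
        }
        String stem = word.substring(0, word.length() - suffix.length());
        return stem + suffixes[sufIdx][1];
    }

    public static void main(String[] args) {
        String[][] table = { {"ies", "y"}, {"s", ""} }; // hypothetical table
        System.out.println(wordBase("ponies", table, 0));
        System.out.println(wordBase("ponies", table, 1));
    }
}
```

A candidate such as the second result is not necessarily a real word, so a caller would still need to check each candidate against WordNet.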

morphWord

protected String morphWord(String word,
                           String pos)
Tries to find the morph of a single word.

Parameters:
word - word to be morphed
pos - part of speech of word to be morphed
Returns:
possible morph of word

morphPrep

protected String[] morphPrep(String str)
Tries to find morphs for str, which is assumed to be a collocation containing a verb, a preposition and a noun. Assumes that the first word of str is a verb, that its last word is a noun, and that one of the intervening words is a preposition. The following steps are tried, in this order:
  1. try to morph the final word if there are more than two words; save the result
  2. if the first word contains non-alphanumeric characters, return null
  3. return exception-list-morphed verb + rest of collocation, if in WN
  4. return exception-list-morphed verb + rest of collocation with morphed final word, if in WN
  5. for each possible suffix for the first word (assumed to be a verb), if the suffix is found, replace it with an alternative suffix (table lookup) and try the following two things, in this order:
    1. return verb with new ending + rest of collocation, if in WN
    2. return verb with new ending + rest of collocation with morphed final word, if in WN
  6. return first word + rest, if different from the original string (???)
  7. return first word + rest of collocation with morphed final word

Parameters:
str - collocation to morph containing verb, preposition and noun
Returns:
morph of str

morphStr

public String[] morphStr(String origstr,
                         String pos)
Tries several techniques on origstr to find possible base forms (lemmas).

Specified by:
morphStr in interface MorphyRemote
Parameters:
origstr - word or collocation, separated either by whitespace, '_' or '-', to find lemma of
pos - part of speech of origstr
Returns:
array of possible lemmas for origstr, possibly of length 0 if no lemmas could be found

main

public static void main(String[] args)
Simple test driver for Morphy, reading one word/collocation per line from System.in and printing possible morphs for each of the three morphable parts of speech: nouns, verbs and adjectives.

Parameters:
args -

Author: Dan Bikel