Java Access to WordNet
java.lang.Object
  |
  +--danbikel.wordnet.WordNet
The main object through which the WordNet database is accessed.
Field Summary | |
static String |
adjPos
The constant representing an adjective in WordNet. |
static int |
adjPosIdx
The integer constant representing an adjective in WordNet. |
static String |
advPos
The constant representing an adverb in WordNet. |
static int |
advPosIdx
The integer constant representing an adverb in WordNet. |
protected boolean[] |
allDataEntriesCached
Array indexed by part of speech integer indicating whether all data entries for that part of speech have been cached. |
protected boolean[] |
allIndexEntriesCached
Array indexed by part of speech integer indicating whether all index entries for that part of speech have been cached. |
protected Hashtable[] |
allParentsCache
Cache used by getAllParentsBFS(java.lang.String, int). |
protected Set[] |
backEdges
Sets indexed by part of speech integer that contain erroneous back edges in WordNet for those parts of speech. |
protected Map |
canonical
A reflexive map used for canonicalizing various types of objects. |
protected Hashtable[] |
childrenCache
Cache of synset children. |
protected String |
className
A cache of this.getClass().getName(). |
protected Hashtable[] |
dataEntryCache
Cache of WNDataEntry objects. |
protected static String |
fileSep
Cache of System.getProperty("file.separator"). |
protected Hashtable[] |
indexEntryCache
Cache of WNIndexEntry objects. |
protected Set[] |
knownLemmas
Sets indexed by part of speech integer of all known lemmas (index entries) for that part of speech. |
protected Set[] |
knownSynsets
Sets indexed by part of speech integer of all known synsets (data entries) for that part of speech. |
protected boolean |
lemmasAndSynsetsCached
True if cacheLemmasAndSynsets() has been called. |
static int |
maxPosIdx
The maximum integer constant used to represent parts of speech. |
static int |
minPosIdx
The minimum integer constant used to represent parts of speech. |
static String |
nounPos
The constant representing a noun in WordNet. |
static int |
nounPosIdx
The integer constant representing a noun in WordNet. |
static int |
numPartsOfSpeech
The total number of parts of speech in WordNet, collapsing adjective satellites to be adjectives, i.e., 4. |
protected Hashtable[] |
parentsCache
Cache of synset parents. |
protected static String[] |
partsOfSpeech
An array that forms a mapping from the integer representations of parts of speech to their canonical String representations. |
protected static int[] |
posIdxMap
A mapping between part of speech names and the corresponding integers for those parts of speech. |
protected static String[] |
posNames
An array that forms a mapping from the integer representations of parts of speech to their unabbreviated names. |
static boolean |
secretFlag
Debugging flag. |
protected Set[] |
subtreesExplored
Sets indexed by part of speech integer that contain subtrees already explored by detectCycle(java.util.Set, java.lang.String, int). |
static String |
v15
The String object representing version 1.5 of WordNet. |
static String |
v16
The String object representing version 1.6 of WordNet. |
static String |
verbPos
The constant representing a verb in WordNet. |
static int |
verbPosIdx
The integer constant representing a verb in WordNet. |
Constructor Summary | |
WordNet()
Constructs a new WordNet object with default index and data files and the default caching option. |
|
WordNet(boolean useCache)
Constructs a new WordNet object with default index and data files. |
|
WordNet(String WNHOME)
Constructs a new WordNet object with default index and data files and the default caching option. |
|
WordNet(String WNHOME,
boolean useCache)
Constructs a new WordNet object with default index and data files. |
|
WordNet(String verbIndexFilename,
String verbDataFilename,
String nounIndexFilename,
String nounDataFilename,
String adjIndexFilename,
String adjDataFilename,
String advIndexFilename,
String advDataFilename,
boolean useCache)
Constructs this WordNet object with the specified index and data files and boolean as to whether to perform caching. |
Method Summary | |
void |
cacheAll()
Caches the index and data entries for all four parts of speech. |
void |
cacheAll(String pos)
Caches all index and data entries for the specified part of speech. |
protected void |
cacheLemmasAndSynsets()
Caches all lemmas and synsets. |
void |
clearAllParentsCache()
Clears the cache used by getAllParentsBFS(java.lang.String, int) for all parts of
speech. |
void |
clearAllParentsCache(int posIdx)
Clears the cache used by getAllParentsBFS(java.lang.String, int) for the specified
part of speech. |
void |
clearAllParentsCache(String pos)
Clears the cache used by getAllParentsBFS(java.lang.String, int) for the specified
part of speech. |
void |
clearCache()
Clears the index and data entry caches. |
static Hashtable |
clearHalfIfNeeded(Hashtable h,
int maxSize)
Utility method that clears every other element from the hashtable h, thereby resulting in a random replacement strategy
(assuming a random hash-bucket dispersion). |
boolean |
contains(String word,
String pos)
Returns whether a given word and part of speech exists in the database. |
protected void |
detectCycle(Set synsetsSeen,
String pos,
int synset)
Detects cycles in the hypernym subgraph of synset. |
void |
dontGrabGlosses()
|
boolean |
exists(String word,
String pos)
Returns whether a given word and part of speech exists in the database. |
int[][] |
getAllParentsBFS(String pos,
int synset)
Returns a breadth-first search of the hypernym subgraph of synset. |
int[] |
getAllSynsets(String word,
String pos)
Gets all synsets for word and pos. |
Enumeration |
getCachedDataEntries(int posIdx)
Returns an Enumeration to allow iteration over all cached
data entries. |
Enumeration |
getCachedDataEntries(String pos)
Returns an Enumeration to allow iteration over all cached
data entries. |
Enumeration |
getCachedIndexEntries(int posIdx)
Returns an Enumeration to allow iteration over all cached
index entries. |
Enumeration |
getCachedIndexEntries(String pos)
Returns an Enumeration to allow iteration over all cached
index entries. |
static String |
getCanonicalPos(String pos)
Returns the canonical String object for the specified
part of speech. |
WNPointer[] |
getChildPointers(String pos,
int synset)
Gets the child (hyponym) pointers for the synset specified by pos and synset, as defined by pointers that pass
through ChildSemanticPointerFilter. |
int[] |
getChildren(String pos,
int synset)
Gets all synsets that are children (hyponyms) of synset. |
WNDataEntry |
getDataEntry(String pos,
int synset)
Gets the WNDataEntry object for pos and synset. |
boolean |
getGrabGlosses()
|
WNIndexEntry |
getIndexEntry(String word,
String pos)
Gets the WNIndexEntry for word and pos,
printing errors if there is no such word-pos pair in the database. |
WNIndexEntry |
getIndexEntry(String word,
String pos,
boolean printErrors)
Gets the WNIndexEntry for word and pos. |
int |
getMaxDataCacheSize()
Returns the maximum size for data entry caching, or -1 if the cache size is unlimited. |
int |
getMaxIndexCacheSize()
Returns the maximum size for index entry caching, or -1 if the cache size is unlimited. |
WNPointer[] |
getParentPointers(String pos,
int synset)
Gets the parent (hypernym) pointers for the synset specified by pos and synset, as defined by pointers that pass
through ParentSemanticPointerFilter. |
int[] |
getParents(String pos,
int synset)
Gets all synsets that are parents (hypernyms) of synset. |
static String |
getPos(int idx)
Returns the String representation of the part of speech specified by idx. |
static int |
getPosIdx(char pos)
Returns the index of the specified WordNet part of speech. |
static int |
getPosIdx(String pos)
Returns the index of the WordNet part of speech specified by pos. |
static String |
getPosName(int idx)
Gets the unabbreviated name for the WordNet part of speech specified by idx. |
static String |
getPosName(String pos)
Gets the unabbreviated name for pos. |
int |
getSenseNum(String word,
String pos,
int synset)
Gets the sense number for the passed synset of
word and pos . |
int |
getSynset(String word,
String pos,
int senseNum)
Gets the senseNum-th synset for word
and pos. |
WNWord |
getSynsetName(String pos,
int synset)
Gets the name of a synset. |
boolean |
getUseAllParentsCache()
Returns whether getAllParentsBFS(java.lang.String, int) is doing caching. |
boolean |
getUseCache()
Returns whether this WordNet object is doing caching. |
protected static String |
getVersion()
Gets the value of the environment variable (property) WNDBVERSION. |
String |
getWNDataDir()
Gets the full path of the directory of the database files. |
String |
getWNDBversion()
Returns the String that represents the current version
of the WordNet database used by this object. |
protected static String |
getWNHOME()
Gets the environment variable WNHOME. |
void |
grabGlosses()
Indicates that this WordNet object should grab glosses
from the data files when constructing its internal WNDataEntry
objects. |
static boolean |
isAdjPos(String pos)
Returns true if pos is string-equal to
adjPos. |
static boolean |
isAdjSatellite(String pos)
Returns true if and only if pos is string-equal
to "s", representing an adjective satellite
in the WordNet database. |
static boolean |
isAdvPos(String pos)
Returns true if pos is string-equal to
advPos. |
static boolean |
isNounPos(String pos)
Returns true if pos is string-equal to
nounPos. |
static boolean |
isVerbPos(String pos)
Returns true if pos is string-equal to
verbPos. |
static String |
look(String key,
RandomAccessFile file)
Returns the line of file that begins with key,
using a binary search. |
static void |
main(String[] args)
Allows users to save a WordNet object to a Java object file, with control over how much is cached. usage: <WNHOME> [-cache <n | v | a | r | all>]* <object output filename> |
static int[] |
makeUnique(int[] arr)
Utility method that returns an array containing the unique elements of arr. |
static Vector |
makeUnique(Vector v)
Utility method to make all the elements of the Vector v
pairwise unique, as determined by their equals methods. |
static Vector |
makeVector(int[] arr)
Makes a Vector object of Integer objects,
initialized from the int array. |
static boolean |
notAdjPos(String pos)
Returns true if pos is not string-equal to
adjPos. |
static boolean |
notAdvPos(String pos)
Returns true if pos is not string-equal to
advPos. |
static boolean |
notNounPos(String pos)
Returns true if pos is not string-equal to
nounPos. |
static boolean |
notVerbPos(String pos)
Returns true if pos is not string-equal to
verbPos. |
protected static void |
printError(String msg)
Prints the error message msg, prepended by a class identifier
as would be printed by getClass().getName(). |
void |
save(String filename)
Saves this WordNet object to the file specified by filename. |
void |
setAllParentsCache(boolean useAllParentsCache)
Indicates whether getAllParentsBFS(java.lang.String, int) should perform caching. |
void |
setMaxDataCacheSize(int newsize)
Sets the maximum cache size for the data entry cache. |
void |
setMaxIndexCacheSize(int newsize)
Sets the maximum cache size for the index entry cache. |
void |
setUseCache(boolean useCache)
Sets the caching mode specified by useCache. |
protected void |
setWNDBversion(String version)
Sets the version of the WordNet database files used for lookup to be version. |
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
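protected static final String fileSep
Cache of System.getProperty("file.separator").
public static final String v15
The String object representing version 1.5 of WordNet.
See Also: setWNDBversion(java.lang.String)
public static final String v16
The String object representing version 1.6 of WordNet.
See Also: setWNDBversion(java.lang.String)
public static final String verbPos
The constant representing a verb in WordNet.
public static final String nounPos
The constant representing a noun in WordNet.
public static final String adjPos
The constant representing an adjective in WordNet.
public static final String advPos
The constant representing an adverb in WordNet.
public static final int nounPosIdx
The integer constant representing a noun in WordNet.
public static final int verbPosIdx
The integer constant representing a verb in WordNet.
public static final int adjPosIdx
The integer constant representing an adjective in WordNet.
public static final int advPosIdx
The integer constant representing an adverb in WordNet.
public static final int minPosIdx
The minimum integer constant used to represent parts of speech.
public static final int maxPosIdx
The maximum integer constant used to represent parts of speech.
public static final int numPartsOfSpeech
The total number of parts of speech in WordNet, collapsing adjective satellites to be adjectives, i.e., 4.
protected static final int[] posIdxMap
A mapping between part of speech names and the corresponding integers for those parts of speech.
protected static final String[] partsOfSpeech
An array that forms a mapping from the integer representations of parts of speech to their canonical String representations.
protected static final String[] posNames
An array that forms a mapping from the integer representations of parts of speech to their unabbreviated names.
protected final String className
A cache of this.getClass().getName().
protected Hashtable[] dataEntryCache
Cache of WNDataEntry objects.
protected Hashtable[] indexEntryCache
Cache of WNIndexEntry objects.
protected Hashtable[] parentsCache
Cache of synset parents.
See Also: getParents(java.lang.String, int)
protected Hashtable[] childrenCache
Cache of synset children.
See Also: getChildren(java.lang.String, int)
protected boolean[] allDataEntriesCached
Array indexed by part of speech integer indicating whether all data entries for that part of speech have been cached.
protected boolean[] allIndexEntriesCached
Array indexed by part of speech integer indicating whether all index entries for that part of speech have been cached.
protected Set[] knownLemmas
Sets indexed by part of speech integer of all known lemmas (index entries) for that part of speech.
protected Set[] knownSynsets
Sets indexed by part of speech integer of all known synsets (data entries) for that part of speech.
protected boolean lemmasAndSynsetsCached
True if cacheLemmasAndSynsets() has been called.
protected Hashtable[] allParentsCache
Cache used by getAllParentsBFS(java.lang.String, int).
protected Set[] backEdges
Sets indexed by part of speech integer that contain erroneous back edges in WordNet for those parts of speech.
See Also: detectCycle(java.util.Set, java.lang.String, int)
protected Set[] subtreesExplored
Sets indexed by part of speech integer that contain subtrees already explored by detectCycle(java.util.Set, java.lang.String, int).
protected Map canonical
A reflexive map used for canonicalizing various types of objects.
public static boolean secretFlag
Debugging flag.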
Constructor Detail |
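public WordNet()
Constructs a new WordNet object with default index and data files and the default caching option. The files are expected in the directory WNHOME + fileSep + "dict" + fileSep, where WNHOME is defined by a system property.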
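The path expression above can be illustrated with a small standalone sketch. It is not the library's source: the class name WordNetHomeSketch and the helper names dataDir and defaultDataDir are hypothetical, and reading the installation directory from a system property named "WNHOME" is an assumption based on the wording of this page. The RuntimeException mirrors the behaviour documented for WordNet(boolean useCache).

```java
/** Hypothetical sketch; not the actual danbikel.wordnet.WordNet source. */
class WordNetHomeSketch {
    // Mirrors the documented fileSep field: a cached copy of the platform file separator.
    static final String fileSep = System.getProperty("file.separator");

    /** Builds WNHOME + fileSep + "dict" + fileSep for an explicit installation directory. */
    static String dataDir(String wnHome) {
        return wnHome + fileSep + "dict" + fileSep;
    }

    /**
     * Resolves the data directory from a system property.
     * ASSUMPTION: the property is literally named "WNHOME".
     */
    static String defaultDataDir() {
        String wnHome = System.getProperty("WNHOME");
        if (wnHome == null) {
            throw new RuntimeException("system property WNHOME is not set");
        }
        return dataDir(wnHome);
    }

    public static void main(String[] args) {
        // Prints the dictionary directory for an example installation path.
        System.out.println(dataDir("/usr/local/wordnet"));
    }
}
```

Under this assumption the property would be supplied on the command line, for example `java -DWNHOME=/usr/local/wordnet ...`.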
public WordNet(boolean useCache)
Constructs a new WordNet object with default index and data files. The files are expected in the directory WNHOME + fileSep + "dict" + fileSep. If WNHOME was not set as a system property, then a RuntimeException is thrown.
Parameters:
useCache - indicates whether to perform caching of index and data file entries.
public WordNet(String WNHOME)
Constructs a new WordNet object with default index and data files and the default caching option. The files are expected in the directory WNHOME + fileSep + "dict" + fileSep.
Parameters:
WNHOME - specifies the directory where WordNet is installed.
public WordNet(String WNHOME, boolean useCache)
Constructs a new WordNet object with default index and data files. The files are expected in the directory WNHOME + fileSep + "dict" + fileSep.
Parameters:
WNHOME - specifies the directory where WordNet is installed.
useCache - indicates whether to perform caching of index and data file entries.
public WordNet(String verbIndexFilename, String verbDataFilename, String nounIndexFilename, String nounDataFilename, String adjIndexFilename, String adjDataFilename, String advIndexFilename, String advDataFilename, boolean useCache)
Constructs this WordNet object with the specified index and data files and a flag indicating whether to perform caching.
Parameters:
verbIndexFilename - the index file for verbs.
verbDataFilename - the data file for verbs.
nounIndexFilename - the index file for nouns.
nounDataFilename - the data file for nouns.
adjIndexFilename - the index file for adjectives.
adjDataFilename - the data file for adjectives.
advIndexFilename - the index file for adverbs.
advDataFilename - the data file for adverbs.
useCache - indicates whether to perform caching.
Method Detail |
protected static final String getWNHOME()
Gets the environment variable WNHOME. [...] null.
protected static final String getVersion()
Gets the value of the environment variable (property) WNDBVERSION. [...] v16.
public boolean contains(String word, String pos)
Returns whether a given word and part of speech exists in the database.
Parameters:
word - a lemma in WordNet.
pos - a WordNet part of speech.
public boolean exists(String word, String pos)
Returns whether a given word and part of speech exists in the database. See contains(java.lang.String, java.lang.String).
Parameters:
word - a lemma in WordNet.
pos - a WordNet part of speech.
public WNIndexEntry getIndexEntry(String word, String pos)
Gets the WNIndexEntry for word and pos, printing errors if there is no such word-pos pair in the database.
Parameters:
word - a lemma in WordNet.
pos - a WordNet part of speech.
public WNIndexEntry getIndexEntry(String word, String pos, boolean printErrors)
Gets the WNIndexEntry for word and pos.
Parameters:
word - a lemma in WordNet.
pos - a WordNet part of speech.
printErrors - print errors using printError(java.lang.String) if true.
public WNDataEntry getDataEntry(String pos, int synset)
Gets the WNDataEntry object for pos and synset.
Parameters:
pos - a WordNet part of speech.
synset - a synset in WordNet.
public int[] getAllSynsets(String word, String pos)
Gets all synsets for word and pos.
Parameters:
word - a lemma in WordNet.
pos - a WordNet part of speech.
See Also: WNIndexEntry.getSynsets()
public int getSynset(String word, String pos, int senseNum)
Gets the senseNum-th synset for word and pos. Note that this is the element at index senseNum - 1 of the array returned by getAllSynsets(java.lang.String, java.lang.String), as sense numbers are 1-indexed.
public int getSenseNum(String word, String pos, int synset)
Gets the sense number for the passed synset of word and pos.
N.B.: This method is O(n), where n is the number of synsets for word, pos.
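The relationship between sense numbers and array positions described for getSynset and getSenseNum can be sketched as follows. This is a minimal illustration over a plain int array rather than the library's code: the class SenseNumberSketch is hypothetical, the synset offsets are made up, and returning -1 for a synset that is not found is an assumption, since this page does not say what the real method does in that case.

```java
/** Hypothetical sketch of the sense-number arithmetic; not the library's source. */
class SenseNumberSketch {
    /** Sense numbers are 1-indexed, so sense k lives at array index k - 1. */
    static int getSynset(int[] allSynsets, int senseNum) {
        return allSynsets[senseNum - 1];
    }

    /**
     * Linear scan over the synsets of a word, hence O(n) in their number.
     * ASSUMPTION: returns -1 when the synset is not among them.
     */
    static int getSenseNum(int[] allSynsets, int synset) {
        for (int i = 0; i < allSynsets.length; i++) {
            if (allSynsets[i] == synset) {
                return i + 1; // convert the 0-based index back to a 1-based sense number
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] synsets = {1740, 2138, 5930}; // made-up synset offsets
        System.out.println(getSynset(synsets, 2));
        System.out.println(getSenseNum(synsets, 5930));
    }
}
```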
public WNPointer[] getParentPointers(String pos, int synset)
Gets the parent (hypernym) pointers for the synset specified by pos and synset, as defined by pointers that pass through ParentSemanticPointerFilter.
Parameters:
pos - a WordNet part of speech.
synset - a WordNet synset.
public int[] getParents(String pos, int synset)
Gets all synsets that are parents (hypernyms) of synset.
A cache is used for these parent-synset int arrays.
Parameters:
pos - a WordNet part of speech.
synset - a WordNet synset.
public WNPointer[] getChildPointers(String pos, int synset)
Gets the child (hyponym) pointers for the synset specified by pos and synset, as defined by pointers that pass through ChildSemanticPointerFilter.
Parameters:
pos - a WordNet part of speech.
synset - a WordNet synset.
public int[] getChildren(String pos, int synset)
Gets all synsets that are children (hyponyms) of synset.
A cache is used for these child-synset int arrays.
Parameters:
pos - a WordNet part of speech.
synset - a WordNet synset.
public WNWord getSynsetName(String pos, int synset)
Gets the name of a synset.
Parameters:
pos - a WordNet part of speech.
synset - a WordNet synset.
Returns: a WNWord object that represents the first lemma in the list of lemmas for synset in the database.
protected void detectCycle(Set synsetsSeen, String pos, int synset)
Detects cycles in the hypernym subgraph of synset.
The hypernym graph is supposed to be acyclic, but errors in the database (or introduced by grind(1WN)) necessitate the use of this method. All back edges are added to backEdges.
Parameters:
synsetsSeen - a set of synsets seen during the depth-first search; expected to be empty for the initial call of this method.
pos - a WordNet part of speech.
synset - a WordNet synset.
See Also: getAllParentsBFS(java.lang.String, int)
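The depth-first back-edge detection described for detectCycle can be sketched on a toy graph. This is an illustration under stated assumptions, not the library's implementation: the class CycleSketch, the in-memory parents map, and the "child->parent" string encoding of a back edge are all hypothetical, and the part-of-speech argument is omitted for brevity. The fields backEdges and subtreesExplored only borrow the names of the documented fields.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of depth-first back-edge detection; not the library's source. */
class CycleSketch {
    // Toy hypernym graph: synset -> its parent synsets (stands in for database lookups).
    final Map<Integer, int[]> parents = new HashMap<>();
    // Back edges found so far, encoded as "child->parent" (an illustrative encoding).
    final Set<String> backEdges = new HashSet<>();
    // Synsets whose hypernym subgraph has been fully explored already.
    final Set<Integer> subtreesExplored = new HashSet<>();

    /** synsetsSeen holds the synsets on the current search path; pass an empty set initially. */
    void detectCycle(Set<Integer> synsetsSeen, int synset) {
        if (subtreesExplored.contains(synset)) {
            return; // nothing new can be found below this synset
        }
        synsetsSeen.add(synset);
        for (int parent : parents.getOrDefault(synset, new int[0])) {
            if (synsetsSeen.contains(parent)) {
                // The parent is already on the current path, so this edge closes a cycle.
                backEdges.add(synset + "->" + parent);
            } else {
                detectCycle(synsetsSeen, parent);
            }
        }
        synsetsSeen.remove(synset);
        subtreesExplored.add(synset);
    }

    public static void main(String[] args) {
        CycleSketch g = new CycleSketch();
        g.parents.put(1, new int[] {2});
        g.parents.put(2, new int[] {3});
        g.parents.put(3, new int[] {1, 4}); // 3 -> 1 is an erroneous edge
        g.detectCycle(new HashSet<>(), 1);
        System.out.println(g.backEdges);
    }
}
```

Recording back edges instead of throwing lets a later traversal skip them and treat the rest of the graph as acyclic.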
public int[][] getAllParentsBFS(String pos, int synset)
Returns a breadth-first search of the hypernym subgraph of synset.
Parameters:
pos - a WordNet part of speech.
synset - a WordNet synset.
Returns: an array of int arrays, where the ith element contains all synsets that have a hyponym path of length i to synset. Thus, the first element is guaranteed to be an array of length 1 containing only synset. Furthermore, since adjectives and adverbs do not have hypernyms, the return value for these two parts of speech is always of length 1.
protected void cacheLemmasAndSynsets()
Caches all lemmas and synsets. [...] useCache is true.
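Returning to getAllParentsBFS, the level-by-level return value described above can be sketched on a toy graph. This is a minimal illustration, not the library's code: the class AllParentsBfsSketch and its in-memory parents map are hypothetical. One further assumption is labeled in the code: each synset is reported only at its shallowest level, which also keeps the loop finite if the graph contains a cycle. The real method may treat synsets reachable by paths of several lengths differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of a level-by-level hypernym search; not the library's source. */
class AllParentsBfsSketch {
    /**
     * Returns levels[i] = synsets first reached after following i parent links.
     * ASSUMPTION: a synset appears only at its shallowest level.
     */
    static int[][] getAllParentsBFS(Map<Integer, int[]> parents, int synset) {
        List<int[]> levels = new ArrayList<>();
        Set<Integer> visited = new HashSet<>();
        visited.add(synset);
        int[] current = {synset}; // level 0 always contains only the start synset
        while (current.length > 0) {
            levels.add(current);
            List<Integer> next = new ArrayList<>();
            for (int s : current) {
                for (int p : parents.getOrDefault(s, new int[0])) {
                    if (visited.add(p)) { // true only the first time p is seen
                        next.add(p);
                    }
                }
            }
            current = new int[next.size()];
            for (int i = 0; i < current.length; i++) {
                current[i] = next.get(i);
            }
        }
        return levels.toArray(new int[0][]);
    }

    public static void main(String[] args) {
        Map<Integer, int[]> parents = new HashMap<>();
        parents.put(1, new int[] {2, 3});
        parents.put(2, new int[] {4});
        parents.put(3, new int[] {4});
        System.out.println(Arrays.deepToString(getAllParentsBFS(parents, 1)));
    }
}
```

A synset with no parents, as is always the case for adjectives and adverbs, yields a result of length 1.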
public void cacheAll()
Caches the index and data entries for all four parts of speech.
public void cacheAll(String pos)
Caches all index and data entries for the specified part of speech. If useCache is false, prints an error message and returns (no exception is thrown).
Parameters:
pos - a WordNet part of speech.
public void clearCache()
Clears the index and data entry caches.
public void clearAllParentsCache()
Clears the cache used by getAllParentsBFS(java.lang.String, int) for all parts of speech.
public void clearAllParentsCache(int posIdx)
Clears the cache used by getAllParentsBFS(java.lang.String, int) for the specified part of speech.
Parameters:
posIdx - a WordNet part of speech index.
public void clearAllParentsCache(String pos)
Clears the cache used by getAllParentsBFS(java.lang.String, int) for the specified part of speech.
Parameters:
pos - a WordNet part of speech.
public boolean getUseCache()
Returns whether this WordNet object is doing caching.
public boolean getUseAllParentsCache()
Returns whether getAllParentsBFS(java.lang.String, int) is doing caching.
public int getMaxIndexCacheSize()
Returns the maximum size for index entry caching, or -1 if the cache size is unlimited.
public int getMaxDataCacheSize()
Returns the maximum size for data entry caching, or -1 if the cache size is unlimited.
public Enumeration getCachedDataEntries(String pos)
Returns an Enumeration to allow iteration over all cached data entries.
public Enumeration getCachedDataEntries(int posIdx)
Returns an Enumeration to allow iteration over all cached data entries.
public Enumeration getCachedIndexEntries(String pos)
Returns an Enumeration to allow iteration over all cached index entries.
public Enumeration getCachedIndexEntries(int posIdx)
Returns an Enumeration to allow iteration over all cached index entries.
public String getWNDataDir()
Gets the full path of the directory of the database files.
public boolean getGrabGlosses()
public void setUseCache(boolean useCache)
Sets the caching mode specified by useCache.
Parameters:
useCache - if true, causes this object to cache all subsequent index and data entry lookups and to call cacheLemmasAndSynsets() immediately; if false, causes all subsequent lookups to use file accesses (but does not clear the data and index caches).
public void grabGlosses()
Indicates that this WordNet object should grab glosses from the data files when constructing its internal WNDataEntry objects.
See Also: WNDataEntry.getGloss()
public void dontGrabGlosses()
public void setAllParentsCache(boolean useAllParentsCache)
Indicates whether getAllParentsBFS(java.lang.String, int) should perform caching.
public void setMaxIndexCacheSize(int newsize)
Sets the maximum cache size for the index entry cache.
Parameters:
newsize - the maximum index entry cache size; a value of -1 indicates an unlimited cache size.
public void setMaxDataCacheSize(int newsize)
Sets the maximum cache size for the data entry cache.
Parameters:
newsize - the maximum data entry cache size; a value of -1 indicates an unlimited cache size.
public static final boolean isVerbPos(String pos)
Returns true if pos is string-equal to verbPos.
public static final boolean notVerbPos(String pos)
Returns true if pos is not string-equal to verbPos.
public static final boolean isNounPos(String pos)
Returns true if pos is string-equal to nounPos.
public static final boolean notNounPos(String pos)
Returns true if pos is not string-equal to nounPos.
public static final boolean isAdjPos(String pos)
Returns true if pos is string-equal to adjPos.
public static final boolean notAdjPos(String pos)
Returns true if pos is not string-equal to adjPos.
public static final boolean isAdvPos(String pos)
Returns true if pos is string-equal to advPos.
public static final boolean notAdvPos(String pos)
Returns true if pos is not string-equal to advPos.
public static final String getPosName(String pos)
Gets the unabbreviated name for pos.
Parameters:
pos - a WordNet part of speech.
public static final String getPosName(int idx)
Gets the unabbreviated name for the WordNet part of speech specified by idx.
Parameters:
idx - a WordNet part of speech index.
public static final String getPos(int idx)
Returns the String representation of the part of speech specified by idx.
Parameters:
idx - a WordNet part of speech index.
public static final int getPosIdx(String pos)
Returns the index of the WordNet part of speech specified by pos.
public static final int getPosIdx(char pos)
Returns the index of the specified WordNet part of speech.
public static final boolean isAdjSatellite(String pos)
Returns true if and only if pos is string-equal to "s", representing an adjective satellite in the WordNet database.
public static final String getCanonicalPos(String pos)
Returns the canonical String object for the specified part of speech. If pos is string-equal to "s" (indicating an adjective satellite), then the object adjPos is returned.
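The idea of a canonical String object per part of speech, with adjective satellites folded into adjectives, can be sketched as follows. This is not the library's source: the class CanonicalPosSketch is hypothetical, the constant values "n", "v", "a" and "r" are assumptions (this page does not state the values of nounPos, verbPos, adjPos and advPos), and throwing IllegalArgumentException for an unknown string is likewise an assumption.

```java
/** Hypothetical sketch of part-of-speech canonicalization; not the library's source. */
class CanonicalPosSketch {
    // ASSUMED values; the documentation above does not give the actual constants.
    static final String nounPos = "n";
    static final String verbPos = "v";
    static final String adjPos = "a";
    static final String advPos = "r";

    /** True if and only if pos is string-equal to "s". */
    static boolean isAdjSatellite(String pos) {
        return "s".equals(pos);
    }

    /** Maps any string-equal part of speech to the single shared constant object. */
    static String getCanonicalPos(String pos) {
        if (isAdjSatellite(pos) || adjPos.equals(pos)) {
            return adjPos; // satellites collapse to plain adjectives
        }
        if (nounPos.equals(pos)) {
            return nounPos;
        }
        if (verbPos.equals(pos)) {
            return verbPos;
        }
        if (advPos.equals(pos)) {
            return advPos;
        }
        // ASSUMPTION: unknown strings are rejected; the real behaviour is not documented here.
        throw new IllegalArgumentException("unknown part of speech: " + pos);
    }

    public static void main(String[] args) {
        String fromFile = new String("s"); // a distinct object, as if parsed from a data file
        System.out.println(getCanonicalPos(fromFile) == adjPos);
    }
}
```

Because the result is always one of the shared constants, callers of such a method could compare parts of speech by reference afterwards.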
public static final Hashtable clearHalfIfNeeded(Hashtable h, int maxSize)
Utility method that clears every other element from the hashtable h, thereby resulting in a random replacement strategy (assuming a random hash-bucket dispersion).
Parameters:
h - the hashtable object to be half-cleared.
maxSize - the threshold at which to clear half of h; if h.size() < maxSize, then no clearing is performed.
public static final Vector makeUnique(Vector v)
Utility method to make all the elements of the Vector v pairwise unique, as determined by their equals methods.
public static final int[] makeUnique(int[] arr)
Utility method that returns an array containing the unique elements of arr.
public static final Vector makeVector(int[] arr)
Makes a Vector object of Integer objects, initialized from the int array.
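The utility methods clearHalfIfNeeded, makeUnique(int[]) and makeVector described above can be sketched with standard collections. This is an illustration, not the library's source: the class CacheUtilSketch is hypothetical and uses generics, which the original raw-typed signatures do not. Returning the same table from clearHalfIfNeeded and preserving first-occurrence order in makeUnique are both assumptions.

```java
import java.util.Hashtable;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.Vector;

/** Hypothetical sketches of the utility methods; not the library's source. */
class CacheUtilSketch {
    /**
     * Removes every other entry once the table has reached maxSize entries.
     * ASSUMPTION: the same table is modified in place and returned.
     */
    static <K, V> Hashtable<K, V> clearHalfIfNeeded(Hashtable<K, V> h, int maxSize) {
        if (h.size() < maxSize) {
            return h; // below the threshold: no clearing is performed
        }
        boolean drop = true;
        Iterator<K> it = h.keySet().iterator();
        while (it.hasNext()) {
            it.next();
            if (drop) {
                it.remove(); // removes the entry from the backing table
            }
            drop = !drop;
        }
        return h;
    }

    /** ASSUMPTION: keeps the first occurrence of each value, in original order. */
    static int[] makeUnique(int[] arr) {
        Set<Integer> seen = new LinkedHashSet<>();
        for (int x : arr) {
            seen.add(x);
        }
        int[] out = new int[seen.size()];
        int i = 0;
        for (int x : seen) {
            out[i++] = x;
        }
        return out;
    }

    /** Boxes each int into an Integer and appends it to a new Vector. */
    static Vector<Integer> makeVector(int[] arr) {
        Vector<Integer> v = new Vector<>(arr.length);
        for (int x : arr) {
            v.add(x);
        }
        return v;
    }

    public static void main(String[] args) {
        Hashtable<Integer, String> cache = new Hashtable<>();
        for (int i = 0; i < 10; i++) {
            cache.put(i, "entry" + i);
        }
        System.out.println(clearHalfIfNeeded(cache, 10).size());
    }
}
```

Which entries survive depends on the table's iteration order, which is why the description calls the result a random replacement strategy.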
protected static void printError(String msg)
Prints the error message msg, prepended by a class identifier as would be printed by getClass().getName().
public static String look(String key, RandomAccessFile file)
Returns the line of file that begins with key, using a binary search.
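A binary search over the byte offsets of a sorted text file, as look is described, might look like the sketch below. It is not the library's implementation. It assumes the lines are ASCII and sorted in ascending String.compareTo order, and it returns null when no line begins with key, which this page does not specify. The class LookSketch and the helpers lineAtOrAfter and lookInSortedLines are hypothetical; the last one exists only to exercise the search on a temporary file.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

/** Hypothetical sketch of a binary search over a sorted text file; not the library's source. */
class LookSketch {
    /** Reads the first complete line that starts at or after byte offset pos, or null at end of file. */
    static String lineAtOrAfter(RandomAccessFile file, long pos) throws IOException {
        if (pos == 0) {
            file.seek(0);
        } else {
            file.seek(pos - 1);
            file.readLine(); // consume the rest of the line containing byte pos - 1
        }
        return file.readLine();
    }

    /** ASSUMPTION: lines are sorted by String.compareTo; returns null if no line begins with key. */
    static String look(String key, RandomAccessFile file) throws IOException {
        long lo = 0;
        long hi = file.length();
        // Find the smallest offset whose following line does not sort before key.
        while (lo < hi) {
            long mid = lo + (hi - lo) / 2;
            String line = lineAtOrAfter(file, mid);
            if (line != null && line.compareTo(key) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        String line = lineAtOrAfter(file, lo);
        return (line != null && line.startsWith(key)) ? line : null;
    }

    /** Hypothetical helper: writes the given sorted lines to a temporary file and searches it. */
    static String lookInSortedLines(String key, String... sortedLines) {
        try {
            File tmp = File.createTempFile("look", ".txt");
            try {
                String text = String.join("\n", sortedLines) + "\n";
                Files.write(tmp.toPath(), text.getBytes(StandardCharsets.US_ASCII));
                try (RandomAccessFile raf = new RandomAccessFile(tmp, "r")) {
                    return look(key, raf);
                }
            } finally {
                tmp.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(lookInSortedLines("cherry ", "apple n 1", "banana n 2", "cherry n 3"));
    }
}
```

Each probe seeks to an arbitrary byte, skips the partial line it landed in, and compares the next full line, so the file never has to be read sequentially.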
protected void setWNDBversion(String version)
Sets the version of the WordNet database files used for lookup to be version. Subclasses should use this method if they are only going to be used with a particular version of the WordNet database. Runtime specification of the database version is possible by setting the system property WNDBVERSION to either "1.5" or "1.6" on the command line.
public String getWNDBversion()
Returns the String that represents the current version of the WordNet database used by this object.
public void save(String filename) throws IOException
Saves this WordNet object to the file specified by filename.
Throws: IOException
public static void main(String[] args)
Allows users to save a WordNet object to a Java object file, with control over how much is cached.
usage: <WNHOME> [-cache <n | v | a | r | all>]* <object output filename>
Parameters:
args - an argument vector that should conform to that expected by the usage statement above.