It is a long-standing dream of AI to have algorithms automatically read text and extract knowledge from it. By applying a learning algorithm to parsed text, we have developed methods that automatically identify the concepts in a text and the relations between them. For example, on reading the phrase "heavy water rich in the doubly heavy hydrogen atom called deuterium", our algorithm learns the fact that deuterium is a type of atom and adds it to its semantic network (Snow et al., 2005). By applying this procedure and its extensions (Snow et al., 2006; Snow et al., 2007) to large amounts of text, our algorithms automatically acquire hundreds of thousands of items of world knowledge and use them to produce significantly enhanced versions of WordNet, which are freely available online. WordNet, a laboriously hand-coded database, is a major NLP resource, but it has proven very expensive to build and maintain manually. By automatically inducing knowledge to add to WordNet, our work provides an even richer NLP resource (e.g., significantly higher precision and recall in identifying various relations) at a tiny fraction of the cost.
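As a rough illustration of what a single extraction step produces, the Python sketch below pulls the deuterium fact out of the example phrase and stores it in a toy semantic network. It is not the project's method: the project learns its patterns from parsed text, whereas this sketch hard-codes one surface pattern, "Y called X", as a regular expression. The names `CALLED_PATTERN`, `extract_isa`, and `add_to_network`, and the dict-of-sets network, are hypothetical and chosen only for illustration.

```python
import re
from collections import defaultdict

# Hypothetical hand-written pattern "<hypernym> called <hyponym>".
# It captures only the single word before "called" as a crude stand-in for
# the head noun; it is NOT one of the patterns learned by the project.
CALLED_PATTERN = re.compile(r"\b(\w+)\s+called\s+(\w+)")


def extract_isa(sentence):
    """Return (hyponym, hypernym) pairs matched by the 'Y called X' pattern."""
    return [(m.group(2), m.group(1)) for m in CALLED_PATTERN.finditer(sentence)]


def add_to_network(network, pairs):
    """Record each (hyponym, hypernym) pair as an IS-A edge in a dict of sets."""
    for hyponym, hypernym in pairs:
        network[hyponym].add(hypernym)
    return network


if __name__ == "__main__":
    network = defaultdict(set)
    phrase = "heavy water rich in the doubly heavy hydrogen atom called deuterium"
    add_to_network(network, extract_isa(phrase))
    print(dict(network))  # {'deuterium': {'atom'}}
```

A fixed pattern like this one is brittle: it ignores syntax and misses the many other ways a text can express the same relation, which is one motivation for learning patterns from parsed text instead.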
The Stanford Wordnet Project Homepage: http://ai.stanford.edu/~rion/swn
For any comments or questions, please e-mail Rion Snow.