The Stanford Natural Language Processing Group

Knowledge Acquisition from Text

Dekang Lin
Google Inc.

Abstract

Text is arguably the richest repository of human knowledge. Two approaches have commonly been adopted in knowledge acquisition from text. One is to define specific patterns and extract instances matching these patterns from a text collection; this has been used to find relationships between words, such as is-a and part-whole. The other is based on indirect associations between words in text, as exemplified by many methods for computing word similarity. I will present several extensions and generalizations of these earlier algorithms and show that seemingly deep linguistic or world knowledge may be acquired with superficial statistics.
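
As a rough illustration of the two approaches named in the abstract, the following minimal sketch runs both on a tiny made-up corpus. It is not the speaker's method: the single "X such as Y and Z" pattern, the one-word context window, the cosine measure, and all names (CORPUS, extract_isa, context_vectors, cosine) are assumptions chosen only to keep the example short.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy corpus; a real system would use a large text collection.
CORPUS = [
    "fruits such as apples and oranges are sold here",
    "she drinks wine with dinner",
    "he drinks beer with dinner",
    "she reads books at night",
]

# Approach 1: a hand-written pattern. This single is-a pattern is an
# illustrative assumption, not a pattern taken from the talk.
ISA_PATTERN = re.compile(r"(\w+) such as (\w+)(?: and (\w+))?")


def extract_isa(sentences):
    """Return (hyponym, hypernym) pairs matched by the pattern."""
    pairs = []
    for sentence in sentences:
        for match in ISA_PATTERN.finditer(sentence):
            hypernym = match.group(1)
            for hyponym in match.groups()[1:]:
                if hyponym:  # the second hyponym is optional
                    pairs.append((hyponym, hypernym))
    return pairs


# Approach 2: indirect association. Each word is represented by counts
# of the words that appear within `window` positions of it.
def context_vectors(sentences, window=1):
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


if __name__ == "__main__":
    print(extract_isa(CORPUS))
    vectors = context_vectors(CORPUS)
    print(cosine(vectors["wine"], vectors["beer"]))
    print(cosine(vectors["wine"], vectors["books"]))
```

In this toy corpus "wine" and "beer" never occur in the same sentence, yet they share the same neighbouring words, so their similarity comes out close to 1, while "wine" and "books" share no contexts and score 0. That is the sense in which the second approach relies on indirect rather than direct association.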