To interpret a natural language utterance as a native speaker would, two conceptually distinct tasks must be accomplished. First, the internal structure of the utterance must be identified and mapped to a unique symbolic representation. Second, that symbolic representation must be interpreted semantically and reconciled with background knowledge so that inference is possible. The QuASI project attempts to address these tasks by connecting two current lines of research in natural language processing.
The first line of research involves recent work in corpus-based statistical NLP techniques, which over the last decade have achieved considerable success in identifying and extracting natural language structures from unstructured text. A major part of our research is the automatic extraction of semantic relationships from natural language text. We have made considerable progress with compound nouns: matching abbreviations in text to the compound nouns they stand for, and distinguishing the variety of semantic relationships that can hold between the members of a compound noun phrase.
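As an illustration of abbreviation matching, the sketch below pairs a parenthesized short form with the words that precede it by matching the short form's characters right to left. It is a simplified version of the well-known heuristic of Schwartz and Hearst (2003), chosen here for illustration and not taken from this project; the function names `find_best_long_form` and `extract_pairs` are hypothetical.

```python
import re
from typing import List, Optional, Tuple


def find_best_long_form(short_form: str, long_form: str) -> Optional[str]:
    """Match the short form's characters right to left inside the candidate
    long form; the first short-form character must start a word."""
    s_index = len(short_form) - 1
    l_index = len(long_form) - 1
    while s_index >= 0:
        current = short_form[s_index].lower()
        if not current.isalnum():  # skip punctuation such as '-' in the short form
            s_index -= 1
            continue
        # Move left until the character matches; for the first short-form
        # character, also require a non-alphanumeric character before the match.
        while (l_index >= 0 and long_form[l_index].lower() != current) or (
            s_index == 0 and l_index > 0 and long_form[l_index - 1].isalnum()
        ):
            l_index -= 1
        if l_index < 0:
            return None  # some short-form character could not be matched
        l_index -= 1
        s_index -= 1
    # Extend left to the start of the space-delimited word holding the first match.
    start = long_form.rfind(" ", 0, l_index + 1) + 1
    return long_form[start:]


def extract_pairs(sentence: str) -> List[Tuple[str, str]]:
    """Find 'long form (SF)' patterns and pair each SF with its long form."""
    pairs = []
    for match in re.finditer(r"\(([A-Za-z][A-Za-z0-9-]{1,9})\)", sentence):
        short_form = match.group(1)
        words = sentence[: match.start()].split()
        # Look back at most min(|SF| + 5, 2 * |SF|) words (window borrowed from that heuristic).
        window = min(len(short_form) + 5, 2 * len(short_form))
        candidate = " ".join(words[-window:])
        long_form = find_best_long_form(short_form, candidate)
        if long_form is not None:
            pairs.append((short_form, long_form))
    return pairs
```

For example, `extract_pairs("Parsing with a probabilistic context-free grammar (PCFG) is common.")` pairs "PCFG" with "probabilistic context-free grammar". Classifying the semantic relation inside the recovered compound is a separate step that this sketch does not attempt.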
We also have work in progress on identifying semantic roles distributed throughout sentences. We use the FrameNet frame and semantic role ontology, in which an individual target word can evoke a frame. Each frame carries a set of semantic roles, which are associated with the words and phrases in the sentence that participate with the target word in the construal of an event. For example, a Directional Motion event can have a Theme that undergoes motion, a Source from which the motion starts, a Path along which it continues, and a Goal, the endpoint of the motion. In the sentence below, the target word "drops" evokes such a frame:
Mortars are shorter, heavier weapons than cannons, designed to lob an explosive shell high into the air so that [<Theme>it] [<Target>drops] [<Path>down] [<Goal>on the target] [<Source>from the sky].
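One way to hold an annotation like the one above in code is a small record of token spans. The class name `FrameAnnotation`, the half-open `(start, end)` span convention, and the "Target" label are assumptions for illustration, not FrameNet's own data format; the frame name is simply the one used in the text.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FrameAnnotation:
    """A frame evoked by a target word, with role fillers as token spans.

    Spans are half-open (start, end) token indices (hypothetical convention).
    """
    frame: str
    target: tuple[int, int]
    roles: dict[str, tuple[int, int]] = field(default_factory=dict)

    def fillers(self, tokens: list[str]) -> dict[str, str]:
        """Return the text of each role filler."""
        return {role: " ".join(tokens[s:e]) for role, (s, e) in self.roles.items()}

    def bracketed(self, tokens: list[str]) -> str:
        """Render the annotation in the [<Role>...] notation used above."""
        spans = {s: (role, e) for role, (s, e) in self.roles.items()}
        spans[self.target[0]] = ("Target", self.target[1])
        out, i = [], 0
        while i < len(tokens):
            if i in spans:
                label, end = spans[i]
                out.append(f"[<{label}>{' '.join(tokens[i:end])}]")
                i = end
            else:
                out.append(tokens[i])
                i += 1
        return " ".join(out)


# The clause from the example sentence, tokenized on whitespace.
tokens = "so that it drops down on the target from the sky".split()
ann = FrameAnnotation(
    "Directional Motion",
    target=(3, 4),
    roles={"Theme": (2, 3), "Path": (4, 5), "Goal": (5, 8), "Source": (8, 11)},
)
```

Keeping roles as spans over a shared token list, rather than as copied strings, makes it easy to line annotations up with parser output over the same tokens.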
To identify the semantic roles of a target accurately in a given sentence, we start with parsing, which assigns a hierarchical structure to the sentence according to the functional relationships among its elements. Part of our research focuses on improved parsing techniques. One recent development in this area is a factored parser that builds syntactic (phrase-structure) and semantic (lexical dependency) structures in parallel and then combines them, allowing fast, exact parsing without sacrificing accuracy.
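The idea of a factored score can be illustrated with a toy lexicalized CKY parser, assuming a hypothetical grammar, lexicon, and head-dependent table with made-up probabilities. Each constituent is scored by the sum of a phrase-structure (PCFG) log-probability and a lexical dependency log-probability. This sketch searches exhaustively; it does not reproduce the efficient exact search of the project's parser.

```python
import math
from collections import defaultdict

# Hypothetical toy PCFG: (parent, left child, right child) -> (probability, head child).
RULES = {
    ("S", "NP", "VP"): (1.0, "right"),
    ("VP", "V", "NP"): (0.8, "left"),
    ("VP", "VP", "PP"): (0.2, "left"),
    ("NP", "NP", "PP"): (0.5, "left"),
    ("PP", "P", "NP"): (1.0, "left"),
}
# Hypothetical lexicon: word -> {category: probability}.
LEXICON = {
    "she": {"NP": 0.3}, "eats": {"V": 1.0}, "fish": {"NP": 0.4},
    "with": {"P": 1.0}, "chopsticks": {"NP": 0.3},
}
# Hypothetical head -> dependent probabilities; unseen pairs get a small floor.
DEPS = {
    ("eats", "she"): 0.5, ("eats", "fish"): 0.5, ("eats", "with"): 0.4,
    ("fish", "with"): 0.05, ("with", "chopsticks"): 0.6,
}


def dep_prob(head: str, dependent: str) -> float:
    return DEPS.get((head, dependent), 0.01)


def parse(words, use_deps=True):
    """Exhaustive lexicalized CKY over the combined score: PCFG log-probability
    plus (if use_deps) dependency log-probability. Returns (score, tree) or None."""
    n = len(words)
    chart = defaultdict(dict)  # (i, j) -> {(label, head index): (log score, tree)}
    for i, word in enumerate(words):
        for label, p in LEXICON[word].items():
            chart[(i, i + 1)][(label, i)] = (math.log(p), (label, word))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell = chart[(i, j)]
            for k in range(i + 1, j):
                for (lb, hb), (sb, tb) in chart[(i, k)].items():
                    for (lc, hc), (sc, tc) in chart[(k, j)].items():
                        for (parent, left, right), (p, head_child) in RULES.items():
                            if (left, right) != (lb, lc):
                                continue
                            head, dep = (hb, hc) if head_child == "left" else (hc, hb)
                            score = sb + sc + math.log(p)
                            if use_deps:
                                score += math.log(dep_prob(words[head], words[dep]))
                            key = (parent, head)
                            if key not in cell or score > cell[key][0]:
                                cell[key] = (score, (parent, tb, tc))
    roots = [v for (label, _), v in chart[(0, n)].items() if label == "S"]
    return max(roots, key=lambda v: v[0]) if roots else None
```

On "she eats fish with chopsticks", the toy PCFG alone prefers attaching "with chopsticks" to "fish"; adding the dependency scores moves the attachment to "eats". This is the kind of interaction between the two component models that a factored score is meant to capture.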
We then use a simple generative semantic role model to identify which phrases the target assigns semantic roles to.
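A naive Bayes classifier is one simple generative model that could fill this step. The sketch below assumes it, along with a hypothetical feature set (phrase type, position relative to the target, governing preposition) and a handful of made-up training examples; the label NONE marks a phrase that receives no role. It is an illustration, not the project's actual model.

```python
import math
from collections import Counter, defaultdict


def train(examples, alpha=1.0):
    """Collect counts for P(role) and P(feature value | role)."""
    role_counts = Counter()
    feat_counts = defaultdict(Counter)  # (role, feature name) -> value counts
    values = defaultdict(set)           # feature name -> values seen in training
    for feats, role in examples:
        role_counts[role] += 1
        for name, value in feats.items():
            feat_counts[(role, name)][value] += 1
            values[name].add(value)
    return role_counts, feat_counts, values, alpha


def classify(model, feats):
    """Return the role maximizing log P(role) + sum of log P(value | role),
    with add-alpha smoothing and one extra slot per feature for unseen values."""
    role_counts, feat_counts, values, alpha = model
    total = sum(role_counts.values())
    best_role, best_score = None, -math.inf
    for role, count in role_counts.items():
        score = math.log(count / total)
        for name, value in feats.items():
            seen = feat_counts[(role, name)][value]
            score += math.log((seen + alpha) / (count + alpha * (len(values[name]) + 1)))
        if score > best_score:
            best_role, best_score = role, score
    return best_role


# Hypothetical training data: (candidate phrase features, role) pairs.
EXAMPLES = [
    ({"phrase": "NP", "position": "before", "prep": "none"}, "Theme"),
    ({"phrase": "NP", "position": "before", "prep": "none"}, "Theme"),
    ({"phrase": "PP", "position": "after", "prep": "on"}, "Goal"),
    ({"phrase": "PP", "position": "after", "prep": "to"}, "Goal"),
    ({"phrase": "PP", "position": "after", "prep": "from"}, "Source"),
    ({"phrase": "PP", "position": "after", "prep": "from"}, "Source"),
    ({"phrase": "PP", "position": "after", "prep": "along"}, "Path"),
    ({"phrase": "ADVP", "position": "after", "prep": "none"}, "Path"),
    ({"phrase": "NP", "position": "after", "prep": "none"}, "NONE"),
]
model = train(EXAMPLES)
```

Because the model is generative, it scores each candidate by the joint probability of a role and the observed features, which keeps training to simple counting.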
The second line of research involves reasoning based on semantic representations. We model semantic interpretation through the simulation of executable semantics in an extended Petri net formalism, and we use a combination of dynamic simulation and probabilistic reasoning for inference. More recently, we have connected the FrameNet frame and role ontology with our model of executable schemas. We are also interested in the Semantic Web as an open-ended source of Web-based semantic information, and we have linked Semantic Web ontologies with our model of inference through simulation.
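To make "executable semantics" concrete, here is a minimal place/transition Petri net simulator. It assumes only the basic formalism: the project's extensions and its probabilistic reasoning are omitted, and the place and transition names are hypothetical. Tokens moving through the net mirror the Source, Path, and Goal roles of a motion event.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PetriNet:
    """Places hold token counts; a transition fires by consuming tokens from
    its input places and producing tokens in its output places."""
    marking: dict[str, int] = field(default_factory=dict)
    transitions: dict[str, tuple[dict[str, int], dict[str, int]]] = field(default_factory=dict)

    def add_transition(self, name: str, inputs: dict[str, int], outputs: dict[str, int]) -> None:
        self.transitions[name] = (inputs, outputs)

    def enabled(self, name: str) -> bool:
        inputs, _ = self.transitions[name]
        return all(self.marking.get(place, 0) >= n for place, n in inputs.items())

    def fire(self, name: str) -> None:
        if not self.enabled(name):
            raise ValueError(f"transition {name!r} is not enabled")
        inputs, outputs = self.transitions[name]
        for place, n in inputs.items():
            self.marking[place] = self.marking.get(place, 0) - n
        for place, n in outputs.items():
            self.marking[place] = self.marking.get(place, 0) + n

    def run(self, max_steps: int = 100) -> list[str]:
        """Fire the first enabled transition (in insertion order) until none is enabled."""
        trace = []
        for _ in range(max_steps):
            ready = [t for t in self.transitions if self.enabled(t)]
            if not ready:
                break
            self.fire(ready[0])
            trace.append(ready[0])
        return trace


# Hypothetical motion schema: the Theme leaves its Source, advances along a Path
# one step per unit of "energy", and reaches its Goal after two steps of progress.
net = PetriNet(marking={"at_source": 1, "energy": 2})
net.add_transition("start", {"at_source": 1}, {"moving": 1})
net.add_transition("move", {"moving": 1, "energy": 1}, {"moving": 1, "progress": 1})
net.add_transition("arrive", {"moving": 1, "progress": 2}, {"at_goal": 1})
trace = net.run()
```

The simulation here is deterministic (first enabled transition wins); a richer model could instead choose among enabled transitions probabilistically, which is where probabilistic reasoning would enter.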
Some QuASI software is available.
This work was supported in part by the Advanced Research and Development Activity (ARDA)'s Advanced Question Answering for Intelligence (AQUAINT) Program.