The QuASI Project: Question Answering with Statistics and Inference

To interpret a natural language utterance as a native speaker would, two conceptually distinct tasks must be accomplished. First, the internal structure of the utterance must be mapped to a unique symbolic representation. Second, that symbolic representation must be interpreted semantically and reconciled with background knowledge so that inference is possible. The QuASI project attempts to address these tasks by connecting two current lines of research in natural language processing.

The first line of research involves recent work in corpus-based statistical NLP techniques, which over the last decade have achieved considerable success in identifying and extracting natural language structures from unstructured text. A major part of our research is the automatic extraction of semantic relationships from natural language text. We have made considerable progress with compound nouns: matching abbreviations in text to the compound nouns they stand for, and distinguishing the variety of semantic relationships that can hold between the members of a compound noun phrase.
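Matching an abbreviation to its definition can be illustrated with a short sketch. This is a condensed Python version written in the spirit of the Schwartz and Hearst (2003) algorithm listed under Selected Publications, not their released code: it treats a parenthesized token as a candidate short form, takes at most min(|SF| + 5, 2|SF|) preceding words as the candidate long form, and scans both strings right to left, requiring every letter or digit of the short form to appear in order in the long form, with the first one at the start of a word. The function names and the simplified short-form filter are our own assumptions.

```python
import re
from typing import List, Optional, Tuple


def find_best_long_form(short_form: str, long_form: str) -> Optional[str]:
    """Right-to-left match of short-form characters against a candidate long form."""
    s_index = len(short_form) - 1
    l_index = len(long_form) - 1
    while s_index >= 0:
        current = short_form[s_index].lower()
        if not current.isalnum():
            s_index -= 1  # skip punctuation inside the short form, e.g. "T-cell"
            continue
        # Move left until the character matches; the first short-form character
        # must additionally sit at the start of a word in the long form.
        while l_index >= 0 and (
            long_form[l_index].lower() != current
            or (s_index == 0 and l_index > 0 and long_form[l_index - 1].isalnum())
        ):
            l_index -= 1
        if l_index < 0:
            return None  # some short-form character has no match
        l_index -= 1
        s_index -= 1
    # Extend the match back to the beginning of the word that contains it.
    start = long_form.rfind(" ", 0, l_index + 1) + 1
    return long_form[start:]


def extract_pairs(sentence: str) -> List[Tuple[str, str]]:
    """Return (short form, long form) pairs for 'long form (SF)' patterns."""
    pairs = []
    for match in re.finditer(r"\(([^()]+)\)", sentence):
        short_form = match.group(1).strip()
        # Simplified short-form filter (assumption): 2-10 characters, at most
        # two words, starts with a letter or digit, contains at least one letter.
        if not (2 <= len(short_form) <= 10
                and len(short_form.split()) <= 2
                and short_form[0].isalnum()
                and any(c.isalpha() for c in short_form)):
            continue
        words = sentence[:match.start()].split()
        window = min(len(short_form) + 5, len(short_form) * 2)
        candidate = " ".join(words[-window:])
        long_form = find_best_long_form(short_form, candidate)
        if long_form and len(long_form) > len(short_form):
            pairs.append((short_form, long_form))
    return pairs


print(extract_pairs("Heat shock transcription factor (HSF) binds DNA."))
# [('HSF', 'Heat shock transcription factor')]
```

Scanning from the right lets the matcher skip over words that contribute no letters to the abbreviation, while the word-start constraint on the first character keeps it from stopping inside a word such as "shock".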

We also have work in progress on identifying semantic roles distributed throughout sentences. We use the FrameNet frame and semantic role ontology, in which an individual target word can evoke a frame. The frame carries a set of semantic roles, which are filled by words and phrases in the sentence that participate with the target word in the construal of an event. For example, a Directional Motion event can have a Theme that undergoes motion, a Source from which motion begins, a Path along which motion continues, and a Goal, the endpoint of motion.

Mortars are shorter, heavier weapons than cannons, designed to lob an explosive shell high into the air so that [_<Theme>it] drops^Tgt [_<Path>down] [_<Goal>on the target] [_<Source>from the sky].
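An annotation like the one above can be held in a small data structure. The sketch below is illustrative only: the class names, the token-span encoding, and the role inventory attached to the Directional Motion frame are our assumptions, not FrameNet's own data format. The renderer reproduces the bracket notation used in the example and assumes role spans do not overlap.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Frame:
    name: str
    roles: FrozenSet[str]  # semantic roles the frame makes available


@dataclass
class FrameAnnotation:
    frame: Frame
    tokens: List[str]
    target: int                        # index of the frame-evoking target word
    roles: Dict[str, Tuple[int, int]]  # role -> (start, end) token span, end exclusive

    def __post_init__(self) -> None:
        # Reject roles the frame does not define and spans outside the sentence.
        for role, (start, end) in self.roles.items():
            if role not in self.frame.roles:
                raise ValueError(f"{role} is not a role of {self.frame.name}")
            if not 0 <= start < end <= len(self.tokens):
                raise ValueError(f"bad span for {role}: {(start, end)}")

    def render(self) -> str:
        """Produce the [_<Role>...] bracket notation, marking the target with ^Tgt."""
        opens = {start: role for role, (start, _) in self.roles.items()}
        closes = {end - 1 for _, end in self.roles.values()}
        pieces = []
        for i, token in enumerate(self.tokens):
            piece = token + "^Tgt" if i == self.target else token
            if i in opens:
                piece = f"[_<{opens[i]}>" + piece
            if i in closes:
                piece += "]"
            pieces.append(piece)
        return " ".join(pieces)


motion = Frame("Directional Motion", frozenset({"Theme", "Source", "Path", "Goal"}))
example = FrameAnnotation(
    frame=motion,
    tokens="it drops down on the target from the sky".split(),
    target=1,
    roles={"Theme": (0, 1), "Path": (2, 3), "Goal": (3, 6), "Source": (6, 9)},
)
print(example.render())
# [_<Theme>it] drops^Tgt [_<Path>down] [_<Goal>on the target] [_<Source>from the sky]
```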

To identify the semantic roles of a target in a given sentence accurately, we start by parsing, which assigns the sentence a hierarchical structure reflecting the functional relationships among its elements. Part of our research focuses on improved parsing techniques. One recent development in this area is a factored parser, which builds syntactic and semantic sentence structures with separate models and then combines them, allowing fast, exact parsing without sacrificing accuracy.
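The factored idea can be summarized in a few equations. This is our own condensed reading of Klein and Manning (2002), with notation chosen here: a candidate analysis $L$ projects to a phrase-structure tree $T$ and a lexical dependency structure $D$, each scored by its own model, and $\alpha(e)$ denotes the best outside score of a chart edge $e$ under the combined model (or under one projection alone, when subscripted).

```latex
\begin{aligned}
P(L) &\propto P_{\mathrm{PCFG}}(T)\,P_{\mathrm{DEP}}(D), \qquad L = (T, D) \\
\log P(L) &= \log P_{\mathrm{PCFG}}(T) + \log P_{\mathrm{DEP}}(D) + \text{const} \\
\alpha(e) &\le \alpha_{\mathrm{PCFG}}(e) + \alpha_{\mathrm{DEP}}(e)
\end{aligned}
```

The inequality holds because the best joint completion of an edge cannot score higher than the best completions of the two projections chosen independently. The sum of the two single-model outside scores is therefore an admissible A* estimate, so the first complete parse taken from the agenda is the highest-scoring one. That is what makes the search exact; it can also be fast when the estimate is close to the true outside score.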

We then use a simple generative model of semantic roles to identify which phrases receive semantic roles from the target.
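The form of the role model is not spelled out here, so the following is only one hypothetical instance of a "simple, generative" model: a naive Bayes classifier that scores each candidate phrase as P(role) times the product of P(feature | role) over a few features (phrase type, position relative to the target, head word), with add-one smoothing and a NONE label for phrases that receive no role. The class name, features, and toy training data are invented for illustration.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Features = Dict[str, str]


class GenerativeRoleModel:
    """Hypothetical naive Bayes role model: P(role, f_1..f_n) = P(role) * prod_i P(f_i | role)."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha                          # add-alpha smoothing constant
        self.role_counts: Counter = Counter()
        self.feature_counts = defaultdict(Counter)  # (role, feature name) -> value counts
        self.feature_values = defaultdict(set)      # feature name -> values seen in training

    def train(self, examples: List[Tuple[Features, str]]) -> None:
        for features, role in examples:
            self.role_counts[role] += 1
            for name, value in features.items():
                self.feature_counts[(role, name)][value] += 1
                self.feature_values[name].add(value)

    def log_joint(self, features: Features, role: str) -> float:
        score = math.log(self.role_counts[role] / sum(self.role_counts.values()))
        for name, value in features.items():
            counts = self.feature_counts[(role, name)]
            vocab = len(self.feature_values[name]) + 1  # +1 reserves mass for unseen values
            score += math.log((counts[value] + self.alpha)
                              / (sum(counts.values()) + self.alpha * vocab))
        return score

    def predict(self, features: Features) -> str:
        # Pick the role with the highest joint probability for this phrase.
        return max(self.role_counts, key=lambda role: self.log_joint(features, role))


# Toy training data (invented): phrase type, position relative to target, head word.
TRAIN = [
    ({"pt": "NP", "pos": "before", "head": "it"}, "Theme"),
    ({"pt": "PP", "pos": "after", "head": "from"}, "Source"),
    ({"pt": "PP", "pos": "after", "head": "from"}, "Source"),
    ({"pt": "PP", "pos": "after", "head": "on"}, "Goal"),
    ({"pt": "PP", "pos": "after", "head": "to"}, "Goal"),
    ({"pt": "PP", "pos": "after", "head": "through"}, "Path"),
    ({"pt": "AVP", "pos": "after", "head": "down"}, "Path"),
    ({"pt": "NP", "pos": "before", "head": "mortars"}, "NONE"),
]

model = GenerativeRoleModel()
model.train(TRAIN)
print(model.predict({"pt": "PP", "pos": "after", "head": "from"}))  # Source
print(model.predict({"pt": "NP", "pos": "before", "head": "it"}))   # Theme
```

Because the model is generative, each role is scored by how well it explains the observed features; a richer model would condition on the target word and the parse tree path as well.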

The second line of research involves reasoning based on semantic representations. We model semantic interpretation through simulated executable semantics in an extended Petri net formalism, and use a combination of dynamic simulation and probabilistic reasoning for inference. More recently, we have connected the FrameNet frame and role ontology with our model of executable schemas. We are also interested in the Semantic Web as an open-ended source of Web-based semantic information, and have linked Semantic Web ontologies with our model of inference through simulation.
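To make "simulation" concrete, here is a minimal place/transition Petri net in Python. It is a plain Petri net, not the project's extended formalism, and it omits any probabilistic reasoning; the motion schema, place names, and the policy of always firing the first enabled transition are hypothetical choices for illustration.

```python
from typing import Dict, List, Tuple

Arcs = Dict[str, int]  # place name -> number of tokens consumed or produced


class PetriNet:
    def __init__(self, marking: Dict[str, int], transitions: Dict[str, Tuple[Arcs, Arcs]]):
        self.marking = dict(marking)    # current token count per place
        self.transitions = transitions  # name -> (input arcs, output arcs)

    def enabled(self, name: str) -> bool:
        # A transition is enabled when every input place holds enough tokens.
        inputs, _ = self.transitions[name]
        return all(self.marking.get(place, 0) >= n for place, n in inputs.items())

    def fire(self, name: str) -> None:
        if not self.enabled(name):
            raise ValueError(f"transition {name} is not enabled")
        inputs, outputs = self.transitions[name]
        for place, n in inputs.items():
            self.marking[place] = self.marking.get(place, 0) - n
        for place, n in outputs.items():
            self.marking[place] = self.marking.get(place, 0) + n

    def run(self, max_steps: int = 100) -> List[str]:
        """Fire the first enabled transition repeatedly until none is enabled."""
        trace = []
        for _ in range(max_steps):
            ready = [t for t in self.transitions if self.enabled(t)]
            if not ready:
                break
            self.fire(ready[0])
            trace.append(ready[0])
        return trace


# Hypothetical schema for "it drops from the sky onto the target".
drop = PetriNet(
    marking={"at_source": 1},
    transitions={
        "start_motion": ({"at_source": 1}, {"moving": 1}),
        "reach_goal": ({"moving": 1}, {"at_goal": 1}),
    },
)
print(drop.run())    # ['start_motion', 'reach_goal']
print(drop.marking)  # {'at_source': 0, 'moving': 0, 'at_goal': 1}
```

Inference by simulation then amounts to running the net from a marking that encodes what the text asserts and reading off the resulting marking, for instance that the Theme ends at the Goal.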

Selected Publications

Compound Nouns
- Ariel Schwartz and Marti Hearst. 2003. A Simple Algorithm for Identifying Abbreviation Definitions in Biomedical Text. In Proceedings of the Pacific Symposium on Biocomputing (PSB 2003), Kauai, January 2003.
- Barbara Rosario and Marti Hearst. 2001. Classifying the Semantic Relations in Noun Compounds via a Domain-Specific Lexical Hierarchy. In Proceedings of EMNLP '01, Pittsburgh, PA, June 2001.
Parsing
- Dan Klein and Christopher D. Manning. 2002. Fast Exact Inference with a Factored Model for Natural Language Parsing. To appear in Suzanna Becker, Sebastian Thrun, and Klaus Obermayer (eds.), Advances in Neural Information Processing Systems 15 (NIPS 2002).
Inference
- Nancy Chang, Srini Narayanan, and Miriam R.L. Petruck. 2002. Putting Frames in Perspective. In Proceedings of COLING 2002.
- Srini Narayanan and Sheila A. McIlraith. 2002. Simulation, Verification and Automated Composition of Web Services. In Proceedings of the Eleventh International World Wide Web Conference (WWW 2002).

[Complete list of publications]

Project Personnel

Berkeley
- Marti Hearst
- Barbara Rosario
- Ariel Schwartz
International Computer Science Institute
- Nancy Chang
- Jerry Feldman
- Srini Narayanan
Stanford
- Roger Levy
- Christopher Manning
- Cindi Thompson

Software

Some QuASI software is available.

Acknowledgements

This work was supported in part by the Advanced Question Answering for Intelligence (AQUAINT) Program of the Advanced Research and Development Activity (ARDA).

Roger Levy

Last modified: Thu Jan 23 11:00:23 PST 2003