Focusing on Linguistic Representations

Chris Manning
Stanford University

Abstract

In recent years, much of computational linguistics has come to look like just a branch of applied machine learning, and the general perception has been that learning about and doing machine learning is where all the cool stuff is. This strategy was very successful for a while -- indeed, it revolutionized NLP and placed it on a broader, more scientific base. But it's never been the whole story: much of the difference between systems has always lain in their representations and choice of features. Developing these is a matter of doing good linguistics (if in a somewhat more empirical and quantitative way than theoretical linguists in the U.S. normally consider). Looking to the future, it seems to me much less likely that profound progress will come from new machine learning technology (we've pretty much caught up with what statisticians and machine learning people have discovered over 400 years), and much more likely that progress will come from better linguistics. This is both because this side of things has been largely ignored for a decade, and because issues of representation become more prominent as people focus on deeper problems of NLP involving semantics and discourse. In this talk I will look at recent work of mine and others on information extraction, parsing, and grammar induction, emphasizing how much of the action is in developing good linguistic models.

Bio

Christopher Manning is an assistant professor of computer science and linguistics at Stanford University. He previously held faculty positions at Carnegie Mellon University and the University of Sydney. His research interests include probabilistic natural language processing, syntax, computational lexicography, information extraction, and text mining. He is the author of three books, including Foundations of Statistical Natural Language Processing (MIT Press, 1999, with Hinrich Schuetze).