LSA 354: Statistical Parsing

LSA 354
Seven Lectures on Statistical Parsing
LSA 2007 Linguistic Institute

COURSE INFORMATION

Instructor

Chris Manning, manning@cs.stanford.edu Office: Gates 158. Office hours: Thu 10-12.

Time
T/F 8:00-9:45 AM. (I, of course, apologize for the time, which isn't the one I would have chosen either.)

Location
Hewlett 101. (That's the SEQ Teaching Center. Not the huge room.)

Reading
There is no assigned text. However, if you want some good background to get you started look at:

The new edition of Jurafsky and Martin. 2008. Speech and Language Processing, especially Chapters 10-12. Currently available free online.

Manning and Schütze. 1999. Foundations of Statistical Natural Language Processing, especially Chapters 11-12. That link should let you read the book iff you are on Stanford campus.

Otherwise, we'll make good use of the lecture handouts and free, online conference papers.

Description
Over the last decade, statistical parsing has transformed our ability to produce automatic, high-accuracy parses of arbitrary human language text. This course aims to teach from the basics up to the state-of-the-art in this domain. It will begin by reviewing the phenomena that motivated statistical approaches to parsing, context-free grammars (CFGs), and probabilistic CFGs. Next it will present basic parsing algorithms, concentrating on generalized CKY and A* parsing algorithms, and discuss treebanks, their design and nature, and the methods of building and evaluating parsers based on them. The course will then turn to the well-known and successful Collins and Charniak generative parsing models of the late 1990s, and discuss issues such as smoothing, head lexicalization, engineering for efficiency, and what kinds of information parsers use and need. Finally, we will turn to discriminative methods of parsing, and discuss both parse re-ranking techniques and the direct construction of discriminative parsers.

Prerequisites
Prerequisites: Reasonable familiarity and competence with mathematical notation, probability, algorithmic thinking, and programming. (We're just going to dive into statistical parsing assuming that you already know about empirical computational linguistics, probabilities, and algorithms. If you should be taking LSA 325, I hope you are!)

Required Work
If enrolled for credit, you need to complete a project for this class! You should do one of the following:

The Stanford cs224n parsing assignment. This is highly recommended if you don't have previous experience writing a statistical parser. But it's plenty of work. The project is to be completed in Java -- we provide a fair bit of starter code.

Writing a parser within NLTK (in Python). One obvious hole in NLTK's coverage is that there is no dependency parser. But there are various other options, such as extending the bottom up PCFG chart parser to be an A* parser.
A written problem set.
Adapt an existing parser to work with another language or treebank (this may be very little or considerable work), and then to do some substantive experimentation on how differences in the annotation scheme or language affect parser performance, and what could be done to improve it, and to write some stuff up about that.

Another project with the agreement of the instructor.

You should also look through the first two pages of the Stanford cs224n parsing assignment for a bit of introductory information on the computing environment at Stanford: machines you can log in to, where the Penn Treebank lives, etc.

SCHEDULE

Date

HW

Lec

Topic and Readings

Fri July 6
1.
Overview of Course, Parsing and Statistical Parsing [ppt] [pdf]

Motivation for statistical parsing, recursive phrase structure. Attachment decisions and probabilities. The Penn Treebank. Top-down, and bottom-up parsing.

Introductory background reading: Eugene Charniak. 1997. Statistical techniques for natural language parsing. AI Magazine.

Jurafsky and Martin, 12.0-12.3, if you don't know much syntax.
Jurafsky and Martin, 12.4, for treebanks.
Jurafsky and Martin, 13.0-13.3 for parsing as search

Tue July 10
Assignments announced.
2.
PCFGs and the CKY algorithm [ppt] [pdf]

PCFGs. Grammar transformations. Recursive parsing and memoization. Dynamic programming for parsing: the CKY algorithm.

Jurafsky and Martin, sections 13.4.0-13.4.1 CKY parsing

Jurafsky and Martin, sections 14.0-14.2, PCFGs and probabilistic CKY parsing

Fri July 13
[Black Friday!] 3.
Generalized CKY parsing and unlexicalized parsing [ppt] [pdf]

Unaries and empties. Parser evaluation. Improving the context-freedom assumptions of grammars: accurate unlexicalized parsing.

Dan Klein and Christopher D. Manning. 2001. Parsing with Treebank Grammars: Empirical Bounds, Theoretical Models, and the Structure of the Penn Treebank. Proceedings of the 39th Annual Meeting of the ACL, pp. 330-337.
Dan Klein and Christopher D. Manning. 2003. Accurate Unlexicalized Parsing. ACL 2003, pp. 423-430.

Tue July 17
4.
Search in parsing and lexicalized probabilistic parsing [ppt] [pdf]

Beam parsing. Agenda-based (chart) parsing. A* parsing. Lexicalized probabilistic context-free grammars: The Charniak (1997) model.

Dan Klein and Christopher D. Manning. 2001. Parsing and Hypergraphs. Proceedings of the 7th International Workshop on Parsing Technologies (IWPT-2001), pp. 123-134.
Dan Klein and Christopher D. Manning. 2003. A* Parsing: Fast Exact Viterbi Parse Selection. HLT-NAACL 2003.
Sharon A. Caraballo and Eugene Charniak. 1998. New Figures of Merit for Best-First Probabilistic Chart Parsing. Computational Linguistics 24: 275-298.
Slav Petrov and Dan Klein. 2007. Improved Inference for Unlexicalized Parsing. North American Chapter of the Association for Computational Linguistics (HLT-NAACL 2007).
Eugene Charniak. 1997. Statistical parsing with a context-free grammar and word statistics. Proceedings of the Fourteenth National Conference on Artificial Intelligence (AAAI 1997).

Fri July 20
5.
Treebanks and statistical parsing [ppt] [pdf]

Lexicalized parsing: Collins (1997/1999). The status of information in treebanks.

Michael Collins. 1997. Three Generative, Lexicalised Models for Statistical Parsing. Proceedings of the 35th Annual Meeting of the ACL, Madrid.
Michael Collins. 2003. Head-Driven Statistical Models for Natural Language Parsing
Daniel M. Bikel. 2004. A Distributional Analysis of a Lexicalized Statistical Parsing Model. EMNLP 2004.

Tue July 24
Assignments due
6.
Multilingual parsing and dependency parsing [ppt] [pdf]

Roger Levy and Christopher D. Manning. 2003. Is it harder to parse Chinese, or the Chinese Treebank?. ACL 2003, pp. 439-446.
Eisner, Jason and Giorgio Satta (1999). Efficient parsing for bilexical context-free grammars and head automaton grammars. Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, pages 457-464.

Fri July 27
7.
Discriminative parsing [ppt] [pdf]

An introduction to discriminative parsing. Features in discriminative parsers, presented using some of Mark Johnson's slides on discriminative reranking.

Michael Collins. 2004. Parameter Estimation for Statistical Parsing Models: Theory and Practice of Distribution-Free Methods. In Harry Bunt, John Carroll and Giorgio Satta, editors, New Developments in Parsing Technology, Kluwer.
Ryan McDonald, Koby Crammer and Fernando Pereira. 2005. Online Large-Margin Training of Dependency Parsers. 43rd Annual Meeting of the Association for Computational Linguistics.
Eugene Charniak and Mark Johnson. 2005. Coarse-to-Fine n-Best Parsing and MaxEnt Discriminative Reranking 43rd Annual Meeting of the Association for Computational Linguistics.

Musical finale.

COURSE INFORMATION
Instructor	Chris Manning, manning@cs.stanford.edu Office: Gates 158. Office hours: Thu 10-12.
Time	T/F 8:00-9:45 AM. (I, of course, apologize for the time, which isn't the one I would have chosen either.)
Location	Hewlett 101. (That's the SEQ Teaching Center. Not the huge room.)
Reading	There is no assigned text. However, if you want some good background to get you started look at: The new edition of Jurafsky and Martin. 2008. Speech and Language Processing, especially Chapters 10-12. Currently available free online. Manning and Schütze. 1999. Foundations of Statistical Natural Language Processing, especially Chapters 11-12. That link should let you read the book iff you are on Stanford campus. Otherwise, we'll make good use of the lecture handouts and free, online conference papers.
Description	Over the last decade, statistical parsing has transformed our ability to produce automatic, high-accuracy parses of arbitrary human language text. This course aims to teach from the basics up to the state-of-the-art in this domain. It will begin by reviewing the phenomena that motivated statistical approaches to parsing, context-free grammars (CFGs), and probabilistic CFGs. Next it will present basic parsing algorithms, concentrating on generalized CKY and A* parsing algorithms, and discuss treebanks, their design and nature, and the methods of building and evaluating parsers based on them. The course will then turn to the well-known and successful Collins and Charniak generative parsing models of the late 1990s, and discuss issues such as smoothing, head lexicalization, engineering for efficiency, and what kinds of information parsers use and need. Finally, we will turn to discriminative methods of parsing, and discuss both parse re-ranking techniques and the direct construction of discriminative parsers.
Prerequisites	Prerequisites: Reasonable familiarity and competence with mathematical notation, probability, algorithmic thinking, and programming. (We're just going to dive into statistical parsing assuming that you already know about empirical computational linguistics, probabilities, and algorithms. If you should be taking LSA 325, I hope you are!)
Required Work	If enrolled for credit, you need to complete a project for this class! You should do one of the following: The Stanford cs224n parsing assignment. This is highly recommended if you don't have previous experience writing a statistical parser. But it's plenty of work. The project is to be completed in Java -- we provide a fair bit of starter code. Writing a parser within NLTK (in Python). One obvious hole in NLTK's coverage is that there is no dependency parser. But there are various other options, such as extending the bottom up PCFG chart parser to be an A* parser. A written problem set. Adapt an existing parser to work with another language or treebank (this may be very little or considerable work), and then to do some substantive experimentation on how differences in the annotation scheme or language affect parser performance, and what could be done to improve it, and to write some stuff up about that. Another project with the agreement of the instructor. You should also look through the first two pages of the Stanford cs224n parsing assignment for a bit of introductory information on the computing environment at Stanford: machines you can log in to, where the Penn Treebank lives, etc.

	SCHEDULE
Date	HW	Lec	Topic and Readings
Fri July 6		1.	Overview of Course, Parsing and Statistical Parsing [ppt] [pdf] Motivation for statistical parsing, recursive phrase structure. Attachment decisions and probabilities. The Penn Treebank. Top-down, and bottom-up parsing. Introductory background reading: Eugene Charniak. 1997. Statistical techniques for natural language parsing. AI Magazine. Jurafsky and Martin, 12.0-12.3, if you don't know much syntax. Jurafsky and Martin, 12.4, for treebanks. Jurafsky and Martin, 13.0-13.3 for parsing as search
Tue July 10	Assignments announced.	2.	PCFGs and the CKY algorithm [ppt] [pdf] PCFGs. Grammar transformations. Recursive parsing and memoization. Dynamic programming for parsing: the CKY algorithm. Jurafsky and Martin, sections 13.4.0-13.4.1 CKY parsing Jurafsky and Martin, sections 14.0-14.2, PCFGs and probabilistic CKY parsing
Fri July 13 [Black Friday!]		3.	Generalized CKY parsing and unlexicalized parsing [ppt] [pdf] Unaries and empties. Parser evaluation. Improving the context-freedom assumptions of grammars: accurate unlexicalized parsing. Dan Klein and Christopher D. Manning. 2001. Parsing with Treebank Grammars: Empirical Bounds, Theoretical Models, and the Structure of the Penn Treebank. Proceedings of the 39th Annual Meeting of the ACL, pp. 330-337. Dan Klein and Christopher D. Manning. 2003. Accurate Unlexicalized Parsing. ACL 2003, pp. 423-430.
Tue July 17		4.	Search in parsing and lexicalized probabilistic parsing [ppt] [pdf] Beam parsing. Agenda-based (chart) parsing. A* parsing. Lexicalized probabilistic context-free grammars: The Charniak (1997) model. Dan Klein and Christopher D. Manning. 2001. Parsing and Hypergraphs. Proceedings of the 7th International Workshop on Parsing Technologies (IWPT-2001), pp. 123-134. Dan Klein and Christopher D. Manning. 2003. A* Parsing: Fast Exact Viterbi Parse Selection. HLT-NAACL 2003. Sharon A. Caraballo and Eugene Charniak. 1998. New Figures of Merit for Best-First Probabilistic Chart Parsing. Computational Linguistics 24: 275-298. Slav Petrov and Dan Klein. 2007. Improved Inference for Unlexicalized Parsing. North American Chapter of the Association for Computational Linguistics (HLT-NAACL 2007). Eugene Charniak. 1997. Statistical parsing with a context-free grammar and word statistics. Proceedings of the Fourteenth National Conference on Artificial Intelligence (AAAI 1997).
Fri July 20		5.	Treebanks and statistical parsing [ppt] [pdf] Lexicalized parsing: Collins (1997/1999). The status of information in treebanks. Michael Collins. 1997. Three Generative, Lexicalised Models for Statistical Parsing. Proceedings of the 35th Annual Meeting of the ACL, Madrid. Michael Collins. 2003. Head-Driven Statistical Models for Natural Language Parsing Daniel M. Bikel. 2004. A Distributional Analysis of a Lexicalized Statistical Parsing Model. EMNLP 2004.
Tue July 24	Assignments due	6.	Multilingual parsing and dependency parsing [ppt] [pdf] Roger Levy and Christopher D. Manning. 2003. Is it harder to parse Chinese, or the Chinese Treebank?. ACL 2003, pp. 439-446. Eisner, Jason and Giorgio Satta (1999). Efficient parsing for bilexical context-free grammars and head automaton grammars. Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, pages 457-464.
Fri July 27		7.	Discriminative parsing [ppt] [pdf] An introduction to discriminative parsing. Features in discriminative parsers, presented using some of Mark Johnson's slides on discriminative reranking. Michael Collins. 2004. Parameter Estimation for Statistical Parsing Models: Theory and Practice of Distribution-Free Methods. In Harry Bunt, John Carroll and Giorgio Satta, editors, New Developments in Parsing Technology, Kluwer. Ryan McDonald, Koby Crammer and Fernando Pereira. 2005. Online Large-Margin Training of Dependency Parsers. 43rd Annual Meeting of the Association for Computational Linguistics. Eugene Charniak and Mark Johnson. 2005. Coarse-to-Fine n-Best Parsing and MaxEnt Discriminative Reranking 43rd Annual Meeting of the Association for Computational Linguistics. Musical finale.