First Semester, 1999
Department of Linguistics, University of Sydney
Christopher Manning
Lecturer: Chris Manning
Transient Building, 243B
Phone: 9351-7516
Email: cmanning@mail.usyd.edu.au
Office Hours: TBA. See the sign on the door.
Lecture times: Tue 2-4 (13 weeks)
Lecture location: Transient 202. However, in later weeks,
we will sometimes spend the second hour of class in
a computer lab.
Credit points: 4
This course will be a general introduction to the foundations of, and selected topics in, computational linguistics, covering both standard rule-based approaches to understanding and producing human languages, and more recent approaches to practical language engineering.
Topics include: (i) Introduction and history; (ii) Parsing and generation with phrase structure grammars, tabular and chart parsing, feature grammars, grammatical topics such as gaps, movement, and semantics; (iii) Corpora and text processing including markup, regular expression searching, collocations, concordances, clustering, and corpus-based parsing and disambiguation; (iv) An introduction to problems and approaches in speech recognition and production, information retrieval and extraction, machine translation, and understanding and generating conversational natural language.
Computer Applications in Linguistics is a course that teaches some of the kinds of software packages often used in linguistics, and in humanities computing more generally, for tasks such as finding key words in online texts, building dictionaries of languages (for example, when doing fieldwork), and analyzing spectrograms. It is suitable for second-year students, and is a "tools" course that gives students basic familiarity with computer tools for doing linguistic work.
Computational Linguistics is a third/fourth-year course. It is devoted to understanding the kinds of algorithms people use to get computers to understand and produce spoken and written material in human languages, such as parsing algorithms, dialogue systems, and so on.
I will expect all students to have a certain willingness and enthusiasm to spend time learning how to use various computer programs. Students are expected to be familiar with the basic concepts of phonology, morphology, and syntax, and to have basic familiarity with computers. Assignments and project work will be chosen both to suit and to extend the different areas of competence that students bring to the course; that is, it is expected that some people will know more about linguistics and others more about computer science. You do not have to know how to program to do this course, but if you do, you will be expected to do some programming, and if you don't, you will be expected to learn some basic scripting, as well as writing grammars, lexicons, and so on.
Assessment will be based on assignments and project work. You will be expected to do five (hopefully not too onerous) assignments. The lowest mark will be dropped, and the remaining four will contribute equally to the final mark. Assignments not handed in on time will be penalized unless an extension is negotiated (for medical or religious reasons, etc.). The assignments will count for 60% (15% each for the best four) and the project for 40%. The project will be chosen jointly by student and lecturer and will allow extended work on one topic. It should involve something more than just library research, but it might range from writing a program to doing a corpus-based study or an evaluation of existing systems.
I will try to make the assignments not too long and gruesome. However, it's important to realize that you can easily get stumped by a computer problem (whether with hardware, general software, or the specific software used in this class), and if you're thrashing around trying to solve such a problem without really knowing what's happening, you can easily waste hours. It is therefore important to start each assignment early, so that if any problems develop you can ask for help early. As well as asking me, it's fine to ask other students for help if you are having difficulty understanding how to install or use the software, or with things like that.
Also, computer systems commonly crash or freeze, floppies get lost,
and so on. You should save what you are working on regularly, and
always keep a backup copy, preferably in a different place. When
working on programs and grammars, a common situation is that while
trying to move from something that 90% works to something that
completely works, you instead move to something that is completely
broken. To avoid much wasted time and premature hair loss, it's
really important that you save lots of half-working solutions so that
you can go back and look at them for ideas, or completely return to
them later. Give them all different names (e.g., "Ass3 - Almost
works" or "Ass3 - 17/9 21:18", or whatever suits you).
In addition, I will make available various readings, handouts, and software guides.
For your reference, two other well-known texts on Computational Linguistics/Natural Language Processing are the following. They are more oriented around code and programs than even Allen.
Allen, Ch. 1 (and Ch. 2 if you don't know much linguistics!)
The course begins with simple formal models of language, and shows how they can be used for modelling simple linguistic phenomena, and how tools based on these formalisms can give a linguist practical help in searching and using text corpora. It then progresses to the richer world of context-free grammars and feature (attribute-value) grammars, covering methods for representing and parsing with them, including how to handle phenomena of natural language ranging from long-distance movement to semantic interpretation. Finally, there will be a number of lectures giving overviews of other topics.
http://www.sultry.arts.usyd.edu.au/cmanning/courses/compling/