Foundations of Statistical Natural Language Processing
Christopher D. Manning and Hinrich Schütze
Chapter 1: Introduction
Links referred to in the text
The Brown corpus is available from ICAME (a tagged version of the Brown corpus is also available with the Penn Treebank)
The Lancaster-Oslo-Bergen corpus is available from ICAME
Susanne corpus
The Penn Treebank project (LDC catalog entry)
Canadian Hansards: searchable interface for 1986-93 and LDC catalog entry.
WordNet
Tom Sawyer: local copy, or available from and prepared by The University of Virginia Electronic Text Center, the Oxford Text Archive, and Project Gutenberg (perhaps try the Sailor's Project Gutenberg site mirror).
Teaching materials
Acrobat slides for figures and tables in section 1.4 Dirty Hands
Other links
Wentian Li on Zipf's Law
The Hansards (British Columbia)
Java interfaces to WordNet: JWNL (jdidion) and Dan Bikel's JWordNet