Foundations of Statistical Natural Language Processing
Christopher D. Manning and Hinrich Schütze
Chapter 15: Topics in Information Retrieval
Links referred to in the text
Other links of interest
- We've got a new Information
Retrieval book, with slides available in draft form.
- For testing out small examples of SVD and other vector and matrix
computations, your best bet is MATLAB. (If you're
a student, you can probably find copies on student computer clusters.)
Or you can try the mostly compatible, free
- Various other matrix software including other SVD packages can be
found at the NetLib website.
- Online courses at Virginia Tech:
- Santosh Vempala on
sampling for faster SVD
- Stemming: there's now an official Porter stemmer page by the original author (with versions in C, Java, Perl, and other languages). Another
stemmer C code (Frakes & Fox),
another Porter stemmer page,
stemmer C code (Linh Huynh),
stemmer page and
control of stemming (UMass),
Snowball (new stemmer
generator by Porter), and
Lancaster (Paice/Husk) stemming algorithm (with notes on others).
- Page about TextTiling by Marti Hearst, which includes a link to
source code implementing the algorithm
- Segmentation systems by Freddy Choi: C99, his own system, and an
implementation of the TextTiling system in Java.
- Omseek, formerly
Omsee, formerly Open Muscat, an open source search engine
- Lucene, another open source
text search engine, written in Java (by Doug Cutting)
- The Dragon Toolkit. An
information retrieval and text mining (text classification, clustering,
summarization and topic modeling) toolkit.
- mg, a text search engine by
the authors of Managing Gigabytes (C source)
- Namazu, an open
source C/Perl text search engine.
resources from Glasgow.
- Word document
frequencies computed from the web (UC Berkeley digital
Christopher Manning and Hinrich Schütze -- last modified