Foundations of Statistical Natural Language Processing
Christopher D. Manning and Hinrich Schütze
Chapter 15: Topics in Information Retrieval
Links referred to in the text
Teaching materials
Other links of interest
- We've got a new Information Retrieval book, with slides available in draft form.
- For testing out small examples of SVD, and other vector and matrix
computations, your best bet is MATLAB. (If you're a student, you can
probably find copies on student computer clusters.) Or you can try the
free, mostly compatible Octave.
- Various other matrix software including other SVD packages can be
found at the NetLib website.
- Online courses at Virginia Tech:
- Santosh Vempala on sampling for faster SVD
- Stemming: there's now an official Porter stemmer page by the original
author (with versions in C, Java, Perl, and other languages). Also:
Porter stemmer C code (Frakes & Fox), another Porter stemmer page,
Lovins stemmer C code (Linh Huynh), an unofficial Lovins stemmer page
and the Lovins SourceForge site, corpus-based control of stemming
(UMass), Snowball (a new stemmer generator by Porter), and the
Lancaster (Paice/Husk) stemming algorithm (with notes on others).
- A page about TextTiling by Marti Hearst, which includes a link to
source code implementing the algorithm
- Text segmentation systems by Freddy Choi: C99, his own system, and an
implementation of the TextTiling system in Java.
- Omseek, formerly Omsee, formerly Open Muscat, an open source search engine
- Lucene, another open source text search engine, written in Java (by Doug Cutting)
- The Dragon Toolkit, an information retrieval and text mining toolkit
(text classification, clustering, summarization, and topic modeling).
- mg, a text search engine by
the authors of Managing Gigabytes (C source)
- Namazu, an open source C/Perl text search engine.
- IR resources from Glasgow.
- Word document frequencies computed from the web (UC Berkeley digital
libraries project)
Christopher Manning and Hinrich Schütze -- last modified