Bibliography

Aberer, Karl.
2001.
P-Grid: A self-organizing access structure for P2P information systems.
In Proc. International Conference on Cooperative Information Systems, pp. 179-194. Springer.

Aizerman, Mark A., Emmanuel M. Braverman, and Lev I. Rozonoér.
1964.
Theoretical foundations of the potential function method in pattern recognition learning.
Automation and Remote Control 25: 821-837.

Akaike, Hirotugu.
1974.
A new look at the statistical model identification.
IEEE Transactions on Automatic Control 19 (6): 716-723.

Allan, James.
2005.
HARD track overview in TREC 2005: High accuracy retrieval from documents.
In Proc. TREC.

Allan, James, Ron Papka, and Victor Lavrenko.
1998.
On-line new event detection and tracking.
In Proc. SIGIR, pp. 37-45. ACM Press.
DOI: doi.acm.org/10.1145/290941.290954.

Allwein, Erin L., Robert E. Schapire, and Yoram Singer.
2000.
Reducing multiclass to binary: A unifying approach for margin classifiers.
JMLR 1: 113-141.
URL: www.jmlr.org/papers/volume1/allwein00a/allwein00a.pdf.

Alonso, Omar, Sandeepan Banerjee, and Mark Drake.
2006.
GIO: A semantic web application using the information grid framework.
In Proc. WWW, pp. 857-858. ACM Press.
DOI: doi.acm.org/10.1145/1135777.1135913.

Altingövde, Ismail Sengör, Engin Demir, Fazli Can, and Özgür Ulusoy.
2008.
Incremental cluster-based retrieval using compressed cluster-skipping inverted files.
TOIS.
To appear.

Amer-Yahia, Sihem, Chavdar Botev, Jochen Dörre, and Jayavel Shanmugasundaram.
2006.
XQuery full-text extensions explained.
IBM Systems Journal 45 (2): 335-352.

Amer-Yahia, Sihem, Pat Case, Thomas Rölleke, Jayavel Shanmugasundaram, and Gerhard Weikum.
2005.
Report on the DB/IR panel at SIGMOD 2005.
SIGMOD Record 34 (4): 71-74.
DOI: doi.acm.org/10.1145/1107499.1107514.

Amer-Yahia, Sihem, and Mounia Lalmas.
2006.
XML search: Languages, INEX and scoring.
SIGMOD Record 35 (4): 16-23.
DOI: doi.acm.org/10.1145/1228268.1228271.

Anagnostopoulos, Aris, Andrei Z. Broder, and Kunal Punera.
2006.
Effective and efficient classification on a search-engine model.
In Proc. CIKM, pp. 208-217. ACM Press.
DOI: doi.acm.org/10.1145/1183614.1183648.

Anderberg, Michael R.
1973.
Cluster analysis for applications.
Academic Press.

Andoni, Alexandr, Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab Mirrokni.
2006.
Locality-sensitive hashing using stable distributions.
In Nearest Neighbor Methods in Learning and Vision: Theory and Practice. MIT Press.

Anh, Vo Ngoc, Owen de Kretser, and Alistair Moffat.
2001.
Vector-space ranking with effective early termination.
In Proc. SIGIR, pp. 35-42. ACM Press.

Anh, Vo Ngoc, and Alistair Moffat.
2005.
Inverted index compression using word-aligned binary codes.
IR 8 (1): 151-166.
DOI: dx.doi.org/10.1023/B:INRT.0000048490.99518.5c.

Anh, Vo Ngoc, and Alistair Moffat.
2006a.
Improved word-aligned binary compression for text indexing.
IEEE Transactions on Knowledge and Data Engineering 18 (6): 857-861.

Anh, Vo Ngoc, and Alistair Moffat.
2006b.
Pruned query evaluation using pre-computed impacts.
In Proc. SIGIR, pp. 372-379. ACM Press.
DOI: doi.acm.org/10.1145/1148170.1148235.

Anh, Vo Ngoc, and Alistair Moffat.
2006c.
Structured index organizations for high-throughput text querying.
In Proc. SPIRE, pp. 304-315. Springer.

Apté, Chidanand, Fred Damerau, and Sholom M. Weiss.
1994.
Automated learning of decision rules for text categorization.
TOIS 12 (1): 233-251.

Arthur, David, and Sergei Vassilvitskii.
2006.
How slow is the k-means method?
In Proc. ACM Symposium on Computational Geometry, pp. 144-153.

Arvola, Paavo, Marko Junkkari, and Jaana Kekäläinen.
2005.
Generalized contextualization method for XML information retrieval.
In Proc. CIKM, pp. 20-27.

Aslam, Javed A., and Emine Yilmaz.
2005.
A geometric interpretation and analysis of R-precision.
In Proc. CIKM, pp. 664-671. ACM Press.

Ault, Thomas Galen, and Yiming Yang.
2002.
Information filtering in TREC-9 and TDT-3: A comparative analysis.
IR 5 (2-3): 159-187.

Badue, Claudine Santos, Ricardo A. Baeza-Yates, Berthier Ribeiro-Neto, and Nivio Ziviani.
2001.
Distributed query processing using partitioned inverted files.
In Proc. SPIRE, pp. 10-20.

Baeza-Yates, Ricardo, Paolo Boldi, and Carlos Castillo.
2005.
The choice of a damping function for propagating importance in link-based ranking.
Technical report, Dipartimento di Scienze dell'Informazione, Università degli Studi di Milano.

Baeza-Yates, Ricardo, and Berthier Ribeiro-Neto.
1999.
Modern Information Retrieval.
Addison Wesley.

Bahle, Dirk, Hugh E. Williams, and Justin Zobel.
2002.
Efficient phrase querying with an auxiliary index.
In Proc. SIGIR, pp. 215-221. ACM Press.

Baldridge, Jason, and Miles Osborne.
2004.
Active learning and the total cost of annotation.
In Proc. Empirical Methods in Natural Language Processing, pp. 9-16.

Ball, G. H.
1965.
Data analysis in the social sciences: What about the details?
In Proc. Fall Joint Computer Conference, pp. 533-560. Spartan Books.

Banko, Michele, and Eric Brill.
2001.
Scaling to very very large corpora for natural language disambiguation.
In Proc. ACL.

Bar-Ilan, Judit, and Tatyana Gutman.
2005.
How do search engines respond to some non-English queries?
Journal of Information Science 31 (1): 13-28.

Bar-Yossef, Ziv, and Maxim Gurevich.
2006.
Random sampling from a search engine's index.
In Proc. WWW, pp. 367-376. ACM Press.
DOI: doi.acm.org/10.1145/1135777.1135833.

Barroso, Luiz André, Jeffrey Dean, and Urs Hölzle.
2003.
Web search for a planet: The Google cluster architecture.
IEEE Micro 23 (2): 22-28.
DOI: dx.doi.org/10.1109/MM.2003.1196112.

Bartell, Brian Theodore.
1994.
Optimizing ranking functions: A connectionist approach to adaptive information retrieval.
PhD thesis, University of California at San Diego, La Jolla, CA.

Bartell, Brian T., Garrison W. Cottrell, and Richard K. Belew.
1998.
Optimizing similarity using multi-query relevance feedback.
JASIS 49 (8): 742-761.

Barzilay, Regina, and Michael Elhadad.
1997.
Using lexical chains for text summarization.
In Workshop on Intelligent Scalable Text Summarization, pp. 10-17.

Bast, Holger, and Debapriyo Majumdar.
2005.
Why spectral retrieval works.
In Proc. SIGIR, pp. 11-18. ACM Press.
DOI: doi.acm.org/10.1145/1076034.1076040.

Basu, Sugato, Arindam Banerjee, and Raymond J. Mooney.
2004.
Active semi-supervision for pairwise constrained clustering.
In Proc. SIAM International Conference on Data Mining, pp. 333-344.

Beesley, Kenneth R.
1998.
Language identifier: A computer program for automatic natural-language identification of on-line text.
In Languages at Crossroads: Proc. Annual Conference of the American Translators Association, pp. 47-54.

Beesley, Kenneth R., and Lauri Karttunen.
2003.
Finite State Morphology.
CSLI Publications.

Bennett, Paul N.
2000.
Assessing the calibration of naive Bayes' posterior estimates.
Technical Report CMU-CS-00-155, School of Computer Science, Carnegie Mellon University.

Berger, Adam, and John Lafferty.
1999.
Information retrieval as statistical translation.
In Proc. SIGIR, pp. 222-229. ACM Press.

Berkhin, Pavel.
2005.
A survey on pagerank computing.
Internet Mathematics 2 (1): 73-120.

Berkhin, Pavel.
2006a.
Bookmark-coloring algorithm for personalized pagerank computing.
Internet Mathematics 3 (1): 41-62.

Berkhin, Pavel.
2006b.
A survey of clustering data mining techniques.
In Jacob Kogan, Charles Nicholas, and Marc Teboulle (eds.), Grouping Multidimensional Data: Recent Advances in Clustering, pp. 25-71. Springer.

Berners-Lee, Tim, Robert Cailliau, Jean-Francois Groff, and Bernd Pollermann.
1992.
World-Wide Web: The information universe.
Electronic Networking: Research, Applications and Policy 1 (2): 74-82.
URL: citeseer.ist.psu.edu/article/berners-lee92worldwide.html.

Berry, Michael, and Paul Young.
1995.
Using latent semantic indexing for multilanguage information retrieval.
Computers and the Humanities 29 (6): 413-429.

Berry, Michael W., Susan T. Dumais, and Gavin W. O'Brien.
1995.
Using linear algebra for intelligent information retrieval.
SIAM Review 37 (4): 573-595.

Betsi, Stamatina, Mounia Lalmas, Anastasios Tombros, and Theodora Tsikrika.
2006.
User expectations from XML element retrieval.
In Proc. SIGIR, pp. 611-612. ACM Press.

Bharat, Krishna, and Andrei Broder.
1998.
A technique for measuring the relative size and overlap of public web search engines.
Computer Networks and ISDN Systems 30 (1-7): 379-388.
DOI: dx.doi.org/10.1016/S0169-7552(98)00127-5.

Bharat, Krishna, Andrei Broder, Monika Henzinger, Puneet Kumar, and Suresh Venkatasubramanian.
1998.
The connectivity server: Fast access to linkage information on the web.
In Proc. WWW, pp. 469-477.

Bharat, Krishna, Andrei Z. Broder, Jeffrey Dean, and Monika Rauch Henzinger.
2000.
A comparison of techniques to find mirrored hosts on the WWW.
JASIS 51 (12): 1114-1122.
URL: citeseer.ist.psu.edu/bharat99comparison.html.

Bharat, Krishna, and Monika R. Henzinger.
1998.
Improved algorithms for topic distillation in a hyperlinked environment.
In Proc. SIGIR, pp. 104-111. ACM Press.
URL: citeseer.ist.psu.edu/bharat98improved.html.

Bishop, Christopher M.
2006.
Pattern Recognition and Machine Learning.
Springer.

Blair, David C., and M. E. Maron.
1985.
An evaluation of retrieval effectiveness for a full-text document-retrieval system.
CACM 28 (3): 289-299.

Blanco, Roi, and Alvaro Barreiro.
2006.
TSP and cluster-based solutions to the reassignment of document identifiers.
IR 9 (4): 499-517.

Blanco, Roi, and Alvaro Barreiro.
2007.
Boosting static pruning of inverted files.
In Proc. SIGIR. ACM Press.

Blandford, Dan, and Guy Blelloch.
2002.
Index compression through document reordering.
In Proc. Data Compression Conference, p. 342. IEEE Computer Society.

Blei, David M., Andrew Y. Ng, and Michael I. Jordan.
2003.
Latent Dirichlet allocation.
JMLR 3: 993-1022.

Boldi, Paolo, Bruno Codenotti, Massimo Santini, and Sebastiano Vigna.
2002.
Ubicrawler: A scalable fully distributed web crawler.
In Proc. Australian World Wide Web Conference.
URL: citeseer.ist.psu.edu/article/boldi03ubicrawler.html.

Boldi, Paolo, Massimo Santini, and Sebastiano Vigna.
2005.
PageRank as a function of the damping factor.
In Proc. WWW.
URL: citeseer.ist.psu.edu/boldi05pagerank.html.

Boldi, Paolo, and Sebastiano Vigna.
2004a.
Codes for the World-Wide Web.
Internet Mathematics 2 (4): 405-427.

Boldi, Paolo, and Sebastiano Vigna.
2004b.
The WebGraph framework I: Compression techniques.
In Proc. WWW, pp. 595-601. ACM Press.

Boldi, Paolo, and Sebastiano Vigna.
2005.
Compressed perfect embedded skip lists for quick inverted-index lookups.
In Proc. SPIRE. Springer.

Boley, Daniel.
1998.
Principal direction divisive partitioning.
Data Mining and Knowledge Discovery 2 (4): 325-344.
DOI: dx.doi.org/10.1023/A:1009740529316.

Borodin, Allan, Gareth O. Roberts, Jeffrey S. Rosenthal, and Panayiotis Tsaparas.
2001.
Finding authorities and hubs from link structures on the World Wide Web.
In Proc. WWW, pp. 415-429.

Bourne, Charles P., and Donald F. Ford.
1961.
A study of methods for systematically abbreviating English words and names.
JACM 8 (4): 538-552.
DOI: doi.acm.org/10.1145/321088.321094.

Bradley, Paul S., and Usama M. Fayyad.
1998.
Refining initial points for K-means clustering.
In Proc. ICML, pp. 91-99.

Bradley, Paul S., Usama M. Fayyad, and Cory Reina.
1998.
Scaling clustering algorithms to large databases.
In Proc. KDD, pp. 9-15.

Brill, Eric, and Robert C. Moore.
2000.
An improved error model for noisy channel spelling correction.
In Proc. ACL, pp. 286-293.

Brin, Sergey, and Lawrence Page.
1998.
The anatomy of a large-scale hypertextual web search engine.
In Proc. WWW, pp. 107-117.

Brisaboa, Nieves R., Antonio Fariña, Gonzalo Navarro, and José R. Paramá.
2007.
Lightweight natural language text compression.
IR 10 (1): 1-33.

Broder, Andrei.
2002.
A taxonomy of web search.
SIGIR Forum 36 (2): 3-10.
DOI: doi.acm.org/10.1145/792550.792552.

Broder, Andrei, S. Ravi Kumar, Farzin Maghoul, Prabhakar Raghavan, Sridhar Rajagopalan, Raymie Stata, Andrew Tomkins, and Janet Wiener.
2000.
Graph structure in the web.
Computer Networks 33 (1): 309-320.

Broder, Andrei Z., Steven C. Glassman, Mark S. Manasse, and Geoffrey Zweig.
1997.
Syntactic clustering of the web.
In Proc. WWW, pp. 391-404.

Brown, Eric W.
1995.
Execution Performance Issues in Full-Text Information Retrieval.
PhD thesis, University of Massachusetts, Amherst.

Buckley, Chris, James Allan, and Gerard Salton.
1994a.
Automatic routing and ad-hoc retrieval using SMART: TREC 2.
In Proc. TREC, pp. 45-55.

Buckley, Chris, and Gerard Salton.
1995.
Optimization of relevance feedback weights.
In Proc. SIGIR, pp. 351-357. ACM Press.
DOI: doi.acm.org/10.1145/215206.215383.

Buckley, Chris, Gerard Salton, and James Allan.
1994b.
The effect of adding relevance information in a relevance feedback environment.
In Proc. SIGIR, pp. 292-300. ACM Press.

Buckley, Chris, Amit Singhal, and Mandar Mitra.
1995.
New retrieval approaches using SMART: TREC 4.
In Proc. TREC.

Buckley, Chris, and Ellen M. Voorhees.
2000.
Evaluating evaluation measure stability.
In Proc. SIGIR, pp. 33-40.

Burges, Chris, Tal Shaked, Erin Renshaw, Ari Lazier, Matt Deeds, Nicole Hamilton, and Greg Hullender.
2005.
Learning to rank using gradient descent.
In Proc. ICML.

Burges, Christopher J. C.
1998.
A tutorial on support vector machines for pattern recognition.
Data Mining and Knowledge Discovery 2 (2): 121-167.

Burner, Mike.
1997.
Crawling towards eternity: Building an archive of the World Wide Web.
Web Techniques Magazine 2 (5).

Burnham, Kenneth P., and David Anderson.
2002.
Model Selection and Multi-Model Inference.
Springer.

Bush, Vannevar.
1945.
As we may think.
The Atlantic Monthly.
URL: www.theatlantic.com/doc/194507/bush.

Büttcher, Stefan, and Charles L. A. Clarke.
2005a.
Indexing time vs. query time: Trade-offs in dynamic information retrieval systems.
In Proc. CIKM, pp. 317-318. ACM Press.
DOI: doi.acm.org/10.1145/1099554.1099645.

Büttcher, Stefan, and Charles L. A. Clarke.
2005b.
A security model for full-text file system search in multi-user environments.
In Proc. FAST.
URL: www.usenix.org/events/fast05/tech/buettcher.html.

Büttcher, Stefan, and Charles L. A. Clarke.
2006.
A document-centric approach to static index pruning in text retrieval systems.
In Proc. CIKM, pp. 182-189.
DOI: doi.acm.org/10.1145/1183614.1183644.

Büttcher, Stefan, Charles L. A. Clarke, and Brad Lushman.
2006.
Hybrid index maintenance for growing text collections.
In Proc. SIGIR, pp. 356-363. ACM Press.
DOI: doi.acm.org/10.1145/1148170.1148233.

Cacheda, Fidel, Victor Carneiro, Carmen Guerrero, and Ángel Viña.
2003.
Optimization of restricted searches in web directories using hybrid data structures.
In Proc. ECIR, pp. 436-451.

Callan, Jamie.
2000.
Distributed information retrieval.
In W. Bruce Croft (ed.), Advances in information retrieval, pp. 127-150. Kluwer.

Can, Fazli, Ismail Sengör Altingövde, and Engin Demir.
2004.
Efficiency and effectiveness of query processing in cluster-based retrieval.
Information Systems 29 (8): 697-717.
DOI: dx.doi.org/10.1016/S0306-4379(03)00062-0.

Can, Fazli, and Esen A. Ozkarahan.
1990.
Concepts and effectiveness of the cover-coefficient-based clustering methodology for text databases.
ACM Trans. Database Syst. 15 (4): 483-517.

Cao, Guihong, Jian-Yun Nie, and Jing Bai.
2005.
Integrating word relationships into language models.
In Proc. SIGIR, pp. 298-305. ACM Press.

Cao, Yunbo, Jun Xu, Tie-Yan Liu, Hang Li, Yalou Huang, and Hsiao-Wuen Hon.
2006.
Adapting Ranking SVM to document retrieval.
In Proc. SIGIR. ACM Press.

Carbonell, Jaime, and Jade Goldstein.
1998.
The use of MMR, diversity-based reranking for reordering documents and producing summaries.
In Proc. SIGIR, pp. 335-336. ACM Press.
DOI: doi.acm.org/10.1145/290941.291025.

Carletta, Jean.
1996.
Assessing agreement on classification tasks: The kappa statistic.
Computational Linguistics 22: 249-254.

Carmel, David, Doron Cohen, Ronald Fagin, Eitan Farchi, Michael Herscovici, Yoelle S. Maarek, and Aya Soffer.
2001.
Static index pruning for information retrieval systems.
In Proc. SIGIR, pp. 43-50. ACM Press.
DOI: doi.acm.org/10.1145/383952.383958.

Carmel, David, Yoelle S. Maarek, Matan Mandelbrod, Yosi Mass, and Aya Soffer.
2003.
Searching XML documents via XML fragments.
In Proc. SIGIR, pp. 151-158. ACM Press.
DOI: doi.acm.org/10.1145/860435.860464.

Caruana, Rich, and Alexandru Niculescu-Mizil.
2006.
An empirical comparison of supervised learning algorithms.
In Proc. ICML.

Castro, R. M., M. J. Coates, and R. D. Nowak.
2004.
Likelihood based hierarchical clustering.
IEEE Transactions on Signal Processing 52 (8): 2308-2321.

Cavnar, William B., and John M. Trenkle.
1994.
N-gram-based text categorization.
In Proc. SDAIR, pp. 161-175.

Chakrabarti, Soumen.
2002.
Mining the Web: Analysis of Hypertext and Semi Structured Data.
Morgan Kaufmann.

Chakrabarti, Soumen, Byron Dom, David Gibson, Jon Kleinberg, Prabhakar Raghavan, and Sridhar Rajagopalan.
1998.
Automatic resource list compilation by analyzing hyperlink structure and associated text.
In Proc. WWW.
URL: citeseer.ist.psu.edu/chakrabarti98automatic.html.

Chapelle, Olivier, Bernhard Schölkopf, and Alexander Zien (eds.).
2006.
Semi-Supervised Learning.
MIT Press.

Chaudhuri, Surajit, Gautam Das, Vagelis Hristidis, and Gerhard Weikum.
2006.
Probabilistic information retrieval approach for ranking of database query results.
ACM Transactions on Database Systems 31 (3): 1134-1168.
DOI: doi.acm.org/10.1145/1166074.1166085.

Cheeseman, Peter, and John Stutz.
1996.
Bayesian classification (AutoClass): Theory and results.
In Advances in Knowledge Discovery and Data Mining, pp. 153-180. MIT Press.

Chen, Hsin-Hsi, and Chuan-Jie Lin.
2000.
A multilingual news summarizer.
In Proc. COLING, pp. 159-165.

Chen, Pai-Hsuen, Chih-Jen Lin, and Bernhard Schölkopf.
2005.
A tutorial on $\nu$-support vector machines.
Applied Stochastic Models in Business and Industry 21: 111-136.

Chiaramella, Yves, Philippe Mulhem, and Franck Fourel.
1996.
A model for multimedia information retrieval.
Technical Report 4-96, University of Glasgow.

Chierichetti, Flavio, Alessandro Panconesi, Prabhakar Raghavan, Mauro Sozio, Alessandro Tiberi, and Eli Upfal.
2007.
Finding near neighbors through cluster pruning.
In Proc. PODS.

Cho, Junghoo, and Hector Garcia-Molina.
2002.
Parallel crawlers.
In Proc. WWW, pp. 124-135. ACM Press.
DOI: doi.acm.org/10.1145/511446.511464.

Cho, Junghoo, Hector Garcia-Molina, and Lawrence Page.
1998.
Efficient crawling through URL ordering.
In Proc. WWW, pp. 161-172.

Chu-Carroll, Jennifer, John Prager, Krzysztof Czuba, David Ferrucci, and Pablo Duboue.
2006.
Semantic search via XML fragments: A high-precision approach to IR.
In Proc. SIGIR, pp. 445-452. ACM Press.
DOI: doi.acm.org/10.1145/1148170.1148247.

Clarke, Charles L. A., Gordon V. Cormack, and Elizabeth A. Tudhope.
2000.
Relevance ranking for one to three term queries.
IP&M 36: 291-311.

Cleverdon, Cyril W.
1991.
The significance of the Cranfield tests on index languages.
In Proc. SIGIR, pp. 3-12. ACM Press.

Coden, Anni R., Eric W. Brown, and Savitha Srinivasan (eds.).
2002.
Information Retrieval Techniques for Speech Applications.
Springer.

Cohen, Paul R.
1995.
Empirical methods for artificial intelligence.
MIT Press.

Cohen, William W.
1998.
Integration of heterogeneous databases without common domains using queries based on textual similarity.
In Proc. SIGMOD, pp. 201-212. ACM Press.

Cohen, William W., Robert E. Schapire, and Yoram Singer.
1998.
Learning to order things.
In Proc. NIPS. The MIT Press.
URL: citeseer.ist.psu.edu/article/cohen98learning.html.

Cohen, William W., and Yoram Singer.
1999.
Context-sensitive learning methods for text categorization.
TOIS 17 (2): 141-173.

Comtet, Louis.
1974.
Advanced Combinatorics.
Reidel.

Cooper, William S., Aitao Chen, and Fredric C. Gey.
1994.
Full text retrieval based on probabilistic equations with coefficients fitted by logistic regression.
In Proc. TREC, pp. 57-66.

Cormen, Thomas H., Charles Eric Leiserson, and Ronald L. Rivest.
1990.
Introduction to Algorithms.
MIT Press.

Cover, Thomas M., and Peter E. Hart.
1967.
Nearest neighbor pattern classification.
IEEE Transactions on Information Theory 13 (1): 21-27.

Cover, Thomas M., and Joy A. Thomas.
1991.
Elements of Information Theory.
Wiley.

Crammer, Koby, and Yoram Singer.
2001.
On the algorithmic implementation of multiclass kernel-based machines.
JMLR 2: 265-292.

Creecy, Robert H., Brij M. Masand, Stephen J. Smith, and David L. Waltz.
1992.
Trading MIPS and memory for knowledge engineering.
CACM 35 (8): 48-64.
DOI: doi.acm.org/10.1145/135226.135228.

Crestani, Fabio, Mounia Lalmas, Cornelis J. Van Rijsbergen, and Iain Campbell.
1998.
Is this document relevant? ... probably: A survey of probabilistic models in information retrieval.
ACM Computing Surveys 30 (4): 528-552.
DOI: doi.acm.org/10.1145/299917.299920.

Cristianini, Nello, and John Shawe-Taylor.
2000.
Introduction to Support Vector Machines and Other Kernel-based Learning Methods.
Cambridge University Press.

Croft, W. Bruce.
1978.
A file organization for cluster-based retrieval.
In Proc. SIGIR, pp. 65-82. ACM Press.

Croft, W. Bruce, and David J. Harper.
1979.
Using probabilistic models of document retrieval without relevance information.
Journal of Documentation 35 (4): 285-295.

Croft, W. Bruce, and John Lafferty (eds.).
2003.
Language Modeling for Information Retrieval.
Springer.

Crouch, Carolyn J.
1988.
A cluster-based approach to thesaurus construction.
In Proc. SIGIR, pp. 309-320. ACM Press.
DOI: doi.acm.org/10.1145/62437.62467.

Cucerzan, Silviu, and Eric Brill.
2004.
Spelling correction as an iterative process that exploits the collective knowledge of web users.
In Proc. Empirical Methods in Natural Language Processing.

Cutting, Douglas R., David R. Karger, and Jan O. Pedersen.
1993.
Constant interaction-time Scatter/Gather browsing of very large document collections.
In Proc. SIGIR, pp. 126-134. ACM Press.

Cutting, Douglas R., Jan O. Pedersen, David Karger, and John W. Tukey.
1992.
Scatter/Gather: A cluster-based approach to browsing large document collections.
In Proc. SIGIR, pp. 318-329. ACM Press.

Damerau, Fred J.
1964.
A technique for computer detection and correction of spelling errors.
CACM 7 (3): 171-176.
DOI: doi.acm.org/10.1145/363958.363994.

Davidson, Ian, and Ashwin Satyanarayana.
2003.
Speeding up k-means clustering by bootstrap averaging.
In ICDM 2003 Workshop on Clustering Large Data Sets.

Day, William H., and Herbert Edelsbrunner.
1984.
Efficient algorithms for agglomerative hierarchical clustering methods.
Journal of Classification 1: 1-24.

de Moura, Edleno Silva, Gonzalo Navarro, Nivio Ziviani, and Ricardo Baeza-Yates.
2000.
Fast and flexible word searching on compressed text.
TOIS 18 (2): 113-139.
DOI: doi.acm.org/10.1145/348751.348754.

Dean, Jeffrey, and Sanjay Ghemawat.
2004.
MapReduce: Simplified data processing on large clusters.
In Proc. Symposium on Operating System Design and Implementation.

Deerwester, Scott, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, and Richard Harshman.
1990.
Indexing by latent semantic analysis.
JASIS 41 (6): 391-407.

del Bimbo, Alberto.
1999.
Visual Information Retrieval.
Morgan Kaufmann.

Dempster, A.P., N.M. Laird, and D.B. Rubin.
1977.
Maximum likelihood from incomplete data via the EM algorithm.
Journal of the Royal Statistical Society Series B 39: 1-38.

Dhillon, Inderjit S.
2001.
Co-clustering documents and words using bipartite spectral graph partitioning.
In Proc. KDD, pp. 269-274.

Dhillon, Inderjit S., and Dharmendra S. Modha.
2001.
Concept decompositions for large sparse text data using clustering.
Machine Learning 42 (1/2): 143-175.
DOI: dx.doi.org/10.1023/A:1007612920971.

Di Eugenio, Barbara, and Michael Glass.
2004.
The kappa statistic: A second look.
Computational Linguistics 30 (1): 95-101.
DOI: dx.doi.org/10.1162/089120104773633402.

Dietterich, Thomas G.
2002.
Ensemble learning.
In Michael A. Arbib (ed.), The Handbook of Brain Theory and Neural Networks, 2nd edition. MIT Press.

Dietterich, Thomas G., and Ghulum Bakiri.
1995.
Solving multiclass learning problems via error-correcting output codes.
Journal of Artificial Intelligence Research 2: 263-286.

Dom, Byron E.
2002.
An information-theoretic external cluster-validity measure.
In Proc. UAI.

Domingos, Pedro.
2000.
A unified bias-variance decomposition for zero-one and squared loss.
In Proc. National Conference on Artificial Intelligence and Proc. Conference Innovative Applications of Artificial Intelligence, pp. 564-569. AAAI Press / The MIT Press.

Domingos, Pedro, and Michael J. Pazzani.
1997.
On the optimality of the simple Bayesian classifier under zero-one loss.
Machine Learning 29 (2-3): 103-130.
URL: citeseer.ist.psu.edu/domingos97optimality.html.

Downie, J. Stephen.
2006.
The Music Information Retrieval Evaluation eXchange (MIREX).
D-Lib Magazine 12 (12).

Duda, Richard O., Peter E. Hart, and David G. Stork.
2000.
Pattern Classification, 2nd edition.
Wiley-Interscience.

Dumais, Susan, John Platt, David Heckerman, and Mehran Sahami.
1998.
Inductive learning algorithms and representations for text categorization.
In Proc. CIKM, pp. 148-155. ACM Press.
DOI: doi.acm.org/10.1145/288627.288651.

Dumais, Susan T.
1993.
Latent semantic indexing (LSI) and TREC-2.
In Proc. TREC, pp. 105-115.

Dumais, Susan T.
1995.
Latent semantic indexing (LSI): TREC-3 report.
In Proc. TREC, pp. 219-230.

Dumais, Susan T., and Hao Chen.
2000.
Hierarchical classification of Web content.
In Proc. SIGIR, pp. 256-263. ACM Press.

Dunning, Ted.
1993.
Accurate methods for the statistics of surprise and coincidence.
Computational Linguistics 19 (1): 61-74.

Dunning, Ted.
1994.
Statistical identification of language.
Technical Report 94-273, Computing Research Laboratory, New Mexico State University.

Eckart, Carl, and Gale Young.
1936.
The approximation of a matrix by another of lower rank.
Psychometrika 1: 211-218.

El-Hamdouchi, Abdelmoula, and Peter Willett.
1986.
Hierarchic document classification using Ward's clustering method.
In Proc. SIGIR, pp. 149-156. ACM Press.
DOI: doi.acm.org/10.1145/253168.253200.

Elias, Peter.
1975.
Universal code word sets and representations of the integers.
IEEE Transactions on Information Theory 21 (2): 194-203.

Eyheramendy, Susana, David Lewis, and David Madigan.
2003.
On the Naive Bayes model for text categorization.
In International Workshop on Artificial Intelligence and Statistics. Society for Artificial Intelligence and Statistics.

Fallows, Deborah.
2004.
The internet and daily life.
Pew/Internet and American Life Project.
URL: www.pewinternet.org/pdfs/PIP_Internet_and_Daily_Life.pdf.

Fayyad, Usama M., Cory Reina, and Paul S. Bradley.
1998.
Initialization of iterative refinement clustering algorithms.
In Proc. KDD, pp. 194-198.

Fellbaum, Christiane D.
1998.
WordNet - An Electronic Lexical Database.
MIT Press.

Ferragina, Paolo, and Rossano Venturini.
2007.
Compressed permuterm indexes.
In Proc. SIGIR. ACM Press.

Forman, George.
2004.
A pitfall and solution in multi-class feature selection for text classification.
In Proc. ICML.

Forman, George.
2006.
Tackling concept drift by temporal inductive transfer.
In Proc. SIGIR, pp. 252-259. ACM Press.
DOI: doi.acm.org/10.1145/1148170.1148216.

Forman, George, and Ira Cohen.
2004.
Learning from little: Comparison of classifiers given little training.
In Proc. PKDD, pp. 161-172.

Fowlkes, Edward B., and Colin L. Mallows.
1983.
A method for comparing two hierarchical clusterings.
Journal of the American Statistical Association 78 (383): 553-569.
URL: www.jstor.org/view/01621459/di985957/98p0926l/0.

Fox, Edward A., and Whay C. Lee.
1991.
FAST-INV: A fast algorithm for building large inverted files.
Technical report, Virginia Polytechnic Institute & State University, Blacksburg, VA, USA.

Fraenkel, Aviezri S., and Shmuel T. Klein.
1985.
Novel compression of sparse bit-strings - preliminary report.
In Combinatorial Algorithms on Words, NATO ASI Series Vol F12, pp. 169-183. Springer.

Frakes, William B., and Ricardo Baeza-Yates (eds.).
1992.
Information Retrieval: Data Structures and Algorithms.
Prentice Hall.

Fraley, Chris, and Adrian E. Raftery.
1998.
How many clusters? Which clustering method? Answers via model-based cluster analysis.
Computer Journal 41 (8): 578-588.

Friedl, Jeffrey E. F.
2006.
Mastering Regular Expressions, 3rd edition.
O'Reilly.

Friedman, Jerome H.
1997.
On bias, variance, 0/1-loss, and the curse-of-dimensionality.
Data Mining and Knowledge Discovery 1 (1): 55-77.

Friedman, Nir, and Moises Goldszmidt.
1996.
Building classifiers using Bayesian networks.
In Proc. National Conference on Artificial Intelligence, pp. 1277-1284.

Fuhr, Norbert.
1989.
Optimum polynomial retrieval functions based on the probability ranking principle.
TOIS 7 (3): 183-204.

Fuhr, Norbert.
1992.
Probabilistic models in information retrieval.
Computer Journal 35 (3): 243-255.

Fuhr, Norbert, Norbert Gövert, Gabriella Kazai, and Mounia Lalmas (eds.).
2003a.
INitiative for the Evaluation of XML Retrieval (INEX). Proc. First INEX Workshop. ERCIM.

Fuhr, Norbert, and Kai Großjohann.
2004.
XIRQL: An XML query language based on information retrieval concepts.
TOIS 22 (2): 313-356.
DOI: doi.acm.org/10.1145/984321.984326.

Fuhr, Norbert, and Mounia Lalmas.
2007.
Advances in XML retrieval: The INEX initiative.
In International Workshop on Research Issues in Digital Libraries.

Fuhr, Norbert, Mounia Lalmas, Saadia Malik, and Gabriella Kazai (eds.).
2006.
Advances in XML Information Retrieval and Evaluation, 4th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2005. Springer.

Fuhr, Norbert, Mounia Lalmas, Saadia Malik, and Zoltán Szlávik (eds.).
2005.
Advances in XML Information Retrieval, Third International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2004.
Springer.

Fuhr, Norbert, Mounia Lalmas, and Andrew Trotman (eds.).
2007.
Comparative Evaluation of XML Information Retrieval Systems, 5th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2006.
Springer.

Fuhr, Norbert, Saadia Malik, and Mounia Lalmas (eds.).
2003b.
INEX 2003 Workshop.
URL: inex.is.informatik.uni-duisburg.de:2003/proceedings.pdf.

Fuhr, Norbert, and Ulrich Pfeifer.
1994.
Probabilistic information retrieval as a combination of abstraction, inductive learning, and probabilistic assumptions.
TOIS 12 (1): 92-115.
DOI: doi.acm.org/10.1145/174608.174612.

Fuhr, Norbert, and Thomas Rölleke.
1997.
A probabilistic relational algebra for the integration of information retrieval and database systems.
TOIS 15 (1): 32-66.
DOI: doi.acm.org/10.1145/239041.239045.

Gaertner, Thomas, John W. Lloyd, and Peter A. Flach.
2002.
Kernels for structured data.
In Proc. International Conference on Inductive Logic Programming, pp. 66-83.

Gao, Jianfeng, Mu Li, Chang-Ning Huang, and Andi Wu.
2005.
Chinese word segmentation and named entity recognition: A pragmatic approach.
Computational Linguistics 31 (4): 531-574.

Gao, Jianfeng, Jian-Yun Nie, Guangyuan Wu, and Guihong Cao.
2004.
Dependence language model for information retrieval.
In Proc. SIGIR, pp. 170-177. ACM Press.

Garcia, Steven, Hugh E. Williams, and Adam Cannane.
2004.
Access-ordered indexes.
In Proc. Australasian Conference on Computer Science, pp. 7-14.

Garcia-Molina, Hector, Jennifer Widom, and Jeffrey D. Ullman.
1999.
Database System Implementation.
Prentice Hall.

Garfield, Eugene.
1955.
Citation indexes to science: A new dimension in documentation through association of ideas.
Science 122: 108-111.

Garfield, Eugene.
1976.
The permuterm subject index: An autobiographic review.
JASIS 27 (5-6): 288-291.

Geman, Stuart, Elie Bienenstock, and René Doursat.
1992.
Neural networks and the bias/variance dilemma.
Neural Computation 4 (1): 1-58.

Geng, Xiubo, Tie-Yan Liu, Tao Qin, and Hang Li.
2007.
Feature selection for ranking.
In Proc. SIGIR, pp. 407-414. ACM Press.

Gerrand, Peter.
2007.
Estimating linguistic diversity on the internet: A taxonomy to avoid pitfalls and paradoxes.
Journal of Computer-Mediated Communication 12 (4).
URL: jcmc.indiana.edu/vol12/issue4/gerrand.html.
article 8.

Gey, Fredric C.
1994.
Inferring probability of relevance using the method of logistic regression.
In Proc. SIGIR, pp. 222-231. ACM Press.

Ghamrawi, Nadia, and Andrew McCallum.
2005.
Collective multi-label classification.
In Proc. CIKM, pp. 195-200. ACM Press.
DOI: doi.acm.org/10.1145/1099554.1099591.

Glover, Eric, David M. Pennock, Steve Lawrence, and Robert Krovetz.
2002a.
Inferring hierarchical descriptions.
In Proc. CIKM, pp. 507-514. ACM Press.
DOI: doi.acm.org/10.1145/584792.584876.

Glover, Eric J., Kostas Tsioutsiouliklis, Steve Lawrence, David M. Pennock, and Gary W. Flake.
2002b.
Using web structure for classifying and describing web pages.
In Proc. WWW, pp. 562-569. ACM Press.
DOI: doi.acm.org/10.1145/511446.511520.

Gövert, Norbert, and Gabriella Kazai.
2003.
Overview of the INitiative for the Evaluation of XML retrieval (INEX) 2002.
In Fuhr et al. (2003b), pp. 1-17.
URL: inex.is.informatik.uni-duisburg.de:2003/proceedings.pdf.

Grabs, Torsten, and Hans-Jörg Schek.
2002.
Generating vector spaces on-the-fly for flexible XML retrieval.
In XML and Information Retrieval Workshop at SIGIR 2002.

Greiff, Warren R.
1998.
A theory of term weighting based on exploratory data analysis.
In Proc. SIGIR, pp. 11-19. ACM Press.

Grinstead, Charles M., and J. Laurie Snell.
1997.
Introduction to Probability, 2nd edition.
American Mathematical Society.
URL: www.dartmouth.edu/~chance/teaching_aids/books_articles/probability_book/amsbook.mac.pdf.

Grossman, David A., and Ophir Frieder.
2004.
Information Retrieval: Algorithms and Heuristics, 2nd edition.
Springer.

Gusfield, Dan.
1997.
Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology.
Cambridge University Press.

Hamerly, Greg, and Charles Elkan.
2003.
Learning the k in k-means.
In Proc. NIPS.
URL: books.nips.cc/papers/files/nips16/NIPS2003_AA36.pdf.

Han, Eui-Hong, and George Karypis.
2000.
Centroid-based document classification: Analysis and experimental results.
In Proc. PKDD, pp. 424-431.

Hand, David J.
2006.
Classifier technology and the illusion of progress.
Statistical Science 21: 1-14.

Hand, David J., and Keming Yu.
2001.
Idiot's Bayes: Not so stupid after all.
International Statistical Review 69 (3): 385-398.

Harman, Donna.
1991.
How effective is suffixing?
JASIS 42: 7-15.

Harman, Donna.
1992.
Relevance feedback revisited.
In Proc. SIGIR, pp. 1-10. ACM Press.

Harman, Donna, Ricardo Baeza-Yates, Edward Fox, and W. Lee.
1992.
Inverted files.
In Frakes and Baeza-Yates (1992), pp. 28-43.

Harman, Donna, and Gerald Candela.
1990.
Retrieving records from a gigabyte of text on a minicomputer using statistical ranking.
JASIS 41 (8): 581-589.

Harold, Elliotte Rusty, and Scott W. Means.
2004.
XML in a Nutshell, 3rd edition.
O'Reilly.

Harter, Stephen P.
1998.
Variations in relevance assessments and the measurement of retrieval effectiveness.
JASIS 47: 37-49.

Hartigan, J. A., and M. A. Wong.
1979.
A K-means clustering algorithm.
Applied Statistics 28: 100-108.

Hastie, Trevor, Robert Tibshirani, and Jerome H. Friedman.
2001.
The Elements of Statistical Learning: Data Mining, Inference, and Prediction.
Springer.

Hatzivassiloglou, Vasileios, Luis Gravano, and Ankineedu Maganti.
2000.
An investigation of linguistic features and clustering algorithms for topical document clustering.
In Proc. SIGIR, pp. 224-231. ACM Press.
DOI: doi.acm.org/10.1145/345508.345582.

Haveliwala, Taher.
2003.
Topic-sensitive PageRank: A context-sensitive ranking algorithm for web search.
IEEE Transactions on Knowledge and Data Engineering 15 (4): 784-796.
URL: citeseer.ist.psu.edu/article/haveliwala03topicsensitive.html.

Haveliwala, Taher H.
2002.
Topic-sensitive PageRank.
In Proc. WWW.
URL: citeseer.ist.psu.edu/haveliwala02topicsensitive.html.

Hayes, Philip J., and Steven P. Weinstein.
1990.
CONSTRUE/TIS: A system for content-based indexing of a database of news stories.
In Proc. Conference on Innovative Applications of Artificial Intelligence, pp. 49-66.

Heaps, Harold S.
1978.
Information Retrieval: Computational and Theoretical Aspects.
Academic Press.

Hearst, Marti A.
1997.
TextTiling: Segmenting text into multi-paragraph subtopic passages.
Computational Linguistics 23 (1): 33-64.

Hearst, Marti A.
2006.
Clustering versus faceted categories for information exploration.
CACM 49 (4): 59-61.
DOI: doi.acm.org/10.1145/1121949.1121983.

Hearst, Marti A., and Jan O. Pedersen.
1996.
Reexamining the cluster hypothesis.
In Proc. SIGIR, pp. 76-84. ACM Press.

Hearst, Marti A., and Christian Plaunt.
1993.
Subtopic structuring for full-length document access.
In Proc. SIGIR, pp. 59-68. ACM Press.
DOI: doi.acm.org/10.1145/160688.160695.

Heinz, Steffen, and Justin Zobel.
2003.
Efficient single-pass index construction for text databases.
JASIST 54 (8): 713-729.
DOI: dx.doi.org/10.1002/asi.10268.

Heinz, Steffen, Justin Zobel, and Hugh E. Williams.
2002.
Burst tries: A fast, efficient data structure for string keys.
TOIS 20 (2): 192-223.
DOI: doi.acm.org/10.1145/506309.506312.

Henzinger, Monika R., Allan Heydon, Michael Mitzenmacher, and Marc Najork.
2000.
On near-uniform URL sampling.
In Proc. WWW, pp. 295-308. North-Holland.
DOI: dx.doi.org/10.1016/S1389-1286(00)00055-4.

Herbrich, Ralf, Thore Graepel, and Klaus Obermayer.
2000.
Large margin rank boundaries for ordinal regression.
In Advances in Large Margin Classifiers, pp. 115-132. MIT Press.

Hersh, William, Chris Buckley, T. J. Leone, and David Hickam.
1994.
OHSUMED: An interactive retrieval evaluation and new large test collection for research.
In Proc. SIGIR, pp. 192-201. ACM Press.

Hersh, William R., Andrew Turpin, Susan Price, Benjamin Chan, Dale Kraemer, Lynetta Sacherek, and Daniel Olson.
2000a.
Do batch and user evaluation give the same results?
In Proc. SIGIR, pp. 17-24.

Hersh, William R., Andrew Turpin, Susan Price, Dale Kraemer, Daniel Olson, Benjamin Chan, and Lynetta Sacherek.
2001.
Challenging conventional assumptions of automated information retrieval with real users: Boolean searching and batch retrieval evaluations.
IP&M 37 (3): 383-402.

Hersh, William R., Andrew Turpin, Lynetta Sacherek, Daniel Olson, Susan Price, Benjamin Chan, and Dale Kraemer.
2000b.
Further analysis of whether batch and user evaluations give the same results with a question-answering task.
In Proc. TREC.

Hiemstra, Djoerd.
1998.
A linguistically motivated probabilistic model of information retrieval.
In Proc. ECDL, volume 1513 of LNCS, pp. 569-584.

Hiemstra, Djoerd.
2000.
A probabilistic justification for using tf.idf term weighting in information retrieval.
International Journal on Digital Libraries 3 (2): 131-139.

Hiemstra, Djoerd, and Wessel Kraaij.
2005.
A language-modeling approach to TREC.
In Voorhees and Harman (2005), pp. 373-395.

Hirai, Jun, Sriram Raghavan, Hector Garcia-Molina, and Andreas Paepcke.
2000.
WebBase: A repository of web pages.
In Proc. WWW, pp. 277-293.

Hofmann, Thomas.
1999a.
Probabilistic Latent Semantic Indexing.
In Proc. UAI.
URL: citeseer.ist.psu.edu/hofmann99probabilistic.html.

Hofmann, Thomas.
1999b.
Probabilistic Latent Semantic Indexing.
In Proc. SIGIR, pp. 50-57. ACM Press.
URL: citeseer.ist.psu.edu/article/hofmann99probabilistic.html.

Hollink, Vera, Jaap Kamps, Christof Monz, and Maarten de Rijke.
2004.
Monolingual document retrieval for European languages.
IR 7 (1): 33-52.

Hopcroft, John E., Rajeev Motwani, and Jeffrey D. Ullman.
2000.
Introduction to Automata Theory, Languages, and Computation, 2nd edition.
Addison Wesley.

Huang, Yifen, and Tom M. Mitchell.
2006.
Text clustering with extended user feedback.
In Proc. SIGIR, pp. 413-420. ACM Press.
DOI: doi.acm.org/10.1145/1148170.1148242.

Hubert, Lawrence, and Phipps Arabie.
1985.
Comparing partitions.
Journal of Classification 2: 193-218.

Hughes, Baden, Timothy Baldwin, Steven Bird, Jeremy Nicholson, and Andrew MacKinlay.
2006.
Reconsidering language identification for written language resources.
In Proc. International Conference on Language Resources and Evaluation, pp. 485-488.

Hull, David.
1993.
Using statistical testing in the evaluation of retrieval performance.
In Proc. SIGIR, pp. 329-338. ACM Press.

Hull, David.
1996.
Stemming algorithms - A case study for detailed evaluation.
JASIS 47 (1): 70-84.

Ide, E.
1971.
New experiments in relevance feedback.
In Salton (1971b), pp. 337-354.

Indyk, Piotr.
2004.
Nearest neighbors in high-dimensional spaces.
In J. E. Goodman and J. O'Rourke (eds.), Handbook of Discrete and Computational Geometry, 2nd edition, pp. 877-892. Chapman and Hall/CRC Press.

Ingwersen, Peter, and Kalervo Järvelin.
2005.
The Turn: Integration of Information Seeking and Retrieval in Context.
Springer.

Ittner, David J., David D. Lewis, and David D. Ahn.
1995.
Text categorization of low quality images.
In Proc. SDAIR, pp. 301-315.

Iwayama, Makoto, and Takenobu Tokunaga.
1995.
Cluster-based text categorization: A comparison of category search strategies.
In Proc. SIGIR, pp. 273-280. ACM Press.

Jackson, Peter, and Isabelle Moulinier.
2002.
Natural Language Processing for Online Applications: Text Retrieval, Extraction and Categorization.
John Benjamins.

Jacobs, Paul S., and Lisa F. Rau.
1990.
SCISOR: Extracting information from on-line news.
CACM 33: 88-97.

Jain, Anil, M. Narasimha Murty, and Patrick Flynn.
1999.
Data clustering: A review.
ACM Computing Surveys 31 (3): 264-323.

Jain, Anil K., and Richard C. Dubes.
1988.
Algorithms for Clustering Data.
Prentice Hall.

Jardine, N., and Cornelis Joost van Rijsbergen.
1971.
The use of hierarchic clustering in information retrieval.
Information Storage and Retrieval 7: 217-240.

Järvelin, Kalervo, and Jaana Kekäläinen.
2002.
Cumulated gain-based evaluation of IR techniques.
TOIS 20 (4): 422-446.

Jeh, Glen, and Jennifer Widom.
2003.
Scaling personalized web search.
In Proc. WWW, pp. 271-279. ACM Press.

Jensen, Finn V., and Finn B. Jensen.
2001.
Bayesian Networks and Decision Graphs.
Springer.

Jeong, Byeong-Soo, and Edward Omiecinski.
1995.
Inverted file partitioning schemes in multiple disk systems.
IEEE Transactions on Parallel and Distributed Systems 6 (2): 142-153.

Ji, Xiang, and Wei Xu.
2006.
Document clustering with prior knowledge.
In Proc. SIGIR, pp. 405-412. ACM Press.
DOI: doi.acm.org/10.1145/1148170.1148241.

Jing, Hongyan.
2000.
Sentence reduction for automatic text summarization.
In Proc. Conference on Applied Natural Language Processing, pp. 310-315.

Joachims, Thorsten.
1997.
A probabilistic analysis of the Rocchio algorithm with tfidf for text categorization.
In Proc. ICML, pp. 143-151. Morgan Kaufmann.

Joachims, Thorsten.
1998.
Text categorization with support vector machines: Learning with many relevant features.
In Proc. ECML, pp. 137-142. Springer.

Joachims, Thorsten.
1999.
Making large-scale SVM learning practical.
In B. Schölkopf, C. Burges, and A. Smola (eds.), Advances in Kernel Methods - Support Vector Learning. MIT Press.

Joachims, Thorsten.
2002a.
Learning to Classify Text Using Support Vector Machines.
Kluwer.

Joachims, Thorsten.
2002b.
Optimizing search engines using clickthrough data.
In Proc. KDD, pp. 133-142.

Joachims, Thorsten.
2006a.
Training linear SVMs in linear time.
In Proc. KDD, pp. 217-226. ACM Press.
DOI: doi.acm.org/10.1145/1150402.1150429.

Joachims, Thorsten.
2006b.
Transductive support vector machines.
In Chapelle et al. (2006), pp. 105-118.

Joachims, Thorsten, Laura Granka, Bing Pan, Helene Hembrooke, and Geri Gay.
2005.
Accurately interpreting clickthrough data as implicit feedback.
In Proc. SIGIR, pp. 154-161. ACM Press.

Johnson, David, Vishv Malhotra, and Peter Vamplew.
2006.
More effective web search using bigrams and trigrams.
Webology 3 (4).
URL: www.webology.ir/2006/v3n4/a35.html.
Article 35.

Jurafsky, Dan, and James H. Martin.
2008.
Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition, 2nd edition.
Prentice Hall.

Käki, Mika.
2005.
Findex: Search result categories help users when document ranking fails.
In Proc. SIGCHI, pp. 131-140. ACM Press.
DOI: doi.acm.org/10.1145/1054972.1054991.

Kammenhuber, Nils, Julia Luxenburger, Anja Feldmann, and Gerhard Weikum.
2006.
Web search clickstreams.
In Proc. ACM SIGCOMM on Internet Measurement, pp. 245-250. ACM Press.

Kamps, Jaap, Maarten de Rijke, and Börkur Sigurbjörnsson.
2004.
Length normalization in XML retrieval.
In Proc. SIGIR, pp. 80-87. ACM Press.
DOI: doi.acm.org/10.1145/1008992.1009009.

Kamps, Jaap, Maarten Marx, Maarten de Rijke, and Börkur Sigurbjörnsson.
2006.
Articulating information needs in XML query languages.
TOIS 24 (4): 407-436.
DOI: doi.acm.org/10.1145/1185877.1185879.

Kamvar, Sepandar D., Dan Klein, and Christopher D. Manning.
2002.
Interpreting and extending classical agglomerative clustering algorithms using a model-based approach.
In Proc. ICML, pp. 283-290. Morgan Kaufmann.

Kannan, Ravi, Santosh Vempala, and Adrian Vetta.
2000.
On clusterings - Good, bad and spectral.
In Proc. Symposium on Foundations of Computer Science, pp. 367-377. IEEE Computer Society.

Kaszkiel, Marcin, and Justin Zobel.
1997.
Passage retrieval revisited.
In Proc. SIGIR, pp. 178-185. ACM Press.
DOI: doi.acm.org/10.1145/258525.258561.

Kaufman, Leonard, and Peter J. Rousseeuw.
1990.
Finding groups in data.
Wiley.

Kazai, Gabriella, and Mounia Lalmas.
2006.
eXtended cumulated gain measures for the evaluation of content-oriented XML retrieval.
TOIS 24 (4): 503-542.
DOI: doi.acm.org/10.1145/1185883.

Kekäläinen, Jaana.
2005.
Binary and graded relevance in IR evaluations - Comparison of the effects on ranking of IR systems.
IP&M 41: 1019-1033.

Kekäläinen, Jaana, and Kalervo Järvelin.
2002.
Using graded relevance assessments in IR evaluation.
JASIST 53 (13): 1120-1129.

Kemeny, John G., and J. Laurie Snell.
1976.
Finite Markov Chains.
Springer.

Kent, Allen, Madeline M. Berry, Fred U. Luehrs, Jr., and J. W. Perry.
1955.
Machine literature searching VIII. Operational criteria for designing information retrieval systems.
American Documentation 6 (2): 93-101.

Kernighan, Mark D., Kenneth W. Church, and William A. Gale.
1990.
A spelling correction program based on a noisy channel model.
In Proc. ACL, pp. 205-210.

King, Benjamin.
1967.
Step-wise clustering procedures.
Journal of the American Statistical Association 69: 86-101.

Kishida, Kazuaki, Kuang-Hua Chen, Sukhoon Lee, Kazuko Kuriyama, Noriko Kando, Hsin-Hsi Chen, and Sung Hyon Myaeng.
2005.
Overview of CLIR task at the fifth NTCIR workshop.
In NTCIR Workshop Meeting on Evaluation of Information Access Technologies: Information Retrieval, Question Answering and Cross-Lingual Information Access. National Institute of Informatics.

Klein, Dan, and Christopher D. Manning.
2002.
Conditional structure versus conditional estimation in NLP models.
In Proc. Empirical Methods in Natural Language Processing, pp. 9-16.

Kleinberg, Jon M.
1997.
Two algorithms for nearest-neighbor search in high dimensions.
In Proc. ACM Symposium on Theory of Computing, pp. 599-608. ACM Press.
DOI: doi.acm.org/10.1145/258533.258653.

Kleinberg, Jon M.
1999.
Authoritative sources in a hyperlinked environment.
JACM 46 (5): 604-632.
URL: citeseer.ist.psu.edu/article/kleinberg98authoritative.html.

Kleinberg, Jon M.
2002.
An impossibility theorem for clustering.
In Proc. NIPS.

Knuth, Donald E.
1997.
The Art of Computer Programming, Volume 3: Sorting and Searching, 3rd edition.
Addison Wesley.

Ko, Youngjoong, Jinwoo Park, and Jungyun Seo.
2004.
Improving text categorization using the importance of sentences.
IP&M 40 (1): 65-79.

Koenemann, Jürgen, and Nicholas J. Belkin.
1996.
A case for interaction: A study of interactive information retrieval behavior and effectiveness.
In Proc. SIGCHI, pp. 205-212. ACM Press.
DOI: doi.acm.org/10.1145/238386.238487.

Kołcz, Aleksander, Vidya Prabakarmurthi, and Jugal Kalita.
2000.
Summarization as feature selection for text categorization.
In Proc. CIKM, pp. 365-370. ACM Press.

Kołcz, Aleksander, and Wen-Tau Yih.
2007.
Raising the baseline for high-precision text classifiers.
In Proc. KDD.

Koller, Daphne, and Mehran Sahami.
1997.
Hierarchically classifying documents using very few words.
In Proc. ICML, pp. 170-178.

Konheim, Alan G.
1981.
Cryptography: A Primer.
John Wiley & Sons.

Korfhage, Robert R.
1997.
Information Storage and Retrieval.
Wiley.

Kozlov, M. K., S. P. Tarasov, and L. G. Khachiyan.
1979.
Polynomial solvability of convex quadratic programming.
Soviet Mathematics Doklady 20: 1108-1111.
Translated from original in Doklady Akademiia Nauk SSR, 228 (1979).

Kraaij, Wessel, and Martijn Spitters.
2003.
Language models for topic tracking.
In W. B. Croft and J. Lafferty (eds.), Language Modeling for Information Retrieval, pp. 95-124. Kluwer.

Kraaij, Wessel, Thijs Westerveld, and Djoerd Hiemstra.
2002.
The importance of prior probabilities for entry page search.
In Proc. SIGIR, pp. 27-34. ACM Press.

Krippendorff, Klaus.
2003.
Content Analysis: An Introduction to its Methodology.
Sage.

Krovetz, Bob.
1995.
Word sense disambiguation for large text databases.
PhD thesis, University of Massachusetts Amherst.

Kukich, Karen.
1992.
Techniques for automatically correcting words in text.
ACM Computing Surveys 24 (4): 377-439.
DOI: doi.acm.org/10.1145/146370.146380.

Kumar, Ravi, Prabhakar Raghavan, Sridhar Rajagopalan, and Andrew Tomkins.
1999.
Trawling the Web for emerging cyber-communities.
Computer Networks 31 (11-16): 1481-1493.
URL: citeseer.ist.psu.edu/kumar99trawling.html.

Kumar, S. Ravi, Prabhakar Raghavan, Sridhar Rajagopalan, Dandapani Sivakumar, Andrew Tomkins, and Eli Upfal.
2000.
The Web as a graph.
In Proc. PODS, pp. 1-10. ACM Press.
URL: citeseer.ist.psu.edu/article/kumar00web.html.

Kupiec, Julian, Jan Pedersen, and Francine Chen.
1995.
A trainable document summarizer.
In Proc. SIGIR, pp. 68-73. ACM Press.

Kurland, Oren, and Lillian Lee.
2004.
Corpus structure, language models, and ad hoc information retrieval.
In Proc. SIGIR, pp. 194-201. ACM Press.
DOI: doi.acm.org/10.1145/1008992.1009027.

Lafferty, John, and Chengxiang Zhai.
2001.
Document language models, query models, and risk minimization for information retrieval.
In Proc. SIGIR, pp. 111-119. ACM Press.

Lafferty, John, and Chengxiang Zhai.
2003.
Probabilistic relevance models based on document and query generation.
In W. Bruce Croft and John Lafferty (eds.), Language Modeling for Information Retrieval. Kluwer.

Lalmas, Mounia, Gabriella Kazai, Jaap Kamps, Jovan Pehcevski, Benjamin Piwowarski, and Stephen E. Robertson.
2007.
INEX 2006 evaluation measures.
In Fuhr et al. (2007), pp. 20-34.

Lalmas, Mounia, and Anastasios Tombros.
2007.
Evaluating XML retrieval effectiveness at INEX.
SIGIR Forum 41 (1): 40-57.
DOI: doi.acm.org/10.1145/1273221.1273225.

Lance, G. N., and W. T. Williams.
1967.
A general theory of classificatory sorting strategies 1. Hierarchical systems.
Computer Journal 9 (4): 373-380.

Langville, Amy, and Carl Meyer.
2006.
Google's PageRank and Beyond: The Science of Search Engine Rankings.
Princeton University Press.

Larsen, Bjornar, and Chinatsu Aone.
1999.
Fast and effective text mining using linear-time document clustering.
In Proc. KDD, pp. 16-22. ACM Press.
DOI: doi.acm.org/10.1145/312129.312186.

Larson, Ray R.
2005.
A fusion approach to XML structured document retrieval.
IR 8 (4): 601-629.
DOI: dx.doi.org/10.1007/s10791-005-0749-0.

Lavrenko, Victor, and W. Bruce Croft.
2001.
Relevance-based language models.
In Proc. SIGIR, pp. 120-127. ACM Press.

Lawrence, Steve, and C. Lee Giles.
1998.
Searching the World Wide Web.
Science 280 (5360): 98-100.
URL: citeseer.ist.psu.edu/lawrence98searching.html.

Lawrence, Steve, and C. Lee Giles.
1999.
Accessibility of information on the web.
Nature 400: 107-109.

Lee, Whay C., and Edward A. Fox.
1988.
Experimental comparison of schemes for interpreting Boolean queries.
Technical Report TR-88-27, Computer Science, Virginia Polytechnic Institute and State University.

Lempel, Ronny, and Shlomo Moran.
2000.
The stochastic approach for link-structure analysis (SALSA) and the TKC effect.
Computer Networks 33 (1-6): 387-401.
URL: citeseer.ist.psu.edu/lempel00stochastic.html.

Lesk, Michael.
1988.
Grab - Inverted indexes with low storage overhead.
Computing Systems 1: 207-220.

Lesk, Michael.
2004.
Understanding Digital Libraries, 2nd edition.
Morgan Kaufmann.

Lester, Nicholas, Alistair Moffat, and Justin Zobel.
2005.
Fast on-line index construction by geometric partitioning.
In Proc. CIKM, pp. 776-783. ACM Press.
DOI: doi.acm.org/10.1145/1099554.1099739.

Lester, Nicholas, Justin Zobel, and Hugh E. Williams.
2006.
Efficient online index maintenance for contiguous inverted lists.
IP&M 42 (4): 916-933.
DOI: dx.doi.org/10.1016/j.ipm.2005.09.005.

Levenshtein, Vladimir I.
1965.
Binary codes capable of correcting spurious insertions and deletions of ones.
Problems of Information Transmission 1: 8-17.

Lew, Michael S.
2001.
Principles of Visual Information Retrieval.
Springer.

Lewis, David D.
1995.
Evaluating and optimizing autonomous text classification systems.
In Proc. SIGIR. ACM Press.

Lewis, David D.
1998.
Naive (Bayes) at forty: The independence assumption in information retrieval.
In Proc. ECML, pp. 4-15. Springer.

Lewis, David D., and Karen Spärck Jones.
1996.
Natural language processing for information retrieval.
CACM 39 (1): 92-101.
DOI: doi.acm.org/10.1145/234173.234210.

Lewis, David D., and Marc Ringuette.
1994.
A comparison of two learning algorithms for text categorization.
In Proc. SDAIR, pp. 81-93.

Lewis, David D., Robert E. Schapire, James P. Callan, and Ron Papka.
1996.
Training algorithms for linear text classifiers.
In Proc. SIGIR, pp. 298-306. ACM Press.
DOI: doi.acm.org/10.1145/243199.243277.

Lewis, David D., Yiming Yang, Tony G. Rose, and Fan Li.
2004.
RCV1: A new benchmark collection for text categorization research.
JMLR 5: 361-397.

Li, Fan, and Yiming Yang.
2003.
A loss function analysis for classification methods in text categorization.
In Proc. ICML, pp. 472-479.

Liddy, Elizabeth D.
2005.
Automatic document retrieval.
In Encyclopedia of Language and Linguistics, 2nd edition. Elsevier.

List, Johan, Vojkan Mihajlovic, Georgina Ramírez, Arjen P. Vries, Djoerd Hiemstra, and Henk Ernst Blok.
2005.
TIJAH: Embracing IR methods in XML databases.
IR 8 (4): 547-570.
DOI: dx.doi.org/10.1007/s10791-005-0747-2.

Lita, Lucian Vlad, Abe Ittycheriah, Salim Roukos, and Nanda Kambhatla.
2003.
tRuEcasIng.
In Proc. ACL, pp. 152-159.

Littman, Michael L., Susan T. Dumais, and Thomas K. Landauer.
1998.
Automatic cross-language information retrieval using latent semantic indexing.
In Gregory Grefenstette (ed.), Proc. Cross-Language Information Retrieval. Kluwer.
URL: citeseer.ist.psu.edu/littman98automatic.html.

Liu, Tie-Yan, Yiming Yang, Hao Wan, Hua-Jun Zeng, Zheng Chen, and Wei-Ying Ma.
2005.
Support vector machines classification with very large scale taxonomy.
ACM SIGKDD Explorations 7 (1): 36-43.

Liu, Xiaoyong, and W. Bruce Croft.
2004.
Cluster-based retrieval using language models.
In Proc. SIGIR, pp. 186-193. ACM Press.
DOI: doi.acm.org/10.1145/1008992.1009026.

Lloyd, Stuart P.
1982.
Least squares quantization in PCM.
IEEE Transactions on Information Theory 28 (2): 129-136.

Lodhi, Huma, Craig Saunders, John Shawe-Taylor, Nello Cristianini, and Chris Watkins.
2002.
Text classification using string kernels.
JMLR 2: 419-444.

Lombard, Matthew, Cheryl C. Bracken, and Jennifer Snyder-Duch.
2002.
Content analysis in mass communication: Assessment and reporting of intercoder reliability.
Human Communication Research 28: 587-604.

Long, Xiaohui, and Torsten Suel.
2003.
Optimized query execution in large search engines with global page ordering.
In Proc. VLDB.
URL: citeseer.ist.psu.edu/long03optimized.html.

Lovins, Julie Beth.
1968.
Development of a stemming algorithm.
Translation and Computational Linguistics 11 (1): 22-31.

Lu, Wei, Stephen E. Robertson, and Andrew MacFarlane.
2007.
CISR at INEX 2006.
In Fuhr et al. (2007), pp. 57-63.

Luhn, Hans Peter.
1957.
A statistical approach to mechanized encoding and searching of literary information.
IBM Journal of Research and Development 1 (4): 309-317.

Luhn, Hans Peter.
1958.
The automatic creation of literature abstracts.
IBM Journal of Research and Development 2 (2): 159-165, 317.

Luk, Robert W. P., and Kui-Lam Kwok.
2002.
A comparison of Chinese document indexing strategies and retrieval models.
ACM Transactions on Asian Language Information Processing 1 (3): 225-268.

Lunde, Ken.
1998.
CJKV Information Processing.
O'Reilly.

MacFarlane, A., J.A. McCann, and S.E. Robertson.
2000.
Parallel search using partitioned inverted files.
In Proc. SPIRE, pp. 209-220.

MacQueen, James B.
1967.
Some methods for classification and analysis of multivariate observations.
In Proc. Berkeley Symposium on Mathematics, Statistics and Probability, pp. 281-297. University of California Press.

Manning, Christopher D., and Hinrich Schütze.
1999.
Foundations of Statistical Natural Language Processing.
MIT Press.

Maron, M. E., and J. L. Kuhns.
1960.
On relevance, probabilistic indexing, and information retrieval.
JACM 7 (3): 216-244.

Mass, Yosi, Matan Mandelbrod, Einat Amitay, David Carmel, Yoëlle S. Maarek, and Aya Soffer.
2003.
JuruXML - An XML retrieval system at INEX'02.
In Fuhr et al. (2003b), pp. 73-80.
URL: inex.is.informatik.uni-duisburg.de:2003/proceedings.pdf.

McBryan, Oliver A.
1994.
GENVL and WWWW: Tools for Taming the Web.
In Proc. WWW.
URL: citeseer.ist.psu.edu/mcbryan94genvl.html.

McCallum, Andrew, and Kamal Nigam.
1998.
A comparison of event models for Naive Bayes text classification.
In AAAI/ICML Workshop on Learning for Text Categorization, pp. 41-48.

McCallum, Andrew, Ronald Rosenfeld, Tom M. Mitchell, and Andrew Y. Ng.
1998.
Improving text classification by shrinkage in a hierarchy of classes.
In Proc. ICML, pp. 359-367. Morgan Kaufmann.

McCallum, Andrew Kachites.
1996.
Bow: A toolkit for statistical language modeling, text retrieval, classification and clustering.
www.cs.cmu.edu/~mccallum/bow.

McKeown, Kathleen, and Dragomir R. Radev.
1995.
Generating summaries of multiple news articles.
In Proc. SIGIR, pp. 74-82. ACM Press.
DOI: doi.acm.org/10.1145/215206.215334.

McKeown, Kathleen R., Regina Barzilay, David Evans, Vasileios Hatzivassiloglou, Judith L. Klavans, Ani Nenkova, Carl Sable, Barry Schiffman, and Sergey Sigelman.
2002.
Tracking and summarizing news on a daily basis with Columbia's Newsblaster.
In Proc. Human Language Technology Conference.

McLachlan, Geoffrey J., and Thiriyambakam Krishnan.
1996.
The EM Algorithm and Extensions.
John Wiley & Sons.

Meadow, Charles T., Donald H. Kraft, and Bert R. Boyce.
1999.
Text Information Retrieval Systems.
Academic Press.

Meila, Marina.
2005.
Comparing clusterings - An axiomatic view.
In Proc. ICML.

Melnik, Sergey, Sriram Raghavan, Beverly Yang, and Hector Garcia-Molina.
2001.
Building a distributed full-text index for the web.
In Proc. WWW, pp. 396-406. ACM Press.
DOI: doi.acm.org/10.1145/371920.372095.

Mihajlovic, Vojkan, Henk Ernst Blok, Djoerd Hiemstra, and Peter M. G. Apers.
2005.
Score region algebra: Building a transparent XML-R database.
In Proc. CIKM, pp. 12-19.
DOI: doi.acm.org/10.1145/1099554.1099560.

Miller, David R. H., Tim Leek, and Richard M. Schwartz.
1999.
A hidden Markov model information retrieval system.
In Proc. SIGIR, pp. 214-221. ACM Press.

Minsky, Marvin Lee, and Seymour Papert (eds.).
1988.
Perceptrons: An introduction to computational geometry.
MIT Press.
Expanded edition.

Mitchell, Tom M.
1997.
Machine Learning.
McGraw Hill.

Moffat, Alistair, and Timothy A. H. Bell.
1995.
In situ generation of compressed inverted files.
JASIS 46 (7): 537-550.

Moffat, Alistair, and Lang Stuiver.
1996.
Exploiting clustering in inverted file compression.
In Proc. Conference on Data Compression, pp. 82-91. IEEE Computer Society.

Moffat, Alistair, and Justin Zobel.
1992.
Parameterised compression for sparse bitmaps.
In Proc. SIGIR, pp. 274-285. ACM Press.
DOI: doi.acm.org/10.1145/133160.133210.

Moffat, Alistair, and Justin Zobel.
1996.
Self-indexing inverted files for fast text retrieval.
TOIS 14 (4): 349-379.

Moffat, Alistair, and Justin Zobel.
1998.
Exploring the similarity space.
SIGIR Forum 32 (1).

Mooers, Calvin.
1961.
From a point of view of mathematical etc. techniques.
In R. A. Fairthorne (ed.), Towards information retrieval, pp. xvii-xxiii. Butterworths.

Mooers, Calvin E.
1950.
Coding, information retrieval, and the rapid selector.
American Documentation 1 (4): 225-229.

Moschitti, Alessandro.
2003.
A study on optimal parameter tuning for Rocchio text classifier.
In Proc. ECIR, pp. 420-435.

Moschitti, Alessandro, and Roberto Basili.
2004.
Complex linguistic features for text classification: A comprehensive study.
In Proc. ECIR, pp. 181-196.

Murata, Masaki, Qing Ma, Kiyotaka Uchimoto, Hiromi Ozaku, Masao Utiyama, and Hitoshi Isahara.
2000.
Japanese probabilistic information retrieval using location and category information.
In International Workshop on Information Retrieval With Asian Languages, pp. 81-88.
URL: portal.acm.org/citation.cfm?doid=355214.355226.

Muresan, Gheorghe, and David J. Harper.
2004.
Topic modeling for mediated access to very large document collections.
JASIST 55 (10): 892-910.
DOI: dx.doi.org/10.1002/asi.20034.

Murtagh, Fionn.
1983.
A survey of recent advances in hierarchical clustering algorithms.
Computer Journal 26 (4): 354-359.

Najork, Marc, and Allan Heydon.
2001.
High-performance web crawling.
Technical Report 173, Compaq Systems Research Center.

Najork, Marc, and Allan Heydon.
2002.
High-performance web crawling.
In James Abello, Panos Pardalos, and Mauricio Resende (eds.), Handbook of Massive Data Sets, chapter 2. Kluwer.

Navarro, Gonzalo, and Ricardo Baeza-Yates.
1997.
Proximal nodes: A model to query document databases by content and structure.
TOIS 15 (4): 400-435.
DOI: doi.acm.org/10.1145/263479.263482.

Newsam, Shawn, Sitaram Bhagavathy, and B. S. Manjunath.
2001.
Category-based image retrieval.
In Proc. IEEE International Conference on Image Processing, Special Session on Multimedia Indexing, Browsing and Retrieval, pp. 596-599.

Ng, Andrew Y., and Michael I. Jordan.
2001.
On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes.
In Proc. NIPS, pp. 841-848.
URL: www-2.cs.cmu.edu/Groups/NIPS/NIPS2001/papers/psgz/AA28.ps.gz.

Ng, Andrew Y., Michael I. Jordan, and Yair Weiss.
2001a.
On spectral clustering: Analysis and an algorithm.
In Proc. NIPS, pp. 849-856.

Ng, Andrew Y., Alice X. Zheng, and Michael I. Jordan.
2001b.
Link analysis, eigenvectors and stability.
In Proc. IJCAI, pp. 903-910.
URL: citeseer.ist.psu.edu/ng01link.html.

Nigam, Kamal, Andrew McCallum, and Tom Mitchell.
2006.
Semi-supervised text classification using EM.
In Chapelle et al. (2006), pp. 33-56.

Ntoulas, Alexandros, and Junghoo Cho.
2007.
Pruning policies for two-tiered inverted index with correctness guarantee.
In Proc. SIGIR, pp. 191-198. ACM Press.

Oard, Douglas W., and Bonnie J. Dorr.
1996.
A survey of multilingual text retrieval.
Technical Report UMIACS-TR-96-19, Institute for Advanced Computer Studies, University of Maryland, College Park, MD, USA.

Ogilvie, Paul, and Jamie Callan.
2005.
Parameter estimation for a simple hierarchical generative model for XML retrieval.
In Proc. INEX, pp. 211-224.
DOI: dx.doi.org/10.1007/11766278_16.

O'Keefe, Richard A., and Andrew Trotman.
2004.
The simplest query language that could possibly work.
In Fuhr et al. (2005), pp. 167-174.

Osinski, Stanisław, and Dawid Weiss.
2005.
A concept-driven algorithm for clustering search results.
IEEE Intelligent Systems 20 (3): 48-54.

Page, Lawrence, Sergey Brin, Rajeev Motwani, and Terry Winograd.
1998.
The PageRank citation ranking: Bringing order to the web.
Technical report, Stanford Digital Library Technologies Project.
URL: citeseer.ist.psu.edu/page98pagerank.html.

Paice, Chris D.
1990.
Another stemmer.
SIGIR Forum 24 (3): 56-61.

Papineni, Kishore.
2001.
Why inverse document frequency?
In Proc. North American Chapter of the Association for Computational Linguistics, pp. 1-8.

Pavlov, Dmitry, Ramnath Balasubramanyan, Byron Dom, Shyam Kapur, and Jignashu Parikh.
2004.
Document preprocessing for naive Bayes classification and clustering with mixture of multinomials.
In Proc. KDD, pp. 829-834.

Pelleg, Dan, and Andrew Moore.
1999.
Accelerating exact k-means algorithms with geometric reasoning.
In Proc. KDD, pp. 277-281. ACM Press.
DOI: doi.acm.org/10.1145/312129.312248.

Pelleg, Dan, and Andrew Moore.
2000.
X-means: Extending k-means with efficient estimation of the number of clusters.
In Proc. ICML, pp. 727-734. Morgan Kaufmann.

Perkins, Simon, Kevin Lacker, and James Theiler.
2003.
Grafting: Fast, incremental feature selection by gradient descent in function space.
JMLR 3: 1333-1356.

Persin, Michael.
1994.
Document filtering for fast ranking.
In Proc. SIGIR, pp. 339-348. ACM Press.

Persin, Michael, Justin Zobel, and Ron Sacks-Davis.
1996.
Filtered document retrieval with frequency-sorted indexes.
JASIS 47 (10): 749-764.

Peterson, James L.
1980.
Computer programs for detecting and correcting spelling errors.
CACM 23 (12): 676-687.
DOI: doi.acm.org/10.1145/359038.359041.

Picca, Davide, Benoît Curdy, and François Bavaud.
2006.
Non-linear correspondence analysis in text retrieval: A kernel view.
In Proc. JADT.

Pinski, Gabriel, and Francis Narin.
1976.
Citation influence for journal aggregates of scientific publications: Theory, with application to the literature of Physics.
IP&M 12: 297-326.

Pirolli, Peter L. T.
2007.
Information Foraging Theory: Adaptive Interaction With Information.
Oxford University Press.

Platt, John.
2000.
Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods.
In A.J. Smola, P.L. Bartlett, B. Schölkopf, and D. Schuurmans (eds.), Advances in Large Margin Classifiers, pp. 61-74. MIT Press.

Ponte, Jay M., and W. Bruce Croft.
1998.
A language modeling approach to information retrieval.
In Proc. SIGIR, pp. 275-281. ACM Press.

Popescul, Alexandrin, and Lyle H. Ungar.
2000.
Automatic labeling of document clusters.
Unpublished MS, U. Pennsylvania.
URL: http://www.cis.upenn.edu/~popescul/Publications/popescul00labeling.pdf.

Porter, Martin F.
1980.
An algorithm for suffix stripping.
Program 14 (3): 130-137.

Pugh, William.
1990.
Skip lists: A probabilistic alternative to balanced trees.
CACM 33 (6): 668-676.

Qin, Tao, Tie-Yan Liu, Wei Lai, Xu-Dong Zhang, De-Sheng Wang, and Hang Li.
2007.
Ranking with multiple hyperplanes.
In Proc. SIGIR. ACM Press.

Qiu, Yonggang, and H.P. Frei.
1993.
Concept based query expansion.
In Proc. SIGIR, pp. 160-169. ACM Press.

R Development Core Team.
2005.
R: A language and environment for statistical computing.
R Foundation for Statistical Computing, Vienna.
URL: www.R-project.org.
ISBN 3-900051-07-0.

Radev, Dragomir R., Sasha Blair-Goldensohn, Zhu Zhang, and Revathi Sundara Raghavan.
2001.
Interactive, domain-independent identification and summarization of topically related news articles.
In Proc. European Conference on Research and Advanced Technology for Digital Libraries, pp. 225-238.

Rahm, Erhard, and Philip A. Bernstein.
2001.
A survey of approaches to automatic schema matching.
VLDB Journal 10 (4): 334-350.
URL: citeseer.ist.psu.edu/rahm01survey.html.

Rand, William M.
1971.
Objective criteria for the evaluation of clustering methods.
Journal of the American Statistical Association 66 (336): 846-850.

Rasmussen, Edie.
1992.
Clustering algorithms.
In Frakes and Baeza-Yates (1992), pp. 419-442.

Rennie, Jason D., Lawrence Shih, Jaime Teevan, and David R. Karger.
2003.
Tackling the poor assumptions of naive Bayes text classifiers.
In Proc. ICML, pp. 616-623.

Ribeiro-Neto, Berthier, Edleno S. Moura, Marden S. Neubert, and Nivio Ziviani.
1999.
Efficient distributed algorithms to build inverted files.
In Proc. SIGIR, pp. 105-112. ACM Press.
DOI: doi.acm.org/10.1145/312624.312663.

Ribeiro-Neto, Berthier A., and Ramurti A. Barbosa.
1998.
Query performance for tightly coupled distributed digital libraries.
In Proc. ACM Conference on Digital Libraries, pp. 182-190.

Rice, John A.
2006.
Mathematical Statistics and Data Analysis.
Duxbury Press.

Richardson, M., A. Prakash, and E. Brill.
2006.
Beyond PageRank: machine learning for static ranking.
In Proc. WWW, pp. 707-715.

Riezler, Stefan, Alexander Vasserman, Ioannis Tsochantaridis, Vibhu Mittal, and Yi Liu.
2007.
Statistical machine translation for query expansion in answer retrieval.
In Proc. ACL, pp. 464-471. Association for Computational Linguistics.
URL: www.aclweb.org/anthology/P/P07/P07-1059.

Ripley, B. D.
1996.
Pattern Recognition and Neural Networks.
Cambridge University Press.

Robertson, Stephen.
2005.
How Okapi came to TREC.
In Voorhees and Harman (2005), pp. 287-299.

Robertson, Stephen, Hugo Zaragoza, and Michael Taylor.
2004.
Simple BM25 extension to multiple weighted fields.
In Proc. CIKM, pp. 42-49.
DOI: doi.acm.org/10.1145/1031171.1031181.

Robertson, Stephen E., and Karen Spärck Jones.
1976.
Relevance weighting of search terms.
JASIS 27: 129-146.

Rocchio, J. J.
1971.
Relevance feedback in information retrieval.
In Salton (1971b), pp. 313-323.

Roget, P. M.
1946.
Roget's International Thesaurus.
Thomas Y. Crowell.

Rosen-Zvi, Michal, Thomas Griffiths, Mark Steyvers, and Padhraic Smyth.
2004.
The author-topic model for authors and documents.
In Proc. UAI, pp. 487-494.

Ross, Sheldon.
2006.
A First Course in Probability.
Pearson Prentice Hall.

Rusmevichientong, Paat, David M. Pennock, Steve Lawrence, and C. Lee Giles.
2001.
Methods for sampling pages uniformly from the world wide web.
In Proc. AAAI Fall Symposium on Using Uncertainty Within Computation, pp. 121-128.
URL: citeseer.ist.psu.edu/rusmevichientong01methods.html.

Ruthven, Ian, and Mounia Lalmas.
2003.
A survey on the use of relevance feedback for information access systems.
Knowledge Engineering Review 18 (1).

Sahoo, Nachiketa, Jamie Callan, Ramayya Krishnan, George Duncan, and Rema Padman.
2006.
Incremental hierarchical clustering of text documents.
In Proc. CIKM, pp. 357-366.
DOI: doi.acm.org/10.1145/1183614.1183667.

Sakai, Tetsuya.
2007.
On the reliability of information retrieval metrics based on graded relevance.
IP&M 43 (2): 531-548.

Salton, Gerard.
1971a.
Cluster search strategies and the optimization of retrieval effectiveness.
In Salton (1971b), pp. 223-242.

Salton, Gerard (ed.).
1971b.
The SMART Retrieval System - Experiments in Automatic Document Processing.
Prentice Hall.

Salton, Gerard.
1975.
Dynamic information and library processing.
Prentice Hall.

Salton, Gerard.
1989.
Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer.
Addison Wesley.

Salton, Gerard.
1991.
The Smart project in automatic document retrieval.
In Proc. SIGIR, pp. 356-358. ACM Press.

Salton, Gerard, James Allan, and Chris Buckley.
1993.
Approaches to passage retrieval in full text information systems.
In Proc. SIGIR, pp. 49-58. ACM Press.
DOI: doi.acm.org/10.1145/160688.160693.

Salton, Gerard, and Chris Buckley.
1987.
Term weighting approaches in automatic text retrieval.
Technical report, Cornell University, Ithaca, NY, USA.

Salton, Gerard, and Christopher Buckley.
1988.
Term-weighting approaches in automatic text retrieval.
IP&M 24 (5): 513-523.

Salton, Gerard, and Chris Buckley.
1990.
Improving retrieval performance by relevance feedback.
JASIS 41 (4): 288-297.

Saracevic, Tefko, and Paul Kantor.
1988.
A study of information seeking and retrieving. II: Users, questions and effectiveness.
JASIS 39: 177-196.

Saracevic, Tefko, and Paul Kantor.
1996.
A study of information seeking and retrieving. III: Searchers, searches, overlap.
JASIS 39 (3): 197-216.

Savaresi, Sergio M., and Daniel Boley.
2004.
A comparative analysis on the bisecting K-means and the PDDP clustering algorithms.
Intelligent Data Analysis 8 (4): 345-362.

Schamber, Linda, Michael Eisenberg, and Michael S. Nilan.
1990.
A re-examination of relevance: toward a dynamic, situational definition.
IP&M 26 (6): 755-776.

Schapire, Robert E.
2003.
The boosting approach to machine learning: An overview.
In D. D. Denison, M. H. Hansen, C. Holmes, B. Mallick, and B. Yu (eds.), Nonlinear Estimation and Classification. Springer.

Schapire, Robert E., and Yoram Singer.
2000.
Boostexter: A boosting-based system for text categorization.
Machine Learning 39 (2/3): 135-168.

Schapire, Robert E., Yoram Singer, and Amit Singhal.
1998.
Boosting and Rocchio applied to text filtering.
In Proc. SIGIR, pp. 215-223. ACM Press.

Schlieder, Torsten, and Holger Meuss.
2002.
Querying and ranking XML documents.
JASIST 53 (6): 489-503.
DOI: dx.doi.org/10.1002/asi.10060.

Scholer, Falk, Hugh E. Williams, John Yiannis, and Justin Zobel.
2002.
Compression of inverted indexes for fast query evaluation.
In Proc. SIGIR, pp. 222-229. ACM Press.
DOI: doi.acm.org/10.1145/564376.564416.

Schölkopf, Bernhard, and Alexander J. Smola.
2001.
Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond.
MIT Press.

Schütze, Hinrich.
1998.
Automatic word sense discrimination.
Computational Linguistics 24 (1): 97-124.

Schütze, Hinrich, David A. Hull, and Jan O. Pedersen.
1995.
A comparison of classifiers and document representations for the routing problem.
In Proc. SIGIR, pp. 229-237. ACM Press.

Schütze, Hinrich, and Jan O. Pedersen.
1995.
Information retrieval based on word senses.
In Proc. SDAIR, pp. 161-175.

Schütze, Hinrich, and Craig Silverstein.
1997.
Projections for efficient document clustering.
In Proc. SIGIR, pp. 74-81. ACM Press.

Schwarz, Gideon.
1978.
Estimating the dimension of a model.
Annals of Statistics 6 (2): 461-464.

Sebastiani, Fabrizio.
2002.
Machine learning in automated text categorization.
ACM Computing Surveys 34 (1): 1-47.

Shawe-Taylor, John, and Nello Cristianini.
2004.
Kernel Methods for Pattern Analysis.
Cambridge University Press.

Shkapenyuk, Vladislav, and Torsten Suel.
2002.
Design and implementation of a high-performance distributed web crawler.
In Proc. International Conference on Data Engineering.
URL: citeseer.ist.psu.edu/shkapenyuk02design.html.

Siegel, Sidney, and N. John Castellan, Jr.
1988.
Nonparametric Statistics for the Behavioral Sciences, 2nd edition.
McGraw Hill.

Sifry, Dave.
2007.
The state of the Live Web, April 2007.
URL: technorati.com/weblog/2007/04/328.html.

Sigurbjörnsson, Börkur, Jaap Kamps, and Maarten de Rijke.
2004.
Mixture models, overlap, and structural hints in XML element retrieval.
In Proc. INEX, pp. 196-210.

Silverstein, Craig, Monika Rauch Henzinger, Hannes Marais, and Michael Moricz.
1999.
Analysis of a very large web search engine query log.
SIGIR Forum 33 (1): 6-12.

Silvestri, Fabrizio.
2007.
Sorting out the document identifier assignment problem.
In Proc. ECIR, pp. 101-112.

Silvestri, Fabrizio, Raffaele Perego, and Salvatore Orlando.
2004.
Assigning document identifiers to enhance compressibility of web search engines indexes.
In Proc. ACM Symposium on Applied Computing, pp. 600-605.

Sindhwani, V., and S. S. Keerthi.
2006.
Large scale semi-supervised linear SVMs.
In Proc. SIGIR, pp. 477-484.

Singhal, Amit, Chris Buckley, and Mandar Mitra.
1996a.
Pivoted document length normalization.
In Proc. SIGIR, pp. 21-29. ACM Press.
URL: citeseer.ist.psu.edu/singhal96pivoted.html.

Singhal, Amit, Mandar Mitra, and Chris Buckley.
1997.
Learning routing queries in a query zone.
In Proc. SIGIR, pp. 25-32. ACM Press.

Singhal, Amit, Gerard Salton, and Chris Buckley.
1995.
Length normalization in degraded text collections.
Technical report, Cornell University, Ithaca, NY.

Singhal, Amit, Gerard Salton, and Chris Buckley.
1996b.
Length normalization in degraded text collections.
In Proc. SDAIR, pp. 149-162.

Singitham, Pavan Kumar C., Mahathi S. Mahabhashyam, and Prabhakar Raghavan.
2004.
Efficiency-quality tradeoffs for vector score aggregation.
In Proc. VLDB, pp. 624-635.
URL: www.vldb.org/conf/2004/RS17P1.PDF.

Smeulders, Arnold W. M., Marcel Worring, Simone Santini, Amarnath Gupta, and Ramesh Jain.
2000.
Content-based image retrieval at the end of the early years.
IEEE Trans. Pattern Anal. Mach. Intell. 22 (12): 1349-1380.
DOI: dx.doi.org/10.1109/34.895972.

Sneath, Peter H.A., and Robert R. Sokal.
1973.
Numerical Taxonomy: The Principles and Practice of Numerical Classification.
W.H. Freeman.

Snedecor, George Waddel, and William G. Cochran.
1989.
Statistical methods.
Iowa State University Press.

Somogyi, Zoltan.
1990.
The Melbourne University bibliography system.
Technical Report 90/3, Melbourne University, Parkville, Victoria, Australia.

Song, Ruihua, Ji-Rong Wen, and Wei-Ying Ma.
2005.
Viewing term proximity from a different perspective.
Technical Report MSR-TR-2005-69, Microsoft Research.

Sornil, Ohm.
2001.
Parallel Inverted Index for Large-Scale, Dynamic Digital Libraries.
PhD thesis, Virginia Tech.
URL: scholar.lib.vt.edu/theses/available/etd-02062001-114915/.

Spärck Jones, Karen.
1972.
A statistical interpretation of term specificity and its application in retrieval.
Journal of Documentation 28 (1): 11-21.

Spärck Jones, Karen.
2004.
Language modelling's generative model: Is it rational?
MS, Computer Laboratory, University of Cambridge.
URL: www.cl.cam.ac.uk/~ksj21/langmodnote4.pdf.

Spärck Jones, Karen, S. Walker, and Stephen E. Robertson.
2000.
A probabilistic model of information retrieval: Development and comparative experiments.
IP&M 36 (6): 779-808, 809-840.

Spink, Amanda, and Charles Cole (eds.).
2005.
New Directions in Cognitive Information Retrieval.
Springer.

Spink, Amanda, Bernard J. Jansen, and H. Cenk Ozmultu.
2000.
Use of query reformulation and relevance feedback by Excite users.
Internet Research: Electronic Networking Applications and Policy 10 (4): 317-328.
URL: ist.psu.edu/faculty_pages/jjansen/academic/pubs/internetresearch2000.pdf.

Sproat, Richard, and Thomas Emerson.
2003.
The first international Chinese word segmentation bakeoff.
In SIGHAN Workshop on Chinese Language Processing.

Sproat, Richard, William Gale, Chilin Shih, and Nancy Chang.
1996.
A stochastic finite-state word-segmentation algorithm for Chinese.
Computational Linguistics 22 (3): 377-404.

Sproat, Richard William.
1992.
Morphology and computation.
MIT Press.

Stein, Benno, and Sven Meyer zu Eissen.
2004.
Topic identification: Framework and application.
In Proc. International Conference on Knowledge Management.

Stein, Benno, Sven Meyer zu Eissen, and Frank Wißbrock.
2003.
On cluster validity and the information need of users.
In Proc. Artificial Intelligence and Applications.

Steinbach, Michael, George Karypis, and Vipin Kumar.
2000.
A comparison of document clustering techniques.
In KDD Workshop on Text Mining.

Strang, Gilbert (ed.).
1986.
Introduction to Applied Mathematics.
Wellesley-Cambridge Press.

Strehl, Alexander.
2002.
Relationship-based Clustering and Cluster Ensembles for High-dimensional Data Mining.
PhD thesis, The University of Texas at Austin.

Strohman, Trevor, and W. Bruce Croft.
2007.
Efficient document retrieval in main memory.
In Proc. SIGIR, pp. 175-182. ACM Press.

Swanson, Don R.
1988.
Historical note: Information retrieval and the future of an illusion.
JASIS 39 (2): 92-98.

Tague-Sutcliffe, Jean, and James Blustein.
1995.
A statistical analysis of the TREC-3 data.
In Proc. TREC, pp. 385-398.

Tan, Songbo, and Xueqi Cheng.
2007.
Using hypothesis margin to boost centroid text classifier.
In Proc. ACM Symposium on Applied Computing, pp. 398-403. ACM Press.
DOI: doi.acm.org/10.1145/1244002.1244096.

Tannier, Xavier, and Shlomo Geva.
2005.
XML retrieval with a natural language interface.
In Proc. SPIRE, pp. 29-40.

Tao, Tao, Xuanhui Wang, Qiaozhu Mei, and ChengXiang Zhai.
2006.
Language model information retrieval with document expansion.
In Proc. Human Language Technology Conference / North American Chapter of the Association for Computational Linguistics, pp. 407-414.

Taube, Mortimer, and Harold Wooster (eds.).
1958.
Information storage and retrieval: Theory, systems, and devices.
Columbia University Press.

Taylor, Michael, Hugo Zaragoza, Nick Craswell, Stephen Robertson, and Chris Burges.
2006.
Optimisation methods for ranking functions with multiple parameters.
In Proc. CIKM. ACM Press.

Teh, Yee Whye, Michael I. Jordan, Matthew J. Beal, and David M. Blei.
2006.
Hierarchical Dirichlet processes.
Journal of the American Statistical Association 101 (476): 1566-1581.

Theobald, Martin, Holger Bast, Debapriyo Majumdar, Ralf Schenkel, and Gerhard Weikum.
2008.
TopX: Efficient and versatile top-k query processing for semistructured data.
VLDB Journal 17 (1): 81-115.

Theobald, Martin, Ralf Schenkel, and Gerhard Weikum.
2005.
An efficient and versatile query engine for TopX search.
In Proc. VLDB, pp. 625-636. VLDB Endowment.

Tibshirani, Robert, Guenther Walther, and Trevor Hastie.
2001.
Estimating the number of clusters in a data set via the gap statistic.
Journal of the Royal Statistical Society Series B 63: 411-423.

Tishby, Naftali, and Noam Slonim.
2000.
Data clustering by Markovian relaxation and the information bottleneck method.
In Proc. NIPS, pp. 640-646.

Toda, Hiroyuki, and Ryoji Kataoka.
2005.
A search result clustering method using informatively named entities.
In International Workshop on Web Information and Data Management, pp. 81-86. ACM Press.
DOI: doi.acm.org/10.1145/1097047.1097063.

Tomasic, Anthony, and Hector Garcia-Molina.
1993.
Query processing and inverted indices in shared-nothing document information retrieval systems.
VLDB Journal 2 (3): 243-275.

Tombros, Anastasios, and Mark Sanderson.
1998.
Advantages of query biased summaries in information retrieval.
In Proc. SIGIR, pp. 2-10. ACM Press.
DOI: doi.acm.org/10.1145/290941.290947.

Tombros, Anastasios, Robert Villa, and Cornelis Joost van Rijsbergen.
2002.
The effectiveness of query-specific hierarchic clustering in information retrieval.
IP&M 38 (4): 559-582.
DOI: dx.doi.org/10.1016/S0306-4573(01)00048-6.

Tomlinson, Stephen.
2003.
Lexical and algorithmic stemming compared for 9 European languages with Hummingbird Searchserver at CLEF 2003.
In Proc. Cross-Language Evaluation Forum, pp. 286-300.

Tong, Simon, and Daphne Koller.
2001.
Support vector machine active learning with applications to text classification.
JMLR 2: 45-66.

Toutanova, Kristina, and Robert C. Moore.
2002.
Pronunciation modeling for improved spelling correction.
In Proc. ACL, pp. 144-151.

Treeratpituk, Pucktada, and Jamie Callan.
2006.
An experimental study on automatically labeling hierarchical clusters using statistical features.
In Proc. SIGIR, pp. 707-708. ACM Press.
DOI: doi.acm.org/10.1145/1148170.1148328.

Trotman, Andrew.
2003.
Compressing inverted files.
IR 6 (1): 5-19.
DOI: dx.doi.org/10.1023/A:1022949613039.

Trotman, Andrew, and Shlomo Geva.
2006.
Passage retrieval and other XML-retrieval tasks.
In SIGIR 2006 Workshop on XML Element Retrieval Methodology, pp. 43-50.

Trotman, Andrew, Shlomo Geva, and Jaap Kamps (eds.).
2007.
SIGIR Workshop on Focused Retrieval. University of Otago.

Trotman, Andrew, Nils Pharo, and Miro Lehtonen.
2006.
XML-IR users and use cases.
In Proc. INEX, pp. 400-412.

Trotman, Andrew, and Börkur Sigurbjörnsson.
2004.
Narrowed Extended XPath I (NEXI).
In Fuhr et al. (2005), pp. 16-40.
DOI: dx.doi.org/10.1007/11424550_2.

Tseng, Huihsin, Pichuan Chang, Galen Andrew, Daniel Jurafsky, and Christopher Manning.
2005.
A conditional random field word segmenter.
In SIGHAN Workshop on Chinese Language Processing.

Tsochantaridis, Ioannis, Thorsten Joachims, Thomas Hofmann, and Yasemin Altun.
2005.
Large margin methods for structured and interdependent output variables.
JMLR 6: 1453-1484.

Turpin, Andrew, and William R. Hersh.
2001.
Why batch and user evaluations do not give the same results.
In Proc. SIGIR, pp. 225-231.

Turpin, Andrew, and William R. Hersh.
2002.
User interface effects in past batch versus user experiments.
In Proc. SIGIR, pp. 431-432.

Turpin, Andrew, Yohannes Tsegay, David Hawking, and Hugh E. Williams.
2007.
Fast generation of result snippets in web search.
In Proc. SIGIR, pp. 127-134. ACM Press.

Turtle, Howard.
1994.
Natural language vs. Boolean query evaluation: A comparison of retrieval performance.
In Proc. SIGIR, pp. 212-220. ACM Press.

Turtle, Howard, and W. Bruce Croft.
1989.
Inference networks for document retrieval.
In Proc. SIGIR, pp. 1-24. ACM Press.

Turtle, Howard, and W. Bruce Croft.
1991.
Evaluation of an inference network-based retrieval model.
TOIS 9 (3): 187-222.

Turtle, Howard, and James Flood.
1995.
Query evaluation: strategies and optimizations.
IP&M 31 (6): 831-850.
DOI: dx.doi.org/10.1016/0306-4573(95)00020-H.

Vaithyanathan, Shivakumar, and Byron Dom.
2000.
Model-based hierarchical clustering.
In Proc. UAI, pp. 599-608. Morgan Kaufmann.

van Rijsbergen, Cornelis Joost.
1979.
Information Retrieval, 2nd edition.
Butterworths.

van Rijsbergen, Cornelis Joost.
1989.
Towards an information logic.
In Proc. SIGIR, pp. 77-86. ACM Press.
DOI: doi.acm.org/10.1145/75334.75344.

van Zwol, Roelof, Jeroen Baas, Herre van Oostendorp, and Frans Wiering.
2006.
Bricks: The building blocks to tackle query formulation in structured document retrieval.
In Proc. ECIR, pp. 314-325.

Vapnik, Vladimir N.
1998.
Statistical Learning Theory.
Wiley-Interscience.

Vittaut, Jean-Noël, and Patrick Gallinari.
2006.
Machine learning ranking for structured information retrieval.
In Proc. ECIR, pp. 338-349.

Voorhees, Ellen M.
1985a.
The cluster hypothesis revisited.
In Proc. SIGIR, pp. 188-196. ACM Press.

Voorhees, Ellen M.
1985b.
The effectiveness and efficiency of agglomerative hierarchic clustering in document retrieval.
Technical Report TR 85-705, Cornell.

Voorhees, Ellen M.
2000.
Variations in relevance judgments and the measurement of retrieval effectiveness.
IP&M 36: 697-716.

Voorhees, Ellen M., and Donna Harman (eds.).
2005.
TREC: Experiment and Evaluation in Information Retrieval.
MIT Press.

Wagner, Robert A., and Michael J. Fischer.
1974.
The string-to-string correction problem.
JACM 21 (1): 168-173.
DOI: doi.acm.org/10.1145/321796.321811.

Ward Jr., J. H.
1963.
Hierarchical grouping to optimize an objective function.
Journal of the American Statistical Association 58: 236-244.

Wei, Xing, and W. Bruce Croft.
2006.
LDA-based document models for ad-hoc retrieval.
In Proc. SIGIR, pp. 178-185. ACM Press.
DOI: doi.acm.org/10.1145/1148170.1148204.

Weigend, Andreas S., Erik D. Wiener, and Jan O. Pedersen.
1999.
Exploiting hierarchy in text categorization.
IR 1 (3): 193-216.

Weston, Jason, and Chris Watkins.
1999.
Support vector machines for multi-class pattern recognition.
In Proc. European Symposium on Artificial Neural Networks, pp. 219-224.

Williams, Hugh E., and Justin Zobel.
2005.
Searchable words on the web.
International Journal on Digital Libraries 5 (2): 99-105.
DOI: dx.doi.org/10.1007/s00799-003-0050-z.

Williams, Hugh E., Justin Zobel, and Dirk Bahle.
2004.
Fast phrase querying with combined indexes.
TOIS 22 (4): 573-594.

Witten, Ian H., and Timothy C. Bell.
1990.
Source models for natural language text.
International Journal of Man-Machine Studies 32 (5): 545-579.

Witten, Ian H., and Eibe Frank.
2005.
Data Mining: Practical Machine Learning Tools and Techniques, 2nd edition.
Morgan Kaufmann.

Witten, Ian H., Alistair Moffat, and Timothy C. Bell.
1999.
Managing Gigabytes: Compressing and Indexing Documents and Images, 2nd edition.
Morgan Kaufmann.

Wong, S. K. Michael, Yiyu Yao, and Peter Bollmann.
1988.
Linear structure in information retrieval.
In Proc. SIGIR, pp. 219-232. ACM Press.

Woodley, Alan, and Shlomo Geva.
2006.
NLPX at INEX 2006.
In Proc. INEX, pp. 302-311.

Xu, Jinxi, and W. Bruce Croft.
1996.
Query expansion using local and global document analysis.
In Proc. SIGIR, pp. 4-11. ACM Press.

Xu, Jinxi, and W. Bruce Croft.
1999.
Cluster-based language models for distributed retrieval.
In Proc. SIGIR, pp. 254-261. ACM Press.
DOI: doi.acm.org/10.1145/312624.312687.

Yang, Hui, and Jamie Callan.
2006.
Near-duplicate detection by instance-level constrained clustering.
In Proc. SIGIR, pp. 421-428. ACM Press.
DOI: doi.acm.org/10.1145/1148170.1148243.

Yang, Yiming.
1994.
Expert network: Effective and efficient learning from human decisions in text categorization and retrieval.
In Proc. SIGIR, pp. 13-22. ACM Press.

Yang, Yiming.
1999.
An evaluation of statistical approaches to text categorization.
IR 1: 69-90.

Yang, Yiming.
2001.
A study of thresholding strategies for text categorization.
In Proc. SIGIR, pp. 137-145. ACM Press.
DOI: doi.acm.org/10.1145/383952.383975.

Yang, Yiming, and Bryan Kisiel.
2003.
Margin-based local regression for adaptive filtering.
In Proc. CIKM, pp. 191-198.
DOI: doi.acm.org/10.1145/956863.956902.

Yang, Yiming, and Xin Liu.
1999.
A re-examination of text categorization methods.
In Proc. SIGIR, pp. 42-49. ACM Press.

Yang, Yiming, and Jan Pedersen.
1997.
Feature selection in statistical learning of text categorization.
In Proc. ICML.

Yue, Yisong, Thomas Finley, Filip Radlinski, and Thorsten Joachims.
2007.
A support vector method for optimizing average precision.
In Proc. SIGIR. ACM Press.

Zamir, Oren, and Oren Etzioni.
1999.
Grouper: A dynamic clustering interface to web search results.
In Proc. WWW, pp. 1361-1374. Elsevier North-Holland.
DOI: dx.doi.org/10.1016/S1389-1286(99)00054-7.

Zaragoza, Hugo, Djoerd Hiemstra, Michael Tipping, and Stephen Robertson.
2003.
Bayesian extension to the language model for ad hoc information retrieval.
In Proc. SIGIR, pp. 4-9. ACM Press.

Zavrel, Jakub, Peter Berck, and Willem Lavrijssen.
2000.
Information extraction by text classification: Corpus mining for features.
In Workshop Information Extraction Meets Corpus Linguistics.
URL: www.cnts.ua.ac.be/Publications/2000/ZBL00.
Held in conjunction with LREC-2000.

Zha, Hongyuan, Xiaofeng He, Chris H. Q. Ding, Ming Gu, and Horst D. Simon.
2001.
Bipartite graph partitioning and data clustering.
In Proc. CIKM, pp. 25-32.

Zhai, ChengXiang, and John Lafferty.
2001a.
Model-based feedback in the language modeling approach to information retrieval.
In Proc. CIKM. ACM Press.

Zhai, ChengXiang, and John Lafferty.
2001b.
A study of smoothing methods for language models applied to ad hoc information retrieval.
In Proc. SIGIR, pp. 334-342. ACM Press.

Zhai, ChengXiang, and John Lafferty.
2002.
Two-stage language models for information retrieval.
In Proc. SIGIR, pp. 49-56. ACM Press.
DOI: doi.acm.org/10.1145/564376.564387.

Zhang, Jiangong, Xiaohui Long, and Torsten Suel.
2007.
Performance of compressed inverted list caching in search engines.
In Proc. CIKM.

Zhang, Tong, and Frank J. Oles.
2001.
Text categorization based on regularized linear classification methods.
IR 4 (1): 5-31.
URL: citeseer.ist.psu.edu/zhang00text.html.

Zhao, Ying, and George Karypis.
2002.
Evaluation of hierarchical clustering algorithms for document datasets.
In Proc. CIKM, pp. 515-524. ACM Press.
DOI: doi.acm.org/10.1145/584792.584877.

Zipf, George Kingsley.
1949.
Human Behavior and the Principle of Least Effort.
Addison Wesley.

Zobel, Justin.
1998.
How reliable are the results of large-scale information retrieval experiments?
In Proc. SIGIR, pp. 307-314.

Zobel, Justin, and Philip Dart.
1995.
Finding approximate matches in large lexicons.
Software Practice and Experience 25 (3): 331-345.
URL: citeseer.ifi.unizh.ch/zobel95finding.html.

Zobel, Justin, and Philip Dart.
1996.
Phonetic string matching: Lessons from information retrieval.
In Proc. SIGIR, pp. 166-173. ACM Press.

Zobel, Justin, and Alistair Moffat.
2006.
Inverted files for text search engines.
ACM Computing Surveys 38 (2).

Zobel, Justin, Alistair Moffat, Ross Wilkinson, and Ron Sacks-Davis.
1995.
Efficient retrieval of partial documents.
IP&M 31 (3): 361-377.
DOI: dx.doi.org/10.1016/0306-4573(94)00052-5.

Zukowski, Marcin, Sandor Heman, Niels Nes, and Peter Boncz.
2006.
Super-scalar RAM-CPU cache compression.
In Proc. International Conference on Data Engineering, p. 59. IEEE Computer Society.
DOI: dx.doi.org/10.1109/ICDE.2006.150.



© 2008 Cambridge University Press
2009-04-07