
References and further reading

Garfield (1955) is seminal in the science of citation analysis. Pinski and Narin (1976) built on this work to develop a journal influence weight, whose definition is remarkably similar to that of the PageRank measure.

The use of anchor text as an aid to searching and ranking stems from the work of McBryan (1994). Extended anchor-text was implicit in his work, with systematic experiments reported in Chakrabarti et al. (1998).

Kemeny and Snell (1976) is a classic text on Markov chains. The PageRank measure was developed in Brin and Page (1998) and in Page et al. (1998). A number of methods for the fast computation of PageRank values are surveyed in Berkhin (2005) and in Langville and Meyer (2006); the former also details how the PageRank eigenvector solution may be viewed as solving a linear system, leading to one way of solving Exercise 21.2.3. The effect of the teleport probability $\alpha$ has been studied by Baeza-Yates et al. (2005) and by Boldi et al. (2005). Topic-specific PageRank and variants were developed in Haveliwala (2002), Haveliwala (2003) and in Jeh and Widom (2003). Berkhin (2006a) develops an alternative view of topic-specific PageRank.
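The linear-system view mentioned above can be illustrated with a minimal sketch. It assumes the convention that, with probability $\alpha$, the surfer teleports to a uniformly random node and otherwise follows a random out-link, and that every node has at least one out-link. Under those assumptions the steady-state vector $\pi$ satisfies $(I - (1-\alpha)A^T)\pi = (\alpha/N)\mathbf{1}$, where $A$ is the row-normalized adjacency matrix. The function names and the three-node graph are hypothetical, chosen only for illustration; this is not a reproduction of any method in the cited surveys.

```python
def transition_matrix(adj):
    """Row-normalize a 0/1 adjacency matrix (assumes no node lacks out-links)."""
    return [[a / sum(row) for a in row] for row in adj]


def pagerank_power(adj, alpha=0.1, iters=200):
    """Power iteration: pi <- alpha/N + (1 - alpha) * pi A."""
    n = len(adj)
    a = transition_matrix(adj)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [alpha / n + (1 - alpha) * sum(pi[i] * a[i][j] for i in range(n))
              for j in range(n)]
    return pi


def solve(m, b):
    """Solve m x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def pagerank_linear(adj, alpha=0.1):
    """Solve (I - (1 - alpha) A^T) pi = (alpha / N) 1 directly."""
    n = len(adj)
    a = transition_matrix(adj)
    m = [[(1.0 if i == j else 0.0) - (1 - alpha) * a[j][i] for j in range(n)]
         for i in range(n)]
    return solve(m, [alpha / n] * n)


# Hypothetical graph: 0 -> 1, 0 -> 2, 1 -> 0, 2 -> 1.
adj = [[0, 1, 1], [1, 0, 0], [0, 1, 0]]
print(pagerank_power(adj))
print(pagerank_linear(adj))
```

The two printed vectors should agree up to numerical tolerance. Dense elimination is used here only to keep the sketch self-contained; at web scale one would use sparse iterative solvers instead.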

Ng et al. (2001b) suggests that the PageRank score assignment is more robust than HITS in the sense that scores are less sensitive to small changes in graph topology. However, it has also been noted that the teleport operation contributes significantly to PageRank's robustness in this sense. Both PageRank and HITS can be "spammed" by the orchestrated insertion of links into the web graph; indeed, the Web is known to have such link farms that collude to increase the score assigned to certain pages by various link analysis algorithms.

The HITS algorithm is due to Kleinberg (1999). Chakrabarti et al. (1998) developed variants that weighted links in the iterative computation based on the presence of query terms in the pages being linked and compared these to results from several web search engines. Bharat and Henzinger (1998) further developed these and other heuristics, showing that certain combinations outperformed the basic HITS algorithm. Borodin et al. (2001) provides a systematic study of several variants of the HITS algorithm. Ng et al. (2001b) introduces a notion of stability for link analysis, arguing that small changes to link topology should not lead to significant changes in the ranked list of results for a query. Numerous other variants of HITS have been developed, the best known of which is perhaps SALSA (Lempel and Moran, 2000).

We use the following abbreviated journal and conference names in the bibliography:

CACM
Communications of the Association for Computing Machinery.
IP&M
Information Processing and Management.
IR
Information Retrieval.
JACM
Journal of the Association for Computing Machinery.
JASIS
Journal of the American Society for Information Science.
JASIST
Journal of the American Society for Information Science and Technology.
JMLR
Journal of Machine Learning Research.
TOIS
ACM Transactions on Information Systems.
Proc. ACL
Proceedings of the Annual Meeting of the Association for Computational Linguistics. Available from:
Proc. CIKM
Proceedings of the ACM CIKM Conference on Information and Knowledge Management. ACM Press.
Proc. ECIR
Proceedings of the European Conference on Information Retrieval.
Proc. ECML
Proceedings of the European Conference on Machine Learning.
Proc. ICML
Proceedings of the International Conference on Machine Learning.
Proc. IJCAI
Proceedings of the International Joint Conference on Artificial Intelligence.
Proc. INEX
Proceedings of the Initiative for the Evaluation of XML Retrieval.
Proc. KDD
Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
Proc. NIPS
Proceedings of the Neural Information Processing Systems Conference.
Proc. PODS
Proceedings of the ACM Conference on Principles of Database Systems.
Proc. SDAIR
Proceedings of the Annual Symposium on Document Analysis and Information Retrieval.
Proc. SIGIR
Proceedings of the Annual International ACM/SIGIR Conference on Research and Development in Information Retrieval. Available from:
Proc. SPIRE
Proceedings of the Symposium on String Processing and Information Retrieval.
Proc. TREC
Proceedings of the Text Retrieval Conference.
Proc. UAI
Proceedings of the Conference on Uncertainty in Artificial Intelligence.
Proc. VLDB
Proceedings of the Very Large Data Bases Conference.
Proc. WWW
Proceedings of the International World Wide Web Conference.

© 2008 Cambridge University Press