

Features a crawler must provide

We list the desiderata for web crawlers in two categories: features that web crawlers must provide, followed by features they should provide.
Robustness:
The Web contains servers that create spider traps, which are generators of web pages that mislead crawlers into getting stuck fetching an infinite number of pages in a particular domain. Crawlers must be designed to be resilient to such traps (a simple guard is sketched after this list). Not all such traps are malicious; some are the inadvertent side-effect of faulty website development.
Politeness:
Web servers have both implicit and explicit policies regulating the rate at which a crawler can visit them. These politeness policies must be respected.
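
The following sketch (in Python, not from the book) illustrates one way a crawl loop might address both requirements: a per-host page cap and a URL-depth limit as a crude defence against spider traps, and a robots.txt check plus a minimum per-host delay for politeness. The helper fetch_page(url), the user-agent string, and the numeric limits are illustrative assumptions, not a prescribed implementation.

import time
import urllib.parse
import urllib.robotparser
from collections import defaultdict, deque

MAX_PAGES_PER_HOST = 1000   # robustness: cap pages fetched from any one host
MAX_URL_DEPTH = 16          # robustness: very deep paths often signal a trap
MIN_DELAY_SECONDS = 2.0     # politeness: minimum gap between requests to a host


def is_probable_trap(url, pages_fetched_for_host):
    # Crude heuristics against spider traps: reject URLs whose path is
    # suspiciously deep, and hosts we have already fetched many pages from.
    parsed = urllib.parse.urlparse(url)
    depth = len([seg for seg in parsed.path.split("/") if seg])
    return depth > MAX_URL_DEPTH or pages_fetched_for_host >= MAX_PAGES_PER_HOST


def allowed_by_robots(robots_cache, url, user_agent="ExampleCrawler"):
    # Politeness (explicit policy): honour the host's robots.txt, cached per host.
    parsed = urllib.parse.urlparse(url)
    host = parsed.netloc
    if host not in robots_cache:
        rp = urllib.robotparser.RobotFileParser()
        rp.set_url(f"{parsed.scheme or 'http'}://{host}/robots.txt")
        try:
            rp.read()
        except OSError:
            pass  # robots.txt unreachable; fall back to the parser's default
        robots_cache[host] = rp
    return robots_cache[host].can_fetch(user_agent, url)


def crawl(seed_urls, fetch_page):
    # fetch_page(url) is an assumed helper returning (html_text, extracted_links).
    frontier = deque(seed_urls)
    seen = set(seed_urls)
    pages_per_host = defaultdict(int)
    last_hit = defaultdict(float)   # politeness (implicit policy): rate limiting
    robots_cache = {}

    while frontier:
        url = frontier.popleft()
        host = urllib.parse.urlparse(url).netloc

        if is_probable_trap(url, pages_per_host[host]):
            continue
        if not allowed_by_robots(robots_cache, url):
            continue

        # Wait until at least MIN_DELAY_SECONDS have passed since the last
        # request to this host.
        wait = MIN_DELAY_SECONDS - (time.time() - last_hit[host])
        if wait > 0:
            time.sleep(wait)

        _html, links = fetch_page(url)
        last_hit[host] = time.time()
        pages_per_host[host] += 1

        for link in links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)

Sleeping in the fetch loop keeps the sketch short; a production crawler would instead maintain per-host queues and reschedule URLs whose host was contacted too recently, so that other hosts can be fetched in the meantime.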


