
Features a crawler must provide

We list the desiderata for web crawlers in two categories: features they must provide, followed by features they should provide.
The Web contains servers that create spider traps, which are generators of web pages that mislead crawlers into getting stuck fetching an infinite number of pages in a particular domain. Crawlers must be designed to be resilient to such traps. Not all such traps are malicious; some are the inadvertent side-effect of faulty website development.
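Resilience to spider traps is typically achieved with simple heuristics rather than by detecting traps exactly. As an illustrative sketch (the threshold values, function name, and per-host budget are assumptions, not part of the text), a crawler might refuse URLs whose path is suspiciously deep and cap the number of pages it fetches from any one host:

```python
from urllib.parse import urlparse

# Illustrative thresholds; real crawlers would tune these per deployment.
MAX_PATH_DEPTH = 10        # very deep paths often signal generated page loops
MAX_PAGES_PER_HOST = 10_000  # budget of pages fetched from a single host

pages_per_host = {}  # host -> pages fetched so far

def should_fetch(url):
    """Return True if the URL passes these simple trap heuristics."""
    parsed = urlparse(url)
    # Count non-empty path segments; reject suspiciously deep URLs.
    depth = len([seg for seg in parsed.path.split("/") if seg])
    if depth > MAX_PATH_DEPTH:
        return False
    # Refuse to exceed the per-host fetch budget.
    host = parsed.netloc
    if pages_per_host.get(host, 0) >= MAX_PAGES_PER_HOST:
        return False
    pages_per_host[host] = pages_per_host.get(host, 0) + 1
    return True
```

Neither heuristic identifies traps precisely, but together they bound the damage a trap can do: even a server generating infinitely many pages can consume at most the per-host budget.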
Web servers have both implicit and explicit policies regulating the rate at which a crawler can visit them. These politeness policies must be respected.
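A minimal sketch of such a politeness gate, assuming Python's standard `urllib.robotparser` module: the explicit policy is checked against a previously fetched robots.txt file, and the implicit policy is approximated by a fixed minimum delay between requests to the same host. The delay value, function names, and user-agent string are illustrative assumptions.

```python
import time
from urllib import robotparser
from urllib.parse import urlparse

MIN_DELAY_SECONDS = 2.0  # assumed minimum gap between requests to one host

_robots = {}      # host -> parsed robots.txt rules
_last_fetch = {}  # host -> time of the previous request to that host

def register_robots(host, robots_txt):
    """Store parsed robots.txt rules for a host (the file itself is
    fetched elsewhere; only parsing is shown here)."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    _robots[host] = rp

def polite_fetch_allowed(url, user_agent="ExampleCrawler"):
    """Enforce the explicit policy (robots.txt), then the implicit
    one (a minimum per-host delay). Returns False if disallowed."""
    host = urlparse(url).netloc
    rp = _robots.get(host)
    if rp is not None and not rp.can_fetch(user_agent, url):
        return False  # explicit policy: URL disallowed by robots.txt
    elapsed = time.monotonic() - _last_fetch.get(host, 0.0)
    if elapsed < MIN_DELAY_SECONDS:
        time.sleep(MIN_DELAY_SECONDS - elapsed)  # implicit policy: rate limit
    _last_fetch[host] = time.monotonic()
    return True
```

In a real crawler the per-host delay would be managed by the URL frontier rather than by sleeping in the fetch path, but the sketch shows the two kinds of policy side by side.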

© 2008 Cambridge University Press