Features a crawler must provide

We list the desiderata for web crawlers in two categories: features that web crawlers must provide, followed by features they should provide.
Robustness: The Web contains servers that create spider traps, which are generators of web pages that mislead crawlers into getting stuck fetching an unbounded number of pages in a particular domain. Crawlers must be designed to be resilient to such traps. Not all such traps are malicious; some are the inadvertent side effect of faulty website development.
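One common way to gain some resilience to spider traps is to bound how much work any single host can demand. The following is a minimal sketch under stated assumptions, not a definitive defense: the class name TrapGuard and the three thresholds are hypothetical choices made for illustration, and a production crawler would likely combine such limits with other signals (for example, detecting near-duplicate content).

```python
from collections import Counter
from urllib.parse import urlsplit


class TrapGuard:
    """Heuristic filter that rejects URLs likely to come from a spider trap.

    All limits are illustrative assumptions, not values from the text.
    """

    def __init__(self, max_url_length=256, max_path_depth=12,
                 max_pages_per_host=10_000):
        self.max_url_length = max_url_length
        self.max_path_depth = max_path_depth
        self.max_pages_per_host = max_pages_per_host
        self.pages_per_host = Counter()  # host -> number of URLs admitted

    def allow(self, url):
        """Return True if the URL may be fetched, and count it against its host."""
        parts = urlsplit(url)
        host = parts.netloc.lower()

        # Traps often generate ever-longer URLs.
        if len(url) > self.max_url_length:
            return False

        # Traps often generate ever-deeper paths such as /a/a/a/a/...
        depth = len([segment for segment in parts.path.split("/") if segment])
        if depth > self.max_path_depth:
            return False

        # Cap the total number of pages taken from any one host.
        if self.pages_per_host[host] >= self.max_pages_per_host:
            return False

        self.pages_per_host[host] += 1
        return True
```

A crawler would call allow(url) before adding a URL to its frontier. Note that fixed caps of this kind also cut off large legitimate sites, so the thresholds trade coverage for safety.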
Politeness: Web servers have both implicit and explicit policies regulating the rate at which a crawler may visit them. Crawlers must respect these politeness policies.
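As an illustration, an explicit policy is often published in a host's robots.txt file, while an implicit policy can be approximated by leaving a minimum gap between successive requests to the same host. The sketch below uses Python's standard urllib.robotparser module; the class PolitenessGate, its method names, and the default delay of one second are hypothetical choices for this example. It also assumes the robots.txt text has already been downloaded, and it honors the Crawl-delay directive, which is a widely seen but nonstandard extension.

```python
import time
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


class PolitenessGate:
    """Tracks per-host robots.txt rules and the earliest next fetch time."""

    def __init__(self, user_agent, default_delay=1.0, clock=time.monotonic):
        self.user_agent = user_agent
        self.default_delay = default_delay  # assumed implicit policy, in seconds
        self.clock = clock                  # injectable so the gate can be tested
        self.rules = {}                     # host -> RobotFileParser
        self.next_allowed = {}              # host -> earliest time of next fetch

    def set_robots(self, host, robots_txt):
        """Register the already-downloaded robots.txt text for a host."""
        parser = RobotFileParser()
        parser.parse(robots_txt.splitlines())
        self.rules[host.lower()] = parser

    def wait_time(self, url):
        """Return seconds to wait before fetching url, or None if disallowed.

        Calling this method reserves the next fetch slot for the host.
        """
        host = urlsplit(url).netloc.lower()
        delay = self.default_delay
        parser = self.rules.get(host)
        if parser is not None:
            if not parser.can_fetch(self.user_agent, url):
                return None  # explicit policy forbids this URL
            # crawl_delay() returns None when no Crawl-delay line applies.
            delay = parser.crawl_delay(self.user_agent) or self.default_delay

        now = self.clock()
        start = max(now, self.next_allowed.get(host, now))
        self.next_allowed[host] = start + delay
        return start - now
```

A fetch loop would sleep for the returned number of seconds before issuing the request, and skip the URL entirely when the result is None.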

