Crawlera is a service for downloading web pages that exposes an HTTP proxy API.


Standard proxy providers typically offer a pool of IPs running simple HTTP proxies (Squid or similar software). Crawlera, by contrast, downloads the web pages itself: it distributes requests among many nodes, keeps track of which nodes are blacklisted (per domain), and throttles them so that domains are crawled politely, which minimizes the risk of getting your crawler banned.
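
To make that bookkeeping concrete, here is a minimal sketch of per-domain blacklisting and throttling over a pool of nodes. It is not Crawlera's implementation: the `ProxyPool` class, its method names, the round-robin selection, and the fixed `min_delay` are all assumptions chosen for illustration.

```python
import time
from collections import defaultdict


class ProxyPool:
    """Hypothetical pool: per-domain blacklist plus a per-domain minimum delay."""

    def __init__(self, nodes, min_delay=1.0, clock=time.monotonic):
        self.nodes = list(nodes)
        self.min_delay = min_delay            # assumed seconds between hits to one domain
        self.clock = clock                    # injectable so the logic can be tested
        self.blacklisted = defaultdict(set)   # domain -> nodes banned by that domain
        self.last_request = {}                # domain -> time of the last request
        self._cursor = 0

    def mark_banned(self, domain, node):
        """Record that `domain` has blacklisted `node`."""
        self.blacklisted[domain].add(node)

    def wait_time(self, domain):
        """Seconds the caller should still wait before hitting `domain` again."""
        last = self.last_request.get(domain)
        if last is None:
            return 0.0
        return max(0.0, self.min_delay - (self.clock() - last))

    def pick_node(self, domain):
        """Round-robin over the nodes that are not blacklisted for `domain`."""
        for _ in range(len(self.nodes)):
            node = self.nodes[self._cursor % len(self.nodes)]
            self._cursor += 1
            if node not in self.blacklisted[domain]:
                self.last_request[domain] = self.clock()
                return node
        raise RuntimeError(f"all nodes are blacklisted for {domain}")
```

A caller would sleep for `wait_time(domain)`, call `pick_node(domain)`, send the request through that node, and call `mark_banned(domain, node)` whenever the response looks like a ban.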


With proxy providers, you have to implement the throttling and blacklisting logic yourself. With Crawlera, you only configure your crawler to download pages through the Crawlera proxy and can forget about throttling and anti-ban policies. This lets you crawl as fast as possible without disrupting the sites you crawl.
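
As an illustration of what "configure your crawler to download pages through the proxy" can look like, the sketch below routes requests through an HTTP proxy using only the Python standard library. The proxy URL is a placeholder, not a real endpoint, and authentication is left out because it is not described here; `make_opener` is a hypothetical helper name.

```python
import urllib.request

# Placeholder endpoint (an assumption): use the host and port from your own account.
PROXY_URL = "http://proxy.example.com:8000"


def make_opener(proxy_url=PROXY_URL):
    """Return an opener that sends HTTP and HTTPS requests through the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)
```

The crawler then requests target pages as usual, for example with `make_opener().open("http://example.com/")`, and the proxy side takes care of node selection and throttling.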