When a target website bans a request, Crawlera automatically retries it from another proxy IP. By default, Crawlera retries up to 5 times to retrieve the content and, if it still fails, returns a 503 status code. Scrapinghub constantly refreshes the proxy pool and configures specific settings for websites that are difficult to crawl. If you still get a significant number of 503 bans despite these features, consider the following approaches to improve your crawl rates. Note that a small number of bans is expected for any crawl, as Crawlera adapts to use the best settings for each site. Responses with 503 codes are not billed to you.
You will see this HTTP response header when Crawlera generates a 503 after retries:
Your client can retry the request after a wait time that you configure, or you can reduce the crawl rate to see whether results improve. Crawlera can return 503s for busy domains such as Amazon and Google, even after trying many outgoing nodes. In those cases, the only option is to retry.
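The client-side retry described above could be sketched in Python roughly as follows. This is only an illustration, not Crawlera's own retry logic: the function names (backoff_delays, fetch_with_retry), the exponential backoff schedule, and the default wait times are all assumptions chosen for the example. The fetch argument is any function of your own that performs one request through Crawlera and returns the status code and body.

```python
import random
import time


def backoff_delays(retries, base=2.0, cap=60.0):
    """Return the wait in seconds before each retry: base, 2*base, 4*base, ... up to cap."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(retries)]


def fetch_with_retry(fetch, retries=3, base=2.0, cap=60.0, sleep=time.sleep):
    """Call fetch() and repeat it while the response status is 503.

    fetch  -- hypothetical zero-argument callable returning (status_code, body)
    sleep  -- injectable so the waiting can be replaced in tests
    """
    status, body = fetch()
    for delay in backoff_delays(retries, base, cap):
        if status != 503:
            break
        # Add up to one second of random jitter so that parallel clients
        # do not all retry at the same moment.
        sleep(delay + random.uniform(0, 1))
        status, body = fetch()
    return status, body
```

With the defaults, a request that keeps returning 503 is retried three more times, after waits of roughly 2, 4 and 8 seconds, and the last 503 is then handed back to the caller. Raising base or lowering the number of concurrent requests are two ways to reduce the crawl rate.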
You can use the following best practices to reduce the occurrences of bans:
1) Try different Crawlera request headers. They give you more options to circumvent bans, which can improve performance and success rate. Some of these headers are available only on higher plans.
|X-Crawlera-Profile (pass) / X-Crawlera-Profile-Pass||✔|
2) Use the following cURL command to verify which headers are available on your plan:
curl -v -U <API_KEY>: -x proxy.crawlera.com:8010 http://httpbin.org/headers
Then check the response for the headers that apply to your plan. You can find more information on profile headers in this article.
3) If your client handles cookies itself, send the X-Crawlera-Cookies header to disable cookie handling on the Crawlera side.
4) If you are crawling mobile apps, use Crawlera mobile profiles (via the X-Crawlera-Profile: mobile header) without sessions. Rotating user agents is the recommended practice.
5) If you require proxies other than datacenter IPs, you can submit a support ticket to explore the alternatives. We can suggest a plan that suits your requirements.
6) If you require developer assistance to get the data you need, you can submit a request for our Data Services at https://scrapinghub.com/crawlera-quote.
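As an illustration of items 2 to 4, the following Python sketch builds the headers for a request sent through the proxy endpoint used in the cURL command. It is a minimal example under stated assumptions, not an official client: the helper names (crawlera_headers, fetch) are hypothetical, the "disable" value for X-Crawlera-Cookies is an assumption that should be checked against the Crawlera header documentation, and the sketch only targets plain HTTP URLs such as the httpbin one above.

```python
import base64
import urllib.error
import urllib.request

PROXY = "proxy.crawlera.com:8010"  # proxy endpoint taken from the cURL command


def crawlera_headers(api_key, client_cookies=False, mobile=False):
    """Build request headers for one Crawlera request (hypothetical helper)."""
    # Same credentials as `curl -U <API_KEY>:` (API key as user, empty password).
    token = base64.b64encode(f"{api_key}:".encode()).decode("ascii")
    headers = {"Proxy-Authorization": f"Basic {token}"}
    if client_cookies:
        # Assumed value: ask Crawlera not to manage cookies because the
        # client does it. Verify against the header documentation.
        headers["X-Crawlera-Cookies"] = "disable"
    if mobile:
        # Mobile profile from item 4. No session header is sent.
        headers["X-Crawlera-Profile"] = "mobile"
    return headers


def fetch(url, api_key, **options):
    """Send one plain-HTTP GET through the proxy and return (status, body)."""
    request = urllib.request.Request(url, headers=crawlera_headers(api_key, **options))
    request.set_proxy(PROXY, "http")
    try:
        with urllib.request.urlopen(request, timeout=60) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx; return the status so that callers can
        # detect a 503 ban and decide whether to retry.
        return error.code, error.read()


if __name__ == "__main__":
    # Replace <API_KEY> with your own key before running.
    status, body = fetch("http://httpbin.org/headers", "<API_KEY>", mobile=True)
    print(status)
    print(body.decode("utf-8", errors="replace"))
```

Headers that are not included in your plan can simply be left out by keeping the corresponding option at its default of False.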