When Crawlera receives a ban from a target website, it automatically retries the request from another proxy IP. By default, Crawlera retries 5 times to retrieve the content, and if it still fails, it returns the status code 503. Scrapinghub constantly refreshes the proxy pool and configures specific settings for websites that are difficult to crawl. If you still get a significant number of 503 bans in spite of these features, you can consider the following approaches to improve your crawl rates. Please note that a small number of bans is expected for any crawl as Crawlera adapts to use the best settings for each site. Responses with 503 codes are not billed to you.
You will see this HTTP response header when Crawlera generates a 503 after retries.
Your client can retry the request after a wait time that you configure in your client, or you can reduce the crawl rate and check whether the ban rate improves. You can also use the following best practices to reduce the occurrence of bans. Note that for busy domains such as Amazon or Google, Crawlera can return 503s even after trying many outgoing nodes; in that case, the only option is to retry.
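The client-side retry with a configurable wait time described above could be sketched as follows. This is only a minimal illustration, not an official Crawlera client: the function name fetch_with_retry, its parameters, and the exponential backoff policy are assumptions made for this example, and fetch stands for whatever call your HTTP client makes through the proxy.

```python
import random
import time


def fetch_with_retry(fetch, max_attempts=4, base_delay=2.0, sleep=time.sleep):
    """Call fetch() until it returns a non-503 status or attempts run out.

    fetch is any zero-argument callable returning (status_code, body).
    The names and the backoff policy here are hypothetical, chosen for
    illustration only. Returns the last (status_code, body) pair seen.
    """
    status, body = None, None
    for attempt in range(max_attempts):
        status, body = fetch()
        if status != 503:
            return status, body
        if attempt < max_attempts - 1:
            # Wait longer after each failed attempt (base_delay, then 2x,
            # then 4x, ...) plus up to 10% jitter so that parallel clients
            # do not all retry at the same moment.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    return status, body
```

In practice, fetch would wrap the request you already send through proxy.crawlera.com:8010. Raising base_delay or lowering your overall request rate are the two knobs to experiment with when bans persist.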
1) You can try different request headers, which give you more options to circumvent bans and can improve performance and success rate. Some of these headers are available only on higher plans.
2) You can use the following curl command to verify headers that belong to respective plans.
curl -vx proxy.crawlera.com:8010 -U <API key>: http://httpbin.org/ip
You can then check for the appropriate header according to the plan you have. You can find more information on profile headers in this article: https://support.scrapinghub.com/a/solutions/articles/22000223256
3) If cookies are handled on the client side, you need to send the header "X-Crawlera-Cookies: disable" to disable cookie handling on the Crawlera side.
4) If your crawl involves mobile apps, you should use the Crawlera mobile profile, "X-Crawlera-Profile: mobile", without sessions. Rotating user agents is also a recommended practice.
5) If you require special proxies other than datacenter IPs, you can submit a support ticket to explore the alternatives. We can suggest a plan suited to your requirements.
6) If you require developer assistance to get the data you need, you can submit a request here https://scrapinghub.com/crawlera-quote for our Professional Services.
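As a rough illustration of the header-related items in the list above, the sketch below builds the extra request headers for a crawl. The function and parameter names are hypothetical; the header names come from this article, and the value "disable" for X-Crawlera-Cookies is assumed from Crawlera's header documentation. Whether a given header is accepted depends on your plan.

```python
def build_crawlera_headers(client_handles_cookies=False, mobile_profile=False):
    """Return extra request headers for a Crawlera request.

    Hypothetical helper for illustration; it only assembles a dict and
    does not contact Crawlera.
    """
    headers = {}
    if client_handles_cookies:
        # Your client manages cookies itself, so ask Crawlera not to.
        headers["X-Crawlera-Cookies"] = "disable"
    if mobile_profile:
        # Use the mobile profile (recommended without sessions).
        headers["X-Crawlera-Profile"] = "mobile"
    return headers
```

The resulting dictionary can be merged into the headers of whichever HTTP client you use to send requests through the proxy.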