Answered

Proxy Service - Connection gets closed in 7 out of 10 cases

Hi everyone,
I've got an issue with a resource that apparently blocks requests from AWS IP addresses.

Here's a curl command to reproduce the issue:
curl --insecure -x proxy.crawlera.com:8010 -U <API_KEY>: https://shop.lululemon.com/p/girls-tanks/Double-Dutch-Tank/_/prod8841159

Note: it works occasionally, but is very unstable. I've got 7 closed connections out of 10 requests.


Best Answer

The site is not blocking AWS IPs. It expects the request to include headers such as:


  • Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
  • Accept-Encoding: gzip, deflate, br
  • Accept-Language: en-US,en;q=0.9
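
For reference, here is a rough sketch of how those headers could be added to the curl command from the question. The function name fetch_with_headers and the API_KEY environment variable are made up for illustration; the proxy address and header values are the ones quoted in this thread.

```shell
#!/usr/bin/env bash
# Sketch only: fetch_with_headers and API_KEY are hypothetical names.
# API_KEY stands in for the <API_KEY> placeholder used in the question.
fetch_with_headers() {
  local url="$1"
  # Same proxy options as the original command, plus the three headers
  # the answer says the site expects.
  curl --insecure -x proxy.crawlera.com:8010 -U "${API_KEY}:" \
    -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8' \
    -H 'Accept-Encoding: gzip, deflate, br' \
    -H 'Accept-Language: en-US,en;q=0.9' \
    "$url"
}
```

Usage would be something like `API_KEY=... fetch_with_headers https://shop.lululemon.com/sitemap.xml`. Since the request advertises gzip/deflate/br, the body may come back compressed; curl's --compressed option can be added if decoded output is needed.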


Sorry, it doesn't help.


Could you run multiple requests in a row, please?

Can you confirm that, say, all 10 out of 10 requests were processed correctly and that the response time was reasonable for each of them?


As far as I understand, how my request is handled depends on which proxy server the load balancer picks. I guess some of them are hosted on AWS, whose IP addresses are blacklisted by the target site, shop.lululemon.com.

In such cases, turnaround time might be unacceptable.
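
A "10 out of 10" check like this could be scripted roughly as follows. This is only a sketch: run_batch, API_KEY and MAX_TIME are hypothetical names, the 120-second default cap is an arbitrary assumption, and a request is counted as good only if curl exits cleanly with HTTP status 200.

```shell
#!/usr/bin/env bash
# Sketch only: run_batch, API_KEY and MAX_TIME are hypothetical names.
# Runs the same request several times through the proxy and reports the
# status code and total time of each attempt, then a success/failure tally.
run_batch() {
  local url="$1" count="${2:-10}" ok=0 failed=0 i result
  for ((i = 1; i <= count; i++)); do
    # --write-out prints "<status> <seconds>"; --max-time caps each attempt
    # so a stalled connection cannot hang for minutes.
    if result="$(curl --insecure --silent --output /dev/null \
        --max-time "${MAX_TIME:-120}" \
        -x proxy.crawlera.com:8010 -U "${API_KEY}:" \
        -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8' \
        -H 'Accept-Encoding: gzip, deflate, br' \
        -H 'Accept-Language: en-US,en;q=0.9' \
        --write-out '%{http_code} %{time_total}' "$url")" \
        && [[ "${result%% *}" == 200 ]]; then
      ok=$((ok + 1))
    else
      failed=$((failed + 1))
    fi
    echo "request $i: ${result:-no response}"
  done
  echo "ok=$ok failed=$failed"
}
```

For example, `API_KEY=... run_batch https://shop.lululemon.com/sitemap.xml 10` would print one line per request and a final tally, which makes both closed connections and slow responses visible.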


Please take a look. This particular request took about 6.5 minutes.


time curl -k -U <API_KEY>: -ix proxy.crawlera.com:8010 https://shop.lululemon.com/sitemap.xml -H "accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8" -H "accept-encoding:gzip, deflate, br" -H "accept-language:en-US,en;q=0.9"
HTTP/1.1 200 OK
HTTP/1.1 200 OK
accept-ranges: bytes
cache-control: max-age=0
Connection: close
content-language: en
content-length: 650
content-type: application/xml
date: Tue, 09 Jan 2018 11:39:21 GMT
etag: "31204d-28a-562534d67a900"
expires: Tue, 09 Jan 2018 11:39:21 GMT
last-modified: Tue, 09 Jan 2018 08:00:04 GMT
Proxy-Connection: close
server: Oracle-HTTP-Server-12c
set-cookie: BIGipServeratg-o2-prod-lulu_oracleoutsourcing_com_3015=974034049.50955.0000; path=/
set-cookie: ltmo=2; path=/; domain=.lululemon.com
set-cookie: luludom=lululemon.com; path=/; domain=.lululemon.com
set-cookie: akavpau_prod_browse=1515498261~id=77cc1150d1e689eaccf22c56707ef11e; Path=/
strict-transport-security: max-age=86400
X-Crawlera-Slave: 91.232.97.4:60099
X-Crawlera-Version: 1.30.39-325aa6
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> <sitemap><loc>https://shop.lululemon.com/sitemap/Product_Sitemap_US.xml</loc><lastmod>2018-01-09T01-00-04-07:00</lastmod></sitemap> <sitemap><loc>https://shop.lululemon.com/sitemap/Category_Sitemap_US.xml</loc><lastmod>2018-01-09T01-00-04-07:00</lastmod></sitemap> <sitemap><loc>https://shop.lululemon.com/sitemap/Product_Sitemap_CA.xml</loc><lastmod>2018-01-09T01-00-04-07:00</lastmod></sitemap> <sitemap><loc>https://shop.lululemon.com/sitemap/Category_Sitemap_CA.xml</loc><lastmod>2018-01-09T01-00-04-07:00</lastmod></sitemap></sitemapindex>
real 6m24.963s
user 0m0.192s
sys 0m0.084s
