Not a bug
Braulio Ríos Ferreira 3 days ago in Crawlera • updated by Pablo Vaz (Support Engineer) yesterday at 6:56 p.m.

The spider test code is the following (I've removed irrelevant code, but this spider is tested and reproduces the same error):


# -*- coding: utf-8 -*-
from scrapy import Request
from scrapy.spiders import Spider

class AlohaTestSpider(Spider):
    name = "aloha_test"

    def __init__(self, *args, **kwargs):
        super(AlohaTestSpider, self).__init__(*args, **kwargs)

    def start_requests(self):
        site = 'https://aroogas.alohaorderonline.com/OrderEntryService.asmx/GetSiteList/'
        yield Request(url=site,
                      method='POST',
                      callback=self.parse,
                      headers={"Content-Type": "application/json"})

    def parse(self, response):
        print(response.body)

When I run this spider:

$ scrapy crawl aloha_test


I keep getting the following error:

2017-03-20 12:33:11 [scrapy] DEBUG: Retrying <POST https://aroogas.alohaorderonline.com/OrderEntryService.asmx/GetSiteList/> (failed 1 times): 400 Bad Request


In the original spider, I have a retry decorator, and this error repeats for all 10 retries.


I only get this error with this specific request. The real spider makes several HTTPS requests before this one, and they all return 200 OK; it only fails when this request is reached.


Please note that this is a POST request that doesn't have any body. I don't know if this is relevant, but it's the only peculiarity this request has in my spider.
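To make that concrete, here is the shape of the request reproduced with plain Python's standard library — just a sketch to show an empty body sent with an explicit `Content-Length: 0`, the way the working curl calls below do; it is not my spider code:

```python
import urllib.request

# Sketch only: the same empty-body POST, with the body made explicit
# (data=b"") so that a Content-Length: 0 header is sent on the wire.
url = ("https://aroogas.alohaorderonline.com"
       "/OrderEntryService.asmx/GetSiteList/")
req = urllib.request.Request(
    url,
    data=b"",  # explicit empty body
    method="POST",
    headers={"Content-Type": "application/json"},
)

print(req.get_method())  # POST
print(req.data)          # b''
```

In Scrapy terms, I suspect the equivalent would be passing `body=''` explicitly to `Request`, but I haven't confirmed whether Crawlera preserves the resulting `Content-Length` header.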


If I deactivate "CrawleraMiddleware" and activate "CustomHttpProxyMiddleware" in DOWNLOADER_MIDDLEWARES (settings.py), the request succeeds without error.
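For reference, the relevant part of my settings.py looks roughly like this (the CustomHttpProxyMiddleware path is specific to my own project, so treat it as a placeholder):

```python
# settings.py (sketch): swapping which middleware is commented out
# switches between the failing (Crawlera) and working (custom proxy) runs.
DOWNLOADER_MIDDLEWARES = {
    # 'myproject.middlewares.CustomHttpProxyMiddleware': 600,  # works
    'scrapy_crawlera.CrawleraMiddleware': 610,                 # 400 Bad Request
}
```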


If I make this request using curl, I can't reproduce the error, even when going through Crawlera; both of the following requests work fine:


$ curl --cacert ~/crawlera-ca.crt -H 'Content-Type: application/json' -H 'Content-Length: 0' -X POST -vx proxy.crawlera.com:8010 -U MY_API_KEY https://aroogas.alohaorderonline.com/OrderEntryService.asmx/GetSiteList


$ curl -H 'Content-Type: application/json' -H 'Content-Length: 0' -X POST https://aroogas.alohaorderonline.com/OrderEntryService.asmx/GetSiteList


I've tried everything I can think of (Crawlera sessions, disabling Crawlera cookies, different HTTP headers), but I can't figure out a way to get this request to work with Crawlera.


I guess it has to do with the Crawlera middleware in Scrapy, but I don't know what kind of manipulation Crawlera might be doing to the HTTP headers that causes this request to fail.

Any suggestions about what could be causing this error?

Answer
Not a bug

Hi Braulio,


As you correctly verified with curl, your Crawlera account seems to be working fine.

Also, all projects using the scrapy-crawlera integration are working fine on our platform.


Regarding the integration with Scrapy, I suggest reviewing the information provided here:

http://help.scrapinghub.com/crawlera/using-crawlera-with-scrapy


For more detail, please see the official documentation:

http://scrapy-crawlera.readthedocs.io/en/latest/
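In short, the integration described there comes down to a few settings in settings.py (a minimal sketch; please check the documentation for the exact values for your scrapy-crawlera version):

```python
# Minimal scrapy-crawlera setup, per the documentation linked above.
DOWNLOADER_MIDDLEWARES = {
    'scrapy_crawlera.CrawleraMiddleware': 610,
}
CRAWLERA_ENABLED = True
CRAWLERA_APIKEY = '<your API key>'  # placeholder, use your own key
```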


If your project needs urgent attention, you can also consider hiring our experts. We can set up Scrapy-Crawlera projects that fit your needs, saving you a lot of time and resources. If you're interested, let me invite you to fill out our free quote request: https://scrapinghub.com/quote


Best regards,


Pablo Vaz

Support team
