I need to scrape a site that relies heavily on JavaScript, so I'm using Scrapy + Selenium. It also has to run on Scrapy Cloud.
I've written a spider that uses Scrapy + Selenium + PhantomJS and run it on my local machine. Everything works.
Then I deployed the project to Scrapy Cloud using shub-image. The deployment succeeded, but the result of webdriver.page_source is different: locally it's fine, while on the cloud it is an HTML page with a "403" message, even though the HTTP request itself returns 200.
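WebDriver does not expose HTTP status codes, so the only way to tell the block page from the real one is to inspect page_source. A minimal sketch for logging this in both environments, assuming the block page puts "403" or "Forbidden" in its title (looks_blocked is a hypothetical helper, not part of Selenium or the original spider):

```python
import re


def looks_blocked(html):
    """Rough heuristic: True if the page <title> mentions 403/Forbidden.

    Assumption: the block page shows the error in its title. Adjust the
    check to whatever text the real 403 page contains.
    """
    match = re.search(r"<title[^>]*>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    title = match.group(1) if match else ""
    return "403" in title or "forbidden" in title.lower()
```

Inside the spider this could be called as looks_blocked(self.driver.page_source) right after driver.get(...), logging the result so the local and cloud runs can be compared.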
Then I decided to use a Crawlera account. I added it like this:
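from selenium import webdriver
# Values unquoted (passed to PhantomJS verbatim, no shell); --proxy-type takes http|socks5|none; auth is user:password
service_args = ['--proxy=proxy.crawlera.com:8010', '--proxy-type=http', '--proxy-auth=APIKEY:']
# Local (Windows):
self.driver = webdriver.PhantomJS(executable_path=r'D:\programms\phantomjs-2.1.1-windows\bin\phantomjs.exe', service_args=service_args)
# Scrapy Cloud (Linux); dcap is defined elsewhere in the spider:
self.driver = webdriver.PhantomJS(executable_path='/usr/bin/phantomjs', service_args=service_args, desired_capabilities=dcap)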
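One way to make sure both environments get identical flags is to build them in one place and pick the binary by OS instead of keeping two hard-coded assignments. A sketch under assumptions: crawlera_service_args and phantomjs_path are hypothetical helpers, and "API key as username, empty password" is the usual Crawlera convention rather than something confirmed in this post. As far as I know, Selenium launches PhantomJS through subprocess with an argument list, so no shell strips quotes and a value like '"proxy.crawlera.com:8010"' would keep its quote characters.

```python
import sys


def crawlera_service_args(apikey, host="proxy.crawlera.com", port=8010):
    """Build PhantomJS proxy flags with plain, unquoted values."""
    return [
        "--proxy=%s:%d" % (host, port),
        "--proxy-type=http",          # PhantomJS accepts http, socks5 or none
        "--proxy-auth=%s:" % apikey,  # user:password -> API key, empty password (assumed)
    ]


def phantomjs_path():
    """Choose the PhantomJS binary for the current OS (paths from the post)."""
    if sys.platform.startswith("win"):
        return r"D:\programms\phantomjs-2.1.1-windows\bin\phantomjs.exe"
    return "/usr/bin/phantomjs"


# Usage inside the spider (not executed here):
# self.driver = webdriver.PhantomJS(executable_path=phantomjs_path(),
#                                   service_args=crawlera_service_args("APIKEY"))
```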
Again, everything works locally but not on Scrapy Cloud.
I've checked the Crawlera account info and it looks fine: requests are being sent from both environments (local and cloud).
I don't understand what's wrong.
I suspect it might be due to differences between the PhantomJS versions (Windows vs. Linux).
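To check the version hypothesis, one option is to log the OS, Python and PhantomJS versions from inside the spider, both locally and in the Scrapy Cloud job log, and compare them. A minimal sketch; tool_version and environment_report are hypothetical helpers, and "phantomjs" is assumed to be on PATH unless a full path is passed:

```python
import platform
import subprocess
import sys


def tool_version(executable):
    """Return what `<executable> --version` prints, or None if it can't be run.

    stderr is merged into stdout because some tools print their version there.
    """
    try:
        proc = subprocess.run(
            [executable, "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=30,
        )
    except OSError:  # binary missing or not executable
        return None
    return proc.stdout.strip()


def environment_report(phantomjs="phantomjs"):
    """Collect the details most likely to differ between local and cloud runs."""
    return {
        "os": platform.platform(),
        "python": sys.version.split()[0],
        "phantomjs": tool_version(phantomjs),
    }
```

In the spider, something like self.logger.info(environment_report('/usr/bin/phantomjs')) would put the comparison straight into the job log.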