kzrster · 1 week ago · in Scrapy Cloud

Hi!
I needed to scrape a site with a lot of JavaScript, so I'm using Scrapy + Selenium. It also has to run on Scrapy Cloud.
I've written a spider that uses Scrapy + Selenium + PhantomJS and ran it on my local machine. Everything is OK.
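
For context, the spider is wired up roughly like this (a simplified sketch, not the real spider; the class name, start URL and CSS selector are just placeholders):

import scrapy
from selenium import webdriver


class JsSiteSpider(scrapy.Spider):
    name = 'js_site'
    start_urls = ['http://example.com/']

    def __init__(self, *args, **kwargs):
        super(JsSiteSpider, self).__init__(*args, **kwargs)
        # PhantomJS renders the JS-heavy pages; Scrapy only schedules the URLs.
        self.driver = webdriver.PhantomJS()

    def parse(self, response):
        # Load the page in PhantomJS so the JavaScript runs, then hand the
        # rendered HTML back to Scrapy's selectors.
        self.driver.get(response.url)
        rendered = scrapy.Selector(text=self.driver.page_source)
        for title in rendered.css('h1::text').extract():
            yield {'title': title}

    def closed(self, reason):
        self.driver.quit()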
Then I deployed the project to Scrapy Cloud using shub-image. Deployment is OK, but the result of
webdriver.page_source is different: locally it's fine, while on the cloud I get an HTML page saying 403 (even though the request itself returns HTTP 200).
Then I decided to use my Crawlera account. I've added it with:

service_args = [
    '--proxy="proxy.crawlera.com:8010"',
    '--proxy-type=https',
    '--proxy-auth="apikey"',
]


For Windows (local):
self.driver = webdriver.PhantomJS(executable_path=r'D:\programms\phantomjs-2.1.1-windows\bin\phantomjs.exe',service_args=service_args)


For Docker:

self.driver = webdriver.PhantomJS(executable_path=r'/usr/bin/phantomjs', service_args=service_args, desired_capabilities=dcap)
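
(dcap here is a PhantomJS desired-capabilities dict, built roughly like this — a sketch, and the user agent string is only an example:)

from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

dcap = dict(DesiredCapabilities.PHANTOMJS)
# Replace the default "PhantomJS" user agent with a regular desktop one
# (example string, not necessarily the one actually used).
dcap['phantomjs.page.settings.userAgent'] = (
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36'
)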

Again, locally everything is OK, but on the cloud it isn't.
I've checked the Crawlera dashboard and it looks fine: requests are being sent from both (local and cloud).
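
To double-check the proxy from inside PhantomJS itself, something like this can be used (illustrative snippet; httpbin.org/ip is just an IP-echo service):

from selenium import webdriver

driver = webdriver.PhantomJS(service_args=service_args)
# httpbin.org/ip echoes the caller's IP, so the page should show a
# Crawlera address when the proxy settings are actually applied.
driver.get('https://httpbin.org/ip')
print(driver.page_source)
driver.quit()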

I don't get what's wrong.
I think it might be caused by differences between the PhantomJS builds (Windows vs. Linux).
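
A quick way to compare the two environments is to print what each one reports, e.g. (illustrative; the capabilities dict returned by GhostDriver should include the PhantomJS version):

import platform
from selenium import webdriver

driver = webdriver.PhantomJS()  # same executable_path / service_args as above
# Print the OS and the capabilities reported by GhostDriver; the dict
# should include the PhantomJS version, so a mismatch would show up here.
print(platform.platform())
print(driver.desired_capabilities)
driver.quit()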

Any ideas?

I didn't want to post this in Ideas :(
Can't fix it for now.