Answered
ks446 4 weeks ago in Scrapy Cloud • updated by Pablo Vaz (Support Engineer) 3 weeks ago

I have a Scrapy script running on Scrapinghub. The spider takes one argument: a CSV file where the URLs are stored. The script runs without error, but the problem is that it isn't scraping all the items from the URLs. I have no idea why this is happening.
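For reference, a spider set up this way typically loads the URL list before crawling. A minimal sketch of that part (the `csv_file` argument name, the one-URL-per-row layout, and the spider name are assumptions, not from the original post):

```python
import csv

def load_urls(csv_path):
    """Read one URL per row from the first column of a CSV file,
    skipping blank rows."""
    with open(csv_path, newline="") as f:
        return [row[0].strip() for row in csv.reader(f) if row and row[0].strip()]

# Inside a Scrapy spider it might be wired up like this (sketch, not run here):
#
# class MySpider(scrapy.Spider):
#     name = "myspider"
#
#     def __init__(self, csv_file=None, *args, **kwargs):
#         super().__init__(*args, **kwargs)
#         self.start_urls = load_urls(csv_file)
```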

Answer


Hey ks446,


It could be for many reasons. To rule out any issue with your deploy, run your script locally and check whether it extracts all the items.
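One quick way to check this locally is to export items to a JSON-lines file (e.g. `scrapy crawl myspider -a csv_file=urls.csv -o items.jl`) and compare counts against the input. A small comparison sketch (the file names, and the assumption that every URL should yield at least one item, are illustrative):

```python
import csv

def count_urls(csv_path):
    """Count non-empty rows in the input CSV of URLs."""
    with open(csv_path, newline="") as f:
        return sum(1 for row in csv.reader(f) if row and row[0].strip())

def count_items(jsonlines_path):
    """Count items in a JSON-lines export produced by `scrapy crawl ... -o items.jl`."""
    with open(jsonlines_path) as f:
        return sum(1 for line in f if line.strip())

# If count_items("items.jl") is well below count_urls("urls.csv"),
# some URLs are not producing items even locally, which points to the
# spider logic rather than the Scrapy Cloud deploy.
```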


If it works fine locally, check how long the spider takes to run, and look through the script for infinite loops or anything else that could extend the runtime and push the job toward being cancelled because no new items are being extracted.
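While debugging runaway crawls, Scrapy's CloseSpider extension settings can cap a job so it stops cleanly instead of spinning forever. A settings.py fragment as a sketch (the specific values are illustrative, not recommendations):

```python
# settings.py fragment: caps for the CloseSpider extension while debugging.
CLOSESPIDER_TIMEOUT = 3600     # stop the spider after one hour of running
CLOSESPIDER_PAGECOUNT = 10000  # stop after 10,000 responses have been crawled
DEPTH_LIMIT = 5                # guard against link loops crawling ever deeper
```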


Finally, consider that the site itself could be banning your spider. The only solution in that case is to use our proxy rotator, Crawlera, to make requests from different IPs. If you're interested in learning more, please check:
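Enabling Crawlera in a Scrapy project is typically done through the scrapy-crawlera middleware (`pip install scrapy-crawlera`); a settings sketch, with the API key left as a placeholder for your own:

```python
# settings.py fragment: route requests through Crawlera via scrapy-crawlera.
DOWNLOADER_MIDDLEWARES = {
    "scrapy_crawlera.CrawleraMiddleware": 610,
}
CRAWLERA_ENABLED = True
CRAWLERA_APIKEY = "<your API key>"
```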

What is Crawlera?


Best regards!


Pablo
