Scrapy Cloud Advanced Topics

Here you'll find articles on advanced settings and features of Scrapy Cloud.

Publishing and sharing datasets
Note: Portia is no longer available for new users. It has been disabled for all the new organisations from August 20, 2018 onward. You’ve gone through the...
Thu, 18 Oct, 2018 at 11:17 AM
Deploying Custom Docker images on Scrapy Cloud
⚠ Note: this is an advanced feature in beta stage. Use with care. Scrapy Cloud runs your spiders in Docker containers and allows you to build custom images...
Tue, 19 Sep, 2017 at 12:40 PM
Errors while deploying Custom Image to Scrapy Cloud
There are some known issues when deploying custom Docker images to Scrapy Cloud. We are actively working on getting them resolved, but until it's co...
Thu, 18 Apr, 2019 at 6:34 PM
Inspecting your spider's runtime environment with the Job Console
With the job console you can open a Unix shell directly into the container where your job is running. Once in the console, you can perform tasks such as: ...
Mon, 11 Jun, 2018 at 5:42 PM
Deploying private dependencies to Scrapy Cloud
This article presents some approaches to using private dependencies in your Scrapy Cloud project. Using requirements.txt Let's assume your...
Mon, 22 May, 2017 at 4:05 PM
Configuring scraped fields
On the Job page you will find the Fields box, which is also available in the items browser (but hidden by default). It looks like this: The Fields b...
Tue, 28 Mar, 2017 at 12:48 PM
Versioning your deploys to Scrapy Cloud
Shub assigns a version number to your project every time you make a deploy to Scrapy Cloud. The version assigned depends on whether you are using a VCS or n...
Mon, 15 May, 2017 at 3:51 PM
Reset db using DeltaFetch Add-on
On some occasions you may experience errors using DeltaFetch due to its interactions with files in S3. Your output may show errors like this: DBRunReco...
Wed, 20 Sep, 2017 at 11:55 AM
Using a custom proxy in a Scrapy spider
Make use of Scrapy's standard HttpProxyMiddleware by specifying the proxy meta value and the authorization header in a Scrapy Request, for example: imp...
Fri, 9 Aug, 2019 at 11:42 AM
Incremental crawls with Scrapy and DeltaFetch in Scrapy Cloud
NOT TO BE CONFUSED WITH THE DELTAFETCH AND DOTSCRAPY PERSISTENCE ADDONS The purpose of this is to avoid requesting pages that have already scraped items...
Thu, 14 Jun, 2018 at 9:37 AM