Scrapy Cloud jobs run in containers. These containers can be of different sizes defined by Scrapy Cloud units.
Each Scrapy Cloud unit provides:
- 1 GB of RAM
- 2.5 GB of disk space
- 1x CPU
- 1 concurrent crawl slot
Resources available to a job are proportional to the number of units allocated to it. For example, a job started with 2 units has 2 GB of RAM, 5 GB of disk space, and 2x CPU.
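The proportionality rule above can be sketched as a small helper. This is a hypothetical illustration (not part of any Scrapy Cloud API); it just scales the per-unit figures listed above:

```python
# Hypothetical helper, not a real Scrapy Cloud API: computes the resources a
# job receives from the number of units allocated, using the per-unit figures
# above (1 GB RAM, 2.5 GB disk, 1x CPU).

def resources_for(units: int) -> dict:
    """Return the resources available to a job running with `units` units."""
    if units < 1:
        raise ValueError("a job needs at least 1 unit")
    return {
        "ram_gb": 1.0 * units,   # 1 GB of RAM per unit
        "disk_gb": 2.5 * units,  # 2.5 GB of disk space per unit
        "cpus": 1 * units,       # 1x CPU per unit
    }

print(resources_for(2))  # a 2-unit job: 2 GB RAM, 5 GB disk, 2x CPU
```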
Scrapy Cloud pricing is based on the number of Container Units that you purchase. You can purchase as many units as you need and allocate them to your spiders.
One unit is given for free upon sign-up, and you can use it for as long as you want. Free subscriptions have a data retention period of 7 days and a maximum runtime per job.
Once you purchase your first unit, your subscription is upgraded. You still have a single unit, but you get these extra benefits:
- 120 days of data retention
- No runtime limit for your jobs
- Private support
- Ability to deploy custom Docker images to Scrapy Cloud
Check the Scrapy Cloud Pricing page for more details.
You can choose how many units to allocate to each job. For example, if you have a large spider that requires 4 GB of memory to run, you can assign it 4 units, while your smaller jobs keep using 1 unit each.
Here's a hypothetical scenario to illustrate: suppose you have a Scrapy Cloud subscription of 8 units, and a project with large spiders (which require 4 GB to run) and small spiders (which run with 1 GB). With your 8 units, you could have the following spiders running concurrently:
- 1 large spider of 4 GB, and 4 small spiders of 1 GB each
- 2 large spiders of 4 GB
- 8 small spiders of 1 GB each
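The combinations above can be checked with a few lines of arithmetic. In this sketch (names are illustrative), a 4 GB spider needs 4 units and a 1 GB spider needs 1 unit, since each unit provides 1 GB of RAM:

```python
# Hypothetical check: verify that each combination of concurrent spiders fits
# in an 8-unit subscription. Each unit provides 1 GB of RAM, so a 4 GB spider
# needs 4 units and a 1 GB spider needs 1 unit.

SUBSCRIPTION_UNITS = 8
LARGE, SMALL = 4, 1  # units needed per large / small spider

combinations = [
    {"large": 1, "small": 4},  # 1 large spider + 4 small spiders
    {"large": 2, "small": 0},  # 2 large spiders
    {"large": 0, "small": 8},  # 8 small spiders
]

for combo in combinations:
    used = combo["large"] * LARGE + combo["small"] * SMALL
    assert used <= SUBSCRIPTION_UNITS
    print(combo, "uses", used, "units")
```

Each combination uses exactly the 8 units available, which is why no further spiders could run concurrently alongside them.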
Jobs in Scrapy Cloud are limited to 6 units. If you need to run larger jobs, please contact support.