You can select the runtime environment for your spiders from a list of pre-defined stacks. Each stack is a runtime environment that ships specific versions of a set of packages.
For example, if you need your spiders to use specific versions of Scrapy and Python (let's say Scrapy 1.2 + Python 3), all you have to do is set the proper stack in your project's scrapinghub.yml file. For example:

```yaml
projects:
  default: 12345
stacks:
  default: scrapy:1.2-py3
```
What does the stack name mean?
Stack names consist of a name, a version and, in some cases, a release date. For example:
- scrapy:1.3-py3: contains Scrapy version 1.3 running on Python 3.
- scrapy:1.1: contains Scrapy 1.1 running on Python 2.7 (when the -py3 suffix is not present, the stack runs Python 2.7).
- scrapy:1.1-py3-20170421: contains Scrapy 1.1 on Python 3. The date indicates when the stack was released.
Where can I see which stack is used in my project?
Go to your project's Code & Deploys page, select the latest build and check the value of the Stack property.
My stack is called hworker:20160708. What does that mean?
The hworker stack is used by default for organizations that were created before 2016-06-28 12:00 UTC and is maintained for compatibility reasons only. If your new projects are getting this stack, please define a more modern one (scrapy:1.3, for example) in your scrapinghub.yml file, as described above.
What's the default stack used for my deploys?
That depends on when your organization was created:
- before 2016-06-28 12:00 UTC: hworker.
- after 2016-06-28 12:00 UTC: right now it's scrapy:1.3, but this is usually updated with each Scrapy major release.
Where can I find the list of available stacks?
There are two main types of stacks:
- scrapy: features the latest stable version of Scrapy along with all the basic requirements that you need to run a full featured Scrapy spider.
- hworker: provides backward compatibility with the legacy Scrapy Cloud platform. This stack is used by default for organizations that were created before 2016-06-28 12:00 UTC, but it is not recommended for new organizations.
What packages are installed in a given stack?
To see the packages in a given stack, look at the requirements.txt file in the stack's repository on GitHub, on the branch that corresponds to the stack version you're interested in.
For example, say you want to know which packages are installed in the scrapy:1.3-py3 stack:
- Go to the Scrapy stack repository: https://github.com/scrapinghub/scrapinghub-stack-scrapy
- Using the GitHub UI, select the branch called branch-1.3-py3
- Check the dependencies listed in that branch's requirements.txt file
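The requirements.txt in a stack branch is a plain pip requirements file with pinned versions. As a rough sketch of what you'd find there (the entries below are illustrative, not the actual contents of any stack), you could read the pinned versions like this:

```python
# Sketch: extract pinned dependencies from a stack's requirements.txt.
# The sample content is made up for illustration; check the real file
# on the branch of the stack repository you care about.
sample = """\
Scrapy==1.3.3
lxml==3.7.3
# comments and blank lines are ignored

Twisted==16.6.0
"""

def parse_requirements(text):
    """Return {package: version} for lines pinned with '=='."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, version = line.partition("==")
        pins[name] = version
    return pins

print(parse_requirements(sample))
```

Running this prints a mapping of package names to pinned versions, which is all a stack's requirements file encodes.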
What if I need extra packages?
If your project depends on Python packages that are not shipped in any of the stacks, check out how to deploy dependencies with your projects. If your project has non-Python dependencies (binary ones, for example), check out how to deploy a custom Docker image to Scrapy Cloud.
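For Python dependencies, the usual approach is to point scrapinghub.yml at a pip requirements file. As a sketch (the exact key may vary with your shub version, so verify against the shub documentation):

```yaml
projects:
  default: 12345
stacks:
  default: scrapy:1.3-py3
requirements:
  file: requirements.txt
```

The packages listed in requirements.txt are then installed on top of the stack at deploy time.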
More about the stack versioning
Stacks have regular (scrapy:1.3-20170322) and major (scrapy:1.3) versions. Users are encouraged to specify regular stack versions: once released they never change, so a package upgrade in the stack will never break the project. This gives our customers full control over the dependencies and the migration process: they can upgrade at their own convenience by changing the stack version whenever they are ready. Major versions are upgraded with each release, so scrapy:1.3-py3 will always point to the latest regular release with Scrapy 1.3 running on Python 3.
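Pinning a regular (dated) release just means referencing the full stack name in scrapinghub.yml. For instance, using the dated stack name mentioned earlier:

```yaml
projects:
  default: 12345
stacks:
  default: scrapy:1.1-py3-20170421
```

With this in place, future stack releases won't affect your builds until you change the stack name yourself.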