It's possible to build a custom Docker image with a given Portia project bundle.
This can be useful in cases such as:
- SH-hosted Portia doesn't work for you for some reason
- you need a custom Portia version
To make it work, your working directory must contain at least these four files:
1) Portia project zipped bundle (named project-slybot.zip in this guide)
For example, it can be downloaded from a hosted Portia instance with:
    curl --user "$SHUB_APIKEY:" \
        "https://portia-beta.scrapinghub.com/api/projects/$PROJECT_ID/download" > project-slybot.zip
2) scrapinghub.yml with an images section (more details here)
    projects:
      default: PUT_YOUR_PROJECT_ID_HERE
    requirements:
      file: requirements.txt
    images:
      default: SomeDockerhubAccount/SomeDockerhubRepository
3) Dockerfile to build and deploy an image to Scrapy Cloud
    FROM scrapinghub/scrapinghub-stack-portia:0.13
    ENV SHUB_ENTRYPOINT=''
    COPY project-slybot.zip /scrapy/
    COPY list-spiders /usr/local/sbin/list-spiders
    RUN chmod +x /usr/local/sbin/list-spiders
You can use any other Portia stack version from those listed here as the base image.
Additional note: the SHUB_ENTRYPOINT line is important, because it resets the entrypoint set by the original Portia stack. Without it, the stack's entrypoint will try to download a Portia bundle when a job starts.
4) list-spiders script
    #!/bin/sh
    /usr/local/bin/portiacrawl /scrapy/project-slybot.zip
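Before building, it may help to confirm that all four files are in place. The following is a minimal sketch, not part of shub-image: check_workdir is a hypothetical helper name, and the file names are the ones used in this guide.

```shell
#!/bin/sh
# Hypothetical helper (not provided by shub-image): list which of the four
# files required by this guide are missing from a working directory.
# Returns 0 when all files are present, 1 otherwise.
check_workdir() {
    dir="${1:-.}"   # directory to check, defaults to the current one
    missing=0
    for f in project-slybot.zip scrapinghub.yml Dockerfile list-spiders; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    return "$missing"
}
```

Calling check_workdir with the working directory as its argument prints one line per missing file, so it can be used to guard the upload step.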
Now you have everything ready to build and deploy your project with:
shub-image upload --username SomeName --password SomePassword