You’ve gone through the hassle of building your spider (be it with Scrapy or Portia), carefully tuning it to work flawlessly. You’ve spent a lot of time debugging it and are finally happy with how it works. Now what?
What better reward than sharing the proud outcome of all that sweat and tears: the data! Scrapinghub lets you publish your spider's data as a dataset and share it with others.
There are three types of datasets: Public, Restricted and Private.
- Public datasets can be accessed by anyone (even without a Scrapinghub account) and are indexed by search engines.
- Restricted datasets are accessible only to users you explicitly grant access to (they need a Scrapinghub account).
- Private datasets can be accessed only by members of your organization.
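The three visibility rules above can be sketched as a small access check. This is only an illustrative model, not Scrapinghub's implementation or API: the names `Visibility`, `User`, `Dataset`, `granted_users` and `can_access` are all hypothetical, and the sketch assumes that a Restricted dataset is visible only to explicitly granted users, exactly as the list states.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional, Set


class Visibility(Enum):
    PUBLIC = "public"
    RESTRICTED = "restricted"
    PRIVATE = "private"


@dataclass
class User:
    # Hypothetical account record; `organizations` holds organization names.
    name: str
    organizations: Set[str] = field(default_factory=set)


@dataclass
class Dataset:
    # Hypothetical dataset record; `granted_users` only matters when Restricted.
    name: str
    owner_org: str
    visibility: Visibility
    granted_users: Set[str] = field(default_factory=set)


def can_access(dataset: Dataset, user: Optional[User]) -> bool:
    """Return True if `user` may view `dataset`. `user` is None for a visitor without an account."""
    if dataset.visibility is Visibility.PUBLIC:
        # Public: anyone, even without an account.
        return True
    if user is None:
        # Restricted and Private both require an account.
        return False
    if dataset.visibility is Visibility.RESTRICTED:
        # Restricted: only users who were explicitly granted access.
        return user.name in dataset.granted_users
    # Private: only members of the owning organization.
    return dataset.owner_org in user.organizations


# Usage: one user inside the owning organization, one outside it.
alice = User("alice", {"acme"})
bob = User("bob")
restricted = Dataset("prices", "acme", Visibility.RESTRICTED, {"bob"})
private = Dataset("leads", "acme", Visibility.PRIVATE)
print(can_access(restricted, bob), can_access(private, bob), can_access(private, alice))
```

Modelling the anonymous visitor as `None` keeps the "no account needed" case for Public datasets explicit, while the other two levels fail fast without an account.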
To publish a dataset, go to the Scrapinghub job page, select the Items tab and click Publish. Select “Publish Dataset”:
Then, name your dataset and click “Publish”:
You will then be taken to the dataset’s main page:
Here you can edit the logo, name and description by moving the mouse over the respective fields. You can only edit datasets owned by organizations you belong to.
Click on “Configure dataset” to open the dataset configuration window:
In this window you can see which job the dataset is associated with (Source section) and configure both the dataset visibility (Public, Restricted, Private) and data access, in case you want to limit access to a subset of the data.
In the top navigation bar, you will see a new menu called “Datasets”, which gives access to the dataset catalog browser and convenient links to the datasets you’ve recently visited: