Welcome to the Scrapinghub feedback & support site! We discuss all things related to Scrapy Cloud, Portia and Crawlera. You can participate by reading posts, asking questions, providing feedback, helping others, and voting on the best questions & answers. Scrapinghub employees regularly pop in to answer questions, share tips, and post announcements.
maniac103 4 months ago in Datasets • updated by Pablo Hoffman (Director) 4 months ago 2

I have a couple of spiders whose results I want to automatically publish to a public dataset in the dataset catalog, overwriting the data from the previous spider run. I can't seem to do that, because datasets appear to be tied to individual jobs/runs rather than to the spider in general. Am I missing something?

My end goal is to fetch the data from an app, so I need a static URL for the last run's data. Unfortunately, the method described in [1] doesn't work for me: it requires putting my API key (which grants read/write access to the project) into the URL, and that is not an option in this (open source) app.
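For what it's worth, the key doesn't have to appear in the URL itself. Here is a minimal sketch, assuming the Scrapy Cloud storage endpoint layout (`https://storage.scrapinghub.com/items/:project/:spider/:job`) and placeholder project/spider/job IDs, that keeps the key in a Basic auth header instead; this still doesn't help a fully public open source app, since the key would have to ship with it:

```python
import base64
from urllib.request import Request, urlopen  # urlopen only needed for the real fetch

def items_url(project_id: int, spider_id: int, job_id: int) -> str:
    # Assumed endpoint layout; the IDs passed in below are placeholders.
    return (f"https://storage.scrapinghub.com/items/"
            f"{project_id}/{spider_id}/{job_id}?format=json")

def authorized_request(url: str, api_key: str) -> Request:
    # Send the API key as the HTTP Basic auth username (empty password)
    # instead of embedding it in the URL, so it never appears in logs
    # or shareable links.
    token = base64.b64encode(f"{api_key}:".encode()).decode()
    return Request(url, headers={"Authorization": f"Basic {token}"})

req = authorized_request(items_url(46278, 1, 12), "MY_API_KEY")
# urlopen(req).read() would perform the actual download.
```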

Thanks for your help.

[1] http://help.scrapinghub.com/scrapy-cloud/fetching-latest-spider-data


Hi Maniac,

This is a feature we have discussed, and although we plan to incorporate it at some point, we can't provide an ETA yet.

I will forward this request to the product team.

nyov 12 months ago in Datasets • updated 8 months ago 7

Downloads are broken when fetching compiled datasets.

The JSON objects are concatenated with nothing between them (not even newlines), instead of being joined by commas, and no enclosing list is built, so the downloaded file is not valid JSON:
$ json_xs -t none <items-360.json
garbage after JSON object, at character offset 238 (before "{"_key":"46278/1/12/...") at /usr/bin/json_xs line 177, <STDIN> line 1.
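The failure is easy to reproduce with any strict JSON parser, not just json_xs. A minimal sketch (the record bodies and `_key` values below are made up, not actual dataset contents):

```python
import json

# Mimics the broken download: two JSON objects concatenated with
# nothing between them.
broken = '{"_key": "46278/1/12/0", "title": "a"}{"_key": "46278/1/12/1", "title": "b"}'

try:
    json.loads(broken)
except json.JSONDecodeError as e:
    # Python's equivalent of json_xs's "garbage after JSON object":
    # raises JSONDecodeError with msg "Extra data".
    print("invalid:", e.msg)

# What a compiled .json download should contain: a comma-separated list.
fixed = '[{"_key": "46278/1/12/0", "title": "a"}, {"_key": "46278/1/12/1", "title": "b"}]'
items = json.loads(fixed)
print(len(items))  # 2
```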

This issue has been fixed, thanks for reporting!