
Extracting data from multiple jobs at once

Is there a way to extract the data from multiple runs (of the same job) at once, in a single spreadsheet?




Best Answer

Hi,


Yes, it is possible using the Items API: https://doc.scrapinghub.com/api/items.html


Retrieve all items from a given spider

HTTP:

$ curl -u APIKEY: https://storage.scrapinghub.com/items/<projectid>/<spiderid>
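
Since you want the data in a single spreadsheet, the same endpoint can also return CSV. A minimal sketch, assuming your items have fields named 'title' and 'price' (substitute your own field names; the format and fields parameters are described in the Items API docs):

# 'title' and 'price' are hypothetical field names; list your own item fields here
$ curl -u APIKEY: "https://storage.scrapinghub.com/items/<projectid>/<spiderid>?format=csv&fields=title,price" > items.csv

This pulls the items from every run of that spider, so all jobs end up in one file.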



I keep getting an unauthorized response. I have the correct API key.


I am not sure what password I need to be using. The one associated with my Scrapinghub account? Do I need to publish a dataset first?


Yes, the one associated with your Scrapinghub account; you can find it here: https://app.scrapinghub.com/account/apikey. Also make sure to include the ':' at the end of the API key, since the password is blank.
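
If curl's -u option keeps tripping you up, the docs also mention that the key can be passed as a URL parameter instead. A sketch (check the API docs to confirm it applies to your endpoint):

$ curl "https://storage.scrapinghub.com/items/<projectid>/<spiderid>?apikey=APIKEY"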

Still getting unauthorized. I did notice that when I'm logged into the website (as the organization/project owner) and try to browse to the URL, I still get unauthorized, so I assume I have the wrong path.


I am using:


https://storage.scrapinghub.com/items/<project ID>/<spider id>


My project ID is a six-digit number.


I couldn't find an explicit reference to what the spider ID is, but by drilling into a job from the dashboard I've determined that it is the number 1 or 2 (since I only have two spiders). I also tried the spider name, which didn't work either.
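
From the docs it looks like job keys have the form <projectid>/<spiderid>/<jobid>, so listing recent jobs should confirm the numeric spider ID. A sketch using the JobQ API from the same docs (assuming the same API key works there):

$ curl -u APIKEY: https://storage.scrapinghub.com/jobq/<projectid>/list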

How are you providing the authorization (API key)?

curl -u <API KEY>: <url>


also tried:

curl -u <API KEY>:<PASSWORD> <path>

That solved my problem. Thanks!
