Answered

Periodic Jobs Dataset Update

Hi People,


my Portia spider jobs run fine, periodically every hour. What I don't understand yet is how to update the dataset with the data from those periodic jobs. The dataset export seems to be tied to a specific job ID, but what I want is to automatically update a dataset with the data of the latest completed job. Is that possible?


cheers,

t.


Hi Tillt,


Please check our documentation; it could help.


Let us know if you have further questions,


Best,


Pablo

Hey Pablo, 


I have read the documentation page you posted and, unless I missed something essential, it does not answer my question. Periodic jobs get new ID numbers every time they are executed. The example in the documentation describes how to publish the data of one specific job ID. I managed to do that, but my question is how to automatically update the published dataset with the data from subsequent executions of the job.


Thanks.

Answer

Hello tilllt,


The official way is to use the UI for publishing the dataset. 


There is another option: get the job IDs using https://doc.scrapinghub.com/api/jobs.html#jobs-list-json-jl and then send a PATCH request to https://app.scrapinghub.com/api/v2/datasets/<datasetid> with a JSON body of {"job": "xxx/x/xx"}. With python-requests you use requests.patch(); with curl it's "curl -X PATCH".
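For example, a minimal sketch with python-requests could look like the following. The PROJECT_ID, SPIDER_NAME and DATASET_ID values are placeholders, and the exact layout of the jobs list response is an assumption you would verify against your own account; the API key is sent as the basic-auth username:

# Minimal sketch, assuming python-requests is installed.
# PROJECT_ID, SPIDER_NAME, DATASET_ID and the response layout are
# placeholders/assumptions -- adapt them to your project.
import requests

API_KEY = "YOUR_API_KEY"
PROJECT_ID = "12345"
SPIDER_NAME = "myspider"
DATASET_ID = "your-dataset-id"

# 1) Fetch the most recent finished job of the spider.
jobs = requests.get(
    "https://app.scrapinghub.com/api/jobs/list.json",
    params={"project": PROJECT_ID, "spider": SPIDER_NAME,
            "state": "finished", "count": 1},
    auth=(API_KEY, ""),
)
jobs.raise_for_status()
latest_job_id = jobs.json()["jobs"][0]["id"]  # e.g. "12345/1/678"

# 2) Point the dataset at that job (unofficial endpoint, may change).
patch = requests.patch(
    "https://app.scrapinghub.com/api/v2/datasets/" + DATASET_ID,
    json={"job": latest_job_id},
    auth=(API_KEY, ""),
)
patch.raise_for_status()

Running something like this on a schedule after each spider run should keep the published dataset pointing at the newest data. The curl equivalent would be along the lines of: curl -u YOUR_API_KEY: -X PATCH -H "Content-Type: application/json" -d '{"job":"12345/1/678"}' https://app.scrapinghub.com/api/v2/datasets/<datasetid>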

Or you can send the request from within the job itself as it finishes: inside a job, the job ID is available in the SHUB_JOBKEY environment variable.
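A sketch of that variant, e.g. called from a Scrapy spider's closed() callback (again, the dataset ID and API key are placeholders you would supply):

# Sketch of sending the PATCH from inside the running job.
# SHUB_JOBKEY is set by the platform; DATASET_ID and API_KEY
# are placeholders.
import os
import requests

def publish_this_job(dataset_id, api_key):
    job_id = os.environ["SHUB_JOBKEY"]  # e.g. "12345/1/678"
    resp = requests.patch(
        "https://app.scrapinghub.com/api/v2/datasets/" + dataset_id,
        json={"job": job_id},
        auth=(api_key, ""),
    )
    resp.raise_for_status()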


But please be aware that this API (api/v2/datasets/) is not officially supported and can change without any notice.
