Answered

We need a static URL for scraping results.

We need a static URL for scraping results. Right now the URL changes after every run. What is the solution for this?


<project_id>/<spider_id>/<job_id> 

How can we make <job_id> default to the latest job?


Best Answer

You can use the Scrapinghub Jobs API together with the python-scrapinghub library. The library talks to Scrapy Cloud, so you can use it in the spider to get the job list and pick the latest job.


Thanks,

Thriveni.
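
To make the suggestion above concrete, here is a rough sketch of picking the latest job with python-scrapinghub. It is not an official recipe: the helper names (job_number, latest_job_key, fetch_latest_items) are made up for this example, and it assumes that job keys have the <project_id>/<spider_id>/<job_id> form from the question and that the highest <job_id> belongs to the most recent run.

```python
def job_number(job_key):
    """Return the numeric <job_id> part of a '<project_id>/<spider_id>/<job_id>' key."""
    return int(job_key.split("/")[2])


def latest_job_key(job_keys):
    """Pick the key with the highest <job_id>, or None if there are no jobs.

    Assumption: job ids grow with every run, so the highest id is the latest job.
    """
    return max(job_keys, key=job_number, default=None)


def fetch_latest_items(apikey, project_id, spider_name):
    """Return the items of the most recent finished job for one spider."""
    # Imported here so the two helpers above work without the library installed.
    from scrapinghub import ScrapinghubClient

    client = ScrapinghubClient(apikey)
    spider = client.get_project(project_id).spiders.get(spider_name)
    # Each job summary is a dict; its 'key' entry holds the job key.
    keys = [summary["key"] for summary in spider.jobs.iter(state="finished")]
    key = latest_job_key(keys)
    if key is None:
        return []
    return list(client.get_job(key).items.iter())
```

Calling fetch_latest_items with your API key, project number and spider name should then always return the newest results, without hard-coding a <job_id>.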


You can also fetch the data from the latest completed job in CSV format using this URL:

https://app.scrapinghub.com/api/items.csv?project=PROJECTNUMBER&spider=SPIDERNAME&include_headers=1&fields=FIELDNAME1,FIELDNAME2&apikey=APIKEY


You need to replace:


  • PROJECTNUMBER with your project number
  • SPIDERNAME with your spider name
  • FIELDNAME1, FIELDNAME2 with the names of the fields, in the order you want them to appear in the CSV columns
  • APIKEY with your API key
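
If you build this URL from a script, the substitutions listed above could be done along these lines. This is only a sketch: the function name latest_items_csv_url and the example values are made up, and only the endpoint and query parameters come from the URL above.

```python
from urllib.parse import urlencode

# Endpoint taken from the URL in this thread.
ITEMS_CSV_ENDPOINT = "https://app.scrapinghub.com/api/items.csv"


def latest_items_csv_url(project, spider, fields, apikey):
    """Build the static CSV URL for the latest completed job of a spider."""
    params = {
        "project": project,
        "spider": spider,
        "include_headers": 1,
        # Field order here is the column order in the CSV.
        "fields": ",".join(fields),
        "apikey": apikey,
    }
    # safe="," keeps the commas between field names unescaped.
    return ITEMS_CSV_ENDPOINT + "?" + urlencode(params, safe=",")


# Hypothetical values, for illustration only.
url = latest_items_csv_url(12345, "myspider", ["title", "price"], "APIKEY")
print(url)
```

The resulting URL stays the same between runs, so it can be bookmarked or handed to any HTTP client or spreadsheet import.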


