Simon Mosk-Aoyama
Hello,
Is there any way to create or modify periodic jobs programmatically? I'd like to create a process where I autogenerate spiders and commit them to github, where they are pulled down automatically by ScrapingHub.
Then I'd like to script modifying a periodic job to add new spiders to the job to be run on a periodic basis.
Is this possible? The Jobs API only seems to be for one-off jobs (https://doc.scrapinghub.com/api/jobs.html).
thanks!
hareesh
any update on this requirement?
george8
We are also looking at something similar.
We want to scrape an IG post page by registering a spider for a specific post link and running it every X interval. We also need to remove the job in some cases, because we can have hundreds of posts to scrape and this needs to happen dynamically.
Workflow:
1. Register a spider in Scrapy Cloud and start scraping every X interval
2. If applicable, delete the spider's job
3. Retrieve the extracted data, or be notified that the job has finished along with the data
Any documentation on the above that would help us accomplish this scenario would be appreciated.
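While the Periodic Jobs API is pending, one workaround is to keep the schedule in your own code and use Scrapy Cloud only for one-off runs. The register/remove/run-every-X workflow above can be sketched as a small in-process scheduler; all names here (`PostScheduler`, the `run_job` hook) are hypothetical, and `run_job` would wrap a Jobs API call that starts a one-off job:

```python
import time


class PostScheduler:
    """Track post links and run a scrape for each one every `interval` seconds.

    `run_job` is any callable that starts a one-off Scrapy Cloud job for a URL
    (e.g. a wrapper around the Jobs API) and returns a job id.
    """

    def __init__(self, run_job, interval, clock=time.monotonic):
        self.run_job = run_job
        self.interval = interval
        self.clock = clock
        self._next_run = {}  # post_url -> next time the post is due

    def add(self, post_url):
        # Step 1: register the post; it becomes due immediately.
        self._next_run.setdefault(post_url, self.clock())

    def remove(self, post_url):
        # Step 2: deregister the post; no further jobs are scheduled for it.
        self._next_run.pop(post_url, None)

    def tick(self):
        # Start a job for every post that is due, then reschedule it.
        started = []
        now = self.clock()
        for url, due in list(self._next_run.items()):
            if now >= due:
                started.append(self.run_job(url))
                self._next_run[url] = now + self.interval
        return started
```

Calling `tick()` regularly (from cron or a loop) starts a job for every registered post that is due; removing a post simply stops future runs. Step 3 would then poll the job's state and fetch its items through the Jobs/Items API once it finishes.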
Aaron Cowper said over 1 year ago
Hi, any update on when this feature will be available?
1 person likes this
nestor said almost 2 years ago
This is a very popular request and we are already working on an API for Periodic Jobs, it should be ready sometime this year. For now, the only option is to do it via the UI.
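In the meantime, the schedule can live outside Scrapy Cloud (cron, Celery beat, etc.) and trigger one-off runs through the Jobs API linked in the question. A minimal stdlib-only sketch, assuming the documented `run.json` endpoint with the API key as the basic-auth username (the key, project ID, and spider name below are placeholders):

```python
import base64
import json
import urllib.parse
import urllib.request

RUN_URL = "https://app.scrapinghub.com/api/run.json"


def build_run_request(api_key, project_id, spider_name, **spider_args):
    """Build an authenticated POST request for the Jobs API run.json endpoint."""
    fields = {"project": str(project_id), "spider": spider_name, **spider_args}
    body = urllib.parse.urlencode(fields).encode()
    # Jobs API auth: API key as the basic-auth username, empty password.
    token = base64.b64encode(f"{api_key}:".encode()).decode()
    return urllib.request.Request(
        RUN_URL,
        data=body,
        headers={"Authorization": f"Basic {token}"},
        method="POST",
    )


def run_spider(api_key, project_id, spider_name, **spider_args):
    """Start a one-off job and return its job id."""
    req = build_run_request(api_key, project_id, spider_name, **spider_args)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["jobid"]
```

Calling `run_spider` from a cron entry every X minutes approximates a periodic job per spider, and adding or dropping a crontab line plays the role of creating or deleting that periodic job.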