
Re-run Spider, ignoring previous items

Hi Community. My spider has 14,000 URLs to crawl, and there are now 500 new URLs in the list. How do I re-run my spider while ignoring the previously scraped URLs?


Best Answer

You can use DeltaFetch to crawl only the new URLs, as described in Incremental crawls with Scrapy and DeltaFetch in Scrapy Cloud.
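
For a plain Scrapy project, enabling it might look roughly like the settings below. This is a minimal sketch, assuming the scrapy-deltafetch package is installed and that the lines go in the project's settings.py; the priority value 100 is the one the package's README suggests. On Scrapy Cloud the .scrapy directory also has to persist between runs (the DotScrapy Persistence addon covers that), which is not shown here.

```python
# settings.py (sketch; assumes `pip install scrapy-deltafetch` has been run)

# Register the DeltaFetch spider middleware by its import path.
SPIDER_MIDDLEWARES = {
    "scrapy_deltafetch.DeltaFetch": 100,
}

# Turn the middleware on. It records which requests produced items
# and drops those requests on later runs.
DELTAFETCH_ENABLED = True
```

As far as I understand it, DeltaFetch only skips requests that the spider yields from its callbacks and that produced items in an earlier run; requests generated from start_urls are still sent. To force a full re-crawl, the package accepts a spider argument, for example `scrapy crawl myspider -a deltafetch_reset=1` (the spider name `myspider` is a placeholder).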


Regards,

Thriveni




For Portia, you would need to set up the DeltaFetch addon, as described in Deltafetch Addon.

Excellent, think it works. Thanks!
