I'd like to run my Portia crawler for a specific website once every hour, but I need it to fetch only the fresh articles, in other words to skip the already-crawled duplicates. As I understand it, the DeltaFetch addon ignores duplicates within the same crawling job, but not across subsequent jobs. Is that correct?
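For context, this is roughly how DeltaFetch is wired up in a plain Scrapy project, based on the scrapy-deltafetch plugin. This is just a sketch; whether the fingerprint database survives between jobs depends on the `.scrapy` directory persisting across runs, which is an assumption about your deployment:

```python
# settings.py - sketch of enabling the scrapy-deltafetch spider middleware
SPIDER_MIDDLEWARES = {
    'scrapy_deltafetch.DeltaFetch': 100,
}
DELTAFETCH_ENABLED = True
# DeltaFetch keeps its fingerprint DB under the project's .scrapy dir by default.
# Requests that already produced items are only skipped in later jobs if that
# directory is persisted between runs.
```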
If that is the case, I was thinking of somehow adding the crawl date (which is shown in the UI next to each item number in the job's item list) to the scraped data, so that when I download the data I can identify duplicates.
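As a fallback, a small item pipeline could stamp each item with the crawl time. A minimal sketch; the pipeline name and the `crawl_date` field are my own inventions, not anything Portia provides, and it assumes the item accepts arbitrary fields (e.g. a dict-based item):

```python
# pipelines.py - hypothetical pipeline that stamps each item with the crawl time
from datetime import datetime, timezone

class CrawlDatePipeline:
    def process_item(self, item, spider):
        # ISO 8601 UTC timestamp, so duplicates can be sorted/deduped offline
        item['crawl_date'] = datetime.now(timezone.utc).isoformat()
        return item
```

It would be enabled with something like `ITEM_PIPELINES = {'myproject.pipelines.CrawlDatePipeline': 300}` in the settings (the module path is hypothetical).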
Anyone got ideas/experience on this front?
Thanks @nestor, in the meantime I've also managed to inject the crawl date into the scraped items with Magic Fields.
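For anyone finding this thread later, the Magic Fields setup looks roughly like this, based on the scrapy-magicfields plugin; the `timestamp` field name is my own choice:

```python
# settings.py - sketch of adding a crawl timestamp via scrapy-magicfields
SPIDER_MIDDLEWARES = {
    'scrapy_magicfields.MagicFieldsMiddleware': 100,
}
MAGIC_FIELDS = {
    'timestamp': '$time',  # adds the crawl time to every scraped item
}
```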