Answered

about Portia

Hello,
In Portia, I have set up a spider and given it samples, and it fetched the data I needed.
Now:
1. How can I tell the spider how often to recrawl that link and update the data?
2. How do I tell it to ignore repeated data?
Best regards,
Amin






Best Answer

Hi Amin,


For (1), how about using periodic jobs? That could do the trick.


For (2), you can set the option SLYDUPEFILTER_ENABLED to 0 or another value under RAW settings (in the spider settings).
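
To illustrate what a duplicate filter of this kind does in general, here is a minimal Python sketch. It is not Portia's actual implementation: the names `item_fingerprint` and `drop_duplicates` are made up for this example, and the `seen` set only lives for a single run, so it says nothing about whether Portia also compares against data from earlier crawls.

```python
import hashlib
import json


def item_fingerprint(item):
    """Return a stable hash of an item's fields (hypothetical helper)."""
    # sort_keys makes the hash independent of field order.
    payload = json.dumps(item, sort_keys=True, default=str)
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def drop_duplicates(items):
    """Yield each distinct item once, in first-seen order (hypothetical helper)."""
    seen = set()  # fingerprints seen during this run only
    for item in items:
        fingerprint = item_fingerprint(item)
        if fingerprint in seen:
            continue  # same field values already emitted, skip it
        seen.add(fingerprint)
        yield item
```

For example, passing a list that contains the same record twice to `drop_duplicates` yields that record only once.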


I hope this helps,


Pablo


About (1):

     Thanks a lot, yes, it works. But does it create a new database after each crawl, or does it add to the previous database?

About (2):

     In the spider settings we have 2 tabs: 1. Settings 2. RAW settings

     1. Do you mean I should type this in RAW settings?   =>   SLYDUPEFILTER_ENABLED=1

     2. In the Settings tab we can select some predefined parameters. Where can I read the help for these parameters so that I can set them?

     3. Does SLYDUPEFILTER_ENABLED compare only the data within the current crawl, or does it also consider data previously scraped from the target site?

Sorry for asking so many questions.
