Answered
Tristan Bailey 3 weeks ago in Portia • updated by Pablo Vaz (Support Engineer) 1 week ago

Is there an api command to submit or to add to the project on the website, to limit the pages crawled?

I would like to stop after 1000 for my testing phase.

Answer

Hi Tristan, you can also try DEPTH_LIMIT, setting values of 3 to 5 for example, together with CLOSESPIDER_PAGECOUNT set to 100.
I've obtained different numbers of requests and items by changing DEPTH_LIMIT for the same CLOSESPIDER_PAGECOUNT.
Regards.
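
For reference, here is what that combination looks like in a plain Scrapy spider. This is only a sketch: the spider name and start URL are placeholders, not anything from this thread.

    import scrapy

    class TestSpider(scrapy.Spider):
        name = "test_spider"                  # placeholder name
        start_urls = ["https://example.com"]  # placeholder URL

        custom_settings = {
            # CloseSpider extension: stop once this many pages
            # have been crawled.
            "CLOSESPIDER_PAGECOUNT": 100,
            # Cap how many links deep the crawl may follow;
            # try values between 3 and 5 as suggested above.
            "DEPTH_LIMIT": 3,
        }

        def parse(self, response):
            # Follow every link; DEPTH_LIMIT keeps this from
            # recursing indefinitely.
            for href in response.css("a::attr(href)").getall():
                yield response.follow(href, callback=self.parse)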

GOOD, I'M SATISFIED

Thank you for this answer too.

Satisfaction mark by Tristan Bailey 2 weeks ago

I found CLOSESPIDER_PAGECOUNT as a setting in the Scrapy docs, but I don't see much about what can be passed through the API.

I am not seeing this working yet, so I thought I would ask for others' thoughts too.
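
When running Scrapy directly rather than through the hosted platform, these settings can also be overridden per run, either on the command line (scrapy crawl myspider -s CLOSESPIDER_PAGECOUNT=1000) or from a script. A minimal sketch of the script route, assuming a Scrapy project containing a spider named test_spider (a placeholder name):

    # Run a spider from a script with per-run setting overrides,
    # equivalent to: scrapy crawl test_spider -s CLOSESPIDER_PAGECOUNT=1000
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings

    settings = get_project_settings()
    settings.set("CLOSESPIDER_PAGECOUNT", 1000)  # stop after ~1000 pages
    settings.set("DEPTH_LIMIT", 3)

    process = CrawlerProcess(settings)
    process.crawl("test_spider")  # placeholder spider name
    process.start()               # blocks until the crawl finishes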

Hi,

It seems that CLOSESPIDER_PAGECOUNT set to 100 on the spider's settings page does work, but due to the way spiders operate, they finish the already-queued requests after they hit the mark.

So no new pages are spidered, but ones already found are, which is why I get about 112 pages for the 100-page setting. This makes sense now, after reading another post.
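
That overshoot roughly tracks how many requests are in flight when the limit is hit: the CloseSpider extension only signals a shutdown, and responses already downloading still complete. Lowering concurrency should tighten the gap; a minimal sketch, again with placeholder names:

    import scrapy

    class TightLimitSpider(scrapy.Spider):
        name = "tight_limit"                  # placeholder name
        start_urls = ["https://example.com"]  # placeholder URL

        custom_settings = {
            "CLOSESPIDER_PAGECOUNT": 100,
            # The default CONCURRENT_REQUESTS is 16, which is roughly
            # the dozen extra pages seen above; fewer in-flight
            # requests means fewer pages slip through after the limit.
            "CONCURRENT_REQUESTS": 4,
        }

        def parse(self, response):
            for href in response.css("a::attr(href)").getall():
                yield response.follow(href, callback=self.parse)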


Thanks,

Tristan

