
Fewitems_scraped error (Yelp test project on Portia)


My first attempt with Portia failed with the error “slybot_fewitems_scraped: The job was cancelled because it wasn’t scraping enough new data.”


I was attempting to pull Yelp reviews for my neighborhood restaurant, with columns for the reviewer, the review date, the review text, and the number of reviews that reviewer has posted. The previewed extraction looked right, so I clicked Run. I was expecting data on the 20 reviews on page 1; later on I wanted to follow the pagination links to collect all the reviews.


Instead, what I got (after an hour) was the above error and only 11 data rows with inconsistent and missing values, i.e. not at all what was previewed when I designed the project.


Solutions and feedback welcome.


Thanks!


Best Answer

Your spider is following links whose pages don't match the sample you created, so nothing is extracted from them. Ideally, you should refine the follow/exclude patterns with regular expressions so that the spider doesn't follow such links. More information on the error and a proposed alternative is explained here: https://support.scrapinghub.com/support/solutions/articles/22000200391-my-portia-spider-was-finished-due-to-slybot-fewitems-scraped-
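
For illustration, here is a rough sketch of follow/exclude patterns that would keep the crawl on a single business's review pages, assuming Yelp's usual URL layout (a /biz/<business-slug> page that paginates with a ?start= query parameter). The patterns, the example URLs, and the should_follow helper below are assumptions for the sake of the example, not taken from your project, so adjust them to the links you actually see in the job's request log:

import re

# Assumed Yelp URL shapes; verify against your own crawl before using.
FOLLOW_PATTERNS = [
    r"^https?://www\.yelp\.com/biz/[\w-]+(\?start=\d+)?$",  # business page and its paginated reviews
]
EXCLUDE_PATTERNS = [
    r"/user_details",  # reviewer profile pages: nothing there matches the review sample
    r"/biz_photos/",   # photo galleries
    r"/search\?",      # search result pages
]

def should_follow(url):
    """Return True if the URL matches a follow pattern and no exclude pattern."""
    if any(re.search(p, url) for p in EXCLUDE_PATTERNS):
        return False
    return any(re.search(p, url) for p in FOLLOW_PATTERNS)

print(should_follow("https://www.yelp.com/biz/example-restaurant?start=20"))  # True
print(should_follow("https://www.yelp.com/user_details?userid=abc123"))       # False

In Portia itself these regexes go into the spider's link crawling configuration (the follow/exclude pattern fields), not into Python code; the snippet is just a quick way to sanity-check the patterns against real URLs before re-running the job.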
