Answered

Portia only crawling 4 items ("Extracted items") despite having 39 ("Items") show up on the left

As the title and the screenshot show, why can't I scrape all the items?

It is also worth noting that an older spider of mine crawls all 39 successfully, and when I compare the two in Portia I see no difference.


Thanks guys!


[Screenshot attached]








Hi, as far as I can see, this page has both featured ads and regular ads, which is why only four items show up when you extract.


Are you looking for the featured ads or just the regular ads? Thanks!

 Hi tejashri,


Thanks for looking into this. I am actually selecting only the 39 normal ads and intentionally NOT selecting the featured ads. I know it looks like a coincidence, but the point is that the left pane shows 39 items while an actual run returns only 4.

 Quick update: I can confirm that the 4 items actually returned ARE the featured ads. I even tried selecting with a CSS path, but it makes no difference. I still get only 4 items, except that each item now contains a list of all 39 original items, separated by new lines.

This site is hectic :/

Answer

As Tejashri mentioned, there is a difference between featured and non-featured ads, which seems to be the reason only 4 items are extracted. I have created another spider, "olx.com.eg_1", which extracts 43 items. I had to use the tool twice (once for the featured ads and once for the simple ads).



 Hi thriveni,

Thanks for your reply. You are both right, there are differences between the two, the key one being:

"Featured ads" being under <table class="fixed offers breakword" summary="" width="100%" cellspacing="0" cellpadding="0">

"Ads" under <table id="offers_table" class="fixed offers breakword" summary="" width="100%" cellspacing="0" cellpadding="0">


I tried using this as a CSS path to restrict the annotation selection to just the "offers_table", with no luck; the right pane still shows 3 items.
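For anyone checking the same idea outside Portia, here is a minimal sketch of what "only the table with id offers_table" means in practice. It uses Python's standard html.parser rather than Portia itself, and the sample markup, the class name OffersTableParser, and the function extract_normal_ads are all made up for illustration; only the two opening table tags mirror the ones quoted above. It says nothing about how Portia applies a CSS path internally.

```python
from html.parser import HTMLParser

# Hypothetical, simplified markup. Only the two <table> openings mirror
# the thread; the rows and cell contents are invented for this sketch.
SAMPLE_HTML = """
<table class="fixed offers breakword" summary="" width="100%">
  <tr><td class="offer">Featured ad 1</td></tr>
  <tr><td class="offer">Featured ad 2</td></tr>
</table>
<table id="offers_table" class="fixed offers breakword" summary="" width="100%">
  <tr><td class="offer">Normal ad 1</td></tr>
  <tr><td class="offer">Normal ad 2</td></tr>
  <tr><td class="offer">Normal ad 3</td></tr>
</table>
"""


class OffersTableParser(HTMLParser):
    """Collect the text of <td> cells whose innermost table has a given id."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.table_stack = []  # id (or None) of every currently open <table>
        self.in_cell = False
        self.ads = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.table_stack.append(dict(attrs).get("id"))
        elif (tag == "td" and self.table_stack
              and self.table_stack[-1] == self.table_id):
            # Start a new ad only when the cell sits in the wanted table.
            self.in_cell = True
            self.ads.append("")

    def handle_endtag(self, tag):
        if tag == "table" and self.table_stack:
            self.table_stack.pop()
        elif tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.ads[-1] += data


def extract_normal_ads(html, table_id="offers_table"):
    """Return cell texts from the table with the given id, skipping others."""
    parser = OffersTableParser(table_id)
    parser.feed(html)
    return [ad.strip() for ad in parser.ads]


if __name__ == "__main__":
    print(extract_normal_ads(SAMPLE_HTML))
```

Both tables share the same class list, so a class-based selector would match featured and normal ads alike; the id is the only attribute here that separates them.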

I was able to reach your step and collect all 43 items (39 normal ads + 4 featured), but I only need the 39.


I've attached a video in case it helps. Could I ask you to take another look at selecting just the 39?


Thanks,

Karim

[Video attached]

I'll consider this request closed and will manage by selecting all 43. Luckily this page has a "Featured" tag, so I can filter those out later.
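In case it is useful to someone landing here, the "filter them out later" step could look roughly like the sketch below. It assumes the exported items are plain dicts and that the "Featured" label ends up in a field I am calling "tag"; both the field name and the function drop_featured are hypothetical and would need to match whatever the spider actually exports.

```python
def drop_featured(items, tag_field="tag", featured_value="Featured"):
    """Return only the items whose tag field is not the featured label.

    `tag_field` and `featured_value` are assumptions for this sketch;
    adjust them to the field names the spider really produces.
    """
    wanted = featured_value.strip().lower()
    return [
        item for item in items
        # Missing or empty tags count as normal ads.
        if (item.get(tag_field) or "").strip().lower() != wanted
    ]


if __name__ == "__main__":
    scraped = [
        {"title": "Ad A", "tag": "Featured"},
        {"title": "Ad B", "tag": ""},
        {"title": "Ad C"},
    ]
    print(drop_featured(scraped))
```

The comparison is case-insensitive and ignores surrounding whitespace, so small differences in how the label is scraped should not let featured ads slip through.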


Thanks for your support, tejashri and thriveni.
