Answered

Limiting the elements picked up

Hello, 

I've recently started using Portia, and I'm trying to create spiders to scrape data for a research project.


I am trying to scrape the cricket scorecard on this webpage: http://www.howstat.com/cricket/Statistics/Matches/MatchScorecard.asp?MatchCode=2291


While I've managed to scrape most of the page successfully, I'm having difficulties with two types of elements:

1. The entries after "Fall of Wickets". Ideally, the result should be an array with "1-10 (Soumya Sarkar)" as the first element, "2-10 (Imrul Kayes)" as the second element, and so on. The problem I'm facing is that the spider often picks up a lot of extra data after the 10th element.

2. The title above each part of the scorecard, for example "Bangladesh 1st Innings" and "Australia 1st Innings". I'm facing the same problem here: the spider picks up much more data than required. Ideally, I would also like to save just "Bangladesh" instead of "Bangladesh 1st Innings".


I've been through all the documentation, but I haven't been able to figure it out. Thanks in advance!


Best Answer

To learn how to proceed the next time you find a bug, and to provide as many details as possible if you wish, please check our article: https://helpdesk.scrapinghub.com/support/solutions/articles/22000200446-troubleshooting-portia


Thanks for your help to make Portia a better tool.


Best,


Pablo
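
For the two extraction problems described in the question, one possible workaround is to let the spider capture more text than needed and trim it afterwards. The sketch below is plain Python and does not use any Portia API. The function names, the sample input string, and the assumption that the raw text contains entries shaped like "1-10 (Soumya Sarkar)" and headings shaped like "Bangladesh 1st Innings" are illustrative only.

```python
import re

# Matches one fall-of-wickets entry such as "1-10 (Soumya Sarkar)":
# wicket number, a hyphen, the team score, then the batsman in parentheses.
FOW_PATTERN = re.compile(r"\b(\d{1,2})-(\d+)\s*\(([^)]+)\)")

# Matches a heading such as "Bangladesh 1st Innings" and captures the team name.
INNINGS_PATTERN = re.compile(
    r"^\s*(.+?)\s+\d+(?:st|nd|rd|th)\s+Innings\b", re.IGNORECASE
)


def parse_fall_of_wickets(raw_text, max_wickets=10):
    """Return at most max_wickets entries found in raw_text, in order."""
    entries = []
    for match in FOW_PATTERN.finditer(raw_text):
        wicket, score, batsman = match.groups()
        entries.append(f"{wicket}-{score} ({batsman.strip()})")
        # An innings has at most 10 wickets, so stop once the cap is reached.
        if len(entries) == max_wickets:
            break
    return entries


def team_from_heading(heading):
    """Return the team name from an innings heading, or None if no match."""
    match = INNINGS_PATTERN.match(heading)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Hypothetical raw field value, including trailing text the spider
    # picked up by mistake.
    raw = (
        "Fall of Wickets 1-10 (Soumya Sarkar), 2-10 (Imrul Kayes) "
        "extra text picked up by the spider"
    )
    print(parse_fall_of_wickets(raw))
    print(team_from_heading("Bangladesh 1st Innings"))
```

If the text on the real page is formatted differently (for example, with overs listed after the batsman's name), the patterns would need adjusting.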
