Answered
mark80 1 month ago in Portia • updated by Pablo Vaz (Support Engineer) 3 weeks ago 5

http://vetrina.assimpredilance.it/results.aspx?rs=

I need to go through 113 pages, scraping the address and email for every name.

How can I set up multipage scraping?


Answer


Hey Mark, I hope you find the solution by checking these two approaches:


1. Extract data from a List of URLs


2. Handle pagination in Portia


If neither of those approaches works, perhaps Portia is not the right solution for your needs, as mouch1 suggested.


mouch1, thanks for your suggestions.


Best regards!


Pablo

Hi,



I had a quick look at your request.

The solution I'll give you does not use Portia, but it is pretty easy to put in place.


Actually, all records are accessible through an API, as you can see here.


When you load the web page, your browser also calls a specific URL (/api/api/Company) that returns everything you need, and more, as a JSON array.

So you can call just that URL whenever you want, and it will return the complete list of companies.
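
As a rough sketch of what calling that URL directly might look like in Python (standard library only): the host is taken from the page address in the question and joined with the /api/api/Company path. That full URL is an assumption, as is the idea that the endpoint answers a plain GET without extra headers or cookies. The function names are hypothetical.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

# Assumption: the API lives on the same host as the results page.
SITE = "http://vetrina.assimpredilance.it/"
API_PATH = "/api/api/Company"  # path mentioned in this thread


def api_url(site=SITE, path=API_PATH):
    """Join the site root and the API path into one absolute URL."""
    return urljoin(site, path)


def parse_companies(raw):
    """Decode a response body and check that it is a JSON array."""
    data = json.loads(raw)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of companies")
    return data


def fetch_companies(url=None, timeout=30):
    """Download and decode the company list (needs network access)."""
    with urlopen(url or api_url(), timeout=timeout) as response:
        return parse_companies(response.read())
```

If the assumptions hold, `fetch_companies()` should return one Python list containing every record, so there would be no need to walk through the 113 pages one by one.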


If you still want to stick with Portia, you will have to use Splash to run some JavaScript.



Cyril


Small problem! I have no Java or JSON knowledge!

I know it's a limitation! But can you help me find a way to extract the data?

From every company name link, I need to extract the email and address.


Well, is it a one-shot extract, or do you want to update it on a regular basis?


If it is a one-shot extract, here is your data: https://konklone.io/json/?id=34da400622ec579a7c5fd293c9faa52f


If you want to extract it manually whenever you want, you can use the Developer Tools in Chrome or Firefox and open the Network tab (as shown in my screenshot). There you can see the data downloaded by your browser before it is processed and rendered. You then copy the complete JSON array found at the URL api/api/Company into a free online converter (https://konklone.io/json/), and it will give you a CSV extract that you can open with Excel or any other spreadsheet tool.
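
If you would rather do the JSON-to-CSV step on your own machine, a minimal sketch in Python could look like the following. It assumes the copied JSON is an array of flat objects; the function name is hypothetical, and the real field names in the data are not known here, so the columns are simply taken from whatever keys the records contain.

```python
import csv
import io
import json


def json_array_to_csv(json_text):
    """Convert a JSON array of objects into CSV text."""
    records = json.loads(json_text)
    # Collect column names in first-seen order, since records
    # may not all carry the same keys.
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    buffer = io.StringIO()
    # Missing keys become empty cells; nested values are written
    # as their plain string representation.
    writer = csv.DictWriter(
        buffer, fieldnames=columns, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

Paste the copied array into a text file, pass its contents to `json_array_to_csv`, and save the result with a `.csv` extension to open it in Excel.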


Finally, if you want to extract it on a regular basis with an automated process, you will need to use Splash with Portia or write some code.

I'm not familiar with JSON. Anyway, I can't see the email record in the JSON data, and that is the most important data for me.

Do I have to manually copy the JSON into konklone.io for each of the 113 pages?
