Answered
mark80 2 months ago in Portia • updated by Pablo Vaz (Support Engineer) 2 months ago

It's my first project, and this seems like a great app to me!

At http://www.sia.ch/it/affiliazione/elenco-dei-membri/socii-individuali/ there are 255 pages (the small numbers at the top of the list), and I need to extract not only the 4 visible columns but also the email and telephone number found behind each name in the list.

I've already extracted the 255 pages with the 4 main columns, using the link above as the sample, but I don't know how to go one level deeper into each name.

Can I do the whole job with a single crawler project?

Answer

Hey!


I tried to accomplish this with Portia, but it doesn't seem possible.

Try setting up two different spiders, one for each "level". Also try enabling and disabling JavaScript in Portia.

Please keep in mind that some sites are too complex to handle with Portia. If that is the case, consider trying Scrapy.
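For anyone who wants to see the two-"level" idea outside Portia, here is a minimal sketch in plain Python (standard library only, so it is neither Portia nor Scrapy). It reads a list page, follows each name link, and collects fields from the detail page. The class names `member`, `phone` and `email`, the `example.com` URLs, and the `FAKE_SITE` pages are all made up for illustration; the real markup of the site in question is not described in this thread and would need to be inspected first.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Level 1: collect href values of <a> tags carrying a given class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and self.css_class in classes and attrs.get("href"):
            self.links.append(attrs["href"])


class FieldCollector(HTMLParser):
    """Level 2: collect the text of elements whose class names a wanted field."""

    def __init__(self, fields):
        super().__init__()
        self.fields = set(fields)
        self.current = None  # field whose text we are waiting for
        self.data = {}

    def handle_starttag(self, tag, attrs):
        for cls in (dict(attrs).get("class") or "").split():
            if cls in self.fields:
                self.current = cls

    def handle_data(self, data):
        if self.current and data.strip():
            self.data[self.current] = data.strip()
            self.current = None


def crawl(list_urls, fetch):
    """Yield one dict per detail page linked from the given list pages.

    `fetch` is any callable that maps a URL to its HTML text, so the
    download step can be swapped out (or faked, as below).
    """
    for list_url in list_urls:
        collector = LinkCollector("member")  # hypothetical class name
        collector.feed(fetch(list_url))
        for href in collector.links:
            detail_url = urljoin(list_url, href)
            fields = FieldCollector(["phone", "email"])  # hypothetical
            fields.feed(fetch(detail_url))
            yield {"url": detail_url, **fields.data}


# Made-up pages standing in for a real site, so the sketch runs offline.
FAKE_SITE = {
    "http://example.com/list?page=1": (
        '<table><tr><td><a class="member" href="/member/1">Anna Rossi</a></td></tr>'
        '<tr><td><a class="member" href="/member/2">Luca Bianchi</a></td></tr></table>'
    ),
    "http://example.com/member/1": (
        '<p class="phone">+41 00 000 00 01</p><p class="email">anna@example.com</p>'
    ),
    "http://example.com/member/2": '<p class="phone">+41 00 000 00 02</p>',
}

if __name__ == "__main__":
    for item in crawl(["http://example.com/list?page=1"], FAKE_SITE.__getitem__):
        print(item)
```

Passing `fetch` in as a parameter keeps the crawling logic separate from the download step; a real run would replace `FAKE_SITE.__getitem__` with a function that actually requests each URL.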


Best,

Pablo

I've solved it in 2 steps as you said, but the most important problem I've found now is that the email addresses I need to scrape are obfuscated by anti-spam JavaScript. So even if I scrape all 17000 names, I won't retrieve the emails!

How can I solve this?
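When a page cannot be rendered with JavaScript, another option is to reverse the obfuscation directly. The thread does not say which scheme this site uses, so the sketch below only assumes, for illustration, a simple Caesar shift over letters (one common anti-spam trick). The function name `caesar_decode`, the shift value, and the sample string are hypothetical; the real scheme has to be read from the page's own script.

```python
def caesar_decode(text, shift):
    """Shift every ASCII letter back by `shift` places, leaving other characters alone.

    This assumes the obfuscation moved each letter forward by `shift`
    within its own case; digits and punctuation are passed through.
    """
    decoded = []
    for ch in text:
        if "a" <= ch <= "z":
            decoded.append(chr((ord(ch) - ord("a") - shift) % 26 + ord("a")))
        elif "A" <= ch <= "Z":
            decoded.append(chr((ord(ch) - ord("A") - shift) % 26 + ord("A")))
        else:
            decoded.append(ch)
    return "".join(decoded)


if __name__ == "__main__":
    # Made-up sample: "mailto:info@example.com" with every letter shifted by 1.
    print(caesar_decode("nbjmup:jogp@fybnqmf.dpn", 1))
```

If the site uses a different transformation (character-code offsets, reversed strings, HTML entities), only the body of the decoding function changes; the scraped field is still passed through it after extraction.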

Can you do a simple test? 

I've played with your project and learned that the key was the JS expression and the correct selection in the sample. Trying it in my project, I got strange results. Can you look at the project and the latest CSV? Some emails are there and others are not.

Please review the entire project (it's my first) and give me some advice.

Hi Mark, the project seems right. To solve the out-of-memory issue, you probably need more units to store more data.

Best,

Pablo