Welcome to the Scrapinghub community forum! We discuss all things related to Scrapy Cloud, Portia and Crawlera. You can participate by reading posts, asking questions, providing feedback, helping others, and voting on the best questions & answers. Scrapinghub employees regularly pop in to answer questions, share tips, and post announcements.

Remember to check the Help Center!

Please remember to check the Scrapinghub Help Center before asking here; your question may already be answered there.

+2
Completed
Ron Johnson 3 years ago in Portia • updated by Pablo Hoffman (Director) 10 months ago 1
Once a spider has been named, there is no option to go in and rename it. This would be useful during the development of spiders, as their functions and roles change through the development process.
Answer
Pablo Hoffman (Director) 10 months ago

Portia supports renaming spiders.


(The original question was about Autoscraping, Portia's predecessor product.)

0
Fixed
Ron Johnson 3 years ago in Portia • updated by Andrés Pérez-Albela H. 3 years ago 5
The spider on this page failed to delete when I deleted the autospider. Now I cannot open the autospider page to try deleting this spider again.
Answer
Hi Ron,

I have deleted the failed entry. Sorry for the delay; I was trying to find time to investigate how this could have happened, but I was unable to reproduce it in other cases.

Let us know if it happens again.
0
Answered
Ron Johnson 3 years ago in Portia • updated by Oleg Tarasenko (Support Engineer) 3 years ago 5
On this web page I want to extract the list of names under "owner name". When I set up a template to do so for this page, it successfully extracts the owner names. However, as the spider crawls on to the next page, the template starts to fall apart.

The end goal is to generate a CSV of all the owner names to import into Excel for post-processing. I can get the spider to crawl every page, but I can't seem to get the items to extract properly. Am I just missing something?

0
Completed
Ayush Lodhi 3 years ago in Portia • updated by Oleg Tarasenko (Support Engineer) 3 years ago 1
How can I download the scraped data? I have looked everywhere but I can't find a way to download it.
0
Answered
Robert Clements 3 years ago in Portia • updated by Samir 6 months ago 8
I'm trying to build a database of houses for sale in London from Zoopla.co.uk. I've managed to scrape the description, price, etc., but I'm trying to scrape images from an embedded 'carousel' and I'm not sure if I can.

Any help appreciated. Cheers, Rob
0
Answered
Sammy Kiogora 3 years ago in Scrapy Cloud • updated by Pablo Hoffman (Director) 3 years ago 1
Answer
There was a problem with project creation that is now fixed.
+1
Completed
Rolando Espinoza (Engineer) 3 years ago in Scrapy Cloud • updated by Pablo Hoffman (Director) 10 months ago 1
In my case, I often have to look up a particular field's count or check whether a field is missing. The current unsorted display makes it hard to:

1. Find a particular field.
2. Tell whether a field is missing.

Displaying the scraped fields sorted would make these tasks easy.
Answer
Pablo Hoffman (Director) 10 months ago

Fields are sorted alphabetically now.

+8
Completed
Rolando Espinoza (Engineer) 3 years ago in Scrapy Cloud • updated by Paul Tremberth (Engineer) 3 years ago 7
I have some spiders that need to be scheduled with a couple of arguments. This is not a hassle when scheduling the job via the API, but when doing it manually (i.e. using a test input and updating the spider code each run) it would be nice to be able to re-schedule a job without having to re-enter all the custom arguments.
Answer
It's available now in the "Completed Jobs" tab, at the bottom of the page, next to the "Remove" button.

Select a job's checkbox and click "Restart".
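For reference, scheduling a job with custom arguments through the HTTP API (as the poster mentions) can be sketched as below. This assumes the legacy `schedule.json` endpoint of the era, where extra POST parameters are passed to the spider as arguments; the project ID, spider name, and argument names are placeholders.

```python
# Sketch: schedule a Scrapy Cloud job with custom spider arguments.
# Assumes the legacy schedule.json endpoint; extra POST parameters
# are forwarded to the spider as arguments. Stdlib only.
import base64
import urllib.parse
import urllib.request


def build_schedule_request(api_key, project_id, spider, **spider_args):
    """Build a POST request for the (assumed) schedule endpoint."""
    payload = {"project": str(project_id), "spider": spider}
    # Custom arguments travel as extra form fields.
    payload.update({k: str(v) for k, v in spider_args.items()})
    data = urllib.parse.urlencode(payload).encode()
    req = urllib.request.Request(
        "https://dash.scrapinghub.com/api/schedule.json", data=data
    )
    # HTTP Basic auth: API key as the username, empty password.
    token = base64.b64encode(f"{api_key}:".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    return req


# To actually run it (placeholders, not real credentials):
# urllib.request.urlopen(build_schedule_request("APIKEY", 123, "myspider", category="books"))
```

Restarting from the "Completed Jobs" tab, as described above, saves re-entering these arguments by hand.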
+2
Completed
Rolando Espinoza (Engineer) 3 years ago in Scrapy Cloud • updated by Pablo Hoffman (Director) 10 months ago 1
For large projects with a lot of spiders, it would be handy to be able to type a spider's name in the search box and go directly to its spider page.
Answer
Pablo Hoffman (Director) 10 months ago

This is supported already.

0
Completed
Oleg Tarasenko (Support Engineer) 3 years ago in Scrapy Cloud • updated by Pablo Hoffman (Director) 10 months ago 1
I want to suggest a tweak to the items filter, specifically the pages section. For example: http://dash.scrapinghub.com/p/78/job/14/2/#pages

Here you can see that the fields list box contains only one item!

So why not pre-select it for me? It would make using Dash a great pleasure!
Answer
Pablo Hoffman (Director) 10 months ago

There are more fields now, so pre-selecting doesn't make sense anymore.

+2
Completed
fernando Almeida 3 years ago in Scrapy Cloud • updated by Pablo Hoffman (Director) 10 months ago 1
Currently you can only schedule jobs to run either every day of the week or once per week. It would be nice if more options were added, such as on a given day of the month (1-30) or every 15 days.
Answer
Pablo Hoffman (Director) 10 months ago

This is supported now.

0
Answered
Hirantha 3 years ago in Scrapy Cloud • updated by Pablo Hoffman (Director) 10 months ago 1
Hi,

Is it possible to set up an email notification when there is new data while scraping a page? For example, when a new news item is listed on a particular news page, it should send an email notification only for the new item.
Answer
Pablo Hoffman (Director) 10 months ago

You need to implement this functionality yourself in your Scrapy spider, for example using the MailSender facility:

http://doc.scrapy.org/en/latest/topics/email.html
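One way to approach the "only notify for new items" part is to persist the identifiers of already-seen items between runs and email only what is new. The sketch below uses plain Python with a JSON state file; `send_email` stands in for a call to Scrapy's `MailSender.send()` from the docs linked above, and the file name and item key are placeholders.

```python
# Sketch: remember previously seen item URLs between runs and notify
# only about new ones. "send_email" is a placeholder for Scrapy's
# MailSender.send(); state_file and the "url" key are assumptions.
import json
from pathlib import Path


def filter_new_items(items, state_file="seen_urls.json", key="url"):
    """Return only items whose key was not seen before; update the state file."""
    path = Path(state_file)
    seen = set(json.loads(path.read_text())) if path.exists() else set()
    new_items = [item for item in items if item[key] not in seen]
    seen.update(item[key] for item in new_items)
    path.write_text(json.dumps(sorted(seen)))
    return new_items


def notify(new_items, send_email):
    """Send a single email listing the new items, if there are any."""
    if new_items:
        send_email(
            subject=f"{len(new_items)} new items scraped",
            body="\n".join(item["url"] for item in new_items),
        )
```

On each run you would pass the scraped items through `filter_new_items()` and call `notify()` with a configured `MailSender`; the first run seeds the state file, and later runs email only the delta.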

+1
Answered
Matt Lebrun 3 years ago in Portia • updated by Shane Evans (Director) 3 years ago 4
Note that all fields are marked as "vary", so I don't understand why this is even happening.

Here's a sample item that Autoscraping (AS) reports as a duplicate, which upon checking it clearly isn't:

Line 32: Scraped from <200 http://www.courts.com.sg/Products/PID-IP058275(Courts)/Computers/IT-Accessories/Hard-Disks/WESTERN-DIGITAL-MY-PASSPORT-ESSENTIAL-2TB-BLK-WDBY8L0020BBKPESN>

{'_cached_page_id': '60b7e00f7dbe65861cb6505a0f29296817e215f5',
'_template': '52b1899e4d6c710f54a65589',
'_type': u'product_page',
u'brand': [u'WESTERN DIGITAL'],
u'category': [u'MY PASSPORT ESSENTIAL 2TB BLK'],
u'image': ['http://d2j8wlv4w10az1.cloudfront.net/assets/images/products/ip058275.jpg'],
u'price': [u'209'],
u'title': [u'WDBY8L0020BBK-PESN'],
'url': 'http://www.courts.com.sg/Products/PID-IP058275(Courts)/Computers/IT-Accessories/Hard-Disks/WESTERN-DIGITAL-MY-PASSPORT-ESSENTIAL-2TB-BLK-WDBY8L0020BBKPESN'}
Line 50: Dropped: Duplicate product scraped at <http://www.courts.com.sg/Products/PID-IP039998(Courts)/Computers/IT-Accessories/Hard-Disks/WESTERN-DIGITAL-MY-BOOK-ESSENTIAL-3TB-35INUSB30-WDBACW0030HBKSESN>, first one was scraped at <http://www.courts.com.sg/Products/PID-IP058275(Courts)/Computers/IT-Accessories/Hard-Disks/WESTERN-DIGITAL-MY-PASSPORT-ESSENTIAL-2TB-BLK-WDBY8L0020BBKPESN>

{'_cached_page_id': '2d21d63869a48f310fec54c48bdc15be1ed942e0',
'_template': '52b1899e4d6c710f54a65589',
'_type': u'product_page',
u'brand': [u'WESTERN DIGITAL'],
u'category': [u'MY BOOK ESSENTIAL 3TB 3.5INUSB3.0'],
u'image': ['http://d2j8wlv4w10az1.cloudfront.net/assets/images/products/ip039998.jpg'],
u'price': [u'229'],
u'title': [u'WDBACW0030HBK-SESN'],
'url': 'http://www.courts.com.sg/Products/PID-IP039998(Courts)/Computers/IT-Accessories/Hard-Disks/WESTERN-DIGITAL-MY-BOOK-ESSENTIAL-3TB-35INUSB30-WDBACW0030HBKSESN'}
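The poster's point can be illustrated with a simple fingerprint over the scraped fields of the two items above. How Autoscraping actually computes duplicates is not documented here; this sketch only shows that a fingerprint over all visible fields differs, while one restricted to a shared field (e.g. `brand`) would collide, which is the kind of behavior that produces false duplicates.

```python
# Illustration (not Autoscraping's actual algorithm): fingerprint the
# two sample items from the log above over a chosen set of fields.
import hashlib
import json


def fingerprint(item, fields):
    """SHA-1 over a canonical JSON projection of the chosen fields."""
    canonical = json.dumps({f: item.get(f) for f in fields}, sort_keys=True)
    return hashlib.sha1(canonical.encode()).hexdigest()


item_a = {"brand": ["WESTERN DIGITAL"],
          "category": ["MY PASSPORT ESSENTIAL 2TB BLK"],
          "price": ["209"], "title": ["WDBY8L0020BBK-PESN"]}
item_b = {"brand": ["WESTERN DIGITAL"],
          "category": ["MY BOOK ESSENTIAL 3TB 3.5INUSB3.0"],
          "price": ["229"], "title": ["WDBACW0030HBK-SESN"]}

all_fields = ["brand", "category", "price", "title"]
# Over all fields the items are clearly distinct...
assert fingerprint(item_a, all_fields) != fingerprint(item_b, all_fields)
# ...but comparing only a shared field would wrongly flag a duplicate.
assert fingerprint(item_a, ["brand"]) == fingerprint(item_b, ["brand"])
```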
+3
Completed
Shane Evans (Director) 3 years ago in Portia • updated by Pablo Vaz (Support Engineer) 5 months ago 3
It should be possible to specify:
* a larger set of start URLs, with some clear maximum number supported
* a URL containing other start URLs
* a simple pattern to move through integer numbers, e.g. page[1..200].html
Answer

Hi Shane!


We are happy to announce to our community that the new release of Portia will allow you to set start URLs in bulk using a list (from Dropbox, for example).

We hope to get this new feature, among others, ready very soon!

Best Regards!
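The third request, an integer pattern like `page[1..200].html`, can also be expanded client-side into an explicit start URL list. The bracket syntax below follows Shane's proposal; it was not an existing Portia feature at the time.

```python
# Sketch: expand a "page[1..200].html"-style pattern (the bracket
# notation is the proposal from the request above) into start URLs.
import re

_RANGE = re.compile(r"\[(\d+)\.\.(\d+)\]")


def expand_pattern(pattern):
    """Expand one [lo..hi] integer range; return the pattern as-is if none."""
    match = _RANGE.search(pattern)
    if not match:
        return [pattern]
    lo, hi = int(match.group(1)), int(match.group(2))
    return [pattern[:match.start()] + str(n) + pattern[match.end():]
            for n in range(lo, hi + 1)]


urls = expand_pattern("http://example.com/page[1..200].html")
# First URL is http://example.com/page1.html, 200 URLs in total.
```

The resulting list could then be pasted (or uploaded via the Dropbox list mechanism mentioned above) as bulk start URLs.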

0
Answered
David 3 years ago in Portia • updated by Paul Tremberth (Engineer) 2 years ago 5
When viewing scraped items/pages, I click the Items drop-down and choose "Get as CSV", but it opens a blank page. Please advise on the proper way to view what has been scraped. Thanks!
Answer
We need more information in order to help.