Welcome to the Scrapinghub feedback & support site! We discuss all things related to Scrapy Cloud, Portia and Crawlera. You can participate by reading posts, asking questions, providing feedback, helping others, and voting for the best questions & answers. Scrapinghub employees regularly pop in to answer questions, share tips, and post announcements.
vl2017 3 days ago in Portia 0

Could you add support for UTF-8? Non-English letters are not shown in the sample page editor, and regexp conditions do not work with them.

vl2017 3 days ago in Portia • updated 3 days ago 0

What is the difference between annotations and fields? In "Sample page → Items", each field has configuration icons that open a tab with separate groups, "Annotation" and "Field". Each group has its own "required" option: what do they mean, and do they overlap? The "Annotation" group sets the path to the element, but that is already hidden in the "Item", so why does it need a "required" option?


How do I configure the scraper to ignore any pages containing a specified attribute or word?
sappollo 4 days ago in Portia • updated 4 days ago 0

Hi all,


Since yesterday my Portia crawls have been failing with an error:


I don't know whether this is a Scrapinghub/Portia error or related to the external page being scraped (which had worked successfully for months).

MSH 5 days ago in Portia 0

Hi.


I created a Portia spider for a website that was built with ASP.NET and uses (javascript:__doPostBack) for the pagination links.


Is it possible to use this kind of link (javascript:__doPostBack) in Portia?

for example:

<a href="javascript:__doPostBack('p$lt$ctl06$pageplaceholder$p$lt$ctl00$ApplyPageJobSearch$GridView1','Page$2')">2</a>


Thanks
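
For reference, a link like this does not point to a URL: ASP.NET's __doPostBack(target, argument) fills the hidden form fields __EVENTTARGET and __EVENTARGUMENT and then submits the page's form. Below is a minimal Python sketch of recovering those two values from the href, assuming the link format shown above. `parse_postback` is a hypothetical helper, not a Portia feature, and the sketch does not show whether Portia itself can follow such links.

```python
import re

# Matches __doPostBack('<target>','<argument>') inside an href.
POSTBACK_RE = re.compile(r"__doPostBack\('([^']*)','([^']*)'\)")


def parse_postback(href):
    """Return the hidden form fields a __doPostBack link would set, or None."""
    match = POSTBACK_RE.search(href)
    if match is None:
        return None
    target, argument = match.groups()
    return {"__EVENTTARGET": target, "__EVENTARGUMENT": argument}


href = (
    "javascript:__doPostBack("
    "'p$lt$ctl06$pageplaceholder$p$lt$ctl00$ApplyPageJobSearch$GridView1',"
    "'Page$2')"
)
fields = parse_postback(href)
print(fields)
```

In a hand-written Scrapy spider, a dictionary like this could be passed as `formdata` to `FormRequest.from_response`, which also carries over the page's other hidden fields.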

Answered
jkluv000 2 weeks ago in Portia • updated by Thriveni Patil (Support Engineer) 2 weeks ago 1

Does Portia natively use Crawlera, or is there an integration between the two?

Answer

Hello,


By default, Portia doesn't use Crawlera. You would need to subscribe to Crawlera, enable it for the project through the Addon settings (https://helpdesk.scrapinghub.com/solution/articles/22000200395-scrapy-cloud-addons), and then run the Portia spider. The spider will then use Crawlera while crawling.


Regards,

Thriveni Patil

Waiting for Customer
Base79 3 weeks ago in Portia • updated by Nestor Toledo Koplin (Support Engineer) 2 weeks ago 11

Hi there,


This tool is new to me, but I keep running into a problem right from the start.

The New Sample button doesn't show anywhere after I have created a new spider.

As a result, I cannot select any data.

Started
robi9011235 3 weeks ago in Portia • updated 1 week ago 7

This article gives me a bit of information, but I still don't understand what I need to do to make it work or why it's not working.

http://help.scrapinghub.com/portia/annotations-and-data-extraction

Answer

Hey Robi, sorry to hear you have experienced problems using Portia.

You said: "And I'm paying for this thing". That's strange: we offer Portia as a free service, so you shouldn't be paying for it. Please let us know if some third party is charging you to use Portia.


About bugs and issues: unfortunately yes, we have been experiencing some since our release of Portia v2, and we are trying to solve them as soon as possible. Again, we offer Portia as a free service, and your contributions and constructive feedback are always welcome.


To learn more, please check: https://helpdesk.scrapinghub.com/support/solutions/articles/22000200446-troubleshooting-portia


Best regards,


Pablo

Support team

Answered
robi9011235 3 weeks ago in Portia • updated 3 weeks ago 4

I'm trying to crawl this website: https://www.fxp.co.il/

But I always get the message: "Frames are not supported by portia"

But the thing is, it worked a few days ago with the same project.


Also, unfortunately I'm having a really bad experience with Portia. I keep getting different errors when creating new projects and when trying to load existing projects, and it is always trying to reconnect to the Portia server. Your product is really buggy, and this results in a bad experience for me.

I wish there were a better alternative, but all I found is just not as easy, simple and fast.

Answer

Hey Robi,


About:

"I wish there were a better alternative, but all I found is just not as easy, simple and fast"


That's the cost of making a tool more user-friendly:

https://helpdesk.scrapinghub.com/support/solutions/articles/22000200446-troubleshooting-portia


Our team is working hard to fix all the bugs and misbehavior in Portia, but unfortunately that does not depend only on Portia. If the site tightens its security, Portia won't work as usual. Any change to the site could affect how Portia interacts with it.


If your project becomes more ambitious, my suggestion is to consider a more powerful crawler like Scrapy. Check this comparison table:

https://helpdesk.scrapinghub.com/support/solutions/articles/22000201026-portia-or-scrapy

If you are interested in learning Scrapy, please check these excellent videos provided by Valdir:

https://helpdesk.scrapinghub.com/support/solutions/articles/22000201028-learn-scrapy-video-tutorials-


If your project requires urgent attention, you can also consider hiring our experts. It can save you a lot of time and resources: https://scrapinghub.com/quote


Regardless of the above suggestions, thanks for your feedback. I will share it with our Portia team as well.


Best regards,


Pablo Vaz

Support team

Answered
Tristan 4 weeks ago in Portia • updated by Pablo Vaz (Support Engineer) 3 weeks ago 3

Hi

Which types of regex does Portia/Scrapy support for include/exclude URLs when they are added in the Portia interface?


Does it support regex such as \d, \s, [0-9] and [^0-9]?


Is there maybe a library reference for this? I see a page on the query cleaner on your site but not in general.


Also, how do I make the query case-insensitive? Is there a setting, or should I just use

/[Ff]older/[Pp]age.html

for:

/Folder/Page.html

/folder/page.html
/folder/Page.html

Thanks


tristan

Answer

Hi Tristan!

For example, if you want to configure a URL pattern for:

https://www.kickstarter.com/projects/1865494715/apollo-7-worlds-most-compact-true-wireless-earphon/comments?cursor=14162300


you should use:


(https:\/\/www\.kickstarter\.com\/projects\/1865494715\/apollo-7-worlds-most-compact-true-wireless-earphon\/comments\?cursor=14162300)+


Best,

Pablo
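
The regex questions above can be checked locally with a minimal Python sketch. It assumes that Portia's include/exclude patterns are evaluated with Python's standard `re` module, as Scrapy's link extractors do. `url_matches` is a hypothetical helper, the example.com URL is made up, and whether the Portia interface accepts the `(?i)` inline flag is an assumption to verify.

```python
import re


def url_matches(pattern, url):
    """Hypothetical helper: True when the pattern is found anywhere in the URL."""
    return re.search(pattern, url) is not None


# \d, \s, [0-9] and [^0-9] are all standard Python regex syntax.
digits_ok = url_matches(
    r"/comments\?cursor=\d+",
    "https://example.com/projects/1/comments?cursor=14162300",
)

# Two ways to ignore case: explicit character classes, or the (?i) inline flag.
explicit = r"/[Ff]older/[Pp]age\.html"
inline_flag = r"(?i)/folder/page\.html"

urls = ["/Folder/Page.html", "/folder/page.html", "/folder/Page.html"]
explicit_results = [url_matches(explicit, u) for u in urls]
inline_results = [url_matches(inline_flag, u) for u in urls]
print(digits_ok, explicit_results, inline_results)
```

Note that `.` and `?` are escaped in the patterns because they are regex metacharacters. Unescaped, `.` matches any character and `?` makes the preceding token optional.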

Not a bug
San 4 weeks ago in Portia • updated 4 weeks ago 1

I am trying to scrape a super simple table on a webpage. But whatever I try, I keep getting errors:

requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://portia.scrapinghub.com/api/projects/174585/download/.....

Really weird because this spider is as simple as can be.


Answer

Hi San.


Even though this seems a simple task, Portia is not recommended for parsing tables.

Perhaps you can try using Scrapy.

Check these tutorials:

https://helpdesk.scrapinghub.com/support/solutions/articles/22000201028-learn-scrapy-video-tutorials-

Best regards,

Pablo


Fixed
shweta.kumar 1 month ago in Portia • updated by Pablo Vaz (Support Engineer) 4 weeks ago 2

When I select the "</> download as scrapy" option, a new tab opens with the message "A server error occurred. Please contact the administrator.", although this is not the case with the "Download as Portia" option.

Answer

Dear Shweta,


I hope you are satisfied with our response.

Don't hesitate to ask again if you need further assistance.


Best regards,

Pablo

Not a bug
gianghi1985 1 month ago in Portia • updated by Pablo Vaz (Support Engineer) 1 month ago 1

I am using the Portia trial. I can't get items, and it is very slow when I enable JavaScript.

Answer

Hi Gianghi!

Please check the Portia articles in our help center for more tips on how to use Portia:

http://help.scrapinghub.com/portia


About your question: some sites don't interact correctly with Portia. If you want to pursue more complex extractions, please consider using other tools like Scrapy. If you are interested, our experts can help you.


Take a minute to explore this option:


Kind regards,

Pablo

Answered
shweta.kumar 1 month ago in Portia • updated by Pablo Vaz (Support Engineer) 1 month ago 1

My ultimate goal is to scrape some information, such as the title, from a sample page and save it along with its URL. I also want to do this using the online Portia visual interface, without having to install Portia.

Answer

Dear Shweta,


You can do this without any problems. Have you checked this section in our help center?

http://help.scrapinghub.com/portia


Kind regards,

Pablo

Answered
mescalante1988 1 month ago in Portia • updated by Thriveni Patil (Support Engineer) 1 month ago 1

Hello, I am doing a Project and I think Portia is great!

I have a question: I am extracting data from a web page, and I want to include the category on all the items I am extracting, but each item only has the image, price and description.

What I want to do is manually add a category.

For example, now I am receiving:


[ { "image": [ "urlImage" ], "description": [ "TV LED " ], "price": [ "565" ] }, { "image": [ "urlImage1" ], "description": [ "TV1" ], "price": [ "867" ] } ]


I want to manually add a category called TV and obtain the following result:


[ { "image": [ "urlImage" ], "description": [ "TV LED " ], "price": [ "565" ], "category": [ "TV" ] }, { "image": [ "urlImage1" ], "description": [ "TV1" ], "price": [ "867" ], "category": [ "TV" ] } ]

Could anyone help me with this?

I only know how to work with Portia through the web page, in graphical mode.

Thanks!

Answer

Good to know that you like Portia :)


To add a field to every item, you can use the Magic Fields addon. Please refer to http://help.scrapinghub.com/scrapy-cloud/addons/magic-fields-addon to learn more about Magic Fields.


Regards,

Thriveni
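
Outside the graphical mode, the same result can also be obtained by post-processing the exported items. Below is a minimal Python sketch, assuming the items were exported as a JSON list in the shape shown in the question. `add_category` is a hypothetical helper and is not part of Portia or the Magic Fields addon.

```python
import json


def add_category(items, category):
    """Return copies of the items with a constant "category" field added."""
    return [{**item, "category": [category]} for item in items]


# Sample data in the shape shown in the question.
exported = json.loads("""
[
  {"image": ["urlImage"], "description": ["TV LED "], "price": ["565"]},
  {"image": ["urlImage1"], "description": ["TV1"], "price": ["867"]}
]
""")

tagged = add_category(exported, "TV")
print(json.dumps(tagged, indent=2))
```

The helper builds new dictionaries rather than modifying the exported ones, so the original data stays available for comparison.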

Fixed
Uptown Found 1 month ago in Portia • updated by Pablo Vaz (Support Engineer) 1 month ago 1

When I try to access my Portia project using Chrome, I get a blank page. Opening the Chrome Inspector shows there are several CSS and JS files that cannot be loaded (404 errors):


Answer

Hi Uptown Found,


We have been doing some maintenance work; it should be working now.

Please be sure to clear your browser cache to avoid related issues.


Best regards,

Pablo