Welcome to the Scrapinghub community forum! We discuss all things related to Scrapy Cloud, Portia and Crawlera. You can participate by reading posts, asking questions, providing feedback, helping others, and voting the best questions & answers. Scrapinghub employees regularly pop in to answer questions, share tips, and post announcements.

Remember to check the Help Center!

Please remember to check the Scrapinghub Help Center before asking here; your question may already be answered there.

0
Completed
Alexander Dorsk 4 years ago in Scrapy Cloud • updated by Oleg Tarasenko (Support Engineer) 3 years ago 0

Hi, I've just started exploring Autoscraping, and I'm very impressed by what it has to offer. Natalia's screencast has been very helpful for seeing how it works.

As I use it I'm keeping a list of UI suggestions. Do you want me to post these right now? If you're still hashing out the UI I don't want to bombard you with things that will be changed anyway.

If you want me to post the suggestions, just let me know.

Answer
Hi Alexander, thanks.

We are aware of many things to improve in the UI. We are already implementing many changes that will be released in a few weeks, with further improvements, specifically in the annotation tool, coming later. Feel free to make any suggestions, but keep in mind that many of them may already be filed or planned, and others may turn out to be no longer applicable.

0
Fixed
mustaffa 4 years ago in Scrapy Cloud • updated by Pablo Hoffman (Director) 4 years ago 11

Are services down? It won't start any scheduled job; they simply get stuck on pending.

Answer

This was caused by over-demand last night, which affected customers with no dedicated servers. Load is back to normal now.

0
Answered
Rodolpho Ramirez 4 years ago in Portia • updated by Martin Olveyra (Engineer) 4 years ago 9

I've scraped some items from a website, but when downloading the data they don't show up; the JSON only contains URL, body and cookies. Shouldn't there be an ITEM column?

Answer

Check the first section (basic concepts and procedures) of the documentation:

http://help.scrapinghub.com/autoscraping.html


As a quick intro, AS basically runs in two different modes: annotating mode and normal mode. Annotating mode is only for capturing pages, adding templates and testing them. Normal mode is what you need in order to actually get the items, once you have tested everything properly in annotating mode.

To switch from annotating mode to normal mode, remove the "annotating" tag from the spider's properties and run it again. But important: if you did not get good results in annotating mode, you will not get good results in normal mode either, so make sure you have tested thoroughly in annotating mode.

0
Answered
drsumm 4 years ago in Scrapy Cloud • updated by Martin Olveyra (Engineer) 4 years ago 3

It takes about 1 second to scrape one item group. Why is it so slow on this platform? My spider has been running for about 20 hours and is still very slow. I have used Scrapy before and it was pretty fast.

Answer

It is because of the Autothrottle addon. Check this documentation for Scrapinghub users, which explains why we limit spider speed with Autothrottle and how to change this behaviour:


http://help.scrapinghub.com/addons.html#autothrottle
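For reference, Autothrottle is configured through your project's `settings.py`. A minimal sketch, with illustrative values (these are examples, not recommendations; check the addon documentation above before changing them):

```python
# settings.py -- illustrative Autothrottle tuning
AUTOTHROTTLE_ENABLED = True      # the addon limits speed adaptively when enabled
AUTOTHROTTLE_START_DELAY = 5.0   # initial download delay, in seconds
AUTOTHROTTLE_MAX_DELAY = 60.0    # ceiling on the delay under high latency

# With Autothrottle disabled, Scrapy falls back to these fixed limits:
DOWNLOAD_DELAY = 0.25
CONCURRENT_REQUESTS = 16
```

Whether disabling the addon is allowed depends on your plan, so read the linked page first.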

0
Answered
drsumm 4 years ago in Scrapy Cloud • updated by Pablo Hoffman (Director) 4 years ago 1

I got the message closespider_pagecount. What does that mean?

Answer

It means the spider has reached the maximum number of pages allowed to crawl, and was terminated because of that.


Autoscraping jobs in annotating mode always have this limit in place.


The actual name "closespider_pagecount" comes from the Scrapy extension that powers the shutdown: https://scrapy.readthedocs.org/en/latest/topics/extensions.html#closespider-pagecount
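For regular (non-Autoscraping) spiders you can set such a limit yourself through the CloseSpider extension's settings; a minimal sketch, with an arbitrary example value:

```python
# settings.py -- ask Scrapy to close the spider after it has received
# this many responses; the job then finishes with the close reason
# "closespider_pagecount"
CLOSESPIDER_PAGECOUNT = 1000  # 0 (the default) means no limit
```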

+1
Answered
drsumm 4 years ago in Portia • updated by Martin Olveyra (Engineer) 4 years ago 10

I followed the instructions to annotate the template, but the spider is not extracting any fields. Only body and url items are extracted.

Answer

If you cannot see extracted data in an annotating mode run, it usually means that the templates are not extracting all the required fields (because those fields have not been annotated). Check how you defined the item's fields, in particular their Required flag, and check that the template annotates all the required ones, or remove the Required flag from any fields that you don't actually expect to annotate or extract with every template.

For more detailed info please check the autoscraping documentation, in particular the section that explains how templates are used in the extraction process.

+1
Completed
Nicolas Ramírez 4 years ago in Scrapy Cloud • updated by Pablo Hoffman (Director) 3 years ago 1

It would be nice to be able to download a single item (instead of all) from the panel in JSON format, for testing.

0
Answered
Nicolas Ramírez 4 years ago in Crawlera • updated by Pablo Hoffman (Director) 10 months ago 0
Answer
Pablo Hoffman (Director) 10 months ago

Use the X-Crawlera-Cookies header.
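A minimal sketch of what that looks like from a Python HTTP client. The proxy address and the "disable" value follow the Crawlera documentation of the time, and `<API key>` is a placeholder you must replace with your own; verify both against the current docs:

```python
# Sketch: per-request Crawlera configuration for a client such as requests.
# "<API key>" is a placeholder; "disable" is the documented value for
# switching off Crawlera's cookie handling on this request.
proxies = {"http": "http://<API key>:@proxy.crawlera.com:8010/"}
headers = {"X-Crawlera-Cookies": "disable"}

# e.g. with the requests library:
# import requests
# response = requests.get("http://example.com", proxies=proxies, headers=headers)
```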

0
Answered
Nicolas Ramírez 4 years ago in Scrapy Cloud • updated by Pablo Hoffman (Director) 4 years ago 0


Answer

5 minutes

0
Answered
Nicolas Ramírez 4 years ago in Scrapy Cloud • updated by Pablo Hoffman (Director) 4 years ago 0

I want to copy and paste spider settings from one spider to another.

Answer

Not currently.

+1
Answered
Sebastián Fonseca 4 years ago in Scrapy Cloud • updated by Pablo Hoffman (Director) 4 years ago 0

I changed a spider and I would like to deploy only that spider. How can I do it?

Answer

Projects are deployed as a whole, individual spiders cannot be deployed separately.


Spiders often depend on other code in the project (such as item declarations or helper functions, among others), and deploying just the spider file could leave the project code in an inconsistent state.


For this reason, all project code needs to be deployed at once. This is also a requirement of Scrapyd, the open source Scrapy deployment application.

0
Answered
Pablo Hoffman (Director) 4 years ago in Scrapy Cloud • updated 4 years ago 0


Answer

No, Scrapinghub is currently only offered as a hosted service.

0
Answered
Pablo Hoffman (Director) 4 years ago in Scrapy Cloud • updated 4 years ago 0


Answer

Our API documentation is available at:

http://help.scrapinghub.com/api.html

0
Answered
Pablo Hoffman (Director) 4 years ago in Scrapy Cloud • updated 4 years ago 0


Answer

Yes, our uptime reports are available at:

http://status.scrapinghub.com


You can also follow us on Twitter for outage updates.