Welcome to the Scrapinghub community forum! We discuss all things related to Scrapy Cloud, Portia and Crawlera. You can participate by reading posts, asking questions, providing feedback, helping others, and voting on the best questions & answers. Scrapinghub employees regularly pop in to answer questions, share tips, and post announcements.
Remember to check the Help Center!
Please remember to check the Scrapinghub Help Center before asking here; your question may already be answered there.
Hi, just started exploring the Autoscraper, and I'm very impressed by what it has to offer. Natalia's screencast has been very helpful for seeing how it works.
As I use it I'm keeping a list of UI suggestions. Do you want me to post these right now?
If you're still hashing out the UI I don't want to bombard you over things that will be changed anyway.
If you want me to post the suggestions just let me know.
We are aware of many areas of the UI that need improvement, and we are already implementing changes that will be released in the coming weeks, with further improvements to the annotation tool specifically arriving later. Feel free to make any suggestions, but keep in mind that many of them may already have been filed or planned, and others may no longer apply once the changes land.
Are services down? It won't start any scheduled jobs; they simply get stuck on pending.
This was caused by over-demand last night, which affected customers with no dedicated servers. Load is back to normal now.
I've scraped some items from a website, but when downloading the data they don't show up; the JSON contains only URL, body and cookies. Shouldn't there be an ITEM column?
Check the documentation:
first section (basic concepts and procedures)
In order to switch from annotating mode to normal mode, you have to remove the "annotating" tag from the spiders and run them again. Important: if you did not get good results in annotating mode, you will not get good results in normal mode, so make sure you have tested thoroughly in annotating mode.
As a quick intro, AS basically runs in two different modes: annotating mode and normal mode. Annotating mode is only for capturing pages, adding templates and testing them. Normal mode is what you use to actually get the items, once everything has been tested properly in annotating mode.
It takes about 1 second to scrape one item group. Why is it so slow on this platform? My spider has been running for about 20 hours and is still very slow. I have used Scrapy before and it was pretty fast.
It is because of the AutoThrottle addon. Check this documentation for Scrapinghub users, which explains why we limit spider speed with AutoThrottle and how to change the behaviour:
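AutoThrottle is configured through standard Scrapy settings. A minimal sketch of the relevant knobs, with illustrative values (your plan or project defaults may differ):

```python
# settings.py -- illustrative AutoThrottle configuration for a Scrapy project
AUTOTHROTTLE_ENABLED = True      # turn the extension on (set False to disable it)
AUTOTHROTTLE_START_DELAY = 5.0   # initial download delay, in seconds
AUTOTHROTTLE_MAX_DELAY = 60.0    # highest delay allowed when the server is slow
DOWNLOAD_DELAY = 0.5             # lower bound the throttle will never go below
```

Raising `DOWNLOAD_DELAY` or `AUTOTHROTTLE_START_DELAY` slows the spider further; disabling the extension removes the speed limit entirely, which may not be permitted on shared servers.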
I got the message closespider_pagecount. What does that mean?
It means the spider reached the maximum number of pages it is allowed to crawl, and was terminated for that reason.
Autoscraping runs in annotating mode always have that limit in place.
The actual name "closespider_pagecount" comes from the Scrapy extension that powers the shutdown: https://scrapy.readthedocs.org/en/latest/topics/extensions.html#closespider-pagecount
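That extension is driven by a Scrapy setting; a minimal sketch with an illustrative value (the limit enforced for annotating-mode runs is set by the platform, not by you):

```python
# settings.py -- stop the spider after crawling a fixed number of pages
CLOSESPIDER_PAGECOUNT = 1000  # 0 disables the limit; any positive value enforces it
```

When the limit is hit, the job finishes with the close reason `closespider_pagecount`, which is the message reported above.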
I had followed the instructions to annotate in the template, but the spider is not extracting any fields. Only body and url are extracted.
If you cannot see extracted data in an annotating-mode run, it usually means the templates are not annotating (and therefore not extracting) all the required fields. Check how you defined the fields of the item, in particular their Required flag, and verify that the template annotates all the required ones; alternatively, remove the Required flag from fields you don't actually expect to annotate or extract with every template.
For more detailed info please check the autoscraping documentation, in particular the section that explains how templates are used in the extraction process.
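As an illustration of the rule above (this is not the actual Autoscraping implementation, just a sketch of the logic): an extracted item is only kept when every field flagged Required is present and non-empty.

```python
def passes_required_check(item, required_fields):
    """Return True only if every field flagged Required is present and non-empty."""
    return all(item.get(field) for field in required_fields)

# A template that never annotates "price" yields no items while "price" is Required:
item = {"title": "Example product", "url": "http://example.com/p/1"}
print(passes_required_check(item, ["title", "price"]))  # False -> item dropped
print(passes_required_check(item, ["title"]))           # True  -> item kept
```

This is why a single unannotated Required field makes it look like the spider extracts nothing at all.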
I want to copy and paste spider settings from one spider to another.
I changed a spider and I would like to deploy only that spider. How can I do it?
Projects are deployed as a whole, individual spiders cannot be deployed separately.
Spiders often depend on other code in the project (such as item declarations or helper functions, among others), and deploying just the spider file could leave the project code in an inconsistent state.
For this reason, all project code needs to be deployed at once. This is also a requirement of Scrapyd, the open source Scrapy deployment application.
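In practice, deployment is a single command over the whole project. A sketch using Scrapinghub's `shub` CLI (the project ID below is hypothetical):

```shell
# Deploys the entire project -- there is no per-spider variant of this command.
shub login          # authenticate once with your API key
shub deploy 12345   # build and upload the whole project, all spiders included
```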
No, Scrapinghub is currently only offered as a hosted service.