Our Forums have moved to a new location - please visit Scrapinghub Help Center to post new topics.
You can still browse older topics on this page.
Welcome to the Scrapinghub community forum! We discuss all things related to Scrapy Cloud, Portia and Crawlera. You can participate by reading posts, asking questions, providing feedback, helping others, and voting on the best questions & answers. Scrapinghub employees regularly pop in to answer questions, share tips, and post announcements.
I registered on Mashape and I'm trying, for example, to fetch the page http://site.pl
by means of the following PHP script:
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, 'http://MY_NICK:MY_PASS@api.crawlera.com/fetch?url=http://site.pl');
In response I get only 1.
What am I doing wrong?
How to fetch the page in PHP?
I can reply to tickets in English, Spanish and German. I would love to work on the Scrapinghub support team.
Where can I send a message to apply? Thank you.
I ordered Crawlera, but I'm not sure if I really need it, because I'm not getting banned.
The question is: will the Crawlera middleware work the same if I'm using the Crawlera Mashape API? If not, how do I do this?
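For reference, a minimal sketch of what "using the Crawlera middleware" looks like in a Scrapy project, assuming the scrapy-crawlera package; the API key value is a placeholder:

```python
# settings.py sketch: route all spider requests through Crawlera.
# Assumes the scrapy-crawlera package is installed; the key is a placeholder.
DOWNLOADER_MIDDLEWARES = {
    'scrapy_crawlera.CrawleraMiddleware': 610,
}
CRAWLERA_ENABLED = True
CRAWLERA_APIKEY = 'YOUR_CRAWLERA_APIKEY'
```

This is only the configuration side; it does not answer whether the middleware behaves identically when the account was purchased through Mashape.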
503 Service Unavailable: No server is available to handle this request.
ImportError: No module named psycopg2
ERROR:root:Script initialization failed
The psycopg2 egg contains nothing. Also, I don't think it is possible to build a valid egg with this library, as it is not pure Python: it needs platform-dependent C PostgreSQL libraries.
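Since an egg is just a zip archive, one quick way to confirm that a library is not pure Python is to look for compiled extension modules inside it. A minimal sketch (the has_compiled_extensions helper is hypothetical, not part of any Scrapinghub tooling):

```python
import zipfile

def has_compiled_extensions(egg_path):
    """Return True if the egg (a zip archive) bundles compiled C extensions."""
    with zipfile.ZipFile(egg_path) as egg:
        # .so (Linux/macOS) and .pyd (Windows) are platform-dependent binaries
        return any(name.endswith(('.so', '.pyd')) for name in egg.namelist())
```

An egg for which this returns True depends on platform-specific binaries (and, in psycopg2's case, on the PostgreSQL client libraries), so it cannot be expected to run as a plain Python upload.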
1. S3 is up, but I can't figure out how to find this info: s3://<bucket name>/<base path>/ in Amazon's control panel. I've manually uploaded an image, and when I click on it I get this URL: https://s3-sa-east-1.amazonaws.com/cookpedia/1.jpg...
Can I get this info from here?
2. I can't find where in the Autoscraping settings to set up AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. If I click Settings/Scrapy Settings and go to the Images add-on, clicking on the + only shows the other options like IMAGES_EXPIRES and so on...
3. I also can't seem to find the AWS access key and AWS secret access key; is that the info in the link above (to the image)?
Thanks a lot :)
If you have already uploaded an image, then you have used an S3 bucket created by you. (You can create as many buckets as you want, although bucket names are global, so you cannot create a bucket with the same name as one created by anyone else in the S3 cloud.)
The base path can be anything you want. It is just for classifying your data inside a bucket, like a folder. But you don't need to create it, because folders don't really exist in S3: base paths are just file name prefixes, and the Images add-on will include the prefix in the name of the file it creates for each uploaded image.
The AWS keys must be created from the AWS control panel (if none were already created by default). They are needed for accessing your storage from outside. Probably what you need is to read an AWS S3 tutorial in order to understand better how it works. Check for example
About question 2: the AWS_* settings are in the list of general settings because they are not specific to the Images add-on; other components may need S3 storage too.
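Putting the settings from the three questions together, here is a minimal sketch of the relevant project configuration. The bucket name "cookpedia" is taken from the image URL above; the key values are placeholders:

```python
# General Scrapy settings sketch for storing images on S3
# (placeholder credentials; real values come from the AWS control panel)
AWS_ACCESS_KEY_ID = 'YOUR_AWS_ACCESS_KEY_ID'
AWS_SECRET_ACCESS_KEY = 'YOUR_AWS_SECRET_ACCESS_KEY'

# The s3://<bucket name>/<base path>/ form asked about above; "images" is
# an arbitrary key prefix, not a folder that has to exist beforehand
IMAGES_STORE = 's3://cookpedia/images/'
```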
If you need more interactive help from us, you can use the support chat http://www.hipchat.com/gJog3cSUL
I made a crawler using Scrapy. The problem is that the sitemap URLs (in sitemap index files) end with ".xml.gz" but are not actually gzipped, so when Scrapy fetches them and attempts to gunzip them, it raises an error and refuses to continue.
I made a modified version of Scrapy that fixes this problem, but can I upload my own Scrapy library to your cloud? (If yes, how? If not, do you have a suggested fix for my problem?)
Actually, Scrapy tries to gunzip based on the response headers, not the file name. Aside from that, the decompression is handled by the HttpCompressionMiddleware,
which can be disabled with the setting COMPRESSION_ENABLED=0.
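A sketch of the corresponding settings.py fragment. Note that this switches decompression off for every response, so genuinely gzipped pages would then arrive compressed:

```python
# settings.py: disable the HttpCompressionMiddleware entirely, so Scrapy
# never tries to gunzip a response based on its Content-Encoding header
COMPRESSION_ENABLED = False
```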
When on the go, one sometimes wants to check the status of long-running spider jobs and other periodic jobs.
The current Scrapinghub dashboard on smartphones or tablets renders links and buttons quite small and difficult to tap.
It would be cool to have a stripped-down version of the Scrapinghub dashboard that makes better use of small(er) screens, for example by moving tall menus and advanced options into top-bar menus or other off-canvas techniques (http://foundation.zurb.com/docs/components/offcanv...)
There are no plans for a mobile-specific version of the Scrapinghub dashboard, although its fluid/responsive layout should make it render acceptably in modern mobile browsers.