Jazzity 3 months ago in Scrapy Cloud • updated by Nestor Toledo Koplin (Support Engineer) 2 months ago

Dear Forum,


I am trying to store scraped images to S3.

However, when launching the scraper I get the following error message:


ValueError: Missing scheme in request url: h



The message no longer appears when I deactivate the Images addon, so the problem does not seem to be the request URL itself.
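For readers reproducing this outside Scrapy Cloud, a roughly equivalent plain-Scrapy configuration might look like the sketch below. It is not Sebastian's actual settings, which are not shown in the thread. It assumes the Images addon enables Scrapy's ImagesPipeline; the bucket name, prefix and credentials are placeholders, and `IMAGES_URLS_FIELD = "image"` is an assumption chosen to match the field name used later in this thread.

```python
# settings.py (sketch): plain-Scrapy images-to-S3 setup.
# Assumption: this mirrors what the Scrapy Cloud Images addon configures.
# Bucket name, prefix and credentials below are placeholders.
ITEM_PIPELINES = {
    "scrapy.pipelines.images.ImagesPipeline": 1,
}

# Where downloaded images are stored; an s3:// URI selects the S3 backend.
IMAGES_STORE = "s3://my-bucket/images/"

# Item field the pipeline reads image URLs from (Scrapy's default is
# "image_urls"). The field must hold a list of URLs, not a single string.
IMAGES_URLS_FIELD = "image"

# Credentials used for the S3 uploads (placeholders).
AWS_ACCESS_KEY_ID = "YOUR_ACCESS_KEY_ID"
AWS_SECRET_ACCESS_KEY = "YOUR_SECRET_ACCESS_KEY"
```

Note that ImagesPipeline also needs Pillow installed, and S3 storage needs botocore.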


These are my spider settings:



Any help is greatly appreciated!


Regards,


Sebastian

Answer

Hi Sebastian, please check that your spider sets the image field of the item as a list and not as a string. For example, if you are yielding:


yield {
    'image': response.css('example').extract_first(),
}

use this instead:


yield {
    'image': [response.css('example').extract_first()],
}
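Why a string produces "Missing scheme in request url: h": the images pipeline iterates over the URL field and builds one request per element. Iterating over a string yields single characters, so the first "URL" it tries is just "h" (from "http..."). The sketch below reproduces that behaviour with the standard library only. `build_requests` and `as_url_list` are hypothetical helpers for illustration, not Scrapy functions; the real pipeline creates `scrapy.Request` objects, which raise the same `ValueError`.

```python
# Hypothetical stand-in for the media pipeline's URL handling; it only
# illustrates the iteration the real pipeline performs.
from urllib.parse import urlparse


def build_requests(item, field="image"):
    """Return the URLs a media pipeline would request for `item`.

    Raises ValueError, as a Scrapy Request does, if a URL has no scheme.
    """
    urls = []
    for url in item.get(field, []):  # a str is iterated character by character
        if not urlparse(url).scheme:
            raise ValueError(f"Missing scheme in request url: {url}")
        urls.append(url)
    return urls


def as_url_list(value):
    """Wrap a single URL string in a list; pass other iterables through."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)


if __name__ == "__main__":
    try:
        build_requests({"image": "http://example.com/a.jpg"})
    except ValueError as exc:
        print(exc)  # Missing scheme in request url: h

    fixed = {"image": as_url_list("http://example.com/a.jpg")}
    print(build_requests(fixed))  # ['http://example.com/a.jpg']
```

Wrapping the value in a list, as in the answer above, is the whole fix. A helper like `as_url_list` just makes the spider robust when a selector returns either one URL or several.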

To learn more, please see the example in this excellent blog post:

https://blog.scrapinghub.com/2016/02/24/scrapy-tips-from-the-pros-february-2016-edition/


Best,


Pablo

GOOD, I'M SATISFIED

Solved my problem.

Satisfaction mark by Jazzity 2 months ago

Hi Sebastian,

I checked your projects, and the jobs are now running without these errors.

Did you change something?

Best,

Pablo

Hey Pablo,

thanks for taking on my issue. There were no errors because I had deactivated the images addon. I have now reactivated it and am getting the same errors.

Thanks for your support!

Best,

Sebastian

Hi Pablo,

I have deactivated the Images addon for my main project jazzity, but I have just created a duplicate as a test project. The new project is called "S3_test" and has the ID 178090.

For this project I have enabled the Images addon - and I still can't get it to work. When I try to run the spider #8 (jk69) I still get the error message "ValueError: Missing scheme in request url: h".

Any hints are greatly appreciated!

Best wishes,

Sebastian


Hey Pablo,

thanks a lot, this seems to have done the trick!

However, I immediately ran into another issue: I now get an error saying that "The authorization mechanism you have provided is not supported. Please use AWS4-HMAC-SHA256."

I have created a new support ticket for this: https://support.scrapinghub.com/topics/2553-s3-authorization-mechanism-not-supported/

Thanks,

Sebastian

Thanks Sebastian! I will check the other ticket ASAP. Happy to help, as always!