Answered
ihoekstra 3 months ago in Scrapy Cloud • updated by Pablo Vaz (Support Engineer) 2 months ago

Hi,


I am trying to deploy my first project to scrapinghub and I'm very confused. I tried to follow the instructions here. It says that if you type "shub deploy" you will "be guided through a wizard that will set up the project configuration file (scrapinghub.yml) for you."


But this never happens. I just get "ImportError: No module named bs4". I suppose that is because the dependency (on beautifulsoup4) needs to be declared in the scrapinghub.yml file, which hasn't been created.


Should I try to write the file by hand? If so, what needs to be in it, and in what folder should it be stored? If I shouldn't write it by hand, is there something I can do to summon this mysterious wizard so it can create the file for me?


Waiting for Customer

Can you share more of your console logs? (what comes before and around that ImportError)

Just to confirm, the bs4 dependency is for your spider code, right? (bs4 isn't needed by the shub command-line client itself.)

Thank you for your quick response. Yes, bs4 is for the spider code.
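It is imported right at the top of the spider file. Simplified (the names below are made up; only the import is the part that matters), the spider looks roughly like this:

import scrapy
from bs4 import BeautifulSoup  # the import that fails during the Scrapy Cloud build


class ExampleSpider(scrapy.Spider):
    name = "example"
    start_urls = ["http://example.com"]

    def parse(self, response):
        # Parse the downloaded page with BeautifulSoup instead of Scrapy selectors
        soup = BeautifulSoup(response.text, "html.parser")
        yield {"title": soup.title.string if soup.title else None}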


Here is my log:


c:\src\Weizen\src>shub deploy


-------------------------------------------------------------------------------
Welcome to shub version 2!


This release contains major updates to how shub is configured, as well as
updates to the commands and shub's look & feel.


Run 'shub' to get an overview over all available commands, and
'shub command --help' to get detailed help on a command. Definitely try the
new 'shub items -f [JOBID]' to see items live as they are being scraped!


From now on, shub configuration should be done in a file called
'scrapinghub.yml', living next to the previously used 'scrapy.cfg' in your
Scrapy project directory. Global configuration, for example API keys, should be
done in a file called '.scrapinghub.yml' in your home directory.


But no worries, shub has automatically migrated your global settings to
~/.scrapinghub.yml, and will also automatically migrate your project settings
when you run a command within a Scrapy project.


Visit http://doc.scrapinghub.com/shub.html for more information on the new
configuration format and its benefits.


Happy scraping!
-------------------------------------------------------------------------------


Error: Not logged in. Please run 'shub login' first.


c:\src\Weizen\src>shub login
Enter your API key from https://app.scrapinghub.com/account/apikey
API key: [apikey]
Validating API key...
API key is OK, you are logged in now.


c:\src\Weizen\src>shub deploy
Target project ID: [projectid]
Save as default [Y/n]: n
Packing version 1483608643
Deploying to Scrapy Cloud project "[projectid]"
Deploy log last 30 lines:

  File "/usr/local/lib/python2.7/site-packages/sh_scrapy/crawl.py", line 145, in _run_usercode
    _run(args, settings)
  File "/usr/local/lib/python2.7/site-packages/sh_scrapy/crawl.py", line 103, in _run
    _run_scrapy(args, settings)
  File "/usr/local/lib/python2.7/site-packages/sh_scrapy/crawl.py", line 111, in _run_scrapy
    execute(settings=settings)
  File "/usr/local/lib/python2.7/site-packages/scrapy/cmdline.py", line 141, in execute
    cmd.crawler_process = CrawlerProcess(settings)
  File "/usr/local/lib/python2.7/site-packages/scrapy/crawler.py", line 238, in __init__
    super(CrawlerProcess, self).__init__(settings)
  File "/usr/local/lib/python2.7/site-packages/scrapy/crawler.py", line 129, in __init__
    self.spider_loader = _get_spider_loader(settings)
  File "/usr/local/lib/python2.7/site-packages/scrapy/crawler.py", line 325, in _get_spider_loader
    return loader_cls.from_settings(settings.frozencopy())
  File "/usr/local/lib/python2.7/site-packages/scrapy/spiderloader.py", line 33, in from_settings
    return cls(settings)
  File "/usr/local/lib/python2.7/site-packages/scrapy/spiderloader.py", line 20, in __init__
    self._load_all_spiders()
  File "/usr/local/lib/python2.7/site-packages/scrapy/spiderloader.py", line 28, in _load_all_spiders
    for module in walk_modules(name):
  File "/usr/local/lib/python2.7/site-packages/scrapy/utils/misc.py", line 71, in walk_modules
    submod = import_module(fullpath)
  File "/usr/local/lib/python2.7/importlib/__init__.py", line 37, in import_module
    __import__(name)
  File "/app/__main__.egg/src/spiders/ksa_spider.py", line 1, in <module>
ImportError: No module named bs4
Process terminated with exit code 1, signal None, status=0x0100
{"message": "List exit code: 193", "details": null, "error": "build_error"}
{"status": "error", "message": "Internal build error"}

Deploy log location: C:\Users\irn\AppData\Local\Temp\shub_deploy_vx1xbs4k.log
Error: Deploy failed: b'{"status": "error", "message": "Internal build error"}'

I would like to add that the global yml file in my home directory (~/.scrapinghub.yml) was created properly; it is just the local file that's missing. I found this page and tried to write the local file myself, but I don't understand how to declare the dependencies. The page I just linked seems a bit outdated, because it assumes you deploy an egg. We're not supposed to use eggs anymore, right?

Answer

OK, I finally figured it out. I manually created a scrapinghub.yml file in the same directory as scrapy.cfg. Then I created a file called requirements.txt, as explained here. This file should be in that same directory!


scrapinghub.yml looks like this:


projects:
    default: [yourprojectid]

requirements_file: requirements.txt


And requirements.txt looks like this:


beautifulsoup4==4.5.1
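If you are not sure which version to pin, one way (assuming you run this in the same local Python environment your spider uses) is to check the installed version and copy it into requirements.txt:

import bs4
print(bs4.__version__)  # e.g. 4.5.1, which is the version pinned above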


On to the next error... but I will leave that for another thread.

Answered

Hey, glad you could solve it by yourself.

Your answer will be helpful for other users running into the same issue. Thanks!