
Unhandled error in Deferred

I'm new-ish to scraping, but I'm technical (I used to write scrapers at a much lower level). I like Portia so far for quickly writing recipes for items.


I was just attempting to run a job I had made via Portia, but I keep getting a very generic error (below). Everything looks fine in Portia while I'm writing the sample, and the extracted-data preview in the right-hand area looks good. Ideas?


2017-10-21 23:42:14 INFO Log opened.

2017-10-21 23:42:14 INFO [scrapy.log] Scrapy 1.4.0 started

2017-10-21 23:42:15 INFO [stderr] /usr/local/lib/python2.7/site-packages/scrapy/crawler.py:134: ScrapyDeprecationWarning: SPIDER_MANAGER_CLASS option is deprecated. Please use SPIDER_LOADER_CLASS.

2017-10-21 23:42:15 INFO [stderr] self.spider_loader = _get_spider_loader(settings)

2017-10-21 23:42:15 INFO [root] Slybot 0.13.1 Spider

2017-10-21 23:42:16 INFO [scrapy.utils.log] Scrapy 1.4.0 started (bot: scrapybot)

2017-10-21 23:42:16 INFO [scrapy.utils.log] Overridden settings: {'LOG_LEVEL': 'INFO', 'AUTOTHROTTLE_ENABLED': True, 'LOG_ENABLED': False, 'MEMUSAGE_LIMIT_MB': 950, 'STATS_CLASS': 'sh_scrapy.stats.HubStorageStatsCollector', 'TELNETCONSOLE_HOST': '0.0.0.0'}

2017-10-21 23:42:16 WARNING [py.warnings] /src/slybot/slybot/slybot/plugins/scrapely_annotations/builder.py:366: ScrapyDeprecationWarning: Attribute `_root` is deprecated, use `root` instead

  elems = [elem._root for elem in self.selector.css(selector)]


2017-10-21 23:42:16 WARNING [py.warnings] /src/slybot/slybot/slybot/closespider.py:10: ScrapyDeprecationWarning: Importing from scrapy.xlib.pydispatch is deprecated and will no longer be supported in future Scrapy versions. If you just want to connect signals use the from_crawler class method, otherwise import pydispatch directly if needed. See: https://github.com/scrapy/scrapy/issues/1762

  from scrapy.xlib.pydispatch import dispatcher


2017-10-21 23:42:16 INFO [scrapy.middleware] Enabled extensions:

['scrapy.extensions.corestats.CoreStats',

 'scrapy.extensions.memusage.MemoryUsage',

 'scrapy.extensions.logstats.LogStats',

 'scrapy.extensions.debug.StackTraceDump',

 'scrapy.extensions.telnet.TelnetConsole',

 'slybot.closespider.SlybotCloseSpider',

 'scrapy.extensions.spiderstate.SpiderState',

 'scrapy.extensions.throttle.AutoThrottle',

 'sh_scrapy.extension.HubstorageExtension']

2017-10-21 23:42:16 ERROR Unhandled error in Deferred:

2017-10-21 23:42:16 CRITICAL [twisted] Unhandled error in Deferred:

2017-10-21 23:42:16 ERROR Traceback (most recent call last):

   File "/usr/local/lib/python2.7/site-packages/scrapy/commands/crawl.py", line 57, in run

     self.crawler_process.crawl(spname, **opts.spargs)

   File "/usr/local/lib/python2.7/site-packages/scrapy/crawler.py", line 168, in crawl

     return self._crawl(crawler, *args, **kwargs)

   File "/usr/local/lib/python2.7/site-packages/scrapy/crawler.py", line 172, in _crawl

     d = crawler.crawl(*args, **kwargs)

   File "/usr/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 1447, in unwindGenerator

     return _inlineCallbacks(None, gen, Deferred())

 --- <exception caught here> ---

   File "/usr/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 1301, in _inlineCallbacks

     result = g.send(result)

   File "/usr/local/lib/python2.7/site-packages/scrapy/crawler.py", line 95, in crawl

     six.reraise(*exc_info)

   File "/usr/local/lib/python2.7/site-packages/scrapy/crawler.py", line 76, in crawl

     self.spider = self._create_spider(*args, **kwargs)

   File "/usr/local/lib/python2.7/site-packages/scrapy/crawler.py", line 99, in _create_spider

     return self.spidercls.from_crawler(self, *args, **kwargs)

   File "/usr/local/lib/python2.7/site-packages/scrapy/spiders/__init__.py", line 51, in from_crawler

     spider = cls(*args, **kwargs)

   File "/src/slybot/slybot/slybot/spidermanager.py", line 56, in __init__

     **kwargs)

   File "/src/slybot/slybot/slybot/spider.py", line 58, in __init__

     settings, spec, item_schemas, all_extractors)

   File "/src/slybot/slybot/slybot/spider.py", line 226, in _configure_plugins

     self.logger)

   File "/src/slybot/slybot/slybot/plugins/scrapely_annotations/annotations.py", line 89, in setup_bot

     self.extractors.append(SlybotIBLExtractor(list(group)))

   File "/src/slybot/slybot/slybot/plugins/scrapely_annotations/extraction/extractors.py", line 61, in __init__

     for p, v in zip(parsed_templates, template_versions)

   File "/src/slybot/slybot/slybot/plugins/scrapely_annotations/extraction/extractors.py", line 70, in build_extraction_tree

     basic_extractors = ContainerExtractor.apply(template, basic_extractors)

   File "/src/slybot/slybot/slybot/plugins/scrapely_annotations/extraction/container_extractors.py", line 65, in apply

     extraction_tree = cls._build_extraction_tree(containers)

   File "/src/slybot/slybot/slybot/plugins/scrapely_annotations/extraction/container_extractors.py", line 144, in _build_extraction_tree

     parent = containers[parent_id]

 exceptions.KeyError: u'9650-4934-a318#parent'

2017-10-21 23:42:16 CRITICAL [twisted] 

Traceback (most recent call last):

  File "/usr/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 1301, in _inlineCallbacks

    result = g.send(result)

  File "/usr/local/lib/python2.7/site-packages/scrapy/crawler.py", line 95, in crawl

    six.reraise(*exc_info)

  File "/usr/local/lib/python2.7/site-packages/scrapy/crawler.py", line 76, in crawl

    self.spider = self._create_spider(*args, **kwargs)

  File "/usr/local/lib/python2.7/site-packages/scrapy/crawler.py", line 99, in _create_spider

    return self.spidercls.from_crawler(self, *args, **kwargs)

  File "/usr/local/lib/python2.7/site-packages/scrapy/spiders/__init__.py", line 51, in from_crawler

    spider = cls(*args, **kwargs)

  File "/src/slybot/slybot/slybot/spidermanager.py", line 56, in __init__

    **kwargs)

  File "/src/slybot/slybot/slybot/spider.py", line 58, in __init__

    settings, spec, item_schemas, all_extractors)

  File "/src/slybot/slybot/slybot/spider.py", line 226, in _configure_plugins

    self.logger)

  File "/src/slybot/slybot/slybot/plugins/scrapely_annotations/annotations.py", line 89, in setup_bot

    self.extractors.append(SlybotIBLExtractor(list(group)))

  File "/src/slybot/slybot/slybot/plugins/scrapely_annotations/extraction/extractors.py", line 61, in __init__

    for p, v in zip(parsed_templates, template_versions)

  File "/src/slybot/slybot/slybot/plugins/scrapely_annotations/extraction/extractors.py", line 70, in build_extraction_tree

    basic_extractors = ContainerExtractor.apply(template, basic_extractors)

  File "/src/slybot/slybot/slybot/plugins/scrapely_annotations/extraction/container_extractors.py", line 65, in apply

    extraction_tree = cls._build_extraction_tree(containers)

  File "/src/slybot/slybot/slybot/plugins/scrapely_annotations/extraction/container_extractors.py", line 144, in _build_extraction_tree

    parent = containers[parent_id]

KeyError: u'9650-4934-a318#parent'
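For context, the bottom frame of the traceback (`parent = containers[parent_id]` in slybot's `container_extractors.py`) is a plain dict lookup: each annotation references a parent container by id, and the build fails when that id was never registered. A minimal sketch of that failure mode (simplified and hypothetical, not slybot's actual code; the id string is taken from the log above):

```python
# Hypothetical simplification of slybot's extraction-tree build:
# annotations reference a parent container by id, and the tree builder
# looks that id up in a dict of containers found on the page. If the
# parent element was never found (e.g. because it only exists after JS
# runs), the lookup raises KeyError -- the error seen in the log.
containers = {
    "root": [],  # containers actually found on the page, keyed by id
}
annotations = [
    # id taken from the traceback; structure here is illustrative only
    {"id": "item-1", "container_id": "9650-4934-a318#parent"},
]

def build_extraction_tree(annotations, containers):
    for ann in annotations:
        parent_id = ann["container_id"]
        parent = containers[parent_id]  # KeyError if the parent container is missing
        parent.append(ann)
    return containers

try:
    build_extraction_tree(annotations, containers)
except KeyError as exc:
    print("Unhandled error:", exc)
```

So the KeyError means the sample's annotations point at a container element that the extractor could not locate when rendering the page without JavaScript.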



Best Answer

Hey Ian,


According to our team, this seems to be a small bug in Portia: it fails to find an element on the page, so the extractor can't be built. Can you try with JS enabled?


Let us know how it goes.


Best,


Pablo



That solved the issue! Thanks!
