
Portia Feed URL Error

I am trying to pass a list of feed URLs from a public file on Google Drive. It is a simple .txt file with a format similar to the sample feed page.


https://drive.google.com/file/d/0B2DOTGYKanqESlRiZngwYzd0dXhZY1JURkNfVEFPX05tNmpJ/view?usp=sharing


Portia fails with the error below. The file is UTF-8 encoded.


[scrapy.core.scraper] Spider error processing <GET https://drive.google.com/file/d/0B2DOTGYKanqESlRiZngwYzd0dXhZY1JURkNfVEFPX05tNmpJ/view?usp=sharing> (referer: None)


Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/scrapy/utils/defer.py", line 102, in iter_errback
    yield next(it)
  File "/usr/local/lib/python2.7/site-packages/sh_scrapy/middlewares.py", line 30, in process_spider_output
    for x in result:
  File "/usr/local/lib/python2.7/site-packages/scrapy/spidermiddlewares/offsite.py", line 29, in process_spider_output
    for x in result:
  File "/usr/local/lib/python2.7/site-packages/scrapy/spidermiddlewares/referer.py", line 339, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "/usr/local/lib/python2.7/site-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/usr/local/lib/python2.7/site-packages/scrapy/spidermiddlewares/depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/usr/local/lib/python2.7/site-packages/scrapy_pagestorage.py", line 98, in process_spider_output
    for r in result:
  File "/src/slybot/slybot/slybot/starturls/feed_generator.py", line 17, in parse_urls
    yield Request(url, callback=self.callback)
  File "/usr/local/lib/python2.7/site-packages/scrapy/http/request/__init__.py", line 25, in __init__
    self._set_url(url)
  File "/usr/local/lib/python2.7/site-packages/scrapy/http/request/__init__.py", line 58, in _set_url
    raise ValueError('Missing scheme in request url: %s' % self._url)
ValueError: Missing scheme in request url: d),b.tick(%22tbsd_%22,%22wtsrt_%22))%7Dtry%7Ba=null,window.chrome&&window.chrome.csi&&(a=Math.floor(window.chrome.csi().pageT),b&&0%3Cc&&(b.tick(%22_tbnd%22,void%200,window.chrome.csi().startE),b.tick(%22tbnd_%22,%22_tbnd%22,c))),null==a&&window.gtbExternal&&(a=window.gtbExternal.pageT()),null==a&&window.external&&(a=window.external.pageT,b&&0%3Cc&&(b.tick(%22_tbnd%22,void%200,window.external.startE),b.tick(%22tbnd_%22,%22_tbnd%22,c))),a&&(window.jstiming.pt=a)%7Dcatch(g)%7B%7D%7D)();%7D).call(this);

2018-05-10 14:14:07 WARNING

[py.warnings] /usr/local/lib/python2.7/site-packages/scrapy/link.py:21: UserWarning: Link urls must be str objects. Assuming utf-8 encoding (which could be wrong)
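The traceback hints at what went wrong: the request comes from `parse_urls` in Portia's `feed_generator.py`, and the rejected "URL" is a fragment of Google's JavaScript (note `void%200`, i.e. `void 0` with its space escaped), which suggests the response body was split into lines and each line was turned into a `Request`. The sketch below is a hypothetical stand-in, not Portia's actual code: `check_feed` is an illustrative name, it assumes a one-URL-per-line feed, and its http/https test is a stricter approximation of Scrapy's "Missing scheme" check. It shows why a real text feed passes while an HTML page does not.

```python
from typing import List, Tuple
from urllib.parse import urlparse


def check_feed(body: str) -> Tuple[List[str], List[str]]:
    """Hypothetical line-based feed parser: one URL per non-blank line.

    Returns (usable_urls, rejected_lines).
    """
    usable, rejected = [], []
    for line in body.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Approximation of Scrapy's scheme check: require http or https.
        if urlparse(line).scheme in ("http", "https"):
            usable.append(line)
        else:
            rejected.append(line)
    return usable, rejected


# A plain-text feed, like the intended .txt file.
text_feed = "https://example.com/page1\nhttps://example.com/page2\n"
# What a /view link returns instead: an HTML page with inline JavaScript.
html_page = '<!DOCTYPE html>\n<html><head>\n<script>d),b.tick("tbsd_")</script>\n'

print(check_feed(text_feed))
print(check_feed(html_page))
```

Every line of the HTML page lands in the rejected list, which matches the kind of failure seen in the traceback.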


Best Answer

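The link you provided is the preview link (/view), which serves Google Drive's HTML viewer page rather than the raw .txt file, so Portia ends up reading the page's JavaScript as feed URLs (hence the "Missing scheme" error). Use the file's export (direct download) link instead and it should work.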
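A minimal sketch of turning the share link into an export link, assuming the commonly used direct-download form `https://drive.google.com/uc?export=download&id=<FILE_ID>`. That URL pattern is an assumption on Google's side, not something Portia documents, and Google may change it; `drive_export_url` is a hypothetical helper name.

```python
import re
from urllib.parse import urlencode

# Matches share links of the form https://drive.google.com/file/d/<FILE_ID>/view...
_DRIVE_VIEW = re.compile(r"^https://drive\.google\.com/file/d/([\w-]+)/view")


def drive_export_url(share_url: str) -> str:
    """Rewrite a Drive /view share link into an assumed direct-download link."""
    match = _DRIVE_VIEW.match(share_url)
    if match is None:
        raise ValueError(f"not a Drive /file/d/<id>/view link: {share_url}")
    file_id = match.group(1)
    # Assumed export endpoint; verify it serves the raw file before using it.
    return "https://drive.google.com/uc?" + urlencode({"export": "download", "id": file_id})


print(drive_export_url(
    "https://drive.google.com/file/d/0B2DOTGYKanqESlRiZngwYzd0dXhZY1JURkNfVEFPX05tNmpJ/view?usp=sharing"
))
```

Before pasting the result into Portia as the feed URL, it is worth opening it in a browser to confirm it returns the plain list of URLs rather than another HTML page.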
