
Difference in FormRequest POST action in Python 3 vs Python 2

Hi Team,


The FormRequest POST action below works fine in Python 2 (scrapy:1.5) but not in scrapy:1.5-py3. Passing callback_method (a bound method) in the meta dict is what causes the failure. Please advise or suggest a workaround.


yield FormRequest(url=login_url, callback=self.after_login, formdata=form_data,
                  meta={'followup_url': url, 'callback_method': callback_method})


Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/scrapy/utils/defer.py", line 102, in iter_errback
yield next(it)
GeneratorExit

Unhandled error in Deferred:

Traceback (most recent call last):

File "/usr/local/lib/python3.6/site-packages/twisted/internet/base.py", line 1243, in run
self.mainLoop()
File "/usr/local/lib/python3.6/site-packages/twisted/internet/base.py", line 1252, in mainLoop
self.runUntilCurrent()
File "/usr/local/lib/python3.6/site-packages/twisted/internet/base.py", line 878, in runUntilCurrent
call.func(*call.args, **call.kw)
File "/usr/local/lib/python3.6/site-packages/twisted/internet/task.py", line 671, in _tick
taskObj._oneWorkUnit()
--- <exception caught here> ---
File "/usr/local/lib/python3.6/site-packages/twisted/internet/task.py", line 517, in _oneWorkUnit
result = next(self._iterator)
File "/usr/local/lib/python3.6/site-packages/scrapy/utils/defer.py", line 63, in <genexpr>
work = (callable(elem, *args, **named) for elem in iterable)
File "/usr/local/lib/python3.6/site-packages/scrapy/core/scraper.py", line 183, in _process_spidermw_output
self.crawler.engine.crawl(request=output, spider=spider)
File "/usr/local/lib/python3.6/site-packages/scrapy/core/engine.py", line 210, in crawl
self.schedule(request, spider)
File "/usr/local/lib/python3.6/site-packages/scrapy/core/engine.py", line 216, in schedule
if not self.slot.scheduler.enqueue_request(request):
File "/usr/local/lib/python3.6/site-packages/scrapy/core/scheduler.py", line 57, in enqueue_request
dqok = self._dqpush(request)
File "/usr/local/lib/python3.6/site-packages/scrapy/core/scheduler.py", line 86, in _dqpush
self.dqs.push(reqd, -request.priority)
File "/usr/local/lib/python3.6/site-packages/queuelib/pqueue.py", line 35, in push
q.push(obj) # this may fail (eg. serialization error)
File "/usr/local/lib/python3.6/site-packages/scrapy/squeues.py", line 15, in push
s = serialize(obj)
File "/usr/local/lib/python3.6/site-packages/scrapy/squeues.py", line 27, in _pickle_serialize
return pickle.dumps(obj, protocol=2)
builtins.TypeError: can't pickle _thread.lock objects


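The "can't pickle _thread.lock objects" error can be reproduced outside Scrapy. A bound method pickles by pickling its instance, and a real spider instance ends up referencing the crawler/engine, which hold thread locks. A minimal stand-alone sketch (this Spider class is a stand-in, not a Scrapy class):

```python
import pickle
import threading

class Spider:
    """Stand-in for a Scrapy spider: real spiders reference engine
    state that contains thread locks and is therefore unpicklable."""
    def __init__(self):
        self.lock = threading.Lock()

    def parse_item(self, response):
        pass

spider = Spider()
meta = {'callback_method': spider.parse_item}  # bound method inside meta

try:
    # protocol=2 matches scrapy/squeues.py in the traceback above
    pickle.dumps(meta, protocol=2)
except TypeError as exc:
    print('TypeError:', exc)
```

Pickling the bound method drags in the whole spider instance, including the lock, which is why the disk-queue serialization step in the scheduler fails.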




Hi Team,

I was able to make it work in scrapy:1.5-py3 using getattr():

response.meta['callback_url'] = response.url
response.meta['callback_method'] = callback_method.__name__
yield FormRequest(url=login_url, callback=self.after_login, formdata=form_data, meta=response.meta)

def after_login(self, response):
    yield response.follow(url=response.meta['callback_url'],
                          callback=getattr(self, response.meta['callback_method']),
                          meta=response.meta)
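This works because a method name is a plain string, so the request's meta pickles cleanly when Scrapy serializes requests to its disk queues, and getattr() rebinds the name to the live spider at runtime. A minimal stand-alone sketch of the round trip (this Spider class is a stand-in, not a Scrapy class):

```python
import pickle
import threading

class Spider:
    def __init__(self):
        self.lock = threading.Lock()  # unpicklable state, as in a real spider

    def parse_items(self, response):
        return 'parsed ' + response

spider = Spider()

# store only the method NAME -- a plain string pickles without touching the spider
meta = {'callback_method': spider.parse_items.__name__}
restored = pickle.loads(pickle.dumps(meta, protocol=2))

# rebind the name to the live spider instance at runtime
callback = getattr(spider, restored['callback_method'])
print(callback('resp'))  # -> parsed resp
```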

An explanation of why the earlier code failed when passing a function via the FormRequest meta dict under scrapy:1.5-py3 would be helpful.


Please note that the original code works fine in a local Python 3.6 environment; the issue only occurs when running via Scrapinghub.

