lucse11 4 weeks ago in Scrapy Cloud • updated by Pablo Vaz (Support Engineer) 4 weeks ago

I'm having a problem following the pagination of this website: http://gamesurf.tiscali.it/ps4/recensioni.html

The relevant part of my spider:

for pag in response.css('li.square-nav'):
    next = pag.css('li.square-nav > a > span::text').extract_first()
    if next=='»':
        next_page_url = pag.css('a::attr(href)').extract_first()
        if next_page_url:
            next_page_url = response.urljoin(next_page_url)
            yield scrapy.Request(url=next_page_url, callback=self.parse)


If I run my spider in a terminal, it works on all pages of the website, but when I deploy it to Scrapinghub and run it from the button in the dashboard, the spider scrapes only the first page.

Among the log messages there is a warning: [py.warnings] /app/__main__.egg/reccy/spiders/reccygsall.py:21: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal.


I have checked that the problem is not caused by robots.txt.

How can I fix this?

Thanks


Answer


Hey Lucse,


Please check this post; it seems related to your issue.

https://stackoverflow.com/questions/18193305/python-unicode-equal-comparison-failed


Basically, your program seems to be comparing a unicode object with a str object, and the contents of the str object cannot be converted to unicode (Python 2 tries the default ASCII codec for the comparison, and '»' is not ASCII). I'm not fully convinced this will work, but did you try something like:


if next == u'»':


or related?
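
For what it's worth, here is a minimal sketch of a comparison that avoids mixing str and unicode. The helper name is_next_label and the assumption that any byte strings are UTF-8 encoded are mine, not something taken from your spider, so treat it as a starting point rather than a confirmed fix:

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

# The "»" character written as an escape, so the comparison does not
# depend on how the source file itself is encoded.
NEXT_LABEL = "\u00bb"


def is_next_label(text):
    """Return True if `text` is the » label of the "next page" link.

    Hypothetical helper: accepts unicode text, byte strings, or None.
    """
    if text is None:
        return False
    if isinstance(text, bytes):
        # Assumption: byte strings coming from the page are UTF-8 encoded.
        text = text.decode("utf-8")
    return text.strip() == NEXT_LABEL
```

In your loop you could then write `if is_next_label(next):` in place of `if next=='»':`.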


Best,


Pablo
