
Using Splash and SitemapSpiders

Hi,

We currently use Splash on some of our crawlers to grab dynamic data.

We are now trying to combine scrapy-splash with a SitemapSpider and would like to know whether that is possible and how to do it. Below is the basic template we are using. It runs, but it does not interact with Splash and Docker the way a class TestSpider(scrapy.Spider) does.

Any help would be appreciated!

     

class TestSpider(scrapy.spiders.SitemapSpider):
    name = "Test"
    allowed_domains = ["test.com"]
    sitemap_urls = ["https://www.test.com/sitemap_index.xml"]
    sitemap_rules = [('/catalog/product/', 'parse')]

    custom_settings = {
        "SPLASH_URL": 'http://localhost:8050',
        "DOWNLOADER_MIDDLEWARES": {
            'scrapy_splash.SplashCookiesMiddleware': 723,
            'scrapy_splash.SplashMiddleware': 725,
            'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
        },
        "SPIDER_MIDDLEWARES": {
            'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
        },
        "DUPEFILTER_CLASS": 'scrapy_splash.SplashAwareDupeFilter',
    }
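
For reference, one workaround we have been experimenting with (not sure it is the right approach) is overriding SitemapSpider's internal _parse_sitemap() so the page requests it yields become SplashRequests. Note that _parse_sitemap is a private Scrapy method, so this could break between versions; the class name and the wait value below are just placeholders:

```python
import scrapy
from scrapy_splash import SplashRequest


class SplashSitemapSpider(scrapy.spiders.SitemapSpider):
    """Sketch: route page requests from the sitemap through Splash."""

    def _parse_sitemap(self, response):
        # Let SitemapSpider handle the sitemap/index parsing, then swap
        # the plain Requests it yields for SplashRequests so the pages
        # are rendered by Splash before reaching our callbacks.
        for request in super()._parse_sitemap(response):
            if request.callback == self._parse_sitemap:
                # Nested sitemap (index entry) -- no rendering needed.
                yield request
            else:
                yield SplashRequest(
                    request.url,
                    callback=request.callback,
                    args={"wait": 0.5},  # placeholder render wait
                )
```

Does anyone know whether this is the intended way to do it, or whether there is a supported hook for this?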

  
