Not a bug
parulchouhan1990 2 months ago in Scrapy Cloud • updated by Pablo Vaz (Support Engineer) 2 months ago

I am trying to load the input URLs for my spider from a text file.

def start_requests(self):
    # read file data
    with open(self.url, 'r') as f:
        content = f.readlines()

    for url in content:
        yield scrapy.Request(url)


I am using the code above, but I get this error:

IOError: [Errno 2] No such file or directory

Answer


Hi Parul, 


As explained in our article:


You need to declare the files in the <strong>package_data</strong> section of your <strong>setup.py</strong> file.

For example, if your Scrapy project has the following structure:

myproject/
  __init__.py
  settings.py
  resources/
    cities.txt
scrapy.cfg
setup.py

You would use the following in your <strong>setup.py</strong> to include the <strong>cities.txt</strong> file:


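from setuptools import setup, find_packages

setup(
    name='myproject',
    version='1.0',
    packages=find_packages(),
    # ship every .txt file under myproject/resources/ with the package
    package_data={
        'myproject': ['resources/*.txt']
    },
    entry_points={
        'scrapy': ['settings = myproject.settings']
    },
    zip_safe=False,
)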

Note that the <strong>zip_safe</strong> flag is set to <strong>False</strong>, as this may be needed in some cases.

Now you can access the content of the <strong>cities.txt</strong> file from <strong>settings.py</strong> like this:

import pkgutil
data = pkgutil.get_data("myproject", "resources/cities.txt")

Note that this code works for the example Scrapy project structure shown above. If your project has a different structure, you will need to adjust the <strong>package_data</strong> section and your code accordingly.
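
Applied to the spider in the question, the same call could replace the open() on a local path. This is only a minimal sketch under a few assumptions: the helper names parse_url_lines and load_urls are made up for illustration, the packaged file is assumed to be UTF-8 text with one URL per line, and the package and resource names are the ones from the example structure.

```python
import pkgutil


def parse_url_lines(data):
    """Turn the raw bytes of a URL list into a list of clean URL strings."""
    # pkgutil.get_data returns bytes, so decode before splitting
    # (UTF-8 is an assumption about how the file was saved).
    text = data.decode("utf-8")
    # Drop surrounding whitespace and newlines, and skip empty lines.
    return [line.strip() for line in text.splitlines() if line.strip()]


def load_urls(package="myproject", resource="resources/cities.txt"):
    """Read a file shipped via package_data and return its non-empty lines."""
    data = pkgutil.get_data(package, resource)
    if data is None:
        # get_data returns None when the package cannot be located
        # or its loader does not support get_data.
        raise IOError("could not load %s from package %s" % (resource, package))
    return parse_url_lines(data)


# Inside the spider, start_requests could then look like this:
#
#     def start_requests(self):
#         for url in load_urls():
#             yield scrapy.Request(url)
```

Stripping each line also removes the trailing newline that readlines() leaves on every URL.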

For more advanced resource access, take a look at the setuptools pkg_resources module.


Best regards,


Pablo
