
NLTK dependency data not available: how to download it?

I use the NLTK package in my spider's pipeline file. However, the NLTK dependency data is not downloaded in the Scrapinghub cloud. In local Python, we just call nltk.download() to fetch it. Is there any way to download the NLTK data on Scrapinghub? I've pasted the processing error below.

Traceback (most recent call last):
  File "/app/python/lib/python3.6/site-packages/sumy/nlp/tokenizers.py", line 79, in _get_sentence_tokenizer
    return nltk.data.load(path)
  File "/app/python/lib/python3.6/site-packages/nltk/data.py", line 836, in load
    opened_resource = _open(resource_url)
  File "/app/python/lib/python3.6/site-packages/nltk/data.py", line 954, in _open
    return find(path_, path + ['']).open()
  File "/app/python/lib/python3.6/site-packages/nltk/data.py", line 675, in find
    raise LookupError(resource_not_found)
LookupError: 
**********************************************************************
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')
  Searched in:
    - '/scrapinghub/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - '/usr/local/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/local/lib/nltk_data'
    - ''
1 Comment

How are you deploying? With a requirements.txt file or did you make your own Docker image containing this data?
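If rebuilding the image isn't an option, one common workaround is to download the missing data at runtime and point NLTK at a directory the container can write to. This is a sketch, not an official Scrapy Cloud recipe: the directory path and the `ensure_punkt` helper name are assumptions, and the download still requires outbound network access from the job.

```python
import nltk

# Assumed writable location inside the cloud container; adjust as needed.
NLTK_DATA_DIR = "/tmp/nltk_data"


def ensure_punkt():
    """Download the 'punkt' tokenizer data on first use (hypothetical helper)."""
    # Make sure NLTK searches our custom directory.
    if NLTK_DATA_DIR not in nltk.data.path:
        nltk.data.path.append(NLTK_DATA_DIR)
    try:
        # If the resource is already present, this succeeds and we skip the download.
        nltk.data.find("tokenizers/punkt")
    except LookupError:
        nltk.download("punkt", download_dir=NLTK_DATA_DIR)
```

You could call `ensure_punkt()` once from the pipeline's `open_spider()` hook so the data is in place before any tokenization happens.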
