Answered
tofunao1 1 month ago in Scrapy Cloud • updated by Pablo Vaz (Support Engineer) 3 weeks ago

I need to write a new spider. The spider needs to:

  1. Download a zip file from a website, about 3 GB per file.
  2. Unzip the downloaded file, which yields many XML files.
  3. Parse the XML and extract the information I need into items or MySQL tables.

(A rough sketch of these three steps is below.)
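For reference, a minimal local sketch of those three steps, with no Scrapinghub-specific pieces. It streams the download to disk so the 3 GB file never has to fit in memory, then reads each XML member straight out of the archive. The URL and the XML tag names are placeholders, not the real site:

```python
import xml.etree.ElementTree as ET
import zipfile

import requests

ZIP_URL = "https://example.com/dump.zip"  # placeholder URL


def download_zip(url, path="dump.zip", chunk_size=1024 * 1024):
    """Stream the download to disk in 1 MB chunks so a 3 GB file
    never has to fit in memory."""
    with requests.get(url, stream=True) as resp:
        resp.raise_for_status()
        with open(path, "wb") as f:
            for chunk in resp.iter_content(chunk_size=chunk_size):
                f.write(chunk)
    return path


def parse_items(zip_path):
    """Read each XML member directly from the archive, without
    extracting everything to disk first."""
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            if not name.endswith(".xml"):
                continue
            with zf.open(name) as xml_file:
                root = ET.parse(xml_file).getroot()
                for record in root.iter("record"):  # placeholder tag
                    yield {"id": record.get("id"), "text": record.findtext("text")}


if __name__ == "__main__":
    for item in parse_items(download_zip(ZIP_URL)):
        print(item)
```

The same `parse_items` loop can yield Scrapy items or feed MySQL inserts instead of printing.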

But I have some questions about these steps:

  1. Where should I put the downloaded files? Amazon S3?
  2. How can I unzip the file if I put it in S3?
  3. If a file in S3 is very big, say 3 GB, how can I open it from Scrapinghub? (See the sketch after this list.)
  4. Can I use FTP instead of Amazon S3 if the file is 3 GB?
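On questions 2 and 3: S3 itself cannot unzip anything, so the usual pattern is to pull the object down to local (or container) disk and extract it there. A hedged sketch using boto3, where the bucket, key, and destination names are all placeholders:

```python
import tempfile
import zipfile

import boto3


def fetch_and_unzip(bucket="my-bucket", key="dumps/dump.zip", dest="xml_out"):
    """Pull a zip object from S3 to a local temp file and extract it.
    Bucket, key, and dest are placeholders."""
    s3 = boto3.client("s3")
    with tempfile.NamedTemporaryFile(suffix=".zip") as tmp:
        # download_file streams to disk in chunks, so a 3 GB object
        # does not need to fit in memory.
        s3.download_file(bucket, key, tmp.name)
        with zipfile.ZipFile(tmp.name) as zf:
            zf.extractall(dest)
    return dest
```

Note that `zipfile` needs random access, so unzipping straight from a non-seekable S3 stream will not work; downloading to a local file first is the simple route.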

Thank you.

Answer

Hi Tofunao, we don't provide coding assistance through this forum.


I suggest visiting our Reddit Scrapy channel:

https://www.reddit.com/r/scrapy/

and posting any inquiries related to the spider there.

Beyond these suggestions, you can find more information about managing your items and fetching data in our Scrapy Cloud API documentation:

https://doc.scrapinghub.com/scrapy-cloud.html
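For example, items stored in Scrapy Cloud can be fetched with the official `scrapinghub` Python client (`pip install scrapinghub`); the API key and job ID here are placeholders:

```python
from scrapinghub import ScrapinghubClient

client = ScrapinghubClient("YOUR_API_KEY")  # placeholder API key
job = client.get_job("123456/1/8")          # placeholder project/spider/job ID

for item in job.items.iter():
    print(item)
```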

Regards,


Pablo
