Welcome to the Scrapinghub community forum! We discuss all things related to Scrapy Cloud, Portia and Crawlera. You can participate by reading posts, asking questions, providing feedback, helping others, and voting on the best questions & answers. Scrapinghub employees regularly pop in to answer questions, share tips, and post announcements.

Remember to check the Help Center!

Please remember to check the Scrapinghub Help Center before asking here; your question may already be answered there.

0
Laurent Ades 17 hours ago in Portia 0

Hi,

I'm new to all this... very exciting!

To give it a try, I am scraping the open-mesh website to get all the products (less than 30 in total).

I have created a spider and defined a sample page whose pattern is followed by all the product pages, as expected:

http://www.open-mesh.com/products/s48-48-port-poe-cloud-managed-switch.html

It works pretty well, except for the price, which is sometimes scraped and sometimes not, and I can't pin down a specific reason or any difference between one page and another. I have tried defining the fields with CSS or XPath selectors, but it doesn't change anything.

I have read other posts that sound somewhat like my issue - but not exactly - where extraction does not always come up as expected.

Is this a bug to be corrected in the coming version (as I have read), or am I doing something stupid?


Thanks

0
jbothma yesterday at 8:36 a.m. in Scrapy Cloud 0

My spider gets killed after 2 hours of syncing dotscrapy.


DotScrapy Persistence and the HTTP Cache worked fine for a few days: I set a 4 day lifetime, it populated the cache and did a few good scrapes, then the cache expired and it had a couple of slow scrapes repopulating the cache; then, since 2017-02-17 21:00:08 UTC, my jobs get SIGTERM after only 2 hours.


This is where my log ends each time, after around 1 hour 55 minutes.


6: 2017-02-17 21:00:16 INFO [scrapy_dotpersistence] Syncing .scrapy directory from s3://scrapinghub-app-dash-addons/org-66666/79193/dot-scrapy/mfma/

7: 2017-02-17 22:51:11 INFO [scrapy.crawler] Received SIGTERM, shutting down gracefully. Send again to force

I'm trying to figure out how to clear the dotscrapy storage and start afresh but I'd also like to know whether I'm doing something wrong so I don't get into this situation again.


Why would my job just get SIGTERM? Is there something killing slow AWS activity?


I don't think my dotscrapy is too large, because I enabled gzip for the HTTP cache, which dealt with the out-of-space errors I initially got with the cache.
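
For context, the cache and persistence setup described above corresponds roughly to Scrapy settings like these (a sketch only; the HTTPCACHE_* names are standard Scrapy settings, and DOTSCRAPY_ENABLED is assumed to be the flag used by the scrapy-dotpersistence add-on):

# Sketch of the cache/persistence setup described above (not the actual project settings).
HTTPCACHE_ENABLED = True
HTTPCACHE_EXPIRATION_SECS = 4 * 24 * 60 * 60   # 4 day lifetime
HTTPCACHE_GZIP = True                          # compress cached responses to save space
DOTSCRAPY_ENABLED = True                       # assumed scrapy-dotpersistence flag: persist .scrapy between jobs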

0
Answered
ghostmou 4 days ago in Scrapy Cloud • updated by Nestor Toledo Koplin (Support Engineer) 4 days ago 1

I am having issues downloading a paginated CSV. I have seen in the documentation that the parameters to control pagination are:

  • start, to indicate the first item of the page, given in the format <project>/<spider>/<job>/<item_id>.
  • count, to indicate the page's size.

When I try it in JSON format, it works like a charm. But with CSV, I'm having issues:

  • Parameter count works properly, returning the desired page size.
  • Parameter start seems to fail in CSV. I have tried both the recommended format (<project>/<spider>/<job>/<item_id>) and a numeric offset (for example, to start the page at item 2500, passing the number 2499).

It seems to ignore the start parameter...

Example URLs used:

Any suggestions? :(

Thank you!

Answer

Hello,


You can use the Items API for this:

curl -u APIKEY: "https://storage.scrapinghub.com/items/<project_id>/<spider_id>/<job_id>?format=csv&include_headers=1&fields=field1,field2,field3&start=<project_id>/<spider_id>/<job_id>/<item_id>&count=x"

Or


~ "https://storage.scrapinghub.com/items/<project_id>/<spider_id>/<job_id>?apikey=<apikey>&format=csv&include_headesr=1&fields=field1,field2,field3&start=/<project_id>/<spider_id>/<job_id>/<item_id>&count=x"
0
Answered
Adam 4 days ago in Portia • updated by Laurent Ades 18 hours ago 2

Hi guys,


My spider doesn't capture all of the fields that I've specified, even though it seems to work in the "Extracted Items" preview. I've tried different things and still no luck.


Some facts:


  • Data is available on the page load (it's not loaded with AJAX).
  • It's happening for all of the scraped pages.
  • I have 100% match on 3 out of 7 fields and none on the remaining 4.
  • I have tried setting up a new sample page from scratch, using a new schema, but I still have the same issue.


There's nothing unusual in the log:


0: 2017-02-16 08:02:39 INFO Log opened.
1: 2017-02-16 08:02:39 INFO [scrapy.log] Scrapy 1.2.2 started
2: 2017-02-16 08:02:40 INFO [stderr] /usr/local/lib/python2.7/site-packages/scrapy_pagestorage.py:7: HubstorageDeprecationWarning: python-hubstorage is deprecated, please use python-scrapinghub >= 1.9.0 instead (https://pypi.python.org/pypi/scrapinghub).
3: 2017-02-16 08:02:40 INFO [stderr] from hubstorage import ValueTooLarge
4: 2017-02-16 08:02:40 INFO [stderr] /usr/local/lib/python2.7/site-packages/scrapy/crawler.py:129: ScrapyDeprecationWarning: SPIDER_MANAGER_CLASS option is deprecated. Please use SPIDER_LOADER_CLASS.
5: 2017-02-16 08:02:40 INFO [stderr] self.spider_loader = _get_spider_loader(settings)
6: 2017-02-16 08:02:40 INFO [root] Slybot 0.13.0b30 Spider
7: 2017-02-16 08:02:40 INFO [stderr] /src/slybot/slybot/slybot/plugins/scrapely_annotations/builder.py:334: ScrapyDeprecationWarning: Attribute `_root` is deprecated, use `root` instead
8: 2017-02-16 08:02:40 INFO [stderr] elems = [elem._root for elem in page.css(selector)]
9: 2017-02-16 08:02:40 INFO [scrapy.utils.log] Scrapy 1.2.2 started (bot: scrapybot)
10: 2017-02-16 08:02:40 INFO [scrapy.utils.log] Overridden settings: {'LOG_LEVEL': 'INFO', 'AUTOTHROTTLE_ENABLED': True, 'STATS_CLASS': 'sh_scrapy.stats.HubStorageStatsCollector', 'MEMUSAGE_LIMIT_MB': 950, 'TELNETCONSOLE_HOST': '0.0.0.0', 'LOG_ENABLED': False, 'MEMUSAGE_ENABLED': True}
11: 2017-02-16 08:02:40 WARNING [py.warnings] /src/slybot/slybot/slybot/closespider.py:10: ScrapyDeprecationWarning: Importing from scrapy.xlib.pydispatch is deprecated and will no longer be supported in future Scrapy versions. If you just want to connect signals use the from_crawler class method, otherwise import pydispatch directly if needed. See: https://github.com/scrapy/scrapy/issues/1762
12: 2017-02-16 08:02:40 INFO [scrapy.log] HubStorage: writing items to https://storage.scrapinghub.com/items/156095/5/17
13: 2017-02-16 08:02:40 INFO [scrapy.middleware] Enabled extensions:
14: 2017-02-16 08:02:40 INFO [scrapy.middleware] Enabled downloader middlewares:
15: 2017-02-16 08:02:40 WARNING [py.warnings] /usr/local/lib/python2.7/site-packages/scrapy_pagestorage.py:50: ScrapyDeprecationWarning: log.msg has been deprecated, create a python logger and log through it instead
16: 2017-02-16 08:02:40 INFO [scrapy.log] HubStorage: writing pages to https://storage.scrapinghub.com/collections/156095/cs/Pages
17: 2017-02-16 08:02:41 INFO [scrapy.middleware] Enabled spider middlewares:
18: 2017-02-16 08:02:41 INFO [scrapy.middleware] Enabled item pipelines:
19: 2017-02-16 08:02:41 INFO [scrapy.core.engine] Spider opened
20: 2017-02-16 08:02:41 INFO [scrapy.extensions.logstats] Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
21: 2017-02-16 08:02:41 INFO TelnetConsole starting on 6023
22: 2017-02-16 08:02:51 WARNING [py.warnings] /src/slybot/slybot/slybot/plugins/scrapely_annotations/processors.py:226: ScrapyDeprecationWarning: Attribute `_root` is deprecated, use `root` instead
23: 2017-02-16 08:02:51 WARNING [py.warnings] /src/slybot/slybot/slybot/plugins/scrapely_annotations/processors.py:213: ScrapyDeprecationWarning: Attribute `_root` is deprecated, use `root` instead
24: 2017-02-16 08:03:42 INFO [scrapy.extensions.logstats] Crawled 149 pages (at 149 pages/min), scraped 91 items (at 91 items/min)
25: 2017-02-16 08:04:33 INFO [scrapy.crawler] Received SIGTERM, shutting down gracefully. Send again to force
26: 2017-02-16 08:04:33 INFO [scrapy.core.engine] Closing spider (shutdown)
27: 2017-02-16 08:04:41 INFO [scrapy.extensions.logstats] Crawled 188 pages (at 39 pages/min), scraped 126 items (at 35 items/min)
28: 2017-02-16 08:05:11 INFO [scrapy.statscollectors] Dumping Scrapy stats:
29: 2017-02-16 08:05:12 INFO [scrapy.core.engine] Spider closed (shutdown)
30: 2017-02-16 08:05:12 INFO (TCP Port 6023 Closed)
31: 2017-02-16 08:05:12 INFO Main loop terminated.


Any ideas why this happened?


Update:


I've tried setting up a brand new spider from scratch; the same problem occurred.


On the original spider, I've added some random fields that are always on the page (such as the login link or telephone number); those don't seem to get picked up either.


I've tried renaming one of the 3 fields that work to see if my changes are actually deployed successfully. This worked: I could see the renamed field in the scraped data, but the other 4 fields are still missing.


Thanks,

Adam

Answer

Hi Adam, our Portia team is about to release a new version of Portia with fixes for most bugs reported by our users, including this one.

We will post an update in our community when this new release is out.

Kind regards,

Pablo

0
Answered
triggerdev 6 days ago in Scrapy Cloud • updated by Nestor Toledo Koplin (Support Engineer) 5 days ago 1

Hi,


I am running a spider job every 3 minutes, and I would like to know how I can get the last job from shub (Windows version).


Thanks.

Answer

Hello,


You can use the JobQ API (https://doc.scrapinghub.com/api/jobq.html#jobq-project-id-list) with the parameters state=finished and count=1 (if you only want the last one) to get it.

Or you can use the Jobs API (https://doc.scrapinghub.com/api/jobs.html#jobs-list-json-jl).
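
For illustration, a minimal Python sketch of the JobQ approach (the project ID and API key are placeholders; the endpoint returns JSON Lines, one object per job, so with count=1 a single object comes back):

# Sketch: fetch the most recently finished job from the JobQ API (placeholders for key/project).
import requests

API_KEY = "APIKEY"
PROJECT_ID = "12345"

resp = requests.get(
    "https://storage.scrapinghub.com/jobq/%s/list" % PROJECT_ID,
    params={"state": "finished", "count": 1},
    auth=(API_KEY, ""),
)
print(resp.json())   # e.g. the key and timestamp of the last finished job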

0
Answered
mrcai 6 days ago in Scrapy Cloud • updated by Nestor Toledo Koplin (Support Engineer) 6 days ago 1

Hi,


I'm receiving the following error.


[scrapy.extensions.feedexport] Unknown feed format: 'jsonlines'


This works in my development environment; I'm not sure if I've missed enabling a plugin?


Many thanks,

Answer

Hello,

If you are adding FEED_FORMAT via the UI settings, try removing the quotes (' ').
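
In a settings.py the equivalent would be the following; the quotes there are Python syntax only, while in the Scrapy Cloud UI settings field the value should be entered as the bare word jsonlines.

# settings.py equivalent of the UI setting (quotes are Python syntax only).
FEED_FORMAT = "jsonlines"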

0
Completed
Tristan Bailey 6 days ago in Portia • updated by Pablo Vaz (Support Engineer) 5 days ago 1

I see that in Portia 2.0 there is the option of FeedUrl as a starting page list type - text, one link per line.

Is it possible to pass this feed URL in the API to start a new spider, like "start_urls"?
(It looks like maybe not?)


Second part: there is another post that mentions you can do this with an RSS feed or an XML sitemap.

I cannot find the docs for this. It looks like it might work, but is there a spec for these formats, as they can vary?


Third part: is there any limit to the number of URLs in these bulk methods for seeding?


thanks


tristan


Answer

Hi Tristan,


For the first question, the feed refers to a URL, so if you can update the data provided at that URL and then schedule the spider in Scrapy Cloud, you could solve this. Perhaps there's a more efficient solution that our community members would like to share.
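
For example, if the feed's contents are refreshed by some external process, the spider could then be re-scheduled with a small script against the Scrapy Cloud run.json endpoint (the project ID, spider name and API key below are placeholders):

# Sketch: trigger a Scrapy Cloud job after the feed URL's contents have been updated.
import requests

API_KEY = "APIKEY"

resp = requests.post(
    "https://app.scrapinghub.com/api/run.json",
    data={"project": "12345", "spider": "my_portia_spider"},   # placeholders
    auth=(API_KEY, ""),
)
print(resp.json())   # contains the new job id on success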

I think the second question is related to the first one, but feel free to elaborate a bit more on what you want to achieve so we can find a possible solution.


For the last question, according to our Portia developers there's no limit on the number of URLs, but keep in mind that pushing Portia beyond its limits can have, as you may experience, uncomfortable consequences due to memory usage and the capacity of our free storage.


Feel free to explore using Portia and share with us what you find. Your contributions are very helpful.


Kind regards,

Pablo

0
Answered
19dc60 6 days ago in Portia • updated by Pablo Vaz (Support Engineer) 5 days ago 1

I am getting the following when attempting to open my spider in Portia. Please advise why.

"Failed to load resource: the server responded with a status of 403 (Forbidden)"

Answer

Hi 19dc60!


Possibly a network issue; it is working fine now. Feel free to ask if you need further assistance.


Kind regards,

Pablo

0
Answered
maniac103 6 days ago in Datasets • updated by Pablo Hoffman (Director) 5 days ago 2

I have a couple of spiders for which I want to automatically publish their results into a public dataset in the dataset catalog, overwriting the data of the previous spider run. I seem to be unable to do that, because datasets seem to be tied to jobs/runs, not to the spider in general. Am I missing something there? The end goal is being able to fetch the data from an app, so I need a static URL for the last run's data. Unfortunately the method described in [1] doesn't work for me, as it requires me to put my API key (which allows read/write access to the project) into the URL, which is not an option in this (open source) app.


Thanks for your help.


[1] http://help.scrapinghub.com/scrapy-cloud/fetching-latest-spider-data

Answer

Hi Maniac,


This is a feature we have discussed, and even though we plan to incorporate it at some point, we can't provide an ETA yet.


I will forward the bug report to the product team.

0
Answered
I. Hathout 1 week ago in Scrapy Cloud • updated by Pablo Vaz (Support Engineer) 1 week ago 1

I cannot run my Scrapy spider. I deployed it as shown in the video on YouTube, but it just ran for 16 seconds with 0 items and a no_reason outcome!

Answer

Hey Hathout, if you are deploying a Scrapy project, I suggest following this tutorial:

https://doc.scrapy.org/en/latest/intro/tutorial.html

It's quite complete, and it works.


If using Portia, our visual scraper, take a moment to explore this tutorial:

http://help.scrapinghub.com/portia/using-portia-20-the-complete-beginners-guide


Good luck with your projects, and be patient! Don't hesitate to ask further questions here.

Kind regards,

Pablo

0
Answered
edward.feng 1 week ago in Scrapy Cloud • updated by Nestor Toledo Koplin (Support Engineer) 1 week ago 1

I am trying to save a PDF file as Base64-encoded binary in the "items" storage of Scrapinghub. It is able to store PDF documents smaller than 1MB, but it failed on 2 large PDFs. I couldn't find documentation in this regard... Has anyone encountered this issue as well?


Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 150, in maybeDeferred
    result = f(*args, **kw)
  File "/usr/local/lib/python2.7/site-packages/pydispatch/robustapply.py", line 55, in robustApply
    return receiver(*arguments, **named)
  File "/usr/local/lib/python2.7/site-packages/sh_scrapy/extension.py", line 46, in item_scraped
    self._write_item(item)
  File "/usr/local/lib/python2.7/site-packages/scrapinghub/hubstorage/resourcetype.py", line 208, in write
    return self.writer.write(item)
  File "/usr/local/lib/python2.7/site-packages/scrapinghub/hubstorage/batchuploader.py", line 229, in write
    .format(self.maxitemsize, truncated_data))
ValueTooLarge: Value exceeds max encoded size of 1048576 bytes: '{"_type": "CrossroadsItem", "isFound": "true", "requestID": "565780", "html": "JVBERi0xLjQKJeLjz9MKMSAwIG9iago8PC9TdWJ0eXBlL1hNTC9MZW5ndGggMzIyMC9UeXBlL01ldGFkYXRhPj5zdHJlYW0KPD94cGFja2V0IGJlZ2luPSLvu78iIGlkPSJXNU0wTXBDZWhpSHpyZVN6TlRjemtjOWQiPz4KPHg6eG1wbWV0YSB4bWxuczp4PSJhZG9iZTpuczptZXRhLyIgeDp4bXB0az0iWE1QIENvcmUgNS41LjAiPgogICA8cmRmOlJERiB4bWxuczpyZGY9Imh0dHA6Ly93d3cudzMub3JnLzE5OTkvMDIvMjItcmRmLXN5bnRheC1ucyMiPgogICAgICA8cmRmOkRlc2NyaXB0aW9uIHJkZjphYm91dD0iIiB4bWxuczpwZGY9Imh0dHA6Ly9ucy5hZG9iZS5jb20vcGRmLzEuMy8iIHhtbG5zOnBkZmFpZD0iaHR0cDovL3d3dy5haWltLm9yZy9wZGZhL25zL2lkLyIgeG1sbnM6ZGM9Imh0dHA6Ly9wdXJsLm9yZy9kYy9lbGVtZW50cy8xLjEvIiB4bWxuczp4bXBNTT0iaHR0cDovL25zLmFkb2JlLmNvbS94YXAvMS4wL21tLyIgeG1sbnM6eG1wPSJodHRwOi8vbnMuYWRvYmUuY29tL3hhcC8xLjAvIj4KICAgICAgICAgPHBkZjpQcm9kdWNlcj5QRlUgUERGIExpYnJhcnkgMS4yLjA7IG1vZGlmaWVkIHVzaW5nIGlUZXh0U2hhcnAgNS4wLjAgKGMpIDFUM1hUIEJWQkE8L3BkZjpQcm9kdWNlcj4KICAgICAgICAgPHBkZmFpZDpwYXJ0PjE8L3BkZmFpZDpwYXJ0PgogICAgICAgICA8cGRmYWlkOmNvbmZvcm1hbmNlPkI8L3BkZmFpZDpjb25mb3JtYW5jZ...'

Answer

Hello Edward,


There's a limit of 1MB per item/log/collection item and it cannot be increased. Possible solutions are to split the item so it doesn't exceed the limit and merge the pieces during post-processing, to produce smaller items, or to use a file storage such as S3.
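
As a rough sketch of the splitting approach (the field names requestID, part, total_parts and html are hypothetical, not part of any Scrapinghub API):

# Sketch: split a large base64 payload into chunks below the 1MB item limit,
# emitting one item per chunk and re-joining the chunks during post-processing.
CHUNK_SIZE = 900 * 1024  # stay comfortably below the 1048576-byte encoded limit

def split_item(request_id, b64_pdf):
    chunks = [b64_pdf[i:i + CHUNK_SIZE] for i in range(0, len(b64_pdf), CHUNK_SIZE)]
    for n, chunk in enumerate(chunks):
        yield {
            "requestID": request_id,
            "part": n,
            "total_parts": len(chunks),
            "html": chunk,
        }

# In the spider callback: for part in split_item(request_id, encoded_pdf): yield part
# Post-processing: group by requestID, sort by "part", and concatenate the "html" fields.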

0
Answered
nobody 1 week ago in Scrapy Cloud • updated 6 days ago 4

I got an error "ValueTooLarge: Value exceeds max encoded size of 1048576 bytes:" when running my spider. I suppose there is a limit of 1 MiB (= 1,048,576 bytes) on the HTML file.

Is it possible to relax the size limitation?


Detailed Error:

Traceback (most recent call last):

  File "/usr/local/lib/python2.7/site-packages/scrapy/core/spidermw.py", line 42, in process_spider_input
    result = method(response=response, spider=spider)
  File "/usr/local/lib/python2.7/site-packages/scrapy_pagestorage.py", line 59, in process_spider_input
    self.save_response(response, spider)
  File "/usr/local/lib/python2.7/site-packages/scrapy_pagestorage.py", line 90, in save_response
    self._writer.write(payload)
  File "/usr/local/lib/python2.7/site-packages/scrapinghub/hubstorage/batchuploader.py", line 229, in write
    .format(self.maxitemsize, truncated_data))
ValueTooLarge: Value exceeds max encoded size of 1048576 bytes: '{"body": "<!DOC...(snip)
Answer

Hello,


Yes, there's a limit of 1MB per item/log/collection item and it is not possible to increase it. You could try to split the item so it doesn't exceed the limit and then join the pieces during post-processing, produce smaller items, or use a file storage like S3.
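
And a minimal sketch of the S3 alternative (the bucket name and key layout are hypothetical; boto3 and AWS credentials would need to be available to the job):

# Sketch: store the large response body in S3 and keep only a small reference in the item.
import hashlib
import boto3

s3 = boto3.client("s3")
BUCKET = "my-scrapes-bucket"   # hypothetical bucket

def store_body(response):
    key = "pages/%s.html" % hashlib.sha1(response.url.encode("utf-8")).hexdigest()
    s3.put_object(Bucket=BUCKET, Key=key, Body=response.body)
    return {"url": response.url, "s3_key": key}   # small item, well under the 1MB limit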

0
Answered
Tristan Bailey 2 weeks ago in Portia • updated by Nestor Toledo Koplin (Support Engineer) 1 week ago 3

Hi


I want to crawl a client's website, but Portia just stops at page 1. I can't see robots.txt blocking it, so how can I see what might be blocking this spider?



Time (UTC) Level Message
0: 2017-02-09 16:38:21 INFO Log opened.
1: 2017-02-09 16:38:21 INFO [scrapy.log] Scrapy 1.2.2 started
2: 2017-02-09 16:38:21 INFO [stderr] /usr/local/lib/python2.7/site-packages/scrapy_pagestorage.py:7: HubstorageDeprecationWarning: python-hubstorage is deprecated, please use python-scrapinghub >= 1.9.0 instead (https://pypi.python.org/pypi/scrapinghub).
3: 2017-02-09 16:38:21 INFO [stderr] from hubstorage import ValueTooLarge
4: 2017-02-09 16:38:21 INFO [stderr] /usr/local/lib/python2.7/site-packages/scrapy/crawler.py:129: ScrapyDeprecationWarning: SPIDER_MANAGER_CLASS option is deprecated. Please use SPIDER_LOADER_CLASS.
5: 2017-02-09 16:38:21 INFO [stderr] self.spider_loader = _get_spider_loader(settings)
6: 2017-02-09 16:38:21 INFO [root] Slybot 0.13.0b30 Spider
7: 2017-02-09 16:38:22 INFO [stderr] /src/slybot/slybot/slybot/plugins/scrapely_annotations/builder.py:334: ScrapyDeprecationWarning: Attribute `_root` is deprecated, use `root` instead
8: 2017-02-09 16:38:22 INFO [stderr] elems = [elem._root for elem in page.css(selector)]
9: 2017-02-09 16:38:22 INFO [scrapy.utils.log] Scrapy 1.2.2 started (bot: scrapybot)
10: 2017-02-09 16:38:22 INFO [scrapy.utils.log] Overridden settings: {'CLOSESPIDER_ITEMCOUNT': 1000, 'LOG_LEVEL': 'INFO', 'STATS_CLASS': 'sh_scrapy.stats.HubStorageStatsCollector', 'LOG_ENABLED': False, 'MEMUSAGE_LIMIT_MB': 950, 'TELNETCONSOLE_HOST': '0.0.0.0', 'CLOSESPIDER_PAGECOUNT': 1000, 'AUTOTHROTTLE_ENABLED': True, 'MEMUSAGE_ENABLED': True}
11: 2017-02-09 16:38:22 WARNING [py.warnings] /src/slybot/slybot/slybot/closespider.py:10: ScrapyDeprecationWarning: Importing from scrapy.xlib.pydispatch is deprecated and will no longer be supported in future Scrapy versions. If you just want to connect signals use the from_crawler class method, otherwise import pydispatch directly if needed. See: https://github.com/scrapy/scrapy/issues/1762
12: 2017-02-09 16:38:22 INFO [scrapy.log] HubStorage: writing items to https://storage.scrapinghub.com/items/135909/1/12524
13: 2017-02-09 16:38:22 INFO [scrapy.middleware] Enabled extensions:
14: 2017-02-09 16:38:22 INFO [scrapy.middleware] Enabled downloader middlewares:
15: 2017-02-09 16:38:22 WARNING [py.warnings] /usr/local/lib/python2.7/site-packages/scrapy_pagestorage.py:50: ScrapyDeprecationWarning: log.msg has been deprecated, create a python logger and log through it instead
16: 2017-02-09 16:38:22 INFO [scrapy.log] HubStorage: writing pages to https://storage.scrapinghub.com/collections/135909/cs/Pages
17: 2017-02-09 16:38:23 INFO [scrapy.middleware] Enabled spider middlewares:
18: 2017-02-09 16:38:23 INFO [scrapy.middleware] Enabled item pipelines:
19: 2017-02-09 16:38:23 INFO [scrapy.core.engine] Spider opened
20: 2017-02-09 16:38:24 INFO [scrapy.extensions.logstats] Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
21: 2017-02-09 16:38:24 INFO TelnetConsole starting on 6023
22: 2017-02-09 16:38:37 ERROR [scrapy.core.scraper] Error downloading : []
23: 2017-02-09 16:38:37 INFO [scrapy.core.engine] Closing spider (finished)
24: 2017-02-09 16:38:38 INFO [scrapy.statscollectors] Dumping Scrapy stats:
{'downloader/exception_count': 3,
'downloader/exception_type_count/twisted.web._newclient.ResponseFailed': 3,
'downloader/request_bytes': 699,
'downloader/request_count': 3,
'downloader/request_method_count/GET': 3,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2017, 2, 9, 16, 38, 37, 649231),
'log_count/ERROR': 1,
'log_count/INFO': 9,
'log_count/WARNING': 2,
'memusage/max': 76365824,
'memusage/startup': 76365824,
'scheduler/dequeued': 3,
'scheduler/dequeued/disk': 3,
'scheduler/enqueued': 3,
'scheduler/enqueued/disk': 3,
'start_time': datetime.datetime(2017, 2, 9, 16, 38, 24, 453847)}
25: 2017-02-09 16:38:39 INFO [scrapy.core.engine] Spider closed (finished)
26: 2017-02-09 16:38:39 INFO Main loop terminated.
Answer

Hi Tristan,


I looked at the "22: 2017-02-09 16:38:37 ERROR [scrapy.core.scraper] Error downloading : []" entry in your project.

Try changing your start URL to https://www... (not posting the complete URL because I assume you removed it purposely from the logs in this post).

0
Answered
Andreas Dreyer Hysing 2 weeks ago in Portia • updated by Pablo Vaz (Support Engineer) 5 days ago 2

I have finally managed to get a full application with end users based on Portia. It is natural to make a separate scrapinghub.com project for the deployed application (as production environment). I wish to configure, test, and develop Portia for new web sites in one project, and move the Portia configuration to a separate project when they are tested and ready. This will enable me to have different settings per project, and save the data to separate buckets in Amazon S3.

There used to be a button for moving Portia spiders between projects in Portia 1.0. In the new UI I cannot find any such thing. Creating an identical spider configuration every time I want to move data is a cumbersome and error-prone extra step.

Please consider adding such a feature, and follow up on its status.

Answer

The feature to copy spiders between projects is currently under QA testing and will be released in one of the following patches.

0
Answered
1669573348 2 weeks ago in Crawlera • updated by Nestor Toledo Koplin (Support Engineer) 2 weeks ago 1

I want to change the region of an existing Crawlera account. How do I do that?

Answer

Hello,


This article will guide you on how to create an account for a particular region: http://help.scrapinghub.com/crawlera/regional-ips-in-crawlera