Planned
v2065925 1 month ago in Portia • updated by Thriveni Patil (Support Engineer) 4 weeks ago 6

Since yesterday, the GENERATION LIST in URL Generation contains errors


Before the bug

I entered a fixed part x and a list 1 2 3

The GENERATION LIST consisted of correct URLs:

x1

x2

x3


But since yesterday the URLs have been incorrect and contain an extra %20:

x%201

x%202

x%203


How can I fix this?
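For reference, %20 is the percent-encoded form of a space character, so the symptom above is consistent with a space ending up between the fixed part and each list value. The sketch below only illustrates that effect; it is not Portia's code, and `generate_urls` and its `separator` parameter are hypothetical names made up for this example.

```python
from urllib.parse import quote


def generate_urls(fixed_part, values, separator=""):
    """Join a fixed part with each list value and percent-encode the result.

    Hypothetical helper for illustration only; not how Portia builds its list.
    """
    return [
        quote(f"{fixed_part}{separator}{value}", safe=":/?&=")
        for value in values
    ]


# No separator: the parts are simply concatenated.
print(generate_urls("x", ["1", "2", "3"]))

# A stray space between the parts is encoded as %20.
print(generate_urls("x", ["1", "2", "3"], separator=" "))
```

The first call gives x1, x2, x3 and the second gives x%201, x%202, x%203, which matches the two lists in this post.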


Answer
Planned

The %20 shown in the URLs in the browser is a bug, and the Portia team will be working on a fix.


However, when the spider is run, the URLs in the requests are rendered correctly. The requests appear to be blocked by the site with HTTP code 403. You may need to use a proxy rotator such as Crawlera to avoid the bans.

Can anybody check and fix this?


Is this forum section working?

Under review

Hello,


Sorry for the delay in responding. Could you please link us to the job where you see the extra %20? The latest job, https://app.scrapinghub.com/p/188288/1/49/requests, made 23 requests, none of them with the extra sign. All of the requests returned HTTP code 403. You can try adding a User Agent in the settings when running the job.
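As a rough illustration of the User Agent suggestion: in a plain Scrapy project the corresponding setting is called `USER_AGENT`. The value below is only an example browser string and an assumption, not one recommended in this thread. For a Portia spider, the same setting name would be entered in the job or spider settings rather than in a file.

```python
# Sketch of a Scrapy-style settings entry. USER_AGENT is a standard Scrapy
# setting; the string itself is an example value chosen for illustration.
USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)
```

A browser-like User Agent does not guarantee that the 403 responses go away; some sites block by IP address as well.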



Yes, in Job Requests the links are correct,

but in the generated links they contain the error.

So the spider just goes to non-existent links and cannot collect data.

Thank you, I will look at Crawlera. Is it easy to use? I do not have any special knowledge, and I am very comfortable using Portia because I can do everything simply.

Yes, Crawlera is also easy to use. To use it with Portia, you would need to subscribe to a Crawlera plan, enable it in Scrapy Cloud, and then run the Portia spider. To learn more about Crawlera, please refer to the Knowledge Base articles at https://helpdesk.scrapinghub.com/solution/folders/22000165055