AutoExtract FAQ

Why isn't data extracted correctly?
Web scraping is complex: there are bans, location-specific content, issues with remote websites, and misbehaving web pages. Like humans, any useful machine lea...
Mon, 12 Aug, 2019 at 11:58 AM
How do I use the API?
See https://doc.scrapinghub.com/autoextract.html
Mon, 12 Aug, 2019 at 12:01 PM
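The documentation linked above is the authoritative reference. As a rough orientation only, the sketch below builds (but does not send) an extraction request with the Python standard library. The endpoint URL, the JSON payload shape (a list of `{"url", "pageType"}` queries), and the use of HTTP Basic auth with the API key as username and an empty password are assumptions to verify against the docs; `build_extract_request` is a hypothetical helper name.

```python
import base64
import json
import urllib.request

# Assumption: endpoint URL; confirm it in the official AutoExtract docs.
AUTOEXTRACT_ENDPOINT = "https://autoextract.scrapinghub.com/v1/extract"


def build_extract_request(api_key, urls, page_type="product"):
    """Build (but do not send) a POST request asking the API to extract `urls`."""
    # Assumption: one query object per URL; pageType selects product or article.
    queries = [{"url": url, "pageType": page_type} for url in urls]
    body = json.dumps(queries).encode("utf-8")
    # Assumption: HTTP Basic auth with the API key as username, empty password.
    token = base64.b64encode(f"{api_key}:".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        AUTOEXTRACT_ENDPOINT,
        data=body,
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To actually send it, something like `with urllib.request.urlopen(req, timeout=600) as resp: results = json.load(resp)` would work; the timeout value here is arbitrary. Keeping request construction separate from sending makes it easy to inspect or log the payload first.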
How should I use the "probability" field?
This value is an indicator of how confident we are that a page is an individual Product or Article page, depending on whether pageType is "product"...
Mon, 12 Aug, 2019 at 11:59 AM
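One way to act on the "probability" field is to discard extracted items whose confidence falls below a threshold you choose, for example to drop listing pages that were submitted as product pages. The sketch below assumes each result holds the extracted item under a key named after the pageType (e.g. `"product"`) and that the item carries a `"probability"` float between 0 and 1; check the actual response shape in the docs. The 0.5 threshold and the `confident_items` helper are hypothetical.

```python
# Hypothetical default threshold; tune it for your own precision/recall needs.
MIN_PROBABILITY = 0.5


def confident_items(results, page_type="product", threshold=MIN_PROBABILITY):
    """Return extracted items whose page-type probability meets the threshold."""
    kept = []
    for result in results:
        # Assumption: the extracted item lives under the pageType key.
        item = result.get(page_type)
        if item is None:
            # No item of this type (e.g. the query failed); skip it.
            continue
        # Assumption: a missing probability is treated as zero confidence.
        if item.get("probability", 0.0) >= threshold:
            kept.append(item)
    return kept
```

A higher threshold keeps fewer but more reliable items; a lower one keeps more pages at the cost of occasional non-product or non-article pages.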
What are the possible errors and how should my code handle them?
See https://doc.scrapinghub.com/autoextract.html#errors
Mon, 12 Aug, 2019 at 12:00 PM
What should I do if my request returns HTTP status code 429 ("too many requests")?
This status code indicates that the service is too busy and that either a per-user or a system-level rate limit has been hit. The best thing to do is to continue sending reque...
Mon, 12 Aug, 2019 at 12:01 PM
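A common client-side way to keep sending requests after a 429 without adding to the load is to retry with exponential backoff and jitter. This is a generic technique, not necessarily the exact policy the service recommends. The sketch assumes a hypothetical `send` callable that performs one request and returns a `(status, body)` tuple; the retry count and delays are illustrative defaults.

```python
import random
import time


def send_with_backoff(send, max_retries=5, base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Call send() and retry with exponential backoff while it returns HTTP 429.

    `send` is a hypothetical callable returning (status, body) for one request.
    """
    for attempt in range(max_retries + 1):
        status, body = send()
        if status != 429:
            # Success or a non-rate-limit error: hand it back to the caller.
            return status, body
        if attempt == max_retries:
            # Out of retries: return the last 429 so the caller can decide.
            break
        # Full jitter: wait a random time up to base_delay * 2**attempt, capped.
        delay = min(max_delay, base_delay * (2 ** attempt))
        sleep(random.uniform(0, delay))
    return status, body
```

The `sleep` parameter is injectable so the retry logic can be tested without real waiting. Randomizing the delay helps keep many clients from retrying in lockstep.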
Can I pass custom cookies to be used to download a web page?
At present the answer is no. That said, please be assured that we are working on this feature, so if it is important to you, please reach out so that...
Mon, 12 Aug, 2019 at 12:03 PM
Is JavaScript executed?
We enable or disable JavaScript as needed to get the best extraction result.
Mon, 12 Aug, 2019 at 12:04 PM
Do I have to request URLs against the API in a polite manner, or will the API take care of scheduling requests in such a way that it doesn't DDoS the site?
The API server rate-limits requests; we try to avoid causing any problems for target websites.
Mon, 12 Aug, 2019 at 12:04 PM
Are the content extraction techniques language agnostic?
Yes, the Automatic Extraction API works on pages in all languages and from all countries.
Mon, 12 Aug, 2019 at 12:05 PM