Zyte API proxy mode

To use Zyte API as a proxy, use the api.zyte.com:8011 endpoint, with your API key as the proxy username and an empty password, plus any of the proxy-mode request headers described below:

using System;
using System.IO;
using System.Net;

// Use your API key as the proxy username and an empty password.
var proxy = new WebProxy("http://api.zyte.com:8011", true);
proxy.Credentials = new NetworkCredential("YOUR_API_KEY", "");

var request = (HttpWebRequest)WebRequest.Create("https://toscrape.com");
request.Proxy = proxy;
request.PreAuthenticate = true;
request.AllowAutoRedirect = false;

// using declarations close the response and reader when they go out of scope.
using var response = (HttpWebResponse)request.GetResponse();
using var reader = new StreamReader(response.GetResponseStream());
var httpResponseBody = reader.ReadToEnd();

Console.WriteLine(httpResponseBody);

curl \
    --proxy api.zyte.com:8011 \
    --proxy-user YOUR_API_KEY: \
    --compressed \
    https://toscrape.com

const axios = require('axios')

axios
  .get(
    'https://toscrape.com',
    {
      proxy: {
        protocol: 'http',
        host: 'api.zyte.com',
        port: 8011,
        auth: {
          username: 'YOUR_API_KEY',
          password: ''
        }
      }
    }
  )
  .then((response) => {
    const httpResponseBody = response.data
    console.log(httpResponseBody)
  })

<?php

// Load Guzzle through Composer's autoloader.
require 'vendor/autoload.php';

$client = new GuzzleHttp\Client();
$response = $client->request('GET', 'https://toscrape.com', [
    'proxy' => 'http://YOUR_API_KEY:@api.zyte.com:8011',
]);
$http_response_body = (string) $response->getBody();
fwrite(STDOUT, $http_response_body);

import requests

response = requests.get(
    "https://toscrape.com",
    proxies={
        scheme: "http://YOUR_API_KEY:@api.zyte.com:8011" for scheme in ("http", "https")
    },
)
http_response_body: bytes = response.content
print(http_response_body.decode())
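If several parts of a Python codebase need the same proxy settings, the proxies mapping from the Python example above can be built by a small helper. This is a minimal sketch: the function names zyte_proxy_url and zyte_proxies are hypothetical, and percent-encoding the key is a defensive choice, not something this page says Zyte API requires.

```python
from urllib.parse import quote


def zyte_proxy_url(
    api_key: str,
    host: str = "api.zyte.com",
    port: int = 8011,
    scheme: str = "http",
) -> str:
    """Build a proxy URL with the API key as username and an empty password."""
    # Percent-encode the key so that characters such as ":" or "@" cannot
    # change how the URL is parsed.
    return f"{scheme}://{quote(api_key, safe='')}:@{host}:{port}"


def zyte_proxies(api_key: str, **kwargs) -> dict:
    """Return a requests-style proxies mapping for both http:// and https:// URLs."""
    url = zyte_proxy_url(api_key, **kwargs)
    return {"http": url, "https": url}
```

For example, requests.get("https://toscrape.com", proxies=zyte_proxies("YOUR_API_KEY")) should behave like the Python example above, and zyte_proxies("YOUR_API_KEY", scheme="https", port=8014) targets the HTTPS proxy interface.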

When using scrapy-zyte-smartproxy, set the ZYTE_SMARTPROXY_URL setting to "http://api.zyte.com:8011" and the ZYTE_SMARTPROXY_APIKEY setting to your API key for Zyte API.

Then you can continue using Scrapy as usual and all requests will be proxied through Zyte API automatically.

from scrapy import Spider


class ToScrapeSpider(Spider):
    name = "toscrape_com"
    start_urls = ["https://toscrape.com"]

    def parse(self, response):
        print(response.text)
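The two settings described above go in the project's settings.py. The sketch below also enables the middleware, using the DOWNLOADER_MIDDLEWARES entry and ZYTE_SMARTPROXY_ENABLED setting shown in the scrapy-zyte-smartproxy README; treat those as assumptions to check against the version you install. The ZYTE_API_KEY environment variable is a hypothetical name, used only to keep the key out of source control.

```python
# settings.py: a minimal sketch for sending Scrapy requests through
# Zyte API proxy mode with scrapy-zyte-smartproxy.
import os

# Enable the scrapy-zyte-smartproxy downloader middleware (as in its README).
DOWNLOADER_MIDDLEWARES = {
    "scrapy_zyte_smartproxy.ZyteSmartProxyMiddleware": 610,
}
ZYTE_SMARTPROXY_ENABLED = True

# Point the middleware at the Zyte API proxy mode endpoint.
ZYTE_SMARTPROXY_URL = "http://api.zyte.com:8011"

# Hypothetical: read the Zyte API key from a ZYTE_API_KEY environment
# variable, falling back to a placeholder.
ZYTE_SMARTPROXY_APIKEY = os.environ.get("ZYTE_API_KEY", "YOUR_API_KEY")
```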

Limitations

The proxy mode makes it easier to migrate existing code that uses a proxy service.

However, the proxy mode has some limitations when compared to the HTTP API:

Request headers

The following headers allow changing how a request is sent through Zyte API in proxy mode.

Zyte-Client

Optionally reports to Zyte which software you use to access Zyte API.

It should be formatted with the syntax of the User-Agent header, e.g. curl/1.2.3.
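A minimal sketch of building this header in Python. The helper name zyte_client_header and the project name my-scraper are hypothetical; the check only enforces a single product/version token, which is a subset of what the User-Agent syntax allows.

```python
def zyte_client_header(product: str, version: str) -> dict:
    """Return a Zyte-Client header in User-Agent product/version syntax."""
    for part in (product, version):
        # A product token part cannot be empty or contain whitespace or "/".
        if not part or any(ch.isspace() or ch == "/" for ch in part):
            raise ValueError(f"invalid product token part: {part!r}")
    return {"Zyte-Client": f"{product}/{version}"}


headers = zyte_client_header("my-scraper", "1.0")
print(headers)  # {'Zyte-Client': 'my-scraper/1.0'}
```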

Zyte-Device

Sets device emulation.

Zyte-Geolocation

Sets a geolocation.
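Both headers are plain string values added to a request. The sketch below assumes they accept the same values as the HTTP API's device and geolocation parameters, for example mobile and a two-letter ISO 3166-1 country code such as US. The helper name emulation_headers is hypothetical, and only the country code format is checked.

```python
from typing import Optional


def emulation_headers(
    device: Optional[str] = None, geolocation: Optional[str] = None
) -> dict:
    """Build Zyte-Device and Zyte-Geolocation headers, omitting unset ones."""
    headers = {}
    if device is not None:
        headers["Zyte-Device"] = device
    if geolocation is not None:
        # Assumption: geolocation is an ISO 3166-1 alpha-2 country code.
        code = geolocation.strip().upper()
        if len(code) != 2 or not code.isalpha():
            raise ValueError(f"expected a two-letter country code, got {geolocation!r}")
        headers["Zyte-Geolocation"] = code
    return headers


print(emulation_headers(device="mobile", geolocation="us"))
# {'Zyte-Device': 'mobile', 'Zyte-Geolocation': 'US'}
```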

Zyte-JobId

Sets the ID of the Scrapy Cloud job that is sending the request.

scrapy-zyte-smartproxy sets this header automatically when used from a Scrapy Cloud job.
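Outside of scrapy-zyte-smartproxy, you could set the header yourself. This sketch assumes the job ID is the job key that Scrapy Cloud exposes in the SHUB_JOBKEY environment variable; treat both the variable name and the value format as assumptions to verify. The helper name job_id_header is hypothetical.

```python
import os
from typing import Mapping, Optional


def job_id_header(environ: Optional[Mapping[str, str]] = None) -> dict:
    """Return a Zyte-JobId header when running inside a Scrapy Cloud job."""
    env = os.environ if environ is None else environ
    # Assumption: Scrapy Cloud exposes the current job key as SHUB_JOBKEY.
    job_key = env.get("SHUB_JOBKEY")
    # Outside Scrapy Cloud, send no Zyte-JobId header at all.
    return {"Zyte-JobId": job_key} if job_key else {}
```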

Zyte-Override-Headers

Zyte API automatically sends some request headers for ban avoidance.

Custom headers from your request will override most automatic headers, but not these:

Accept
Accept-Encoding
User-Agent

To override any of these three headers, set Zyte-Override-Headers to a comma-separated list of the names of the headers to override, e.g. Zyte-Override-Headers: Accept,Accept-Encoding.

Warning

Overriding headers can break Zyte API ban avoidance.
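When you do need your own values, the override list can be derived from the headers you set. A minimal sketch: with_override_header is a hypothetical helper name, and names are compared case-insensitively because HTTP header names are case-insensitive.

```python
# The automatic headers that custom headers do not override by default.
AUTOMATIC_HEADERS = ("Accept", "Accept-Encoding", "User-Agent")


def with_override_header(headers: dict) -> dict:
    """Return a copy of headers that also lists, in Zyte-Override-Headers,
    every automatic header that headers sets explicitly."""
    present = {name.lower() for name in headers}
    overridden = [name for name in AUTOMATIC_HEADERS if name.lower() in present]
    result = dict(headers)
    if overridden:
        result["Zyte-Override-Headers"] = ",".join(overridden)
    return result


print(with_override_header({"User-Agent": "my-scraper/1.0", "accept": "text/html"}))
# {'User-Agent': 'my-scraper/1.0', 'accept': 'text/html', 'Zyte-Override-Headers': 'Accept,User-Agent'}
```

Keep the warning above in mind: only list headers you really need to control.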

Zyte-Session-ID

Sets the session ID (the session.id parameter of the HTTP API) for a client-managed session.
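With a client-managed session, the same ID is sent on every request that should share the session. The sketch below assumes a random UUID is an acceptable session ID; the class name ZyteSession is hypothetical.

```python
import uuid
from typing import Mapping, Optional


class ZyteSession:
    """Hypothetical helper that sends one client-managed session ID on every request."""

    def __init__(self, session_id: Optional[str] = None):
        # Assumption: a random UUID is a valid session ID.
        self.session_id = session_id or str(uuid.uuid4())

    def headers(self, extra: Optional[Mapping[str, str]] = None) -> dict:
        """Return a copy of extra plus the Zyte-Session-ID header."""
        result = dict(extra or {})
        result["Zyte-Session-ID"] = self.session_id
        return result


session = ZyteSession()
first = session.headers({"Accept-Language": "en"})
second = session.headers()
```

Both first and second carry the same Zyte-Session-ID value, so requests sent with them belong to the same session.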

Invalid request headers

The following headers are not allowed. Any request that includes one or more of them gets an HTTP 400 response:

Client-IP
Cluster-Client-IP
Forwarded-For
True-Client-IP
Via
X-Client-IP
X-Forwarded
X-Forwarded-For
X-Forwarded-Host
X-Host
X-Original-URL
X-Originating-IP
X-ProxyUser-IO
X-ProxyUser-IP
X-Remote-Addr
X-Remote-IP
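To avoid these 400 responses when forwarding headers from another system, you could drop the disallowed names before sending. A minimal sketch: strip_forbidden_headers is a hypothetical helper name, and the case-insensitive comparison assumes Zyte API matches header names case-insensitively, as HTTP does.

```python
# Headers that Zyte API proxy mode rejects with an HTTP 400 response.
FORBIDDEN_HEADERS = frozenset(
    name.lower()
    for name in (
        "Client-IP", "Cluster-Client-IP", "Forwarded-For", "True-Client-IP",
        "Via", "X-Client-IP", "X-Forwarded", "X-Forwarded-For",
        "X-Forwarded-Host", "X-Host", "X-Original-URL", "X-Originating-IP",
        "X-ProxyUser-IO", "X-ProxyUser-IP", "X-Remote-Addr", "X-Remote-IP",
    )
)


def strip_forbidden_headers(headers: dict) -> dict:
    """Return a copy of headers without any header Zyte API would reject."""
    return {k: v for k, v in headers.items() if k.lower() not in FORBIDDEN_HEADERS}
```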

Response headers

Responses include some headers injected by Zyte API.

For unsuccessful responses, the response body is always the JSON error response of the HTTP API, which contains the error details.

Zyte-Error

The presence of this header indicates that the response was unsuccessful.

Its value should be ignored and not relied upon: it is an internal error ID that may change at any time.

Zyte-Error-Title

A short summary of the problem type, written in English for engineers. It is not localized and is usually not suited for non-technical stakeholders.

It matches the title JSON field of the error response.

Zyte-Error-Type

A URI reference that uniquely identifies the problem type, but only within the context of this API.

Unlike what RFC 7807 recommends, it is not meant to be dereferenceable or to point to human-readable documentation, and it is not globally unique for the problem type.

It matches the type JSON field of the error response.

Zyte-Request-ID

A unique identifier of the request.

When reporting an issue about the outcome of a request to our Support team, please include the value of this response header when possible.
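A sketch of combining these headers and the JSON body into one error record, for example to log together with the request ID. It checks only for the presence of Zyte-Error and never uses its value. The helper name zyte_error_info is hypothetical; headers is any mapping of response headers (such as response.headers in requests) and body is the raw response bytes.

```python
import json
from typing import Mapping, Optional


def zyte_error_info(headers: Mapping[str, str], body: bytes) -> Optional[dict]:
    """Return error details if the response is unsuccessful, else None."""
    # HTTP header names are case-insensitive, so normalize them first.
    lower = {k.lower(): v for k, v in headers.items()}
    if "zyte-error" not in lower:
        return None
    try:
        details = json.loads(body)
    except ValueError:  # Covers invalid JSON and invalid UTF-8.
        details = {}
    if not isinstance(details, dict):
        details = {}
    return {
        # Prefer the headers; fall back to the matching JSON fields.
        "type": lower.get("zyte-error-type", details.get("type")),
        "title": lower.get("zyte-error-title", details.get("title")),
        "request_id": lower.get("zyte-request-id"),
        "details": details,
    }
```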

HTTPS proxy

Tip

The main endpoint works for both HTTP and HTTPS URLs; you do not need the HTTPS proxy interface to access HTTPS URLs.

You can use the api.zyte.com:8014 endpoint for an HTTPS proxy interface, provided your tech stack supports HTTPS proxies and you have installed our CA certificate:

curl \
    --proxy https://api.zyte.com:8014 \
    --proxy-user YOUR_API_KEY: \
    --compressed \
    https://toscrape.com

const { HttpsProxyAgent } = require('https-proxy-agent')

// Send HTTPS requests through the HTTPS proxy interface.
const httpsAgent = new HttpsProxyAgent('https://YOUR_API_KEY:@api.zyte.com:8014')
// proxy: false stops axios from applying its own proxy settings on top of the agent.
const axios = require('axios').create({ httpsAgent, proxy: false })

axios
  .get('https://toscrape.com')
  .then((response) => {
    const httpResponseBody = response.data
    console.log(httpResponseBody)
  })

import requests

response = requests.get(
    "https://toscrape.com",
    proxies={
        scheme: "https://YOUR_API_KEY:@api.zyte.com:8014"
        for scheme in ("http", "https")
    },
)
http_response_body: bytes = response.content
print(http_response_body.decode())