/scrape

Scrapes and extracts structured data from any web page. Code examples follow.

### Python ###
#
# Install our package: pip install web-extract-data
#

from web_extract_data import WebExtractClient

# Initialize the client with your InstantAPI.ai key
# Replace %%API_KEY%% with your API key from:
# https://web.instantapi.ai/#pricing-03-254921

client = WebExtractClient("%%API_KEY%%")

# You can modify the URL and data fields to extract in JSON format
result = client.scrape(
  url="https://www.amazon.com.au/MSI-PRO-MP341CQW-UltraWide-Compatible/dp/B09Y19TRQ2",
  fields={
    "monitor_name": "< The product name of the monitor. >",
    "brand": "< The brand or manufacturer name. >",
    "display_size_in_inches": "< Numeric only. >",
    "resolution": "< Example format: 1920x1080. >",
    "panel_type": "< Type of panel. >",
    "refresh_rate_hz": "< Numeric only. >",
    "aspect_ratio": "< Example format: 16:9. >",
    "ports": "< A comma-delimited list of available ports (e.g., HDMI, DisplayPort, etc.). >",
    "features": "< Key selling points or capabilities, comma-delimited (e.g., LED, Full HD, etc.). >",
    "price": "< Numeric price (integer or float). >",
    "price_currency": "< Price currency (3 character alphabetic ISO 4217). >",
    "review_count": "< Total number of customer reviews, numeric only. >",
    "average_rating": "< Float or numeric star rating (e.g., 4.3). >",
    "review_summary": "< A 50 words or less summary of all the written customer feedback. >"
  }
)

# Print the extracted data
print(result)
### JavaScript ###
//
// Install our package: npm install web-extract-data
//

const { WebExtractClient } = require('web-extract-data');

// Initialize the client with your InstantAPI.ai key
// Replace %%API_KEY%% with your API key from:
// https://web.instantapi.ai/#pricing-03-254921

const client = new WebExtractClient("%%API_KEY%%");

// You can modify the URL and data fields to extract in JSON format
client.scrape({
  url: "https://www.amazon.com.au/MSI-PRO-MP341CQW-UltraWide-Compatible/dp/B09Y19TRQ2",
  fields: {
    "monitor_name": "< The product name of the monitor. >",
    "brand": "< The brand or manufacturer name. >",
    "display_size_in_inches": "< Numeric only. >",
    "resolution": "< Example format: 1920x1080. >",
    "panel_type": "< Type of panel. >",
    "refresh_rate_hz": "< Numeric only. >",
    "aspect_ratio": "< Example format: 16:9. >",
    "ports": "< A comma-delimited list of available ports (e.g., HDMI, DisplayPort, etc.). >",
    "features": "< Key selling points or capabilities, comma-delimited (e.g., LED, Full HD, etc.). >",
    "price": "< Numeric price (integer or float). >",
    "price_currency": "< Price currency (3 character alphabetic ISO 4217). >",
    "review_count": "< Total number of customer reviews, numeric only. >",
    "average_rating": "< Float or numeric star rating (e.g., 4.3). >",
    "review_summary": "< A 50 words or less summary of all the written customer feedback. >"
  }
})
.then(result => {
  // Print the extracted data
  console.log(result);
})
.catch(error => {
  console.error("Error:", error.message);
});
### HTTP ###
# Set your InstantAPI.ai key
# Replace %%API_KEY%% with your API key from:
# https://web.instantapi.ai/#pricing-03-254921

API_KEY="%%API_KEY%%"

# API endpoint
API_URL="https://instantapi.ai/api/scrape/"

# You can modify the URL and data fields to extract in JSON format
cat > payload.json << 'EOF'
{
  "url": "https://www.amazon.com.au/MSI-PRO-MP341CQW-UltraWide-Compatible/dp/B09Y19TRQ2",
  "fields": {
    "monitor_name": "< The product name of the monitor. >",
    "brand": "< The brand or manufacturer name. >",
    "display_size_in_inches": "< Numeric only. >",
    "resolution": "< Example format: 1920x1080. >",
    "panel_type": "< Type of panel. >",
    "refresh_rate_hz": "< Numeric only. >",
    "aspect_ratio": "< Example format: 16:9. >",
    "ports": "< A comma-delimited list of available ports (e.g., HDMI, DisplayPort, etc.). >",
    "features": "< Key selling points or capabilities, comma-delimited (e.g., LED, Full HD, etc.). >",
    "price": "< Numeric price (integer or float). >",
    "price_currency": "< Price currency (3 character alphabetic ISO 4217). >",
    "review_count": "< Total number of customer reviews, numeric only. >",
    "average_rating": "< Float or numeric star rating (e.g., 4.3). >",
    "review_summary": "< A 50 words or less summary of all the written customer feedback. >"
  }
}
EOF

# Make the API request and print the extracted data
curl "$API_URL" \
  -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d @payload.json \
  | json_pp
### MCP ###
# Configure the MCP server with your InstantAPI.ai key
# Replace %%API_KEY%% with your API key from:
# https://web.instantapi.ai/#pricing-03-254921

{
  "mcpServers": {
    "web-scraping-api-by-instantapi-ai": {
      "command": "npx",
      "args": [
        "-y",
        "mcp-remote",
        "https://web-scraping-api-by-instantapi-ai.help-052.workers.dev/sse",
        "--header",
        "Authorization:${AUTH_HEADER}"
      ],
      "env": {
        "AUTH_HEADER": "Bearer %%API_KEY%%"
      }
    }
  }
}

Please provide your API key and the URL you’d like to scrape. Then, include a JSON structure describing the data you want to extract.

You don’t need to match the JSON keys to the names or specific data on the webpage—just define how you want your data returned, and our AI will figure out the rest.

Response:

{
  "scrape": < The populated JSON object that matches the structure you provided. >,
  "markdown": "< Markdown of the page which can be optionally saved for further analysis. >",
  "html": "< HTML of the page which can be optionally saved for further analysis. >"
}
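As a minimal sketch of consuming this response in Python, the helpers below split it into its three parts and defensively convert "numeric only" fields. It assumes the response is available as a plain dict shaped like the structure above; the names `unpack_scrape_response` and `to_number` are hypothetical and not part of the client library, and the sample values are made up.

```python
from typing import Any, Dict, Optional, Tuple


def to_number(value: Any) -> Optional[float]:
    """Best-effort conversion of an extracted value to a float, else None."""
    if isinstance(value, bool):
        return None  # bool is a subclass of int; treat it as non-numeric
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str):
        cleaned = value.replace(",", "").strip()
        try:
            return float(cleaned)
        except ValueError:
            return None
    return None


def unpack_scrape_response(response: Dict[str, Any]) -> Tuple[Dict[str, Any], str, str]:
    """Split a /scrape response into (extracted data, Markdown, HTML)."""
    data = response.get("scrape") or {}
    markdown = response.get("markdown") or ""
    html = response.get("html") or ""
    return data, markdown, html


# Hypothetical response with made-up values, shaped like the structure above.
sample = {
    "scrape": {"monitor_name": "Example Monitor", "price": "1,299.00", "review_count": 87},
    "markdown": "# Example Monitor",
    "html": "<h1>Example Monitor</h1>",
}

data, markdown, html = unpack_scrape_response(sample)
print(data["monitor_name"], to_number(data["price"]), to_number(data["review_count"]))
```

Converting values on the caller's side is a precaution: the field descriptions request numeric output, but this sketch does not assume the returned types.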

/links

Scrapes and extracts links matching a description from any web page. Code examples follow.

### Python ###
#
# Install our package: pip install web-extract-data
#

from web_extract_data import WebExtractClient

# Initialize the client with your InstantAPI.ai key
# Replace %%API_KEY%% with your API key from:
# https://web.instantapi.ai/#pricing-03-254921

client = WebExtractClient("%%API_KEY%%")

# You can modify the URL and link description
result = client.links(
  url="https://www.ikea.com/au/en/cat/quilt-cover-sets-10680/?page=3",
  description="individual product urls"
)

# Print the extracted links
print(result)
### JavaScript ###
//
// Install our package: npm install web-extract-data
//

const { WebExtractClient } = require('web-extract-data');

// Initialize the client with your InstantAPI.ai key
// Replace %%API_KEY%% with your API key from:
// https://web.instantapi.ai/#pricing-03-254921

const client = new WebExtractClient("%%API_KEY%%");

// You can modify the URL and link description
client.links({
  url: "https://www.ikea.com/au/en/cat/quilt-cover-sets-10680/?page=3",
  description: "individual product urls"
})
.then(result => {
  // Print the extracted links
  console.log(result);
})
.catch(error => {
  console.error("Error:", error.message);
});
### HTTP ###
# Set your InstantAPI.ai key
# Replace %%API_KEY%% with your API key from:
# https://web.instantapi.ai/#pricing-03-254921

API_KEY="%%API_KEY%%"

# API endpoint
API_URL="https://instantapi.ai/api/links/"

# You can modify the URL and link description
cat > payload.json << 'EOF'
{
  "url": "https://www.ikea.com/au/en/cat/quilt-cover-sets-10680/?page=3",
  "description": "individual product urls"
}
EOF

# Make the API request and print the extracted links
curl "$API_URL" \
  -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d @payload.json \
  | json_pp
### MCP ###
# Configure the MCP server with your InstantAPI.ai key
# Replace %%API_KEY%% with your API key from:
# https://web.instantapi.ai/#pricing-03-254921

{
  "mcpServers": {
    "web-scraping-api-by-instantapi-ai": {
      "command": "npx",
      "args": [
        "-y",
        "mcp-remote",
        "https://web-scraping-api-by-instantapi-ai.help-052.workers.dev/sse",
        "--header",
        "Authorization:${AUTH_HEADER}"
      ],
      "env": {
        "AUTH_HEADER": "Bearer %%API_KEY%%"
      }
    }
  }
}

Please provide your API key and the URL you’d like to scrape. Then, include a description of the type of links you want to extract.

Response:

{
  "links": [< An array of URLs that match the description you provided. >],
  "markdown": "< Markdown of the page which can be optionally saved for further analysis. >",
  "html": "< HTML of the page which can be optionally saved for further analysis. >"
}
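Before feeding the returned links into further requests, it can help to normalize them. The sketch below resolves each entry against the page URL, drops fragments, and removes duplicates while keeping order. It assumes the response is a plain dict shaped like the structure above; whether the API ever returns relative or duplicate URLs is not stated here, so treat this as a defensive step. `clean_links` is a hypothetical helper name.

```python
from typing import Any, Dict, List
from urllib.parse import urldefrag, urljoin


def clean_links(response: Dict[str, Any], page_url: str) -> List[str]:
    """Return absolute, de-duplicated URLs from a /links response, keeping order."""
    seen = set()
    cleaned = []
    for link in response.get("links") or []:
        if not isinstance(link, str) or not link.strip():
            continue  # skip empty or non-string entries
        # Resolve relative links against the page URL, then drop any #fragment.
        absolute, _fragment = urldefrag(urljoin(page_url, link.strip()))
        if absolute not in seen:
            seen.add(absolute)
            cleaned.append(absolute)
    return cleaned


# Hypothetical response with made-up URLs.
sample = {
    "links": [
        "https://example.com/p/item-1/",
        "/p/item-1/",
        "https://example.com/p/item-2/#reviews",
        "",
    ]
}

print(clean_links(sample, "https://example.com/cat/?page=3"))
```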

/next

Scrapes and extracts the 'next page' links from any web page with pagination. Code examples follow.

### Python ###
#
# Install our package: pip install web-extract-data
#

from web_extract_data import WebExtractClient

# Initialize the client with your InstantAPI.ai key
# Replace %%API_KEY%% with your API key from:
# https://web.instantapi.ai/#pricing-03-254921

client = WebExtractClient("%%API_KEY%%")

# You can modify the URL
result = client.next(
  url="https://www.ikea.com/au/en/cat/quilt-cover-sets-10680/"
)

# Print the extracted next page URLs
print(result)
### JavaScript ###
//
// Install our package: npm install web-extract-data
//

const { WebExtractClient } = require('web-extract-data');

// Initialize the client with your InstantAPI.ai key
// Replace %%API_KEY%% with your API key from:
// https://web.instantapi.ai/#pricing-03-254921

const client = new WebExtractClient("%%API_KEY%%");

// You can modify the URL
client.next({
  url: "https://www.ikea.com/au/en/cat/quilt-cover-sets-10680/"
})
.then(result => {
  // Print the extracted next page URLs
  console.log(result);
})
.catch(error => {
  console.error("Error:", error.message);
});
### HTTP ###
# Set your InstantAPI.ai key
# Replace %%API_KEY%% with your API key from:
# https://web.instantapi.ai/#pricing-03-254921

API_KEY="%%API_KEY%%"

# API endpoint
API_URL="https://instantapi.ai/api/next/"

# You can modify the URL
cat > payload.json << 'EOF'
{
  "url": "https://www.ikea.com/au/en/cat/quilt-cover-sets-10680/"
}
EOF

# Make the API request and print the extracted next page URLs
curl "$API_URL" \
  -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d @payload.json \
  | json_pp
### MCP ###
# Configure the MCP server with your InstantAPI.ai key
# Replace %%API_KEY%% with your API key from:
# https://web.instantapi.ai/#pricing-03-254921

{
  "mcpServers": {
    "web-scraping-api-by-instantapi-ai": {
      "command": "npx",
      "args": [
        "-y",
        "mcp-remote",
        "https://web-scraping-api-by-instantapi-ai.help-052.workers.dev/sse",
        "--header",
        "Authorization:${AUTH_HEADER}"
      ],
      "env": {
        "AUTH_HEADER": "Bearer %%API_KEY%%"
      }
    }
  }
}

Please provide your API key and the URL you’d like to scrape.

Response:

{
  "next": [< An array of all matched 'next page' URLs. >],
  "markdown": "< Markdown of the page which can be optionally saved for further analysis. >",
  "html": "< HTML of the page which can be optionally saved for further analysis. >"
}
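Because `next` is an array, walking a paginated listing amounts to following those URLs until no new ones appear. The sketch below does this breadth-first with a page cap. It assumes the caller supplies `fetch_next`, a function that takes a URL and returns a dict shaped like the response above (for instance a thin wrapper around `client.next`, if the client returns a dict); `collect_pages`, `fake_site`, and `fake_fetch` are hypothetical names used only for illustration.

```python
from collections import deque
from typing import Any, Callable, Dict, List


def collect_pages(
    fetch_next: Callable[[str], Dict[str, Any]],
    start_url: str,
    max_pages: int = 20,
) -> List[str]:
    """Follow 'next page' URLs breadth-first from start_url, visiting each page once."""
    visited: List[str] = []
    seen = {start_url}
    queue = deque([start_url])
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        response = fetch_next(url)  # one request per visited page
        for next_url in response.get("next") or []:
            if next_url not in seen:
                seen.add(next_url)
                queue.append(next_url)
    return visited


# Stand-in for the real API so the sketch runs offline (made-up URLs).
fake_site = {
    "https://example.com/list/": ["https://example.com/list/?page=2"],
    "https://example.com/list/?page=2": [
        "https://example.com/list/?page=3",
        "https://example.com/list/",
    ],
    "https://example.com/list/?page=3": [],
}


def fake_fetch(url: str) -> Dict[str, Any]:
    return {"next": fake_site.get(url, [])}


print(collect_pages(fake_fetch, "https://example.com/list/"))
```

The `seen` set guards against loops such as a page linking back to an earlier one, and `max_pages` bounds the number of requests.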

/search

Scrapes and extracts relevant URLs from Google search results pages. Code examples follow.

### Python ###
#
# Install our package: pip install web-extract-data
#

from web_extract_data import WebExtractClient

# Initialize the client with your InstantAPI.ai key
# Replace %%API_KEY%% with your API key from:
# https://web.instantapi.ai/#pricing-03-254921

client = WebExtractClient("%%API_KEY%%")

# You can modify the search query, google_domain, and page number
result = client.search(
  query="AVID POWER 20V MAX Lithium Ion Cordless Drill Set",
  google_domain="www.google.com",
  page=1
)

# Print the extracted search result URLs
print(result)
### JavaScript ###
//
// Install our package: npm install web-extract-data
//

const { WebExtractClient } = require('web-extract-data');

// Initialize the client with your InstantAPI.ai key
// Replace %%API_KEY%% with your API key from:
// https://web.instantapi.ai/#pricing-03-254921

const client = new WebExtractClient("%%API_KEY%%");

// You can modify the search query, google_domain, and page number
client.search({
  query: "AVID POWER 20V MAX Lithium Ion Cordless Drill Set",
  google_domain: "www.google.com",
  page: 1
})
.then(result => {
  // Print the extracted search result URLs
  console.log(result);
})
.catch(error => {
  console.error("Error:", error.message);
});
### HTTP ###
# Set your InstantAPI.ai key
# Replace %%API_KEY%% with your API key from:
# https://web.instantapi.ai/#pricing-03-254921

API_KEY="%%API_KEY%%"

# API endpoint
API_URL="https://instantapi.ai/api/search/"

# You can modify the search query, google_domain, and page number
cat > payload.json << 'EOF'
{
  "query": "AVID POWER 20V MAX Lithium Ion Cordless Drill Set",
  "google_domain": "www.google.com",
  "page": 1
}
EOF

# Make the API request and print the extracted search result URLs
curl "$API_URL" \
  -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d @payload.json \
  | json_pp
### MCP ###
# Configure the MCP server with your InstantAPI.ai key
# Replace %%API_KEY%% with your API key from:
# https://web.instantapi.ai/#pricing-03-254921

{
  "mcpServers": {
    "web-scraping-api-by-instantapi-ai": {
      "command": "npx",
      "args": [
        "-y",
        "mcp-remote",
        "https://web-scraping-api-by-instantapi-ai.help-052.workers.dev/sse",
        "--header",
        "Authorization:${AUTH_HEADER}"
      ],
      "env": {
        "AUTH_HEADER": "Bearer %%API_KEY%%"
      }
    }
  }
}

Please provide your API key and the Google search domain you’d like to scrape. Then, include the search query and page number of the Google search results.

Response:

{
  "search": [< An array of relevant URLs for any of the promoted and organic search results. >],
  "markdown": "< Markdown of the page which can be optionally saved for further analysis. >",
  "html": "< HTML of the page which can be optionally saved for further analysis. >"
}
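To gather results across several result pages, the `page` parameter can be incremented in a loop. The sketch below assumes the caller supplies `search_page`, a function taking a query and a page number and returning a dict shaped like the response above (for instance a wrapper around `client.search`, if the client returns a dict). Stopping at the first empty `search` array is also an assumption, not documented behavior. `collect_search_urls` and `fake_search` are hypothetical names.

```python
from typing import Any, Callable, Dict, List


def collect_search_urls(
    search_page: Callable[[str, int], Dict[str, Any]],
    query: str,
    max_pages: int = 3,
) -> List[str]:
    """Collect unique result URLs from pages 1..max_pages, keeping first-seen order."""
    urls: List[str] = []
    seen = set()
    for page in range(1, max_pages + 1):
        response = search_page(query, page)
        results = response.get("search") or []
        if not results:
            break  # assumed to mean there are no further results
        for url in results:
            if url not in seen:
                seen.add(url)
                urls.append(url)
    return urls


# Stand-in for the real API so the sketch runs offline (made-up URLs).
def fake_search(query: str, page: int) -> Dict[str, Any]:
    pages = {
        1: ["https://example.com/a", "https://example.com/b"],
        2: ["https://example.com/b", "https://example.com/c"],
    }
    return {"search": pages.get(page, [])}


print(collect_search_urls(fake_search, "cordless drill set", max_pages=5))
```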

Error Handling

How to handle errors and read their messages.

### Python ###
#
# Install our package: pip install web-extract-data
#

#
# The package will raise exceptions if the API returns an error.
# You can handle these exceptions with a try-except block:
#

from web_extract_data import WebExtractClient

# Initialize the client with your InstantAPI.ai key
# Replace %%API_KEY%% with your API key from:
# https://web.instantapi.ai/#pricing-03-254921

client = WebExtractClient("%%API_KEY%%")

try:
    result = client.scrape(
        url="https://www.somemadeupurl.com/",
        fields={"title": "< The title of the page. >"}
    )
    print(result)
except Exception as e:
    print(f"An error occurred: {e}")
### JavaScript ###
//
// Install our package: npm install web-extract-data
//

//
// The package will throw errors if the API returns an error.
// You can handle these errors with a try-catch block or the Promise catch method:
//

const { WebExtractClient } = require('web-extract-data');

// Initialize the client with your InstantAPI.ai key
// Replace %%API_KEY%% with your API key from:
// https://web.instantapi.ai/#pricing-03-254921

const client = new WebExtractClient("%%API_KEY%%");

// Promise catch method

client.scrape({
  url: "https://www.somemadeupurl.com/",
  fields: { "title": "< The title of the page. >" }
})
.then(result => {
  console.log(result);
})
.catch(error => {
  console.error("An error occurred:", error.message);
});

// Try-catch block

async function getData() {
  try {
    const result = await client.scrape({
      url: "https://www.somemadeupurl.com/",
      fields: { "title": "< The title of the page. >" }
    });
    console.log(result);
  } catch (error) {
    console.error("An error occurred:", error.message);
  }
}

getData();
### HTTP ###
#
# The API returns an error response when a request fails.
# It will respond with a JSON payload with the following structure:
#
# {
#   "error": true,
#   "reason": "< The reason for the error. >"
# }
#

# Set your InstantAPI.ai key
# Replace %%API_KEY%% with your API key from:
# https://web.instantapi.ai/#pricing-03-254921

API_KEY="%%API_KEY%%"

# API endpoint
API_URL="https://instantapi.ai/api/scrape/"

# You can modify the URL
cat > payload.json << 'EOF'
{
  "url": "https://www.somemadeupurl.com/",
  "fields": { "title": "< The title of the page. >" }
}
EOF

# Make the API request with error handling
response=$(curl -s -w "\n%{http_code}" "$API_URL" \
  -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d @payload.json)

# Get the status code (last line) and response body (everything else)
http_code=$(echo "$response" | tail -n1)
response_body=$(echo "$response" | sed '$ d')

# Parse and process the response with proper JSON handling
if [[ "$http_code" == "200" ]]; then
  # Even with HTTP 200, check if the API returned an error in the response
  is_error=$(echo "$response_body" | jq -r '.error // false')
  
  if [[ "$is_error" == "true" ]]; then
    # API returned an error with a reason
    error_reason=$(echo "$response_body" | jq -r '.reason')
    echo "An error occurred: $error_reason"
  else
    # Success response
    echo "$response_body" | json_pp
  fi
else
  # HTTP error occurred
  echo "HTTP error occurred (status code: $http_code):"
  echo "$response_body" | json_pp
fi

Our client libraries automatically handle API errors and present them in language-appropriate ways.

Python raises exceptions you can catch with try-except blocks, JavaScript throws errors you can handle with Promise catches or try-catch blocks, and direct HTTP requests return structured JSON error objects with clear messages.

All three approaches surface the same error reason, which helps you troubleshoot quickly.
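For callers who send direct HTTP requests from Python instead of using the client library, the same checks as the shell example can be sketched as a small function: treat any non-200 status as a failure, and treat a 200 response with `"error": true` as a failure carrying `reason`. `ApiError` and `parse_api_response` are hypothetical names for this sketch, and the function works on a status code and body text so it is independent of any particular HTTP library.

```python
import json
from typing import Any, Dict


class ApiError(Exception):
    """Hypothetical exception type for this sketch; not part of any client library."""


def parse_api_response(status_code: int, body: str) -> Dict[str, Any]:
    """Return the decoded JSON payload, or raise ApiError with the reported reason."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        payload = None  # body was not valid JSON

    if status_code != 200:
        reason = payload.get("reason") if isinstance(payload, dict) else None
        raise ApiError(f"HTTP {status_code}: {reason or 'no reason given'}")
    if not isinstance(payload, dict):
        raise ApiError("Response body is not a JSON object")
    # Even with HTTP 200, the payload may carry an error flag and a reason.
    if payload.get("error") is True:
        raise ApiError(str(payload.get("reason", "no reason given")))
    return payload


# Made-up bodies illustrating the success path and the error path.
print(parse_api_response(200, '{"scrape": {"title": "Example"}}'))
try:
    parse_api_response(200, '{"error": true, "reason": "Example failure reason"}')
except ApiError as exc:
    print(f"An error occurred: {exc}")
```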

Can we help?

We're on Discord—let's chat!