Chrome extension

Our Chrome extension lets you scrape data from any webpage—no coding required.


API documentation

The AI web scraper is also available through a single API endpoint, which we call the 'Retrieve' endpoint: it retrieves any web page and returns structured data in exactly the shape you describe.

Endpoint URL:

https://instantapi.ai/api/retrieve/

The Retrieve endpoint accepts POST requests with a JSON body.

Parameters

The Retrieve endpoint accepts several parameters within the JSON body payload. Below is a detailed description of each parameter, including whether it's required, its purpose, type, and example usage.

Example: Get Data

import requests
import json

url = "https://instantapi.ai/api/retrieve/"
headers = {
    "Content-Type": "application/json"
}
data = {
    "webpage_url": "https://www.ebay.com/itm/175955440726",
    "api_method_name": "getItemDetails",
    "api_response_structure": json.dumps({
        "item_name": "<the item name>",
        "item_price": "<the item price>",
        "item_image": "<the absolute URL of the first item image>",
        "item_url": "<the absolute URL of the item>",
        "item_type": "<the item type>",
        "item_weight": "<the item weight>",
        "item_main_feature": "<the main feature of this item that would most appeal to its target audience>",
        "item_review_summary": "<a summary of the customer reviews received for this item>",
        "item_available_colors": "<the available colors of the item, converted to closest primary colors>",
        "item_materials": "<the materials used in the item>",
        "item_shape": "<the shape of the item>"
    }),
    "api_key": "<your API key>"
}

response = requests.post(url, headers=headers, json=data)
print(response.json())

eBay Item Data

This example demonstrates how to use the Retrieve endpoint to scrape detailed information from an eBay listing.

Try using the same api_response_structure on other eCommerce product pages, or adapt it to your own needs by adding, removing, or changing fields.

NOTES:

At the time of writing, this eBay listing is live. However, eBay listings change constantly. If it does not work, try any other eBay listing or any product page from another eCommerce website; the same request should work there too.

The value of api_response_structure must be JSON serialized as an escaped string. In this example code, json.dumps is used to achieve that. If you are using another language, its built-in JSON serializer can most likely do the same conversion.

Example provided in Python.

Example: Get Links

import requests
import json

url = "https://instantapi.ai/api/retrieve/"
headers = {
    "Content-Type": "application/json"
}
data = {
    "webpage_url": "https://www.tehrantimes.com/",
    "api_method_name": "getAllNewsArticleURLs",
    "api_response_structure": json.dumps({
        "all_news_article_urls": [
            {
                "news_article_url": "<the absolute URL of the news article>"
            }
        ]
    }),
    "link_extract": True,
    "api_key": "<your API key>"
}

response = requests.post(url, headers=headers, json=data)
print(response.json())

Links by Condition

This example shows how to use the Retrieve endpoint to scrape only the kinds of links you want from a web page. This is especially useful when building your own AI-powered crawler that should only follow pages meeting your criteria.

Try using the same api_response_structure on other news websites, or adapt it to your own needs by adding, removing, or changing fields.

NOTES:

At the time of writing, the Tehran Times website address is correct. If it does not work, feel free to try any other news website - it should work universally.

The value of api_response_structure must be JSON serialized as an escaped string. In this example code, json.dumps is used to achieve that. If you are using another language, its built-in JSON serializer can most likely do the same conversion.

Example provided in Python.
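
For the crawler use case described above, the returned links usually need to be flattened into a plain list before they can be queued. The following is a minimal sketch, assuming the endpoint returns the requested shape under a top-level "response" key (as in the Response Handling sample). The helper name extract_article_urls, the allowed_host filter, and the sample body are hypothetical and only serve as illustration.

```python
from urllib.parse import urlparse


def extract_article_urls(body, allowed_host=None):
    """Flatten the requested structure into a de-duplicated list of URLs.

    Assumes `body` looks like
    {"response": {"all_news_article_urls": [{"news_article_url": "..."}]}}.
    """
    items = body.get("response", {}).get("all_news_article_urls", [])
    seen = set()
    urls = []
    for item in items:
        url = item.get("news_article_url") if isinstance(item, dict) else None
        if not url or url in seen:
            continue
        # Optionally keep the crawl on a single site.
        if allowed_host and urlparse(url).hostname != allowed_host:
            continue
        seen.add(url)
        urls.append(url)
    return urls


# Hypothetical response body, for illustration only.
sample_body = {
    "response": {
        "all_news_article_urls": [
            {"news_article_url": "https://www.tehrantimes.com/news/1"},
            {"news_article_url": "https://www.tehrantimes.com/news/1"},
            {"news_article_url": "https://example.com/ad"},
        ]
    }
}
print(extract_article_urls(sample_body, allowed_host="www.tehrantimes.com"))
```

Each URL in the resulting list can then be sent to the Retrieve endpoint in its own request.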

Response Handling

{
  "response": {
    "name": "John Doe",
    "email": "john.doe@example.com"
  },
  "verbose_full_html": "<html> ... </html>",
  "verbose_markdown": "--- ..."
}

The Retrieve endpoint will return a JSON object based on the specified api_response_structure. If the verbose parameter is set to true, the response will also include the full HTML content under the key verbose_full_html, and the Markdown under the key verbose_markdown.
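
As a minimal sketch of reading such a body in Python, the helper below separates the structured data from the optional verbose fields. The function name split_retrieve_response is hypothetical; the keys are the ones shown in the sample response above.

```python
def split_retrieve_response(body):
    """Separate the structured data from the optional verbose fields."""
    data = body.get("response")
    # These keys are only expected when the request set verbose to true,
    # so .get() is used to tolerate their absence.
    html = body.get("verbose_full_html")
    markdown = body.get("verbose_markdown")
    return data, html, markdown


# Sample body copied from the response shown above.
sample = {
    "response": {"name": "John Doe", "email": "john.doe@example.com"},
    "verbose_full_html": "<html> ... </html>",
    "verbose_markdown": "--- ...",
}
data, html, markdown = split_retrieve_response(sample)
print(data["email"])
```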

Error Handling

{
  "error": true,
  "reason": "Missing required parameters. Please check and try again with required parameters."
}

If any required parameters are missing or an error occurs, the Retrieve endpoint will return a JSON object with an error message.
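
One way to surface this in client code is to turn the error shape into an exception. This is a sketch only: RetrieveError and raise_for_retrieve_error are hypothetical names, and the check relies solely on the "error" and "reason" keys shown above.

```python
class RetrieveError(Exception):
    """Raised when a response body carries the error flag (hypothetical helper)."""


def raise_for_retrieve_error(body):
    """Return `body` unchanged, or raise if it matches the error shape above."""
    if isinstance(body, dict) and body.get("error"):
        raise RetrieveError(body.get("reason", "unknown error"))
    return body


# Error body copied from the sample above.
error_body = {
    "error": True,
    "reason": "Missing required parameters. Please check and try again with required parameters.",
}
try:
    raise_for_retrieve_error(error_body)
except RetrieveError as exc:
    caught_reason = str(exc)
    print("Retrieve failed:", caught_reason)
```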

Best Practices

Descriptive Naming

Use clear and descriptive names for api_method_name to guide the AI effectively. For example, prefer getUserData over getData.

Detailed Response Structure

Clearly define the api_response_structure to ensure the AI understands your requirements. Specificity leads to more accurate responses.
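
As an illustration, the sketch below builds a request body from a structure whose placeholders each state what to return and in what form. The field names and placeholders are taken from the eBay example earlier on this page; the helper name build_payload is hypothetical.

```python
import json


def build_payload(webpage_url, api_method_name, structure, api_key):
    """Assemble a Retrieve request body from a plain Python structure."""
    return {
        "webpage_url": webpage_url,
        "api_method_name": api_method_name,
        # The structure is sent as a JSON string, not as a nested object.
        "api_response_structure": json.dumps(structure),
        "api_key": api_key,
    }


# Each placeholder says what to return and in what form.
specific_structure = {
    "item_name": "<the item name>",
    "item_price": "<the item price>",
    "item_image": "<the absolute URL of the first item image>",
}

payload = build_payload(
    "https://www.ebay.com/itm/175955440726",
    "getItemDetails",
    specific_structure,
    "<your API key>",
)
print(payload["api_response_structure"])
```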

Contextual Parameters

Utilize api_parameters to provide additional context, helping the AI generate more precise outputs.

Optimization Tips

Minimize Token Usage

The AI model's latency grows with the length of its output. Request only the fields you need and keep their descriptions concise to improve response time.

Use Premium Proxies Judiciously

The service defaults to the quickest scraping method. Use country-specific premium web proxies only when necessary, as they add latency.

Leveraging AI Capabilities

Creative Output Requirements

Be creative with your output requirements. The AI can handle various tasks, including summarization and sentiment analysis.

Inference and Analysis

The AI can infer information and perform analytical tasks. Specify outputs that require deeper understanding or analysis.
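
A structure that asks for analysis rather than plain extraction might look like the sketch below. The first field comes from the eBay example earlier on this page; the other two field names and their placeholders are hypothetical and only illustrate the idea.

```python
import json

# Placeholders that ask for summarization, sentiment, and inference.
analysis_structure = {
    "item_review_summary": "<a summary of the customer reviews received for this item>",
    # Hypothetical fields, not taken from the documentation:
    "item_review_sentiment": "<the overall sentiment of the reviews: positive, neutral, or negative>",
    "item_target_audience": "<the audience this item most likely appeals to, inferred from the listing>",
}

# Serialize to the escaped JSON string expected in api_response_structure.
api_response_structure = json.dumps(analysis_structure)
print(api_response_structure)
```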

Limitations

Error Handling

If required parameters are missing or an error occurs, the Retrieve endpoint will return an error message. Because the service may cycle in and out of premium web proxies, a failed request can succeed on a later attempt, so it's recommended to retry up to five times before treating the request as failed.
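
The retry recommendation can be sketched as a small wrapper. Everything here is an assumption layered on top of the documented behavior: the function name retrieve_with_retries, the injected send callable, and the linear delay between attempts are illustrative choices, not part of the service.

```python
import time


def retrieve_with_retries(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call `send()` until it returns a body without the error flag.

    `send` is any zero-argument callable that performs the POST and returns
    the decoded JSON body, e.g. lambda: requests.post(url, json=data).json().
    """
    last_reason = None
    for attempt in range(1, max_attempts + 1):
        try:
            body = send()
        except Exception as exc:  # network failure, invalid JSON, etc.
            last_reason = str(exc)
        else:
            if not (isinstance(body, dict) and body.get("error")):
                return body
            last_reason = body.get("reason", "unknown error")
        if attempt < max_attempts:
            # Linear backoff; the delay policy is an assumption, not documented.
            sleep(base_delay * attempt)
    raise RuntimeError(f"Retrieve failed after {max_attempts} attempts: {last_reason}")


# Demonstration with a fake sender that fails twice, then succeeds.
attempts = []


def fake_send():
    attempts.append(1)
    if len(attempts) < 3:
        return {"error": True, "reason": "temporary failure"}
    return {"response": {"ok": True}}


result = retrieve_with_retries(fake_send, sleep=lambda seconds: None)
print(result, len(attempts))
```

Passing the request as a callable keeps the retry logic independent of the HTTP client and makes it easy to exercise without network access.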

AI Interpretations

While the AI is powerful, it may not always interpret requests perfectly. Providing clear, detailed instructions will yield the best results.