Scrapes and extracts structured data from any web page. Below is the code example.
### Python ###
#
# Install our package: pip install web-extract-data
#
from web_extract_data import WebExtractClient

# Initialize the client with your InstantAPI.ai key
# Replace %%API_KEY%% with your API key from:
# https://web.instantapi.ai/#pricing-03-254921
client = WebExtractClient("%%API_KEY%%")

# You can modify the URL and the data fields to extract (JSON format)
result = client.scrape(
    url="https://www.amazon.com.au/MSI-PRO-MP341CQW-UltraWide-Compatible/dp/B09Y19TRQ2",
    fields={
        "monitor_name": "< The product name of the monitor. >",
        "brand": "< The brand or manufacturer name. >",
        "display_size_in_inches": "< Numeric only. >",
        "resolution": "< Example format: 1920x1080. >",
        "panel_type": "< Type of panel. >",
        "refresh_rate_hz": "< Numeric only. >",
        "aspect_ratio": "< Example format: 16:9. >",
        "ports": "< A comma-delimited list of available ports (e.g., HDMI, DisplayPort, etc.). >",
        "features": "< Key selling points or capabilities, comma-delimited (e.g., LED, Full HD, etc.). >",
        "price": "< Numeric price (integer or float). >",
        "price_currency": "< Price currency (3 character alphabetic ISO 4217). >",
        "review_count": "< Total number of customer reviews, numeric only. >",
        "average_rating": "< Float or numeric star rating (e.g., 4.3). >",
        "review_summary": "< A 50 words or less summary of all the written customer feedback. >"
    }
)

# Print the extracted data
print(result)

### JavaScript ###
//
// Install our package: npm install web-extract-data
//
const { WebExtractClient } = require('web-extract-data');

// Initialize the client with your InstantAPI.ai key
// Replace %%API_KEY%% with your API key from:
// https://web.instantapi.ai/#pricing-03-254921
const client = new WebExtractClient("%%API_KEY%%");

// You can modify the URL and the data fields to extract (JSON format)
client.scrape({
  url: "https://www.amazon.com.au/MSI-PRO-MP341CQW-UltraWide-Compatible/dp/B09Y19TRQ2",
  fields: {
    "monitor_name": "< The product name of the monitor. >",
    "brand": "< The brand or manufacturer name. >",
    "display_size_in_inches": "< Numeric only. >",
    "resolution": "< Example format: 1920x1080. >",
    "panel_type": "< Type of panel. >",
    "refresh_rate_hz": "< Numeric only. >",
    "aspect_ratio": "< Example format: 16:9. >",
    "ports": "< A comma-delimited list of available ports (e.g., HDMI, DisplayPort, etc.). >",
    "features": "< Key selling points or capabilities, comma-delimited (e.g., LED, Full HD, etc.). >",
    "price": "< Numeric price (integer or float). >",
    "price_currency": "< Price currency (3 character alphabetic ISO 4217). >",
    "review_count": "< Total number of customer reviews, numeric only. >",
    "average_rating": "< Float or numeric star rating (e.g., 4.3). >",
    "review_summary": "< A 50 words or less summary of all the written customer feedback. >"
  }
})
  .then(result => {
    // Print the extracted data
    console.log(result);
  })
  .catch(error => {
    console.error("Error:", error.message);
  });

### HTTP ###
# Set your InstantAPI.ai key
# Replace %%API_KEY%% with your API key from:
# https://web.instantapi.ai/#pricing-03-254921
API_KEY="%%API_KEY%%"

# API endpoint
API_URL="https://instantapi.ai/api/scrape/"

# You can modify the URL and the data fields to extract (JSON format)
cat > payload.json << 'EOF'
{
  "url": "https://www.amazon.com.au/MSI-PRO-MP341CQW-UltraWide-Compatible/dp/B09Y19TRQ2",
  "fields": {
    "monitor_name": "< The product name of the monitor. >",
    "brand": "< The brand or manufacturer name. >",
    "display_size_in_inches": "< Numeric only. >",
    "resolution": "< Example format: 1920x1080. >",
    "panel_type": "< Type of panel. >",
    "refresh_rate_hz": "< Numeric only. >",
    "aspect_ratio": "< Example format: 16:9. >",
    "ports": "< A comma-delimited list of available ports (e.g., HDMI, DisplayPort, etc.). >",
    "features": "< Key selling points or capabilities, comma-delimited (e.g., LED, Full HD, etc.). >",
    "price": "< Numeric price (integer or float). >",
    "price_currency": "< Price currency (3 character alphabetic ISO 4217). >",
    "review_count": "< Total number of customer reviews, numeric only. >",
    "average_rating": "< Float or numeric star rating (e.g., 4.3). >",
    "review_summary": "< A 50 words or less summary of all the written customer feedback. >"
  }
}
EOF

# Make the API request and print the extracted data
curl "$API_URL" \
  -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d @payload.json \
  | json_pp

### MCP ###
# Configure the MCP server with your InstantAPI.ai key
# Replace %%API_KEY%% with your API key from:
# https://web.instantapi.ai/#pricing-03-254921
{
  "mcpServers": {
    "web-scraping-api-by-instantapi-ai": {
      "command": "npx",
      "args": [
        "-y",
        "mcp-remote",
        "https://web-scraping-api-by-instantapi-ai.help-052.workers.dev/sse",
        "--header",
        "Authorization:${AUTH_HEADER}"
      ],
      "env": {
        "AUTH_HEADER": "Bearer %%API_KEY%%"
      }
    }
  }
}
The response is a JSON object with the following structure:
{
  "scrape": < The populated JSON object that matches the structure you provided. >,
  "markdown": "< Markdown of the page which can be optionally saved for further analysis. >",
  "html": "< HTML of the page which can be optionally saved for further analysis. >"
}
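Since the `markdown` and `html` values are meant to be saved for later analysis, a small helper can split a response into files. The following is only a sketch: it assumes the Python client returns a plain `dict` shaped like the response structure above (the source does not confirm the return type), and `save_response`, the file names, and the sample values are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def save_response(response: dict, out_dir: Path, stem: str) -> dict:
    """Write the page snapshots to disk and return the extracted fields."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # "markdown" and "html" are the keys named in the response structure above.
    for key, suffix in (("markdown", ".md"), ("html", ".html")):
        content = response.get(key)
        if content:  # skip snapshots that are missing or empty
            (out_dir / f"{stem}{suffix}").write_text(content, encoding="utf-8")
    # Assumption: the extracted fields live under the "scrape" key.
    extracted = response.get("scrape", {})
    (out_dir / f"{stem}.json").write_text(
        json.dumps(extracted, indent=2), encoding="utf-8"
    )
    return extracted


# Hypothetical response with made-up values, shaped like the structure above.
sample = {
    "scrape": {"monitor_name": "Example Monitor", "price": 499.0},
    "markdown": "# Example Monitor",
    "html": "<h1>Example Monitor</h1>",
}

with tempfile.TemporaryDirectory() as tmp:
    fields = save_response(sample, Path(tmp), "monitor")
    written = sorted(p.name for p in Path(tmp).iterdir())

print(fields)
print(written)
```

In real use, `sample` would be replaced by the value returned from `client.scrape(...)`, provided it has the same shape.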
Scrapes and extracts links matching a description from any web page. Below is the code example.
### Python ###
#
# Install our package: pip install web-extract-data
#
from web_extract_data import WebExtractClient

# Initialize the client with your InstantAPI.ai key
# Replace %%API_KEY%% with your API key from:
# https://web.instantapi.ai/#pricing-03-254921
client = WebExtractClient("%%API_KEY%%")

# You can modify the URL and link description
result = client.links(
    url="https://www.ikea.com/au/en/cat/quilt-cover-sets-10680/?page=3",
    description="individual product urls"
)

# Print the extracted links
print(result)

### JavaScript ###
//
// Install our package: npm install web-extract-data
//
const { WebExtractClient } = require('web-extract-data');

// Initialize the client with your InstantAPI.ai key
// Replace %%API_KEY%% with your API key from:
// https://web.instantapi.ai/#pricing-03-254921
const client = new WebExtractClient("%%API_KEY%%");

// You can modify the URL and link description
client.links({
  url: "https://www.ikea.com/au/en/cat/quilt-cover-sets-10680/?page=3",
  description: "individual product urls"
})
  .then(result => {
    // Print the extracted links
    console.log(result);
  })
  .catch(error => {
    console.error("Error:", error.message);
  });

### HTTP ###
# Set your InstantAPI.ai key
# Replace %%API_KEY%% with your API key from:
# https://web.instantapi.ai/#pricing-03-254921
API_KEY="%%API_KEY%%"

# API endpoint
API_URL="https://instantapi.ai/api/links/"

# You can modify the URL and link description
cat > payload.json << 'EOF'
{
  "url": "https://www.ikea.com/au/en/cat/quilt-cover-sets-10680/?page=3",
  "description": "individual product urls"
}
EOF

# Make the API request and print the extracted links
curl "$API_URL" \
  -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d @payload.json \
  | json_pp

### MCP ###
# Configure the MCP server with your InstantAPI.ai key
# Replace %%API_KEY%% with your API key from:
# https://web.instantapi.ai/#pricing-03-254921
{
  "mcpServers": {
    "web-scraping-api-by-instantapi-ai": {
      "command": "npx",
      "args": [
        "-y",
        "mcp-remote",
        "https://web-scraping-api-by-instantapi-ai.help-052.workers.dev/sse",
        "--header",
        "Authorization:${AUTH_HEADER}"
      ],
      "env": {
        "AUTH_HEADER": "Bearer %%API_KEY%%"
      }
    }
  }
}
The response is a JSON object with the following structure:
{
  "links": [< An array of URLs that match the description you provided. >],
  "markdown": "< Markdown of the page which can be optionally saved for further analysis. >",
  "html": "< HTML of the page which can be optionally saved for further analysis. >"
}
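The `links` array pairs naturally with `scrape`: extract product URLs from a listing page, then scrape each one. The sketch below illustrates that pipeline under stated assumptions. It assumes `client.links` and `client.scrape` return dicts shaped like the response structures shown in this document, and it uses a hypothetical `StubClient` with made-up URLs in place of `WebExtractClient` so it runs offline.

```python
from typing import Any


def scrape_linked_pages(
    client: Any, listing_url: str, description: str, fields: dict
) -> list:
    """Extract matching links from a listing page, then scrape each unique link."""
    link_result = client.links(url=listing_url, description=description)
    seen = set()
    rows = []
    # Assumption: the matched URLs are under the "links" key.
    for link in link_result.get("links", []):
        if link in seen:  # avoid paying for the same page twice
            continue
        seen.add(link)
        page = client.scrape(url=link, fields=fields)
        rows.append({"url": link, **page.get("scrape", {})})
    return rows


class StubClient:
    """Hypothetical stand-in for WebExtractClient; returns canned data."""

    def links(self, url, description):
        return {
            "links": [
                "https://example.com/p/1",
                "https://example.com/p/2",
                "https://example.com/p/1",  # duplicate on purpose
            ]
        }

    def scrape(self, url, fields):
        return {"scrape": {"title": f"Product at {url}"}}


rows = scrape_linked_pages(
    StubClient(),
    listing_url="https://example.com/list",
    description="individual product urls",
    fields={"title": "< The title of the page. >"},
)
for row in rows:
    print(row)
```

Deduplicating before scraping keeps the number of API calls proportional to the number of distinct pages.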
Scrapes and extracts the 'next page' links from any web page with pagination. Below is the code example.
### Python ###
#
# Install our package: pip install web-extract-data
#
from web_extract_data import WebExtractClient

# Initialize the client with your InstantAPI.ai key
# Replace %%API_KEY%% with your API key from:
# https://web.instantapi.ai/#pricing-03-254921
client = WebExtractClient("%%API_KEY%%")

# You can modify the URL
result = client.next(
    url="https://www.ikea.com/au/en/cat/quilt-cover-sets-10680/"
)

# Print the extracted next page URLs
print(result)

### JavaScript ###
//
// Install our package: npm install web-extract-data
//
const { WebExtractClient } = require('web-extract-data');

// Initialize the client with your InstantAPI.ai key
// Replace %%API_KEY%% with your API key from:
// https://web.instantapi.ai/#pricing-03-254921
const client = new WebExtractClient("%%API_KEY%%");

// You can modify the URL
client.next({
  url: "https://www.ikea.com/au/en/cat/quilt-cover-sets-10680/"
})
  .then(result => {
    // Print the extracted next page URLs
    console.log(result);
  })
  .catch(error => {
    console.error("Error:", error.message);
  });

### HTTP ###
# Set your InstantAPI.ai key
# Replace %%API_KEY%% with your API key from:
# https://web.instantapi.ai/#pricing-03-254921
API_KEY="%%API_KEY%%"

# API endpoint
API_URL="https://instantapi.ai/api/next/"

# You can modify the URL
cat > payload.json << 'EOF'
{
  "url": "https://www.ikea.com/au/en/cat/quilt-cover-sets-10680/"
}
EOF

# Make the API request and print the extracted next page URLs
curl "$API_URL" \
  -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d @payload.json \
  | json_pp

### MCP ###
# Configure the MCP server with your InstantAPI.ai key
# Replace %%API_KEY%% with your API key from:
# https://web.instantapi.ai/#pricing-03-254921
{
  "mcpServers": {
    "web-scraping-api-by-instantapi-ai": {
      "command": "npx",
      "args": [
        "-y",
        "mcp-remote",
        "https://web-scraping-api-by-instantapi-ai.help-052.workers.dev/sse",
        "--header",
        "Authorization:${AUTH_HEADER}"
      ],
      "env": {
        "AUTH_HEADER": "Bearer %%API_KEY%%"
      }
    }
  }
}
The response is a JSON object with the following structure:
{
  "next": [< An array of all matched 'next page' URLs. >],
  "markdown": "< Markdown of the page which can be optionally saved for further analysis. >",
  "html": "< HTML of the page which can be optionally saved for further analysis. >"
}
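Because `next` can return several pagination URLs per page, walking a paginated listing needs a visited set and an upper bound. Here is a minimal sketch of such a crawl, assuming `client.next` returns a dict with a `next` list as in the structure above. `crawl_pages`, `StubClient`, and the example URLs are hypothetical and exist only so the sketch runs without network access.

```python
from collections import deque
from typing import Any


def crawl_pages(client: Any, start_url: str, max_pages: int = 10) -> list:
    """Follow 'next page' URLs breadth-first, visiting each URL at most once."""
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        result = client.next(url=url)
        # Assumption: pagination URLs are under the "next" key.
        for next_url in result.get("next", []):
            if next_url not in seen:
                seen.add(next_url)
                queue.append(next_url)
    return visited


class StubClient:
    """Hypothetical stand-in for WebExtractClient with a three-page listing."""

    PAGES = {
        "https://example.com/list": [
            "https://example.com/list?page=2",
            "https://example.com/list?page=3",
        ],
        "https://example.com/list?page=2": ["https://example.com/list?page=3"],
        "https://example.com/list?page=3": [],
    }

    def next(self, url):
        return {"next": self.PAGES.get(url, [])}


all_pages = crawl_pages(StubClient(), "https://example.com/list")
first_two = crawl_pages(StubClient(), "https://example.com/list", max_pages=2)
print(all_pages)
print(first_two)
```

The `max_pages` cap bounds the number of API calls even if a site links its pages in a cycle.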
Scrapes and extracts relevant URLs from Google search results pages. Below is the code example.
### Python ###
#
# Install our package: pip install web-extract-data
#
from web_extract_data import WebExtractClient

# Initialize the client with your InstantAPI.ai key
# Replace %%API_KEY%% with your API key from:
# https://web.instantapi.ai/#pricing-03-254921
client = WebExtractClient("%%API_KEY%%")

# You can modify the search query, google_domain, and page number
result = client.search(
    query="AVID POWER 20V MAX Lithium Ion Cordless Drill Set",
    google_domain="www.google.com",
    page=1
)

# Print the extracted search result URLs
print(result)

### JavaScript ###
//
// Install our package: npm install web-extract-data
//
const { WebExtractClient } = require('web-extract-data');

// Initialize the client with your InstantAPI.ai key
// Replace %%API_KEY%% with your API key from:
// https://web.instantapi.ai/#pricing-03-254921
const client = new WebExtractClient("%%API_KEY%%");

// You can modify the search query, google_domain, and page number
client.search({
  query: "AVID POWER 20V MAX Lithium Ion Cordless Drill Set",
  google_domain: "www.google.com",
  page: 1
})
  .then(result => {
    // Print the extracted search result URLs
    console.log(result);
  })
  .catch(error => {
    console.error("Error:", error.message);
  });

### HTTP ###
# Set your InstantAPI.ai key
# Replace %%API_KEY%% with your API key from:
# https://web.instantapi.ai/#pricing-03-254921
API_KEY="%%API_KEY%%"

# API endpoint
API_URL="https://instantapi.ai/api/search/"

# You can modify the search query, google_domain, and page number
cat > payload.json << 'EOF'
{
  "query": "AVID POWER 20V MAX Lithium Ion Cordless Drill Set",
  "google_domain": "www.google.com",
  "page": 1
}
EOF

# Make the API request and print the extracted search result URLs
curl "$API_URL" \
  -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d @payload.json \
  | json_pp

### MCP ###
# Configure the MCP server with your InstantAPI.ai key
# Replace %%API_KEY%% with your API key from:
# https://web.instantapi.ai/#pricing-03-254921
{
  "mcpServers": {
    "web-scraping-api-by-instantapi-ai": {
      "command": "npx",
      "args": [
        "-y",
        "mcp-remote",
        "https://web-scraping-api-by-instantapi-ai.help-052.workers.dev/sse",
        "--header",
        "Authorization:${AUTH_HEADER}"
      ],
      "env": {
        "AUTH_HEADER": "Bearer %%API_KEY%%"
      }
    }
  }
}
The response is a JSON object with the following structure:
{
  "search": [< An array of relevant URLs for any of the promoted and organic search results. >],
  "markdown": "< Markdown of the page which can be optionally saved for further analysis. >",
  "html": "< HTML of the page which can be optionally saved for further analysis. >"
}
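The `page` parameter suggests collecting results across several result pages. The sketch below shows one way to do that, assuming `client.search` returns a dict with a `search` list as in the structure above and that an empty list means there are no further results (the source states neither explicitly). `collect_search_urls` and `StubClient` are hypothetical names.

```python
from typing import Any


def collect_search_urls(
    client: Any,
    query: str,
    google_domain: str = "www.google.com",
    max_pages: int = 3,
) -> list:
    """Gather unique result URLs from page 1 up to max_pages, in order."""
    urls = []
    seen = set()
    for page in range(1, max_pages + 1):
        result = client.search(query=query, google_domain=google_domain, page=page)
        found = result.get("search", [])
        if not found:  # assumption: an empty page means no more results
            break
        for url in found:
            if url not in seen:
                seen.add(url)
                urls.append(url)
    return urls


class StubClient:
    """Hypothetical stand-in for WebExtractClient with canned result pages."""

    RESULTS = {
        1: ["https://example.com/a", "https://example.com/b"],
        2: ["https://example.com/b", "https://example.com/c"],
    }

    def __init__(self):
        self.calls = 0

    def search(self, query, google_domain, page):
        self.calls += 1
        return {"search": self.RESULTS.get(page, [])}


stub = StubClient()
found_urls = collect_search_urls(stub, "cordless drill set", max_pages=5)
print(found_urls)
print(stub.calls)
```

Stopping at the first empty page avoids spending requests on pages that are unlikely to contain results.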
How to handle errors and their messages.
### Python ###
#
# Install our package: pip install web-extract-data
#
#
# The package raises an exception if the API returns an error.
# You can handle these exceptions with a try-except block:
#
from web_extract_data import WebExtractClient

# Initialize the client with your InstantAPI.ai key
# Replace %%API_KEY%% with your API key from:
# https://web.instantapi.ai/#pricing-03-254921
client = WebExtractClient("%%API_KEY%%")

try:
    result = client.scrape(
        url="https://www.somemadeupurl.com/",
        fields={"title": "< The title of the page. >"}
    )
    print(result)
except Exception as e:
    print(f"An error occurred: {e}")

### JavaScript ###
//
// Install our package: npm install web-extract-data
//
//
// The package throws an error if the API returns an error.
// You can handle these errors with a try-catch block or the Promise catch method:
//
const { WebExtractClient } = require('web-extract-data');

// Initialize the client with your InstantAPI.ai key
// Replace %%API_KEY%% with your API key from:
// https://web.instantapi.ai/#pricing-03-254921
const client = new WebExtractClient("%%API_KEY%%");

// Promise catch method
client.scrape({
  url: "https://www.somemadeupurl.com/",
  fields: { "title": "< The title of the page. >" }
})
  .then(result => {
    console.log(result);
  })
  .catch(error => {
    console.error("An error occurred:", error.message);
  });

// Try-catch block
async function getData() {
  try {
    const result = await client.scrape({
      url: "https://www.somemadeupurl.com/",
      fields: { "title": "< The title of the page. >" }
    });
    console.log(result);
  } catch (error) {
    console.error("An error occurred:", error.message);
  }
}

getData();

### HTTP ###
#
# When a request fails, the API responds with a JSON payload
# with the following structure:
#
# {
#   "error": true,
#   "reason": "< The reason for the error. >"
# }
#
# This script uses jq and json_pp to parse and print the response.

# Set your InstantAPI.ai key
# Replace %%API_KEY%% with your API key from:
# https://web.instantapi.ai/#pricing-03-254921
API_KEY="%%API_KEY%%"

# API endpoint
API_URL="https://instantapi.ai/api/scrape/"

# You can modify the URL
cat > payload.json << 'EOF'
{
  "url": "https://www.somemadeupurl.com/",
  "fields": {
    "title": "< The title of the page. >"
  }
}
EOF

# Make the API request, appending the HTTP status code on its own line
response=$(curl -s -w "\n%{http_code}" "$API_URL" \
  -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d @payload.json)

# Get the status code (last line) and response body (everything else)
http_code=$(echo "$response" | tail -n1)
response_body=$(echo "$response" | sed '$ d')

# Parse and process the response
if [[ "$http_code" == "200" ]]; then
  # Even with HTTP 200, check whether the API returned an error in the body
  is_error=$(echo "$response_body" | jq -r '.error // false')
  if [[ "$is_error" == "true" ]]; then
    # API returned an error with a reason
    error_reason=$(echo "$response_body" | jq -r '.reason')
    echo "An error occurred: $error_reason"
  else
    # Success response
    echo "$response_body" | json_pp
  fi
else
  # HTTP error occurred
  echo "HTTP error occurred (status code: $http_code):"
  echo "$response_body" | json_pp
fi
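For callers who hit the HTTP endpoint directly from Python rather than through the package, the same two-step check as the shell script (HTTP status first, then the `error` flag in the body) can be sketched as a pure function. `ApiError` and `parse_api_response` are hypothetical names, not part of the `web-extract-data` package, and the sample payloads are made up.

```python
import json


class ApiError(Exception):
    """Raised when the HTTP status or the JSON payload signals a failure."""


def parse_api_response(status_code: int, body: str) -> dict:
    """Return the decoded payload, or raise ApiError as the shell script would report."""
    if status_code != 200:
        raise ApiError(f"HTTP error occurred (status code: {status_code}): {body}")
    payload = json.loads(body)
    # Even with HTTP 200, the body may carry {"error": true, "reason": "..."}.
    if isinstance(payload, dict) and payload.get("error") is True:
        raise ApiError(payload.get("reason", "unknown reason"))
    return payload


# Made-up bodies that follow the success and error structures described above.
ok = parse_api_response(200, '{"scrape": {"title": "Example"}}')
print(ok)

try:
    parse_api_response(200, '{"error": true, "reason": "Could not load the page."}')
except ApiError as exc:
    print(f"An error occurred: {exc}")
```

Keeping the check separate from the network call makes it easy to reuse with whichever HTTP library performs the request.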