
How to Scrape Amazon ASIN With Python in 2026: Step-by-Step


Scraping Amazon for product information can unlock valuable insights, from price tracking to market research. Scrape Amazon effectively and you can build rich datasets for analysis. At the heart of Amazon's product catalog is the ASIN code: a unique identifier for each item.

In this step-by-step guide, we'll show you how to scrape Amazon ASIN numbers and related data using Python. You'll learn what an ASIN is, how to find it, and how to build an Amazon ASIN scraper that navigates Amazon’s defenses.

By the end, you'll be able to extract ASINs from Amazon search results and gather product data for each one, all while avoiding the common pitfalls of scraping Amazon programmatically.

What is an Amazon ASIN and how to find it?

ASIN stands for Amazon Standard Identification Number, and it's essentially a unique code Amazon assigns to every product in its marketplace. The ASIN is a 10-character alphanumeric code that acts as an identifier for products.

For example, a product might have an ASIN code like B08Z5NYG12. This ASIN number is Amazon's code for that product. Every product's ASIN is unique across Amazon's catalog (except for books, which use the ISBN as the ASIN).

You can find a product’s ASIN in a few ways. The easiest is to look at the product page URL. When you open an Amazon product page, the Amazon URL typically contains the ASIN code. For instance, in the URL https://www.amazon.com/dp/B08Z5NYG12, the string after /dp/ (here B08Z5NYG12) is the product's ASIN.
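This URL pattern is easy to exploit programmatically. As a quick illustration (the regex and the helper name are our own, not an Amazon-provided API), a few lines of Python can pull the ASIN out of a product URL:

```python
import re

def extract_asin(url):
    """Pull the 10-character ASIN out of an Amazon product URL, if present."""
    # ASINs appear after /dp/ or /gp/product/ and are 10 alphanumeric characters
    match = re.search(r"/(?:dp|gp/product)/([A-Z0-9]{10})", url)
    return match.group(1) if match else None

print(extract_asin("https://www.amazon.com/dp/B08Z5NYG12"))  # B08Z5NYG12
print(extract_asin("https://www.amazon.com/s?k=laptop"))     # None
```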

You can also scroll down the page to the "Product details" section, where Amazon usually lists the product's ASIN along with other information.

Another method (useful when browsing Amazon search results) is to inspect the page's HTML. In Amazon’s search results pages, each listing is wrapped in an element that includes the ASIN. Specifically, the listing elements have a data-asin attribute containing the product's ASIN code.

That means you can often gather ASINs directly from search result HTML, which is exactly what we'll do when scraping.

Why are ASIN codes useful? For sellers and analysts, ASINs are vital. They allow you to track and compare products across Amazon's catalog. You might use ASIN codes to monitor inventory and pricing (for example, checking an item's price over time via its ASIN), to analyze Amazon data for market trends, or to ensure you're referencing the exact item in analytics.

In reseller tools and affiliate marketing, having the ASIN helps you fetch product details or reviews via Amazon's APIs or scraped pages. Essentially, ASIN data is a key that unlocks a wealth of product information.

Scraping ASINs with Python: step-by-step

Now that we know what an ASIN is, let's dive into how to scrape Amazon ASIN data with Python. We will go through setting up the environment, understanding Amazon’s page structure, and writing a script to gather ASIN codes and product info. This section is written for intermediate Python users familiar with basic web scraping and crawling techniques.

Prerequisites

Before we start coding our Amazon scraper, make sure you have a suitable Python environment. You'll need Python 3 installed, along with some common libraries.

In this tutorial, we'll use the requests library to fetch pages and BeautifulSoup (from the bs4 package) with the lxml parser to process the HTML content. You can install these via pip:

pip install requests beautifulsoup4 lxml

We also plan to use Python's built-in json module for output. Having a working knowledge of HTML and how to use your browser's developer tools to inspect elements will be helpful. Lastly, for any larger project that involves extensive scraping, you should consider using proxies and be mindful of Amazon's anti-scraping measures (more on those later).

Amazon's product page structure

Understanding how Amazon pages are structured is crucial for scraping. A typical Amazon product page is delivered as static HTML (with some dynamic content loading afterward).

Key information like the product title, price, and Amazon ASIN data is present in that HTML. As mentioned, the ASIN is part of the URL, but it also appears within the page content. Often, the "Product details" section on the page explicitly lists "ASIN: <code>", which is an easy way to confirm the ASIN number.

However, when our goal is to collect many ASINs, we don't actually need to scrape each product page just to get the ASIN. It's more efficient to retrieve ASINs from the listing or search page.

Amazon’s search results pages are essentially lists of products. Each product listing on a search page typically has that data-asin attribute (or a similar one like data-csa-c-asin) containing the ASIN code. By fetching the HTML of a search results page, we can collect multiple ASINs in one go.
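You can verify this with nothing but the standard library. The snippet below (the class name and the sample markup are our own trimmed stand-ins for real search-result HTML) walks the HTML and collects every non-empty data-asin value:

```python
from html.parser import HTMLParser

class AsinCollector(HTMLParser):
    """Collects non-empty data-asin attribute values from search-result HTML."""
    def __init__(self):
        super().__init__()
        self.asins = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "data-asin" and value:
                self.asins.append(value)

# A trimmed-down stand-in for real search-result markup
sample = '<div data-asin="B08Z5NYG12"></div><div data-asin=""></div>'
collector = AsinCollector()
collector.feed(sample)
print(collector.asins)  # ['B08Z5NYG12']
```

Note how the empty data-asin="" container is skipped; Amazon uses empty values on layout elements that aren't actual products.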

Scrape ASINs from search results pages

To scrape Amazon ASIN codes in bulk, the search results page is the best starting point. Let's say we want the ASINs for the first page of results for a query like "laptop". We can use the Amazon search URL for that query, for example:

https://www.amazon.com/s?k=laptop

When you load this page in a browser, you'll see a list of product results. Under the hood, the HTML for this page contains the ASIN for each product listing.

Using Python, we can send an HTTP GET request to that Amazon search URL. It's important to include a valid User-Agent header so that Amazon returns the normal page (Amazon may return a captcha or an error if the request looks suspicious or lacks a proper agent string). For example:

import requests
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.9"
}
url = "https://www.amazon.com/s?k=laptop"
response = requests.get(url, headers=headers)
html_content = response.text

Now, html_content holds the HTML of the search results for "laptop". Next, we'll parse this HTML to find all the ASINs on the page. We look for any element with the data-asin attribute, since those carry the ASIN codes. Using BeautifulSoup:

from bs4 import BeautifulSoup
soup = BeautifulSoup(html_content, "lxml")
asin_elements = soup.find_all(attrs={"data-asin": True})
asins = [elem["data-asin"] for elem in asin_elements if elem["data-asin"]]
print(asins)

This will print a list of ASINs found on the page. We filter out any empty strings (Amazon sometimes uses data-asin="" for layout containers that are not actual products). Typically, you'll get around 20 or more ASINs from one page of search results. We have successfully extracted ASINs from the search results HTML without having to click each product.

One thing to watch out for: search pages often contain sponsored products or other non-product elements. The approach above grabs any data-asin attribute. If you want to be more precise, you might refine the selector to something like

soup.select("div.s-result-item[data-asin]")

which targets only product result containers.
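Before fetching product pages, it can also help to sanity-check the collected codes against the 10-character alphanumeric format, filtering out any stray attribute values (this validator is our own convenience helper, not something Amazon provides):

```python
import re

ASIN_RE = re.compile(r"^[A-Z0-9]{10}$")

def is_valid_asin(code):
    """True if the string looks like a 10-character Amazon ASIN."""
    return bool(ASIN_RE.match(code))

candidates = ["B08Z5NYG12", "", "not-an-asin", "0143127748"]
print([c for c in candidates if is_valid_asin(c)])  # ['B08Z5NYG12', '0143127748']
```

The all-digit code passes too, which is expected: book ASINs are usually ISBN-10s.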

Scrape product data

Collecting Amazon ASIN data is useful, but often we need more than just the codes; we want data about each product. Once you have a list of ASINs (for example, from the step above), you can use them to fetch each product's page and extract details like title, price, rating, etc.

To get a product page, use the ASIN in the URL. Amazon product pages can be accessed via a URL format that includes the ASIN, such as:

https://www.amazon.com/dp/ASINCODE

Replace ASINCODE with the actual code. For instance, if we have an ASIN B08Z5NYG12, the product page URL would be:

https://www.amazon.com/dp/B08Z5NYG12

Let's take one ASIN from the earlier list and fetch its product page:

asin = asins[0]  # first ASIN from our list
product_url = f"https://www.amazon.com/dp/{asin}"
product_resp = requests.get(product_url, headers=headers)
product_html = product_resp.text

After this, product_html contains the HTML of the product detail page for that ASIN. Now we parse this HTML to extract the Amazon data we want. For example, to get the product title and price:

prod_soup = BeautifulSoup(product_html, "lxml")
title_elem = prod_soup.find(id="productTitle")
price_elem = prod_soup.find("span", {"class": "a-price-whole"})
title = title_elem.get_text(strip=True) if title_elem else "N/A"
price = price_elem.get_text(strip=True) if price_elem else "N/A"
print(title, price)

Here, we look for the element with id "productTitle" (which holds the product name on Amazon pages) and an element with class "a-price-whole" (which is part of the price display). The get_text(strip=True) calls give us the text content without extra whitespace. If these elements aren't found (maybe the product is unavailable or the structure is different), we default to "N/A".

Finally, let's store the collected data. We can compile everything into a list of dictionaries in Python and then write it to a JSON file:

import json

products_data = []
for asin in asins:
    product_url = f"https://www.amazon.com/dp/{asin}"
    prod_resp = requests.get(product_url, headers=headers)
    prod_soup = BeautifulSoup(prod_resp.text, "lxml")
    title_elem = prod_soup.find(id="productTitle")
    price_elem = prod_soup.find("span", {"class": "a-price-whole"})
    title = title_elem.get_text(strip=True) if title_elem else "N/A"
    price = price_elem.get_text(strip=True) if price_elem else "N/A"
    products_data.append({
        "asin": asin,
        "title": title,
        "price": price
    })

# Save results to a JSON file
with open("amazon_products.json", "w") as f:
    json.dump(products_data, f, indent=4)

Our Amazon scraper goes through each ASIN, scrapes the product page, and appends a dictionary of data to the list. Then we dump the list to amazon_products.json. Each entry in that file will contain the ASIN and the extracted details for that product.
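JSON isn't the only option. If you'd rather open the results in a spreadsheet, the same records can be written as CSV with the standard library (the sample record here is made up for illustration):

```python
import csv

products_data = [
    {"asin": "B08Z5NYG12", "title": "Example Laptop", "price": "499"},
]

# Same records as the JSON version, written as CSV instead
with open("amazon_products.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["asin", "title", "price"])
    writer.writeheader()
    writer.writerows(products_data)
```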

Using proxies

Scraping Amazon consistently without getting blocked can be tricky. If you try to scrape too many pages or product listings in quick succession from one IP address, Amazon will likely detect the unusual traffic and start blocking requests. They might serve you CAPTCHA challenges or HTTP 503 errors as a defense. At this point, using proxies becomes essential.

Proxies allow your requests to appear as if they're coming from different IP addresses. By rotating through a pool of IPs, you make it much harder for Amazon to identify a single source of traffic. For any serious attempt at ASIN web scraping at scale, proxies are a must.

There are several proxy options:

  • Residential proxies route your requests through real user devices (residential IPs). They tend to be more reliable for Amazon since they look like ordinary customer traffic, but they can be slower and more costly.
  • Datacenter proxies are fast and cheap, but many datacenter IP ranges are known to Amazon and might be blocked or challenged more readily.
  • ISP proxies are datacenter-hosted IP addresses registered under consumer internet service providers, combining the speed of datacenter proxies with the residential-looking reputation of home IPs.

In our Python code, we can specify a proxy by providing a proxies dictionary to requests.get. For example:

proxies = {
    "http": "http://USERNAME:PASSWORD@proxy-address:port",
    "https": "http://USERNAME:PASSWORD@proxy-address:port"
}
response = requests.get(url, headers=headers, proxies=proxies)

You would fill in the USERNAME, PASSWORD, proxy-address, and port according to the proxy service you're using. Services like MarsProxies provide endpoints that automatically handle rotation for you, so each request can go out via a different IP address.
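If your provider gives you a list of static endpoints instead of a rotating gateway, you can rotate them yourself. Here's a minimal sketch, assuming three placeholder endpoints (the example.com hosts and credentials are not real proxies):

```python
import random

# Hypothetical endpoints - substitute the ones from your provider
proxy_pool = [
    "http://USERNAME:PASSWORD@proxy1.example.com:8000",
    "http://USERNAME:PASSWORD@proxy2.example.com:8000",
    "http://USERNAME:PASSWORD@proxy3.example.com:8000",
]

def pick_proxies():
    """Build a requests-style proxies dict around a randomly chosen endpoint."""
    endpoint = random.choice(proxy_pool)
    return {"http": endpoint, "https": endpoint}

proxies = pick_proxies()
# response = requests.get(url, headers=headers, proxies=proxies)
```

Calling pick_proxies() before each request spreads your traffic across the pool.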

Full code

Bringing it all together, below is the full Python script for our Amazon scraper that searches for a keyword, collects ASIN numbers, then fetches each product’s title and price, using headers and optional proxies:

import requests
from bs4 import BeautifulSoup
import json
import time
import random

# Setup headers and (optional) proxies
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.9"
}
proxies = {
    # "http": "http://USERNAME:[email protected]:port",
    # "https": "http://USERNAME:[email protected]:port"
}

query = "laptop"
search_url = f"https://www.amazon.com/s?k={query}"
res = requests.get(search_url, headers=headers, proxies=proxies)
soup = BeautifulSoup(res.text, "lxml")

# Extract ASINs from search results
asins = []
for item in soup.select("div.s-result-item[data-asin]"):
    asin = item.get("data-asin")
    if asin:
        asins.append(asin)

print(f"Found {len(asins)} ASINs for search '{query}'")
products_data = []

# Fetch each product page
for asin in asins:
    product_url = f"https://www.amazon.com/dp/{asin}"
    prod_res = requests.get(product_url, headers=headers, proxies=proxies)
    prod_soup = BeautifulSoup(prod_res.text, "lxml")
    title_elem = prod_soup.find(id="productTitle")
    price_elem = prod_soup.find("span", {"class": "a-price-whole"})
    title = title_elem.get_text(strip=True) if title_elem else "N/A"
    price = price_elem.get_text(strip=True) if price_elem else "N/A"
    products_data.append({
        "asin": asin,
        "title": title,
        "price": price
    })
    time.sleep(random.uniform(1.0, 3.0))  # polite delay

# Save data to JSON
with open("products_data.json", "w") as f:
    json.dump(products_data, f, indent=4)

This script will perform a search on Amazon, retrieve the ASIN numbers from the results, and then scrape each product page for the title and price. It prints out how many ASIN numbers were found and then saves all the collected data into products_data.json. Remember to insert your proxy details if you have any, and you can adjust the delays and fields as needed.

Advanced ASIN scraping at scale

When you need to handle hundreds or thousands of products, basic scraping might not be enough. Large-scale ASIN scraping brings its own set of challenges and requires additional tools or strategies:

  • CAPTCHA bypass strategies

At high volumes, Amazon will likely throw CAPTCHAs at your scraper. For example, you might suddenly get a page asking you to "Type the characters" to prove you're not a bot. You can solve these automatically with third-party CAPTCHA-solving services (which typically provide an API: you send them the image and they return the text).

  • Headless browsers

Using a headless browser like Selenium (with Chrome or Firefox) or Puppeteer can help when Amazon’s anti-bot measures defeat simple scripts. A headless browser actually runs a real browser engine without a visible window. It can execute JavaScript and maintain a normal browsing flow (complete with loading images, waiting for page render, and so on).

  • Parallel requests with asyncio or multi-threading

If you have a large list of ASIN numbers to process, doing them one by one will be slow. Python's asyncio or multi-threading can help send multiple requests in parallel. Libraries like aiohttp combined with asyncio allow you to fetch many pages at once asynchronously.
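A thread pool from the standard library is often the simplest way to parallelize blocking requests calls. Here's a sketch where fetch_product is a stand-in of our own; in a real scraper it would wrap the requests.get and parsing logic shown earlier:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_product(asin):
    """Stand-in for a real page fetch; replace with a requests/proxy call."""
    return {"asin": asin, "title": "N/A", "price": "N/A"}

asins = ["B08Z5NYG12", "B00EXAMPLE", "B01EXAMPLE"]

# Up to 5 fetches in flight at once; results come back in input order
with ThreadPoolExecutor(max_workers=5) as pool:
    products_data = list(pool.map(fetch_product, asins))

print(len(products_data))  # 3
```

Keep max_workers modest; hammering Amazon with dozens of parallel connections is the fastest way to get blocked.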

  • Storing and managing data

As you gather large amounts of product data via ASIN numbers, consider using a database or structured storage. For example, you could use an SQLite or MySQL database to store the product data keyed by ASIN.
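For instance, a small SQLite table keyed by ASIN lets repeated scrapes update rows in place rather than pile up duplicates (the row here is a made-up sample, and the in-memory database is for illustration):

```python
import sqlite3

# In-memory database for illustration; pass a file path for persistence
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE IF NOT EXISTS products (asin TEXT PRIMARY KEY, title TEXT, price TEXT)"
)

rows = [("B08Z5NYG12", "Example Laptop", "499")]
# INSERT OR REPLACE keeps the table keyed by ASIN: re-scrapes update in place
conn.executemany("INSERT OR REPLACE INTO products VALUES (?, ?, ?)", rows)
conn.commit()

stored = conn.execute("SELECT asin, title FROM products").fetchall()
print(stored)  # [('B08Z5NYG12', 'Example Laptop')]
```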

Troubleshooting common issues

Even with a solid plan, you might encounter some common issues while scraping Amazon:

Empty or login pages

Instead of the content you expect, Amazon might return an empty page or redirect you to a login/verification page. This often means your request was flagged. To troubleshoot, check if your response actually contains HTML of a login page or a message like "enable JavaScript" or "allow cookies".

HTTP 503 errors (service unavailable)

A 503 status code accompanied by an Amazon error page indicates a block. Amazon uses 503 responses to throttle or block scrapers. If you get a 503, implement a backoff: pause your scraping for a few minutes, then resume more slowly or with a new IP. It's a red flag telling you that you need to reduce the load.
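Here's a sketch of that backoff logic, assuming a fetch callable that returns a (status, body) pair; the fake_fetch stub simulates two 503s before a success so the retry path is visible without hitting Amazon:

```python
import time

def get_with_backoff(fetch, url, max_retries=4, base_delay=2.0):
    """Retry fetch(url) with exponential backoff while it keeps returning 503."""
    delay = base_delay
    for _ in range(max_retries):
        status, body = fetch(url)
        if status != 503:
            return status, body
        time.sleep(delay)  # pause before retrying
        delay *= 2         # 2s, 4s, 8s, ... at the default base delay
    return status, body

# Stub that fails twice, then succeeds - stands in for a real requests.get call
calls = {"n": 0}
def fake_fetch(url):
    calls["n"] += 1
    return (503, "") if calls["n"] < 3 else (200, "<html>ok</html>")

result = get_with_backoff(fake_fetch, "https://www.amazon.com/s?k=laptop", base_delay=0.01)
print(result)  # (200, '<html>ok</html>')
```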

CAPTCHA pages

If you suddenly get a page asking for CAPTCHA input (even if the HTTP status is 200 OK), your scraper has been challenged. The HTML of such a page is usually much smaller and contains the word "captcha" or has an image with /captcha/. In this case, you either need to solve it (which might not be straightforward via code) or discard that proxy/IP and switch to a new one, and possibly slow down.
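Detecting the challenge early lets you rotate away before wasting more requests. A simple heuristic check (the sample markup below is a hand-written stand-in, not a captured Amazon page):

```python
def looks_like_captcha(html):
    """Heuristic check for an Amazon CAPTCHA interstitial page."""
    text = html.lower()
    return "captcha" in text or "type the characters" in text

blocked_page = '<form action="/errors/validateCaptcha"><img src="/captcha/abc.jpg"></form>'
normal_page = '<span id="productTitle">Example Laptop</span>'
print(looks_like_captcha(blocked_page), looks_like_captcha(normal_page))  # True False
```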

Incomplete data or missing elements

If your parser can't find certain elements (like price or title), it could be that the page format is slightly different (e.g., some products might have price in a different span class if there's a deal, or no price at all if unavailable). Always code defensively, checking for alternate selectors or conditions (for example, look for "a-price-whole" but if not found, maybe the price is in "a-size-medium a-color-price" for some deal pages).

Ban messages or frequent redirects

Amazon might temporarily ban an IP and present a "Sorry, we just need to make sure you're not a robot" message continuously. This message often persists even if you slow down, because the IP reputation is now tainted. The solution is to switch to a fresh proxy or IP.

Challenges in scraping Amazon

Scraping Amazon is more challenging than many other websites due to its robust anti-scraping mechanisms. Here are some major challenges:

  • IP blocking

Amazon will notice if one IP address is making too many requests or visiting pages too quickly. The result is usually an IP ban (either temporary or longer-term) for that address. The banned IP will start seeing nothing but CAPTCHA pages or errors from Amazon.

  • CAPTCHA challenges

Amazon employs CAPTCHA prompts to verify suspicious traffic. These could be image CAPTCHA or other puzzles. When scraping, encountering a CAPTCHA is a clear sign you've been detected as a bot. As discussed, dealing with CAPTCHA prompts is tough as it often requires external services or manual intervention.

  • Rate limiting

Amazon might not outright ban you, but could throttle responses if you hit it with too many requests. You might experience slower response times or partial data.

There's also a concept of "soft blocks", where Amazon serves you an outdated or simplified page when it suspects a scraper, to throw you off. Respecting rate limits by adding delays and not scraping too fast is crucial.

  • JavaScript and dynamic content

While much of Amazon's content is server-rendered, the website does use dynamic content for certain features (e.g., updating prices when you select a different product option, or loading more reviews on scroll).

If the data you need is loaded via JavaScript after the initial page load, a simple requests-based scraper won't see it. You might need to simulate those XHR (Ajax) requests if possible (by finding the API endpoint in the network calls) or use a headless browser to let the page fully render.

Is it legal to scrape Amazon ASIN data?

The legality of web scraping can be a gray area, but here's the breakdown for ASIN numbers:

Public data vs. private data

ASIN numbers and the basic product information on Amazon are public. If you can view it in your browser without logging in, it's public data. Scraping such public data (product titles, prices, ASINs, etc.) is generally legal in many jurisdictions.

What crosses the line is accessing data that is not public (like someone’s personal account details) or bypassing a login gate. As long as you're gathering data that any regular user could see on the site, you're on safer legal ground.

Amazon's terms of service

While scraping public data isn’t illegal, it can violate Amazon’s site terms. Amazon's terms of use explicitly prohibit using automated tools to scrape data from their site without permission.

If you violate those terms, Amazon could take action, such as banning your accounts or IPs. Typically, companies enforce terms through technical measures (blocks, legal cease-and-desist letters in extreme cases) rather than lawsuits, especially if the scraping is for personal or academic use.

It's worth noting that a famous legal case (hiQ Labs vs. LinkedIn) in the U.S. leaned in favor of allowing the scraping of public data. However, Amazon is a private platform and can choose to cut off access as it sees fit.

Staying compliant

If you want to avoid any trouble, consider using Amazon's official APIs (like the Product Advertising API), which are sanctioned ways to get product data. They come with usage policies and rate limits, but they keep you compliant. If you do scrape, do it responsibly:

  • Do not try to scrape sensitive information.
  • Do not launch an unreasonable amount of traffic to Amazon (which could be seen as a denial-of-service attack).
  • Use the data in ways that respect user privacy and Amazon's rights (for instance, scraping and republishing all of Amazon's content would likely get you in hot water).

In summary, Amazon ASIN scraping is not inherently illegal. Many businesses and researchers do it. Just remember that it does violate Amazon’s terms of service, so they are within their rights to block you technologically. As long as you stick to public data and use it ethically, the legal risk stays low.

Is ASIN the same as SKU?

No. An ASIN is an Amazon-assigned identifier (the 10-character code) for a product in Amazon's catalog. In contrast, a SKU (Stock Keeping Unit) is typically an internal code that a seller or retailer uses to track their own inventory. In other words, the ASIN code is universal across Amazon for a given product, while a SKU is often unique to a particular seller or warehouse system.

Does every product on Amazon have an ASIN?

Yes, every product listing on Amazon has an ASIN. Whenever a new product is added to Amazon’s catalog, an ASIN is generated for it. If multiple sellers are selling the same item, they will all use the same ASIN for that product. For books, the ASIN is usually the same as the ISBN-10 or ISBN-13, reflecting the book's standard number.

What are the risks of scraping Amazon without proxies?

Performing Amazon ASIN scraping without proxies means all your requests come from a single IP address. Amazon's systems will quickly notice if that one IP is making numerous rapid requests, which is not typical for a regular shopper.

The risks include your IP being temporarily or permanently blocked from accessing Amazon, receiving frequent 503 errors or being served CAPTCHA challenges, and generally failing to get the data you want. Essentially, without proxies (and related countermeasures), a scraper will have a very short lifespan before getting shut out by Amazon.
