Key takeaways:
- Scraping Google Trends only works reliably if you behave like a real user.
- Stability and consistency matter more than raw extraction speed.
- Google Trends data is contextual - you should always interpret it alongside the location, time range, and search intent.
Many tools claim to help you scrape Google Trends and pull usable search data with minimal effort. Once you try to do it, the gap between expectation and reality becomes clear. Some approaches, such as relying on a simple HTTP library like requests, fail almost immediately.
That’s made worse by the current state of the Google Trends API landscape. Google only recently introduced an official option, but access still requires approval, so the official Google Trends API is not something you can just turn on and use.
This article will show you how to scrape Google Trends using a method that actually works. You'll see why traditional scraping methods break on a site like this, and how to pull steady, reliable data while reducing soft and hard blocks.
Why scrape Google Trends?
Say what you will about AI changing how we search, but the reality is this: Google is still where the world goes to ask questions. Even in this era of LLMs (Large Language Models) blended with search, if millions of people are suddenly curious, worried, or fired up about something, that behavior shows up in Google search first.
That’s what makes Google Trends so valuable. It works as a kind of proxy for public curiosity, demand, and intent. The free tool built by Google analyzes all search queries going through the search engine - billions of queries worldwide.
Scraping Google Trends data can help you generate content ideas by exploring regional interests and seeing which areas are searching for specific topics the most. You won’t get hard numbers like, “X people searched for ‘IShowSpeed’ this month.” Instead, you get relative interest scores, where 100 is the peak popularity for a term in the time and location you selected. Everything else is scaled from there.
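Google doesn’t publish raw counts or its exact normalization, but the scaling idea behind those relative scores can be sketched in a few lines (illustrative only; these are not real Trends numbers):

```python
def to_relative_interest(raw_counts):
    """Scale a series so its peak becomes 100, Google-Trends style."""
    peak = max(raw_counts)
    return [round(100 * c / peak) for c in raw_counts]

# Hypothetical weekly counts for one term
print(to_relative_interest([20, 50, 40]))  # [40, 100, 80]
```

The absolute numbers disappear; only the shape of interest over time survives, which is why two terms can only be compared when queried together.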
If you're trying to track trending topics, uncover shifts in search keywords, or analyze public interest tied to a brand, event, or cultural moment, data from Google Trends is one of the few places where that signal actually exists. You can export the scraped data as CSV, JSON, or XML.
So now let’s talk about how this tool can help you and your business:
Market trend analysis
Take a business that sells mosquito repellent spray. Demand for this kind of product doesn’t stay steady year-round because it’s tied to weather, temperature, and mosquito activity. People simply don’t search for it in the same way during colder months.
This is where Google Trends becomes useful. You might see demand start picking up around April in places like Florida, while in other states it doesn’t rise until May. That kind of regional difference becomes clear when you track keywords over time.
By scraping Google Trends data, you can get valuable insights into keyword popularity, consumer behavior, and market trends. This allows you to plan marketing efforts around when demand actually starts to rise in each location, instead of guessing or reacting too late based on past sales alone.
And when you combine the scraped Trends data with other market research tools and methods, you get a more complete picture of consumer behavior, allowing your business to make informed marketing decisions.
Content strategy planning
Scraping Google Trends can also be extremely useful if you publish content online, whether that’s a blog, a media site, or an SEO team supporting clients.
At the end of the day, you want people to read what you publish, and that usually means staying close to what they are already searching for. Google Trends is one of the cleanest ways to check that.
Even better, you can automate this process with a custom scraper, track what’s rising, and plan topics around where public interest is moving instead of guessing.
Brand popularity tracking
Google Trends can also help you see how your brand is performing in search vs. competitors. If you just ran a campaign across TV, YouTube, or billboards, it can show if people actually started searching for your brand during that campaign window.
It’s not a perfect measure of sales, but it’s a clean way to track awareness. You can watch for keywords tied to your brand name, product name, or tagline and compare them with other brands in the same space.
If you choose to scrape Google Trends, you can automate the monitoring and catch shifts in public interest as they happen instead of checking manually and missing the spike.
How to scrape Google Trends
So, how do you scrape this website?
There are a few ways to do it with different tools and techniques. Each option has its own upsides and downsides, depending on how often you need search data, how many terms you’re tracking, and how much reliability you need.
- Manual use through the Google Trends interface
This is the baseline. You open Google Trends, type your query, adjust the region and time range, then download a CSV file using the 'Download' button on the charts.
It works well for one-off checks and quick snapshots of search trends. It’s not great when you need consistency over time, which is why it rarely scales as a business process.
- Custom scraper
If you have the skills, you can build your own scraper in Python. You can also start with generated boilerplate code and then refine it.
The main upside is control: you decide what to track and how you want to scrape data from the tool. The downside is that Google’s bot detection is strong, so your scraper has to be built with that reality in mind. If you get it working and keep it stable, you can collect a lot of search data on your own terms.
- Unofficial or reverse-engineered APIs
Developers often use reverse engineering techniques to find the private API endpoints used by Google Trends. This is how unofficial libraries like Pytrends came to be. It sounds like a smart shortcut, and sometimes it is.
The problem is stability. These endpoints are not public or guaranteed, so they can change without warning. If a library is unmaintained, as Pytrends was after its GitHub repository was archived, you end up hitting walls even when your logic is fine.
- Third-party scraping services and hosted scrapers
You can also use commercial services that provide Google Trends data through an API. These tools usually handle proxy rotation, automation, and parsing, so you don’t have to.
You pay for that convenience, and it can be worth it if you care more about speed and consistency than full customization.
- Official Google Trends API
Released in July 2025, this is the cleanest option in theory because it’s the official API. In practice, access is limited because you need approval to use it.
So the Google Trends API is not something most people can just turn on today. It's best suited for developers and researchers who require stable, production-ready data without the risks associated with scraping.
We’re going to show you how to build a custom scraper using Python, the most popular language for web scraping due to its simplicity and robust ecosystem of libraries.
Building a custom Google Trends web scraper with Python
Google Trends isn’t a typical website, and if you don't take the time to understand how it works, your custom scraper setup will come crashing down after just a few test runs (or it won't work at all).
Here is why scraping Google Trends keywords requires a different approach:
- The data just doesn't exist in a form you can easily grab until you've started a search and the results have loaded. That’s why this is a web scraping job, not web crawling. Crawlers are built to discover links and index pages, not extract information from a stateful, UI-driven app. If you want the sharper distinction, see our breakdown of web crawling vs. web scraping.
- The site uses UI-driven events. The charts and tables pop up after you interact with them, not after you load the whole page.
- If there's not enough search data for a particular location and keyword, you won't see a regional breakdown.
- And then there's the issue of Google detecting automation. Expect to see some frustrating roadblocks, like CAPTCHAs, partial page loads, and the like.
- Finally, this is a bit of a web app. The page is constantly rebuilding itself, which means that fixed selectors won't cut it in the long term.
So all of this means you need to think about your Google Trends scraper in a different way. If you want to collect data you can rely on, you've got to make it behave more like a real user, clicking through the site and waiting for the data to load.
Here is how it will work:
- Step 1: Create a user session with a genuine user agent, a matching timezone, language settings, a viewport size, and geolocation that all match up with your proxy.
- Step 2: Sort out your cookies early so the session doesn't look like it's just been made or is about to be discarded.
- Step 3: Follow the natural path a real user would take on the site. To warm up the session, we typically visit a few random sites, then head to Google, run a normal search, scan the results, and then click through to Google Trends the same way a person would.
- Step 4: On Google Trends, the scraper interacts like a real person. It enters the keyword, selects the correct location and timeframe, and makes requests at a measured pace using smart backoff.
- Step 5: Be prepared for things to go wrong and deal with them neatly. The code checks whether the charts are actually loaded. When Google pushes back, it waits a bit, switches over to a new identity, and then tries again. The only human intervention required is if it hits a CAPTCHA - that's the only bit that needs doing manually.
- Step 6: The final step is extraction. It pulls data that's visible on the page using text parsing and pattern matching. We're not hitting hidden endpoints or leaning on a Google Trends API. We rely on old-school web scraping.
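The steps above boil down to one retry loop. Here is a hedged skeleton of that flow, not the final scraper: `fetch` and `new_identity` are hypothetical stand-ins for the browser logic we build in the sections below.

```python
import random
import time

class Blocked(Exception):
    """Raised when Google pushes back: CAPTCHA, partial load, and so on."""

def run_scrape(fetch, new_identity, backoff_steps=(10, 20, 30, 60, 120),
               max_retries=5, sleep=time.sleep):
    """Try to fetch; on a block, wait with jittered backoff, rotate
    identity, and retry (steps 4 and 5 above)."""
    for attempt in range(max_retries):
        try:
            return fetch()
        except Blocked:
            wait = backoff_steps[min(attempt, len(backoff_steps) - 1)]
            sleep(wait + random.uniform(0, 2))  # smart backoff with jitter
            new_identity()  # fresh proxy/profile before the next try
    raise RuntimeError("Still blocked after all retries")
```

Everything else in this build, from session warm-up to parsing, hangs off a loop shaped like this one.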
Install the prerequisites
Let's get started with the system-level requirements before we dive into building this scraper. If any of these are missing, you'll probably hit some avoidable errors later on.
- Google Chrome
Head over to the official Chrome website and download the latest version of Chrome - not Chromium - because undetected-chromedriver drives a real Chrome installation.
- Python
Go to python.org to download the latest Python version. For us, it's Python 3.11.x. Tick the 'Add Python to PATH' box, then customize the installation and make sure pip is enabled. This will let you install the packages we use to get the data we need.
- Microsoft Visual C++
Install the Microsoft Visual C++ 2015–2022 Redistributable. Some Python wheels we're going to use for this scraper depend on it, and skipping this step can lead to installs failing even if your code is rock solid.
Python environment setup
Now we need to set up a virtual environment. Scraping projects accumulate dependencies quickly, and isolating them early helps avoid conflicts when you start scraping Google Trends more aggressively.
Open your Windows Command Prompt, then run this command:
python -m venv venv && venv\Scripts\activate
That creates a virtual environment and activates it in one line. From here on, everything you install will be contained, which makes working with Google Trends data much easier to manage.
Now install everything we need in one line:
pip install --upgrade pip && pip install selenium undetected-chromedriver requests pandas numpy pillow typing-extensions
Here’s what each package does:
- Selenium drives the browser so the scraper can interact with Google Trends like a user would
- Undetected-chromedriver launches Chrome in a way that reduces automation fingerprints and helps avoid early blocks
- Requests is used for proxy checks and lightweight HTTP calls alongside browser automation
- Pandas handles Google Trends data once it’s collected, including structuring and exporting
- Numpy supports simple numerical checks when working with screenshots
- Pillow processes screenshots so we can confirm pages rendered correctly before we scrape data
- Typing-extensions keeps type hints compatible across Python versions
You’ll also need a clean set of residential proxies. We use London-based proxies for this setup. Make sure to whitelist your IP address with your proxy provider so you don’t have to deal with errors associated with username/password authentication.
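Before building anything, it’s worth confirming that traffic actually exits through the proxy. A minimal sketch using the requests package we just installed - the hostname and port below are placeholders for your provider’s endpoint, and with IP whitelisting no credentials go in the URL:

```python
def build_proxy_config(host: str, port: int) -> dict:
    """Proxy mapping in the format the requests library expects."""
    url = f"http://{host}:{port}"
    return {"http": url, "https": url}

def proxy_exit_ip(proxies: dict, timeout: int = 10) -> str:
    """Return the public IP seen through the proxy (makes a network call)."""
    import requests  # installed earlier via pip
    return requests.get("https://api.ipify.org", proxies=proxies, timeout=timeout).text

# Usage (network call, so run it manually):
#   proxies = build_proxy_config("proxy.example.com", 44443)
#   print(proxy_exit_ip(proxies))  # should print the proxy's IP, not yours
```

If the printed IP is your own rather than the proxy’s, fix the whitelist before going any further.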
Create your Python scraper file
Stay in Command Prompt for this part. You need an actual script file sitting in a folder you can find later. We’ll use your Desktop for now, so everything stays obvious.
Run this one-liner in CMD:
cd %USERPROFILE%\Desktop && type nul > google_trends_scraper.py
The %USERPROFILE% environment variable expands to your user folder automatically, so you can run the command exactly as written.
That command switches you into your Desktop folder and creates a blank google_trends_scraper.py file.
Write your Google Trends scraper code
Once the file is created, open your IDE and load that file, then we can start coding the scraper.
Step 1: Write your import and dependencies
With your Python file open, the first step is to write all the imports and dependencies your scraper will use:
import time
import zipfile
import tempfile
import os
import math
import json
import random
import re
import logging
import requests
import pandas as pd
from PIL import Image
import numpy as np
import io
from typing import Dict, Any
from urllib.parse import unquote, quote
from datetime import datetime
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import undetected_chromedriver as uc
Step 2: Set up logging configuration
Next, we want our scraper to report everything it's doing as it runs. That makes debugging easier when things break (and they almost always do).
Add this block below your imports:
# === Setup Logging ===
logging.basicConfig(
level=logging.INFO,
format='[%(asctime)s] %(message)s',
datefmt='%H:%M:%S'
)
Step 3: Set up global runtime controls
Now, instead of hard-coding timing, retries, and interaction behavior throughout our very long script, let's define them here and let the rest of the code refer back to them when needed.
Add this block:
# === Globals ===
HEADLESS = False
MAX_RETRIES = 5
GOOGLE_BACKOFF_STEPS = [10, 20, 30, 60, 120] # Increased backoff times
HUMAN_TYPING_SPEED = (0.08, 0.3)
RANDOM_MOUSE_MOVEMENTS = True
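As an example of how these globals get consumed later: HUMAN_TYPING_SPEED is a per-keystroke delay range, drawn once per character when the scraper types a query. A minimal illustration (the real send-keys loop comes in a later step):

```python
import random

HUMAN_TYPING_SPEED = (0.08, 0.3)  # seconds per keystroke, as defined above

def human_delays(text: str) -> list:
    """One randomized inter-key pause per character, within the configured range."""
    return [random.uniform(*HUMAN_TYPING_SPEED) for _ in text]

delays = human_delays("windsor castle")
print(len(delays))  # one delay per character: 14
```

Varying every pause, rather than sleeping a fixed interval between keystrokes, is one of the cheapest ways to avoid a machine-like typing cadence.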
Step 4: Define the targets
Next, we need to define the location and keyword for what we want to scrape from Google Trends. For location, we first define the machine-facing location (GEO) and the human-facing version of the same location, then we define the keyword.
Because we’re using London proxies, we keep the target location set to London for consistency. You can swap the keyword anytime. If you want to change the location, do it after you’ve finished the full build so you can update every function that depends on the location settings.
# === Trends Target ===
GEO = "GB-LON" # London, UK
GEO_HUMAN_NAME = "London, United Kingdom"
KEYWORD = "windsor castle"
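For reference, this target maps to a Google Trends explore URL like the one below. Our scraper reaches the page by clicking through like a user rather than loading this URL cold, but it’s a quick way to verify your GEO and KEYWORD values by hand in a browser:

```python
from urllib.parse import quote

GEO = "GB-LON"
KEYWORD = "windsor castle"

def trends_explore_url(keyword: str, geo: str) -> str:
    """Build the public Google Trends explore URL for a keyword and region."""
    return f"https://trends.google.com/trends/explore?geo={geo}&q={quote(keyword)}"

print(trends_explore_url(KEYWORD, GEO))
# https://trends.google.com/trends/explore?geo=GB-LON&q=windsor%20castle
```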
Step 5: Write your Google Trends parser class
You made it past the basics. Now it's time to get your hands dirty with the messy part of scraping Google Trends: taking the actual page that loads and turning it into data you can reuse.
What you need here is a parser that can take the raw page text and make sense of it, turning it into something you can actually work with - like search terms, locations, and timelines.
Here are the functions the parser handles:
- A function to extract all the useful bits from the raw page text - search terms, locations, timelines, and regional data
- A function to get the page text into a state where it's more consistent, so it's easier to search and work with
- A function to check that the search was successful and the page actually loaded some real results
- A function to figure out what location Google Trends is using
- A function to pull out the timeline values from the "Interest over time" chart
- A function to extract the "Interest by subregion" data when it's available
Each function handles a specific task. By keeping them together in one class, you can mess around with your parsing logic without having to touch the rest of your web scraper.
Add this parser class right beneath your targets:
class GoogleTrendsParser:
"""Parse REAL Google Trends data from actual page content"""
def parse(self, text_content: str) -> Dict[str, Any]:
"""Parse actual Google Trends page content (not hypothetical patterns)"""
try:
result = {
'search_term': KEYWORD,
'location': GEO_HUMAN_NAME,
'url': '',
'interest_over_time': {'y_axis': [], 'x_axis': []},
'interest_by_subregion': {}
}
lines = self._clean_text(text_content)
# 1. Extract what's ACTUALLY on the page
result['search_term'] = self._extract_actual_search_term(lines)
result['location'] = self._extract_actual_location(lines)
# 2. Extract chart data from page
chart_data = self._extract_chart_data(lines)
if chart_data:
result['interest_over_time'] = chart_data
# 3. Extract regional data
regional_data = self._extract_regional_data(text_content)
if regional_data:
result['interest_by_subregion'] = regional_data
else:
logging.info(" No regional data found on page - returning empty dict")
logging.info(f" Parser found: '{result['search_term']}' in {result['location']}")
if result['interest_by_subregion']:
logging.info(f" Found {len(result['interest_by_subregion'])} regions")
return result
except Exception as e:
logging.error(f"Parser error: {e}")
# Return minimal structure
return {
'search_term': KEYWORD,
'location': GEO_HUMAN_NAME,
'interest_over_time': {'y_axis': [], 'x_axis': []},
'interest_by_subregion': {}
}
def _clean_text(self, text: str) -> list:
"""Clean and split text."""
return [line.strip() for line in text.split('\n') if line.strip()]
def _extract_actual_search_term(self, lines: list) -> str:
"""
Extract the search term that's ACTUALLY displayed on Google Trends.
Not looking for "Search term:" which doesn't exist.
"""
# Google Trends shows the term prominently at the top
for i, line in enumerate(lines[:10]): # Check first 10 lines
if line and len(line) < 100: # Reasonable length for a search term
# Exclude common UI elements
exclude_words = ['Explore', 'Compare', 'Trends', 'Google', 'Interest',
'Location', 'Time', 'Download', 'Share', 'Embed', 'Home']
if not any(word in line for word in exclude_words):
# Likely the search term
return line.strip()
return KEYWORD # Fallback
def _extract_actual_location(self, lines: list) -> str:
"""Extract location from page - SPECIFICALLY FOR UK/LONDON"""
for line in lines:
line_lower = line.lower()
# Check for UK indicators first
if "london" in line_lower or "gb-lon" in line_lower:
return "London, UK"
elif "united kingdom" in line_lower or "uk" in line_lower:
return "United Kingdom"
elif "england" in line_lower:
return "England"
# Check for location dropdown text
elif "location" in line_lower:
idx = lines.index(line)
# Check next few lines for location
for i in range(idx + 1, min(idx + 4, len(lines))):
next_line = lines[i].lower()
if any(indicator in next_line for indicator in
['london', 'united kingdom', 'uk', 'england']):
return lines[i].strip()
# If not found, check the URL
return "United Kingdom"
def _extract_chart_data(self, lines: list) -> Dict[str, Any]:
"""
Extract chart data from page.
Google Trends shows: months and values like "100", "75", "50", etc.
"""
result = {'y_axis': [], 'x_axis': []}
# Look for month names (chart x-axis)
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
month_years_found = []
for line in lines:
for month in months:
if month in line:
# Extract potential date pattern
date_match = re.search(rf'({month})\s*(\d{{4}})?', line)
if date_match:
date_str = date_match.group(0)
if date_str not in month_years_found:
month_years_found.append(date_str)
result['x_axis'] = month_years_found[:12] # Up to 12 months
# Look for chart values (0-100 range)
chart_values = []
for line in lines:
# Find numbers in 0-100 range (common for Google Trends)
numbers = re.findall(r'\b(\d{1,3})\b', line)
for num in numbers:
if num.isdigit():
value = int(num)
if 0 <= value <= 100:
if value not in chart_values: # Avoid duplicates
chart_values.append(value)
# Sort and limit
chart_values.sort()
result['y_axis'] = chart_values[:20] # Up to 20 values
return result
def _extract_regional_data(self, text_content: str) -> Dict[str, int]:
"""
Extract regional interest data from Google Trends page.
Only extracts what's actually on the page - no fake data.
"""
result = {}
# UK cities/regions to look for (expanded list)
uk_regions = [
"London", "Birmingham", "Manchester", "Glasgow", "Liverpool",
"Leeds", "Sheffield", "Bristol", "Edinburgh", "Cardiff",
"Newcastle", "Leicester", "Coventry", "Nottingham", "Southampton",
"Belfast", "Aberdeen", "Cambridge", "Oxford", "Brighton",
"Portsmouth", "Plymouth", "Swansea", "York", "Bath",
"Wales", "Scotland", "Northern Ireland", "England"
]
# Split text into lines for analysis
lines = [line.strip() for line in text_content.split('\n') if line.strip()]
# Method 1: Look for "Interest by subregion" or "Interest by region" section
interest_section_start = -1
for i, line in enumerate(lines):
line_lower = line.lower()
if ('interest by subregion' in line_lower or
'interest by region' in line_lower or
'interest by city' in line_lower):
interest_section_start = i
logging.info(" Found regional interest section")
break
# If we found the section, extract data from it
if interest_section_start != -1:
# Look for regional data in the next 50 lines after the section header
section_end = min(interest_section_start + 50, len(lines))
for j in range(interest_section_start + 1, section_end):
current_line = lines[j]
current_line_lower = current_line.lower()
# Skip empty lines or lines that are too long (probably not region data)
if not current_line or len(current_line) > 50:
continue
# Check if this line contains a UK region
for region in uk_regions:
if region.lower() in current_line_lower:
# Look for a number in this line
numbers_in_line = re.findall(r'\b(\d{1,3})\b', current_line)
# If no number in this line, check the next line
if not numbers_in_line and j + 1 < len(lines):
next_line = lines[j + 1]
numbers_in_line = re.findall(r'\b(\d{1,3})\b', next_line)
# Process found numbers
for num in numbers_in_line:
if num.isdigit():
value = int(num)
# Validate it's a reasonable interest value (0-100)
if 0 <= value <= 100:
# Don't overwrite if we already have this region
if region not in result:
result[region] = value
logging.info(f" Found {region}: {value}")
break
# Method 2: If no structured section found, scan for region+number patterns
if not result:
logging.info(" Scanning for regional data patterns...")
for i, line in enumerate(lines):
if len(line) > 50: # Skip long lines (not likely region data)
continue
for region in uk_regions:
if region.lower() in line.lower():
# Extract numbers from this line
numbers = re.findall(r'\b(\d{1,3})\b', line)
valid_numbers = [int(n) for n in numbers if n.isdigit() and 0 <= int(n) <= 100]
if valid_numbers:
# Use the largest valid number
best_value = max(valid_numbers)
if region not in result: # Don't overwrite
result[region] = best_value
logging.info(f" Found {region}: {best_value}")
# Method 3: Check for specific regional data in the Related topics/queries section
# Sometimes regional data appears near related topics
if not result:
related_section_found = False
for i, line in enumerate(lines):
if 'related topics' in line.lower() or 'related queries' in line.lower():
related_section_found = True
# Check lines before this section for regional data
for j in range(max(0, i - 10), i):
check_line = lines[j]
for region in uk_regions:
if region.lower() in check_line.lower():
numbers = re.findall(r'\b(\d{1,3})\b', check_line)
valid_numbers = [int(n) for n in numbers if n.isdigit() and 0 <= int(n) <= 100]
if valid_numbers and region not in result:
result[region] = max(valid_numbers)
break
# Sort by value (descending) and limit to top regions
if result:
sorted_result = dict(sorted(result.items(), key=lambda x: x[1], reverse=True))
# Take top regions (max 15)
top_count = min(15, len(sorted_result))
return dict(list(sorted_result.items())[:top_count])
return result
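Both the chart and the regional extractors above lean on the same trick: pull 1-3 digit numbers out of a line of page text and keep only those in the 0-100 interest range. You can sanity-check that filter in isolation:

```python
import re

def interest_values(line: str) -> list:
    """Extract candidate 0-100 interest scores from a line of page text,
    using the same regex as the parser class above."""
    return [int(n) for n in re.findall(r'\b(\d{1,3})\b', line) if 0 <= int(n) <= 100]

print(interest_values("Birmingham 78"))   # [78]
print(interest_values("London 100 250"))  # [100] - 250 is out of range
```

Running snippets like this against the saved debug page text is the fastest way to tune the parser when Google changes its layout.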
Step 6: Define functions to handle cookies
Google will occasionally throw a cookie consent screen because each scrape runs in an isolated session. If you don't handle it, the scraper can stall because it doesn’t know how to make the screen go away.
The functions we're about to define will handle this in two different ways. First, they'll actively look for the cookie consent screen and click on the appropriate buttons to get rid of it.
Second, they'll pre-emptively inject Google consent cookies as soon as we know we're dealing with a Google domain, which hopefully keeps the consent screen from popping up in the first place.
Add this code to your script:
def handle_cookie_consent(driver, timeout=6):
"""
Handle Google cookie/consent UI if present.
Returns True if clicked something, False otherwise.
"""
logging.info("Checking for cookie consent dialog...")
def try_click(elem):
try:
driver.execute_script("arguments[0].scrollIntoView({block:'center'});", elem)
except:
pass
try:
elem.click()
return True
except:
try:
driver.execute_script("arguments[0].click();", elem)
return True
except:
return False
# Common Google button ids that sometimes exist
id_candidates = ["L2AGLb", "W0wltc"] # L2AGLb often = accept, W0wltc often = reject (varies)
for bid in id_candidates:
try:
elem = WebDriverWait(driver, 1).until(EC.presence_of_element_located((By.ID, bid)))
if elem.is_displayed() and elem.is_enabled():
logging.info(f" Found consent button by id: {bid} text='{elem.text.strip()}'")
if try_click(elem):
time.sleep(1.5)
return True
except:
pass
# XPath candidates (robust text matching)
xpath_candidates = [
"//button[contains(., 'Reject all')]",
"//button[contains(., 'Accept all')]",
"//button[contains(., 'I agree')]",
"//div[@role='dialog']//button[contains(., 'Reject')]",
"//div[@role='dialog']//button[contains(., 'Accept')]",
"//form//button[contains(., 'Reject')]",
"//form//button[contains(., 'Accept')]",
]
# CSS candidates
css_candidates = [
"div[role='dialog'] button",
"form button",
"button[aria-label*='Accept']",
"button[aria-label*='Reject']",
]
# 1) Try on main document
end = time.time() + timeout
while time.time() < end:
for xp in xpath_candidates:
try:
elems = driver.find_elements(By.XPATH, xp)
for e in elems:
if e.is_displayed() and e.is_enabled():
txt = (e.text or "").strip()
if txt:
logging.info(f" Found consent button (xpath): '{txt}'")
if try_click(e):
time.sleep(1.5)
return True
except:
pass
for cs in css_candidates:
try:
elems = driver.find_elements(By.CSS_SELECTOR, cs)
for e in elems:
if e.is_displayed() and e.is_enabled():
txt = (e.text or "").strip().lower()
if any(k in txt for k in ["accept", "reject", "agree"]):
logging.info(f" Found consent button (css): '{e.text.strip()}'")
if try_click(e):
time.sleep(1.5)
return True
except:
pass
time.sleep(0.25)
# 2) Try inside consent iframes
try:
frames = driver.find_elements(By.CSS_SELECTOR, "iframe[src*='consent'], iframe[name*='consent'], iframe[title*='consent']")
for fr in frames:
try:
driver.switch_to.frame(fr)
for xp in xpath_candidates:
elems = driver.find_elements(By.XPATH, xp)
for e in elems:
if e.is_displayed() and e.is_enabled():
logging.info(f" Found consent button in iframe: '{(e.text or '').strip()}'")
if try_click(e):
driver.switch_to.default_content()
time.sleep(1.5)
return True
driver.switch_to.default_content()
except:
try:
driver.switch_to.default_content()
except:
pass
except:
pass
logging.info("No cookie consent dialog found")
return False
def set_google_consent_cookies(driver):
"""
Set Google consent cookies.
Call this AFTER navigating to https://www.google.com (or /ncr).
"""
try:
logging.info("Setting Google consent cookies...")
# Only do this if we are on a google domain
if "google." not in (driver.current_url or ""):
logging.warning(f" Not on a Google domain ({driver.current_url}). Skipping cookie injection.")
return False
consent_cookie = {
"name": "CONSENT",
"value": "YES+GB.en-GB+V9+BX",
"domain": ".google.com",
"path": "/",
"secure": True,
"httpOnly": False,
}
socs_cookie = {
"name": "SOCS",
"value": "CAISHAgCEhJnd3NfMjAyNTAxMjEtMF9SQzIaAmVuIAEaBgiA_LCgBg",
"domain": ".google.com",
"path": "/",
"secure": True,
"httpOnly": False,
}
driver.add_cookie(consent_cookie)
driver.add_cookie(socs_cookie)
logging.info("Injected CONSENT and SOCS cookies")
return True
except Exception as e:
logging.error(f" Failed to set consent cookies: {e}")
return False
Step 7: Page-level data validation
At this point, we're not navigating the page or trying to get around blocks just yet. Instead, we're laying the groundwork that will make the scraper far more robust later on.
Our main aim here is to check out what the scraper is picking up immediately after it reaches a page - no matter how it got there. We'll get into the nitty-gritty of how the browser got there in later steps, but for now, it's all about observing and making sure the page is what we think it is.
This function acts as a middleman between the browser and the parser, essentially putting the page on ice at the exact moment we're looking at it and sending the raw info on to the next step in the logic chain.
In detail, this function:
- Screenshots the page
- Notes where the data came from, so we can go back and check if things go wrong
- Stores some evidence to help with debugging and verification when things don't quite work as planned
- Hands everything over to the parser so it can start making sense of it all
def extract_trends_data_from_current_page(driver) -> Dict[str, Any]:
"""
    Extract Google Trends data from the current page.
    """
    logging.info("Extracting data from Google Trends page...")
    try:
        # Get the visible page text
        body = driver.find_element(By.TAG_NAME, "body")
        page_text = body.text

        # Also grab the URL for context
        current_url = driver.current_url
        logging.info(f"Current URL: {current_url}")
        logging.info(f"Page text length: {len(page_text)} characters")

        # Debug: save a snippet of the page text for analysis
        with open("debug_page_text.txt", "w", encoding="utf-8") as f:
            f.write(f"URL: {current_url}\n")
            f.write(f"Time: {datetime.now()}\n")
            f.write("-" * 80 + "\n")
            f.write(page_text[:2000])  # First 2,000 characters

        # Parse the page text
        parser = GoogleTrendsParser()
        parsed_data = parser.parse(page_text)

        # Always record the URL alongside the data
        parsed_data['url'] = current_url

        # Log what we got
        logging.info(f"Extracted data for: '{parsed_data.get('search_term', 'Unknown')}'")
        logging.info(f"Location: {parsed_data.get('location', 'Unknown')}")
        regions = parsed_data.get('interest_by_subregion', {})
        if regions:
            logging.info(f"Regional data ({len(regions)} regions):")
            for region, value in regions.items():
                logging.info(f"  • {region}: {value}")
        else:
            logging.info("No regional data found")

        # Take a screenshot for visual verification
        timestamp = datetime.now().strftime('%H%M%S')
        screenshot_name = f"extraction_{timestamp}.png"
        driver.save_screenshot(screenshot_name)
        logging.info(f"📸 Saved screenshot: {screenshot_name}")
        return parsed_data
    except Exception as e:
        logging.error(f"Failed to extract page data: {e}")
        # Save emergency debug artifacts
        try:
            timestamp = datetime.now().strftime('%H%M%S')
            driver.save_screenshot(f"error_extraction_{timestamp}.png")
            with open(f"error_page_{timestamp}.txt", "w", encoding="utf-8") as f:
                f.write(f"Error: {e}\n")
                f.write(f"URL: {driver.current_url}\n")
                f.write(driver.page_source[:5000])
        except Exception:
            pass
        # The driver itself may be unusable here, so fetch the URL defensively
        try:
            fallback_url = driver.current_url
        except Exception:
            fallback_url = ''
        return {
            'search_term': KEYWORD,
            'location': GEO_HUMAN_NAME,
            'url': fallback_url,
            'interest_over_time': {'y_axis': [], 'x_axis': []},
            'interest_by_subregion': {}
        }
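Downstream of this function, everything works against the dictionary shape shown in the fallback return value above. As a quick illustration of consuming that shape, here's a small (hypothetical) helper that ranks the regional interest scores — the `top_regions` name and the sample data are ours, not part of the scraper:

```python
# Hypothetical downstream helper: rank the regional interest scores returned
# by the extraction function (dict shape assumed from the fallback dict above).
def top_regions(parsed_data, n=3):
    regions = parsed_data.get('interest_by_subregion', {})
    # Sort regions by relative interest, highest first
    return sorted(regions.items(), key=lambda kv: kv[1], reverse=True)[:n]

sample = {
    'search_term': 'coffee',
    'interest_by_subregion': {'Hackney': 100, 'Camden': 87, 'Ealing': 64, 'Brent': 41},
}
print(top_regions(sample, n=2))  # [('Hackney', 100), ('Camden', 87)]
```

Remember these are relative scores scaled to the peak (100), not absolute search counts.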
Step 8: Give your Google Trends scraper a distinct identity and behavior
Next, we want our scraper to behave like a real, persistent human user rather than some disposable robot. This section is all about making the browser look like it's coming from a real place, with a distinct identity and a bit of personality. We achieve this in three key ways:
- We start with proxies and identity profiles, which essentially give our browser a consistent home base and a unique way of identifying itself to the websites it’ll visit. This all helps to establish a consistent network presence and device fingerprint - just like a real person would have.
- Then there's cookie management. It keeps all the session history together so that repeat visits look natural, like you're coming back to a familiar place.
- And last but not least, smart backoff kicks in with some realistic timing and hesitation - ditching those rigid sleep times for something that feels more like a human being thinking before they do anything.
Add this code block to your script:
# === London Proxy Pool ===
PROXIES = [
    "ultra.marsproxies.com:44443",
]

# === London Identity Pool ===
IDENTITY_POOL = [
    # 1) Windows + Chrome (desktop)
    {
        "timezone": "Europe/London",
        "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36",
        "viewport": (1366, 768),
        "platform": "Win32",
        "hardware_concurrency": 8,
        "max_touch_points": 0,
        "language": "en-GB,en;q=0.9",
        # Extra fingerprint fields for better stealth:
        "webgl_vendor": "Google Inc. (Intel)",  # Common for Chrome on Intel PCs
        "webgl_renderer": "ANGLE (Intel, Intel(R) UHD Graphics...)",  # Example value
        "device_memory": 8,  # In GB
        "do_not_track": 0,  # 0 = DNT not enabled (the common case)
        "canvas_hash": "needs_calculated_value",  # You'll need to generate this
        "audio_hash": "needs_calculated_value",  # You'll need to generate this
    },
    # 2) macOS + Safari (desktop)
    {
        "timezone": "Europe/London",
        "user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.1 Safari/605.1.15",
        "viewport": (1440, 900),
        "platform": "MacIntel",
        "hardware_concurrency": 4,  # Consider 8 for Apple Silicon
        "max_touch_points": 0,
        "language": "en-GB,en;q=0.9",
        # Extra fingerprint fields for better stealth:
        "webgl_vendor": "Apple Inc.",
        "webgl_renderer": "Apple M1 Pro",  # Match your hardware choice
        "device_memory": 8,  # In GB
        "do_not_track": 1,  # Safari users more commonly have DNT enabled
        "canvas_hash": "needs_calculated_value",
        "audio_hash": "needs_calculated_value",
    },
]
# === Cookie Management ===
def save_cookies(driver, filename="cookies.json"):
    """Save the session's cookies to a JSON file."""
    try:
        cookies = driver.get_cookies()
        with open(filename, 'w') as f:
            json.dump(cookies, f, indent=2)
        logging.info(f"Saved cookies to {filename}")
        return True
    except Exception as e:
        logging.error(f"Failed to save cookies: {e}")
        return False

def load_cookies(driver, filename="cookies.json"):
    """Load cookies from a JSON file into the current session."""
    try:
        with open(filename, 'r') as f:
            cookies = json.load(f)
        # Cookies can only be added once we're on the right domain
        driver.get("https://trends.google.com")
        time.sleep(2)
        for cookie in cookies:
            try:
                driver.add_cookie(cookie)
            except Exception:
                pass
        logging.info(f"Loaded {len(cookies)} cookies from {filename}")
        return True
    except Exception as e:
        logging.debug(f"Could not load cookies: {e}")
        return False
# === Smart Backoff Logic ===
def smart_backoff(min_wait=1, max_wait=5, reason="", driver=None):
    """
    Randomized, human-like waiting with jitter and occasional longer pauses.
    """
    # Sometimes wait much longer, like a distracted user
    if random.random() < 0.15:  # 15% chance of a longer wait
        min_wait = min_wait * 1.5
        max_wait = max_wait * 2
    wait_time = random.uniform(min_wait, max_wait)
    if reason:
        logging.info(f"{reason} (waiting {wait_time:.1f}s)")
    # Add micro-variations during longer waits
    if wait_time > 1.5 and driver and random.random() < 0.4:
        # Break long waits into smaller chunks with micro-movements
        chunks = max(2, int(wait_time / 0.7))
        chunk_time = wait_time / chunks
        for i in range(chunks):
            time.sleep(chunk_time)
            # Occasionally trigger tiny mouse movements mid-wait
            if random.random() < 0.3:
                try:
                    driver.execute_script("""
                        var evt = new MouseEvent('mousemove', {
                            clientX: Math.random() * window.innerWidth,
                            clientY: Math.random() * window.innerHeight,
                            bubbles: true
                        });
                        document.dispatchEvent(evt);
                    """)
                except Exception:
                    pass
    else:
        time.sleep(wait_time)
    return wait_time
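Before wiring `smart_backoff` into a live browser session, it's worth sanity-checking the wait ranges it can produce. Here's a sleep-free sketch of just the jitter logic (the `pick_wait` name is ours, purely for illustration):

```python
import random

# A sleep-free sketch of the jitter logic in smart_backoff(), useful for
# checking the possible wait ranges without actually sleeping.
def pick_wait(min_wait=1, max_wait=5, rng=random):
    # ~15% of the time, stretch the window to mimic a distracted user
    if rng.random() < 0.15:
        min_wait *= 1.5
        max_wait *= 2
    return rng.uniform(min_wait, max_wait)

waits = [pick_wait(1, 5) for _ in range(1000)]
# Normal waits fall in [1, 5]; stretched ones can reach 5 * 2 = 10 seconds
assert all(1 <= w <= 10 for w in waits)
```

The takeaway: with the defaults, a single call can block for up to 10 seconds, so budget for that when estimating total run time.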
Step 9: Add a dash of human touch to your interactions
To get our scraper to interact with the page like a real person, we need to think about how we'd do it ourselves. That means we have to type carefully into input fields, scroll elements into view when we need them, and make sure what we typed actually landed how we intended it to.
At this stage, we need to introduce a utility function called human_type_with_smart_correction, and a helper positioning function called scroll_element_into_view, both of which make this whole process a whole lot more reliable.
Together, these helpers let our scraper scroll to the bit it needs to type in, clear out anything that's already there, type with small pauses and a few typos along the way, then check if the final input matches what we intended. If it doesn’t, the function corrects it before moving on.
Anytime our scraper needs to type something into the browser - whether it's a Google search, a Trends keyword, or a location field - it's going to use these helpers to make sure the whole interaction looks like a human did it.
Here are the functions to add:
def human_type_with_smart_correction(element, text, context="search", driver=None):
    """
    Human-ish typing that ALWAYS ends with the exact desired text in the input.
    If driver is provided, we can use JS as a last-resort clear.
    """
    logging.info(f"Typing '{text}' ({context})")

    def hard_clear():
        element.click()
        smart_backoff(0.15, 0.35)
        # Try select-all + backspace (both Control and Command)
        for combo in (Keys.CONTROL + "a", Keys.COMMAND + "a"):
            try:
                element.send_keys(combo)
                smart_backoff(0.05, 0.15)
                element.send_keys(Keys.BACKSPACE)
                smart_backoff(0.1, 0.25)
                if not (element.get_attribute("value") or "").strip():
                    return True
            except Exception:
                pass
        # Try element.clear()
        try:
            element.clear()
            smart_backoff(0.1, 0.25)
            if not (element.get_attribute("value") or "").strip():
                return True
        except Exception:
            pass
        # JS as a last resort, if the driver is available
        if driver is not None:
            try:
                driver.execute_script("arguments[0].value = '';", element)
                smart_backoff(0.1, 0.2)
                if not (element.get_attribute("value") or "").strip():
                    return True
            except Exception:
                pass
        return False

    # 1) Clear reliably
    if not hard_clear():
        logging.warning("Could not fully clear field, proceeding anyway")

    # 2) Type with small human errors, but keep it controlled
    error_chance = 0.03 if context == "search" else 0.02
    for ch in text:
        # Occasional hesitation
        if random.random() < 0.05:
            smart_backoff(0.08, 0.25)
        if random.random() < error_chance and ch.isalpha():
            err_type = random.choice(["typo", "extra"])
            if err_type == "typo":
                wrong = random.choice("abcdefghijklmnopqrstuvwxyz")
                element.send_keys(wrong)
                smart_backoff(0.06, 0.18)
                element.send_keys(Keys.BACKSPACE)
                smart_backoff(0.05, 0.12)
            elif err_type == "extra":
                extra = random.choice("aeious")
                element.send_keys(extra)
                smart_backoff(0.05, 0.12)
                element.send_keys(Keys.BACKSPACE)
                smart_backoff(0.05, 0.12)
        element.send_keys(ch)
        smart_backoff(0.03, 0.12)

    # 3) Final verification + hard correction (the guarantee)
    smart_backoff(0.25, 0.6, "Final verification")
    final_text = element.get_attribute("value") or ""
    logging.info(f"Typed: '{final_text}'")
    logging.info(f"Expected: '{text}'")
    if final_text.strip() != text.strip():
        logging.warning("Text mismatch. Hard-correcting...")
        hard_clear()
        for ch in text:
            element.send_keys(ch)
            smart_backoff(0.02, 0.06)
        final_text2 = element.get_attribute("value") or ""
        if final_text2.strip() != text.strip():
            raise RuntimeError(f"Typing correction failed. Got '{final_text2}' expected '{text}'")
        logging.info("Text corrected")
    else:
        logging.info("Text typed correctly")
    smart_backoff(0.2, 0.6)
    return True

# === Helper: Safe Element Scroll ===
def scroll_element_into_view(driver, element):
    """Safely scroll an element into view."""
    try:
        driver.execute_script("arguments[0].scrollIntoView({behavior: 'smooth', block: 'center'});", element)
        time.sleep(random.uniform(0.2, 0.5))
        return True
    except Exception:
        return False
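The key invariant of the typing helper is that the final field contents always match the target text, no matter which typos were injected along the way. You can see why with a Selenium-free sketch of the same typo-then-backspace pattern, operating on a plain list instead of a browser input (the `simulate_typing` name is ours, for illustration only):

```python
import random

# A Selenium-free sketch of the typo-then-backspace pattern used in
# human_type_with_smart_correction(), operating on a plain list buffer.
def simulate_typing(text, error_chance=0.03, rng=random):
    buffer = []
    for ch in text:
        if rng.random() < error_chance and ch.isalpha():
            buffer.append(rng.choice("abcdefghijklmnopqrstuvwxyz"))  # typo
            buffer.pop()                                             # backspace
        buffer.append(ch)
    return "".join(buffer)

# Whatever errors were injected mid-stream, the final text always matches
assert simulate_typing("google trends") == "google trends"
assert simulate_typing("abc", error_chance=1.0) == "abc"
```

Every injected typo is immediately undone before the real character is sent, so correctness never depends on luck — the verification pass in the real helper is just a safety net for flaky browser inputs.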
Step 10: Define a function for timeframe control
Let's assume your scraper has reached the Google Trends results page for the keyword you're tracking. What time period should this data actually reflect?
Google gives you a bunch of options here, like the past 24 hours, week, month, 12 months, or you can set up your own custom range.
This is where a dedicated helper function comes in handy: it selects the timeframe you need directly in the Google Trends interface.
By default, it'll set things to the past 12 months, but you can easily switch it to match any period that makes sense for your use case. Add this function to your script:
def set_timeframe(driver, timeframe="Past 12 months"):
    """
    Flexible timeframe selection on Google Trends.
    """
    logging.info(f"Setting timeframe: {timeframe}")
    try:
        # STEP 1: Open the dropdown
        dropdown = WebDriverWait(driver, 15).until(
            EC.element_to_be_clickable((By.CSS_SELECTOR, "md-select[aria-label^='Select time period']"))
        )
        logging.info("Found timeframe dropdown")
        random_mouse_movement(driver, dropdown)
        smart_backoff(0.5, 1.0, "Hovering over timeframe dropdown", driver)
        dropdown.click()
        smart_backoff(1.5, 2.5, "Dropdown menu opening", driver)

        # STEP 2: Check every potential option (md-option and _md-text)
        options = driver.find_elements(By.CSS_SELECTOR, "md-option, div._md-text")
        logging.info(f"Found {len(options)} dropdown items")
        for option in options:
            try:
                if option.is_displayed() and timeframe.lower() in option.text.lower():
                    logging.info(f"Matching timeframe found: {option.text.strip()}")
                    random_mouse_movement(driver, option)
                    smart_backoff(0.5, 1.0, "Hovering over timeframe option", driver)
                    try:
                        option.click()
                    except Exception:
                        driver.execute_script("arguments[0].click();", option)
                    logging.info(f"Timeframe '{timeframe}' selected")
                    smart_backoff(2.5, 4.0, "Waiting for charts to reload", driver)
                    return True
            except Exception as e:
                logging.debug(f"Option check error: {e}")
                continue
        logging.warning(f"Timeframe option not found: {timeframe}")
        return False
    except Exception as e:
        logging.error(f"Failed to set timeframe: {e}")
        return False
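As a fallback to driving the dropdown UI, you can often encode the timeframe directly in the Explore URL via its `date` query parameter. The token values below are the ones Google Trends commonly uses in its own URLs — treat them as an assumption and verify them in your browser before relying on them (the `trends_url` helper is ours, for illustration):

```python
from urllib.parse import urlencode

# Mapping from the human-readable dropdown labels to the `date` tokens that
# Google Trends commonly uses in its Explore URLs (verify before relying on them).
TIMEFRAME_TO_DATE_PARAM = {
    "Past hour": "now 1-H",
    "Past day": "now 1-d",
    "Past 7 days": "now 7-d",
    "Past 30 days": "today 1-m",
    "Past 12 months": "today 12-m",
    "Past 5 years": "today 5-y",
}

def trends_url(keyword, geo="GB", timeframe="Past 12 months"):
    params = {"q": keyword, "geo": geo, "date": TIMEFRAME_TO_DATE_PARAM[timeframe]}
    return "https://trends.google.com/trends/explore?" + urlencode(params)

print(trends_url("coffee"))
```

Navigating straight to such a URL skips one fragile UI interaction, though you should still verify on the page that the timeframe actually applied.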
Step 11: Make sure your scraper is in the right location
Even if you've got the GEO parameter right and you're using region-matched proxies, Google can still default to the wrong location.
That's where this next function comes in as a safety net. First, it looks for clues in the URL and on the page to work out where you are.
If it can't confirm the right location, it steps in, opens up the location picker, searches for the location you're after, and applies it before you try to extract any data.
This same logic can be used for any location you want to target - swap out the target location you pass into the function.
def verify_and_set_location(driver, target_location="London"):
    """
    Verify we're in the right location; try to set it if not.
    """
    logging.info(f"Verifying location: {target_location}")
    try:
        # Check the current URL for a geo parameter
        current_url = driver.current_url
        if "geo=" in current_url:
            geo_match = re.search(r'geo=([A-Za-z]{2}-[A-Za-z]{2,})', current_url)
            if geo_match:
                logging.info(f"Location already set in URL: {geo_match.group(1)}")
                return True

        # Check the page text for location indicators
        page_text = driver.find_element(By.TAG_NAME, "body").text.lower()
        location_indicators = ["london", "united kingdom", "gb", "england", "uk"]
        for indicator in location_indicators:
            if indicator in page_text:
                logging.info(f"Location detected in page: {indicator}")
                return True

        # If the location wasn't detected, try to set it
        logging.info("Location not detected, attempting to set...")

        # Look for a location selector
        location_selectors = [
            "div[aria-label*='Location']",
            "div[aria-label*='location']",
            "button[aria-label*='Location']",
            "md-select[aria-label*='Location']",
            "//div[contains(text(), 'Worldwide') or contains(text(), 'Location')]",
            ".hierarchy-select",
            "div.custom-location-picker-select",
            "//div[@role='button' and contains(@aria-label, 'Location')]",
        ]
        location_button = None
        for selector in location_selectors:
            try:
                if selector.startswith("//"):
                    elements = driver.find_elements(By.XPATH, selector)
                else:
                    elements = driver.find_elements(By.CSS_SELECTOR, selector)
                for element in elements:
                    try:
                        if element.is_displayed():
                            text = element.text.lower()
                            if any(word in text for word in ['location', 'worldwide', 'country', 'region']):
                                location_button = element
                                logging.info("Found location selector")
                                break
                    except Exception:
                        continue
                if location_button:
                    break
            except Exception:
                continue

        if location_button:
            # Click the location selector
            random_mouse_movement(driver, location_button)
            smart_backoff(0.8, 1.5)
            try:
                location_button.click()
            except Exception:
                driver.execute_script("arguments[0].click();", location_button)
            logging.info("Opened location dropdown")
            smart_backoff(2, 4, "Waiting for location dropdown")

            # Search for the target location (London in this walkthrough)
            try:
                location_search = driver.find_element(By.CSS_SELECTOR, "input[placeholder*='location'], input[placeholder*='Location'], input[aria-label*='search'], input[type='search']")
                location_search.clear()
                human_type_with_smart_correction(location_search, "London", context="location")
                smart_backoff(2, 4, "Waiting for location results")

                # Select London from the results
                london_options = [
                    "//div[contains(text(), 'London')]",
                    "//md-option[contains(., 'London')]",
                    "//div[contains(@class, '_md-text') and contains(text(), 'London')]",
                    "//div[contains(text(), 'London, United Kingdom')]",
                ]
                for option_selector in london_options:
                    try:
                        elements = driver.find_elements(By.XPATH, option_selector)
                        for element in elements:
                            if element.is_displayed() and "london" in element.text.lower():
                                element.click()
                                logging.info("Selected London location")
                                smart_backoff(3, 6, "Waiting for location to apply")
                                return True
                    except Exception:
                        continue
            except Exception as e:
                logging.warning(f"Could not search for location: {e}")

        logging.warning("Could not set location via UI; it might already be correct")
        return False
    except Exception as e:
        logging.error(f"Location verification failed: {e}")
        return False
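One subtlety worth knowing about the URL check above: the regex only matches geo codes with a subregion suffix (like `GB-ENG`), not bare country codes (like `GB`). Pulling it out into a small standalone helper makes that easy to verify against example URLs before a live run (the `geo_from_url` name is ours):

```python
import re

# The same geo-parameter pattern verify_and_set_location() relies on, pulled
# out so it can be tested against example URLs without a browser.
GEO_RE = re.compile(r'geo=([A-Za-z]{2}-[A-Za-z]{2,})')

def geo_from_url(url):
    match = GEO_RE.search(url)
    return match.group(1) if match else None

# Matches subregion codes...
assert geo_from_url("https://trends.google.com/trends/explore?geo=GB-ENG&q=tea") == "GB-ENG"
# ...but NOT bare country codes, which fall through to the page-text checks
assert geo_from_url("https://trends.google.com/trends/explore?geo=GB&q=tea") is None
```

So when you target a whole country rather than a subregion, the function relies on the page-text indicators instead — adjust the indicator list accordingly for non-UK targets.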
Step 12: Get the session to act like a normal person
Back to our browser session: before we even try to get any Google Trends data, we want the whole session to seem normal, like a regular person using a browser. The next bit of code adds in some human-like session behaviour to try and make things look lived-in:
- Getting the session up and running
Our pre_warm_browser() function does a few laps around neutral sites, scrolls a bit, and maybe even clicks a safe link to give the session a bit of a lived-in feel. You're free to swap out these sites for whatever fits your target location.
- Making the browser look like it's being used
Our random_window_resize() function occasionally changes the window size, so we don't look like we're just running the same old script every single time.
- Adding some idle time
Our simulate_human_scanning() and random_mouse_movement() functions try to add in some of that downtime stuff that a normal user would do, like moving the mouse around a bit or scrolling lightly - just to make the session look like it's not just sitting there.
- Wrapping things up like a human
Finally, our clean_exit() function tries to make the exit look a bit more natural, too, by pausing briefly, saving any cookies, tidying up a bit, and then closing the browser, rather than just vanishing into thin air.
Add this block to your script:
# === Pre-warm Browser ===
def pre_warm_browser(driver):
    """Pre-warm the browser with human-like activity before the actual scraping."""
    logging.info("Pre-warming browser with human-like activity...")
    # Visit a few neutral sites first
    neutral_sites = [
        "https://www.theguardian.com/uk",
        "https://en.wikipedia.org/wiki/London",
        "https://www.reddit.com/r/london",
    ]
    sites_to_visit = random.sample(neutral_sites, min(2, len(neutral_sites)))
    for site in sites_to_visit:
        try:
            site_name = site.split('//')[1].split('/')[0]
            logging.info(f"Visiting {site_name}")
            driver.get(site)
            smart_backoff(4, 8, f"Browsing {site_name}", driver)

            # Do some human-like scrolling
            scrolls = random.randint(2, 4)
            for scroll_num in range(scrolls):
                scroll_amount = random.randint(300, 700)
                direction = random.choice([-1, 1])
                driver.execute_script(f"window.scrollBy(0, {scroll_amount * direction});")
                smart_backoff(1, 3, f"Reading content (scroll {scroll_num+1}/{scrolls})", driver)

            # Occasionally click on a random safe link
            if random.random() < 0.4:
                try:
                    links = driver.find_elements(By.TAG_NAME, "a")
                    safe_links = []
                    for link in links:
                        try:
                            if link.is_displayed() and link.is_enabled():
                                href = link.get_attribute("href") or ""
                                text = link.text.lower()
                                # Avoid problematic links
                                if href and not any(x in href for x in ["javascript:", "mailto:", "#"]) and len(text) > 2:
                                    safe_links.append(link)
                        except Exception:
                            continue
                    if safe_links:
                        link = random.choice(safe_links[:15])
                        logging.info(f"🔗 Clicking on: {link.text[:50]}...")
                        driver.execute_script("arguments[0].click();", link)
                        smart_backoff(3, 7, "Following link", driver)
                        driver.back()
                        smart_backoff(2, 4, "Going back", driver)
                except Exception:
                    pass
        except Exception as e:
            logging.debug(f"Pre-warm site {site} failed: {e}")
            continue
    logging.info("Browser pre-warmed")
# === Random Window Resize ===
def random_window_resize(driver):
    """Randomly resize the browser window to appear more human."""
    if random.random() < 0.25:  # 25% chance
        try:
            sizes = [
                (1366, 768),
                (1440, 900),
                (1536, 864),
                (1600, 900),
                (1920, 1080),
                (1280, 720),
            ]
            width, height = random.choice(sizes)
            driver.set_window_size(width, height)
            logging.debug(f"Resized window to {width}x{height}")
            smart_backoff(0.8, 2, "Window resized", driver)
        except Exception:
            pass
# === Random Human Interactions ===
def simulate_human_scanning(driver, duration=2, intensity="light"):
    """
    Simulate human scanning/reading behavior, with safety checks.
    """
    logging.info("Simulating human scanning...")
    intensities = {
        "very_light": {"scrolls": 0, "moves": 1, "clicks": 0},
        "light": {"scrolls": 1, "moves": 1, "clicks": 0},
        "medium": {"scrolls": 1, "moves": 2, "clicks": 0},
        "heavy": {"scrolls": 2, "moves": 3, "clicks": 1}
    }
    config = intensities.get(intensity, intensities["very_light"])

    # Random mouse movements along a gentle curve
    for i in range(config["moves"]):
        try:
            window_size = driver.get_window_size()
            start_x = random.randint(50, window_size['width'] - 50)
            start_y = random.randint(50, window_size['height'] - 50)
            end_x = random.randint(50, window_size['width'] - 50)
            end_y = random.randint(50, window_size['height'] - 50)
            # Dispatch a sequence of mousemove events along a curved path
            driver.execute_script(f"""
                var startX = {start_x};
                var startY = {start_y};
                var endX = {end_x};
                var endY = {end_y};
                var steps = 15;
                var currentStep = 0;
                function moveStep() {{
                    if (currentStep <= steps) {{
                        var t = currentStep / steps;
                        // Interpolate between start and end
                        var x = startX + (endX - startX) * t;
                        var y = startY + (endY - startY) * t;
                        // Add a slight curve
                        x += Math.sin(t * Math.PI) * 20;
                        y += Math.cos(t * Math.PI) * 15;
                        var evt = new MouseEvent('mousemove', {{
                            clientX: Math.round(x),
                            clientY: Math.round(y),
                            view: window,
                            bubbles: true
                        }});
                        document.dispatchEvent(evt);
                        currentStep++;
                        setTimeout(moveStep, 20 + Math.random() * 15);
                    }}
                }}
                moveStep();
            """)
            # Wait for the movement to finish, plus some random time
            movement_time = 20 * 15 / 1000 + random.uniform(0.1, 0.4)
            time.sleep(movement_time)
            smart_backoff(0.2, 0.8, f"Mouse movement {i+1}/{config['moves']}", driver)
        except Exception:
            pass

    # Random scrolling, with bounds checks
    for i in range(config["scrolls"]):
        try:
            current_scroll = driver.execute_script("return window.pageYOffset;")
            window_height = driver.execute_script("return window.innerHeight;")
            document_height = driver.execute_script("return document.body.scrollHeight;")
            max_scroll = document_height - window_height
            direction = random.choice([-1, 1])
            if direction == 1 and current_scroll < max_scroll - 100:
                amount = random.randint(100, min(400, max_scroll - current_scroll - 50))
            elif direction == -1 and current_scroll > 100:
                amount = random.randint(100, min(400, current_scroll))
            else:
                direction = direction * -1
                amount = random.randint(100, 300)
            # Smooth scrolling in small steps
            steps = random.randint(3, 6)
            step_amount = amount / steps
            for step in range(steps):
                driver.execute_script(f"window.scrollBy(0, {step_amount * direction});")
                time.sleep(random.uniform(0.05, 0.15))
            smart_backoff(0.3, 1.2, f"Scrolling {i+1}/{config['scrolls']}", driver)
        except Exception:
            pass

    # Random clicks on safe, visible elements
    for i in range(config["clicks"]):
        try:
            safe_elements = driver.find_elements(By.CSS_SELECTOR, "body, div, span, p, a, button")
            visible_elements = []
            for element in safe_elements:
                try:
                    if element.is_displayed() and element.is_enabled():
                        location = element.location
                        size = element.size
                        if location and size:
                            text = element.text.lower()
                            # Avoid clicking on important elements
                            if not any(word in text for word in ['submit', 'login', 'sign', 'buy', 'purchase', 'download']):
                                visible_elements.append(element)
                except Exception:
                    continue
            if visible_elements:
                element = random.choice(visible_elements[:5])
                scroll_element_into_view(driver, element)
                smart_backoff(0.3, 0.8, "Preparing to click", driver)
                try:
                    actions = ActionChains(driver)
                    actions.move_to_element(element)
                    actions.click()
                    actions.perform()
                except Exception:
                    driver.execute_script("arguments[0].click();", element)
                smart_backoff(0.3, 0.8, f"Random click {i+1}/{config['clicks']}", driver)
        except Exception:
            pass
# === Random Mouse Movements ===
def random_mouse_movement(driver, element=None):
    if not RANDOM_MOUSE_MOVEMENTS:
        return
    try:
        window_size = driver.get_window_size()
        if element:
            try:
                scroll_element_into_view(driver, element)
                smart_backoff(0.3, 0.7, "Scrolled to element", driver)
                element_location = element.location
                if element_location:
                    # Move to the element with natural motion
                    actions = ActionChains(driver)
                    size = element.size
                    # Aim near the centre, with a small offset for realism
                    offset_x = random.randint(-10, 10)
                    offset_y = random.randint(-10, 10)
                    actions.move_to_element_with_offset(element, int(size['width']/2) + offset_x, int(size['height']/2) + offset_y)
                    actions.perform()
                    smart_backoff(0.2, 0.5, "Moved to element", driver)
                    return
            except Exception:
                pass

        # Otherwise, make a random movement within the viewport bounds
        actions = ActionChains(driver)
        target_x = random.randint(50, window_size['width'] - 50)
        target_y = random.randint(50, window_size['height'] - 50)
        steps = random.randint(3, 6)
        current_x, current_y = 0, 0
        for step in range(steps):
            step_x = current_x + (target_x - current_x) * (step + 1) / steps
            step_y = current_y + (target_y - current_y) * (step + 1) / steps
            # Add a natural curve to the movement
            curve_x = math.sin(step * 0.5) * 30
            curve_y = math.cos(step * 0.5) * 25
            step_x += curve_x + random.randint(-15, 15)
            step_y += curve_y + random.randint(-15, 15)
            # Make sure we stay within bounds
            step_x = max(10, min(step_x, window_size['width'] - 10))
            step_y = max(10, min(step_y, window_size['height'] - 10))
            actions.move_by_offset(int(step_x - current_x), int(step_y - current_y))
            current_x, current_y = step_x, step_y
            time.sleep(random.uniform(0.03, 0.1))
        actions.perform()
        smart_backoff(0.1, 0.3, "Random mouse movement", driver)
    except Exception:
        pass
# === Clean Exit ===
def clean_exit(driver):
    """Exit cleanly, saving cookies and removing proxy extension files."""
    try:
        # Scroll back to the top
        driver.execute_script("window.scrollTo({top: 0, behavior: 'smooth'});")
        smart_backoff(1, 2, "Scrolling to top", driver)
        # Drift the mouse towards the top-left
        driver.execute_script("""
            var evt = new MouseEvent('mousemove', {
                clientX: 50,
                clientY: 50,
                view: window,
                bubbles: true
            });
            document.dispatchEvent(evt);
        """)
        # Pause briefly
        smart_backoff(2, 4, "Final pause before exit", driver)
        # Save cookies before quitting
        try:
            identity_hash = hash(str(driver.execute_script("return navigator.userAgent;")))
            cookie_file = f"cookies_{identity_hash}.json"
            save_cookies(driver, cookie_file)
        except Exception:
            pass
        # Clean up proxy extension files
        try:
            if hasattr(driver, 'proxy_extension_cleanup'):
                cleanup = driver.proxy_extension_cleanup
                if os.path.exists(cleanup.get('extension_path', '')):
                    os.remove(cleanup['extension_path'])
                if os.path.exists(cleanup.get('temp_dir', '')):
                    os.rmdir(cleanup['temp_dir'])
                logging.info("Cleaned up proxy extension files")
        except Exception:
            pass
        # Close the browser
        driver.quit()
        logging.info("Browser closed cleanly")
    except Exception:
        try:
            driver.quit()
        except Exception:
            pass
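The curved mouse path injected as JavaScript in `simulate_human_scanning` is easy to reason about in Python, too. Here's a pure-Python version of the same path math (the `mouse_path` helper is ours, for illustration), handy for checking that the generated points start and end where you expect:

```python
import math

# A pure-Python version of the curved mouse path computed in the injected
# JavaScript: linear interpolation plus a sin/cos curve offset.
def mouse_path(start, end, steps=15):
    points = []
    for step in range(steps + 1):
        t = step / steps
        x = start[0] + (end[0] - start[0]) * t + math.sin(t * math.pi) * 20
        y = start[1] + (end[1] - start[1]) * t + math.cos(t * math.pi) * 15
        points.append((round(x), round(y)))
    return points

path = mouse_path((100, 100), (800, 500))
assert len(path) == 16
# The sine term vanishes at t=0 and t=1, so x hits both endpoints exactly;
# the cosine term offsets y by +15 at the start and -15 at the end.
assert path[0] == (100, 115)
assert path[-1] == (800, 485)
```

The curve peaks mid-path (the sine term), which is what keeps the trajectory from being a dead-straight, obviously scripted line.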
Step 13: Make your scraper tough enough to bounce back
Now, we make our scraper smart and resilient.
First, we deal with any CAPTCHAs that may get in the way by setting up a function that looks for CAPTCHA elements or any language on the page indicating that human verification is needed.
The moment it finds one, the script pauses itself, sends a clear message to a human to come sort it out, and then resumes once that hurdle is cleared. This usually only happens at the very beginning, not throughout the run.
Next up, we add a way to detect both hard and soft blocks, so the scraper knows when it should back off a bit and not just keep going. And finally, we add a safe_click_with_retry helper function that handles fragile interactions that can go wrong.
# === CAPTCHA Handling Functions ===
def detect_and_handle_captcha(driver):
    """
    Detect whether a CAPTCHA is present and handle it appropriately.
    Returns True if a CAPTCHA was detected and is still unsolved, False otherwise.
    """
    try:
        # Check for CAPTCHA iframes or elements
        captcha_selectors = [
            "iframe[src*='captcha']",
            "iframe[src*='recaptcha']",
            "div.g-recaptcha",
            "div.captcha",
            "div#captcha",
            "//div[contains(text(), 'CAPTCHA')]",
            "//div[contains(text(), 'captcha')]",
            "//div[contains(text(), 'Robot') or contains(text(), 'robot')]",
            "//img[contains(@src, 'captcha')]",
        ]
        captcha_detected = False
        for selector in captcha_selectors:
            try:
                if selector.startswith("//"):
                    elements = driver.find_elements(By.XPATH, selector)
                else:
                    elements = driver.find_elements(By.CSS_SELECTOR, selector)
                for element in elements:
                    try:
                        if element.is_displayed():
                            logging.warning(f"CAPTCHA detected with selector: {selector}")
                            captcha_detected = True
                            break
                    except Exception:
                        continue
                if captcha_detected:
                    break
            except Exception:
                continue

        # Check the page text for CAPTCHA indicators
        if not captcha_detected:
            try:
                page_text = driver.find_element(By.TAG_NAME, "body").text.lower()
                captcha_indicators = [
                    "captcha",
                    "verify you are human",
                    "i'm not a robot",
                    "prove you're not a robot",
                    "security check",
                    "type the text",
                    "enter the characters",
                    "select all images with",
                    "verify you're human"
                ]
                for indicator in captcha_indicators:
                    if indicator in page_text:
                        logging.warning(f"CAPTCHA text detected: {indicator}")
                        captcha_detected = True
                        break
            except Exception:
                pass

        if captcha_detected:
            # Loud CAPTCHA alert in the logs
            logging.warning("=" * 60)
            logging.warning("CAPTCHA DETECTED!")
            logging.warning("Human intervention required!")
            logging.warning("The script will wait 60 seconds for you to solve it.")
            logging.warning("=" * 60)

            # Try to raise a system notification
            try:
                import platform
                system = platform.system()
                if system == "Darwin":  # macOS
                    os.system("""osascript -e 'display notification "CAPTCHA detected! Please solve it in the browser." with title "Google Trends Scraper" sound name "Submarine"'""")
                elif system == "Linux":
                    os.system('notify-send "Google Trends Scraper" "CAPTCHA detected! Please solve it in the browser."')
                elif system == "Windows":
                    import ctypes
                    ctypes.windll.user32.MessageBoxW(0,
                        "CAPTCHA detected!\n\nPlease solve the CAPTCHA in the browser window.\n\nThe script will wait 60 seconds.",
                        "Google Trends Scraper",
                        0x40 | 0x0)  # MB_ICONINFORMATION | MB_OK
            except Exception:
                pass

            # Play a sound to alert the user (if possible)
            try:
                import sys
                if sys.platform == "win32":
                    import winsound
                    winsound.Beep(1000, 1000)
                elif sys.platform == "darwin":  # macOS
                    os.system('afplay /System/Library/Sounds/Ping.aiff')
            except Exception:
                pass

            # Wait for the user to solve the CAPTCHA
            logging.info("Waiting 60 seconds for manual CAPTCHA solving...")
            time.sleep(60)
            smart_backoff(5, 10, "Checking if CAPTCHA was solved", driver)

            # Verify whether we can proceed
            try:
                page_text = driver.find_element(By.TAG_NAME, "body").text.lower()
                if any(indicator in page_text for indicator in ["captcha", "verify", "robot"]):
                    logging.error("CAPTCHA still present after wait")
                    return True
                else:
                    logging.info("CAPTCHA appears to be solved")
                    return False
            except Exception:
                logging.info("Could not verify CAPTCHA status")
                return False
        return False
    except Exception as e:
        logging.error(f"Error in CAPTCHA detection: {e}")
        return False
# === Enhanced Block Detection ===
def detect_block(driver):
    """Return True for a hard block, "CAPTCHA" or "RATE_LIMIT" for softer ones, False if clear."""
    try:
        # Give the page a moment to load
        time.sleep(2)
        html = driver.page_source.lower()
        page_text = driver.find_element(By.TAG_NAME, "body").text.lower()

        # Strict block indicators only
        strict_block_indicators = [
            "unusual traffic from your computer network",
            "automated queries",
            "our systems have detected unusual traffic",
            "this page appears when google automatically detects requests",
            "we're sorry...",
            "we've detected unusual activity",
            "distressed search page",
        ]
        for indicator in strict_block_indicators:
            if indicator in html or indicator in page_text:
                logging.warning(f"Block detected: {indicator}")
                return True

        # Check for CAPTCHAs separately
        if detect_and_handle_captcha(driver):
            return "CAPTCHA"

        # Check for rate limiting
        rate_limit_indicators = [
            "rate limit",
            "try again in a bit",
            "please try again later",
            "quota exceeded",
        ]
        for indicator in rate_limit_indicators:
            if indicator in page_text:
                logging.warning(f"Rate limit detected: {indicator}")
                return "RATE_LIMIT"
        return False
    except Exception as e:
        logging.error(f"Error in detect_block: {e}")
        return False
# === Safe Click Helper Function ===
def safe_click_with_retry(driver, element_name="Explore button", selector_type="xpath", selector="//span[contains(text(), 'Explore')]"):
    """
    Safely click an element, retrying on stale element references.
    """
    max_retries = 3
    for retry in range(max_retries):
        try:
            # Find the element fresh on every attempt
            if selector_type == "xpath":
                element = WebDriverWait(driver, 5).until(
                    EC.element_to_be_clickable((By.XPATH, selector))
                )
            elif selector_type == "css":
                element = WebDriverWait(driver, 5).until(
                    EC.element_to_be_clickable((By.CSS_SELECTOR, selector))
                )
            # Try a normal click first, then fall back to JavaScript
            try:
                element.click()
                logging.info(f"Clicked {element_name} (normal click)")
                return True
            except Exception:
                driver.execute_script("arguments[0].click();", element)
                logging.info(f"Clicked {element_name} (JavaScript)")
                return True
        except Exception as e:
            if retry < max_retries - 1:
                logging.debug(f"Retry {retry + 1}/{max_retries} for {element_name}: {str(e)[:100]}")
                smart_backoff(1, 2, f"Retrying {element_name}", driver)
            else:
                logging.warning(f"Failed to click {element_name} after {max_retries} retries")
                return False
    return False
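The retry shape inside `safe_click_with_retry` — re-acquire the target fresh on every attempt, give up gracefully after N failures — is worth internalizing, because it applies to any fragile Selenium interaction, not just clicks. Here's the same pattern generalized to an arbitrary flaky action so it can be tested without a browser (the `with_retries` helper is ours, for illustration):

```python
import logging

# The retry shape used by safe_click_with_retry(), generalized to any flaky
# action: retry up to max_retries times, return None if every attempt fails.
def with_retries(action, max_retries=3, on_retry=None):
    for attempt in range(max_retries):
        try:
            return action()
        except Exception as exc:
            if attempt == max_retries - 1:
                logging.warning(f"Giving up after {max_retries} attempts: {exc}")
                return None
            if on_retry:
                on_retry(attempt)  # e.g. a smart_backoff-style pause

# A stand-in for a click that fails twice with stale-element errors, then works
calls = {"n": 0}
def flaky_click():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("stale element reference")
    return "clicked"

assert with_retries(flaky_click) == "clicked"
assert calls["n"] == 3
```

The crucial detail mirrored from the real helper: the action is re-run from scratch each attempt (including re-locating the element), which is what defeats stale element references rather than just delaying them.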
Step 14: Write a function to simulate a complete Google Trends scraping session
Now we define a single function that makes the whole scraping process feel like one long, continuous browsing session, starting at Google, moving into Google Trends, and then keyword results.
This function doesn't do the actual work. Instead, it calls out to all the other functions we've defined, decides when to run each one, and determines what to do if something goes wrong along the way.
Here's what it does, step by step:
- Ensures the browser is warmed up
It gets the session ready, resizes the window, and strolls through some neutral pages for a bit.
- Starts exactly where a human would start
Then, it opens Google, deals with the cookie consent box, types into the search bar, waits to see what comes up, then clicks the official link - all done with delays and some clever backoff logic to avoid looking like a bot.
- Handles early pushback from Google
If a CAPTCHA appears, it lets a human sort that out. And if Google decides to block or rate-limit the session, it backs off and tries again safely, without crashing or brute-forcing the page.
- Gets into Google Trends as a real person would
It deals with the Trends cookie dialog (if present), scrolls the page, verifies and sets our location, finds the search box, carefully types in the keyword, and generally tries to avoid getting caught out by the usual UI traps.
- Gets Google Trends into the right state before we start scraping
If clicking doesn't work, it falls back to just pressing Enter, makes sure the correct time frame is set, and confirms the charts are actually there before we go any further.
- Stabilizes the page before pulling out the data
It waits for the charts to load like a human would, double-checks our location is still correct, and only then pulls out the visible data from the page.
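All of the pacing in these phases comes from the smart_backoff helper defined in an earlier step. Conceptually, it boils down to a jittered random wait; here is a simplified standalone sketch of that idea (the name jittered_wait and the log wording are illustrative, not the exact helper used in the full script):

```python
import logging
import random
import time

def jittered_wait(min_s: float, max_s: float, reason: str = "") -> float:
    """Sleep for a random duration in [min_s, max_s] so delays never look scripted."""
    delay = random.uniform(min_s, max_s)
    if reason:
        logging.info(f"Waiting {delay:.1f}s: {reason}")
    time.sleep(delay)
    return delay
```

Because every wait is drawn from a range rather than fixed, two runs of the same journey never produce identical timing fingerprints.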
Here is the code block to add to your script:
def complete_human_journey(driver, keyword):
"""
Complete human-like journey from Google search to Google Trends data
"""
logging.info(" Starting complete human journey...")
# PRE-WARM PHASE
pre_warm_browser(driver)
# Random window resize
random_window_resize(driver)
# PHASE 1: Start at Google homepage
logging.info(" Phase 1: Starting at Google homepage")
driver.get("https://www.google.com")
smart_backoff(4, 8, "Loading Google homepage", driver)
# === CRITICAL: HANDLE COOKIE CONSENT DIALOG ===
cookie_handled = handle_cookie_consent(driver)
if cookie_handled:
logging.info("Cookie consent dialog handled")
smart_backoff(2, 4, "After cookie handling", driver)
else:
logging.info("No cookie consent dialog found or already handled")
# Human scanning of homepage
simulate_human_scanning(driver, duration=2, intensity="very_light")
# Find Google search box
logging.info(" Looking for Google search box...")
google_search_box = None
google_selectors = [
"textarea[name='q']",
"input[name='q']",
"textarea[title='Search']",
"input[title='Search']",
"textarea[aria-label='Search']",
"input[aria-label='Search']",
"textarea.gLFyf",
"input.gLFyf",
]
for selector in google_selectors:
try:
element = WebDriverWait(driver, 15).until(
EC.element_to_be_clickable((By.CSS_SELECTOR, selector))
)
if element.is_displayed():
google_search_box = element
logging.info(f" Found Google search: {selector}")
break
except:
continue
if not google_search_box:
raise Exception(" Could not find Google search box")
# Type "Google trends" in Google search
logging.info("⌨️ Typing 'Google trends' in Google search...")
random_mouse_movement(driver, google_search_box)
smart_backoff(0.5, 1.2, "Moving to search box", driver)
google_search_box.click()
smart_backoff(0.3, 0.8, "Clicked search box", driver)
human_type_with_smart_correction(google_search_box, "google trends", context="search")
# Wait for Google suggestions
smart_backoff(2, 4, "Waiting for Google suggestions", driver)
# Press Enter to search
google_search_box.send_keys(Keys.RETURN)
logging.info(" Searching for 'Google trends' on Google")
smart_backoff(3, 6, "Loading Google search results", driver)
# === ADDED: CAPTCHA CHECK RIGHT AFTER SEARCHING ===
# Check for CAPTCHA after Google search
captcha_found = detect_and_handle_captcha(driver)
if captcha_found:
logging.warning(" CAPTCHA encountered, trying alternative approach...")
# Try to go back and search again
try:
driver.back()
smart_backoff(3, 5, "Went back due to CAPTCHA", driver)
# Try a different search approach
google_search_box = driver.find_element(By.CSS_SELECTOR, "textarea[name='q'], input[name='q']")
google_search_box.clear()
smart_backoff(0.5, 1)
# Type differently
google_search_box.send_keys("Google Trends official")
smart_backoff(1, 2)
google_search_box.send_keys(Keys.RETURN)
smart_backoff(4, 7, "Retrying search with different terms", driver)
except:
pass
# PHASE 2: Scan search results
logging.info(" Phase 2: Scanning Google search results")
simulate_human_scanning(driver, duration=3, intensity="light")
# Find and click the Google Trends result
logging.info(" Looking for Google Trends link...")
# Wait for search results to load
smart_backoff(3, 5, "Waiting for search results", driver)
# Look for the official Google Trends link
trends_link = None
trends_selectors = [
"a[href*='trends.google.com']",
"//a[.//h3[contains(text(), 'Google Trends')]]",  # :contains() is not valid CSS, so use XPath here
"//a[contains(@href, 'trends.google.com') and contains(., 'Trends')]",
"a[ping*='/url?sa=t&source=web&rct=j&url=https://trends.google.com']",
"//a[contains(@href, 'trends.google.com')]//h3[contains(text(), 'Trends')]",
]
for selector in trends_selectors:
try:
if selector.startswith("//"):
elements = driver.find_elements(By.XPATH, selector)
else:
elements = driver.find_elements(By.CSS_SELECTOR, selector)
for element in elements:
try:
if element.is_displayed():
# Get link text for verification
link_text = element.text.lower() if element.text else ""
href = element.get_attribute("href") or ""
if "trends" in link_text or "trends.google.com" in href:
trends_link = element
logging.info(f" Found Google Trends link: {link_text[:50]}...")
logging.info(f" URL: {href[:80]}...")
break
except:
continue
if trends_link:
break
except:
continue
if not trends_link:
# Fallback: Look for any link containing "trends"
try:
all_links = driver.find_elements(By.TAG_NAME, "a")
for link in all_links:
try:
if link.is_displayed():
text = link.text.lower()
href = link.get_attribute("href") or ""
if "trends" in text and "google" in text:
trends_link = link
logging.info(f" Found fallback Trends link: {text[:50]}...")
break
except:
continue
except:
pass
if not trends_link:
# Ultimate fallback: Go directly but simulate human behavior
logging.warning(" Could not find Trends link, going directly")
driver.get("https://trends.google.com")
smart_backoff(4, 7, "Loading Google Trends directly", driver)
else:
# Click the link (human-like) - WITH INTERCEPTION HANDLING
random_mouse_movement(driver, trends_link)
smart_backoff(0.8, 1.5, "Hovering over Trends link", driver)
try:
# Try normal click first
trends_link.click()
logging.info("🔗 Clicked Google Trends link (normal click)")
except Exception as e:
if "element click intercepted" in str(e):
logging.info(" Element click intercepted, trying JavaScript click...")
try:
# Try JavaScript click
driver.execute_script("arguments[0].click();", trends_link)
logging.info(" Clicked Google Trends link (JavaScript click)")
except Exception as js_e:
logging.warning(f" JavaScript click failed: {js_e}, trying scroll and click...")
try:
# Scroll element into view and try again
driver.execute_script("arguments[0].scrollIntoView({block: 'center', behavior: 'smooth'});", trends_link)
smart_backoff(0.8, 1.5, "Scrolled to link", driver)
trends_link.click()
logging.info(" Clicked Google Trends link (after scroll)")
except Exception as scroll_e:
logging.warning(f" All clicks failed, going directly: {scroll_e}")
driver.get("https://trends.google.com")
else:
# Some other error
logging.warning(f" Click failed: {e}, going directly")
driver.get("https://trends.google.com")
smart_backoff(4, 8, "Loading Google Trends website", driver)
# PHASE 3: On Google Trends website
logging.info(" Phase 3: On Google Trends website")
# === CHECK FOR COOKIE DIALOG ON GOOGLE TRENDS TOO ===
cookie_handled_trends = handle_cookie_consent(driver)
if cookie_handled_trends:
logging.info(" Cookie consent handled on Google Trends")
smart_backoff(2, 4, "After cookie handling", driver)
# Simulate human exploring the page
simulate_human_scanning(driver, duration=4, intensity="light")
# Verify and set London location
verify_and_set_location(driver, "London")
# PHASE 4: Search for keyword on Google Trends
logging.info(f" Phase 4: Searching for '{keyword}' on Google Trends")
# === DEBUG: Print all search-like inputs first ===
logging.info(" DEBUG: Looking for search inputs...")
try:
all_inputs = driver.find_elements(By.TAG_NAME, "input")
logging.info(f" Found {len(all_inputs)} total input elements")
visible_inputs = []
for i, inp in enumerate(all_inputs[:15]): # Check first 15
try:
if inp.is_displayed():
inp_id = inp.get_attribute("id") or "no-id"
inp_type = inp.get_attribute("type") or "no-type"
inp_class = inp.get_attribute("class") or "no-class"
inp_aria = inp.get_attribute("aria-label") or "no-aria"
inp_placeholder = inp.get_attribute("placeholder") or "no-placeholder"
logging.info(f" Input #{i}: id='{inp_id}', type='{inp_type}', aria='{inp_aria}'")
logging.info(f" class='{inp_class[:50]}...', placeholder='{inp_placeholder}'")
if inp_type == "text" and ("search" in inp_aria.lower() or "search" in inp_placeholder.lower()):
visible_inputs.append(inp)
logging.info(f" This looks like a search box!")
except:
pass
logging.info(f"🔍 Found {len(visible_inputs)} visible text inputs that might be search boxes")
except:
logging.warning(" Could not debug inputs")
# Find the search box on Google Trends
trends_search_box = None
trends_search_selectors = [
"input#i4", # Always the correct one for real keyword search
"input[jsname='YPqjbf'][aria-label='Search']:not([disabled])", # Skip top disabled box
"input[aria-label='Search']:not([disabled])",
"input[placeholder*='earch']:not([disabled])",
"input[type='text']:not([disabled])",
]
for selector in trends_search_selectors:
try:
logging.info(f" Trying selector: {selector}")
elements = driver.find_elements(By.CSS_SELECTOR, selector)
logging.info(f" Found {len(elements)} elements with selector: {selector}")
for element in elements:
try:
if element.is_displayed() and element.is_enabled():
# Additional verification (read each attribute once - no duplicate lookups)
el_id = element.get_attribute("id") or ""
el_aria = element.get_attribute("aria-label") or ""
if "explore topics" not in (element.get_attribute("value") or "").lower():
trends_search_box = element
logging.info(f" Selected search box: id={el_id}, aria={el_aria}")
break
except Exception as e:
logging.debug(f" Element check failed: {e}")
continue
if trends_search_box:
break
except Exception as e:
logging.debug(f"Selector {selector} failed: {e}")
continue
if not trends_search_box:
# Last resort: Look for any input that looks like a search box
logging.info(" Trying last resort: scanning all inputs...")
try:
all_inputs = driver.find_elements(By.TAG_NAME, "input")
for inp in all_inputs:
try:
if inp.is_displayed() and inp.get_attribute("type") == "text":
val = inp.get_attribute("value") or ""
aria = inp.get_attribute("aria-label") or ""
if "explore topics" not in val.lower() and "search" in aria.lower():
trends_search_box = inp
logging.info(" Found fallback search box (filtered)")
break
except:
continue
except:
pass
if not trends_search_box:
raise Exception(" Could not find search box on Google Trends")
# Human-like interaction with search box
logging.info(" Moving to search box...")
random_mouse_movement(driver, trends_search_box)
smart_backoff(0.8, 1.5, "Moving to search box", driver)
trends_search_box.click()
smart_backoff(0.5, 1.2, "Clicked search box", driver)
# === ENHANCED: CLEAR THE SEARCH BOX PROPERLY ===
logging.info(" Checking and clearing search box...")
# Check if there's existing text
existing_text = trends_search_box.get_attribute("value") or trends_search_box.text
if existing_text and existing_text.strip():
logging.info(f" Found existing text in search box: '{existing_text[:50]}...'")
# Clear using multiple methods for reliability
# (select-all alone doesn't delete text, so pair it with Delete)
clear_methods = [
lambda: (trends_search_box.send_keys(Keys.CONTROL + 'a'), trends_search_box.send_keys(Keys.DELETE)),
lambda: (trends_search_box.send_keys(Keys.COMMAND + 'a'), trends_search_box.send_keys(Keys.DELETE)),
lambda: trends_search_box.clear(),
lambda: driver.execute_script("arguments[0].value = '';", trends_search_box)
]
for method_num, clear_method in enumerate(clear_methods):
try:
clear_method()
smart_backoff(0.2, 0.4, f"Clearing method {method_num + 1}")
# Verify clear
current_text = trends_search_box.get_attribute("value") or trends_search_box.text
if not current_text or not current_text.strip():
logging.info(" Search box cleared successfully")
break
except:
continue
# Final verification
trends_search_box.click()
for _ in range(5):
trends_search_box.send_keys(Keys.BACKSPACE)
smart_backoff(0.05, 0.1)
# Send a space then backspace to ensure clear
trends_search_box.send_keys(' ')
smart_backoff(0.1, 0.2)
trends_search_box.send_keys(Keys.BACKSPACE)
smart_backoff(0.2, 0.4)
else:
logging.info(" Search box is already empty")
# Type the keyword with human-like errors and corrections
logging.info(f"⌨️ Typing '{keyword}' on Google Trends...")
human_type_with_smart_correction(trends_search_box, keyword, context="trends_search")
smart_backoff(2, 4, "Waiting for keyword suggestions", driver)
# FIX: Scroll back up to remove 'Explore topics' from view
try:
logging.info(" Scrolling back to top to avoid 'Explore topics' interference...")
driver.execute_script("window.scrollTo({ top: 0, behavior: 'smooth' });")
smart_backoff(1.5, 2.8, "Scrolling to top", driver)
except Exception as e:
logging.warning(f" Could not scroll to top: {e}")
# DEBUG: Take a screenshot to see what the page looks like
try:
driver.save_screenshot("debug_after_typing.png")
logging.info(" Saved debug screenshot: debug_after_typing.png")
except:
pass
# DEBUG: Check page text
try:
page_text = driver.find_element(By.TAG_NAME, "body").text.lower()
if "explore" in page_text:
logging.info(" 'Explore' text found somewhere on page")
else:
logging.info(" 'Explore' text NOT found on page")
# Show first 500 chars of page
logging.info(f" Page text preview: {page_text[:500]}...")
except:
pass
# === ADD THIS: Close dropdown before looking for Explore button ===
try:
# Try to close any open dropdowns by clicking on a neutral area
body = driver.find_element(By.TAG_NAME, "body")
# Click near the top-left corner to avoid clicking on important elements
actions = ActionChains(driver)
actions.move_to_element_with_offset(body, 10, 10)
actions.click()
actions.perform()
logging.info(" Clicked neutral area to close dropdowns")
smart_backoff(1, 2, "Waiting for dropdowns to close")
except:
pass
# ====== FIND AND CLICK CORRECT EXPLORE BUTTON ======
logging.info(" Looking for CORRECT Explore button (not 'Explore topics')...")
explore_clicked = False
explore_button = None
# Wait a bit for the button to appear after typing
smart_backoff(2, 4, "Waiting for Explore button to appear", driver)
# First, let's debug: show all Explore-like elements
try:
explore_elements = driver.find_elements(By.XPATH, "//*[contains(text(), 'Explore')]")
logging.info(f" DEBUG: Found {len(explore_elements)} elements with 'Explore' text")
for i, elem in enumerate(explore_elements[:5]):
try:
if elem.is_displayed():
elem_text = elem.text.strip()
elem_tag = elem.tag_name
elem_loc = elem.location
logging.info(f" Element {i}: '{elem_text}' ({elem_tag}) at {elem_loc}")
except:
continue
except:
pass
# Try specific selectors for the CORRECT Explore button
correct_selectors = [
"button:has(span.UywwFc-vQzf8d)",
"button:has(span[jsname='V67aGc'])",
"//button[.//span[text()='Explore'] and not(contains(., 'topics'))]",
"//button[contains(., 'Explore') and not(contains(., 'topics'))]",
]
for selector in correct_selectors:
try:
if selector.startswith("//"):
elements = driver.find_elements(By.XPATH, selector)
else:
elements = driver.find_elements(By.CSS_SELECTOR, selector)
for element in elements:
try:
if element.is_displayed() and element.is_enabled():
text = element.text.strip().lower()
# CRITICAL: Filter out "Explore topics" - we want JUST "Explore"
if "explore" in text and "topics" not in text:
explore_button = element
logging.info(f" Found CORRECT Explore button: '{element.text}'")
logging.info(f" Using selector: {selector}")
# Check if it's a span, find parent button
if explore_button.tag_name == 'span':
logging.info(" Found Explore span, looking for parent button...")
try:
parent_button = explore_button.find_element(By.XPATH, "./ancestor::button[1]")
if parent_button and parent_button.is_displayed():
logging.info(" Found parent button for Explore span")
explore_button = parent_button # Use the button instead of span
except:
pass # Keep using the span if can't find parent
break
except Exception as e:
logging.debug(f" Element check failed: {e}")
continue
if explore_button:
break
except Exception as e:
logging.debug(f"Selector {selector} failed: {e}")
continue
# If still not found, try scanning all buttons with better filtering
if not explore_button:
logging.warning(" Explore button not found with selectors, checking all buttons...")
try:
all_buttons = driver.find_elements(By.TAG_NAME, "button")
logging.info(f" Found {len(all_buttons)} total buttons")
for i, btn in enumerate(all_buttons[:15]): # Check first 15
try:
if btn.is_displayed() and btn.is_enabled():
btn_text = btn.text.strip().lower()
# CRITICAL FILTER: Must have "explore" but NOT "topics"
if "explore" in btn_text and "topics" not in btn_text:
explore_button = btn
logging.info(f" Found Explore button by scanning: '{btn.text}'")
break
except:
continue
except:
pass
# ====== CLICK THE BUTTON ======
if explore_button:
# Double-check it's the right button
button_text = explore_button.text.strip().lower()
if "topics" in button_text:
logging.error(" OOPS! Found WRONG 'Explore topics' button! Skipping...")
explore_button = None
else:
# Human-like click
logging.info(f" Clicking CORRECT Explore button: '{explore_button.text}'")
random_mouse_movement(driver, explore_button)
smart_backoff(0.5, 1.2, "Hovering over Explore button", driver)
try:
# Try multiple click methods
click_success = False
# Method 1: Normal click
try:
explore_button.click()
click_success = True
logging.info(" Normal click successful")
except:
# Method 2: JavaScript click
try:
driver.execute_script("arguments[0].click();", explore_button)
click_success = True
logging.info(" JavaScript click successful")
except:
# Method 3: ActionChains click
try:
ActionChains(driver).click(explore_button).perform()
click_success = True
logging.info(" ActionChains click successful")
except:
logging.warning(" All click methods failed")
if click_success:
explore_clicked = True
smart_backoff(4, 7, "Waiting after Explore click", driver)
# Check if we're on the right page
try:
WebDriverWait(driver, 10).until(
EC.presence_of_element_located((By.CSS_SELECTOR, "svg, canvas, div[jsname='hiK3ld']"))
)
logging.info(" Charts detected - success!")
except:
logging.info(" Charts not immediately detected, but continuing...")
except Exception as click_error:
logging.warning(f" Error clicking Explore button: {click_error}")
explore_clicked = False
else:
logging.warning(" CORRECT Explore button not found")
# ====== FALLBACK: PRESS ENTER ======
if not explore_clicked:
logging.info(" Fallback: Pressing Enter on search box...")
try:
# Re-find search box
trends_search_box = driver.find_element(
By.CSS_SELECTOR,
"input[aria-label*='earch'], input[placeholder*='earch'], input[type='text']"
)
trends_search_box.send_keys(Keys.RETURN)
logging.info(" Pressed Enter on search box")
smart_backoff(6, 10, "Waiting for results after Enter", driver)
explore_clicked = True # Consider this success
except Exception as e:
logging.error(f" Could not press Enter: {e}")
# Set the timeframe to "Past 12 months"
set_timeframe(driver, "Past 12 months")
# Check if charts loaded
if not check_charts_loaded(driver):
logging.warning(" Charts didn't load properly, trying recovery...")
# Multiple recovery attempts
for recovery_attempt in range(2):
logging.info(f" Recovery attempt {recovery_attempt + 1}/2")
smart_backoff(10, 15, f"Extended wait attempt {recovery_attempt + 1}", driver)
# Try refreshing
driver.refresh()
smart_backoff(6, 10, "Page refreshed", driver)
# Set timeframe again
set_timeframe(driver, "Past 12 months")
if check_charts_loaded(driver):
logging.info(" Charts loaded after recovery")
break
if recovery_attempt == 0:
# Try going back and searching again
driver.back()
smart_backoff(4, 7, "Went back to search", driver)
# Find search box again and retry
try:
trends_search_box = driver.find_element(By.CSS_SELECTOR, "input[aria-label='Search'][type='text']")
trends_search_box.click()
smart_backoff(1, 2)
trends_search_box.clear()
human_type_with_smart_correction(trends_search_box, keyword, context="trends_search")
smart_backoff(2, 4)
trends_search_box.send_keys(Keys.RETURN)
smart_backoff(8, 12, "Retried search", driver)
except:
pass
# PHASE 5: Wait for results with intelligent backoff
logging.info(" Phase 5: Loading results with smart backoff...")
# Progressive loading with increasing patience
load_stages = [
(random.uniform(3, 6), "Initial page load"),
(random.uniform(2.5, 5), "Content rendering"),
(random.uniform(2, 4), "Chart elements"),
(random.uniform(1, 3), "Final touches"),
]
for wait_time, stage in load_stages:
logging.info(f" {stage}...")
time.sleep(wait_time)
# Simulate human watching the load
if random.random() < 0.3:
simulate_human_scanning(driver, duration=1, intensity="very_light")
# Final scan of the results page
logging.info(" Final scan of results page...")
simulate_human_scanning(driver, duration=3, intensity="light")
# Verify we're in London
page_source = driver.page_source.lower()
# Avoid bare "uk"/"gb" here - they match substrings of unrelated words
if any(indicator in page_source for indicator in ["london", "united kingdom"]):
logging.info(" Confirmed: Viewing London trends")
else:
logging.warning(" London location not clearly detected")
# Additional wait for any final loading
smart_backoff(3, 7, "Final page stabilization", driver)
logging.info(" Complete human journey finished successfully!")
# Extract data from the page
trends_data = extract_trends_data_from_current_page(driver)
return driver, trends_data # Return both driver AND data
Step 15: Give your human journey a real and reassuring identity
Up next are the functions that establish who your browser appears to be and where it's connecting from.
How the pieces all fit together:
- assign_proxy_and_identity
This function pairs a proxy with a matching browser identity, and it handles both proxy formats (whitelisted and authenticated) so you can switch between them without any hassle.
- launch_stealth_browser
This one does the heavy lifting here. It takes that paired proxy + identity and fires up a genuine, undetected Chrome session, with all the right settings applied from the start.
It sets up proxy routing, window size, and language, user agent and hardware signals, all the necessary WebDriver stealth patches, timezone and geolocation settings, and even handles cookie consent before the session even gets to Google.
To make sure everything is going as expected, it verifies the proxy from inside the browser, gives the session a moment to settle, and then pauses briefly for good measure.
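The two proxy formats can be told apart purely by how many colon-separated parts the string has. Here is that parsing logic in isolation as a standalone sketch (the parse_proxy name is just for illustration; the full assign_proxy_and_identity function below does the same check with logging):

```python
def parse_proxy(proxy_raw: str) -> dict:
    """Classify a proxy string as whitelisted (host:port) or
    authenticated (host:port:user:pass)."""
    parts = proxy_raw.split(":")
    if len(parts) == 2:
        # Whitelisted: your IP is pre-approved, no credentials needed
        return {"host": parts[0], "port": parts[1], "auth": None}
    if len(parts) == 4:
        # Authenticated: username and password ride along in the string
        return {"host": parts[0], "port": parts[1], "auth": (parts[2], parts[3])}
    raise ValueError(f"Invalid proxy format: {proxy_raw}")
```

Keeping this check in one place means you can swap proxy providers (and formats) without touching the browser-launch code.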
Add this block to your code:
def assign_proxy_and_identity():
proxy_raw = random.choice(PROXIES)
identity = random.choice(IDENTITY_POOL)
# Parse proxy - handle whitelisted case
parts = proxy_raw.split(":")
if len(parts) == 2:
# Format: host:port (whitelisted)
proxy_host_port = proxy_raw
logging.info(f" London Proxy (Whitelisted): {proxy_host_port}")
elif len(parts) == 4:
# Format: host:port:user:pass
proxy_host_port = f"{parts[0]}:{parts[1]}"
logging.info(f" London Proxy (Authenticated): {proxy_host_port}")
else:
raise ValueError(f"Invalid proxy format: {proxy_raw}")
logging.info(f" Identity: {identity['user_agent'][:50]}...")
return proxy_raw, identity # Return full proxy string
# === Launch Stealth Browser (Enhanced) ===
def launch_stealth_browser(proxy_raw: str, identity: dict):
"""
Launch browser with WHITELISTED proxy (NO authentication needed)
Format: ultra.marsproxies.com:44443:ignore:ignore
"""
try:
# === PARSE PROXY ===
parts = proxy_raw.split(":")
if len(parts) < 2:
raise ValueError(f"Invalid proxy format: {proxy_raw}")
# Extract just host and port
host, port = parts[0], parts[1]
logging.info(f" Using WHITELISTED proxy: {host}:{port}")
logging.info(" No authentication needed (IP whitelisted)")
# === CHROME OPTIONS ===
options = uc.ChromeOptions()
# === CRITICAL: PROXY SETTING FOR WHITELISTED ===
options.add_argument(f"--proxy-server=http://{host}:{port}")
# === ESSENTIAL ARGUMENTS ===
options.add_argument(f"--window-size={identity['viewport'][0]},{identity['viewport'][1]}")
options.add_argument("--disable-blink-features=AutomationControlled")
options.add_argument("--disable-gpu")
options.add_argument("--no-sandbox")
options.add_argument("--disable-dev-shm-usage")
options.add_argument(f"--lang={identity['language'].split(',')[0]}")
options.add_argument(f'--user-agent={identity["user_agent"]}')
# === PROXY-SPECIFIC ARGUMENTS ===
options.add_argument("--ignore-certificate-errors")
options.add_argument("--allow-running-insecure-content")
options.add_argument("--disable-web-security")
# === ADDITIONAL STEALTH ARGUMENTS ===
options.add_argument("--disable-popup-blocking")
options.add_argument("--disable-notifications")
options.add_argument("--disable-background-timer-throttling")
options.add_argument("--disable-backgrounding-occluded-windows")
options.add_argument("--disable-breakpad")
options.add_argument("--disable-component-update")
options.add_argument("--disable-domain-reliability")
options.add_argument("--disable-features=AudioServiceOutOfProcess")
options.add_argument("--disable-hang-monitor")
options.add_argument("--disable-ipc-flooding-protection")
options.add_argument("--disable-renderer-backgrounding")
# === HEADLESS MODE ===
if HEADLESS:
options.add_argument("--headless=new")
# === LAUNCH CHROME ===
logging.info(" Launching Chrome with whitelisted proxy...")
driver = uc.Chrome(
options=options,
version_main=144,
headless=HEADLESS,
suppress_welcome=True,
use_subprocess=True,
)
# === STORE PROXY INFO ===
driver.proxy_info = {
'host': host,
'port': port,
'type': 'whitelisted_no_auth'
}
# === COMPREHENSIVE STEALTH SCRIPTS ===
stealth_scripts = [
"""
// Overwrite the navigator.webdriver property
Object.defineProperty(navigator, 'webdriver', {
get: () => undefined
});
""",
"""
// Overwrite the navigator.plugins property
Object.defineProperty(navigator, 'plugins', {
get: () => [1, 2, 3, 4, 5]
});
""",
"""
// Overwrite the navigator.languages property
Object.defineProperty(navigator, 'languages', {
get: () => ['en-US', 'en']
});
""",
"""
// Overwrite the navigator.connection property
Object.defineProperty(navigator, 'connection', {
get: () => ({
downlink: 10,
effectiveType: '4g',
rtt: 50,
saveData: false
})
});
""",
"""
// Spoof Chrome runtime
window.chrome = {
runtime: {},
loadTimes: function() {},
csi: function() {},
app: {}
};
""",
"""
// Add hairline to canvas
const originalGetContext = HTMLCanvasElement.prototype.getContext;
HTMLCanvasElement.prototype.getContext = function() {
const context = originalGetContext.apply(this, arguments);
if (context && context.constructor.name === 'CanvasRenderingContext2D') {
const originalFillText = context.fillText;
context.fillText = function() {
if (arguments[0] === '') {
arguments[0] = ' ';
}
return originalFillText.apply(this, arguments);
};
}
return context;
};
""",
f"""
// Set user agent
Object.defineProperty(navigator, 'userAgent', {{
get: () => '{identity['user_agent']}'
}});
// Set platform
Object.defineProperty(navigator, 'platform', {{
get: () => '{identity['platform']}'
}});
// Set hardware concurrency
Object.defineProperty(navigator, 'hardwareConcurrency', {{
get: () => {identity['hardware_concurrency']}
}});
// Set max touch points
Object.defineProperty(navigator, 'maxTouchPoints', {{
get: () => {identity['max_touch_points']}
}});
// Set language
Object.defineProperty(navigator, 'language', {{
get: () => '{identity['language'].split(',')[0]}'
}});
// Set languages
Object.defineProperty(navigator, 'languages', {{
get: () => ['en-US', 'en']
}});
// Set device memory
Object.defineProperty(navigator, 'deviceMemory', {{
get: () => {identity.get('device_memory', 8)}
}});
"""
]
# Execute all stealth scripts
for script in stealth_scripts:
try:
driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
"source": script
})
except Exception as e:
logging.debug(f"Could not execute stealth script: {e}")
# === SET LONDON TIMEZONE AND GEOLOCATION ===
try:
# Set London timezone
driver.execute_cdp_cmd("Emulation.setTimezoneOverride", {
"timezoneId": identity["timezone"]
})
# Set London geolocation
driver.execute_cdp_cmd("Emulation.setGeolocationOverride", {
"latitude": 51.5074,
"longitude": -0.1278,
"accuracy": 100
})
logging.info(" Set London timezone and geolocation")
except Exception as e:
logging.warning(f"Could not set location/timezone: {e}")
# === SET USER AGENT OVERRIDE ===
try:
driver.execute_cdp_cmd("Emulation.setUserAgentOverride", {
"userAgent": identity["user_agent"],
"acceptLanguage": identity["language"],
"platform": identity["platform"]
})
except Exception as e:
logging.debug(f"Could not set user agent override: {e}")
# ====== CRITICAL: SET COOKIES BEFORE FIRST NAVIGATION ======
# This must happen BEFORE any navigation to Google domains
logging.info(" Setting Google consent cookies...")
# Navigate to google.com/ncr (no country redirect) so cookies can be set on the Google domain
driver.get("https://www.google.com/ncr")
time.sleep(2)
# First try the UI dialog (most reliable)
handle_cookie_consent(driver, timeout=8)
# Then set the consent cookies directly as reinforcement
set_google_consent_cookies(driver)
# Refresh so the cookies apply to the session
driver.refresh()
time.sleep(2)
# === VERIFY PROXY IN BROWSER ===
logging.info(" Verifying proxy in browser...")
try:
driver.get("https://api.ipify.org?format=json")
time.sleep(3)
body = driver.find_element(By.TAG_NAME, "body").text
browser_ip_data = json.loads(body)
            browser_ip = browser_ip_data.get('ip', 'Unknown')
            logging.info(f" Browser IP through proxy: {browser_ip}")
            # Try to get proxy IP for comparison
            proxy_ip = None
            try:
                proxy_url = f"http://{host}:{port}"
                proxies = {'http': proxy_url, 'https': proxy_url}
                response = requests.get(
                    'https://api.ipify.org?format=json',
                    proxies=proxies,
                    timeout=15,
                    headers={'User-Agent': identity['user_agent']}
                )
                if response.status_code == 200:
                    proxy_ip = response.json()['ip']
                    # Try to get location info
                    try:
                        loc_response = requests.get(
                            f'https://ipapi.co/{proxy_ip}/json/',
                            timeout=5,
                            headers={'User-Agent': identity['user_agent']}
                        )
                        if loc_response.status_code == 200:
                            loc_data = loc_response.json()
                            proxy_country = loc_data.get('country_code', 'Unknown')
                            proxy_city = loc_data.get('city', 'Unknown')
                            logging.info(f" Proxy location: {proxy_city}, {proxy_country}")
                            driver.proxy_info['country'] = proxy_country
                            driver.proxy_info['city'] = proxy_city
                    except:
                        pass
            except:
                pass
            if proxy_ip and browser_ip == proxy_ip:
                logging.info(" Proxy confirmed working in browser!")
            elif browser_ip != 'Unknown':
                logging.info(f" Browser using proxy (IP: {browser_ip})")
            else:
                logging.warning(" Could not get browser IP")
            # Navigate back to blank page
            driver.get("about:blank")
            time.sleep(1)
        except Exception as e:
            logging.warning(f" Could not verify proxy in browser: {e}")
            # Navigate to blank page anyway
            try:
                driver.get("about:blank")
            except:
                pass
        # === WAIT FOR BROWSER TO STABILIZE ===
        smart_backoff(2, 4, "Browser stabilizing", driver)
        logging.info(" Browser launched successfully with whitelisted proxy and pre-set cookies!")
        return driver
    except Exception as e:
        logging.error(f" FAILED to launch browser with proxy: {e}")
        logging.error(f" Proxy: {proxy_raw}")
        logging.error(f" Error details: {type(e).__name__}")
        # Try to provide helpful error messages
        if "chrome not reachable" in str(e).lower():
            logging.error(" Tip: Check if Chrome is installed and up to date")
        elif "proxy" in str(e).lower():
            logging.error(" Tip: Proxy might be blocked or not whitelisted correctly")
        elif "timeout" in str(e).lower():
            logging.error(" Tip: Proxy might be too slow or not responding")
        raise Exception(f"Browser launch failed: {e}")

# === Convert Selenium Cookies to Dict ===
def get_cookies_as_dict(driver):
    cookies = driver.get_cookies()
    cookie_dict = {}
    for c in cookies:
        cookie_dict[c['name']] = c['value']
    return cookie_dict
Step 16: Write your Google Trends scraping logic
Believe it or not, the 2,000 lines of code we have written so far have all been dedicated to making sure this tiny next block runs just fine. We’re not relying on DOM scraping here because Google Trends runs on Angular, and Google can change class names frequently and without notice.
So instead of relying on backend elements that can change at any time, we rely on what Google actually shows the user, having given it every reason to believe we are a real user.
From there, regular expressions pull the data we need straight out of the visible page text.
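To make that idea concrete, here's a minimal, self-contained sketch of the kind of regex parsing involved. The function name, the pattern, and the sample text are illustrative assumptions - the real page text is messier, and the script's actual extraction helpers are more involved.

```python
import re

def parse_region_scores(page_text):
    """Pull 'Region NN' pairs out of visible page text.

    Illustrative only: the pattern is an assumption about the
    rendered text, not Google's actual markup.
    """
    # A capitalised name followed by a 1-3 digit score
    pattern = re.compile(r"([A-Z][A-Za-z ]+?)\s+(\d{1,3})\b")
    scores = {}
    for name, value in pattern.findall(page_text):
        value = int(value)
        if 0 <= value <= 100:  # Trends scores are normalized to 0-100
            scores[name.strip()] = value
    return scores

print(parse_region_scores("Greater London 100 Bristol 72 Leeds 64"))
# {'Greater London': 100, 'Bristol': 72, 'Leeds': 64}
```

The lazy quantifier keeps multi-word region names intact while the score bound filters out obvious non-scores.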
Add this short block to your script:
# === Main Scraping Function ===
def scrape_google_trends():
    """
    Main function that:
    1. Does the full human journey (Google → Google Trends → search gym membership)
    2. Extracts data using regex from the page
    3. Saves the data
    """
    for attempt in range(MAX_RETRIES):
        try:
            logging.info(f" Attempt {attempt+1}/{MAX_RETRIES}")
            # Assign proxy and identity
            proxy_raw, identity = assign_proxy_and_identity()
            # Launch browser
            smart_backoff(2, 5, "Starting browser")
            driver = launch_stealth_browser(proxy_raw, identity)
            logging.info(" Stealth browser launched")
            # Try to load previous cookies if they exist
            try:
                identity_hash = hash(str(identity))
                cookie_file = f"cookies_{identity_hash}.json"
                load_cookies(driver, cookie_file)
            except:
                pass
            # Complete human journey - NOW RETURNS DATA TOO
            driver, trends_data = complete_human_journey(driver, KEYWORD)
            # Check for blocks - with CAPTCHA-specific handling
            block_status = detect_block(driver)
            if block_status:
                if block_status == "CAPTCHA":
                    logging.error(" CAPTCHA encountered, cleaning up and trying new identity...")
                else:
                    logging.error(" Block detected, cleaning up...")
                # Try to save screenshot for debugging
                try:
                    timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
                    block_screenshot = f"blocked_{timestamp}.png"
                    driver.save_screenshot(block_screenshot)
                    logging.info(f" Saved block screenshot: {block_screenshot}")
                except:
                    pass
                clean_exit(driver)
                # Different backoff for CAPTCHA vs other blocks
                if block_status == "CAPTCHA":
                    # Longer backoff for CAPTCHA
                    captcha_backoff = 120 + random.uniform(0, 60)  # 2-3 minutes
                    logging.info(f" CAPTCHA encountered, waiting {captcha_backoff:.1f}s...")
                    time.sleep(captcha_backoff)
                else:
                    # Normal backoff for other blocks
                    if attempt < len(GOOGLE_BACKOFF_STEPS):
                        backoff_time = GOOGLE_BACKOFF_STEPS[attempt]
                    else:
                        backoff_time = GOOGLE_BACKOFF_STEPS[-1]
                    jitter = backoff_time * random.uniform(-0.2, 0.2)
                    total_wait = max(10, backoff_time + jitter)
                    logging.info(f" Backing off for {total_wait:.1f}s...")
                    time.sleep(total_wait)
                continue  # Try again with new attempt
            # ====== REGEX DATA EXTRACTION ======
            # (Data already extracted in complete_human_journey, but verify)
            if not trends_data or not trends_data.get('interest_by_subregion'):
                logging.warning(" Could not extract data from page, trying again...")
                trends_data = extract_trends_data_from_current_page(driver)
            # Take screenshot of the results
            timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
            screenshot_path = f"trends_{KEYWORD.replace(' ', '_')}_london_{timestamp}.png"
            driver.save_screenshot(screenshot_path)
            logging.info(f" Saved screenshot: {screenshot_path}")
            # Save cookies for next time
            try:
                identity_hash = hash(str(identity))
                cookie_file = f"cookies_{identity_hash}.json"
                save_cookies(driver, cookie_file)
            except:
                pass
            # Clean exit
            clean_exit(driver)
            # ====== SAVE THE DATA ======
            if trends_data:
                # Create enhanced data structure
                final_data = {
                    "extraction_info": {
                        "keyword": KEYWORD,
                        "human_journey": "Google → Google Trends → Search",
                        "location_target": GEO_HUMAN_NAME,
                        "extracted_at": datetime.now().isoformat(),
                        "method": "regex_text_parsing",
                        "screenshot": screenshot_path
                    },
                    "google_trends_data": trends_data,
                    "summary": {
                        "subregions_count": len(trends_data.get('interest_by_subregion', {})),
                        "timeline_points": len(trends_data.get('interest_over_time', {}).get('x_axis', [])),
                        "regional_data": trends_data.get('interest_by_subregion', {})
                    }
                }
                # Save JSON
                json_filename = f"trends_{KEYWORD.replace(' ', '_')}_london_{timestamp}.json"
                with open(json_filename, 'w', encoding='utf-8') as f:
                    json.dump(final_data, f, indent=2, ensure_ascii=False)
                logging.info(f" Saved data to: {json_filename}")
                # Print summary
                print("\n" + "="*60)
                print(" HUMAN JOURNEY COMPLETE - DATA EXTRACTED")
                print("="*60)
                print(f"Journey: Google Search → Google Trends → '{KEYWORD}'")
                print(f"Location: {GEO_HUMAN_NAME}")
                regions = trends_data.get('interest_by_subregion', {})
                if regions:
                    print(f"\n REGIONAL INTEREST DATA:")
                    for region, value in regions.items():
                        print(f" {region}: {value}")
                dates = trends_data.get('interest_over_time', {}).get('x_axis', [])
                if dates:
                    print(f"\n TIMELINE: {dates[0]} to {dates[-1]}")
                print(f"\n Files saved:")
                print(f" • Data (JSON): {json_filename}")
                print(f" • Screenshot: {screenshot_path}")
                print("="*60)
                return final_data
            else:
                raise Exception("Failed to extract any data from page")
        except Exception as e:
            logging.error(f"Attempt {attempt+1} failed: {e}")
            try:
                clean_exit(driver)
            except:
                pass
            # Exponential backoff
            if attempt < len(GOOGLE_BACKOFF_STEPS):
                backoff_time = GOOGLE_BACKOFF_STEPS[attempt]
            else:
                backoff_time = GOOGLE_BACKOFF_STEPS[-1]
            jitter = backoff_time * random.uniform(-0.2, 0.2)
            total_wait = max(10, backoff_time + jitter)  # Minimum 10 seconds
            logging.info(f" Backing off for {total_wait:.1f}s...")
            time.sleep(total_wait)
    logging.critical(" All attempts failed.")
    return None
Step 17: Execute your script
Now we add the bit that actually gets everything up and running - the main execution block. This one only gets triggered when you run the script directly.
We will add a little delay first, so that every run doesn't start at the same time. Once that delay is over, it runs the full scraping function by calling scrape_google_trends(). This function ties together everything we've been building.
After that's done, it checks to make sure we actually got something usable back before it starts parsing the returned JSON. It then logs a quick rundown of what we found: the top region, the full regional breakdown, the timeline range, the data points we were able to grab, and where we saved the screenshot.
If all our retries come up short and the run still fails, we log a clean message saying it failed, and the program exits. Here is the execution block to add to your script:
# ===== MAIN EXECUTION BLOCK =====
# This part runs when you execute: python your_script.py
if __name__ == "__main__":
    startup_delay = random.uniform(0, 15)
    logging.info(f" Starting in {startup_delay:.1f}s...")
    time.sleep(startup_delay)
    # Now returns the parsed data
    result_data = scrape_google_trends()
    if result_data is not None:
        logging.info(" HUMAN JOURNEY COMPLETED SUCCESSFULLY!")
        logging.info(f" Extracted data for '{KEYWORD}' in {GEO_HUMAN_NAME}")
        # Show quick summary
        trends_data = result_data.get('google_trends_data', {})
        regions = trends_data.get('interest_by_subregion', {})
        if regions:
            top_region = max(regions.items(), key=lambda x: x[1]) if regions else ("None", 0)
            logging.info(f" Highest interest: {top_region[0]} ({top_region[1]})")
            # Log all regions
            logging.info(" Regional breakdown:")
            for region, value in sorted(regions.items(), key=lambda x: x[1], reverse=True):
                logging.info(f" • {region}: {value}")
        # Log timeline info
        timeline = trends_data.get('interest_over_time', {})
        dates = timeline.get('x_axis', [])
        if dates:
            logging.info(f" Timeline range: {dates[0]} to {dates[-1]}")
            logging.info(f" Data points: {len(dates)}")
        # Log file info
        if result_data.get('extraction_info', {}).get('screenshot'):
            logging.info(f" Screenshot saved: {result_data['extraction_info']['screenshot']}")
        # Success message
        print("\n" + "="*60)
        print(" MISSION ACCOMPLISHED!")
        print("="*60)
        print(f"Your human journey successfully extracted Google Trends data.")
        print(f"Keyword: '{KEYWORD}'")
        print(f"Location: {GEO_HUMAN_NAME}")
        print(f"Method: Regex parsing from page text")
        print("="*60)
    else:
        logging.error(" Scraping failed - all attempts exhausted.")
That's the end result. If everything works, the script saves its output to your project's working folder: a JSON file plus a screenshot. That JSON is the final, tidied-up result of the whole human journey.
It captures the run's context, the structured Trends data we extracted, and a quick summary so you can sanity-check the scrape in no time.
Here's what the JSON file will look like with placeholders:
{
  "extraction_info": {
    "keyword": "<KEYWORD_STRING>",
    "human_journey": "Google → Google Trends → Search",
    "location_target": "<LOCATION_NAME>",
    "extracted_at": "<ISO_TIMESTAMP>",
    "method": "regex_text_parsing",
    "screenshot": "<SCREENSHOT_FILENAME>.png"
  },
  "google_trends_data": {
    "interest_over_time": {
      "x_axis": ["<DATE_1>", "<DATE_2>", "<DATE_3>"],
      "y_axis": ["<VALUE_1>", "<VALUE_2>", "<VALUE_3>"]
    },
    "interest_by_subregion": {
      "<REGION_1>": "<SCORE_1>",
      "<REGION_2>": "<SCORE_2>",
      "<REGION_3>": "<SCORE_3>"
    }
  },
  "summary": {
    "subregions_count": "<INT>",
    "timeline_points": "<INT>",
    "regional_data": {
      "<REGION_1>": "<SCORE_1>",
      "<REGION_2>": "<SCORE_2>",
      "<REGION_3>": "<SCORE_3>"
    }
  }
}
What all this means:
- extraction_info
Some metadata about the run - what keyword you searched for, what location you targeted, when the scrape happened, what method was used, and the exact screenshot file that was saved for debugging or verification.
- google_trends_data
Where the actual extracted payload lives:
◦ interest_over_time: Just two arrays that are lined up. x_axis holds the timeline labels, and y_axis holds the interest value for each matching point.
◦ interest_by_subregion: A map of the region to score, which gets pulled from what Google shows on the page.
- summary
A quick set of validation numbers that you can grab and go with:
◦ subregions_count: How many regions we managed to capture.
◦ timeline_points: How many time points we managed to capture.
◦ regional_data: Repeated in case you want to grab it quickly without having to dig around.
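If you'd rather sanity-check a saved result programmatically than eyeball the JSON, a small helper like this works. It's a sketch that assumes the structure shown above; note the scores are stored as strings, so they're cast to int for comparison.

```python
def summarize_trends(data):
    """Quick sanity summary of the structure the scraper saves."""
    trends = data.get('google_trends_data', {})
    regions = trends.get('interest_by_subregion', {})
    dates = trends.get('interest_over_time', {}).get('x_axis', [])
    # Scores are saved as strings, so cast before comparing
    top = max(regions.items(), key=lambda kv: int(kv[1])) if regions else None
    return {'regions': len(regions), 'points': len(dates), 'top_region': top}

sample = {
    "google_trends_data": {
        "interest_over_time": {"x_axis": ["2024-01-01", "2024-01-08"],
                               "y_axis": ["55", "100"]},
        "interest_by_subregion": {"Greater London": "100", "Bristol": "72"},
    }
}
print(summarize_trends(sample))
# {'regions': 2, 'points': 2, 'top_region': ('Greater London', '100')}
```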
Here's a link to the full code. You now have a fully functional Google Trends scraper setup that works with any keyword or location.
Advanced scraping techniques
Scraping related queries
The fairly large script we just built can pull basic keyword data without much trouble. However, you'll probably notice that it doesn't go after related queries or related topics. That's a deliberate choice, because those elements don't reliably appear in Trends results.
They are only likely to show up when a few specific requirements are met:
- There's enough search volume for the term
- Google has enough data to show comparisons
- The location you selected has access to the data
- The time frame is wide enough to surface a trend
Because those conditions don't automatically fall into place, the practical approach is to make your scraper pretty defensive. Rather than automatically assuming related queries or topics are present, add checks to confirm they actually exist before trying to collect them. If they do show up, collect the data. If they don't, just skip that part of the process cleanly.
From there, you need to start paying attention to logging. Your logs should give you a picture of whether related queries or topics were found on a particular run, not just whether the script ran to completion. That feedback is key to understanding when Google actually decides to expose that sort of data, and under what conditions.
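As a sketch of that defensive pattern, the helper below probes the visible text for an optional section and returns None when it isn't there. The heading string and the crude bounded slice are assumptions for illustration; real parsing would be stricter.

```python
import logging

def extract_optional_section(page_text, section_title):
    """Return the text after a section heading if it exists, else None.

    Hypothetical helper: the heading string and the bounded slice are
    assumptions about the rendered page, not verified behavior.
    """
    if section_title not in page_text:
        return None  # Section not rendered for this keyword/region/timeframe
    start = page_text.index(section_title) + len(section_title)
    return page_text[start:start + 500].strip() or None

related = extract_optional_section("abc Related queries rising: gym near me",
                                   "Related queries")
if related is None:
    logging.info("Related queries not present this run, skipping")
else:
    logging.info("Related queries found: %s", related)
```

Logging both outcomes, as above, is what gives you the run-by-run picture of when Google actually exposes the section.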
Automating with cron jobs or scripts
Let's say you want to run this scraper every day at 6 AM, get data for 10 keywords, and switch up locations. That's exactly where a cron job makes life a lot easier. You set it up once, and it runs on schedule without you having to babysit it.
Here's a high-level explanation of how you can do it:
- Step 1: Set up a dedicated folder for your scraper project. Drop the scraper into it, and then create subfolders for data, logs, and screenshots.
- Step 2: Write a script that navigates to the scraper directory, activates your Python environment, runs the scraper, and moves any output files into the right folders. Log each action as it happens.
- Step 3: Add in some error handling. That includes timeouts, retries, and sending notifications when something breaks.
- Step 4: Set up logging so you can see what ran, what failed, and where the output ended up.
- Step 5: Schedule the cron job for a specific time. Add the job to your cron editor, then run a test to make sure it runs the script the way you want it to.
- Step 6: Put it all into action, keep an eye on the first few runs, and then tweak things once you start seeing some real logs coming in.
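The steps above can be sketched as a small Python runner that cron invokes. Every path, folder name, and the scraper filename here are placeholder assumptions - adjust them to your own layout.

```python
# Hypothetical cron runner. Paths, folder names, and the scraper
# filename are placeholders, not the real project layout.
import shutil
import subprocess
import sys
from pathlib import Path

def build_run_dirs(base):
    """Create (or reuse) the data, logs, and screenshots subfolders."""
    dirs = {name: Path(base) / name for name in ("data", "logs", "screenshots")}
    for d in dirs.values():
        d.mkdir(parents=True, exist_ok=True)
    return dirs

def run_scraper(base, script="scraper.py", timeout=1800):
    """Run the scraper once, then file its outputs into the right folders."""
    dirs = build_run_dirs(base)
    result = subprocess.run([sys.executable, script], cwd=base, timeout=timeout)
    for pattern, dest in (("trends_*.json", "data"), ("*.png", "screenshots")):
        for f in Path(base).glob(pattern):
            shutil.move(str(f), dirs[dest] / f.name)
    return result.returncode

# Schedule it for 6 AM daily with `crontab -e`:
#   0 6 * * * /usr/bin/python3 /path/to/run_scraper.py >> /path/to/logs/cron.log 2>&1
```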
That's basically how you create a scraper system that gives your business a steady stream of data - just like a person, but without you having to lift a finger beyond setting and maintaining the schedule.
Dealing with rate limits & CAPTCHAs
Rate limits aren't going anywhere, and we've put real work into handling them. Google isn't exactly thrilled when someone scrapes its data with a bot, unless you're using the official API, of course.
The problem is that the API is still in its alpha phase, and you have to apply for access to use it. Even then, it's currently limited to approved researchers, so most of us can't get it right now.
That's why we've made a point of designing this scraper to be pretty resilient against rate limits. Just about every step it takes is designed to avoid getting blocked by Google. First of all, it doesn't just click through pages like a robot - it behaves like a real person by scrolling through a page at a snail's pace, taking in the sights and occasionally giving the page a little glance over.
There's also some light interaction built in, like hovering over links now and again to make things look as natural as possible. You'll find that nearly every step has a bit of randomness built into its timing. That's no accident. Try to rush it, and you'll find yourself getting blocked in no time.
And because no system's ever 100% foolproof, we have some checks in place to spot when we're getting blocked or a CAPTCHA shows up.
That brings us to the one downside to this approach: if a CAPTCHA does appear, the script will pause, send up a red flag, and let a human step in and sort it out. If it's not a CAPTCHA and just a temporary block getting in the way, the script retries the process.
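In spirit, that detection looks something like the sketch below: classify the rendered page source by marker strings. The markers are assumptions about what Google's interstitial pages contain, and the script's actual detect_block() may check more signals.

```python
def classify_block(page_source):
    """Rough block classifier over rendered HTML (illustrative markers)."""
    text = page_source.lower()
    captcha_markers = ("recaptcha", "i'm not a robot")
    block_markers = ("unusual traffic", "rate limit", "http error 429")
    if any(m in text for m in captcha_markers):
        return "CAPTCHA"   # Pause and flag for a human
    if any(m in text for m in block_markers):
        return "BLOCK"     # Back off and retry with a new identity
    return None            # Page looks normal

print(classify_block("<div class='g-recaptcha'></div>"))             # CAPTCHA
print(classify_block("detected unusual traffic from your network"))  # BLOCK
```

Checking CAPTCHA markers first matters, because a CAPTCHA page can also mention rate limiting, and the two cases call for different recovery paths.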
Rotating proxies
You probably noticed how we put London proxies to good use by pairing them with a London-specific identity pool and then rotating them. That pairing was deliberate.
If you need to pull data from London today and then switch to the U.S. or Africa, you need a separate list of proxies for each location. You also need user agents and identity pools that actually match the IP’s geographical location.
It's one thing to have a proxy, but another story to use it cleanly. Run a London proxy while your browser is set to French language preferences and your timezone to an African city, and those anomalies stick out like a sore thumb. They can trigger a soft block because the session spooks Google's anti-bot mechanisms.
How you integrate the proxy is also important. Here’s the traditional way to do it:
import requests

# One proxy applied to plain HTTP requests (both schemes need an entry)
proxies = {'http': 'http://proxy_ip:port', 'https': 'http://proxy_ip:port'}
response = requests.get(url, proxies=proxies)
It's simple enough, but it’s also much easier to flag because requests-based traffic follows fairly predictable patterns and doesn't give you any of the browser's realism. That's why we approach it differently:
options.add_argument(f"--proxy-server=http://{host}:{port}")
driver = uc.Chrome(options=options)
When you do it like this, Chrome runs the proxy natively. You still get complete JavaScript execution, and your traffic flows through an actual browser session, not a flimsy HTTP client.
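If you extend the scraper beyond London, one clean way to keep everything consistent is a per-location pool that hands out a proxy and an identity from the same place, so IP, language, and timezone always agree. The pool contents below are placeholders, not real endpoints.

```python
import random

# Placeholder pools: every proxy and identity in a pool belongs to the
# same location, so IP, language, and timezone stay consistent.
LOCATION_POOLS = {
    "london": {
        "proxies": ["uk-proxy-1:8000", "uk-proxy-2:8000"],
        "identities": [
            {"user_agent": "<UK_CHROME_UA>", "lang": "en-GB", "tz": "Europe/London"},
        ],
    },
    "new_york": {
        "proxies": ["us-proxy-1:8000"],
        "identities": [
            {"user_agent": "<US_CHROME_UA>", "lang": "en-US", "tz": "America/New_York"},
        ],
    },
}

def pick_session(location):
    """Pick a proxy and an identity from the same location pool."""
    pool = LOCATION_POOLS[location]
    return random.choice(pool["proxies"]), random.choice(pool["identities"])

proxy, identity = pick_session("london")
```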
Handling request headers
If you’re still seeing rate-limit weirdness or occasional CAPTCHA, there’s another lever you can pull: set modern Chrome headers as soon as the browser session starts. This helps the session look more legitimate at the network level, right from the very first request.
Add this inside launch_stealth_browser(), right after driver = uc.Chrome(...):
# In launch_stealth_browser() function, after driver = uc.Chrome(...)
# ADD MODERN CHROME HEADERS
logging.info(" Setting modern Chrome security headers...")
try:
    # Enable Network domain first
    driver.execute_cdp_cmd("Network.enable", {})
    # Set the exact headers modern Chrome sends
    driver.execute_cdp_cmd("Network.setExtraHTTPHeaders", {
        "headers": {
            # Language preferences
            "Accept-Language": "en-GB,en;q=0.9",
            # Content negotiation
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
            "Accept-Encoding": "gzip, deflate, br",
            # Security upgrades
            "Upgrade-Insecure-Requests": "1",
            # ====== ANTI-BOT CRITICAL HEADERS ======
            # Fetch Metadata API headers (Google checks these!)
            "Sec-Fetch-Dest": "document",
            "Sec-Fetch-Mode": "navigate",
            "Sec-Fetch-Site": "none",
            "Sec-Fetch-User": "?1",
            # Client Hints (Modern Chrome)
            "Sec-CH-UA": '"Chromium";v="121", "Google Chrome";v="121", "Not-A.Brand";v="99"',
            "Sec-CH-UA-Mobile": "?0",
            "Sec-CH-UA-Platform": f'"{identity["platform"]}"',
            # Do Not Track (common in UK browsers)
            "DNT": "1",
            # Cache control
            "Cache-Control": "max-age=0",
        }
    })
    logging.info(" Modern Chrome headers set successfully")
except Exception as e:
    logging.warning(f" Could not set extra headers: {e}")
Best practices for ethical scraping
Scraping publicly available data isn’t automatically illegal, but that doesn’t make it a free-for-all either. If you want to do this the right way, keep a few ethical basics in mind:
- Be careful with data types
Avoid collecting personal data. If you don’t need it for your use case, don’t store it.
- Keep your footprint small
Use low request rates and conservative concurrency, especially on sites that weren’t built to handle heavy automated traffic.
- Read the robots.txt file
It won’t answer every legal question, but it does tell you what the site is asking automated visitors to avoid.
Alternatives to Google Trends scraping
If the code above is too complex or too time-consuming to run and maintain, here are alternatives you could rely on:
| What it offers | Main limitation | Best when |
| --- | --- | --- |
| Programmatic Trends data | Access is gated via an alpha application program | You can get approved for the alpha and want an official route |
| Interest over time, interest by region, related topics, related queries | Still third-party extraction; you depend on vendor upkeep | You want Trends → JSON without running a scraper |
| Keyword popularity over time, location-specific popularity, related topics/queries | You have to work with their task workflow (post task, fetch result) | You're building a pipeline and need throughput plus predictable billing |
| Related topics, related queries, geo interest, interest over time, structured JSON | Vendor-dependent, like any third-party API | You want quick integration and consistent response shapes |
| Realtime trending searches, interest over time, interest by region, related queries and topics (varies by actor) | Quality, stability, and pricing vary by actor (subscription, usage, or per-result) | You want "run and export" without maintaining the scraping infrastructure |
| Google Trends "Explore"-style extraction with search types and filters | Documentation warns that scraped data may not always match browser output exactly | You need a managed approach and can tolerate occasional discrepancies |
| Google Trends search data (marketed as a Trends scraper), plus broader SERP tooling | Details vary by endpoint; verify the exact outputs you need in the documentation | You're already using a SERP vendor and want Trends in the same stack |
| Varies by provider - commonly interest over time, interest by region, related topics, and queries | Reliability and schemas vary a lot; requires vendor vetting | You want options fast and are willing to test multiple providers |
Conclusion
That’s a wrap from us. If you’ve read up to this point, kudos. Follow the steps we laid out, and you’ll be able to build a custom Google Trends scraper that behaves like a regular user. From here, you can tweak the code to pull data for whatever keywords and locations you need. If you want more tips on scraping Google Trends and other sites, head over to our Discord.
Is Google Trends scrapable?
Yes, it's scrapable. Google Trends delivers the data you see directly to your browser and renders it client-side. But it's not handed over in a neat and stable format. Elements pop up and disappear all the time, depending on search volume, location, and time range, which means scraping them reliably calls for some complicated logic and browser automation rather than just sending a few HTTP requests.
Is it legal to scrape Google Trends?
Scraping Google Trends is fine as long as you are just accessing publicly available information and not trying to bypass any paywalls, authentication, or private systems. It's the way you use the data that can get you into trouble - how aggressively you collect it, and whether you happen to violate Google's terms of service.
How do you get raw data from Google Trends?
Google doesn't provide totally raw data in the sense of actual search counts. What you get is some normalized data on a scale of 0 to 100. You can access this data by exporting CSV files from the interface, using third-party APIs that do the legwork for you, or by building your own scraper as we did.
What is the best way to scrape Google Trends?
The best approach is to use browser automation and ensure it has solid anti-detection measures in place. Google Trends relies heavily on JavaScript, and its data is rendered dynamically, so the lightweight approaches tend to break or get blocked. Our scraper is a strong foundation for scraping the site.
Is there an API for Google Trends?
Google does have an official Trends API, but it's still in alpha. Access is restricted, and it's mainly for researchers. For most practical purposes, people end up using paid third-party APIs, hosted scraping services, or rolling their own scraping solution.