How to Scrape Google Results for Free Using Python

[This article was first published on Python – Predictive Hacks, and kindly contributed to python-bloggers].

Many paid services provide Google search results, and for good reason: in the right hands, Google results can be gold. In this post, we will show you how to get the results for free in a few lines of code.

# import the libraries we will need
import re
import urllib.parse  # "import urllib" alone does not guarantee that urllib.parse is loaded

import requests
from bs4 import BeautifulSoup
from fake_useragent import UserAgent

The key is to build the Google search URL from our keyword and the number of results we want. To do this, we URL-encode the keyword with urllib and add it to the URL as the q parameter. Let’s say our keyword is “elbow method python”.

keyword = "elbow method python"
html_keyword = urllib.parse.quote_plus(keyword)  # spaces become '+'
print(html_keyword)
elbow+method+python
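
To see why the encoding step matters, here is a minimal sketch with a keyword that contains characters that are special in a URL. The keyword "c++ & python" is only an illustrative assumption and is not used elsewhere in this post.

```python
from urllib.parse import quote_plus, unquote_plus

# hypothetical keyword containing characters that are special in a URL
tricky_keyword = "c++ & python"

# quote_plus turns spaces into '+' and percent-encodes '+' and '&'
encoded_keyword = quote_plus(tricky_keyword)
print(encoded_keyword)

# unquote_plus reverses the encoding and gives back the original keyword
print(unquote_plus(encoded_keyword))
```

Without the encoding, the & would be read as the start of a new query parameter and the keyword would be cut short.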

Now let’s build the Google URL.

number_of_result = 20
google_url = "https://www.google.com/search?q=" + html_keyword + "&num=" + str(number_of_result)
print(google_url)
https://www.google.com/search?q=elbow+method+python&num=20
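
As an alternative to concatenating strings, the query string can be built in one step with urllib.parse.urlencode, which applies the same encoding to every parameter. This is a minimal sketch; the helper name build_google_url is an assumption made for illustration.

```python
from urllib.parse import urlencode


# hypothetical helper name; it builds the same kind of URL as the concatenation
def build_google_url(keyword, number_of_result):
    # urlencode joins the parameters with '&' and encodes each value,
    # turning spaces into '+' just like quote_plus
    params = {"q": keyword, "num": number_of_result}
    return "https://www.google.com/search?" + urlencode(params)


print(build_google_url("elbow method python", 20))
```

Keeping the parameters in a dictionary makes it harder to forget the encoding when more parameters are added later.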

Next, we request the URL and parse the response. fake_useragent supplies a random User-Agent string, and Beautiful Soup parses the returned HTML.

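ua = UserAgent()
# send the random User-Agent as a header (a bare second argument would be sent as query parameters)
response = requests.get(google_url, headers={"User-Agent": ua.random})
soup = BeautifulSoup(response.text, "html.parser")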

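All that remains is to extract the information we want. In the code below, each result sits in a div with the class ZINbbc, and its link has the form /url?q=<target>&sa=..., so a regular expression can pull out the target URL.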

result = soup.find_all('div', attrs={'class': 'ZINbbc'})
# skip result blocks that contain no link at all
anchors = [i.find('a', href=True) for i in result]
results = [re.search(r'/url\?q=(.*)&sa', a['href']) for a in anchors if a is not None]

# in rare cases the href does not match the pattern, so drop the failed matches
links = [i.group(1) for i in results if i is not None]

links
['https://predictivehacks.com/k-means-elbow-method-code-for-python/',
 'https://www.scikit-yb.org/en/latest/api/cluster/elbow.html',
 'https://pythonprogramminglanguage.com/kmeans-elbow-method/',
 'https://www.geeksforgeeks.org/elbow-method-for-optimal-value-of-k-in-kmeans/',
 'https://blog.cambridgespark.com/how-to-determine-the-optimal-number-of-clusters-for-k-means-clustering-14f27070048f',
 'https://medium.com/analytics-vidhya/elbow-method-of-k-means-clustering-algorithm-a0c916adc540',
 'https://www.youtube.com/watch%3Fv%3Dqs8nfzUsW5U',
 'https://www.youtube.com/watch%3Fv%3DnMXg0f5HBac',
 'https://www.youtube.com/watch%3Fv%3DzQfEc7vA1gU',
 'https://stackoverflow.com/questions/41540751/sklearn-kmeans-equivalent-of-elbow-method',
 'https://campus.datacamp.com/courses/cluster-analysis-in-python/k-means-clustering-3%3Fex%3D6',
 'https://github.com/topics/elbow-method',
 'https://github.com/topics/elbow-method%3Fl%3Dpython',
 'https://towardsdatascience.com/clustering-metrics-better-than-the-elbow-method-6926e1f723a6',
 'https://vitalflux.com/k-means-elbow-point-method-sse-inertia-plot-python/',
 'https://www.kdnuggets.com/2019/10/clustering-metrics-better-elbow-method.html',
 'https://www.kaggle.com/abhishekyadav5/kmeans-clustering-with-elbow-method-and-silhouette',
 'https://realpython.com/k-means-clustering-python/',
 'https://pyclustering.github.io/docs/0.8.2/html/d3/d70/classpyclustering_1_1cluster_1_1elbow_1_1elbow.html',
 'https://jtemporal.com/kmeans-and-elbow-method/']
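
Some links in the output above still contain percent-encoded characters such as %3F and %3D (the YouTube links, for example). One way to get the decoded targets is to parse the redirect link with urllib.parse instead of a regular expression. The sketch below works under stated assumptions: the function name extract_target and the sample hrefs are made up for illustration and only imitate the /url?q=...&sa=... shape that the regular expression expects.

```python
from urllib.parse import urlsplit, parse_qs


def extract_target(href):
    """Return the decoded destination of a '/url?q=...' href, or None."""
    parts = urlsplit(href)
    if parts.path != "/url":
        return None  # not a redirect link
    # parse_qs splits the query string and percent-decodes every value
    targets = parse_qs(parts.query).get("q")
    return targets[0] if targets else None


# hypothetical hrefs in the shape matched by the regular expression
sample_hrefs = [
    "/url?q=https://www.youtube.com/watch%3Fv%3Dqs8nfzUsW5U&sa=U&ved=abc",
    "/url?q=https://realpython.com/k-means-clustering-python/&sa=U",
    "/search?q=elbow+method+python&start=10",
]

decoded_links = [t for t in map(extract_target, sample_hrefs) if t is not None]
print(decoded_links)
```

The YouTube link comes back with a normal ?v= query string, and the href that is not a redirect is skipped.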

And this is how you can scrape Google results using Python. To go even further, you can use a VPN to get Google results for different countries and cities.

The Google Results Scraper Function

Let’s sum it up in a single function.

def google_results(keyword, n_results):
    query = urllib.parse.quote_plus(keyword)  # URL-encode the keyword
    ua = UserAgent()
    google_url = "https://www.google.com/search?q=" + query + "&num=" + str(n_results)
    # send the random User-Agent as a request header
    response = requests.get(google_url, headers={"User-Agent": ua.random})
    soup = BeautifulSoup(response.text, "html.parser")
    result = soup.find_all('div', attrs={'class': 'ZINbbc'})
    # skip result blocks that contain no link at all
    anchors = [i.find('a', href=True) for i in result]
    results = [re.search(r'/url\?q=(.*)&sa', a['href']) for a in anchors if a is not None]
    # keep only the hrefs that matched the redirect pattern
    links = [i.group(1) for i in results if i is not None]
    return links

google_results('machine learning in python', 10)
['https://www.coursera.org/learn/machine-learning-with-python',
 'https://www.w3schools.com/python/python_ml_getting_started.asp',
 'https://machinelearningmastery.com/machine-learning-in-python-step-by-step/',
 'https://www.tutorialspoint.com/machine_learning_with_python/index.htm',
 'https://towardsai.net/p/machine-learning/machine-learning-algorithms-for-beginners-with-python-code-examples-ml-19c6afd60daa',
 'https://www.youtube.com/watch%3Fv%3DujTCoH21GlA',
 'https://www.youtube.com/watch%3Fv%3DRnFGwxJwx-0',
 'https://www.edx.org/course/machine-learning-with-python-a-practical-introduct',
 'https://scikit-learn.org/',
 'https://www.geeksforgeeks.org/introduction-machine-learning-using-python/']
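
A function like this is typically called for more than one keyword. The sketch below shows one possible way to do that with a pause between requests. The name collect_results, the fetch parameter, and the default pause are assumptions made for illustration; in practice fetch could be a small wrapper around google_results, while a stand-in is used here so the example runs without network access.

```python
import time


def collect_results(keywords, fetch, pause_seconds=2.0):
    """Call fetch(keyword) for every keyword, pausing between calls."""
    results = {}
    for position, keyword in enumerate(keywords):
        if position > 0:
            time.sleep(pause_seconds)  # wait before every request after the first
        results[keyword] = fetch(keyword)
    return results


# stand-in for the real scraper so the sketch can be tried offline
def fake_fetch(keyword):
    return ["https://example.com/" + keyword.replace(" ", "-")]


collected = collect_results(["elbow method python", "kmeans"], fake_fetch, pause_seconds=0)
print(collected)
```

With the real scraper, the call could look like collect_results(keywords, lambda kw: google_results(kw, 10)).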
