Download Image from URL using Python

This article was first published on PyShark and kindly contributed to python-bloggers.

In this tutorial we will explore how to download an image from a URL using Python.

Introduction

Working with images in Python has become a very popular topic in recent years. The tasks and automations range from simple image processing to more advanced projects like text extraction.

The training and testing images are usually either available locally or are downloaded from different websites.

Using Python we can automate downloading images from different URLs and Webpages.

To continue following this tutorial we will need the following two Python libraries: requests and beautifulsoup4.

Requests is a simple Python library that allows you to send HTTP requests.

Beautiful Soup is a Python library for pulling data out of HTML files.

If you don’t have them installed, please open “Command Prompt” (on Windows) and install them using the following commands:

pip install requests
pip install beautifulsoup4

Download image from URL using Python

In this section we will learn how to download an image from a URL using Python.

Here, we will assume you have the URL of the specific image (and not just a webpage).


As the first step, we will import the required dependency and define a function we will use to download images, which will have 3 inputs:

  1. url – URL of the specific image
  2. file_name – name for the saved image
  3. headers – the dictionary of HTTP Headers that will be sent with the request

import requests


def download_image(url, file_name, headers):
    ...  # the body is filled in over the next steps

Now we can send a GET request to the URL along with the headers, which will return a Response (a server’s response to an HTTP request):

import requests


def download_image(url, file_name, headers):

    # Send GET request
    response = requests.get(url, headers=headers)

If the HTTP request has been successfully completed, the response will have status code 200.

We are going to check if the status code is 200, and if it is, then we will save the image (which is the content of the response), otherwise we will print out the status code:

import requests


def download_image(url, file_name, headers):

    # Send GET request
    response = requests.get(url, headers=headers)

    # Save the image
    if response.status_code == 200:
        with open(file_name, "wb") as f:
            f.write(response.content)
    else:
        print(response.status_code)
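
A status code of 200 only says that the server answered successfully; it does not guarantee that the body is an image (an error page can also come back with 200). As a minimal sketch, the check could also look at the Content-Type header. The helper name looks_like_image is hypothetical and not part of the original code, and it takes plain values so it can be tried without sending a request:

```python
def looks_like_image(status_code, headers):
    """Return True when a response succeeded and declares an image body."""
    # Hypothetical helper: headers is any dict-like object, such as response.headers
    content_type = headers.get("Content-Type", "")
    return status_code == 200 and content_type.lower().startswith("image/")


# Example values shaped like response.status_code and response.headers
print(looks_like_image(200, {"Content-Type": "image/png"}))
print(looks_like_image(200, {"Content-Type": "text/html; charset=UTF-8"}))
print(looks_like_image(404, {"Content-Type": "image/png"}))
```

Inside download_image, this could replace the plain status check, called as looks_like_image(response.status_code, response.headers).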

The function to download an image from URL is ready and now we just need to define the url, file_name, and headers, and then run the code.

For example, one of the previous tutorials used some sample images, and we will download one of them.

The URL looks like this:

https://pyshark.com/wp-content/uploads/2022/05/sampletext1-ocr-539x450.png

You can see that it has the .png extension, meaning that this is a URL to a specific image.

We will save this image as ‘image1.png’.
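
Instead of typing the file name by hand, it can also be derived from the URL itself. The sketch below uses only the standard library; the function name file_name_from_url and the fallback name "image" are assumptions made for illustration:

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse


def file_name_from_url(url, default="image"):
    """Return the last path segment of a URL, or a default if there is none."""
    # urlparse separates the path from the query string and fragment
    path = unquote(urlparse(url).path)
    name = PurePosixPath(path).name
    return name or default


url = "https://pyshark.com/wp-content/uploads/2022/05/sampletext1-ocr-539x450.png"
print(file_name_from_url(url))
```

Because only the path component is used, a query string such as ?size=large does not end up in the file name.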

For the headers we are only using the User-Agent request header which lets the servers identify the application of the requesting user agent (a computer program representing a person, like a browser or an app accessing the Webpage).
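
To see what the headers dictionary does, the request can be built without being sent. This is a small sketch using the Request and prepare() objects from the requests library; nothing goes over the network here:

```python
import requests

headers = {"User-Agent": "Chrome/51.0.2704.103"}
url = "https://pyshark.com/wp-content/uploads/2022/05/sampletext1-ocr-539x450.png"

# Build the request without sending it
prepared = requests.Request("GET", url, headers=headers).prepare()

# The prepared request carries the method and the headers we passed in
print(prepared.method)
print(prepared.headers["User-Agent"])
```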

import requests


def download_image(url, file_name, headers):

    # Send GET request
    response = requests.get(url, headers=headers)

    # Save the image
    if response.status_code == 200:
        with open(file_name, "wb") as f:
            f.write(response.content)
    else:
        print(response.status_code)


if __name__ == "__main__":

    # Define HTTP Headers
    headers = {
        "User-Agent": "Chrome/51.0.2704.103",
    }

    # Define URL of an image
    url = "https://pyshark.com/wp-content/uploads/2022/05/sampletext1-ocr-539x450.png"

    # Define image file name
    file_name = "image1.png"

    # Download image
    download_image(url, file_name, headers)

Run the code and you should see image1.png created in the same directory as the main.py file that contains the code.

[Image: sample text for OCR]
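
The function above reads the whole image into memory through response.content, which is fine for small files. For larger images, one possible variation is to write the response in chunks. The sketch below assumes the response was requested with stream=True; the helper name save_streamed and the chunk size are illustrative choices, not part of the original code:

```python
def save_streamed(response, file_name, chunk_size=8192):
    """Write a streamed response to disk chunk by chunk; return bytes written."""
    written = 0
    with open(file_name, "wb") as f:
        # iter_content yields the body piece by piece instead of all at once
        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:  # skip empty chunks
                f.write(chunk)
                written += len(chunk)
    return written


# Possible usage with requests (not executed here):
# with requests.get(url, headers=headers, stream=True, timeout=10) as response:
#     if response.status_code == 200:
#         save_streamed(response, "image1.png")
```

The timeout value in the usage comment is also an assumption; requests waits indefinitely when no timeout is given, so setting one keeps the script from hanging on an unresponsive server.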

Download all images from Webpage using Python

In this section we will learn how to download all images from a URL (specifically a webpage) using Python.

In the previous section we worked with an image-specific URL and only downloaded a single image.

In this section we will be working with the URL of a webpage (not a specific image) and downloading all images from the webpage.

This section will have 2 main parts:

  1. Extract all image links from a Webpage
  2. Download all images

As the first step, we will import the required dependencies and define a function we will use to extract image links, which will have 2 inputs:

  1. webpage – URL of the specific webpage with images
  2. headers – the dictionary of HTTP Headers that will be sent with the request

import requests
from bs4 import BeautifulSoup


def extract_image_links(webpage, headers):
    ...  # the body is filled in over the next steps

Now we can send a GET request to the URL along with the headers, which will return a Response (a server’s response to an HTTP request):

import requests
from bs4 import BeautifulSoup


def extract_image_links(webpage, headers):

    # Send GET request
    response = requests.get(webpage, headers=headers)

If the HTTP request has been successfully completed, we should receive Response code 200.

We are going to check if the response code is 200, and if it is, then we will:

  1. Parse the HTML content of the webpage
  2. Traverse the tree to find all ‘img’ tags
  3. Extract ‘src’ attribute of every image
  4. Filter for PNG format image links
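
Before building the function step by step, the four steps above can be tried on a small HTML string without sending any request. The HTML and the example.com URLs below are made up for illustration and stand in for response.content:

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for response.content
html = """
<html><body>
  <img src="https://example.com/logo.svg" alt="logo">
  <img src="https://example.com/photo1.png" alt="first">
  <img alt="no source attribute">
  <img src="https://example.com/photo2.png" alt="second">
</body></html>
"""

# 1. Parse the HTML content
soup = BeautifulSoup(html, "html.parser")

# 2. Find all img tags
images = soup.find_all("img")

# 3. Read the src attribute of every image (None if the tag has no src)
sources = [img.get("src") for img in images]

# 4. Keep only the links that end with .png
png_links = [src for src in sources if src and src.endswith(".png")]

print(png_links)
```

Note that the third tag has no src attribute, which is why the sketch reads it with get() rather than indexing.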

Step 1: Parse the HTML content of the webpage

import requests
from bs4 import BeautifulSoup


def extract_image_links(webpage, headers):

    # Send GET request
    response = requests.get(webpage, headers=headers)

    # Check if the status_code is 200
    if response.status_code == 200:

        # Parse the HTML content of the webpage
        soup = BeautifulSoup(response.content, 'html.parser')

Step 2: Traverse the tree to find all ‘img’ tags

import requests
from bs4 import BeautifulSoup


def extract_image_links(webpage, headers):

    # Send GET request
    response = requests.get(webpage, headers=headers)

    # Check if the status_code is 200
    if response.status_code == 200:

        # Parse the HTML content of the webpage
        soup = BeautifulSoup(response.content, 'html.parser')

        # Find all of the image tags:
        images = soup.find_all('img')

If you print images, you will see a list of image tags along with all of their attributes.

Step 3: Extract ‘src’ attribute of every image

What we want to do is extract the ‘src’ attribute from each image tag which will be the URL of an image.

import requests
from bs4 import BeautifulSoup


def extract_image_links(webpage, headers):

    # Send GET request
    response = requests.get(webpage, headers=headers)

    # Check if the status_code is 200
    if response.status_code == 200:

        # Parse the HTML content of the webpage
        soup = BeautifulSoup(response.content, 'html.parser')

        # Find all of the image tags:
        images = soup.find_all('img')

        # Extract 'src' attribute of every image, skipping tags without one
        image_links = []
        for image in images:
            src = image.get('src')
            if src:
                image_links.append(src)

If you print out image_links, you will see links to all images on the webpage including logos that are in SVG format as well as embedded images.
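
On other webpages the src values may be relative paths, and embedded images usually appear as data: URIs rather than downloadable links. A minimal sketch of one way to normalise such a list with the standard library is shown below; the function name to_absolute_links and the example.com URLs are hypothetical:

```python
from urllib.parse import urljoin


def to_absolute_links(webpage, sources):
    """Resolve src values against the page URL, skipping empty and embedded ones."""
    links = []
    for src in sources:
        if not src or src.startswith("data:"):
            continue  # missing src, or an image embedded as a data URI
        # urljoin leaves absolute URLs as they are and resolves relative ones
        links.append(urljoin(webpage, src))
    return links


sources = ["/img/a.png", "b.png", None, "data:image/png;base64,AAAA",
           "https://cdn.example.com/c.png"]
print(to_absolute_links("https://example.com/blog/post/", sources))
```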

Step 4: Filter for PNG format image links

import requests
from bs4 import BeautifulSoup


def extract_image_links(webpage, headers):

    # Send GET request
    response = requests.get(webpage, headers=headers)

    # Check if the status_code is 200
    if response.status_code == 200:

        # Parse the HTML content of the webpage
        soup = BeautifulSoup(response.content, 'html.parser')

        # Find all of the image tags:
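        images = soup.find_all('img')

        # Extract 'src' attribute of every image, skipping tags without one
        image_links = []
        for image in images:
            src = image.get('src')
            if src:
                image_links.append(src)

        # Filter for PNG format image links
        image_links = [image for image in image_links if image.endswith('.png')]

        return image_links

    # Report the status code and return no links if the request failed
    print(response.status_code)
    return []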

If you print out image_links now, you should see the following image links:

['https://pyshark.com/wp-content/uploads/2022/05/Extract-Text-from-Image-using-Python-840x400.png',
'https://pyshark.com/wp-content/uploads/2022/05/sampletext1-ocr.png',
'https://pyshark.com/wp-content/uploads/2022/05/sampletext2-ocr.png',
'https://pyshark.com/wp-content/uploads/2022/05/sampletext3-ocr.png',
'https://pyshark.com/wp-content/uploads/2022/05/image-10.png',
'https://pyshark.com/wp-content/uploads/2022/04/SharpestMinds-Data-Science-Mentorship-1024x190.png']
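
The endswith('.png') filter works for the links above, but it would miss a link such as image.PNG or image.png?ver=2. If that matters for the webpage being scraped, one possible variation checks only the path part of the URL and ignores case. The function name has_extension is hypothetical:

```python
from urllib.parse import urlparse


def has_extension(url, extensions=(".png",)):
    """Check the extension of the URL path, ignoring case and any query string."""
    path = urlparse(url).path.lower()
    return path.endswith(tuple(extensions))


links = [
    "https://example.com/a.PNG",
    "https://example.com/b.png?ver=2",
    "https://example.com/logo.svg",
]
print([link for link in links if has_extension(link)])
```

Passing a longer tuple, for example (".png", ".jpg"), would keep more than one image format.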

2. Download all images

In order to download each image from URL using Python we will iterate over image_links and use the download_image function from the previous section to download each image. Here is some sample code that we will use:

for i, url in enumerate(image_links):
    file_name = f'image_{i}.png'
    download_image(url, file_name, headers)

Now let’s put all of this code together!

Let’s use the URL of one of my previous tutorials that explained how to extract text from images using Python. The webpage contains several images and we will learn how to download all of them.

The URL looks like this:

https://pyshark.com/extract-text-from-image-using-python/

As in the previous section, for the headers we are only using the User-Agent request header.


Complete code:

import requests
from bs4 import BeautifulSoup


def download_image(url, file_name, headers):
    # Send GET request
    response = requests.get(url, headers=headers)

    # Save the image
    if response.status_code == 200:
        with open(file_name, "wb") as f:
            f.write(response.content)
    else:
        print(response.status_code)


def extract_image_links(webpage, headers):
    # Send GET request
    response = requests.get(webpage, headers=headers)
    
    # Check if the status_code is 200
    if response.status_code == 200:
        
        # Parse the HTML content of the webpage
        soup = BeautifulSoup(response.content, 'html.parser')
        
        # Find all of the image tags:
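        images = soup.find_all('img')

        # Extract 'src' attribute of every image, skipping tags without one
        image_links = []
        for image in images:
            src = image.get('src')
            if src:
                image_links.append(src)

        # Filter for PNG format image links
        image_links = [image for image in image_links if image.endswith('.png')]

        return image_links

    # Report the status code and return no links if the request failed
    print(response.status_code)
    return []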


if __name__ == "__main__":

    # Define HTTP Headers
    headers = {
        "User-Agent": "Chrome/51.0.2704.103",
    }

    # Define URL of the webpage
    webpage = 'https://pyshark.com/extract-text-from-image-using-python/'

    # Extract image links
    image_links = extract_image_links(webpage, headers)

    # Download all images
    for i, url in enumerate(image_links):
        file_name = f'image_{i}.png'
        download_image(url, file_name, headers)

Run the code and you should see 6 image files created in the same directory as the main.py file that contains the code.

Note:
You may see more images downloaded depending on when you run the script since my website has ads running on it, so the ad publisher logo can be downloaded as an image.


Conclusion

In this article we explored how to download images from URLs and webpages using Python.

Feel free to leave comments below if you have any questions or have suggestions for some edits and check out more of my Python Programming tutorials.
