Merge PDF Files using Python

This article was first published on PyShark and kindly contributed to python-bloggers.

In this tutorial we will explore how to merge PDF files using Python.


Introduction

Merging PDF files is often a required operation after scanning multiple pages of documents, or saving multiple pages as individual documents on your computer.

There are several software tools, such as those from Adobe, as well as online tools that can perform this task quickly. However, most of them are either paid or might not provide enough security features.

In this tutorial we will explore how to merge PDF files using Python on your computer with a few lines of code.

To follow this tutorial we will need the following Python library: PyPDF2.

If you don’t have it installed, please open “Command Prompt” (on Windows) and install it using the following command:

pip install PyPDF2
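After installing, it can be handy to confirm that the library is visible to the Python interpreter you are running, and which version was installed. The snippet below is a minimal sketch that uses only the standard library (Python 3.8 or later); the helper name installed_version is made up for this illustration and is not part of PyPDF2.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# installed_version is a hypothetical helper, not part of PyPDF2
version = installed_version("PyPDF2")
if version is None:
    print("PyPDF2 is not installed; run: pip install PyPDF2")
else:
    print("PyPDF2 version:", version)
```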

To continue with this tutorial we will need some PDF files to work with.

Here are the three PDF files we will use in this tutorial:



These PDF files will reside in the pdf_files folder, which is in the same directory as the main.py with our code.

Here is what the structure of my files looks like:

PDF files to merge using Python
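Before running any merge code, it may help to check that the input files are where the script expects them. The sketch below assumes the script is started from the folder that contains main.py and pdf_files, and uses the two file names from the first example in this tutorial; the helper name find_missing is hypothetical.

```python
from pathlib import Path


def find_missing(paths):
    """Return the paths from the given list that do not exist as files."""
    return [p for p in paths if not Path(p).is_file()]


# File names used in the first merge example of this tutorial
expected = ["pdf_files/sample_page1.pdf", "pdf_files/sample_page2.pdf"]

missing = find_missing(expected)
if missing:
    print("Missing files:", missing)
else:
    print("All input files found.")
```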

Merge two PDF files using Python

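In order to perform PDF merging in Python we will need to import the PdfFileMerger class from the PyPDF2 library and create an instance of it. Note that PdfFileMerger is the class name in PyPDF2 releases before 3.0; from version 3.0 on, the class is called PdfMerger.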

In this example we will merge two files: sample_page1.pdf and sample_page2.pdf.

In this case, the two file names can be placed into a list, which we will then iterate over, appending each file to the merger in turn:

from PyPDF2 import PdfFileMerger

# Create an instance of the PdfFileMerger() class
merger = PdfFileMerger()

# Create a list with file names
pdf_files = ['pdf_files/sample_page1.pdf', 'pdf_files/sample_page2.pdf']

# Iterate over the list of file names
for pdf_file in pdf_files:
    # Append each PDF file to the merger
    merger.append(pdf_file)

# Write out the merged PDF and release the file handles
merger.write("merged_2_pages.pdf")
merger.close()

You should now see merged_2_pages.pdf created in the same directory as the main.py file.
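The same steps can be wrapped in a small reusable function. The following is a sketch rather than the author's code: the function name merge_pdfs is hypothetical, it checks that every input file exists before touching PyPDF2, and it assumes that newer PyPDF2 releases expose the merger class as PdfMerger while older ones only provide PdfFileMerger.

```python
import os


def merge_pdfs(input_paths, output_path):
    """Merge the given PDF files, in order, into output_path."""
    # Fail early with a clear message if an input file is missing
    for path in input_paths:
        if not os.path.isfile(path):
            raise FileNotFoundError("Input PDF not found: " + path)

    # Import lazily so the check above works even without PyPDF2 installed.
    # Assumption: newer PyPDF2 releases name the class PdfMerger,
    # older ones only have PdfFileMerger.
    try:
        from PyPDF2 import PdfMerger as Merger
    except ImportError:
        from PyPDF2 import PdfFileMerger as Merger

    merger = Merger()
    try:
        for path in input_paths:
            merger.append(path)
        merger.write(output_path)
    finally:
        # Release the file handles even if something goes wrong
        merger.close()
```

It could then be called as merge_pdfs(['pdf_files/sample_page1.pdf', 'pdf_files/sample_page2.pdf'], 'merged_2_pages.pdf').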


Merge many PDF files using Python

In this section we will explore how to merge many PDF files using Python.

One way of merging many PDF files would be to add the file name of every PDF file to a list manually and then perform the same operation as in the previous section.

But what if we have 100 PDF files in the folder? Using the os library we can access all of the file names in a given directory as a list and iterate over it:

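from PyPDF2 import PdfFileMerger
import os

# Create an instance of the PdfFileMerger() class
merger = PdfFileMerger()

# Path to the folder with the PDF files
path_to_files = r'pdf_files/'

# Get the file names in the directory (os.walk also visits subfolders)
for root, dirs, file_names in os.walk(path_to_files):
    # Iterate over the file names in alphabetical order
    for file_name in sorted(file_names):
        # Skip anything that is not a PDF file
        if not file_name.lower().endswith('.pdf'):
            continue
        # Append the PDF file, building its path from the folder being visited
        merger.append(os.path.join(root, file_name))

# Write out the merged PDF and release the file handles
merger.write("merged_all_pages.pdf")
merger.close()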

You should now see merged_all_pages.pdf created in the same directory as the main.py file.
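The order in which file names come back from the file system is not guaranteed, and plain alphabetical sorting places page10.pdf before page2.pdf. If page order matters, one option is to build the list of files with a "natural" sort first and then append them as before. The helper names natural_key and list_pdfs below are illustrative only, and the sketch looks at a single folder without descending into subfolders.

```python
import re
from pathlib import Path


def natural_key(name):
    """Split a name into text and number chunks so that 2 sorts before 10."""
    # re.split with a capturing group alternates text, digits, text, ...
    # so the chunks at odd positions are always the digit runs.
    chunks = re.split(r"(\d+)", name)
    return [int(chunk) if index % 2 else chunk.lower()
            for index, chunk in enumerate(chunks)]


def list_pdfs(folder):
    """Return the paths of the PDF files in folder, in natural order."""
    pdf_names = [entry.name for entry in Path(folder).iterdir()
                 if entry.is_file() and entry.suffix.lower() == ".pdf"]
    return [str(Path(folder) / name)
            for name in sorted(pdf_names, key=natural_key)]


names = ["page10.pdf", "page2.pdf", "page1.pdf"]
print(sorted(names, key=natural_key))
# prints ['page1.pdf', 'page2.pdf', 'page10.pdf']
```

Each path returned by list_pdfs("pdf_files") can then be passed to merger.append() in a loop, in the same way as in the code above.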


Conclusion

In this article we explored how to merge multiple PDF files using Python.

Feel free to leave comments below if you have any questions or suggestions for edits, and check out more of my Python Programming tutorials.
