Reports are everywhere, so any tech professional must know how to create them. It’s a tedious and time-consuming task, which makes it a perfect candidate for automation with Python.
You can benefit from automated report generation whether you're a data scientist or a software developer. For example, data scientists might use reports to show the performance or explanations of machine learning models.
This article will teach you how to make data-visualization-based reports and save them as PDFs. To be more precise, you’ll learn how to combine multiple data visualizations (dummy sales data) into a single PDF file.
And the best thing is – it’s easier than you think!
You can download the Notebook with the source code here.
You can't have reports without data. That's why you'll have to generate some first; more on that in a bit.
Let's start with the imports. You'll need a bunch of things, but the FPDF library is likely the only unknown. Put simply, it's used to create PDFs, and you'll work with it a bit later. Refer to the following snippet for the imports:
Let’s generate some fake data next. The idea is to declare a function that returns a data frame of dummy sales data for a given month. It does that by constructing a date range for the entire month and then assigning the sales amount as a random integer within a given range.
You can use the calendar library to get the last day of any year/month combination. Here's the entire code snippet:
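A minimal sketch of such a function follows. The column names (Date, ItemsSold), the optional year parameter (defaulting to 2020), and the 1000 to 2000 sales range are assumptions chosen for illustration:

```python
import calendar
from datetime import datetime

import numpy as np
import pandas as pd


def generate_sales_data(month: int, year: int = 2020) -> pd.DataFrame:
    """Return one row of dummy sales data per day of the given month."""
    # calendar.monthrange() returns (weekday of the 1st, number of days in the month);
    # the number of days is also the last day of the month
    last_day = calendar.monthrange(year, month)[1]
    dates = pd.date_range(
        start=datetime(year, month, 1),
        end=datetime(year, month, last_day),
    )
    # Random integer sales amounts; the 1000-2000 range is an arbitrary assumption
    # (np.random.randint excludes the upper bound)
    sales = np.random.randint(low=1000, high=2000, size=len(dates))
    return pd.DataFrame({"Date": dates, "ItemsSold": sales})
```

Calling generate_sales_data(month=3).head() shows the first five rows; the numbers differ on every run because they're random.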
A call to generate_sales_data(month=3) generated 31 data points for March 2020. Here's what the first couple of rows look like:
And that’s it – you now have a function that generates dummy sales data. Let’s see how to visualize it next.
Your next task is to create a function that visualizes the dataset created earlier as a line plot. It's the most appropriate visualization type, as you're dealing with time series data.
Here’s the function for data visualization and an example call:
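The snippet below is a sketch of such a function, not the original. It assumes a data frame with Date and ItemsSold columns (hypothetical names), and the colors, font sizes, and figure size are arbitrary styling choices. Selecting Matplotlib's non-interactive Agg backend makes explicit that charts are only written to disk:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: the chart is written to disk, never displayed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot(data: pd.DataFrame, filename: str) -> None:
    """Save a line plot of daily sales to `filename` without displaying it."""
    first_day = data["Date"].iloc[0]
    plt.figure(figsize=(12, 4))  # wide, short chart so several fit on one PDF page
    plt.grid(color="#F2F2F2", zorder=0)
    plt.plot(data["Date"], data["ItemsSold"], color="#087E8B", lw=3, zorder=5)
    plt.title(f"Sales {first_day.year}/{first_day.month}", fontsize=17)
    plt.xlabel("Period", fontsize=13)
    plt.ylabel("Number of items sold", fontsize=13)
    plt.xticks(fontsize=9)
    plt.yticks(fontsize=9)
    # Save instead of plt.show(), then close the figure to free memory
    plt.savefig(filename, dpi=300, bbox_inches="tight", pad_inches=0)
    plt.close()


# Example call for December 2020 with freshly generated dummy data
dates = pd.date_range("2020-12-01", "2020-12-31")
december = pd.DataFrame({
    "Date": dates,
    "ItemsSold": np.random.randint(1000, 2000, size=len(dates)),
})
plot(data=december, filename="test.png")
```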
In a nutshell, you're creating a line plot, setting the title, and adjusting the fonts; nothing special. The visualization isn't shown to the user but is instead saved to disk. You'll see later how powerful this can be.
An example call saves a visualization for December 2020. Here's what it looks like:
And that's your visualization function. Only one step remains before you can create PDF documents: saving all the visualizations and defining the report's page structure.
Create a PDF page structure
The task now is to create a function that does the following:
- Creates a folder for charts: deletes it if it exists and re-creates it
- Saves a data visualization for every month in 2020 except January, so you can see how to work with a different number of elements per page (feel free to include January too)
- Creates a PDF matrix from the visualizations: a 2-dimensional matrix where each row represents a single page in the PDF report
Here’s the code snippet for the function:
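Here's a sketch of that logic under a few assumptions: the folder is named plots, three charts go on a page, and chart creation is passed in as a hypothetical save_chart(month, path) callable so the folder and matrix handling can be shown on their own. In a notebook, that callable would wrap the plotting and data-generation functions from the previous steps:

```python
import os
import shutil
from typing import Callable, List

PLOT_DIR = "plots"  # assumed folder name


def construct(save_chart: Callable[[int, str], None], per_page: int = 3) -> List[List[str]]:
    """Save one chart per month (Feb-Dec) and group the file paths into PDF pages.

    save_chart(month, path) is a hypothetical hook that writes one chart to `path`.
    """
    # Start from an empty charts folder: delete it if it exists, then re-create it
    shutil.rmtree(PLOT_DIR, ignore_errors=True)
    os.makedirs(PLOT_DIR)

    # Every month of 2020 except January, so the last page holds fewer charts
    for month in range(2, 13):
        save_chart(month, os.path.join(PLOT_DIR, f"{month}.png"))

    # Sort numerically by month ("3.png" -> 3), not alphabetically as strings
    files = sorted(os.listdir(PLOT_DIR), key=lambda name: int(name.split(".")[0]))
    paths = [os.path.join(PLOT_DIR, name) for name in files]

    # 2-D matrix: each row holds the chart paths for one PDF page
    return [paths[i:i + per_page] for i in range(0, len(paths), per_page)]


def touch(month: int, path: str) -> None:
    # Stand-in for the real plotting call: just creates an empty file
    open(path, "wb").close()


pages = construct(touch)
print(pages)
```

With eleven months (February to December) and three charts per page, the matrix has four rows: three full pages and a last one with two charts.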
It's possibly a lot to digest, so go over it line by line; the comments should help. The idea behind sorting is to obtain the month's integer representation from the file name (e.g., 3 from "3.png") and use that value to sort the charts. Sorting the names as plain strings would put "10.png" before "2.png". Delete this line if the order doesn't matter, but with months, it does.
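os.listdir() returns file names in arbitrary order, and sorting them as strings compares them character by character. A quick check with a few hypothetical file names shows the difference a numeric key makes:

```python
# Chart file names as they might come back from os.listdir() (order is arbitrary)
files = ["10.png", "2.png", "12.png", "3.png", "11.png"]

# Plain string sort: "1..." sorts before "2...", so October comes before February
alphabetical = sorted(files)
print(alphabetical)  # ['10.png', '11.png', '12.png', '2.png', '3.png']

# Numeric key: "3.png" -> "3" -> 3
numeric = sorted(files, key=lambda name: int(name.split(".")[0]))
print(numeric)       # ['2.png', '3.png', '10.png', '11.png', '12.png']
```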
Here's an example call of the function:
You should see the following in your Notebook after running the above snippet:
In case you're wondering, here's how the plots/ folder looks on my machine after calling the function:
And that’s all you need to construct PDF reports – you’ll learn how to do that next.
Create PDF reports
This is where everything comes together. You'll now create a custom class that inherits from FPDF. This way, all of FPDF's properties and methods are available in your class, as long as you don't forget to call super().__init__() in the constructor. The constructor will also hold values for page width and height (A4 paper, 210 x 297 mm). Besides the constructor, the class implements four methods:
- header(): defines the document header. A custom logo is placed on the left (make sure you have one, or delete this line of code), and hardcoded text is placed on the right
- footer(): defines the document footer. It simply shows the page number
- page_body(): defines what a page looks like. This depends on the number of visualizations shown per page, so positions and margins are set accordingly (feel free to play around with the values)
- print_page(): adds a blank page and fills it with content
Here’s the entire code snippet for the class:
Now it’s time to instantiate it and to append pages from the 2-dimensional content matrix:
The above cell will take some time to execute. Depending on your FPDF version, it may return an empty string when done; that's expected, as your report is saved to a file in the folder where the Notebook is stored.
Here's what the first page of the report should look like:
Of course, yours will look different because of a different logo and the randomly generated sales data.
And that’s how you create data-visualization-powered PDF reports with Python. Let’s wrap things up next.
You've learned many things today: how to create dummy data for any occasion, how to visualize it, and how to embed the visualizations into a single PDF report. Embedding your own visualizations will require only minimal code changes, mostly to positioning and margins.
Let me know if you’d like to see a guide for automated report creation based on machine learning model interpretations (SHAP or LIME) or something else related to data science.
Thanks for reading.
The post How to Create PDF Reports with Python – The Essential Guide appeared first on Better Data Science.