How to use Python and Excel together with pandas and openpyxl
Want to share your content on python-bloggers? click here.
In an earlier post I wrote about the role of Python in modern Excel and what spreadsheets users stand to gain by learning the programming language:
In this post, we’ll compare and contrast two essential packages using Python with Excel: pandas
and openpyxl
.
Why pandas
for Excel?
If you’re working with any kind of tabular data in Python, you won’t get far without pandas
. This package lets you, among other operations:
- Sort and filter rows
- Add, remove and transform columns
- Aggregate and reshape a table
- Merge or append multiple tables
Think of it like the Power Query of Python — it lets you build repeatable data cleaning and transformation processes. Like Power Query, pandas
can import data from a variety of sources… including Excel. It can even export the results of your analysis back to Excel.
The limitations of working with pandas
for Excel
That said, pandas
has limited features for deeply interacting with Excel workbooks. For example, it cannot help with the following tasks:
- Advanced formatting options for cells, such as applying specific styles or conditional formatting
- Support for executing Excel macros or VBA code within workbooks
- Direct access to Excel-specific features like data validation, charts, pivot tables, or formulas
- The ability to manipulate worksheets, such as renaming, adding, or deleting sheets
- Fine-grained control over workbook properties, such as password protection or worksheet visibility
- Handling of Excel-specific file formats like .xlsb or .xlsm
- Integration with Excel add-ins or plugins
Fortunately, several packages exist to provide these more advanced Python/Excel features, most notably openpyxl
.
What openpyxl
contributes
openpyxl
(pronounced “open pie Excel”) is a Python package providing functionality for working with Excel files, specifically the newer .xlsx file format. It allows users to read, write, and modify Excel spreadsheets programmatically. openpyxl
integrates smoothly with pandas
, allowing users to clean data using pandas
and add additional functionality to the workbook using openpyxl
.
Specifically, openpyxl
can help with the following tasks where pandas
cannot:
- Advanced formatting options for cells, such as applying specific styles or conditional formatting
- The ability to manipulate worksheets, such as renaming, adding, or deleting sheets
- Fine-grained control over workbook properties, such as password protection or worksheet visibility
- Working with named ranges and tables
- Adding images, shapes, and charts to Excel files
- Handling print settings, page layout, and page breaks
- Working with formulas and formula-related functionality in Excel
How to use openpyxl
with pandas
Let’s take a typical use case for automating a routine Excel business report where an analyst needs to generate monthly sales reports from multiple Excel worksheets. The analyst might read the data from each worksheet into pandas
DataFrames, then continue to use pandas
to clean and analyze the data. Finally, openpyxl
is used to generate a consolidated report in a new Excel workbook, which contains conditional formatting, charts, and more. The analyst could then use other Python tools to automate the distribution of the report.
For these and other tasks, the basic workflow for using pandas
with openpyxl
is like so:
- Read the data: Use
pandas
to extract data from a variety of sources into tabular DataFrames - Clean and analyze the data: Use
pandas
to clean and manipulate the data, perform calculations, apply filters, handle missing values, and derive relevant insights. - Generate the report: Use
openpyxl
to create a new Excel workbook or select an existing one. Populate the workbook with the consolidated data, applying conditional formatting, creating charts, and incorporating any required visual elements. - Save the report: Save the updated Excel workbook using
openpyxl
, specifying the desired filename and location. - Distribute and automate the report: Send the generated report to the intended recipients through email, file sharing platforms, or any preferred method.
Other Python packages for Excel
Powerful as it is for Excel tasks, especially when combined with pandas
, openpyxl
has limitations. Thankfully, other packages are available to handle specific use cases. Some other packages to be aware of:
xlsxwriter
: Similar toopenpyxl
,xlsxwriter
can be used to write data, formatting, and charts to Excel files in the .xlsx format. This package is optimized for performance, particularly when working with large datasets. It also offers more advanced cell formatting options compared toopenpyxl
. That said, as the name implies,xlsxwriter
can only handle writing data to Excel, whileopenpyxl
can both read and write.xlwings
: This package enables the automation of Excel tasks, including interacting with Excel workbooks, running VBA macros, and accessing Excel’s COM API on Windows. It provides complete two-way communication between Excel and Python in a way thatopenpyxl
cannot. On the other hand, this package requires a much more complex development environment, with many features only available on Windows.pyxll
: This is a paid library that enables users to write Excel add-ins using Python. Instead of automating Excel workbooks,pyxll
allows developers to build standalone applications for data science, financial trading, and other purposes.
Python with Excel, pandas
with openpyxl
… better together
In conclusion, openpyxl
and pandas
form a powerful duo for maximizing Excel’s potential through Python automation. They allow easy Excel file handling and provide robust data analysis capabilities. By leveraging both libraries, you can integrate Python seamlessly into Excel workflows, automate tasks, analyze data, and generate informative reports. Have you used openpyxl
or pandas
, individually or in combination? Share your experiences with these or other Excel-related packages in the comments below.
Want to share your content on python-bloggers? click here.