How to use Python and Excel together with pandas and openpyxl

This article was first published on Stringfest Analytics , and kindly contributed to python-bloggers. (You can report issue about the content on this page here)
Want to share your content on python-bloggers? click here.

In an earlier post I wrote about the role of Python in modern Excel and what spreadsheets users stand to gain by learning the programming language:

In this post, we’ll compare and contrast two essential packages using Python with Excel: pandas and openpyxl.

Why pandas for Excel?

If you’re working with any kind of tabular data in Python, you won’t get far without pandas. This package lets you, among other operations:

  • Sort and filter rows
  • Add, remove and transform columns
  • Aggregate and reshape a table
  • Merge or append multiple tables

Think of it like the Power Query of Python — it lets you build repeatable data cleaning and transformation processes. Like Power Query, pandas can import data from a variety of sources… including Excel. It can even export the results of your analysis back to Excel.

The limitations of working with pandas for Excel

That said, pandas has limited features for deeply interacting with Excel workbooks. For example, it cannot help with the following tasks:

  • Advanced formatting options for cells, such as applying specific styles or conditional formatting
  • Support for executing Excel macros or VBA code within workbooks
  • Direct access to Excel-specific features like data validation, charts, pivot tables, or formulas
  • The ability to manipulate worksheets, such as renaming, adding, or deleting sheets
  • Fine-grained control over workbook properties, such as password protection or worksheet visibility
  • Handling of Excel-specific file formats like .xlsb or .xlsm
  • Integration with Excel add-ins or plugins

Fortunately, several packages exist to provide these more advanced Python/Excel features, most notably openpyxl.

What openpyxl contributes

openpyxl (pronounced “open pie Excel”) is a Python package providing functionality for working with Excel files, specifically the newer .xlsx file format. It allows users to read, write, and modify Excel spreadsheets programmatically. openpyxl integrates smoothly with pandas, allowing users to clean data using pandas and add additional functionality to the workbook using openpyxl.

Specifically, openpyxl can help with the following tasks where pandas cannot:

  • Advanced formatting options for cells, such as applying specific styles or conditional formatting
  • The ability to manipulate worksheets, such as renaming, adding, or deleting sheets
  • Fine-grained control over workbook properties, such as password protection or worksheet visibility
  • Working with named ranges and tables
  • Adding images, shapes, and charts to Excel files
  • Handling print settings, page layout, and page breaks
  • Working with formulas and formula-related functionality in Excel

How to use openpyxl with pandas

Let’s take a typical use case for automating a routine Excel business report where an analyst needs to generate monthly sales reports from multiple Excel worksheets. The analyst might read the data from each worksheet into pandas DataFrames, then continue to use pandas to clean and analyze the data. Finally, openpyxl is used to generate a consolidated report in a new Excel workbook, which contains conditional formatting, charts, and more. The analyst could then use other Python tools to automate the distribution of the report.

For these and other tasks, the basic workflow for using pandas with openpyxl is like so:

  1. Read the data: Use pandas to extract data from a variety of sources into tabular DataFrames
  2. Clean and analyze the data: Use pandas to clean and manipulate the data, perform calculations, apply filters, handle missing values, and derive relevant insights.
  3. Generate the report: Use openpyxl to create a new Excel workbook or select an existing one. Populate the workbook with the consolidated data, applying conditional formatting, creating charts, and incorporating any required visual elements.
  4. Save the report: Save the updated Excel workbook using openpyxl, specifying the desired filename and location.
  5. Distribute and automate the report: Send the generated report to the intended recipients through email, file sharing platforms, or any preferred method.

Other Python packages for Excel

Powerful as it is for Excel tasks, especially when combined with pandas, openpyxl has limitations. Thankfully, other packages are available to handle specific use cases. Some other packages to be aware of:

  • xlsxwriter: Similar to openpyxl, xlsxwriter can be used to write data, formatting, and charts to Excel files in the .xlsx format. This package is optimized for performance, particularly when working with large datasets. It also offers more advanced cell formatting options compared to openpyxl. That said, as the name implies, xlsxwriter can only handle writing data to Excel, while openpyxl can both read and write.
  • xlwings: This package enables the automation of Excel tasks, including interacting with Excel workbooks, running VBA macros, and accessing Excel’s COM API on Windows. It provides complete two-way communication between Excel and Python in a way that openpyxl cannot. On the other hand, this package requires a much more complex development environment, with many features only available on Windows.
  • pyxll: This is a paid library that enables users to write Excel add-ins using Python. Instead of automating Excel workbooks, pyxll allows developers to build standalone applications for data science, financial trading, and other purposes.

Python with Excel, pandas with openpyxl… better together

In conclusion, openpyxl and pandas form a powerful duo for maximizing Excel’s potential through Python automation. They allow easy Excel file handling and provide robust data analysis capabilities. By leveraging both libraries, you can integrate Python seamlessly into Excel workflows, automate tasks, analyze data, and generate informative reports. Have you used openpyxl or pandas, individually or in combination? Share your experiences with these or other Excel-related packages in the comments below.

To leave a comment for the author, please follow the link and comment on their blog: Stringfest Analytics .

Want to share your content on python-bloggers? click here.