How to work with cells, rows and columns in Excel with openpyxl

Posted on March 26, 2024 by George Mount in Data science | 0 Comments

This article was first published on python - Stringfest Analytics , and kindly contributed to python-bloggers. (You can report issue about the content on this page here)
Want to share your content on python-bloggers? click here.

Just like in biology, where cells are the building blocks of life, in your Excel workbook, cells serve as the foundational elements. In this post, you’ll discover how to interact with individual cells and then extend your knowledge to managing entire rows and columns using openpyxl.

For this demo, I’ll assume you’re familiar with installing and working with Python packages, along with basic operations such as creating variables, indexing, and so on. If you need assistance with these topics before proceeding, consider checking out my book Advancing into Analytics:

Advancing into Analytics: From Excel to Python and R (O’Reilly)

To kick things off, start by importing workbook.Workbook and load_workbook from openpyxl.

from openpyxl.workbook import Workbook
from openpyxl import load_workbook

Next, use the load_workbook() function to read in the regions.xlsx file. I’m going to define the active worksheet as ws, and now that it’s an Excel object I can index it to access individual cells.

wb = load_workbook('regions.xlsx')
ws = wb.active

For example, I can access cell A1 of this worksheet by indexing ws at A1. Print my_cell and you’ll see the result gives the cell location, but what about its underlying value?

my_cell = ws['A1']
my_cell

<Cell 'Sheet1'.A1>

That can be found by accessing the value attribute of my_cell:

my_cell.value

'Region'

Now let’s look at working with rows and columns, which, like cells, are possible to index.

I’m going to start with rows. First, create a variable called cell_range, then specify which row to grab with the index position 1, or the first row in the workbook.

cell_range = ws[1]
cell_range

(<Cell 'Sheet1'.A1>,
 <Cell 'Sheet1'.B1>,
 <Cell 'Sheet1'.C1>,
 <Cell 'Sheet1'.D1>)

So now cell_range contains all the cells with data in row 1. You could extend this out to rows 1 through 3 with ws[1:3].

ws[1:3]

((<Cell 'Sheet1'.A1>,
  <Cell 'Sheet1'.B1>,
  <Cell 'Sheet1'.C1>,
  <Cell 'Sheet1'.D1>),
 (<Cell 'Sheet1'.A2>,
  <Cell 'Sheet1'.B2>,
  <Cell 'Sheet1'.C2>,
  <Cell 'Sheet1'.D2>),
 (<Cell 'Sheet1'.A3>,
  <Cell 'Sheet1'.B3>,
  <Cell 'Sheet1'.C3>,
  <Cell 'Sheet1'.D3>))

I can do the same with my column data, except this time I will index by column letter. For example column C will give you all those cells in the third column.

col_c = ws['C']
col_c

(<Cell 'Sheet1'.C1>,
 <Cell 'Sheet1'.C2>,
 <Cell 'Sheet1'.C3>,
 <Cell 'Sheet1'.C4>,
 <Cell 'Sheet1'.C5>,
 <Cell 'Sheet1'.C6>,
 <Cell 'Sheet1'.C7>)

If you’d prefer to index columns by number, not letter, it’s as simple as bringing in the get_column_letter() function and indexing the worksheet that way.

from openpyxl.utils import get_column_letter
ws[get_column_letter(3)]

(<Cell 'Sheet1'.C1>,
 <Cell 'Sheet1'.C2>,
 <Cell 'Sheet1'.C3>,
 <Cell 'Sheet1'.C4>,
 <Cell 'Sheet1'.C5>,
 <Cell 'Sheet1'.C6>,
 <Cell 'Sheet1'.C7>)

And of course, you can index multiple columns just like rows by, for example, printing everything in columns A through C like in col_range:

col_range = ws['A':'C']
col_range

((<Cell 'Sheet1'.A1>,
  <Cell 'Sheet1'.A2>,
  <Cell 'Sheet1'.A3>,
  <Cell 'Sheet1'.A4>,
  <Cell 'Sheet1'.A5>,
  <Cell 'Sheet1'.A6>,
  <Cell 'Sheet1'.A7>),
 (<Cell 'Sheet1'.B1>,
  <Cell 'Sheet1'.B2>,
  <Cell 'Sheet1'.B3>,
  <Cell 'Sheet1'.B4>,
  <Cell 'Sheet1'.B5>,
  <Cell 'Sheet1'.B6>,
  <Cell 'Sheet1'.B7>),
 (<Cell 'Sheet1'.C1>,
  <Cell 'Sheet1'.C2>,
  <Cell 'Sheet1'.C3>,
  <Cell 'Sheet1'.C4>,
  <Cell 'Sheet1'.C5>,
  <Cell 'Sheet1'.C6>,
  <Cell 'Sheet1'.C7>))

OK, now it’s pretty cool that openpyxl can go through your rows and columns, printing which contain data. But you probably want to know what that data actually is.

There are a number of ways to do this, but I am going to use a loop to run through a specified range of cells and print the row contents of each: that’s going to be the iter_rows() function, and I’ll say that I want to start at row 1, column 1, then go to row 2, column 3 and print the value of each cell.

for row in ws.iter_rows(min_row=1, min_col=1, 
                        max_row=2, max_col=3):
    for cell in row:
        print(cell.value)

Region
Units
Sales
South
54
332

And there you have it. You now possess the ability to navigate any part of an Excel workbook using these techniques. Do you have any questions regarding how to manipulate individual cells, rows, and columns in Excel with openpyxl? Feel free to ask in the comments. To delve deeper into these topics and for additional practice, consider exploring my course “Using Python with Excel” on LinkedIn Learning.

The post How to work with cells, rows and columns in Excel with openpyxl first appeared on Stringfest Analytics.

To leave a comment for the author, please follow the link and comment on their blog: python - Stringfest Analytics .

Want to share your content on python-bloggers? click here.

Python-bloggers

Data science news and tutorials - contributed by Python bloggers

How to work with cells, rows and columns in Excel with openpyxl

Related