Zero-based indexing: What it is and when you’ve seen it before

[This article was first published on Stringfest Analytics, and kindly contributed to python-bloggers].

Have you ever needed to pull the first record of a dataset? What about the last, or maybe even the seventeenth? Selecting an element by its position like this is called indexing. There’s a lot of data out there, and indexing gives us a set of rules for extracting elements by position.

Except… not every program indexes the same way. In particular, programs differ in which number they assign to the element in the first position. Let me explain, first using Excel and then with an everyday computing example:

How Excel does it

Indexing can be done in Excel with — go figure — the INDEX() function:

INDEX(array, row_num, [column_num])

Let’s take a look at how this plays out in both one and two dimensions.

One-dimensional

By one-dimensional I mean a single row or column of data. In the following example, I operate on a named range.

To get the third item, we pass 3 as the function’s second argument.

Two-dimensional

By two-dimensional I mean an object with rows and columns. INDEX() works similarly here; we just supply an extra argument, column_num, to pick the column. Note that I have stored the data in a named table, a good practice for any two-dimensional data.
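For readers heading toward Python, it can help to see what INDEX()’s one-based positions look like in a language that counts from zero: the translation mostly comes down to subtracting one. Below is a minimal sketch of a hypothetical `excel_index()` helper (not part of Excel, Python, or any library) that mimics INDEX() on plain Python lists, assuming a flat list stands in for a one-dimensional range and a list of rows stands in for a table. The sample data is made up.

```python
def excel_index(array, row_num, column_num=None):
    """Hypothetical helper that mimics Excel's INDEX() on Python lists.

    array is a flat list (one-dimensional) or a list of rows (two-dimensional).
    row_num and column_num are one-based positions, as in Excel. If column_num
    is omitted, the item at row_num is returned (for a list of rows, that is
    the whole row).
    """
    # This sketch only accepts positions of 1 or more; Excel's special
    # meaning for 0 is not emulated, and Python would silently treat 0 - 1
    # as -1 (the last item), so we reject it explicitly.
    if row_num < 1 or (column_num is not None and column_num < 1):
        raise ValueError("positions are one-based and must be at least 1")
    # Convert each one-based position to Python's zero-based index.
    row = array[row_num - 1]
    if column_num is None:
        return row
    return row[column_num - 1]


# One-dimensional: the third item, like INDEX(fruits, 3).
fruits = ["apple", "banana", "cherry", "date"]
print(excel_index(fruits, 3))  # cherry

# Two-dimensional: second row, first column, like INDEX(table, 2, 1).
table = [
    ["apple", 1.20],
    ["banana", 0.50],
    ["cherry", 3.00],
]
print(excel_index(table, 2, 1))  # banana
```

The explicit check for positions below 1 matters because Python accepts negative indices, so an off-by-one mistake would otherwise return the wrong item instead of failing.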

Another way to count

So far, pretty intuitive. To access an element, you count from one up to its position, and that number is its index. This is an example of one-based indexing. One-based indexing makes a lot of sense because, as humans, we tend to start counting at one.

But computers don’t always start counting at one; many programs and programming languages start at zero instead. This is called (you guessed it) zero-based indexing. It may sound pretty foreign, but I’d like to show you an example you’ve probably seen before.

Imagine being so excited to get your hands on a dataset like this that you click “download” several times. Your downloads folder will look something like this:

[Figure: Zero-based index files]

Did you notice that the second dataset is actually called dataset (1)? The first dataset is just dataset… well, zero. This is zero-based indexing, and it happens all over computing, including Python.
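If it helps to see the pattern as code, here is a minimal Python sketch of how such names could be generated. The `download_name()` function is hypothetical, and it assumes the `dataset (1)` format described above; real browsers and operating systems differ in the details (file extensions, for one, are ignored here).

```python
def download_name(base, copy_number):
    """Hypothetical sketch of duplicate-download naming.

    copy_number is zero-based: 0 is the first download, 1 the second, and so on.
    """
    if copy_number == 0:
        # The first copy gets no number at all (the implicit "0").
        return base
    return f"{base} ({copy_number})"


# range(3) yields 0, 1, 2: Python's range() also counts from zero.
for copy_number in range(3):
    print(download_name("dataset", copy_number))
# dataset
# dataset (1)
# dataset (2)
```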

How Python does it

To learn more about how to index in one and two dimensions in Python, check out the Jupyter Notebook below.
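Here is a minimal sketch of the basics in plain Python, using made-up sample data. Lists are indexed from zero, so the first item sits at index 0 and the seventeenth at index 16, while a negative index counts back from the end. A list of lists stands in for two-dimensional data here; pandas offers similar zero-based positional lookup through `DataFrame.iloc`, but this sketch sticks to the standard library.

```python
# Made-up sample data: the numbers 1 through 20.
values = list(range(1, 21))

first = values[0]         # first item: index 0
seventeenth = values[16]  # seventeenth item: index 16, one less than its position
last = values[-1]         # negative indices count back from the end

print(first, seventeenth, last)  # 1 17 20

# Two dimensions: a list of rows, indexed as [row][column], both from zero.
table = [
    ["apple", 1.20],
    ["banana", 0.50],
    ["cherry", 3.00],
]
print(table[2][0])  # cherry (third row, first column)
```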

Programmers can have strong opinions about zero- versus one-based indexing, but you should be comfortable working with both. As you’ve seen, Excel is one-based, and so is R; Python and JavaScript, among others, are zero-based.
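Moving between the two conventions usually means adding or subtracting one. Python’s built-in `enumerate()` is a handy illustration: it counts from zero by default, but its `start` argument lets you number items from one, the way people (and Excel) usually count. The list is made-up sample data.

```python
letters = ["a", "b", "c"]

# Default: enumerate() counts from zero, like Python indexing.
print(list(enumerate(letters)))           # [(0, 'a'), (1, 'b'), (2, 'c')]

# start=1 gives one-based numbering, like Excel's INDEX() positions.
print(list(enumerate(letters, start=1)))  # [(1, 'a'), (2, 'b'), (3, 'c')]

# Converting a one-based position to a zero-based index means subtracting one.
position = 2
print(letters[position - 1])              # b
```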

Want to keep counting?

If you’d like to learn more about Python, including indexing and pandas, with the specific needs of an Excel user in mind, check out my book Advancing into Analytics: From Excel to Python and R.

To leave a comment for the author, please follow the link and comment on their blog: Stringfest Analytics.
