Python was not designed for data analysis (and why that’s OK)

George Mount

4 years ago

This article was first published on Stringfest Analytics and kindly contributed to python-bloggers.

A major reason I think it’s easier for Excel users to pick up R than Python is that Excel and R tend to “think” more alike than Excel and Python do. See what I mean here: let’s take a range of numbers and attempt to multiply it by two using the built-in range, vector and list objects in Excel, R and Python respectively:

[Image: multiplying a range by two in Excel, R and Python]

Looks pretty straightforward in Excel and R, right? Take the range, multiply by two, get each number times two. By contrast, Python does something rather different: it literally takes the range, and duplicates it (so we get eight numbers not four). Weird, right?
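Here is a minimal sketch of that Python behavior. The values 1 through 4 are my own stand-ins for illustration (any four numbers behave the same way), and the variable names are made up for this example:

```python
# A built-in Python list of four numbers (illustrative values, assumed here).
numbers = [1, 2, 3, 4]

# For lists, * means repetition: the list is joined to a copy of itself.
repeated = numbers * 2
print(repeated)  # [1, 2, 3, 4, 1, 2, 3, 4]

# To double each element using only built-ins, loop over the list explicitly.
doubled = [n * 2 for n in numbers]
print(doubled)  # [2, 4, 6, 8]
```

The list comprehension gets the Excel-and-R-style answer, but you have to spell out the element-by-element step yourself.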

Well, not necessarily. Excel and R were designed for statistics and arithmetic. Python was designed more generally to communicate with the operating system, process errors, and so forth. The way a program ought to “think” for these tasks is rather different than for analyzing data.

“You’re crazy, bud. Python’s cleaning up in the data space right now,” you may be thinking (pun intended). That’s true. And it does so with the help of a fantastic set of packages that make analyzing data there feel a lot more natural. (You may have heard of some of these: pandas, scikit-learn, NumPy, etc.)
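As a quick sketch of what “more natural” can look like, here is the same multiplication with a NumPy array instead of a built-in list. This assumes NumPy is installed, and the values are again illustrative stand-ins:

```python
import numpy as np

# A NumPy array of four numbers (illustrative values, assumed here).
values = np.array([1, 2, 3, 4])

# For arrays, * is element-wise arithmetic, much like Excel and R.
doubled = values * 2
print(doubled)  # [2 4 6 8]
```

Same operator, different object, different behavior: the array was built with arithmetic in mind, while the list was built as a general-purpose container.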

This post isn’t a takedown of Python or an endorsement of R. You could never pick a favorite child. It’s just an exploration of how software objectives inform software behavior, with a very simple example.

To get started with this great set of tools for data analysis, check out my book Advancing into Analytics.
