Python-bloggers

data_algebra 0.7.0 What is New

This article was first published on python – Win Vector LLC, and kindly contributed to python-bloggers.

I’ve been tinkering a lot recently with the data_algebra, and just released version 0.7.0 to PyPI. In this note I’ll touch on what the data algebra is, what the new features are, and my plans going forward.

The data algebra

The data algebra is a modern realization of elements of Codd’s 1969 relational model for data wrangling (see also Codd’s 12 rules).

The idea is: most data manipulation tasks can usefully be broken down into a small number of fundamental data transforms plus composition. In Codd’s initial writeup, composition was expressed using standard mathematical operator notation. For “modern” realizations one wants to use a composition notation that is natural for the language you are working in. For Python the natural composition notation is method dispatch.
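To make the idea of "a few transforms plus composition" concrete, here is a minimal sketch of composition by method chaining. The `Pipeline` class and its lambda-based arguments are hypothetical and written only for this illustration; this is not the data_algebra API, though the method names are chosen to echo relational operator names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pipeline:
    """Hypothetical toy pipeline of row transforms (not the data_algebra API)."""

    steps: tuple = ()

    def extend(self, name, fn):
        """Return a new pipeline that also adds a derived column to each row."""
        def step(rows):
            return [{**r, name: fn(r)} for r in rows]
        return Pipeline(self.steps + (step,))

    def select_rows(self, pred):
        """Return a new pipeline that also keeps only rows satisfying pred."""
        def step(rows):
            return [r for r in rows if pred(r)]
        return Pipeline(self.steps + (step,))

    def select_columns(self, cols):
        """Return a new pipeline that also keeps only the named columns."""
        def step(rows):
            return [{c: r[c] for c in cols} for r in rows]
        return Pipeline(self.steps + (step,))

    def transform(self, rows):
        """Apply the recorded steps, in order, to a list of dict rows."""
        for step in self.steps:
            rows = step(rows)
        return rows


# Composition reads top to bottom, in the order the steps are applied.
ops = (
    Pipeline()
    .extend("z", lambda r: r["x"] + r["y"])
    .select_rows(lambda r: r["z"] > 3)
    .select_columns(["x", "z"])
)

data = [{"x": 1, "y": 1}, {"x": 2, "y": 3}, {"x": 4, "y": 0}]
result = ops.transform(data)
print(result)
```

Note that `ops` is built before any data is supplied, so the same operator description can be reused on any table with the expected columns.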

The problems with the relational model were twofold:

The data algebra implements the Codd transforms (using Codd’s names where practical) in Python. It can manipulate data in Pandas or SQL. Such a strategy is famously used in the dplyr / dbplyr R packages (which use a pipe operator for composition, as R’s native S3/S4 method dispatch is again through somewhat illegible nesting).
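As a rough illustration of how one operator description can serve both an in-memory backend and a SQL backend, here is a small sketch using only the Python standard library. Everything in it (the `SPEC` format and the `evaluate`, `render`, `run_in_memory`, and `to_sql` helpers) is an assumption made up for this example; it is not how the data algebra is implemented, and it uses lists of dicts and SQLite in place of Pandas and a production database.

```python
import operator
import sqlite3

# Hypothetical operator description (not the data_algebra API):
# derive z = x + y, then keep the rows where z > 3.
SPEC = [
    ("extend", "z", ("+", "x", "y")),
    ("select_rows", (">", "z", 3)),
]

PY_OPS = {"+": operator.add, ">": operator.gt}


def evaluate(expr, row):
    """Evaluate a (op, left, right) expression against one row."""
    op, left, right = expr

    def value(term):
        # Strings name columns; anything else is a literal constant.
        return row[term] if isinstance(term, str) else term

    return PY_OPS[op](value(left), value(right))


def render(expr):
    """Render a (op, left, right) expression as SQL text."""
    op, left, right = expr
    return f"{left} {op} {right}"


def run_in_memory(rows, spec):
    """Interpret the spec directly on a list of dict rows."""
    for step in spec:
        if step[0] == "extend":
            _, name, expr = step
            rows = [{**r, name: evaluate(expr, r)} for r in rows]
        elif step[0] == "select_rows":
            _, cond = step
            rows = [r for r in rows if evaluate(cond, r)]
    return rows


def to_sql(spec, table):
    """Translate the same spec into nested SQL subqueries."""
    query = f"SELECT * FROM {table}"
    for step in spec:
        if step[0] == "extend":
            _, name, expr = step
            query = f"SELECT *, {render(expr)} AS {name} FROM ({query})"
        elif step[0] == "select_rows":
            _, cond = step
            query = f"SELECT * FROM ({query}) WHERE {render(cond)}"
    return query


data = [{"x": 1, "y": 1}, {"x": 2, "y": 3}, {"x": 4, "y": 0}]

# Backend 1: plain Python.
mem_rows = run_in_memory(data, SPEC)

# Backend 2: SQLite, running the generated query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE d (x INTEGER, y INTEGER)")
conn.executemany("INSERT INTO d VALUES (?, ?)", [(r["x"], r["y"]) for r in data])
sql_rows = conn.execute(to_sql(SPEC, "d")).fetchall()
conn.close()

print(to_sql(SPEC, "d"))
```

The design point is that the steps are stored as data rather than executed immediately, which is what allows a second interpreter (here, a SQL generator) to be added without changing how the pipeline is written.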

Benefits

The benefits / purposes of the data algebra include:

Example

Here is a simple data algebra example (source here).

What is new in version 0.7.0?

Version 0.7.0 is a major upgrade. The improvements include:

Conclusion

The data algebra is a great tool for Python data science projects. We are thrilled it has gotten to the point where we use it in client projects. What is missing is a “data algebra manual” and training, but with luck we hope to someday fill that gap.
