Articles by John Mount

Schemas for Python Data Frames

September 12, 2023 | John Mount

The Pandas data frame is probably the most popular tool used to model tabular data in Python. For in-memory data, Pandas serves a role that might normally fall to a relational database. However, Pandas data frames are typically manipulated through methods rather than with a relational query language. One can […]

Experimenting with Polars for Data in Python

December 7, 2022 | John Mount

I’ve just started experimenting with the Polars data frame library in Python. I really like the programmable API it exposes. In fact, I am starting an experimental adapter from the data algebra to Polars. When this is complete, one can use the data algebra to run the same data ...

An Effective Personal Jupyter Data Science Workflow

August 20, 2022 | John Mount

I would like to share what I have found to be a very effective personal Jupyter workflow for data science development. Jupyter (née IPython) workbooks are JSON documents that allow a data scientist to mix code, markdown, results, images, and graphs. They are a great contribution to scientific reproducibility, as […]

Data Algebra 0.9.0 Release

October 9, 2021 | John Mount

I am pleased to announce the 0.9.0 release of the data algebra. The data algebra is a realization of the Codd relational algebra for data, written in terms of Python method chaining. It allows the concise, clear specification of useful data transforms. Some examples can be found here. Benefits include […]

I think Pandas may have “lost the plot.”

August 4, 2021 | John Mount

I’ve thought of Pandas as an in-memory, column-oriented data structure with reasonable performance. If I need high performance or scale, I can move to a database. Now I kind of wonder what Pandas is, or what it wants to be. The version 1.3.0 package seems to be marking natural ways […]

Using WITH For Neater SQL

June 21, 2021 | John Mount

I’d like to work an example of using SQL WITH Common Table Expressions to produce more legible SQL. A major complaint with SQL is that it composes statements by rightward nesting. That is: a sequence of operations A -> B -> C is represented as SELECT C FROM SELECT […]

data_algebra 0.7.0 What is New

June 7, 2021 | John Mount

I’ve been tinkering a lot recently with the data_algebra, and just released version 0.7.0 to PyPI. In this note I’ll touch on what the data algebra is, what the new features are, and my plans going forward. The data algebra The data algebra is a modern realization of […]