Articles by John Mount

Data re-Shaping in R and in Python

January 28, 2020 | John Mount

Nina Zumel and I have two new tutorials on fluid data wrangling/shaping. They are written in a parallel structure, with the R version of the tutorial being almost identical to the Python version of the tutorial. This reflects our opinion on the “which is better for data science ... [...Read more...]

sklearn Pipe Step Interface for vtreat

January 14, 2020 | John Mount

We’ve been experimenting with this for a while, and the next R vtreat package will have a back-port of the Python vtreat package sklearn pipe step interface (in addition to the standard R interface). This means the user can easily express modeling intent by choosing between coder$fit_... [...Read more...]

New vtreat Feature: Nested Model Bias Warning

January 11, 2020 | John Mount

For quite a while we have been teaching that estimating variable re-encodings on the exact same data that is later naively used to train a model leads to an undesirable nested model bias. The vtreat package (both the R version and the Python version) incorporates a cross-frame method that allows ... [...Read more...]

A Richer Category for Data Wrangling

December 22, 2019 | John Mount

I’ve been writing a lot about a category theory interpretation of data-processing pipelines and some of the improvements we feel it is driving in both the data_algebra and in rquery/rqdatatable. I think I’ve found an even better category theory re-formulation of the package, which I will ... [...Read more...]

Better SQL Generation via the data_algebra

December 18, 2019 | John Mount

In our recent note What is new for rquery December 2019 we mentioned an ugly processing pipeline that translates into SQL of varying size/quality depending on the query generator we use. In this note we try a near-relative of that query in the data_algebra. dplyr translates the query to ... [...Read more...]

Python changing attribute mystery. Help?

December 7, 2019 | John Mount

Python peeps: any idea why this attribute changes value when I re-examine it? I am using PyCharm, but the calculation is weird even in Jupyter. It doesn’t seem to be just the debugger; running it in Jupyter gives the wrong value (just {'x'}, instead of {'x', 'y'}). The type ... [...Read more...]