A Richer Category for Data Wrangling

December 22, 2019 | John Mount

I've been writing a lot about category theory interpretations of data-processing pipelines and some of the improvements we feel they are driving in both the data_algebra and rquery/rqdatatable. I think I've found an even better category theory re-formulation of the package, which I will

Better SQL Generation via the data_algebra

December 18, 2019 | John Mount

In our recent note What is new for rquery December 2019 we mentioned an ugly processing pipeline that translates into SQL of varying size/quality depending on the query generator we use. In this note we try a near-relative of that query in the data_algebra. dplyr translates the query to

Python changing attribute mystery. Help?

December 7, 2019 | John Mount

Python peeps: any idea why this attribute changes value when I re-examine it? I am using PyCharm, but the calculation is weird even in Jupyter. It doesn't seem to be just the debugger: running it in Jupyter also gives the wrong value (just {'x'}, instead of {'x', 'y'}). The type

New Introduction to the data_algebra

October 31, 2019 | John Mount

We've made really good progress in bringing the Python data_algebra to feature parity with R rquery. In fact we are able to reproduce the New Introduction to rquery article as a "New Introduction to the data_algebra" here. The idea is: you may have good reasons to want

AI for Engineers

October 9, 2019 | John Mount

For the last year we (Nina Zumel and myself, John Mount) have had the honor of teaching the AI200 portion of LinkedIn's AI Academy. Nina Zumel designed most of the material, and John Mount has been delivering it and bringing her feedback. We'...

vtreat Cross Validation

October 6, 2019 | John Mount

Nina Zumel finished new documentation on how vtreat's cross validation works, which I want to share here. vtreat is a system that makes data preparation for machine learning a "one-liner" (available in R or available in Python). We have a set of starting points here. These documents describe
