Articles by John Mount

data_algebra/rquery as a Category Over Table Descriptions

December 14, 2019 | John Mount

Introduction: I would like to talk about some of the design principles underlying the data_algebra package (and its sibling rquery package). The data_algebra package is a query generator that can act on either Pandas data frames or SQL tables. This is discussed on the project ...

Python changing attribute mystery. Help?

December 7, 2019 | John Mount

Python peeps: any idea why this attribute changes value when I re-examine it? I am using PyCharm, but the calculation is weird even in Jupyter. It doesn’t seem to be just the debugger; running it in Jupyter gives the wrong value (just {'x'}, instead of {'x', 'y'}). The type ...

Slides for PyData LA 2019 vtreat Talk

December 5, 2019 | John Mount

Slides for PyData LA 2019 vtreat Talk are here!

Slides from the PyData2019 data_algebra lightning talk

December 4, 2019 | John Mount

Slides from my PyData2019 data_algebra lightning talk are here.

New Introduction to the data_algebra

October 31, 2019 | John Mount

We’ve made really good progress in bringing the Python data_algebra to feature parity with R rquery. In fact we were able to reproduce the New Introduction to rquery article as a “New Introduction to the data_algebra” here. The idea is: you may have good reasons to want ...

Free R/datascience Extract: Evaluating a Classification Model with a Spam Filter

October 15, 2019 | John Mount

We are excited to share a free extract of Zumel, Mount, Practical Data Science with R, 2nd Edition, Manning 2019: Evaluating a Classification Model with a Spam Filter. This section reflects an important design decision in the book: teach model evaluation first, and as a step separate from model construction. It ...

AI for Engineers

October 9, 2019 | John Mount

For the last year we (Nina Zumel and myself, John Mount) have had the honor of teaching the AI200 portion of LinkedIn’s AI Academy. (Photo: John Mount at the LinkedIn campus.) Nina Zumel designed most of the material, and John Mount has been delivering it and bringing her feedback. We’...

vtreat Cross Validation

October 6, 2019 | John Mount

Nina Zumel finished new documentation on how vtreat’s cross validation works, which I want to share here. vtreat is a system that makes data preparation for machine learning a “one-liner” (available in R or available in Python). We have a set of starting off points here. These documents describe ...

New vtreat Documentation (Starting with Multinomial Classification)

October 1, 2019 | John Mount

Nina Zumel finished some great new documentation showing how to use Python vtreat to prepare data for multinomial classification mode. And I have finally finished porting the documentation to R vtreat. So we now have good introductions on how to use vtreat to prepare data for the common tasks of: ...

How to Prepare Data

September 26, 2019 | John Mount

Real-world data can present a number of challenges to data science workflows. Even properly structured data (each interesting measurement already landed in distinct columns) can present problems, such as missing values and high-cardinality categorical variables. In this note we describe some great tools for working with such data. ...

Python-bloggers

Data science news and tutorials - contributed by Python bloggers