Data science

New Introduction to the data_algebra

October 31, 2019

We’ve made good progress in bringing the Python data_algebra to feature parity with R rquery. In fact, we have been able to reproduce the New Introduction to rquery article as a “New Introduction to the data_algebra” here. The idea is: you may have good reasons to want to work in R or to want to …

Free R/datascience Extract: Evaluating a Classification Model with a Spam Filter

October 15, 2019

We are excited to share a free extract of Zumel, Mount, Practical Data Science with R, 2nd Edition, Manning 2019: Evaluating a Classification Model with a Spam Filter. This section reflects an important design decision in the book: teach model evaluation first, and as a step separate from model construction. It is funny, but it …

AI for Engineers

October 9, 2019

For the last year we (Nina Zumel and myself, John Mount) have had the honor of teaching the AI200 portion of LinkedIn’s AI Academy. [Image: John Mount at the LinkedIn campus] Nina Zumel designed most of the material, and John Mount has been delivering it and bringing her feedback. We’ve just started our 9th cohort. We …

vtreat Cross Validation

October 6, 2019

Nina Zumel finished new documentation on how vtreat’s cross validation works, which I want to share here. vtreat is a system that makes data preparation for machine learning a “one-liner” (available in R or available in Python). We have a set of starting points here. These documents describe what vtreat does for you, you …

New vtreat Documentation (Starting with Multinomial Classification)

October 1, 2019

Nina Zumel finished some great new documentation showing how to use Python vtreat to prepare data for multinomial classification. And I have finally finished porting the documentation to R vtreat. So we now have good introductions on how to use vtreat to prepare data for the common tasks of: Regression: R regression example, Python …

How to Prepare Data

September 26, 2019

Real-world data can present a number of challenges to data science workflows. Even properly structured data (each interesting measurement already landed in its own column) can present problems, such as missing values and high-cardinality categorical variables. In this note we describe some great tools for working with such data. For example, consider the …

Preparing Data for Supervised Classification

September 24, 2019

Nina Zumel has been polishing up new vtreat for Python documentation and tutorials. They are coming out so well that, to be fair to the R community, I find I must start back-porting this new documentation to vtreat for R. vtreat is a package for systematically preparing data for supervised machine learning tasks such …

The Advantages of Record Transform Specifications

September 18, 2019

Nina Zumel wrote a great article on how to prepare a nice Keras performance plot using R. I will use this example to show some of the advantages of cdata record transform specifications. The model performance data from Keras is in the following...

Advanced Data Reshaping in Python and R

September 4, 2019

This note is a simple data wrangling example worked using both the Python data_algebra package and the R cdata package. Both of these packages make data wrangling easy through the use of coordinatized data concepts (relying heavily on Codd’s “rule of access”). The advantages of data_algebra and cdata are: The user specifies their desired transform …

New Getting Started with vtreat Documentation

September 2, 2019

Win Vector LLC’s Dr. Nina Zumel has just released some new vtreat documentation. vtreat is an all-in-one-step data preparation system that helps defend your machine learning algorithms from: missing values, large-cardinality categorical variables, and novel levels from categorical variables. I hoped she could get the Python vtreat documentation up to parity with …