Articles by John Mount

How to Prepare Data

September 26, 2019 | John Mount

Real-world data can present a number of challenges to data science workflows. Even properly structured data (each interesting measurement already landed in distinct columns) can present problems, such as missing values and high-cardinality categorical variables. In this note we describe some great tools for working with such data. ... [...Read more...]

Preparing Data for Supervised Classification

September 24, 2019 | John Mount

Nina Zumel has been polishing up new vtreat for Python documentation and tutorials. They are coming out so well that I find, to be fair to the R community, I must start to back-port this new documentation to vtreat for R. vtreat is a package for systematically preparing data for ... [...Read more...]

Advanced Data Reshaping in Python and R

September 4, 2019 | John Mount

This note is a simple data wrangling example worked using both the Python data_algebra package and the R cdata package. Both of these packages make data wrangling easy through the use of coordinatized data concepts (relying heavily on Codd’s “rule of access”). The advantages of data_algebra and ... [...Read more...]

New Getting Started with vtreat Documentation

September 2, 2019 | John Mount

Win Vector LLC’s Dr. Nina Zumel has just released some new vtreat documentation. vtreat is an all-in-one-step data preparation system that helps defend your machine learning algorithms from: missing values, large cardinality categorical variables, and novel levels from categorical variables. I hoped she could get the Python ... [...Read more...]

Introducing data_algebra

August 26, 2019 | John Mount

This article introduces the data_algebra project: a data processing tool family available in R and Python. These tools are designed to transform data either in-memory or on remote databases. In particular we will discuss the Python implementation (also called data_algebra) and its relation to the mature R implementations (... [...Read more...]

Eliminating Tail Calls in Python Using Exceptions

August 23, 2019 | John Mount

I was working through Kyle Miller’s excellent note: “Tail call recursion in Python”, and decided to experiment with variations of the techniques. The idea is: one may want to eliminate use of the Python language call-stack in the case of “tail calls” (a function call where the result ... [...Read more...]