The Advantages of Record Transform Specifications

September 18, 2019 | 0 Comments

Nina Zumel wrote a great article on how to prepare a nice Keras performance plot using R. I will use this example to show some of the advantages of cdata record transform specifications. The model performance data from Keras is in the following...

Advanced Data Reshaping in Python and R

September 4, 2019 | 0 Comments

This note is a simple data wrangling example worked using both the Python data_algebra package and the R cdata package. Both of these packages make data wrangling easy through the use of coordinatized data concepts (relying heavily on Codd’s “rule of access”). The advantages of data_algebra and cdata are: The user specifies their desired transform …

New Getting Started with vtreat Documentation

September 2, 2019 | 0 Comments

Win Vector LLC’s Dr. Nina Zumel has just released some new vtreat documentation. vtreat is an all-in-one step data preparation system that helps defend your machine learning algorithms from: missing values, large cardinality categorical variables, and novel levels from categorical variables. I hoped she could get the Python vtreat documentation up to parity with …

Introducing data_algebra

August 26, 2019 | 0 Comments

This article introduces the data_algebra project: a data processing tool family available in R and Python. These tools are designed to transform data either in-memory or on remote databases. In particular we will discuss the Python implementation (also called data_algebra) and its relation to the mature R implementations (rquery and rqdatatable). Introduction Parts of the …

Eliminating Tail Calls in Python Using Exceptions

August 23, 2019 | 0 Comments

I was working through Kyle Miller’s excellent note: “Tail call recursion in Python”, and decided to experiment with variations of the techniques. The idea is: one may want to eliminate use of the Python language call-stack in the case of “tail calls” (a function call where the result is not used by the calling …
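As a rough illustration of the general idea (not the code from the article or from Kyle Miller's note), the sketch below replaces a tail call with a raised exception that a small driver loop catches. The names `TailCall`, `run`, and `count_down` are hypothetical and chosen only for this example.

```python
class TailCall(Exception):
    """Hypothetical exception that carries the next call to make."""

    def __init__(self, fn, *call_args, **call_kwargs):
        super().__init__()
        self.fn = fn
        self.call_args = call_args
        self.call_kwargs = call_kwargs


def run(fn, *args, **kwargs):
    """Driver loop: call fn, and re-dispatch whenever a TailCall is raised."""
    while True:
        try:
            # A normal return ends the loop with the final result.
            return fn(*args, **kwargs)
        except TailCall as tc:
            # Raising unwound the previous frame, so the stack does not grow.
            fn, args, kwargs = tc.fn, tc.call_args, tc.call_kwargs


def count_down(n, acc=0):
    """Sum the integers 1..n, 'recursing' by raising instead of calling."""
    if n == 0:
        return acc
    raise TailCall(count_down, n - 1, acc + n)


# Far deeper than Python's default recursion limit would allow directly.
result = run(count_down, 100_000)
```

Only functions started through `run` get this treatment; a direct call to `count_down(5)` would let the `TailCall` exception escape to the caller.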

Random Forest Classification with Python

March 31, 2019 | 0 Comments

Random forest is a machine learning algorithm that builds as many decision trees as you specify, each of which may use different features and a different subsample of the data. The trees then vote to determine the class of an example. This approach helps to deal with the high variance that is a […]
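To make the voting idea concrete, here is a deliberately simplified sketch using only the standard library. It is not the article's code and not a production random forest: each "tree" is a depth-one stump trained on a bootstrap sample and a single randomly chosen feature, and the function names and toy data are invented for illustration.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature):
    """Pick the one-threshold split on `feature` with the best training accuracy."""
    best = None
    for threshold in sorted({r[feature] for r in rows}):
        left = [y for r, y in zip(rows, labels) if r[feature] <= threshold]
        right = [y for r, y in zip(rows, labels) if r[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        correct = sum(y == left_label for y in left)
        correct += sum(y == right_label for y in right)
        if best is None or correct > best[0]:
            best = (correct, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return feature, threshold, left_label, right_label


def fit_forest(rows, labels, n_trees, rng):
    """Train n_trees stumps, each on a bootstrap sample and a random feature."""
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        feature = rng.randrange(len(rows[0]))           # random feature choice
        forest.append(
            fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature)
        )
    return forest


def predict(forest, row):
    """Each stump votes for a class; the most common vote wins."""
    votes = Counter(
        left if row[feature] <= threshold else right
        for feature, threshold, left, right in forest
    )
    return votes.most_common(1)[0][0]


# Toy data (made up): class 0 has small feature values, class 1 large ones.
X = [[1, 2], [2, 1], [1, 1], [2, 2], [8, 9], [9, 8], [8, 8], [9, 9]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(X, y, n_trees=25, rng=random.Random(0))
low_prediction = predict(forest, [0, 0])
high_prediction = predict(forest, [10, 10])
```

Because each stump sees a different resample and feature, individual stumps can disagree; the majority vote is what smooths out that variation.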

Top 8 Docker Images for Data Science

March 1, 2019 | 0 Comments

Dockerizing Data Science: Introduction. Prerequisites: Docker, images, and containers. Dockerizing data science packages has become more relevant these days, mainly because you can isolate your data science projects without breaking anything. Dockerizing data science projects also makes most of them portable and shareable, without worrying about installing the right dependencies (you Python fans know...