New improved cdata instructional video

February 8, 2020 | 0 Comments

A new, improved version of the “how to design a cdata/data_algebra data transform” video is up! The original article, the Python example, and the R example have all been updated to use the new video. Please check it out! [...Read more...]

Data re-Shaping in R and in Python

January 28, 2020 | 0 Comments

Nina Zumel and I have two new tutorials on fluid data wrangling/shaping. They are written in a parallel structure, with the R version of the tutorial being almost identical to the Python version of the tutorial. This reflects our opinion on the question “which is better for data science: R or Python?” They both are … Continue reading Data re-Shaping in R and in Python [...Read more...]

sklearn Pipe Step Interface for vtreat

January 14, 2020 | 0 Comments

We’ve been experimenting with this for a while, and the next R vtreat package will have a back-port of the Python vtreat package sklearn pipe step interface (in addition to the standard R interface). This means the user can easily express modeling intent by choosing between coder$fit_transform(train_data), coder$fit(train_data_cal)$transform(train_data_model), and coder$fit(application_data). We have also regenerated … Continue reading sklearn Pipe Step Interface for vtreat [...Read more...]

Biomedical Data Science Textbook Available

January 14, 2020 | 0 Comments

By Bob Hoyt & Bob Muenchen. Data science is being used in many ways to improve healthcare and reduce costs. We have written a textbook, Introduction to Biomedical Data Science, to help healthcare professionals understand the topic and to work … Continue reading → [...Read more...]

MinIO for Machine Learning Model Storage using Python

January 13, 2020 | 0 Comments

MinIO is an object storage system that uses the S3 API (from Amazon). It is a very convenient tool for data scientists or machine learning engineers to easily collaborate and share data and machine learning models. MinIO is a cloud storage server compatible with Amazon S3, released under Apache License v2. As an object store, MinIO can... Continue Reading → [...Read more...]

New vtreat Feature: Nested Model Bias Warning

January 11, 2020 | 0 Comments

For quite a while we have been teaching that estimating variable re-encodings on the exact same data one later naively uses to train a model leads to an undesirable nested model bias. The vtreat package (both the R version and the Python version) incorporates a cross-frame method that allows one to use all the … Continue reading New vtreat Feature: Nested Model Bias Warning [...Read more...]
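
To make the nested model bias idea concrete, here is a toy sketch in plain Python. It is not vtreat's implementation: the function names `naive_encode` and `cross_frame_encode`, the round-robin fold assignment, and the tiny data set are all assumptions made for illustration. The naive encoder replaces each categorical level with the mean outcome over all rows, including the row being encoded, while the cross-frame style encoder only uses rows from other folds.

```python
from statistics import mean


def naive_encode(levels, y):
    """Encode each level by the mean of y over ALL rows with that level.

    The row being encoded contributes to its own code, which is the
    source of the nested model bias.
    """
    by_level = {}
    for lev, target in zip(levels, y):
        by_level.setdefault(lev, []).append(target)
    means = {lev: mean(vals) for lev, vals in by_level.items()}
    return [means[lev] for lev in levels]


def cross_frame_encode(levels, y, n_folds=2):
    """Encode each row using only rows that fall in the other folds.

    Levels not seen outside the row's own fold fall back to the grand mean.
    Folds are assigned round-robin (an assumption made for this sketch).
    """
    grand_mean = mean(y)
    n = len(y)
    fold_of = [i % n_folds for i in range(n)]
    encoded = [0.0] * n
    for fold in range(n_folds):
        # Collect outcomes per level from rows outside the current fold.
        by_level = {}
        for i in range(n):
            if fold_of[i] != fold:
                by_level.setdefault(levels[i], []).append(y[i])
        # Encode rows inside the current fold from those outside estimates.
        for i in range(n):
            if fold_of[i] == fold:
                vals = by_level.get(levels[i])
                encoded[i] = mean(vals) if vals else grand_mean
    return encoded


# A high-cardinality variable: every level appears exactly once.
levels = ["a", "b", "c", "d"]
y = [1.0, 0.0, 1.0, 0.0]

naive = naive_encode(levels, y)          # reproduces y exactly
crossed = cross_frame_encode(levels, y)  # carries no leaked outcome
```

In this example the naive encoding is a perfect copy of the outcome, so a model trained on it would look far better on training data than it deserves. The cross-frame style encoding collapses to the grand mean, correctly showing that the variable carries no usable signal here.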

CodeWars: Learn programming through test-driven development

January 8, 2020 | 0 Comments

Since I had written about Project Euler and CodingGame before, someone recommended CodeWars to me. CodeWars offers free online learning exercises to develop your programming skills through fun daily challenges. As with Project Euler, you are tasked with solving increasingly complex programming challenges. At CodeWars, these little problems you need to solve with code are called … Continue reading CodeWars: Learn programming through test-driven development → [...Read more...]

New Timings for a Grouped In-Place Aggregation Task

January 2, 2020 | 0 Comments

I’d like to share some new timings on a grouped in-place aggregation task. A client of mine was seeing some slow performance, so I decided to time a very simple abstraction of one of the steps of their workflow. Roughly, the task was to add in some derived per-group aggregation columns to a few million … Continue reading New Timings for a Grouped In-Place Aggregation Task [...Read more...]
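
For readers unfamiliar with the task shape, the following is a minimal sketch of adding per-group aggregation columns to rows in place. It is not the benchmark code from the post: the function name `add_group_aggregates`, the choice of sum and max as the aggregates, and the column names are assumptions made for illustration, and plain Python dictionaries stand in for a data frame.

```python
from collections import defaultdict


def add_group_aggregates(rows, group_key, value_key):
    """Add per-group sum and max of value_key to every row, in place.

    rows is a list of dicts; the new columns are named
    "<value_key>_group_sum" and "<value_key>_group_max" (hypothetical names).
    """
    totals = defaultdict(float)
    maxima = {}
    # First pass: compute the aggregates for each group.
    for row in rows:
        g, v = row[group_key], row[value_key]
        totals[g] += v
        maxima[g] = v if g not in maxima else max(maxima[g], v)
    # Second pass: write the aggregates back onto each row.
    for row in rows:
        g = row[group_key]
        row[value_key + "_group_sum"] = totals[g]
        row[value_key + "_group_max"] = maxima[g]
    return rows


rows = [
    {"g": "a", "x": 1.0},
    {"g": "a", "x": 3.0},
    {"g": "b", "x": 2.0},
]
add_group_aggregates(rows, "g", "x")
```

The row count is unchanged: each row simply gains columns describing its group, which is what distinguishes this from a group-by that collapses each group to a single summary row.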

Python Web Scraping: WordPress Visitor Statistics

December 29, 2019 | 0 Comments

I’ve had this WordPress domain for several years now, and in the beginning it was very convenient. WordPress enabled me to set up a fully functional blog in a matter of hours. Everything from HTML markup and external content embedding to databases and simple analytics was already conveniently set up. However, after a while, I wanted to … Continue reading Python Web Scraping: WordPress Visitor Statistics → [...Read more...]