Preparing Data for Supervised Classification

September 24, 2019 | 0 Comments

Nina Zumel has been polishing up new vtreat for Python documentation and tutorials. They are coming out so well that, to be fair to the R community, I find I must start to back-port this new documentation to vtreat for R. vtreat is a package for systematically preparing data for ... [...Read more...]

The Advantages of Record Transform Specifications

September 18, 2019 | 0 Comments

Nina Zumel had a really great article on how to prepare a nice Keras performance plot using R. I will use this example to show some of the advantages of cdata record transform specifications. The model performance data from Keras is in the following...
[...Read more...]

Advanced Data Reshaping in Python and R

September 4, 2019 | 0 Comments

This note is a simple data wrangling example worked using both the Python data_algebra package and the R cdata package. Both of these packages make data wrangling easy through the use of coordinatized data concepts (relying heavily on Codd’s “rule of access”). The advantages of data_algebra and ... [...Read more...]

New Getting Started with vtreat Documentation

September 2, 2019 | 0 Comments

Win Vector LLC’s Dr. Nina Zumel has just released some new vtreat documentation. vtreat is an all-in-one step data preparation system that helps defend your machine learning algorithms from: missing values; large cardinality categorical variables; novel levels from categorical variables. I hoped she could get the Python ... [...Read more...]

Who’s covered

August 30, 2019 | 0 Comments

One of the simplest options strategies is known as the covered call. For this strategy, an investor who already owns a stock elects to sell (or write) an option contract to surrender that stock at a specified price (known as the strike) at some poin...
[...Read more...]

Introducing data_algebra

August 26, 2019 | 0 Comments

This article introduces the data_algebra project: a data processing tool family available in R and Python. These tools are designed to transform data either in-memory or on remote databases. In particular we will discuss the Python implementation (also called data_algebra) and its relation to the mature R implementations (...
[...Read more...]

Eliminating Tail Calls in Python Using Exceptions

August 23, 2019 | 0 Comments

I was working through Kyle Miller’s excellent note: “Tail call recursion in Python”, and decided to experiment with variations of the techniques. The idea is: one may want to eliminate use of the Python language call-stack in the case of “tail calls” (a function call where the result ... [...Read more...]
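
As a rough illustration of the idea in this teaser, here is a minimal sketch of replacing a tail call with a raised exception that a driver loop catches. The names `TailCall`, `run`, and `count_down` are hypothetical, chosen for this example only; this is not necessarily the approach taken in the full article or in Kyle Miller's note.

```python
class TailCall(Exception):
    """Hypothetical exception that carries the next call to make."""

    def __init__(self, fn, *args):
        super().__init__()
        self.fn = fn
        self.args = args


def run(fn, *args):
    """Driver loop: call fn, and whenever it raises TailCall,
    continue with the requested call instead of nesting deeper."""
    while True:
        try:
            return fn(*args)
        except TailCall as tc:
            fn, args = tc.fn, tc.args


def count_down(n, acc=0):
    """Sum the integers 1..n; the recursive step is in tail position."""
    if n == 0:
        return acc
    # Instead of `return count_down(n - 1, acc + n)`, raise the call.
    raise TailCall(count_down, n - 1, acc + n)


# Far more steps than Python's default recursion limit would allow.
total = run(count_down, 100000)
```

Because each step unwinds to `run` before the next call starts, the stack depth stays constant no matter how many steps are taken. The cost is one raised exception per step.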

Tens and twos

August 16, 2019 | 0 Comments

Only three months ago, market pundits were getting lathered up about the potential for an inverted yield curve. We discussed that in our post Fed up. But a lot has changed since then. One oft-used measure of the yield curve, the time spread (10-yea...
[...Read more...]

A weighty matter

August 9, 2019 | 0 Comments

When we were testing random correlations and weightings in our last post on diversification, we discovered that randomizing correlations often increased portfolio risk. Then, when we randomized stock weightings on top of our random correlations, we...
[...Read more...]

My strategy beats yours!

August 2, 2019 | 0 Comments

Don’t hold your breath. We’re taking a break from our deep dive into diversification. We know how you couldn’t wait for the next installment. But we thought we should revisit our previous post on investing strategies to mix things up a bit. Recall w...
[...Read more...]