I’ll be giving a ~20-minute presentation to the Data Science Meetup of Rand Merchant Bank (RMB), a group hosted by Matthew Bernath of the Financial Modelling Podcast. The event is on Monday 10/4 at 10 a.m. Eastern US (4 p.m. Johannesburg) and is open to anyone working with data at RMB, a ...
I have the perfect environment for my setup, and now I may want to create another environment with some additional packages, or roll this environment out for training purposes. This is where the YAML file comes in, and it is sweet. Head to Anaconda Navigator I load ...
In this tutorial I will take you through how to:
- read in data
- perform feature engineering, dummy encoding and feature selection
- split data
- train an XGBoost classifier
- pickle your model and data to be consumed in an evaluation script
- evaluate your model with confusion matrices and classification reports in Sci-kit ...
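As a rough sketch of the last two steps of that tutorial (pickling, then evaluating in a separate script), the example below uses only the standard library. It is not the author's code: a fixed threshold stands in for the trained XGBoost classifier, the confusion matrix is hand-rolled instead of coming from scikit-learn, and the file name `model_and_data.pkl`, the toy data and the helper names are all hypothetical.

```python
import pickle
import tempfile
from collections import Counter
from pathlib import Path


def confusion_matrix(y_true, y_pred, labels):
    """Row i, column j counts samples whose true label is labels[i]
    and whose predicted label is labels[j]."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def precision_recall(cm, labels, positive):
    """Precision and recall for one class, read off the confusion matrix."""
    i = labels.index(positive)
    tp = cm[i][i]
    predicted_pos = sum(row[i] for row in cm)  # column sum
    actual_pos = sum(cm[i])                    # row sum
    precision = tp / predicted_pos if predicted_pos else 0.0
    recall = tp / actual_pos if actual_pos else 0.0
    return precision, recall


# Hypothetical stand-in for a trained classifier: predict 1 above a threshold.
model = {"threshold": 0.5}
X_test = [0.1, 0.4, 0.6, 0.8, 0.3, 0.9]
y_test = [0, 0, 1, 1, 1, 0]

# "Training" script: pickle the model together with the held-out data.
path = Path(tempfile.gettempdir()) / "model_and_data.pkl"
with open(path, "wb") as f:
    pickle.dump({"model": model, "X_test": X_test, "y_test": y_test}, f)

# "Evaluation" script: load the bundle and score the predictions.
with open(path, "rb") as f:
    bundle = pickle.load(f)

threshold = bundle["model"]["threshold"]
y_pred = [1 if x >= threshold else 0 for x in bundle["X_test"]]
labels = [0, 1]
cm = confusion_matrix(bundle["y_test"], y_pred, labels)
print(cm)
print(precision_recall(cm, labels, positive=1))
```

The rows-are-true-labels, columns-are-predictions layout matches the convention of scikit-learn's `confusion_matrix`. Note that `pickle.load` can execute arbitrary code, so only unpickle files you created or trust.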
Data analytics is iterative, as surely as the sky is blue… but what does that actually mean? I get the sense from many new analysts that they’ve spent a decent amount of time looking through the data and trying different things, and they get that things aren’t linear, but they still ...
Python Updates
At RStudio we know that many data science teams leverage both R and Python in their work, so it’s important that we build products to support the best tools available in both languages. For an overview of all the ways our pro produ...
Getting into analytics can be overwhelming: so many tools to learn and techniques to apply. Advancing into Analytics is a great start made even better with practice and supplemental knowledge. That’s where a class can come in handy, and I’m excited to share that I’m working on ...
OpenCV is an open-source library with tools and functions that support computer vision. It allows your computer to use complex mathematics to detect lines, shapes, colors, text and more. OpenCV was originally developed by Intel in 2000, and some time later someone had the bright idea to build a Python module on ...
TL;DR: The number of subsampled features is a main source of randomness and an important parameter in random forests. Mind the different default values across implementations.
Randomness in Random Forests
Random forests are very popular machine learning models. They are built from easily understandable and easily visualized decision trees and ...
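To make the role of that parameter concrete, here is a minimal standard-library sketch of per-split feature subsampling: at each split a tree only considers a random subset of the features. The rule names `sqrt`, `third` and `all` are hypothetical labels. `sqrt` and `third` mirror the usual classification and regression defaults for `mtry` in R's randomForest package; recent scikit-learn versions default to `sqrt` for classifiers and to all features for regressors, but defaults have changed between releases, so check the documentation of the version you use.

```python
import math
import random


def n_candidate_features(n_features, rule):
    """Number of features a tree may consider at one split (hypothetical rule names)."""
    if rule == "sqrt":   # common classification heuristic
        return max(1, int(math.sqrt(n_features)))
    if rule == "third":  # common regression heuristic (p / 3)
        return max(1, n_features // 3)
    if rule == "all":    # no feature subsampling: only bootstrapping adds randomness
        return n_features
    raise ValueError(f"unknown rule: {rule!r}")


def sample_split_features(n_features, rule, rng):
    """Draw, without replacement, the feature indices evaluated at a single split."""
    k = n_candidate_features(n_features, rule)
    return sorted(rng.sample(range(n_features), k))


rng = random.Random(42)
p = 16
for rule in ("sqrt", "third", "all"):
    # Three consecutive splits: smaller subsets mean more varied, less correlated trees.
    print(rule, [sample_split_features(p, rule, rng) for _ in range(3)])
```

With 16 features, `sqrt` lets each split see 4 features and `third` lets it see 5, while `all` removes this source of randomness entirely, which is why the same nominal "random forest" can behave quite differently across implementations.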
Advancing into Analytics is a wide-reaching technical book: starting with the foundations of analytics in Excel, readers are introduced to data manipulation, visualization and hypothesis testing in both Python and R. Learning all this in one book is possible because of what Excel users already know about working with data: ...