Articles by Dr. Darrin

Principal Component Analysis with Python VIDEO

April 21, 2022 | Dr. Darrin

Principal component analysis is a tool for reducing the number of variables in a dataset without losing too much information. This is a great way to summarize information or to simplify things for a more complex analysis. The video provides a simple example of how to do this.
[...Read more...]
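
As a rough illustration of the idea in the excerpt above, here is a minimal sketch of principal component analysis using only NumPy. It is not the code from the video; the function name `pca`, the synthetic data, and the choice of two components are all assumptions made for this example.

```python
import numpy as np

def pca(X, n_components):
    """Project X (n_samples, n_features) onto its top principal components.

    Hypothetical helper for illustration only.
    """
    # Center each variable so the components describe variance, not the mean
    X_centered = X - X.mean(axis=0)
    # SVD of the centered data; the rows of Vt are the principal directions
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Variance captured by each direction, and the share kept by the top ones
    explained_variance = S ** 2 / (X.shape[0] - 1)
    ratio = explained_variance[:n_components] / explained_variance.sum()
    return X_centered @ components.T, ratio

# Made-up data: three variables, two of which are nearly redundant
rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([
    x,
    2 * x + rng.normal(scale=0.1, size=200),
    rng.normal(size=200),
])

scores, ratio = pca(X, n_components=2)
print(scores.shape)  # two columns instead of three
print(ratio)         # share of total variance kept by each component
```

Because two of the three made-up variables carry almost the same information, two components retain nearly all of the variance, which is the sense in which PCA reduces variables "without losing too much information."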

Data Visualization with Altair VIDEO

April 17, 2022 | Dr. Darrin

Python has a great library called Altair that makes it really easy to create various data visualizations. The primary strengths of this library are its ease of use and its support for interactive plots. The video below provides an introduction to using this innovative tool.
[...Read more...]

Visualizations with Altair

February 27, 2022 | Dr. Darrin

We are going to take a look at Altair, which is a data visualization library for Python. What is unique about Altair compared to other packages covered on this blog is that it allows for interactions. The interactions can take place inside Jupyter or they can be exported and loaded ...
[...Read more...]

Random Forest Classification with Python

March 31, 2019 | Dr. Darrin

Random forest is a type of machine learning algorithm that builds multiple decision trees, each of which may use different features and a different subsample of the data, making as many trees as you specify. The trees then vote to determine the class of an example. This approach helps to deal with the ...
[...Read more...]

Data Exploration Case Study: Credit Default

February 21, 2019 | Dr. Darrin

Exploratory data analysis is the main task of a data scientist, with as much as 60% of their time devoted to it. As such, the majority of their time is spent on something that is rather boring compared to building models. This post will provide a simple example of ...
[...Read more...]

RANSAC Regression in Python

February 7, 2019 | Dr. Darrin

RANSAC is an acronym for Random Sample Consensus. What this algorithm does is fit a regression model on a subset of data that the algorithm judges as inliers while removing outliers. This naturally improves the fit of the model due to the removal of some data points. The process that ...
[...Read more...]
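
The excerpt above describes fitting a model on the points judged to be inliers. A minimal NumPy sketch of that idea for a straight line follows. It is not the code from the post: the function name `ransac_line`, the residual threshold, the number of trials, and the synthetic data are all assumptions chosen for illustration.

```python
import numpy as np

def ransac_line(x, y, n_trials=100, threshold=1.0, seed=0):
    """Fit y = slope * x + intercept on the largest consensus set found.

    Hypothetical helper; the threshold and trial count are arbitrary choices.
    """
    rng = np.random.default_rng(seed)
    best_inliers = None
    for _ in range(n_trials):
        # Draw a random minimal sample: two points define a candidate line
        i, j = rng.choice(len(x), size=2, replace=False)
        if x[i] == x[j]:
            continue  # vertical candidate line, skip it
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        # Points whose residual is under the threshold count as inliers
        inliers = np.abs(y - (slope * x + intercept)) < threshold
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refit by least squares using only the best consensus set
    slope, intercept = np.polyfit(x[best_inliers], y[best_inliers], deg=1)
    return slope, intercept, best_inliers

# Made-up data: a noisy line with every tenth point pushed far off it
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 50)
y = 3 * x + 2 + rng.normal(scale=0.2, size=50)
y[::10] += 30

slope, intercept, inliers = ransac_line(x, y)
print(slope, intercept)  # estimates from the inlier-only fit
print(inliers.sum())     # number of points kept as inliers
```

The shifted points fall outside the residual threshold of the winning candidate line, so they are excluded from the final least-squares fit.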

Combining Algorithms for Classification with Python

January 20, 2019 | Dr. Darrin

Many approaches in machine learning involve making many models that combine their strengths and weaknesses to make more accurate classifications. Generally, when this is done, the same algorithm is used for every model. For example, random forest is simply many decision trees being developed. Even when bagging or boosting is being ...
[...Read more...]

Gradient Boosting Regression in Python

January 13, 2019 | Dr. Darrin

In this post, we will take a look at gradient boosting for regression. Gradient boosting simply makes sequential models that try to explain any examples that had not been explained by previous models. This approach makes gradient boosting superior to AdaBoost. Regression trees are most commonly teamed with boosting. There ...
[...Read more...]
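
To make the "sequential models" idea in the excerpt above concrete, here is a minimal NumPy sketch in which each new model is fit to the residuals left by the earlier ones. It is not the code from the post: the one-split regression trees (stumps), the squared-error loss, the learning rate, and the synthetic data are assumptions made for this example.

```python
import numpy as np

def fit_stump(x, residuals):
    """Find the single split on x that minimizes squared error (hypothetical helper)."""
    best = None
    # Skip the largest value so the right-hand side is never empty
    for threshold in np.unique(x)[:-1]:
        left = x <= threshold
        left_value = residuals[left].mean()
        right_value = residuals[~left].mean()
        error = ((residuals[left] - left_value) ** 2).sum() \
            + ((residuals[~left] - right_value) ** 2).sum()
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    return best[1:]

def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return np.where(x <= threshold, left_value, right_value)

def gradient_boost(x, y, n_estimators=50, learning_rate=0.1):
    """Boost stumps under squared-error loss (assumed settings, not the post's)."""
    baseline = y.mean()  # the first "model" is just the mean
    prediction = np.full_like(y, baseline, dtype=float)
    stumps = []
    for _ in range(n_estimators):
        # What the models so far have failed to explain
        residuals = y - prediction
        stump = fit_stump(x, residuals)
        # Add a shrunken copy of the new model's correction
        prediction += learning_rate * predict_stump(stump, x)
        stumps.append(stump)
    return baseline, stumps, prediction

# Made-up data: a sine curve that a single split cannot capture
x = np.linspace(0, 6, 80)
y = np.sin(x)

baseline, stumps, prediction = gradient_boost(x, y)
baseline_mse = ((y - baseline) ** 2).mean()
final_mse = ((y - prediction) ** 2).mean()
print(baseline_mse, final_mse)  # training error before and after boosting
```

Each round fits only the part of the target that the earlier rounds missed, so the training error shrinks as more stumps are added.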

Gradient Boosting Classification in Python

January 8, 2019 | Dr. Darrin

Gradient boosting is an alternative form of boosting to AdaBoost. Many consider gradient boosting to be a better performer than AdaBoost. One difference between the two algorithms is that gradient boosting uses optimization to weight the estimators. Like AdaBoost, gradient boosting can be used for most algorithms but is commonly ...
[...Read more...]

AdaBoost Regression with Python

January 6, 2019 | Dr. Darrin

This post will share how to use the AdaBoost algorithm for regression in Python. What boosting does is make multiple models in a sequential manner. Each newer model tries to successfully predict what older models struggled with. For regression, the average of the models is used for the ...
[...Read more...]