Within the GLM framework, model coefficients are estimated using iterative reweighted least squares (IRLS), sometimes referred to as Fisher Scoring. This works well, but becomes inefficient as the size of the dataset increases: IRLS relies on the...
Pandas is a powerful, flexible and easy to use open source data analysis and manipulation tool built on top of the Python programming language. It has become the data manipulation library of choice for Machine Learning and Data Science practition... [...Read more...]
This post is intended to shed light on why the closed form solution to linear regression estimates is avoided in statistical software packages. But we start by first by deriving the solution to the normal equations within the standard multivariat...
An analysis may require the ability to generate correlated random samples. For example, imagine we have monthly returns for three financial indicators over a 20 year period. We are interested in modeling these returns using parametric distributio...
Bessel’s correction is the use of instead of in the sample variance formula where is the number of observations in a sample. This method corrects the bias in the estimation of the population variance.
Recall that bias is defined as: where re...
Greetings, humanists, social and data scientists! Have you ever tried to load a large file in Python or R? Sometimes, when we have file sizes in the order of gigabytes, you may experience problems of performance with your program taking an unusual...
Is my code secure? This is something that we should all be thinking (if not worrying) about. A thorough security audit would be the ideal, but what if you don’t have the skills or resources for that? Well, there are some tools that will at least get you part ... [...Read more...]