mlsauce’s `v0.13.0`: taking into account input heterogeneity through clustering

This article was first published on T. Moudiki's Webpage - Python, and kindly contributed to python-bloggers.

Last week in #134, I talked about mlsauce’s v0.12.0, and LSBoost in particular. As shown in the post, it’s now possible to obtain prediction intervals for the regression model, notably by employing Split Conformal Prediction.

Right now, the best way to install the package is to use the development version (I'm looking for ways to fix this):

pip install git+https://github.com/Techtonique/mlsauce.git --verbose

Now, in v0.13.0, it’s possible to add the explanatory variables’ heterogeneity to the mix through clustering (K-means and Gaussian mixture models). This means that, a priori, in order to estimate the conditional expectation of the variable of interest as a function of our covariates, we explicitly tell the model to take into account similarities between individual observations. Some examples of use of this new feature can be found here, here and here. Keep in mind, however, that these examples only show that it’s possible to overfit the training set (hence reducing the loss function’s magnitude) by adding some clusters. The whole model’s hyperparameters still need to be fine-tuned, for example by using GPopt.
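To make the idea concrete, here is a minimal, standard-library-only sketch of one plausible way to feed cluster information to a model: cluster the covariates with K-means, then append the one-hot encoded cluster memberships as extra columns. This is an illustration under stated assumptions, not mlsauce's actual implementation. The helper names `kmeans_labels` and `add_cluster_features` are hypothetical, the initialisation is deliberately deterministic, and in practice a library implementation of K-means or Gaussian mixtures would be preferable.

```python
import math


def kmeans_labels(X, n_clusters, n_iter=20):
    """Plain Lloyd's algorithm; returns one cluster label per row of X."""
    # Deterministic initialisation (an assumption, for reproducibility):
    # use the first n_clusters rows as starting centroids.
    centroids = [list(row) for row in X[:n_clusters]]
    labels = [0] * len(X)
    for _ in range(n_iter):
        # Assignment step: nearest centroid in Euclidean distance.
        labels = [
            min(range(n_clusters), key=lambda k: math.dist(row, centroids[k]))
            for row in X
        ]
        # Update step: each centroid becomes the mean of its members.
        for k in range(n_clusters):
            members = [row for row, lab in zip(X, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def add_cluster_features(X, n_clusters):
    """Append one-hot encoded cluster memberships to each row of X."""
    labels = kmeans_labels(X, n_clusters)
    return [
        list(row) + [1.0 if lab == k else 0.0 for k in range(n_clusters)]
        for row, lab in zip(X, labels)
    ]


# Two visibly separated groups of observations (toy data).
X = [[0.0, 0.1], [5.0, 5.2], [0.2, 0.0], [5.1, 4.9], [0.1, 0.2], [4.8, 5.0]]
X_aug = add_cluster_features(X, n_clusters=2)
for row in X_aug:
    print(row)
```

Each row of `X_aug` keeps the original covariates and gains `n_clusters` indicator columns, so a downstream regressor can treat observations from the same cluster as similar. The number of clusters then becomes one more hyperparameter to tune, which is why adding clusters alone can overfit the training set.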

