mlsauce’s `v0.13.0`: taking into account input heterogeneity through clustering
Want to share your content on python-bloggers? click here.
Last week in #134, I talked about mlsauce’s v0.12.0, and LSBoost in particular. As shown in that post, it’s now possible to obtain prediction intervals for the regression model, notably by employing Split Conformal Prediction.
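Split conformal prediction is model-agnostic, so its mechanics can be shown without mlsauce. The sketch below is *not* mlsauce's implementation. It uses plain least squares as a stand-in for LSBoost, and the helper names (`fit_ols`, `predict_ols`, `split_conformal_interval`) are hypothetical. It assumes exchangeable observations and uses absolute residuals on a held-out calibration subset as conformity scores.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_ols(coef, X):
    """Predictions from coefficients returned by fit_ols."""
    return np.column_stack([np.ones(len(X)), X]) @ coef

def split_conformal_interval(X_train, y_train, X_new, level=0.95, calib_frac=0.5, seed=123):
    """Split conformal prediction intervals with absolute-residual scores (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y_train))
    n_calib = int(calib_frac * len(y_train))
    calib_idx, fit_idx = idx[:n_calib], idx[n_calib:]

    # 1. Fit the point predictor on the proper training subset only.
    coef = fit_ols(X_train[fit_idx], y_train[fit_idx])

    # 2. Conformity scores: absolute residuals on the held-out calibration subset.
    scores = np.abs(y_train[calib_idx] - predict_ols(coef, X_train[calib_idx]))

    # 3. Finite-sample corrected quantile: the ceil((n_calib + 1) * level)-th smallest score.
    k = int(np.ceil((n_calib + 1) * level))
    q = np.inf if k > n_calib else np.sort(scores)[k - 1]

    # 4. Symmetric interval around each new point prediction.
    pred = predict_ols(coef, X_new)
    return pred, pred - q, pred + q

# Usage on simulated data: y = 2x + Gaussian noise.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(400, 1))
y = 2.0 * X[:, 0] + rng.normal(scale=1.0, size=400)
X_test = rng.uniform(-3, 3, size=(1000, 1))
y_test = 2.0 * X_test[:, 0] + rng.normal(scale=1.0, size=1000)

pred, lower, upper = split_conformal_interval(X, y, X_test, level=0.95)
coverage = np.mean((y_test >= lower) & (y_test <= upper))
print(f"empirical coverage on test set: {coverage:.3f}")
```

With absolute-residual scores every interval has the same width; the coverage guarantee (at least `level` on average over calibration draws) is marginal, not conditional on each input.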
Right now (I’m looking for ways to fix this), the best way to install the package is to use the development version:
pip install git+https://github.com/Techtonique/mlsauce.git --verbose
Now, in v0.13.0, it’s possible to add explanatory variables’ heterogeneity to the mix, through clustering (K-means and Gaussian mixture models). This means that, a priori, and before estimating the conditional expectation of the variable of interest as a function of the covariates, we explicitly tell the model to take into account similarities between individual observations. Some examples of this new feature in use can be found here, here and here. Keep in mind, however, that these examples only show that adding some clusters makes it possible to overfit the training set (hence reducing the magnitude of the loss function on that set). All of the model’s hyperparameters still need to be ‘fine-tuned’, for example by using GPopt.
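To make the idea and the overfitting caveat concrete, here is a minimal sketch under stated assumptions. It *assumes* cluster memberships are one-hot encoded and appended to the covariates (mlsauce's actual encoding may differ), uses K-means only (not Gaussian mixtures), and uses OLS instead of LSBoost. The helpers (`assign`, `kmeans_centers`, `add_cluster_features`, `ols_mse`) are hypothetical. A plain grid search over `n_clusters` on a hold-out set stands in for GPopt's Bayesian optimization.

```python
import numpy as np

def assign(X, centers):
    """Index of the nearest center (squared Euclidean distance) for each row of X."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans_centers(X, n_clusters, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns the cluster centers (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=n_clusters, replace=False)].copy()
    for _ in range(n_iter):
        labels = assign(X, centers)
        for j in range(n_clusters):
            if np.any(labels == j):  # keep the old center if a cluster empties
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def add_cluster_features(X, centers, mu, sd):
    """Append one-hot cluster memberships, computed on standardized X, to X."""
    labels = assign((X - mu) / sd, centers)
    return np.column_stack([X, np.eye(len(centers))[labels]])

def ols_mse(X_fit, y_fit, X_eval, y_eval):
    """Fit OLS with intercept on (X_fit, y_fit); return the MSE on (X_eval, y_eval)."""
    A = np.column_stack([np.ones(len(X_fit)), X_fit])
    coef, *_ = np.linalg.lstsq(A, y_fit, rcond=None)
    B = np.column_stack([np.ones(len(X_eval)), X_eval])
    return float(np.mean((y_eval - B @ coef) ** 2))

# Simulated data with a nonlinear effect that a linear model alone misses.
rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(600, 2))
y = np.sin(2 * X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.3, size=600)
X_tr, y_tr, X_val, y_val = X[:400], y[:400], X[400:], y[400:]

# Standardize with training statistics only, then cluster the training inputs.
mu, sd = X_tr.mean(axis=0), X_tr.std(axis=0)
results = {0: (ols_mse(X_tr, y_tr, X_tr, y_tr), ols_mse(X_tr, y_tr, X_val, y_val))}
for k in (2, 5, 10, 50):
    centers = kmeans_centers((X_tr - mu) / sd, k)
    Xa_tr = add_cluster_features(X_tr, centers, mu, sd)
    Xa_val = add_cluster_features(X_val, centers, mu, sd)
    results[k] = (ols_mse(Xa_tr, y_tr, Xa_tr, y_tr), ols_mse(Xa_tr, y_tr, Xa_val, y_val))

for k, (train_mse, val_mse) in results.items():
    print(f"n_clusters={k:>2}: train MSE={train_mse:.3f}, validation MSE={val_mse:.3f}")

# Pick the number of clusters on held-out data, never on the training loss.
best_k = min(results, key=lambda k: results[k][1])
print("n_clusters chosen on the validation set:", best_k)
```

Because the one-hot columns span the intercept, the augmented design contains the no-cluster design, so the training MSE with clusters can never exceed the baseline's. That is exactly why a lower training loss says nothing on its own, and why only the validation column is useful for choosing `n_clusters`.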