LIME: How to Interpret Machine Learning Models With Python
Black-box models aren’t cool anymore. It’s easy to build great models nowadays, but what’s going on inside? That’s what Explainable AI and LIME try to uncover.
Don’t feel like reading? Check out my video on the topic instead.
Knowing why the model makes predictions the way it does is essential for tweaking. Just think about it – if you don’t know what’s going on inside, how the hell will you improve it?
LIME isn’t the only option for machine learning model interpretation – a popular alternative is SHAP, which I’ve covered in a separate article. Today the goal is, once again, to train a model as quickly as possible and then focus on interpretation, so the identical dataset and modeling process are used here.
After reading this article, you shouldn’t have any problems with explainable machine learning. Interpreting models and the importance of each predictor should become second nature.
The article is structured as follows: what LIME is, model training, and model interpretation.
What is LIME?
The acronym LIME stands for Local Interpretable Model-agnostic Explanations. The project is about explaining what machine learning models are doing (source). LIME currently supports explanations for tabular models, text classifiers, and image classifiers.
To install LIME, execute the following line from the Terminal:
pip install lime
In a nutshell, LIME is used to explain predictions of your machine learning model. The explanations should help you to understand why the model behaves the way it does. If the model isn’t behaving as expected, there’s a good chance you did something wrong in the data preparation phase.
You’ll now train a simple model and then begin with the interpretations.
Model training
You can’t interpret a model before you train it, so that’s the first step. The Wine quality dataset is easy to train on and comes with a bunch of interpretable features. Here’s how to load it into Python:
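A sketch of the loading step – the CSV location and the good/bad split of the numeric quality score are assumptions, so adjust them to match your copy of the dataset:

import pandas as pd

# Red wine quality data from the UCI repository (assumed source)
wine = pd.read_csv(
    'https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv',
    sep=';'
)

# The raw quality score is numeric -- binarize it into good/bad
# (a threshold of 6 and above counting as good is an assumption)
wine['quality'] = ['good' if q >= 6 else 'bad' for q in wine['quality']]
wine.head()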
The first couple of rows look like this:
All attributes are numeric, and there are no missing values, so you can cross data preparation from the list.
Train/test split is the next step. The column quality is the target variable, with possible values of good and bad. Set the random_state parameter to 42 if you want to get the same split:
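A minimal sketch of the split, assuming the DataFrame from the previous step is named wine (the test size is my pick, not a requirement):

from sklearn.model_selection import train_test_split

# Separate the features from the target column
X = wine.drop('quality', axis=1)
y = wine['quality']

# Hold out 25% of the rows for testing; random_state=42 makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)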
Model training is the only thing left to do. RandomForestClassifier from scikit-learn will do the job, and you’ll have to fit it on the training set. You’ll get a roughly 80% accurate classifier out of the box (score):
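Something along these lines will do – default hyperparameters are assumed:

from sklearn.ensemble import RandomForestClassifier

# Train a random forest with default hyperparameters
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Accuracy on the test set -- around 0.8 for this dataset
model.score(X_test, y_test)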
And that’s all you need to start with model interpretation. You’ll learn how in the next section.
Model interpretation
To start explaining the model, you first need to import the LIME library and create a tabular explainer object. It expects the following parameters:
training_data – our training data generated with the train/test split; it must be in a Numpy array format
feature_names – column names from the training set
class_names – distinct classes of the target variable
mode – the type of problem you’re solving (classification in this case)
Here’s the code:
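A sketch built on top of the variables from the previous snippets – the class names are taken from the fitted model so their order matches the columns of predict_proba:

from lime.lime_tabular import LimeTabularExplainer

explainer = LimeTabularExplainer(
    training_data=X_train.values,            # LIME expects a Numpy array
    feature_names=X_train.columns.tolist(),
    class_names=list(model.classes_),        # ['bad', 'good'] for this target
    mode='classification'
)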
And that’s it – you can start interpreting! A bad wine comes in first – the second row of the test set represents a wine classified as bad. You can call the explain_instance function of the explainer object to, well, explain the prediction. The following parameters are required:
data_row – a single observation from the dataset
predict_fn – a function used to make predictions; the model’s predict_proba is a great option because it shows probabilities
Here’s the code:
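Assuming the explainer and model from above, the call might look like this – row index 1 is the second row of the test set:

# Explain a single prediction for the second row of the test set
exp = explainer.explain_instance(
    data_row=X_test.iloc[1].values,
    predict_fn=model.predict_proba
)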
The show_in_notebook function shows the prediction interpretation in the notebook environment:
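A one-liner, assuming the exp object created above:

# Render the interactive explanation inside a Jupyter notebook
exp.show_in_notebook(show_table=True)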
The model is 81% confident this is a bad wine. The values of alcohol, sulphates, and total sulfur dioxide increase the wine’s chance of being classified as bad; volatile acidity is the only one that decreases it.
Let’s take a look at a good wine next. You can find one in the fifth row of the test set:
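The same call as before, just with a different row index (4 for the fifth row), followed by the notebook visualization:

# Explain the fifth row of the test set and render the result
exp_good = explainer.explain_instance(
    data_row=X_test.iloc[4].values,
    predict_fn=model.predict_proba
)
exp_good.show_in_notebook(show_table=True)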
Here’s the corresponding interpretation:
Now that’s the wine I’d like to try. The model is 100% confident it’s a good wine, and the top three predictors show it.
That’s how LIME works in a nutshell. There are different visualizations available, and you are not limited to interpreting only a single instance, but this is enough to get you started. Let’s wrap things up in the next section.
Conclusion
Interpreting machine learning models with LIME is simple. It gives you a great way of explaining what’s going on below the surface to non-technical folks. You don’t have to worry about data visualization, as the LIME library handles that for you.
This article should serve as a basis for more advanced interpretations and visualizations. You can always explore further on your own.
What are your thoughts on LIME? Do you want to see a comparison between LIME and SHAP? Please let me know.
Join my private email list for more helpful insights.