LIME: How to Interpret Machine Learning Models With Python


Black-box models aren’t cool anymore. It’s easy to build great models nowadays, but what’s going on inside? That’s what Explainable AI and LIME try to uncover. 

Don’t feel like reading? Check out my video on the topic:

Knowing why the model makes predictions the way it does is essential for tweaking. Just think about it – if you don’t know what’s going on inside, how the hell will you improve it? 

LIME isn’t the only option for machine learning model interpretation. The alternative is SHAP. You can learn more about it here:

SHAP: How to Interpret Machine Learning Models With Python

As in that article, the goal today is to train the model as quickly as possible and focus on interpretation. Because of that, the identical dataset and modeling process is used.

After reading this article, you shouldn’t have any problems with explainable machine learning. Interpreting models and the importance of each predictor should become second nature.

The article is structured as follows:

  • What is LIME?
  • Model training
  • Model interpretation
  • Conclusion

What is LIME?

The acronym LIME stands for Local Interpretable Model-agnostic Explanations. The project is about explaining what machine learning models are doing (source). LIME currently supports explanations for tabular models, text classifiers, and image classifiers.

To install LIME, execute the following line from the Terminal:

pip install lime

In a nutshell, LIME is used to explain predictions of your machine learning model. The explanations should help you to understand why the model behaves the way it does. If the model isn’t behaving as expected, there’s a good chance you did something wrong in the data preparation phase.

You’ll now train a simple model and then begin with the interpretations.

Model training

You can’t interpret a model before you train it, so that’s the first step. The Wine quality dataset is easy to train on and comes with a bunch of interpretable features. Here’s how to load it into Python:
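Here's a minimal sketch of the loading step, assuming the wine quality CSV sits next to the notebook as wine.csv and that the numeric quality score is mapped to the good/bad labels used later on (both the file name and the threshold of 6 are assumptions, not taken from the original code):

import pandas as pd

# Assumption: the wine quality CSV is stored locally as 'wine.csv'
wine = pd.read_csv('wine.csv')

# Assumption: scores of 6 and above count as 'good', everything else as 'bad',
# since the target is described later as having only good/bad values
wine['quality'] = ['good' if score >= 6 else 'bad' for score in wine['quality']]
wine.head()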

The first couple of rows look like this:

Image 1 – Wine quality dataset head (image by author)

All attributes are numeric, and there are no missing values, so you can cross data preparation off the list.

Train/Test split is the next step. The column quality is the target variable, with possible values of good and bad. Set the random_state parameter to 42 if you want to get the same split:
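Here's a sketch of the split, assuming a standard 80:20 ratio (the exact test size isn't stated in the article):

from sklearn.model_selection import train_test_split

# Separate the predictors from the target column
X = wine.drop('quality', axis=1)
y = wine['quality']

# Assumption: 80:20 split; random_state=42 makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)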

Model training is the only thing left to do. RandomForestClassifier from scikit-learn will do the job, and you'll have to fit it on the training set. You'll get a roughly 80% accurate classifier out of the box, as reported by the score method:
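A minimal sketch with default hyperparameters (the random_state is an assumption, added only for reproducibility):

from sklearn.ensemble import RandomForestClassifier

# Fit a random forest on the training set
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Test-set accuracy -- roughly 0.8 out of the box
model.score(X_test, y_test)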


And that’s all you need to start with model interpretation. You’ll learn how in the next section.

Model interpretation

To start explaining the model, you first need to import the LIME library and create a tabular explainer object. It expects the following parameters:

  • training_data – the training data generated with the train/test split. It must be in NumPy array format.
  • feature_names – column names from the training set
  • class_names – distinct classes from the target variable
  • mode – type of problem you’re solving (classification in this case)

Here’s the code:
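A sketch of the explainer setup, reusing the variable names from the snippets above (the class_names list is assumed to follow the alphabetical order scikit-learn assigns to the labels):

from lime.lime_tabular import LimeTabularExplainer

explainer = LimeTabularExplainer(
    training_data=X_train.values,            # LIME expects a NumPy array
    feature_names=X_train.columns.tolist(),
    class_names=['bad', 'good'],             # matches the order of model.classes_
    mode='classification'
)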

And that's it – you can start interpreting! A bad wine comes first: the second row of the test set represents a wine classified as bad. You can call the explain_instance function of the explainer object to, well, explain the prediction. The following parameters are required:

  • data_row – a single observation from the dataset
  • predict_fn – a function used to make predictions. The predict_proba from the model is a great option because it shows probabilities

Here’s the code:
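A sketch, assuming zero-based indexing (so the second row is X_test.iloc[1]) and the variable names from earlier:

# Explain a single observation -- the second row of the test set
exp = explainer.explain_instance(
    data_row=X_test.iloc[1].values,
    predict_fn=model.predict_proba
)
exp.show_in_notebook(show_table=True)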

The show_in_notebook function shows the prediction interpretation in the notebook environment:

Image 2 – LIME interpretation for a bad wine (image by author)

The model is 81% confident this is a bad wine. The values of alcohol, sulphates, and total sulfur dioxide increase the wine's chance of being classified as bad. The volatile acidity is the only one that decreases it.

Let’s take a look at a good wine next. You can find one at the fifth row of the test set:
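The same call works here; assuming zero-based indexing, the fifth row is X_test.iloc[4]:

# Explain the fifth row of the test set
exp = explainer.explain_instance(
    data_row=X_test.iloc[4].values,
    predict_fn=model.predict_proba
)
exp.show_in_notebook(show_table=True)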

Here’s the corresponding interpretation:

Image 3 – LIME interpretation for a good wine (image by author)

Now that’s the wine I’d like to try. The model is 100% confident it’s a good wine, and the top three predictors show it. 

That’s how LIME works in a nutshell. There are different visualizations available, and you are not limited to interpreting only a single instance, but this is enough to get you started. Let’s wrap things up in the next section.

Conclusion

Interpreting machine learning models is simple. It provides you with a great way of explaining what’s going on below the surface to non-technical folks. You don’t have to worry about data visualization, as the LIME library handles that for you.

This article should serve as a basis for more advanced interpretations and visualizations. You can always dig deeper on your own.

What are your thoughts on LIME? Do you want to see a comparison between LIME and SHAP? Please let me know.

Join my private email list for more helpful insights.
