Recommendation Engine with Python

This article was first published on python – educational research techniques, and kindly contributed to python-bloggers.

Recommendation engines make suggestions to a person based on their prior behavior. There are several ways to develop recommendation engines, but for our purposes we will be looking at the development of a user-based collaborative filter. This type of filter uses the ratings given by other users to suggest items to a user.

Making a recommendation engine in Python does not take much code and is fairly easy considering what can be accomplished. We will make a movie recommendation engine using data from MovieLens.

 

The data is distributed as a zip file for download.

Inside the zip file are several files; we will use three of them (users.dat, ratings.dat, and movies.dat) in a few moments. Below is the initial code to get started.

import pandas as pd
from scipy.sparse import csr_matrix
from sklearn.decomposition import TruncatedSVD
import numpy as np

We will now make 4 dataframes. Dataframes 1-3 will be the user, rating, and movie title data. The last dataframe will be a merger of the first 3. The code is below with a printout of the final result.

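# Each .dat file uses '::' as its separator and has no header row, so column names are supplied here.
user = pd.read_table('/home/darrin/Documents/python/new/ml-1m/users.dat', sep='::', header=None, names=['user_id', 'gender', 'age', 'occupation', 'zip'], engine='python')
rating = pd.read_table('/home/darrin/Documents/python/new/ml-1m/ratings.dat', sep='::', header=None, names=['user_id', 'movie_id', 'rating', 'timestamp'], engine='python')
# Assumption: movies.dat is not UTF-8 encoded, so an explicit latin-1 encoding is passed to avoid a decode error.
movie = pd.read_table('/home/darrin/Documents/python/new/ml-1m/movies.dat', sep='::', header=None, names=['movie_id', 'title', 'genres'], engine='python', encoding='latin-1')
# Merge ratings with users (on user_id), then with movies (on movie_id).
MovieAll = pd.merge(pd.merge(rating, user), movie)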
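To see what the nested merge does, here is a minimal sketch on made-up data. The three toy frames and their values are hypothetical stand-ins for the real files; only the column names user_id, movie_id, rating, and title come from the code above.

```python
import pandas as pd

# Toy stand-ins for the three MovieLens tables (all values are made up).
rating = pd.DataFrame({"user_id": [1, 1, 2], "movie_id": [10, 20, 10], "rating": [5, 3, 4]})
user = pd.DataFrame({"user_id": [1, 2], "gender": ["F", "M"]})
movie = pd.DataFrame({"movie_id": [10, 20], "title": ["Movie A (1990)", "Movie B (1995)"]})

# With no `on=` argument, pd.merge joins on the columns the two frames share:
# user_id for the inner merge, movie_id for the outer one.
toy_all = pd.merge(pd.merge(rating, user), movie)
print(toy_all)
```

Each row of the result is one rating, extended with the details of the user who gave it and the movie that received it.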

We now need to create a matrix using the .pivot_table function. Each row of this matrix is a user_id, each column is a movie title, and each cell holds that user's rating (0 where the user did not rate the movie). We also save the column labels under the name “movie_index”, which helps us keep track of which movie each column represents. The code is below.

rating_mtx_df = MovieAll.pivot_table(values='rating', index='user_id', columns='title', fill_value=0)
movie_index = rating_mtx_df.columns  # movie titles, in column order
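The reshaping step is easier to follow on a tiny example. The sketch below uses a hypothetical four-row ratings table (the names toy, toy_mtx, and toy_index are made up for illustration) and applies the same pivot_table arguments as above.

```python
import pandas as pd

# Hypothetical long-format ratings: one row per (user, movie) pair.
toy = pd.DataFrame({
    "user_id": [1, 1, 2, 3],
    "title": ["Movie A", "Movie B", "Movie A", "Movie B"],
    "rating": [5, 3, 4, 2],
})

# Rows become users, columns become titles, cells hold the rating.
# fill_value=0 marks movies a user never rated.
toy_mtx = toy.pivot_table(values="rating", index="user_id", columns="title", fill_value=0)
toy_index = toy_mtx.columns  # column labels, i.e. the movie titles

print(toy_mtx)
print(list(toy_index))
```

Note that a 0 here means "not rated" rather than a genuinely low score, but the later steps treat it as an ordinary number.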

There are many variables in our matrix (one column per movie), which makes computation slow and expensive. To address this, we will reduce the dimensions using the TruncatedSVD function, keeping 20 components. We also need to transpose the data because we want the Vh matrix and not the U matrix; after transposing, each row of the result represents a movie rather than a user. All of this is handled in the code below.

recomm = TruncatedSVD(n_components=20, random_state=10)
R = recomm.fit_transform(rating_mtx_df.values.T)
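As a rough illustration of what the transpose and the reduction do to the shape of the data, here is a sketch on a small random matrix. The sizes (50 users, 8 movies, 3 components) and the names toy_ratings, svd, and toy_R are arbitrary choices for the example, not values from the article.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

# Made-up ratings matrix: 50 users (rows) x 8 movies (columns), values 0-5.
rng = np.random.default_rng(0)
toy_ratings = rng.integers(0, 6, size=(50, 8)).astype(float)

# Transposing makes each row a movie, so the reduced output has one row per movie.
svd = TruncatedSVD(n_components=3, random_state=10)
toy_R = svd.fit_transform(toy_ratings.T)

print(toy_R.shape)                          # one row per movie, one column per component
print(svd.explained_variance_ratio_.sum())  # share of variance kept by the components
```

Checking explained_variance_ratio_ in this way is one option for judging whether the chosen number of components keeps enough of the original information.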

We saved our modified dataset as “R”. If we were to print it, each row (one per movie) would show 20 columns of numbers that we cannot interpret directly. Instead, we will move on to the actual recommendation part of this post.

To get a recommendation, you first have to tell Python which movie you watched. Python will then compare this movie with the other movies in the dataset based on how users rated them, and recommend the movies that have the highest correlation with the movie that was watched.

We are going to tell Python that we watched “One Flew Over the Cuckoo’s Nest” and see what movies it recommends.

First, we need to find the position of “One Flew Over the Cuckoo’s Nest” in our movie index. Then we need to calculate the correlations between all pairs of movies using the modified dataset we named “R”. These two steps are completed below.

cuckoo_idx = list(movie_index).index("One Flew Over the Cuckoo's Nest (1975)")
correlation_matrix = np.corrcoef(R)
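For readers unfamiliar with np.corrcoef, the sketch below shows its behavior on a hypothetical three-movie matrix (the numbers are made up). Each row is treated as one variable, so the result is a square matrix with one row and one column per movie.

```python
import numpy as np

# Made-up reduced data: one row per movie, one column per component.
toy_R = np.array([
    [1.0, 2.0, 3.0],   # movie 0
    [2.0, 4.0, 6.1],   # movie 1, nearly proportional to movie 0
    [3.0, 1.0, 0.5],   # movie 2, opposite trend to movie 0
])

# Entry [i, j] is the Pearson correlation between row i and row j.
toy_corr = np.corrcoef(toy_R)

print(toy_corr.shape)     # square: movies x movies
print(toy_corr[0])        # correlations of movie 0 with every movie, itself included
```

A single row of this matrix, such as toy_corr[0], plays the same role as the row selected with cuckoo_idx in the next step.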

Now we can determine which movies have the highest correlation with our movie. However, to determine this, we must give Python a range of acceptable correlations. For our purposes we will set this between 0.93 and 1.0. The code is below with the recommendations.

P = correlation_matrix[cuckoo_idx]
print(list(movie_index[(P > 0.93) & (P < 1.0)]))
['Graduate, The (1967)', 'Taxi Driver (1976)']

You can see that the engine recommended two movies: “The Graduate” and “Taxi Driver”. We could increase the number of recommendations by lowering the correlation requirement if we desired.
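An alternative to tuning the correlation threshold is to ask for a fixed number of recommendations. The helper below is a hypothetical sketch, not part of the original post; the function name top_n_similar and the toy data are assumptions made for illustration.

```python
import numpy as np

def top_n_similar(corr_row, labels, query_idx, n=5):
    """Return the n labels with the highest correlation, skipping the query itself."""
    order = np.argsort(corr_row)[::-1]            # indices from highest to lowest correlation
    order = [i for i in order if i != query_idx]  # drop the movie we asked about
    return [labels[i] for i in order[:n]]

# Hypothetical correlations of "Movie A" with four movies (itself first).
toy_labels = ["Movie A", "Movie B", "Movie C", "Movie D"]
toy_row = np.array([1.0, 0.95, 0.20, 0.97])

print(top_n_similar(toy_row, toy_labels, query_idx=0, n=2))
```

With the objects from this post, the equivalent call would be along the lines of top_n_similar(P, list(movie_index), cuckoo_idx).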

Conclusion

Recommendation engines are a great tool for automatically suggesting products to customers and generating sales. Understanding the basics of how to build one is a practical application of machine learning.

 
