Content-Based Recommenders using Natural Language Processing

In this blog, we talk about the core of personalization. We will discuss content-based recommender systems: what they are, the different types, and how these systems work. We will also create a content-based recommender system for movies.

Recommender System

In the digital world, we are surrounded by recommender systems. They are at the core of our digital customer experience and are most commonly used in applications such as Netflix, Amazon, and YouTube.

For example, when we watch YouTube, it actively tries to understand our choices and recommends other videos and content based on our history and preferences. It suggests what we should watch next. Equally, systems such as Amazon look at the same kinds of preferences and history to promote what we should buy next.

By definition, recommender systems aim to provide users with relevant suggestions regarding any item. Beyond the two examples given, recommenders can be used to track and suggest across a wide range of social and professional use cases: basically anything where a transaction (a click) is involved.

We are not looking at the efficacy or accuracy of such systems here. We simply acknowledge that most modern systems have a recommender engine that enhances the user experience and targets user retention.

We take a look at two such systems, with the aim of outlining each one and its uses:

1. Collaborative Filtering

Collaborative filtering is similar to asking someone to suggest a movie to watch, an item to buy, or a job to apply for. They would go through their personal experience, think about what will suit you best, and suggest the most appropriate thing for you.

So, in collaborative filtering, ratings from different people on different items are used to decide whether to recommend a particular item to you. For example, if the people who watched movie XYZ also watched ABC, then when you watch XYZ, ABC will be recommended to you.

There are two ways to perform Collaborative filtering.

Memory-based Collaborative filtering: In memory-based collaborative filtering the user rating and choice are used to make the recommendation. In this technique, the data that is stored is the only thing used for recommending any item.

Model-Based Collaborative filtering: In this technique a model is created from stored data. This model is then used for making a recommendation to a user. This technique is more sophisticated as it involves statistics and probability, and it tends to produce better results.
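
The "people who watched XYZ also watched ABC" idea can be sketched in a few lines of Python. This is only a minimal memory-based illustration: the watch histories, the user names, and the also_watched function are all made up for the example, and no model is built from the data.

```python
from collections import Counter

# Hypothetical watch histories: user -> set of movies watched.
watch_history = {
    "user_a": {"XYZ", "ABC", "DEF"},
    "user_b": {"XYZ", "ABC"},
    "user_c": {"XYZ", "GHI"},
    "user_d": {"DEF", "GHI"},
}


def also_watched(movie, histories, top_n=3):
    """Rank other movies by how many viewers of `movie` also watched them."""
    counts = Counter()
    for watched in histories.values():
        if movie in watched:
            # Count every other title this viewer watched.
            counts.update(watched - {movie})
    # Sort by count (descending), then by title, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [title for title, _ in ranked[:top_n]]


print(also_watched("XYZ", watch_history))
```

Here ABC comes first because two of the three XYZ viewers also watched it.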

2. Content-Based Recommendation System

This system is based on the creation of a user profile. Whenever a user interacts with or rates an item, the system updates the profile based on the properties associated with that item. For example, if I watch a movie, the system will add the attributes of that movie to my profile, i.e. its genre, director, actors, and plot. Based on all these things it will recommend other movies to me, using my previous preferences as a model.

It is this second model we will use for our engine example.
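
As a rough illustration of the profile idea, the sketch below keeps a counter of attributes the user has seen and scores a candidate movie by how many of its attributes match. The function names and all attribute values are placeholders invented for this example, and real systems weight attributes far more carefully.

```python
from collections import Counter


def update_profile(profile, movie_attributes):
    """Add the attributes of a watched movie to the user's profile."""
    for field, values in movie_attributes.items():
        for value in values:
            profile[(field, value)] += 1
    return profile


def profile_score(profile, movie_attributes):
    """Score a candidate movie by how often its attributes appear in the profile."""
    return sum(
        profile[(field, value)]
        for field, values in movie_attributes.items()
        for value in values
    )


# Placeholder attributes, not taken from any real dataset.
watched = {"genre": ["Drama"], "director": ["Director A"], "actors": ["Actor X"]}
candidate_1 = {"genre": ["Drama"], "director": ["Director B"], "actors": ["Actor X"]}
candidate_2 = {"genre": ["Comedy"], "director": ["Director C"], "actors": ["Actor Z"]}

profile = update_profile(Counter(), watched)
print(profile_score(profile, candidate_1), profile_score(profile, candidate_2))
```

The first candidate shares a genre and an actor with the watched movie, so it scores higher than the second, which shares nothing.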

Creating a Content-based Recommendation System

Now that we have the basics of the two types of recommender system, let’s look at creating a model for movies with NLP.

Step 1: Identify the Data

To create the movie recommendation system, the first thing we need is data. Our source data can be accessed from here.

This dataset lists the top 250 highest rated movies from IMDB. It contains 250 rows and 38 different columns, but for our purposes, we have taken 5 columns, Title, Genre, Director, Actors, and Plot.
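
Selecting the five columns might look like the sketch below. It assumes the dataset is a CSV file whose header uses exactly these column names. The load_movies function is a hypothetical helper, and the in-memory sample is placeholder data standing in for the real file.

```python
import csv
import io

COLUMNS = ["Title", "Genre", "Director", "Actors", "Plot"]


def load_movies(csv_file):
    """Read a CSV file object and keep only the five columns we need."""
    reader = csv.DictReader(csv_file)
    return [{column: row[column] for column in COLUMNS} for row in reader]


# Placeholder data standing in for the real file; the extra "Year" column
# represents the columns we do not use.
sample_csv = io.StringIO(
    "Title,Year,Genre,Director,Actors,Plot\n"
    'Example Movie,2000,"Drama, Crime",Jane Doe,"Actor One, Actor Two",A placeholder plot.\n'
)
movies = load_movies(sample_csv)
print(movies[0]["Genre"])
```

With the real file, the same function can be given a file object opened with open(path, newline="", encoding="utf-8"), where path is wherever the downloaded file was saved.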

Dataset Snapshot

Step 2: Process the Data

Now that we have the data, we need to clean it up for use in the recommender. To do this we must perform the following operations.

Join the Text:

The first step in cleaning is to gather the relevant data, so we will join the Genre, Director, Actors, and Plot columns into a single string. We can then apply the other operations.

Joined text from the first row of the data set
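
A minimal sketch of the join step, assuming each movie is held as a dictionary keyed by column name. The join_text helper and the row shown are placeholders, not the first row of the real dataset.

```python
TEXT_COLUMNS = ["Genre", "Director", "Actors", "Plot"]


def join_text(movie):
    """Concatenate the descriptive columns of one movie into a single string."""
    return " ".join(movie[column] for column in TEXT_COLUMNS)


# Placeholder row, not taken from the real dataset.
movie = {
    "Title": "Example Movie",
    "Genre": "Drama, Crime",
    "Director": "Jane Doe",
    "Actors": "Actor One, Actor Two",
    "Plot": "Two friends plan 1 last job.",
}
print(join_text(movie))
```

The title is left out of the joined string because it is used to look movies up rather than to describe them.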

Apply Regular Expression:

In this step, we will clean the joined string by removing any numbers and special characters, so that we are left with a string that contains only letters.

Text after applying the regular expression
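
One way to do this with Python's re module is sketched below. The pattern keeps only ASCII letters, which is an assumption made for the example: it would also strip accented characters, so a different pattern may be needed for names that contain them. The keep_letters name is hypothetical.

```python
import re


def keep_letters(text):
    """Replace every character that is not an ASCII letter with a space."""
    letters_only = re.sub(r"[^a-zA-Z]", " ", text)
    # Collapse the runs of spaces left behind by the substitution.
    return " ".join(letters_only.split())


print(keep_letters("Drama, Crime! 2 friends plan 1 last job."))
```

Replacing with a space rather than an empty string avoids gluing neighbouring words together when they were separated only by punctuation.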

Convert Text into Lowercase:

Now that only letters are left in the string, we convert it to lowercase for consistency.

Text after converting it to lowercase

Apply Stemming:

The strings we have may contain different words with the same ‘base word’. For example, a string can contain ‘fly’, ‘flying’, or ‘flew’, all of which are derived from the same base word ‘fly’. So, to improve results and tackle ambiguity in words, we apply stemming to remove these extensions and reduce the string size.

List of stemmed words
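
To show the idea, here is a deliberately simplified suffix-stripping stemmer. It is a toy written for this illustration, not the stemmer used to produce the list above, and established algorithms such as the Porter stemmer apply many more rules. The suffix list and the simple_stem name are assumptions.

```python
SUFFIXES = ("ing", "ed", "es", "s")


def simple_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


words = ["flying", "fly", "plans", "planned", "flew"]
print([simple_stem(word) for word in words])
```

Note that rule-based suffix stripping leaves irregular forms such as "flew" unchanged, and can produce stems that are not real words (here "planned" becomes "plann"). That is usually acceptable, since the stems only need to be consistent, not readable.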

Stop word removal:

Additionally, strings can contain words that appear multiple times. Some of these words are repeated with a high frequency but have low statistical importance, such as “a” or “the”. It is important to remove these words to achieve better performance. This creates our final data set.

List after removing Stop words
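
Stop word removal amounts to filtering against a list of known low-value words. The sketch below uses a tiny hand-picked set purely for illustration. NLP libraries ship much longer lists, and the set and function name here are assumptions.

```python
# A small hand-picked stop word list for illustration only.
STOP_WORDS = {"a", "an", "the", "and", "of", "in", "to", "is"}


def remove_stop_words(words):
    """Drop words that appear in the stop word list."""
    return [word for word in words if word not in STOP_WORDS]


tokens = "the story of a boy in the city".split()
print(remove_stop_words(tokens))
```

The text must already be lowercase at this point, otherwise "The" would slip past a list that only contains "the".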

Step 3: Convert the Data

Until now we have been processing the data in the form of text, but the computer doesn’t understand text in this way. So, we need to convert it into a matrix of numbers. To do this we use a count vectorizer.

This creates and returns a matrix containing the number of times each word occurs in each document. At the end of this step, we have the data as a matrix of numbers that can be fed to the computer.

15 most common features and their frequency using countvectorizer
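
What a count vectorizer does can be sketched with the standard library alone. This stand-in is not the vectorizer used to produce the figures above. It assumes the documents are already cleaned and space-separated, and count_matrix is a hypothetical name.

```python
from collections import Counter


def count_matrix(documents):
    """Return (vocabulary, matrix) where matrix[i][j] counts word j in document i."""
    tokenized = [document.split() for document in documents]
    # A sorted vocabulary gives every word a fixed column index.
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        matrix.append([counts[word] for word in vocabulary])
    return vocabulary, matrix


docs = ["drama crime drama", "crime thriller"]
vocabulary, matrix = count_matrix(docs)
print(vocabulary)
print(matrix)
```

Each row is one movie and each column one word, so two movies with similar descriptions end up with similar rows.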

Step 4: Applying the Algorithm

Now we have data that can be fed to an algorithm. For this recommendation system use case we will be using cosine similarity. Let’s first understand what cosine similarity is.

Cosine Similarity

Cosine similarity is a measure of the similarity between two vectors. It is obtained by treating the vectors as points in a multi-dimensional space and calculating the cosine of the angle between them.

Mathematically, it is the dot product of the two vectors divided by the product of their magnitudes: cos(θ) = (A · B) / (‖A‖ ‖B‖). This can be seen in the figure below.

Equation of cosine similarity
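
The formula translates directly into code. The sketch below is written with the standard library only. The function name and the choice to return 0.0 for an all-zero vector are assumptions made for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Convention chosen for this sketch: an empty vector matches nothing.
        return 0.0
    return dot / (norm_a * norm_b)


print(cosine_similarity([1, 2, 0], [1, 0, 1]))
```

Because the result depends only on the angle, a vector and a scaled copy of it score 1, while vectors with no words in common score 0. This is why a long plot and a short plot can still be judged similar.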

In this use case we use cosine similarity to calculate the similarity between the keyword vectors of two movies. We have a matrix of keyword counts, and we need to establish the pairwise similarities between its rows so that they can be used to recommend items in the future.
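
Putting this together, a recommendation is just a ranking of the other rows by their similarity to the chosen movie's row. The sketch below includes a small cosine helper so that it runs on its own. The recommend function, the titles, and the count vectors are all placeholders rather than the article's code or data.

```python
import math


def cosine_similarity(a, b):
    """Dot product divided by the product of the magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(title, titles, matrix, top_n=10):
    """Return up to `top_n` titles most similar to `title`, best match first."""
    index = titles.index(title)
    scores = [
        (cosine_similarity(matrix[index], row), other)
        for other, row in zip(titles, matrix)
        if other != title
    ]
    scores.sort(key=lambda pair: pair[0], reverse=True)
    return [other for _, other in scores[:top_n]]


# Placeholder titles and count vectors, not real data.
titles = ["Movie A", "Movie B", "Movie C"]
matrix = [
    [2, 1, 0],
    [2, 1, 1],
    [0, 0, 3],
]
print(recommend("Movie A", titles, matrix, top_n=2))
```

For 250 movies it is cheap to compare the chosen row against every other row on each request. With a much larger catalogue, the pairwise similarities would more likely be computed once and stored.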


With this, we have successfully created our recommendation system, so let’s see it in action. It works like this: when you pick or enter a movie that you have watched, the recommender suggests 10 movies, based on it, that you may also like to watch. In this example we input “Slumdog Millionaire”, and the recommender provided the following results:


Based on these results we can say that our recommender is working, as all the movies it recommends have some similarity with “Slumdog Millionaire”, for example “Lion”. Obviously, with the small data set we used, the results are limited.

With a broader data set the recommender would identify that both movies have Dev Patel in their cast, or that there are similarities in other areas such as movie type, description, or other metrics.

The performance of this type of system depends largely on the input data. We used a very limited data set, so following the same steps with a larger data set should give better performance. The approach can also be combined with reinforcement learning and other techniques, such as collaborative filtering, as user profiles become deeper and more established.

The complete code of this recommendation system is available on GitHub. I hope you enjoyed the article. Stay tuned until the next one, and happy coding ❤
