Collaborative Filtering
Overview
Collaborative filtering is a recommendation technique that is widely used in e-commerce. A collaborative filtering system recommends a product to a user based on that user's past purchase pattern and similarity to other users. This article discusses the different types of collaborative filtering, their limitations, and a worked example of applying collaborative filtering to a real-life use case.
Prerequisites
- Learners should know the basics of recommendation systems.
- Introductory knowledge of linear algebra is helpful for understanding the content of this article.
What is Collaborative Filtering?
Collaborative filtering is a popular approach to product recommendation that identifies similarities between users in order to serve relevant recommendations. Specifically, it predicts a user's preference for items based on historical data. Let us understand collaborative filtering with an example. Assume user U1 likes products p1, p2, and p4; user U2 likes products p1, p3, and p4; and user U3 likes only product p1. Our job is to decide which new product to recommend to U3. All three users like p1, so we treat them as having similar tastes. Both U1 and U2 also like p4, which U3 has not tried yet, so U3 is likely to enjoy p4 as well. Hence, we recommend product p4 to U3.
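The logic flow above can be sketched in a few lines of Python. This is only a toy illustration, not a production method: the `likes` dictionary and the `recommend` helper are hypothetical names, and treating two users as similar when they share at least one liked product is an assumption made for this example.

```python
from collections import Counter

# Likes from the example above: user -> set of liked products.
likes = {
    "U1": {"p1", "p2", "p4"},
    "U2": {"p1", "p3", "p4"},
    "U3": {"p1"},
}

def recommend(target, likes):
    """Rank products the target has not liked by votes from similar users.

    Assumption for this toy example: a user is "similar" to the target
    if the two share at least one liked product.
    """
    seen = likes[target]
    votes = Counter()
    for user, items in likes.items():
        if user == target or not (items & seen):
            continue  # skip the target and users with no overlap
        for item in items - seen:
            votes[item] += 1
    # Highest vote count first; ties are broken by product name.
    return sorted(votes, key=lambda item: (-votes[item], item))

print(recommend("U3", likes))  # ['p4', 'p2', 'p3']
```

Product p4 comes first because both U1 and U2 like it, which matches the recommendation reached in the example.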
Classes of Collaborative Filtering
Broadly, collaborative filtering can be divided into two categories:
- User-Based Collaborative Filtering
- Item-to-Item Based Collaborative Filtering
Each is discussed in detail below.
User-Based Collaborative Filtering
User-Based Collaborative Filtering identifies the items that a target user might like based on the ratings given to those items by other users who are similar to the target user.
Steps for User-based Collaborative Filtering
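- Finding the similarity of users to the target user U
Similarity for any two users 'a' and 'b' can be calculated from the given formula,
$$\text{sim}(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2}\,\sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}}$$ .... (1)
Here, $P$ is the set of items rated by both users, $r_{u,p}$ is the rating of item $p$ by user $u$, and $\bar{r}_u$ is the average rating of user $u$.
- Prediction of missing rating of an item
Here, we multiply each user's mean-centered rating by the similarity factor calculated using the formula above. The missing rating can be calculated as,
$$r_{u,p} = \bar{r}_u + \frac{\sum_{i} \text{sim}(u, i)\,(r_{i,p} - \bar{r}_i)}{\sum_{i} |\text{sim}(u, i)|}$$ .... (2)
where the sums run over the other users $i$ who have rated item $p$.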
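The two steps above can be sketched in plain Python. This is a minimal sketch under stated assumptions: equation (1) is taken to be the Pearson correlation over co-rated items, equation (2) is taken to be a mean-centered, similarity-weighted average, and the `ratings` data and the function names `pearson` and `predict` are hypothetical.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating on a 1-5 scale}.
ratings = {
    "A": {"m1": 5, "m2": 3, "m3": 1},
    "B": {"m1": 4, "m2": 3, "m3": 2, "m4": 5},
    "C": {"m1": 1, "m2": 3, "m3": 5, "m4": 1},
}

def mean(values):
    return sum(values) / len(values)

def pearson(ratings_a, ratings_b):
    """Step 1: similarity of two users over the items both have rated."""
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return 0.0
    mean_a = mean([ratings_a[p] for p in common])
    mean_b = mean([ratings_b[p] for p in common])
    num = sum((ratings_a[p] - mean_a) * (ratings_b[p] - mean_b) for p in common)
    den_a = sqrt(sum((ratings_a[p] - mean_a) ** 2 for p in common))
    den_b = sqrt(sum((ratings_b[p] - mean_b) ** 2 for p in common))
    if den_a == 0 or den_b == 0:
        return 0.0  # a user with constant ratings carries no signal
    return num / (den_a * den_b)

def predict(target, item, ratings):
    """Step 2: predict the target user's missing rating for `item`."""
    target_mean = mean(list(ratings[target].values()))
    num = den = 0.0
    for user, user_ratings in ratings.items():
        if user == target or item not in user_ratings:
            continue
        # Average the other user's ratings, excluding the item being predicted.
        others = [r for p, r in user_ratings.items() if p != item]
        if not others:
            continue
        sim = pearson(ratings[target], user_ratings)
        num += sim * (user_ratings[item] - mean(others))
        den += abs(sim)
    if den == 0:
        return target_mean  # no usable neighbors: fall back to the user's mean
    return target_mean + num / den

print(round(predict("A", "m4", ratings), 2))  # 5.0
```

In this toy data, user B rates like A and loved m4, while user C rates opposite to A and disliked m4. Both facts push the prediction for A upward, which is why the negative similarity is kept in the numerator and only its absolute value is used in the denominator.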
Pros
- User-based collaborative filtering is straightforward to implement; a user's rating for a product can be calculated using the two steps mentioned above.
- Its recommendations tend to improve as the user base grows, which is intuitive: more users means more potential neighbors with similar tastes.
- It can be more accurate than other techniques, such as content-based filtering.
Cons
- Because the user-product matrix is large and sparse, it is expensive to hold in memory. Finding the nearest neighbors also becomes infeasible as the user base grows, because the similarity of the target user to every other user must be calculated.
- Cold-start: New users will have little to no information about them to be compared with other users.
Item-to-Item Based Collaborative Filtering
Item-based collaborative filtering is very similar to user-based collaborative filtering. Here, instead of finding similarities between two users, we find similarities between two items, and later we calculate a user's probable rating of a product.
Steps for Item-to-Item Based Collaborative Filtering
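- Item to Item Similarity
The cosine similarity (sometimes adjusted cosine similarity or cosine-based similarity) is used to find the similarity between two items.
$$\text{sim}(i, j) = \cos(\vec{i}, \vec{j}) = \frac{\vec{i} \cdot \vec{j}}{\lVert \vec{i} \rVert \, \lVert \vec{j} \rVert}$$ .... (3)
Here, $\vec{i}$ and $\vec{j}$ are the vectors of ratings given to items $i$ and $j$.
- Prediction Computation
Here, the items that the user has already rated are identified. From these, the items most similar to the missing item are used to generate a rating. The formula is given below:
$$\text{rating}(u, i) = \frac{\sum_{j \in N} \text{sim}(i, j)\, r_{u,j}}{\sum_{j \in N} |\text{sim}(i, j)|}$$ .... (4)
where $N$ is the set of items rated by user $u$ that are most similar to item $i$.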
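The item-based steps can be sketched the same way. This is an illustrative sketch, not a reference implementation: it assumes plain cosine similarity computed over the users who rated both items, a weighted average over the `k` most similar items for the prediction, and hypothetical data and function names.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating on a 1-5 scale}.
ratings = {
    "A": {"m1": 5, "m2": 3, "m3": 1},
    "B": {"m1": 4, "m2": 3, "m3": 2, "m4": 5},
    "C": {"m1": 1, "m2": 3, "m3": 5, "m4": 1},
}

def item_vector(item, ratings):
    """Collect the ratings given to one item: user -> rating."""
    return {user: r[item] for user, r in ratings.items() if item in r}

def cosine(item_i, item_j, ratings):
    """Step 1: cosine similarity of two items over users who rated both."""
    vec_i = item_vector(item_i, ratings)
    vec_j = item_vector(item_j, ratings)
    common = set(vec_i) & set(vec_j)
    if not common:
        return 0.0
    dot = sum(vec_i[u] * vec_j[u] for u in common)
    norm_i = sqrt(sum(vec_i[u] ** 2 for u in common))
    norm_j = sqrt(sum(vec_j[u] ** 2 for u in common))
    return dot / (norm_i * norm_j)

def predict(user, item, ratings, k=2):
    """Step 2: weighted average of the user's ratings on the k most similar items."""
    scored = [
        (cosine(item, other, ratings), rating)
        for other, rating in ratings[user].items()
        if other != item
    ]
    top = sorted(scored, reverse=True)[:k]  # most similar items first
    den = sum(abs(sim) for sim, _ in top)
    if den == 0:
        return None  # no similar rated items: cannot predict
    return sum(sim * rating for sim, rating in top) / den

print(round(predict("A", "m4", ratings), 2))  # 4.09
```

Because the prediction is a weighted average of ratings the user has already given, it always stays within the range of those ratings.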
Pros
- With limited ratings, it can work better than user-based collaborative filtering. User-based collaborative filtering needs a reasonably dense user-item matrix to generate accurate results, because similarity between two users is computed only over the items both have rated. Item-to-item based collaborative filtering is less affected by this limitation.
- Easy to implement, update and maintain. The rating can be easily calculated using the two steps mentioned above.
Cons
- It also suffers from a cold start problem when a new item is added to the catalog.
Cold Start Problem
The cold start problem arises when we don't have past data about a user or an item. For example, when a user first signs in to an e-commerce website or a new item is added to the catalog, no purchase history is available for that user or item, respectively. In such a situation, finding similarities among users or items becomes impossible.
Cold Start Problem Solution
- Ask new users a few basic questions and recommend products based on their answers.
- Recommend the most popular items for the user's gender or geography.
- Recommend new products at random to some users who shop heavily.
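As one illustration of the second option, a popularity fallback by geography might look like the sketch below. The purchase log, the region labels, and the `popular_in_region` helper are all hypothetical and exist only for this example.

```python
from collections import Counter

# Hypothetical purchase log: (user, region, product).
purchases = [
    ("u1", "north", "p1"),
    ("u2", "north", "p1"),
    ("u3", "north", "p2"),
    ("u4", "south", "p3"),
    ("u5", "south", "p3"),
    ("u6", "south", "p1"),
]

def popular_in_region(region, purchases, n=2):
    """Return the n most purchased products in a region.

    Intended as a fallback for a new user who has no purchase history yet.
    """
    counts = Counter(product for _, r, product in purchases if r == region)
    return [product for product, _ in counts.most_common(n)]

print(popular_in_region("south", purchases))  # ['p3', 'p1']
```

Once the new user has accumulated some ratings or purchases, the system can switch from this fallback to the similarity-based methods described earlier.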
A Movie Recommendation Example
Consider a matrix showing the ratings that four users (Amit, Deepak, Rohan, and Sachin) gave to different movies. Ratings range from 1 to 5. A '?' indicates that the user has not rated the movie.
- Step 1: Calculating the similarity between Amit and all the other users
First, we calculate each user's average rating, excluding The Godfather because Amit has not rated it. The average for a user $u$ over the $n$ movies included is
$$\bar{r}_u = \frac{1}{n} \sum_{p} r_{u,p}$$
This gives one average rating for each of the four users.
After calculating the new matrix, in which each rating is replaced by its difference from that user's average, we got:
Now, we calculate the similarity between Amit and all the other users using equation 1.
sim(Amit, Deepak) = 0.301
sim(Amit, Rohan) = -0.33
sim(Amit, Sachin) = 0.707
- Step 2: Predicting the rating of The Godfather by Amit
We predict Amit's rating for The Godfather using equation 2.
rating(Amit, The Godfather) = 3.83
Conclusion
- We have discussed the definition of collaborative filtering and its two types: user-based and item-to-item based.
- We have discussed the limitations of collaborative filtering, in particular the cold start problem and ways to mitigate it.
- A movie recommendation example is presented in the article to understand collaborative filtering better.
- Content-based filtering will be covered in the following article.