Collaborative Filtering - Scaler Topics

Overview

Collaborative filtering is a recommendation technique that is widely used in e-commerce. A collaborative filtering system recommends a product to a user based on the user's past purchase pattern and similarity with other users. This article discusses the different types of collaborative filtering, their limitations, and a worked example of applying collaborative filtering to a real-life use case.

Prerequisites

Learners must know the basics of recommendation systems.
Introductory level knowledge of linear algebra would be helpful to understand the content of the article.

What is Collaborative Filtering?

Collaborative filtering is a popular approach to product recommendation that identifies similarities between users in order to serve relevant recommendations. Specifically, it predicts a user's preference for items from historical data. Let us understand collaborative filtering with an example. Assume user U1 likes products p1, p2, and p4, user U2 likes products p1, p3, and p4, and user U3 likes product p1. Our job is to decide which new product to recommend to user U3 next. Users U1, U2, and U3 all like product p1, so we treat the three as having similar taste. Users U1 and U2 also both like p4, so user U3 may like p4 as well. Hence, we recommend product p4 to U3; this is the logic flow.

\begin{bmatrix} U1 & p1 & p2 & p4 \\ U2 & p1 & p3 & p4 \\ U3 & p1 & \text{what else can we suggest to U3?} \\ \end{bmatrix}
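The logic flow above can be sketched in a few lines of Python. This is a minimal illustration, not a production recommender: the `recommend` function and the vote-counting rule (one vote per like-minded user) are assumptions made for this sketch.

```python
# Toy data from the example above: the products each user likes.
likes = {
    "U1": {"p1", "p2", "p4"},
    "U2": {"p1", "p3", "p4"},
    "U3": {"p1"},
}

def recommend(target, likes):
    """Rank products the target has not seen by votes from like-minded users."""
    seen = likes[target]
    votes = {}
    for user, products in likes.items():
        # Skip the target and any user who shares no liked product with them.
        if user == target or not (products & seen):
            continue
        for product in products - seen:
            votes[product] = votes.get(product, 0) + 1
    # Most votes first; ties are broken alphabetically to keep output stable.
    return sorted(votes, key=lambda p: (-votes[p], p))

print(recommend("U3", likes))  # ['p4', 'p2', 'p3']
```

Product p4 comes first because both U1 and U2, who share p1 with U3, like it.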


Classes of Collaborative Filtering

Broadly, collaborative filtering can be divided into two categories:

User-based collaborative filtering
Item-to-item-based collaborative filtering

Here we will discuss each in detail.

User-Based Collaborative Filtering

User-based collaborative filtering identifies items that a target customer might like based on the ratings given to those items by other customers who are similar to the target customer.

Steps for User-based Collaborative Filtering

Finding the similarity of users to the target user U: The similarity between any two users a and b can be calculated with the formula $sim(a, b) = \frac{\sum_p (r_{ap}-\bar{r}_a)(r_{bp}-\bar{r}_b)}{\sqrt{\sum_p (r_{ap}-\bar{r}_a)^2}\sqrt{\sum_p (r_{bp}-\bar{r}_b)^2}}$ .... (1)
Here, $p$ ranges over the items, $r_{up}$ is the rating of item $p$ by user $u$, and $\bar{r}_u$ is the average rating of user $u$.
Prediction of the missing rating of an item: Here, we weight each other user's rating of the item, taken as a deviation from that user's average rating, by the similarity calculated with the formula above. The missing rating can be calculated as $r_{up} = \bar{r}_u + \frac{\sum_{i\in users} sim(u,i)\,(r_{ip}-\bar{r}_i)}{\sum_{i\in users}|sim(u,i)|}$ .... (2)
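The two steps can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions: ratings are stored as a hypothetical `{user: {item: rating}}` dictionary, averages in the similarity are taken over the items both users have rated, and each neighbor's rating is used as a deviation from that neighbor's average (a common form of the prediction step).

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def pearson_sim(ra, rb):
    """Step 1: similarity of two users; ra and rb map item -> rating."""
    common = [p for p in ra if p in rb]  # items rated by both users
    if not common:
        return 0.0
    mean_a = mean([ra[p] for p in common])
    mean_b = mean([rb[p] for p in common])
    num = sum((ra[p] - mean_a) * (rb[p] - mean_b) for p in common)
    den = (sqrt(sum((ra[p] - mean_a) ** 2 for p in common))
           * sqrt(sum((rb[p] - mean_b) ** 2 for p in common)))
    return num / den if den else 0.0

def predict(ratings, user, item):
    """Step 2: predict the missing rating of `item` for `user`."""
    target = ratings[user]
    mean_u = mean(list(target.values()))
    num = den = 0.0
    for other, r in ratings.items():
        common = [p for p in target if p in r]
        if other == user or item not in r or not common:
            continue  # neighbor must have rated the item and share items
        s = pearson_sim(target, r)
        mean_i = mean([r[p] for p in common])  # assumption: co-rated mean
        num += s * (r[item] - mean_i)
        den += abs(s)
    # Fall back to the user's own average when no neighbor is usable.
    return mean_u + num / den if den else mean_u

# Hypothetical two-user example: v rates everything twice as high as u.
toy = {"u": {"x": 1, "y": 2, "z": 3},
       "v": {"x": 2, "y": 4, "z": 6, "w": 5}}
print(round(predict(toy, "u", "w"), 2))  # 3.0
```

In the toy data, v is perfectly correlated with u and rates w one point above v's own average, so the prediction for u is one point above u's average.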

Pros

User-based collaborative filtering is straightforward to implement; a user's rating of a product can be calculated using the two steps mentioned above.
As the user base grows, recommendation quality tends to improve, because more similar users are available for comparison.
It can be more accurate than other techniques, such as content-based filtering, when enough rating data is available.

Cons

The user-product matrix is large and sparse, so it is memory-intensive to store and process. Finding the nearest neighbors also becomes expensive as the user base grows, because the similarity of the target user with every other user has to be calculated.
Cold-start: New users will have little to no information about them to be compared with other users.

Item-to-Item Based Collaborative Filtering

Item-based collaborative filtering is very similar to user-based collaborative filtering. Here, instead of finding similarities between two users, we find similarities between two items, and later we calculate a user's probable rating of a product.

Steps for Item-to-Item Based Collaborative Filtering

Item-to-item similarity: The cosine similarity (sometimes adjusted cosine similarity or cosine-based similarity) is used to find the similarity between two items, where $\vec{a}$ and $\vec{b}$ are the rating vectors of the two items.
$sim(\vec{a},\vec{b})=\frac{\vec{a}\cdot\vec{b}}{\|\vec{a}\|\,\|\vec{b}\|}$ .... (3)
Prediction computation: First, the items that the user has already rated are identified. Then, the rated items most similar to the missing item are used to generate its rating, with each rating weighted by the similarity $s_{ij}$ between items $I_i$ and $I_j$. The formula is given below:
$rating(U,I_i)=\frac{\sum_{j}rating(U,I_{j})\,s_{ij}}{\sum_{j}s_{ij}}$ .... (4)
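A minimal Python sketch of these two steps follows. The function names, the `item_vectors` layout (each item mapped to a list of ratings, one per user), and the sample numbers are hypothetical. The sketch uses plain cosine similarity, assumes non-negative ratings so the similarities in the denominator cannot cancel out, and for simplicity uses every item the user has rated rather than only the most similar ones.

```python
from math import sqrt

def cosine_sim(a, b):
    """Cosine similarity between two item rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def predict_item_based(user_ratings, item_vectors, target):
    """Similarity-weighted average of the user's own ratings."""
    num = den = 0.0
    for item, rating in user_ratings.items():
        s = cosine_sim(item_vectors[target], item_vectors[item])
        num += rating * s
        den += s
    return num / den if den else None  # None: no similar rated item

# Hypothetical data: each item's vector holds ratings from three other users.
item_vectors = {
    "I1": [5, 3, 4],
    "I2": [4, 3, 5],
    "I3": [1, 5, 2],
}
# A user who rated I2 and I3; we estimate their rating for I1.
estimate = predict_item_based({"I2": 4, "I3": 2}, item_vectors, "I1")
print(round(estimate, 2))
```

Because I1 is closer to I2 than to I3, the estimate lands nearer the user's rating of I2.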

Pros

With limited ratings, it can work better than user-based collaborative filtering. User-based collaborative filtering needs a reasonably well-populated user-product matrix to generate accurate results; item-to-item-based collaborative filtering is less sensitive to this limitation.
It is easy to implement, update, and maintain. The rating can be calculated using the two steps mentioned above.

Cons

It also suffers from a cold start problem when a new item is added to the catalog.

Cold Start Problem

The cold start problem arises when we don't have past data about a user or an item. For example, when a user signs in to an e-commerce website for the first time or a new item is added to the catalog, we do not have a purchasing history for the user or the item, respectively. In such a situation, finding similarities among users or items becomes impossible.

Cold Start Problem Solution

Ask the new user a few basic questions and recommend products based on the answers.
Recommend the most popular items based on gender or geography.
Recommend new products at random to some users who shop heavily.
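The popularity-based fallback could look like the sketch below. The `popular_items` function, the `(user, item)` purchase log, and the `segment_of` lookup (for example, a region code per user) are all hypothetical names chosen for illustration.

```python
from collections import Counter

def popular_items(purchases, segment_of, segment, k=3):
    """Return the k most purchased items among users in one segment."""
    counts = Counter(
        item for user, item in purchases
        if segment_of.get(user) == segment
    )
    return [item for item, _ in counts.most_common(k)]

# Hypothetical purchase log and user segments (here: geography).
purchases = [("u1", "p1"), ("u2", "p1"), ("u2", "p2"), ("u3", "p3")]
segment_of = {"u1": "IN", "u2": "IN", "u3": "US"}

# A brand-new user from segment "IN" gets that segment's best sellers.
print(popular_items(purchases, segment_of, "IN", k=2))  # ['p1', 'p2']
```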

A Movie Recommendation Example

Consider a matrix showing the ratings given by four users (Amit, Deepak, Rohan, and Sachin) to different movies. Ratings range from 1 to 5. A '?' indicates that the user has not rated the movie.

\begin{matrix} & Avatar & Goodfellas & Inception & Raging\ Bull & The\ Godfather \\ Amit & 5 & 4 & 1 & 4 & ? \\ Deepak & 3 & 1 & 2 & 3 & 3 \\ Rohan & 4 & 3 & 4 & 3 & 5 \\ Sachin & 3 & 3 & 1 & 5 & 4 \end{matrix}

Step 1: Calculating the similarity between Amit and all the other users. First, we calculate each user's average rating, excluding The Godfather because Amit has not rated it. The average is calculated as
$\bar{r}_i=\frac{1}{n}\sum_{p}r_{ip}$, where $n = 4$ is the number of movies included.

Therefore, we have $\bar{r}_{Amit}=3.5$ , $\bar{r}_{Deepak}=2.25$ , $\bar{r}_{Rohan}=3.5$ and $\bar{r}_{Sachin}=3$ .

After calculating the mean-centered matrix as $r'_{ip} = r_{ip} - \bar{r}_i$, we get:

\begin{matrix} & Avatar & Goodfellas & Inception & Raging\ Bull \\ Amit & 1.5 & 0.5 & -2.5 & 0.5 \\ Deepak & 0.75 & -1.25 & -0.25 & 0.75 \\ Rohan & 0.5 & -0.5 & 0.5 & -0.5 \\ Sachin & 0 & 0 & -2 & 2 \end{matrix}
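The averages and the mean-centered matrix can be checked with a short Python snippet. The dictionary layout is an assumption of this sketch; each list holds the ratings for Avatar, Goodfellas, Inception, and Raging Bull, in that order.

```python
# Ratings for the four movies that every user has rated.
ratings = {
    "Amit":   [5, 4, 1, 4],
    "Deepak": [3, 1, 2, 3],
    "Rohan":  [4, 3, 4, 3],
    "Sachin": [3, 3, 1, 5],
}

# Average rating per user, then subtract it from each of that user's ratings.
means = {user: sum(r) / len(r) for user, r in ratings.items()}
centered = {user: [x - means[user] for x in r] for user, r in ratings.items()}

print(means)             # {'Amit': 3.5, 'Deepak': 2.25, 'Rohan': 3.5, 'Sachin': 3.0}
print(centered["Amit"])  # [1.5, 0.5, -2.5, 0.5]
```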

Now, we calculate the similarity between Amit and all the other users using equation 1.
sim(Amit, Deepak) = 0.301
sim(Amit, Rohan) = -0.33
sim(Amit, Sachin) = 0.707
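As a check, the similarities can be recomputed from the mean-centered rows. In this sketch the `similarity` helper is a hypothetical name; because the rows are already mean-centered, the similarity reduces to a normalized dot product. The results should agree with the values listed above up to rounding.

```python
from math import sqrt

# Mean-centered ratings over the four co-rated movies.
centered = {
    "Amit":   [1.5, 0.5, -2.5, 0.5],
    "Deepak": [0.75, -1.25, -0.25, 0.75],
    "Rohan":  [0.5, -0.5, 0.5, -0.5],
    "Sachin": [0.0, 0.0, -2.0, 2.0],
}

def similarity(a, b):
    """Normalized dot product of two mean-centered rating rows."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

sims = {user: similarity(centered["Amit"], row)
        for user, row in centered.items() if user != "Amit"}
for user, s in sims.items():
    print(f"sim(Amit, {user}) = {s:.2f}")
```

The negative value for Rohan means his ratings tend to move in the opposite direction to Amit's, so his opinion counts against, rather than for, a movie in the next step.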

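Step 2: Predicting Amit's rating of The Godfather. We predict Amit's rating of The Godfather using equation 2, taking each other user's rating of the movie as a deviation from that user's average calculated in Step 1.
rating(Amit, The Godfather) = 3.83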
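The arithmetic behind this value can be laid out in Python. The sketch plugs in the rounded similarities from Step 1, each neighbor's rating of The Godfather, and each neighbor's average over the four co-rated movies; the `neighbors` layout is an assumption of this illustration.

```python
mean_amit = 3.5  # Amit's average over the four movies he has rated

# user: (similarity to Amit, rating of The Godfather, user's average)
neighbors = {
    "Deepak": (0.301, 3, 2.25),
    "Rohan":  (-0.33, 5, 3.5),
    "Sachin": (0.707, 4, 3.0),
}

# Similarity-weighted sum of each neighbor's deviation from their average.
num = sum(s * (r - m) for s, r, m in neighbors.values())
den = sum(abs(s) for s, _, _ in neighbors.values())

prediction = mean_amit + num / den
print(round(prediction, 2))  # 3.83
```

With unrounded similarities the result is about 3.82, so the small difference comes only from rounding the intermediate values.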

Conclusion

We discussed the definition of collaborative filtering and its two types, user-based and item-to-item-based.
We then discussed the cold start problem, a key limitation of collaborative filtering.
We worked through a movie recommendation example to understand collaborative filtering better.
Content-based filtering will be covered in the following article.