Collaborative Filtering
Average Rating
Basic algorithms is to make s(j) the average rating for j:
$$ s(i,j) = \frac{\sum_{ {i} \in \omega_j} r_{ij}}{| \omega_j |} $$
- s(i,j) can depend both on user i and item j
- Omega_j = Set of all users who rated item j
- r_ij = rating user i gave item j
- i: [1,N], N is the number of users
- j: [1,M], M is the number of items
Some Issues of Average Rating
It treats everyone’s rating of the movie equally.
Sparsity
R_{N*M} = user - items ratings matrix of size N * M, User-item matrix is sparse because most entries are empty!
Goal of Collaborative Filtering
- We want to make recommendations
- Most of r(i,j) doesn’t exist, this is good
- Dense matrix is convenient mathematically, but sucks business-wise
- It means every user has already seen every movie, so we have nothing to recommend
- Therefore, the matrix must be sparse in order to actually have items to recommend
Regression
Since we want to predict a real number, so objective is MSE:
$$ MSE = \frac{1}{\omega} \sum_{i,j \in \omega}(r_{ij} - \bar{r}_{ij})^2 $$
omega is the set of pairs (i,j) where user i has rated item j.
User-User Collaborative Filtering
For UCF, the goal is to find the “users like me”. It’s intuitive that if they are “like me”, I’d like movies they’ve rated highly.
The user-item matrix is shown below:
We can see that Alice’s and Bob’s ratings are highly correlated.
Weighted Ratings
Intuitively, I want it to be small for users I don’t agree with, large for users I do agree with:
$$ s(i,j) = \frac{\sum_{ {i,i\prime} \in \omega_j} W_{ii\prime} r_{ij}}{ \sum_{i,i\prime \in \omega_j} W_{ii\prime}} $$
W_ii’ is the weight between user i and user i’.
Deviation
We care how much it deviates user’s own average, but not the absolute rating:
$$ dev(i,j) = r(i,j) - \bar{r_i}, for \ a \ known \ rating $$
The deviation score for use i gave item j is the rating user i gave j minus the average rating of user i across all movies.
Deviation + Weighted Ratings
The score function combining deviation and weighted ratings is shown below:
$$ s(i,j) = \bar{r_i} + \frac{\sum_{ {i,i\prime} \in \omega_j} W_{ii\prime} | r_{i\prime j} - \bar{r}_{i\prime} | }{ \sum_{i,i\prime \in \omega_j} W_{ii\prime}} $$
How to Calculate Weights
Pearson Correlation Coefficient
$$ P(x,y) = \frac{\sum_{i=1}^{N} (x_i - \bar{x}) (y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^2} \sqrt{\sum_{i=1}^{N} (y_i - \bar{y})^2}} $$
The problem is our matrix is sparse. So we could modify the formula to this:
Cosine Similarity
$$ \cos \theta = \frac{x^T y}{ |x| |y| } = \frac{\sum_{i=1}^{N} x_i y_i}{ \sqrt{\sum_{i=1}^{N} x_i ^2} \sqrt{\sum_{i=1}^{N}y_i^2}} $$
Cosine Similarity VS Pearson Correlation Coefficient
They are the same! Except Pearson is centered. We want to center them anyway because we’re working with deviations, not absolute ratings.
Problem
- If 2 users have zero movies in common, don’t consider i’ in user i’s calculation-it can’t be calculated
- If few(e.g. < 5) then don’t use the weight, since not enough data to be accurate
Neighborhood
- In practice, don’t sum over all users who rated movie j (takes too long)
- It can help to pre-compute weights beforehand
- Non-trivial: instead of summing over all users, take the ones with the highest weight. E.g. use K nearest neighbors, K = 25 up to 50
Item-Item Collaborative Filtering
The correlation between the column vectors is high. If you like Power Rangers, you’ll also like Transformers.
Hot to Calculate Weights for Item Correlation
Item Score
The deviation means how much user i likes item j’, compared to how much everyone else like j’. If user i really like j’(more than other users do) and j is similar to j’(w_jj’ is high), then user i probably likes j too.
UCF VS ICF
- UCF: Choose items for a uses, because those items have been liked by similar users
- ICF: Choose items for a user, because this user has liked similar items in the past
UCF and ICF are mathematically identical.
Summary
For average rating which considers only a single score for each item regardless of which user is looking. Some problems:
- Not all rating should be treated equally
- Users who I agree with should be weighted higher
- Users I disagree with should be weighted lower
By using collaborative filtering method, the s(i,j) score depends on user i and item j. We used the pearson correlation as weights.
By flipping the rating matrix sideways, we can convert our UCF algorithm into an ICF algorithm.
But accuracy is not the most important thing we care about since it can leads to lack of diversity in recommendations which always suggesting “similar products”.
The user-item matrix doesn’t have to be ratings at all!
Note: Cover Picture