How to make predictions with the collaborative filtering algorithm?

Hi experts, I have a question about prediction with the collaborative filtering algorithm.
In the lecture, X, W, and b are all treated as parameters to be optimized with gradient descent. But suppose we want to predict on a new dataset that includes new users and new movies, where the new users have not rated any movie and the new movies have no features. In this case, how do we calculate predictions using the trained X, W, and b?
Thank you!

Hello @lxd_1986001,

The issue you are sharing with us is called the cold-start problem. Collaborative filtering requires some interactions between a user and a movie to learn about both of them; since the model has not learnt anything about a brand-new user, it can't make a recommendation for that user. The traditional workaround is to fall back on the content-based filtering approach, where a user is identified by their “content”, and “content” information (such as location and browser type) can be collected even for a new user. However, since recommendation is a very practical problem, you should be able to find other approaches on Google that address this issue.
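For concreteness, here is a minimal NumPy sketch (toy numbers and variable names of my own, not lecture code) of why the trained parameters cannot score a brand-new user: the prediction for user j on movie i is w(j) · x(i) + b(j), and a new user simply has no learned row in W or b to plug in.

```python
import numpy as np

# Toy "trained" parameters: 3 known users, 4 known movies, 2 latent features
W = np.array([[0.9, 0.1],
              [0.2, 0.8],
              [0.5, 0.5]])      # one parameter row per known user
b = np.array([0.1, -0.2, 0.0])  # one bias per known user
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.7, 0.3],
              [0.2, 0.9]])      # one learned feature row per known movie

def predict(user_j, movie_i):
    """Predicted rating for user j on movie i: w(j) . x(i) + b(j)."""
    return X[movie_i] @ W[user_j] + b[user_j]

print(predict(0, 0))  # known user, known movie -> a real prediction

# A brand-new user (index 3) has no row in W or b, so there is
# nothing to plug into the formula -- this is the cold-start problem.
try:
    predict(3, 0)
except IndexError:
    print("no learned parameters for the new user")
```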

Cheers,
Raymond


Thank you @rmwkwok. I see the cold-start issue now, and there is one method introduced in the lecture for it: mean normalization. The mean normalization example in the lecture is for a rating system, but is it also a good choice for a binary-label system, since there are only 0/1 values in the Y matrix?

As you mentioned, I think I need to google more information on the solution of recommendation and cold start issue.

Thank you!

Hello @lxd_1986001,

Mean normalization is a nice catch, but let me also share my view on that approach:

  1. In the lecture, at around 6 minutes 47 seconds, two consequences of using mean normalization are discussed:

    • “the optimization algorithm for the recommended system will also run just a little bit faster”
    • “it does make the algorithm behave much better for users who have rated no movies or very small numbers of movies”
  2. For users who have rated no movies at all, which is the focus of the cold-start problem, mean normalization effectively assigns them the average rating for each movie. Therefore, if you have 100 new users coming from 10 different countries, your model is going to predict the same averaged rating per movie for all of those new users. This is helpful because we don't end up with nothing, but it is also NOT very helpful because we know those 100 new users are going to have different preferences. Collaborative filtering is good at learning different users' preferences from their interactions, but if there are no interactions at all for the new users, it can't learn any preference and can only resort to the “default”, which is the “averaged” preference.

  3. Therefore, mean normalization does let collaborative filtering handle “new users” as “the average”. But if you want something more, meaning you have some additional information such as the “content” I mentioned in my previous reply, then that is when you want more than just collaborative filtering in your recommendation system. A recommendation system doesn't have to rely on only one algorithm; you can combine many algorithms in some way to make the final decision.
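To make point 2 concrete, here is a small NumPy sketch (my own toy matrices, not lecture code) of mean normalization: a new user's parameters start at zero, so once the per-movie mean is added back, every new user gets exactly each movie's average rating.

```python
import numpy as np

# Ratings matrix Y (movies x users); R marks which entries were rated
Y = np.array([[5.0, 4.0, 0.0],
              [0.0, 2.0, 3.0],
              [4.0, 0.0, 5.0]])
R = np.array([[1, 1, 0],
              [0, 1, 1],
              [1, 0, 1]])

# Per-movie mean, computed over rated entries only
mu = (Y * R).sum(axis=1) / R.sum(axis=1)

# A brand-new user: parameters are initialized to zero
w_new = np.zeros(2)
b_new = 0.0
X = np.random.rand(3, 2)  # learned movie features (values don't matter here)

# Prediction for the new user on every movie: w . x + b + mu = mu
pred_new = X @ w_new + b_new + mu
print(pred_new)  # exactly the per-movie averages, for any new user
```

With zero parameters the learned features X contribute nothing, which is exactly why all 100 hypothetical new users would receive identical predictions.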

Please feel free to share your findings here if you want to discuss, or even if you want to develop something.

Cheers,
Raymond

Hi Raymond,

Thanks for your detailed description.
What I was thinking is that, as we always do in a traditional ML project, we split the dataset into Train, CV, and Test sets, train the model on the Train set, and validate on CV/Test. So I wondered: if the collaborative filtering algorithm is trained on the Train set, how do we make predictions on the CV/Test set for validation? But it seems such a recommendation system works differently: there is actually no such dataset split, and we just train the model to fit the Y matrix for users (including new users) and the X matrix for movies (including new movies). Am I right?

Regards,
Liu

Hello @lxd_1986001,

I think it’s always important for us to find a way to do evaluation. The question is how to design the evaluation process to make it fair.

For example, let's say user A has rated 15 movies, and we trained the model with all 15. Then one question could be: at production time, if the model thinks that user A would rate movie X as 5 stars, will user A really end up rating it 5 stars?

Now, let me take a step back: say I keep 1 of the 15 movies out of the training set, and let the model be trained with only the other 14. Then we can ask the same kind of question: if the model thinks that user A would rate that 15th movie as 3 stars, did user A really rate it 3 stars?

You might come up with other ways to ask the question, but I think the one above sounds pretty reasonable, so, following the flow of my question, I would keep some ratings away from the model and let them serve as my evaluation set.

However, we would need to pay extra attention when interpreting the evaluation results, because some users have rated significantly fewer movies, and even if we keep just 1 of their rated movies for evaluation, that could already be a very significant share of their data; consequently, the model's performance on those users might be worse than average. Of course, we always need to pick our evaluation set randomly, but in your analysis you might want to look at the results by some groupings to understand how the model behaves across different groups.
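The leave-one-rating-out split described above could be sketched like this (hypothetical toy data and names, just to illustrate the mechanics):

```python
import random
from collections import defaultdict

# (user, movie, rating) interactions; hypothetical toy data
ratings = [("A", m, r) for m, r in
           zip(range(15), [5, 3, 4, 2, 5, 4, 3, 5, 1, 2, 4, 3, 5, 4, 3])]
ratings += [("B", 0, 4), ("B", 1, 2)]  # user B has rated very few movies

by_user = defaultdict(list)
for u, m, r in ratings:
    by_user[u].append((m, r))

random.seed(0)
train, held_out = [], []
for u, items in by_user.items():
    # Hold one randomly chosen rating per user out of training
    m_eval, r_eval = random.choice(items)
    held_out.append((u, m_eval, r_eval))
    train += [(u, m, r) for m, r in items if m != m_eval]

# For user B, the single held-out rating is 50% of all their ratings,
# so evaluation results for such sparse users need careful interpretation.
share_B = 1 / len(by_user["B"])
print(len(train), len(held_out), share_B)
```

Grouping the held-out errors by how many ratings each user has (as suggested above) would then reveal whether the model degrades on sparse users.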

Cheers,
Raymond
