Question for r(i,j) used in Binary Label recommend system

lxd_1986001 · January 10, 2023, 8:46am

I have a question about the r(i,j) matrix used in the course.
The r(i,j) matrix is defined to be 1 at positions where the user rated the movie and 0 at positions where the user did not rate it.
It is easy to understand how this matrix is used in a rating system (e.g., ratings from 0.5 to 5), but how should r(i,j) be used in a binary label system, which is introduced in the Week 2 4th video?
In a binary label system there are only two options, for example click (y=1) and no click (y=0). Should we then define r(i,j) to be the same as y, or should we not use the r(i,j) matrix in a binary label system at all?

rmwkwok · January 10, 2023, 12:11pm

Hello @lxd_1986001,

Below is a screenshot from the Course 3 Week 2 Video 4 which shows that r(i, j) is also used.

As you said, r(i, j) is different from y(i, j) because the former is for “rated or not” and the latter for “rated value”. If, in your case, you only have “clicked or not”, then r(i, j) = 1 for every pair. Note that the purpose of r(i, j) is to screen off invalid user-movie pairs from the cost function so that the model isn’t optimized for those invalid pairs. y(i, j), on the other hand, is the label. Therefore, they serve different purposes.
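To make the two roles concrete, here is a minimal NumPy sketch of a masked binary-label cost, assuming the usual logistic prediction g(w·x + b) and a cross-entropy loss summed only over pairs with r(i,j)=1. The function name masked_binary_cost, the toy shapes, and the random data are assumptions for illustration, and regularization is left out.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def masked_binary_cost(X, W, b, Y, R):
    """Binary cross-entropy summed only over pairs where R == 1.

    X: (num_items, n) item features, W: (num_users, n) user parameters,
    b: (1, num_users) user biases, Y and R: (num_items, num_users).
    Hypothetical helper for illustration; regularization is omitted.
    """
    f = sigmoid(X @ W.T + b)          # predicted probability for every pair
    eps = 1e-12                       # guards against log(0)
    loss = -(Y * np.log(f + eps) + (1 - Y) * np.log(1 - f + eps))
    return np.sum(loss * R)           # R removes the invalid pairs from the sum

# Toy data (made up): 4 items, 5 users, 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))
W = rng.normal(size=(5, 3))
b = rng.normal(size=(1, 5))
Y = rng.integers(0, 2, size=(4, 5)).astype(float)   # labels
R = rng.integers(0, 2, size=(4, 5)).astype(float)   # valid-pair mask

cost = masked_binary_cost(X, W, b, Y, R)

# Flipping the labels of masked-out pairs leaves the cost unchanged:
# Y is the label, R only decides which labels count.
Y_flipped = np.where(R == 0, 1 - Y, Y)
cost_flipped = masked_binary_cost(X, W, b, Y_flipped, R)
```

If every entry of R is 1, the multiplication by R has no effect and the cost is simply the sum over all pairs.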

Cheers,
Raymond

lxd_1986001 · January 10, 2023, 1:27pm

Thank you @rmwkwok.
Now I understand that we should still keep r(i,j) in a binary label system, but we need to be careful about its purpose.
Thank you!

rmwkwok · January 10, 2023, 9:57pm

Hello @lxd_1986001,

I must not have had a clear mind last night. If you only have “clicked or not”, then all r(i,j) are one. Therefore, you can choose to keep it for consistency, or ignore it for simplicity. I will edit my previous answer accordingly.

Cheers,
Raymond

lxd_1986001 · January 11, 2023, 8:07am

Thank you @rmwkwok for the clarification. I think this makes sense.

Thank you!

rmwkwok · January 11, 2023, 8:18am

You are welcome @lxd_1986001!

Yeoh_Ji_Dian · February 6, 2023, 6:35am

Hi Raymond (@rmwkwok),

Thank you for your explanation here. A follow-up question:

If our dataset is sparse (75% of users did not have an interaction) and we are dealing with a binary label situation as mentioned above, setting all values of r(i,j)=1 means that we are computing costs for the entire dataset, including the 75% who did not have an interaction. In such a situation, would it be OK to still train on the entire dataset, or do you recommend removing the 75% who did not have an interaction from training? I am asking this because I see (from blog posts, etc.) that people commonly exclude users who had few or no interactions from the training data.

rmwkwok · February 6, 2023, 6:42am

@Yeoh_Ji_Dian

It depends on your model’s formulation. If we follow the assignment’s formulation, then we can’t remove them. If you would like to discuss the feasibility of removing them, then it’s better to base the discussion on the content of one of those blog posts.

One possible alternative that never considers cases where r(i,j)=0 is, for example, to have an embedding layer for users and an embedding layer for items, and to build the model so that we only pick positive pairs (i.e. r(i,j)=1) for the dot product (plus bias). We then minimize the difference between the result of the dot product and the true rating.
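A rough NumPy sketch of that alternative follows, with plain arrays standing in for the embedding layers. The name pairwise_cost, the squared-error form with a factor of 0.5, and the toy data are assumptions made for illustration. The dense masked cost is computed alongside only to check that both routes give the same number.

```python
import numpy as np

def pairwise_cost(user_emb, item_emb, user_bias, user_idx, item_idx, ratings):
    """Squared-error cost over an explicit list of observed (item, user) pairs.

    Hypothetical helper: the embedding tables are plain arrays here, and only
    the rows named in user_idx / item_idx are ever looked up.
    """
    w = user_emb[user_idx]                          # (num_pairs, n)
    x = item_emb[item_idx]                          # (num_pairs, n)
    pred = np.sum(w * x, axis=1) + user_bias[user_idx]   # dot product plus bias
    return 0.5 * np.sum((pred - ratings) ** 2)

# Toy data (made up): 4 items, 5 users, 3-dimensional embeddings.
rng = np.random.default_rng(0)
num_items, num_users, n = 4, 5, 3
item_emb = rng.normal(size=(num_items, n))
user_emb = rng.normal(size=(num_users, n))
user_bias = rng.normal(size=num_users)
Y = rng.uniform(0.5, 5.0, size=(num_items, num_users))
R = rng.integers(0, 2, size=(num_items, num_users))

# Keep only the pairs with r(i,j) = 1; nothing is computed for the rest.
item_idx, user_idx = np.nonzero(R)
sparse_cost = pairwise_cost(user_emb, item_emb, user_bias,
                            user_idx, item_idx, Y[item_idx, user_idx])

# Dense route for comparison: compute every pair, then mask with R.
dense_cost = 0.5 * np.sum(((item_emb @ user_emb.T + user_bias - Y) * R) ** 2)
```

The pair-list version does work proportional to the number of observed pairs, while the dense version always does work proportional to num_items × num_users.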

Cheers,
Raymond

Yeoh_Ji_Dian · February 6, 2023, 6:53am

@rmwkwok, thank you for your swift reply.

Is there a reason why we can’t remove those without ratings/interactions from the assignment (other than for the purpose of consistency in grading)?
What is the general rule of thumb when it comes to including or excluding the 75% who did not have interactions in the case of binary labels? I’ve tried both approaches on the dataset I have at work and I found that including the 75% worked much better than excluding them. I suspect this was due to the mean-normalization step, which recommended products that are popular on average to those 75% of users. However, I just wanted to know your opinion on this from a broader context.

Best regards,
JD

rmwkwok · February 6, 2023, 7:02am

@Yeoh_Ji_Dian,

The assignment uses cofi_cost_func_v in the training process, which computes part of the cost like this: (tf.linalg.matmul(X, tf.transpose(W)) + b - Y)*R. The matrix multiplication has no choice but to compute a prediction for every possible pair of users and movies. Therefore we can’t remove the unrated pairs beforehand; instead we use R to screen them off after computing them.
Those 75% don’t provide any information in the assignment’s problem; they only make the computation more expensive. There is no general rule of thumb for including them. I think including them needs a justification. The reason the assignment computes them, I think, is that it makes the code simpler.
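The screening effect of R can be seen in a small NumPy sketch of the quoted term, with `@` standing in for tf.linalg.matmul. The helper name masked_error, the shapes, and the roughly 25% observation rate are assumptions for illustration, not the assignment's code.

```python
import numpy as np

def masked_error(X, W, b, Y, R):
    """NumPy stand-in for (tf.linalg.matmul(X, tf.transpose(W)) + b - Y) * R."""
    return (X @ W.T + b - Y) * R

# Toy data (made up): 6 movies, 4 users, 3 features.
rng = np.random.default_rng(1)
num_movies, num_users, n = 6, 4, 3
X = rng.normal(size=(num_movies, n))
W = rng.normal(size=(num_users, n))
b = rng.normal(size=(1, num_users))
Y = rng.uniform(0.5, 5.0, size=(num_movies, num_users))
R = (rng.random((num_movies, num_users)) < 0.25).astype(float)  # ~25% observed

err = masked_error(X, W, b, Y, R)

# The product X @ W.T covers every movie-user pair ...
full_shape = err.shape                    # (num_movies, num_users)
# ... but only the pairs with R == 1 survive the multiplication by R.
num_contributing = np.count_nonzero(err)
num_observed = int(R.sum())
```

Every entry of err at a position where R is 0 is exactly zero, so those pairs cost computation but contribute nothing to the cost or its gradient.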

What do you assign to y(i,j) for those with r(i,j)=0? You don’t really need to tell me everything because it may be confidential. However, you may want to think about the rationale behind your assignment of those y(i,j), because those reasons can be your justification for including them.

Raymond

rmwkwok · February 6, 2023, 7:06am

For example, if no interaction (i.e. r(i,j)=0) does imply something in your business process, then you may want to include some of them. Would this count as a general rule of thumb?

KahnSlaver · October 23, 2023, 5:23am

Thank you @rmwkwok for the clarification
