def normalizeRatings(Y, R):
    Ymean = (np.sum(Y*R, axis=1) / (np.sum(R, axis=1) + 1e-12)).reshape(-1, 1)
    Ynorm = Y - np.multiply(Ymean, R)
    return(Ynorm, Ymean)
Can someone explain why we multiply the mean by R again when calculating Ynorm? In Ymean, we already multiply Y by R so that only the entries that actually have ratings are counted. In other words, why is the mean set to 0 specifically where a movie has no rating during normalization?
Hello @bhavanamalla,
I will let you investigate it first. I recommend adding a cell like mine to print some shapes and examine them. From your post, it seems you thought the second multiplication was redundant. However, if you examine it, you will find that it does make a difference. Without the second multiplication (which you can actually try), the resulting Ynorm will be different.
I hope you will figure something out.
After investigation, please make sure to remove the added cell, or it can fail your submission.
Cheers,
Raymond
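For readers following along, an inspection cell of the kind suggested above might look roughly like the sketch below. The toy matrix Y, the way R is derived from it, and the name masked_mean are assumptions made up for illustration; they are not taken from the lab.

```python
import numpy as np

# Hypothetical toy data: 3 movies (rows) x 4 users (columns).
Y = np.array([[5.0, 4.0, 0.0, 0.0],
              [0.0, 0.0, 3.0, 0.0],
              [1.0, 0.0, 0.0, 2.0]])
# Assumption for this toy example only: a 0 in Y means "not rated".
R = (Y != 0).astype(float)

# Same formula as in normalizeRatings: per-movie mean over rated entries.
Ymean = (np.sum(Y * R, axis=1) / (np.sum(R, axis=1) + 1e-12)).reshape(-1, 1)

# Broadcasting the column vector against R yields a full matrix.
masked_mean = np.multiply(Ymean, R)

print("Y shape:        ", Y.shape)
print("Ymean shape:    ", Ymean.shape)
print("Ymean * R shape:", masked_mean.shape)
print(masked_mean)
```

Printing masked_mean itself, not only its shape, is what makes the effect of the mask visible.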
Thanks for your help.
Now I see why multiplying Ymean by R is necessary. Ymean contains the mean rating of every movie as a column vector. Since we need to subtract each movie's mean from its ratings in Y, we use the masking matrix R. Multiplying by R gives a matrix of the same shape as Y that holds the movie's mean wherever a rating exists and 0 elsewhere, so the element-wise subtraction only changes the rated entries and leaves the unrated ones untouched. Got it.
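A small comparison can make the difference concrete. This is only a sketch on made-up data: normalize_with_mask mirrors the formula quoted at the top of the thread, while normalize_without_mask is a hypothetical variant written here purely for contrast.

```python
import numpy as np

def per_movie_mean(Y, R):
    # Mean over rated entries only; the 1e-12 guards against division by zero.
    return (np.sum(Y * R, axis=1) / (np.sum(R, axis=1) + 1e-12)).reshape(-1, 1)

def normalize_with_mask(Y, R):
    # Same computation as normalizeRatings in the thread.
    return Y - np.multiply(per_movie_mean(Y, R), R)

def normalize_without_mask(Y, R):
    # Hypothetical variant: plain broadcasting, no second multiplication by R.
    return Y - per_movie_mean(Y, R)

# Hypothetical toy data: 2 movies x 4 users, with 0 standing for "not rated".
Y = np.array([[5.0, 4.0, 0.0, 0.0],
              [0.0, 2.0, 4.0, 0.0]])
R = (Y != 0).astype(float)

with_mask = normalize_with_mask(Y, R)
without_mask = normalize_without_mask(Y, R)

print("with mask:\n", with_mask)
print("without mask:\n", without_mask)
```

Both versions have the same shape, because broadcasting already expands the column vector. They agree on the rated entries, but without the mask every unrated entry ends up at minus the movie's mean instead of staying at 0.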
Hello @bhavanamalla,
Wonderful! I have marked your post as the solution.
Cheers,
Raymond