Clarity on normalizeRatings function in recsys_utils.py module

SRezaS · August 2, 2023, 5:57pm

def normalizeRatings(Y, R):
    """
    Preprocess data by subtracting mean rating for every movie (every row).
    Only include real ratings R(i,j)=1.
    Ynorm, Ymean = normalizeRatings(Y, R) normalizes Y so that each movie
    has a rating of 0 on average. Unrated movies then have a mean rating of 0.
    Returns the mean rating in Ymean.
    """
    Ymean = (np.sum(Y*R, axis=1) / (np.sum(R, axis=1) + 1e-12)).reshape(-1, 1)
    Ynorm = Y - np.multiply(Ymean, R)
    return(Ynorm, Ymean)

Why is Y multiplied by R?

With the lab data, np.array_equal(Y*R, Y) even returns True.

SRezaS · August 2, 2023, 6:08pm

Since NumPy 1.20.0 (where mean gained the where argument) this would do the same:

Ymean = Y.mean(axis=1, where=Y != 0).reshape(-1, 1)
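A quick check of that claim, as a minimal sketch: the toy Y below is made up for illustration (it is not the lab data), and it assumes that a 0 in Y always means "not rated" and that every movie has at least one rating.

```python
import numpy as np

# Hypothetical toy data: 3 movies (rows) x 3 users (columns).
# Assumption: 0 in Y means "not rated".
Y = np.array([[5.0, 0.0, 3.0],
              [0.0, 4.0, 0.0],
              [2.0, 2.0, 0.0]])
R = (Y != 0).astype(int)  # mask derived from Y under that assumption

# Mean as computed in normalizeRatings (masked sum / number of ratings).
mean_masked = (np.sum(Y * R, axis=1) / (np.sum(R, axis=1) + 1e-12)).reshape(-1, 1)

# Mean using the where argument of ndarray.mean.
mean_where = Y.mean(axis=1, where=Y != 0).reshape(-1, 1)

# The two agree up to the tiny 1e-12 term in the denominator.
print(np.allclose(mean_masked, mean_where))
```

The two versions can diverge outside those assumptions. For a movie with no ratings at all, the where version should produce nan (mean of an empty selection), while the original returns 0 thanks to the 1e-12 term. And if a genuine rating of 0 were allowed, Y != 0 would wrongly drop it.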

TMosh · August 2, 2023, 9:52pm

R is a mask: its values are 0 everywhere except where a user has rated a movie, where the value is 1.

This prevents unrated movies from influencing the output values.

SRezaS · August 3, 2023, 6:30am

Y is going to be 0 itself wherever there’s an unrated movie.

So wherever R is 0, Y is already 0, and 0 × 0 = 0.

TMosh · August 3, 2023, 6:34am

You’re making an assumption about how Y is initialized.
Using R, no such assumption is needed.
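To illustrate the point, here is a small sketch where unrated entries in Y hold a placeholder other than 0. The data and the -1 placeholder are hypothetical, chosen only to show what happens when Y is not zero-initialized; normalizeRatings is copied from the first post.

```python
import numpy as np

def normalizeRatings(Y, R):
    # Same computation as in the first post.
    Ymean = (np.sum(Y * R, axis=1) / (np.sum(R, axis=1) + 1e-12)).reshape(-1, 1)
    Ynorm = Y - np.multiply(Ymean, R)
    return Ynorm, Ymean

# Hypothetical data: 2 movies x 3 users, unrated entries stored as -1.
Y = np.array([[ 5.0, -1.0,  3.0],
              [-1.0,  4.0, -1.0]])
R = np.array([[1, 0, 1],
              [0, 1, 0]])

# With the mask, only real ratings contribute to the mean.
_, Ymean = normalizeRatings(Y, R)

# Without the mask, the placeholders leak into the sum.
naive_mean = (np.sum(Y, axis=1) / (np.sum(R, axis=1) + 1e-12)).reshape(-1, 1)

print(Ymean.ravel())       # masked means
print(naive_mean.ravel())  # means polluted by the -1 placeholders
```

Here the masked means come out at about 4 for both movies, while the unmasked version gives about 3.5 and 2. Y*R and Y only coincide when the unrated entries happen to be 0.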
