Clarity on normalizeRatings function in recsys_utils.py module

import numpy as np

def normalizeRatings(Y, R):
    """
    Preprocess data by subtracting the mean rating for every movie (every row).
    Only real ratings, where R(i,j)=1, are included.
    [Ynorm, Ymean] = normalizeRatings(Y, R) normalizes Y so that each movie
    has a rating of 0 on average. Unrated movies then have a mean rating of 0.
    Returns the mean rating in Ymean.
    """
    # Per-movie mean over rated entries only; 1e-12 avoids division by zero
    Ymean = (np.sum(Y*R, axis=1) / (np.sum(R, axis=1) + 1e-12)).reshape(-1, 1)
    # Subtract the mean only where a rating exists
    Ynorm = Y - np.multiply(Ymean, R)
    return Ynorm, Ymean

Why is Y multiplied by R?

Even np.array_equal(Y*R, Y) returns True.

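Since NumPy 1.20.0, which added the where argument to mean, this would do nearly the same, as long as Y is 0 wherever a rating is missing (a movie with no ratings at all gets nan instead of 0):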

Ymean = Y.mean(axis=1, where=Y != 0).reshape(-1, 1)
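To see where the two approaches agree and where they differ, here is a minimal sketch on a made-up 3×3 matrix (the values are purely illustrative, and R is derived from Y here only for the sake of the comparison). The middle movie has no ratings at all.

```python
import warnings
import numpy as np

# Hypothetical toy data: rows are movies, columns are users, 0 means "unrated".
Y = np.array([[5.0, 0.0, 3.0],
              [0.0, 0.0, 0.0],   # a movie nobody has rated
              [4.0, 4.0, 0.0]])
R = (Y != 0).astype(float)       # assumption: R derived from Y for this demo

# Mask-based mean, as in normalizeRatings
mean_masked = (np.sum(Y * R, axis=1) / (np.sum(R, axis=1) + 1e-12)).reshape(-1, 1)

# where=-based mean; the empty row triggers a RuntimeWarning, silenced here
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=RuntimeWarning)
    mean_where = Y.mean(axis=1, where=Y != 0).reshape(-1, 1)

print(mean_masked.ravel())
print(mean_where.ravel())
```

The rated rows match, but the all-unrated row comes out as 0 with the mask-based version and as nan with the where= version, which would then propagate into Ynorm.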

R is a mask: its values are 0 everywhere except where a user has rated a movie, where the value is 1.

This prevents un-reviewed movies from influencing the output values.

Y is going to be 0 itself wherever there’s an unrated movie.

So wherever R is 0, Y is already 0, and 0 × 0 = 0.

You’re making an assumption about how Y is initialized.
Using R, no such assumption is needed.
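A small sketch of that point, assuming a hypothetical data set where unrated entries are stored as -1 instead of 0 (the placeholder, the values, and the two helper names are made up for illustration):

```python
import numpy as np

# Hypothetical data: -1 marks "unrated" instead of 0.
Y_raw = np.array([[ 5.0, -1.0, 3.0],
                  [-1.0,  2.0, 4.0]])
R = np.array([[1, 0, 1],
              [0, 1, 1]])

def mean_with_mask(Y, R):
    # Same formula as normalizeRatings: unrated entries are zeroed by R
    return (np.sum(Y * R, axis=1) / (np.sum(R, axis=1) + 1e-12)).reshape(-1, 1)

def mean_without_mask(Y, R):
    # Drops the Y*R step, so placeholder values leak into the sum
    return (np.sum(Y, axis=1) / (np.sum(R, axis=1) + 1e-12)).reshape(-1, 1)

print(np.array_equal(Y_raw * R, Y_raw))      # no longer True for this Y
print(mean_with_mask(Y_raw, R).ravel())      # means of the real ratings
print(mean_without_mask(Y_raw, R).ravel())   # skewed by the -1 placeholders
```

With the mask, the means come from the real ratings only. Without it, each -1 is summed in and drags the result down. A NaN placeholder would still need separate handling, since NaN × 0 is NaN.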