I was taking the Coursera lecture and got somewhat confused by the Week 2 recommender systems material.
In the mean normalization explanation, it was said that when a new user has no ratings, mean normalization helps the machine learning algorithm run more efficiently, and I am not sure why that is the case.
The 5th new user has no ratings so far, and as far as I understood, their entries can be masked with r=0, just as has been done for other users' unrated movies. Is it because regularization penalizes w and x, making the model predict that the 5th new user will give 0 ratings for all movies, which is not reasonable? – this makes sense, OK.
But is this strictly limited to users with no ratings yet? Or does mean normalization replace the whole r=0 masking that was explained previously?
And also why would this be more “efficient”? Is it in terms of speed?
It’s been some time! Have you got some answer of your own to your questions?
Andrew started the lecture with two advantages that mean normalization can bring to us: 1. more efficient training; 2. better predictions. However, I don’t think the lecture actually explained the first advantage; I think he was only sharing it from his experience. The main part of the lecture, like the 3rd paragraph of your post, explains the second advantage.
You have correctly explained why, without mean normalization, the model would predict 0 ratings for all movies, and yes, the lecture suggests that predicting the mean rating is more reasonable.
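To make this concrete, here is a minimal NumPy sketch of the idea (the small ratings matrix is made up for illustration): subtract each movie's mean over its *rated* entries, train on the normalized ratings, then add the mean back at prediction time. For a brand-new user whose parameters have been regularized toward zero, the prediction falls back to each movie's mean instead of 0.

```python
import numpy as np

# Hypothetical ratings matrix Y (movies x users); 0 marks "no rating",
# tracked separately in the mask R so zeros don't count as real ratings.
Y = np.array([
    [5.0, 5.0, 0.0, 0.0, 0.0],
    [5.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 4.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 5.0, 4.0, 0.0],
    [0.0, 0.0, 5.0, 0.0, 0.0],
])
R = (Y != 0).astype(float)  # R[i, j] = 1 iff user j rated movie i

# Per-movie mean mu_i, computed over rated entries only
mu = Y.sum(axis=1) / R.sum(axis=1)

# Training would use the normalized ratings (only where R == 1)
Y_norm = (Y - mu[:, None]) * R

# For the new user (column 4), regularization drives w and b toward 0,
# so the model's raw output is ~0. Adding mu back means the
# prediction falls back to each movie's mean rating:
w_new, b_new = np.zeros(2), 0.0   # regularized-away parameters
x = np.zeros((5, 2))              # placeholder movie features
pred_new_user = x @ w_new + b_new + mu
print(pred_new_user)  # equals mu, i.e. the per-movie mean ratings
```

Without the `+ mu` step, `pred_new_user` would be all zeros, which is exactly the unreasonable behavior described above.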
If we examine the symbol (\mu_i), we see that there is one mean value for each movie i, so this is a technique applied across all movies, rather than to a chosen subset of users (such as new users).
r is always there, regardless of whether we use mean normalization or not, to make sure our model is optimized only over existing ratings.
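That last point can be seen directly in the cost function. Below is a hedged sketch of the collaborative-filtering cost in the style the course uses (shapes and the function name are my assumptions, not the course's exact code); note that `R` multiplies the error term whether or not `Y` has been mean-normalized first:

```python
import numpy as np

def cofi_cost(X, W, b, Y, R, lam):
    """Collaborative-filtering cost with the R mask.
    Assumed shapes: X (movies, k), W (users, k), b (1, users),
    Y and R (movies, users). Y may or may not be mean-normalized;
    either way, R zeros out entries with no rating."""
    err = (X @ W.T + b - Y) * R
    return 0.5 * np.sum(err ** 2) + (lam / 2) * (np.sum(W ** 2) + np.sum(X ** 2))

# Tiny smoke check: perfect reconstruction with lam=0 gives cost 0
X = np.array([[1.0, 0.0]])
W = np.array([[2.0, 0.0]])
b = np.zeros((1, 1))
Y = np.array([[2.0]])
R = np.array([[1.0]])
print(cofi_cost(X, W, b, Y, R, lam=0.0))  # → 0.0
```

So mean normalization does not replace the masking; it only changes which target values the masked entries of `Y` hold during training.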