In the collaborative filtering section, the overall cost function combines the cost function for learning w(1), b(1), …, w(n_u), b(n_u) with the cost function for learning x(1), …, x(n_m).
The cost function for learning w(1), b(1), …, w(n_u), b(n_u) has a normalizing factor of 1/2 on the outside. For each individual user's loss, the normalizing factor would be 1/(2 * no. of movies rated by user j). With the x's held fixed, dropping the constant (no. of movies rated by user j) makes no difference, because each w(j), b(j) appears only in user j's loss, and rescaling that loss by a positive constant does not change its minimizer.
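To spell out that step, here is the per-user argument written as a formula. This is a sketch in the course's notation as I understand it; the symbols r(i,j) (1 if user j rated movie i), y^(i,j) (that rating) and m^(j) (no. of movies rated by user j) are my assumptions, and the regularization terms are left out:

```latex
% Per-user cost with the 1/(2 m^{(j)}) normalization, for fixed features x:
J_j\bigl(w^{(j)}, b^{(j)}\bigr) = \frac{1}{2\,m^{(j)}} \sum_{i:\,r(i,j)=1} \bigl(w^{(j)} \cdot x^{(i)} + b^{(j)} - y^{(i,j)}\bigr)^2

% m^{(j)} > 0 is a constant with respect to w^{(j)}, b^{(j)}, so
\arg\min_{w^{(j)},\,b^{(j)}} J_j = \arg\min_{w^{(j)},\,b^{(j)}} \frac{1}{2} \sum_{i:\,r(i,j)=1} \bigl(w^{(j)} \cdot x^{(i)} + b^{(j)} - y^{(i,j)}\bigr)^2
```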
However, when we combine the two cost functions, the result seems to give more consideration to users who rate a lot of movies than to users who rate only one or two, because each w(j), b(j) is no longer estimated independently but interacts with x(1) through x(n_m).
Collaborative filtering seems to give each rating the same weight, but not each user the same weight. A user who rates many movies carries a lot of weight in the system.
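To make the concern concrete, here is a minimal NumPy sketch (not the course's code). It splits the unregularized squared-error cost into one share per user and compares the combined cost, where every rating counts once, with a hypothetical user-averaged variant, where every user counts once. The function name, array shapes, random data and the mask R are all assumptions made up for illustration.

```python
import numpy as np

def per_user_squared_error(X, W, b, Y, R):
    """Each user's share of the unregularized squared-error cost.

    X: (n_m, k) movie features, W: (n_u, k) user weights, b: (n_u,) user biases,
    Y: (n_m, n_u) ratings, R: (n_m, n_u) mask with 1 where a rating exists.
    """
    err = (X @ W.T + b - Y) * R            # residuals, zeroed where no rating exists
    return 0.5 * np.sum(err ** 2, axis=0)  # one value per user (column)

rng = np.random.default_rng(0)
n_m, n_u, k = 6, 2, 3                      # made-up sizes for illustration
X = rng.normal(size=(n_m, k))
W = rng.normal(size=(n_u, k))
b = rng.normal(size=n_u)
Y = rng.integers(1, 6, size=(n_m, n_u)).astype(float)

# Hypothetical mask: user 0 rated all six movies, user 1 rated only the first.
R = np.zeros((n_m, n_u))
R[:, 0] = 1
R[0, 1] = 1

per_user = per_user_squared_error(X, W, b, Y, R)
ratings_per_user = R.sum(axis=0)           # no. of movies rated by each user
total_cost = per_user.sum()                # combined cost: every rating counts once
user_averaged_cost = np.sum(per_user / ratings_per_user)  # every user counts once

print("ratings per user:  ", ratings_per_user)
print("per-user cost share:", per_user)
print("combined cost:      ", total_cost)
print("user-averaged cost: ", user_averaged_cost)
```

In the combined cost, user 0 contributes six squared-error terms and user 1 contributes one, so the shared x's are fitted mostly to user 0's ratings. The user-averaged variant is shown only for comparison, not as something the course proposes.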