Based on the notation taught in the lecture, it should be \sum^{n_m}, where n_m is the number of movie examples. Am I correct?

Hi @tbhaxor ,

x^{(i)} is the feature vector of movie i, and n is the number of features of that particular movie.

Thanks for the information, but I can't see the answer in this post. Could you quote it and reply?

Hi @tbhaxor ,

I don’t understand why you are not able to see the reply in this post. The reference for my reply is:

video: Using per-item features at timestamp 8:17

Thanks for the reply. I revisited the video, and it is indeed n_m, because this loss function is for a single user and we are going row-wise (over the movie count) in the linear function.

Hi @tbhaxor

The loss function in question is for learning the parameters W and b for a single user; the summation in the function tells us that this loss function is concerned only with the movies that have been rated by that single user.

n_m refers to the total number of movies, not just those rated by that single user.

If you refer back to the video at timestamp 9:40, you will see the cost function for learning the parameters W and b for all the users.
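To make the distinction concrete, here is a minimal pure-Python sketch of the per-user cost described above. The function name and argument layout are my own, not from the course; the key point is that the squared-error sum runs only over movies with r(i, j) = 1, while the regularization sum runs over the n features.

```python
def cost_for_user_j(X, w_j, b_j, y_j, r_j, lam):
    """Sketch of the cost for learning w^(j), b^(j) for a single user j.

    X   : list of per-movie feature vectors x^(i), length n_m
    w_j : weight vector for user j (length n)
    b_j : bias for user j
    y_j : list of user j's ratings, aligned with X
    r_j : list of 0/1 flags, r_j[i] = 1 if user j rated movie i
    lam : regularization strength lambda
    """
    m_j = sum(r_j)  # m^(j): number of movies rated by user j
    # Squared error, only over movies where r(i, j) = 1
    sq_err = sum(
        (sum(w * x for w, x in zip(w_j, X[i])) + b_j - y_j[i]) ** 2
        for i in range(len(X)) if r_j[i]
    )
    # Regularization: sum over the n features, not over the n_m movies
    reg = sum(w ** 2 for w in w_j)
    return sq_err / (2 * m_j) + lam * reg / (2 * m_j)
```

For all users, the course's full cost at 9:40 would just sum this kind of term over j (with the 1/m^(j) factors eventually dropped, as the lecture notes).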

Now it makes sense. Shouldn't it be divided by n instead of m^{(j)}? Usually, in an average, we divide by the total number of operands, which here is n.

Same video, same timestamp, 8:17.

Hi @tbhaxor ,

If you go back to the top slide of this thread, you can see all the notations and the meaning of m^{(j)}.

How would you interpret that cost function if it were divided by n, the number of features of a movie rated by user j? Bear in mind that this is the regularization term of the cost function.

It is the number of movies rated by user j, so 1 \le m^{(j)} \le n_m.

So this formula, \sum_{k=1}^n (w^{(j)}_k)^2, sums all the weights for the n features across the m^{(j)} movies.
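A quick sketch of what that inner sum actually iterates over (the weight values are hypothetical): it has exactly n terms, one per feature of user j's weight vector, regardless of how many movies that user rated; m^{(j)} only appears as the divisor outside the sum.

```python
# Hypothetical weight vector for one user j, with n = 3 features
w_j = [0.5, -1.0, 2.0]

# sum_{k=1}^{n} (w_k^(j))^2 -- n terms, independent of m^(j)
reg = sum(w_k ** 2 for w_k in w_j)
print(reg)  # 5.25
```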

I see why I was confused. Earlier we had weights for all the features, so it was not a problem. Now the weights are per user, and since we adjust them only on the movies rated by user j, dividing by m^{(j)} makes sense.

Have I gotten it right, @Kic?