Based on the notation taught in the lecture, it should be \sum^{n_m}, where n_m is the number of movie examples. Am I correct?

Hi @tbhaxor ,

x^{(i)} is the feature vector of movie i, and n is the number of features of that particular movie.

Thanks for the information, but I can't see the answer in this post. Could you quote it and reply?

Hi @tbhaxor ,

I don’t understand why you are not able to see the reply in this post. The reference for my reply is:

video: Using per-item features at timestamp 8:17

Thanks for the reply. I revisited the video, and it is indeed n_m, because this loss function is for a single user and we are going row-wise (over the movie count) in the linear function.

Hi @tbhaxor

The loss function in question is for learning the parameters W and b for a single user; the summation in the function tells us that this loss function is concerned only with the movies that have been rated by that single user.

n_m refers to the total number of movies, not just those rated by that single user.

If you refer back to the video at timestamp 9:40, you will see the cost function for learning the parameters W and b for all the users.
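To make the distinction concrete, here is a minimal pure-Python sketch of the per-user cost described above. The function name and argument layout are my own, not from the course; the key point is that the squared-error sum runs only over movies with r(i, j) = 1, while the regularization sum runs over the n features.

```python
def cost_for_user_j(X, w_j, b_j, y_j, r_j, lam):
    """Sketch of the cost for learning w^(j), b^(j) for a single user j.

    X   : list of per-movie feature vectors x^(i), length n_m
    w_j : weight vector for user j (length n)
    b_j : bias for user j
    y_j : list of user j's ratings, aligned with X
    r_j : list of 0/1 flags, r_j[i] = 1 if user j rated movie i
    lam : regularization strength lambda
    """
    m_j = sum(r_j)  # m^(j): number of movies rated by user j
    # Squared error, only over movies where r(i, j) = 1
    sq_err = sum(
        (sum(w * x for w, x in zip(w_j, X[i])) + b_j - y_j[i]) ** 2
        for i in range(len(X)) if r_j[i]
    )
    # Regularization: sum over the n features, not over the n_m movies
    reg = sum(w ** 2 for w in w_j)
    return sq_err / (2 * m_j) + lam * reg / (2 * m_j)
```

For all users, the course's full cost at 9:40 would just sum this kind of term over j (with the 1/m^(j) factors eventually dropped, as the lecture notes).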

Now it makes sense. Shouldn't it be divided by n instead of m^{(j)}? Usually, in an average, we divide by the total number of operands, which here is n.

Same video, same timestamp, 8:17.

Hi @tbhaxor ,

If you go back to the top slide of this thread, you can see all the notations and the meaning of m^{(j)}.

How would you interpret that cost function if it were divided by n, the number of features of a movie rated by user j? Bear in mind that this is the regularization term of the cost function.

It is the number of movies rated by user j, so 1 \le m^{(j)} \le n_m.

So this formula, \sum_{k=1}^n (w^{(j)}_k)^2, sums all the weights for the n features across the m^{(j)} movies.
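A quick sketch of what that inner sum actually iterates over (the weight values are hypothetical): it has exactly n terms, one per feature of user j's weight vector, regardless of how many movies that user rated; m^{(j)} only appears as the divisor outside the sum.

```python
# Hypothetical weight vector for one user j, with n = 3 features
w_j = [0.5, -1.0, 2.0]

# sum_{k=1}^{n} (w_k^(j))^2 -- n terms, independent of m^(j)
reg = sum(w_k ** 2 for w_k in w_j)
print(reg)  # 5.25
```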

I see why I was confused. Earlier we had weights for all the features, so it was not a problem. Now the weights are per user, and since we adjust them only on the movies rated by user j, dividing by m^{(j)} makes sense.

Have I gotten it right, @Kic?