Don't we still have to train the neural network in content-based filtering on the ratings y^{(i,j)} of users in order to find v_m? So in that case we would still need users to find v_m? After all, we didn't specify cost functions for the individual networks.

You still need to train the neural network first so that it produces good v_u and v_m vectors, such that their dot product accurately predicts y^{(i,j)}, the rating user i gives item j. Therefore, the cost function is J = \sum_{(i,j): r(i,j)=1} (v_u^{(i)} \cdot v_m^{(j)} - y^{(i,j)})^2 + regularization.
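To make the cost concrete, here is a minimal numpy sketch. The embeddings, ratings, and the `cost` helper are all made-up illustrations (the course uses per-network layers to produce v_u and v_m; here they are just random matrices):

```python
import numpy as np

# Hypothetical learned embeddings: row v_u[i] is user i's vector,
# row v_m[j] is movie j's vector (same dimension so the dot product works).
num_users, num_movies, dim = 3, 4, 8
rng = np.random.default_rng(0)
v_u = rng.normal(size=(num_users, dim))
v_m = rng.normal(size=(num_movies, dim))

# y holds the known ratings; r marks which (user, movie) pairs were actually rated.
y = rng.integers(1, 6, size=(num_users, num_movies)).astype(float)
r = rng.integers(0, 2, size=(num_users, num_movies)).astype(float)

def cost(v_u, v_m, y, r, lam=0.1):
    """Squared error over rated pairs plus L2 regularization on the embeddings."""
    pred = v_u @ v_m.T            # pred[i, j] = v_u[i] . v_m[j]
    err = (pred - y) * r          # only pairs with r(i, j) = 1 contribute
    return np.sum(err**2) + lam * (np.sum(v_u**2) + np.sum(v_m**2))

J = cost(v_u, v_m, y, r)
```

Training would then adjust the network weights (and hence v_u and v_m) to drive this J down.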
Once the model is trained, we can precompute v_m for every movie and find similar movies by comparing their v_m vectors.
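The "find similar movies" step might look like the sketch below, using squared distance between precomputed vectors (the vectors themselves are invented for illustration):

```python
import numpy as np

# Hypothetical precomputed movie vectors (one row per movie),
# i.e. the v_m outputs of the trained item network.
v_m = np.array([[1.0, 0.0],
                [0.9, 0.1],
                [0.0, 1.0]])

def most_similar(j, v_m, k=1):
    """Indices of the k movies whose vectors are closest (squared distance) to movie j."""
    d = np.sum((v_m - v_m[j])**2, axis=1)
    d[j] = np.inf                 # exclude the movie itself
    return np.argsort(d)[:k]

nearest = most_similar(0, v_m)    # movie 1 is closest to movie 0 here
```

Since this only uses the precomputed v_m table, it can run before any particular user logs in.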
I think the course video 'Deep learning for content-based filtering' covers this in more detail.

I had the same issue with this question. I suggest adding "once the network is trained" before "you can pre-compute…", or perhaps writing "This can be done even before a new user logs…"

y^{(i,j)} is the actual value. You're comparing a prediction (the dot product v_u^{(i)} \cdot v_m^{(j)}) against the actual value it should have predicted, by subtracting one from the other. This tells us how close the prediction was. Knowing the cost allows the algorithm to adjust the network's weights and biases in order to make more accurate predictions.

So when you're trying to train a model, you have some values that you KNOW are true. That's y^{(i,j)}: the correct answer you use to help train your model.

So, for example, user i rates movie j some specific value; y^{(i,j)} is that rating.

Let’s say we have users Bob and Alice. And we have the movies The Matrix and The Notebook.
y(0,0) would be the rating that Bob has given The Matrix.
y(1,0) would be Alice's rating of The Matrix.
y(0,1) would be the rating that Bob has given The Notebook.
y(1,1) would be the rating that Alice has given The Notebook.
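The Bob/Alice example above is just a 2x2 matrix indexed as (user, movie). A tiny numpy sketch, with made-up ratings:

```python
import numpy as np

# Rows = users (0: Bob, 1: Alice); columns = movies (0: The Matrix, 1: The Notebook).
# The rating values are invented purely for illustration.
Y = np.array([[5.0, 2.0],    # Bob:   The Matrix = 5, The Notebook = 2
              [4.0, 5.0]])   # Alice: The Matrix = 4, The Notebook = 5

bob_matrix = Y[0, 0]         # Bob's rating of The Matrix
alice_notebook = Y[1, 1]     # Alice's rating of The Notebook
```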