Reference: Course 3, Week 2 lecture "Deep learning for content-based filtering" (Link).
In the lecture, the content-based filtering model makes predictions as follows:
Predicted rating of the j-th user for the i-th movie = Vu(j) · Vm(i)
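The "·" in this formula is a dot product: multiply the two vectors element by element and add up the products. A minimal sketch in plain Python; the two 32-element vectors are made-up values for illustration, not real network outputs.

```python
def predict_rating(v_u, v_m):
    """Predicted rating = dot product of a user vector and a movie vector."""
    if len(v_u) != len(v_m):
        raise ValueError("v_u and v_m must have the same length")
    return sum(a * b for a, b in zip(v_u, v_m))

# Hypothetical 32-element vectors for user j and movie i (illustration only).
v_u_j = [0.1] * 32
v_m_i = [0.5] * 32
print(predict_rating(v_u_j, v_m_i))  # 32 * (0.1 * 0.5), about 1.6
```

Both vectors must have the same length, which is why the user and movie networks need the same number of output units.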
This model could be trained on ratings from several hundred users and several hundred movies.
However, Vu and Vm are produced by neural networks. The networks shown in the lecture (one for users, one for movies) each have 32 neurons in the output layer, which means Vu and Vm are vectors with 32 elements each; written as column vectors, both have shape (32,1).
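One way to see what the 32 refers to: each network takes the feature vector of a single user (or a single movie) and returns one 32-element vector for it. A minimal pure-Python sketch, assuming random weights, one hidden ReLU layer of hypothetical size 64, a linear output layer of 32 units, and hypothetical feature counts of 14 (user) and 16 (movie); it is not the lecture's trained model.

```python
import random

NUM_OUTPUTS = 32  # output-layer size, as in the lecture's networks

def make_layer(n_in, n_out, rng):
    """Random weights (one row per output unit) and zero biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def dense(x, layer, relu):
    """Fully connected layer, optionally followed by ReLU."""
    weights, biases = layer
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [max(0.0, z) for z in out] if relu else out

def make_tower(n_features, n_hidden=64, seed=0):
    """Hypothetical two-layer network: features -> hidden -> 32 outputs."""
    rng = random.Random(seed)
    return [make_layer(n_features, n_hidden, rng), make_layer(n_hidden, NUM_OUTPUTS, rng)]

def tower_forward(tower, x):
    hidden = dense(x, tower[0], relu=True)
    return dense(hidden, tower[1], relu=False)  # linear output layer

# Hypothetical feature vectors for ONE user and ONE movie.
user_tower = make_tower(n_features=14, seed=1)
movie_tower = make_tower(n_features=16, seed=2)
v_u = tower_forward(user_tower, [0.3] * 14)
v_m = tower_forward(movie_tower, [0.7] * 16)
print(len(v_u), len(v_m))  # 32 32
```

The input sizes can differ between the two networks; only the output sizes have to match so the dot product is defined.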
Let’s say we want to predict the rating of the 200th user for the 325th movie. The prediction equation becomes:
Predicted rating of the 200th user for the 325th movie = Vu(200) · Vm(325)
My question is: how do we locate the 200th user in Vu and the 325th movie in Vm when both of them have a shape of (32,1)?
But the lecture never said that the number of output units must equal the number of users. In a practical scenario, the output layer will most likely have far fewer neurons than there are users or movies.
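To get Vu(200), you run the 200th user's features through the user network; you do not look anything up inside a single 32-element vector. A minimal sketch, assuming a hypothetical table of 500 users with 14 random features each and a single linear layer standing in for the user network: every user goes through the same network, so stacking the results gives one 32-element row per user.

```python
import random

NUM_OUTPUTS = 32  # output units: the vector length, unrelated to the user count

def make_linear_layer(n_in, n_out, seed):
    rng = random.Random(seed)
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]

def forward(weights, x):
    """Linear layer: one output per row of weights."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

num_users, num_user_features = 500, 14
rng = random.Random(0)
# Hypothetical user feature table: one row of features per user.
user_features = [[rng.random() for _ in range(num_user_features)] for _ in range(num_users)]

user_net = make_linear_layer(num_user_features, NUM_OUTPUTS, seed=1)
# Apply the SAME network to every user: one 32-element vector per user.
V_u = [forward(user_net, x) for x in user_features]
print(len(V_u), len(V_u[0]))  # 500 32

# The 200th user (1-based) is row 199 (0-based) of the stacked outputs.
v_u_200 = V_u[199]
assert v_u_200 == forward(user_net, user_features[199])
```

Adding more users adds more rows; the output layer stays at 32 units.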
From this slide, you can see that the ‘y’ output (the training data) is a matrix of size (i, j), where ‘i’ is the number of movies and ‘j’ is the number of users.
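The predictions can be arranged in the same (movies, users) layout as y: entry [i][j] is Vm(i) · Vu(j). A minimal sketch, assuming hypothetical random 32-element vectors in place of real network outputs for 4 movies and 3 users.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def predicted_ratings(V_m, V_u):
    """Entry [i][j] is the predicted rating of user j for movie i: V_m[i] . V_u[j]."""
    return [[dot(v_m, v_u) for v_u in V_u] for v_m in V_m]

rng = random.Random(0)
num_movies, num_users, k = 4, 3, 32
# Hypothetical network outputs: one 32-element vector per movie and per user.
V_m = [[rng.gauss(0, 1) for _ in range(k)] for _ in range(num_movies)]
V_u = [[rng.gauss(0, 1) for _ in range(k)] for _ in range(num_users)]

Y_hat = predicted_ratings(V_m, V_u)
print(len(Y_hat), len(Y_hat[0]))  # 4 3, the same (movies, users) layout as y
```

During training, only the entries of y that correspond to actual ratings are compared against these predictions.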