Content based filtering: How to trace j-th user and i-th movie in 'Vu' and 'Vm'?

Reference: Course 3, Week 2 lecture "Deep learning for content-based filtering" (Link).

In the lecture, the content-based filtering model makes predictions as follows:

Predicted user rating of j-th user for i-th movie = Vu(j) . Vm(i)

The above model could be trained from several hundred users and several hundred movies.

However, Vu and Vm are created from neural networks. The neural networks shown in the lecture (for both users and movies) have 32 neurons in the output layer which means both Vu and Vm are row matrices with 32 elements i.e., the shape of both Vu and Vm is (32,1).

Let’s say we want to predict the rating of the 200th user for the 325th movie. The prediction equation becomes:

Predicted user rating of 200-th user for 325-th movie = Vu(200) . Vm(325)

My question is: how are we going to trace (say) the 325th user in Vu and (say) the 200th movie in Vm when both of them have a shape of (32,1)?

You’re not limited to only having 32 output units. That’s just the example used in the lecture.

But the lecture never said there is any requirement to make the number of output units the same as the number of users. In a practical scenario, the output layer will most likely have far fewer neurons than there are users or movies.

A correction to my previous reply:

From the video at 1:19, there is a v_u vector for each user. The 25 output units shown here define the user’s characteristics.

At 2:23, Andrew says the v_u and v_m must be the same size.

The number of output units is determined experimentally.

Around 3:07, he goes on to say that there can be different vectors for each movie, since each user can give ratings to many movies.

From this slide, you can see that the ‘y’ output (the training data) is a matrix of size (i, j), where ‘i’ is the number of movies, and ‘j’ is the number of users.
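To make the indexing concrete, here is a minimal NumPy sketch. The sizes and the random values are assumptions standing in for the real network outputs; the only point is that stacking one 32-element vector per user and one per movie gives a prediction for every (movie, user) pair.

```python
import numpy as np

rng = np.random.default_rng(0)
num_users, num_movies, dim = 400, 500, 32  # hypothetical sizes

# One row per user and one row per movie.
# Random values are placeholders for the outputs of the two networks.
Vu = rng.normal(size=(num_users, dim))
Vm = rng.normal(size=(num_movies, dim))

# Y_pred[i, j] is the dot product of movie i's vector and user j's vector.
Y_pred = Vm @ Vu.T

print(Y_pred.shape)  # (500, 400)
```

Here `Y_pred[324, 199]` would be the prediction for the 325th movie and the 200th user (rows are 0-indexed in NumPy).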

A row matrix has a shape of (1, 32).

As you have pointed out, (1, 32) is for one user or one movie, so we need (n, 32) for n users or n movies.

The user branch of the neural network can accept n users’ inputs and produce (n, 32) as output, and likewise for the movie branch.
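As a rough illustration, each branch can be sketched as a function that maps a batch of feature rows to a batch of 32-element vectors. This is not the lecture's actual model: the single untrained dense layer, the feature counts, and the names `branch`, `x_u`, and `x_m` are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

def branch(x, w, b):
    """One dense ReLU layer (hypothetical): (n, n_features) -> (n, 32)."""
    return np.maximum(x @ w + b, 0.0)

n_users, n_movies = 400, 500              # hypothetical counts
user_feats, movie_feats, dim = 14, 16, 32  # hypothetical feature sizes

# One row of input features per user and per movie.
x_u = rng.normal(size=(n_users, user_feats))
x_m = rng.normal(size=(n_movies, movie_feats))

# Untrained random weights, only to show the shapes.
w_u, b_u = rng.normal(size=(user_feats, dim)), np.zeros(dim)
w_m, b_m = rng.normal(size=(movie_feats, dim)), np.zeros(dim)

Vu = branch(x_u, w_u, b_u)  # shape (400, 32): one vector per user
Vm = branch(x_m, w_m, b_m)  # shape (500, 32): one vector per movie

# The 200th user and 325th movie are rows 199 and 324 (0-based indexing).
v_u = Vu[199]
v_m = Vm[324]
rating = float(v_u @ v_m)  # predicted rating for this (user, movie) pair
```

So the user and movie are traced by row index in the batch, not by position within the 32 output units.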

Cheers,
Raymond
