How do we get to know that X1 is romance, action, etc., in real life?

Hello @Srikarsana,

Let’s take a step back to a slightly bigger picture, and then I will go back to your questions.

  1. There are 7 videos for collaborative filtering. At the end of the first one “Making recommendations”, Andrew said:

And in the next video we’ll start to develop an algorithm for doing exactly that. But making one very special assumption. Which is we’re going to assume temporarily that we have access to features or extra information about the movies such as which movies are romance movies, which movies are action movies. And using that will start to develop an algorithm. But later this week will actually come back and ask what if we don’t have these features, how can you still get the algorithm to work then? But let’s go on to the next video to start building up this algorithm.

  2. It is on this assumption that the discussion continues in the 2nd video, in which we know BOTH that there are the features romance and action AND their feature values.

  3. Then, in the 3rd video, we ONLY know that there are the two features, BUT their values are to be determined by the algorithm.

  4. From the 4th video on, we basically won’t even see the two features anymore; the features are left as some x^{(i)}.

  5. From the above, we see how Andrew used “romance” and “action”, along with some movie names that are obviously of those genres, to INTRODUCE the idea of expressing a movie in terms of some features WHOSE values can be determined by the algorithm. And extremely importantly (!!!), the algorithm does not limit the number of features at all, which means that we can CHOOSE any number of features, and then the algorithm will determine those feature values! Can you see this too? It is our job to suggest the number of features, and the algorithm will do the rest.
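To make the last point above concrete, here is the setup written out. The notation is what I recall from the lectures, so treat it as an assumption rather than a quote: n is the number of features we pick, w^{(j)} and b^{(j)} are the parameters of user j, y^{(i,j)} is the rating user j gave movie i, and r(i,j) = 1 marks a rating that exists.

$$
\hat{y}^{(i,j)} = w^{(j)} \cdot x^{(i)} + b^{(j)}, \qquad x^{(i)} \in \mathbb{R}^{n}
$$

$$
J = \frac{1}{2} \sum_{(i,j):\, r(i,j)=1} \left( w^{(j)} \cdot x^{(i)} + b^{(j)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{j} \sum_{k=1}^{n} \left( w_k^{(j)} \right)^2 + \frac{\lambda}{2} \sum_{i} \sum_{k=1}^{n} \left( x_k^{(i)} \right)^2
$$

Nothing in J says what any component of x^{(i)} means. We only fix n, and minimizing J over all the w^{(j)}, b^{(j)} and x^{(i)} fills in the values.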

Now, back to your questions:

No, with real-world data we can’t be sure of it unless we have human experts go through all the movies in the database. Settling on “2” features is the outcome of a search process, and the learned features come with no interpretation at all. Interpreting them as Romance and Action in the videos is, I believe, only for the sake of discussion.

Agreed. There can be more than 2 genres too.

We won’t know. We only know that, if 10 features are enough, then those are the 10 features that are sufficient to describe the movies given the data and the algorithm.

People probably want to identify them, but it is very challenging if not impossible.

We need to think a little bit outside the box. Useful features aren’t limited to genres; they can be anything. So, even if you have a team of movie experts who can review the movies for you and suggest a number of features, that might only serve as a good reference.

At the end of the day, you need your algorithm to perform well BASED ON some metrics. Therefore, you will want to search for a number of features that makes your algorithm score well on those metrics on data unseen during training.
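Such a search could look roughly like the sketch below. Everything in it is an assumption for illustration: the synthetic ratings, the held-out split, the helper names (`fit`, `rmse`, `search_num_features`), the hyperparameters, and the choice of RMSE as the metric. It is not how the course does it, just one simple way to compare candidate feature counts.

```python
import numpy as np

def fit(Y, R, n, lam=0.1, lr=0.002, steps=3000, seed=0):
    """Train with n features on the ratings marked by R; return all predictions."""
    rng = np.random.default_rng(seed)
    X = rng.normal(scale=0.1, size=(Y.shape[0], n))  # movie features
    W = rng.normal(scale=0.1, size=(Y.shape[1], n))  # user weights
    b = np.zeros(Y.shape[1])                         # user biases
    for _ in range(steps):
        err = (X @ W.T + b - Y) * R                  # errors on training ratings only
        X, W, b = (X - lr * (err @ W + lam * X),
                   W - lr * (err.T @ X + lam * W),
                   b - lr * err.sum(axis=0))
    return X @ W.T + b

def rmse(pred, Y, mask):
    """Root-mean-square error over the entries where mask == 1."""
    return float(np.sqrt(np.sum(((pred - Y) * mask) ** 2) / mask.sum()))

def search_num_features(Y, R_train, R_val, candidates):
    """Held-out RMSE for each candidate number of features."""
    return {n: rmse(fit(Y, R_train, n), Y, R_val) for n in candidates}

# Synthetic ratings (hypothetical data): 30 movies, 20 users, plus a little noise.
rng = np.random.default_rng(0)
Y = rng.normal(size=(30, 3)) @ rng.normal(size=(20, 3)).T
Y = Y + 0.1 * rng.normal(size=Y.shape)

observed = rng.random(Y.shape) < 0.7                  # ratings that exist at all
held_out = observed & (rng.random(Y.shape) < 0.2)     # hidden from training
R_train = (observed & ~held_out).astype(float)
R_val = held_out.astype(float)

scores = search_num_features(Y, R_train, R_val, [1, 3, 10])
best_n = min(scores, key=scores.get)
print(scores, "-> best:", best_n)
```

The winning number is simply whichever scores best on ratings the training never saw. It says nothing about what those features mean.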

In the 2nd video, we already know the feature values because it is for demonstration. The idea of collaborative filtering is that you don’t need to assign those values; you let the algorithm determine them for you. So, at first, you can initialize those feature values randomly (just like what you would do to initialize a neural network’s weights), and then the algorithm will tune them to values that optimize the cost function.
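Here is a minimal sketch of that idea in NumPy, using plain gradient descent on a squared-error cost with regularization. The function name `train_cf`, the toy ratings, and the hyperparameters (`lam`, `lr`, `steps`) are all made up for illustration, and the per-user bias `b` is an assumption about the model form.

```python
import numpy as np

def train_cf(Y, R, num_features, lam=0.1, lr=0.005, steps=2000, seed=0):
    """Learn movie features X, user weights W and user biases b.

    Y: (num_movies, num_users) ratings.
    R: same shape, 1 where a rating exists and 0 otherwise.
    """
    rng = np.random.default_rng(seed)
    num_movies, num_users = Y.shape
    # Random starting values: nobody tells the algorithm what the features mean.
    X = rng.normal(scale=0.1, size=(num_movies, num_features))
    W = rng.normal(scale=0.1, size=(num_users, num_features))
    b = np.zeros(num_users)
    history = []
    for _ in range(steps):
        err = (X @ W.T + b - Y) * R   # prediction error, rated entries only
        cost = 0.5 * np.sum(err ** 2) + 0.5 * lam * (np.sum(X ** 2) + np.sum(W ** 2))
        history.append(cost)
        grad_X = err @ W + lam * X    # gradient w.r.t. the movie features
        grad_W = err.T @ X + lam * W  # gradient w.r.t. the user weights
        grad_b = err.sum(axis=0)      # gradient w.r.t. the user biases
        X -= lr * grad_X
        W -= lr * grad_W
        b -= lr * grad_b
    return X, W, b, history

# Toy ratings (hypothetical): 5 movies x 4 users.
Y = np.array([[5, 5, 0, 0],
              [5, 0, 0, 0],
              [0, 4, 0, 0],
              [0, 0, 5, 4],
              [0, 0, 5, 0]], dtype=float)
R = np.array([[1, 1, 1, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0],
              [1, 1, 1, 1],
              [1, 1, 1, 0]], dtype=float)

X, W, b, history = train_cf(Y, R, num_features=2)
print("cost at start:", history[0], "cost at end:", history[-1])
print("learned movie features:\n", X)
```

The two learned columns of `X` are just numbers that lower the cost. Whether either of them lines up with “romance” or “action” is not something the algorithm promises.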

Cheers,
Raymondf
