How do we know that X1 is romance, action, etc. in real life?

Hi community,
Good day !
I was watching the video and it got me thinking about the following.
All we have is users, movies, and the users' ratings for each movie. Somehow, magically, we also have the W parameters.
Based on this, we run the algorithm and derive the values X1 and X2. But how can we be sure that X1 indicates romance and X2 indicates action?
A small example is used for teaching purposes. In reality there are many genres, and a romance movie can still contain some percentage of action.
Let's say we define some 10 features. How do we get to know which feature is what?
First of all, is it even necessary to identify such things?
How do we determine the number of features to use? Is it based on the genres present in the movie database?
Who gives the initial values of X? Users only rate movies; a user is not going to write that a certain movie is 57% romance and 43% action.

I request your help in understanding this concept.

Thanks a lot in advance,
Srikar Sana

Hi @Srikarsana

I'm not sure I can do a good job of answering your questions, but I will give it a try. I am sure other mentors will chip in with better explanations.
At the beginning of this lecture video, Prof. Ng is showing us the relationship between W, b, and X; how these values come about is ignored for the moment (he did caution us about this). Later, he shows that W, b, and X are parameters that can be learned from the ratings of users who have rated the movies. From 7:34 onward, he explains the cost function used to find W, b, and X, and this is how these parameters are learned.
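
For reference, the cost function discussed in that part of the video has roughly the following form. This is written from my recollection of the course notation, so please check it against the slides. Here r(i, j) = 1 when user j has rated movie i, y^{(i,j)} is that rating, and n_m, n_u, and n are the number of movies, users, and features.

$$
J(w, b, x) = \frac{1}{2} \sum_{(i,j):\, r(i,j)=1} \left( w^{(j)} \cdot x^{(i)} + b^{(j)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left( w_k^{(j)} \right)^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} \left( x_k^{(i)} \right)^2
$$

Minimizing J over w, b, and x at the same time is what lets the feature values x^{(i)} be learned rather than supplied by hand.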

Hello @Srikarsana,

Let’s take a step back to a slightly bigger picture, and then I will go back to your questions.

  1. There are 7 videos for collaborative filtering. At the end of the first one, “Making recommendations”, Andrew said:

And in the next video we’ll start to develop an algorithm for doing exactly that, but making one very special assumption, which is that we’re going to assume temporarily that we have access to features or extra information about the movies, such as which movies are romance movies and which movies are action movies. And using that, we’ll start to develop an algorithm. But later this week we’ll actually come back and ask: what if we don’t have these features, how can you still get the algorithm to work then? But let’s go on to the next video to start building up this algorithm.

  2. It is on this assumption that the discussion in the 2nd video is based: there, we know BOTH that the features are romance and action AND what their values are.

  3. Then, in the 3rd video, we ONLY know that there are two features, BUT their values are to be determined by the algorithm.

  4. From the 4th video on, we basically don’t even see the two named features anymore; the features are left as some x^{(i)}.

  5. From the above, we see how Andrew used “romance” and “action”, along with movie names that apparently belong to those genres, to INTRODUCE the idea of expressing a movie in terms of features WHOSE values can be determined by the algorithm. And, extremely importantly (!!!), the algorithm does not limit the number of features at all, which means that we can CHOOSE any number of features, and the algorithm will then determine their values! Can you see this too? It is our job to suggest the number of features, and the algorithm will do the rest.

Now, back to your questions:

No, in real-world data we cannot be sure of it, unless human experts go through all the movies in the database. Setting the number of features to “2” is part of a search process, and what the learned features mean can’t really be told. Interpreting them as romance and action in the videos is, I believe, only for the sake of discussion.

Agreed. There can be more than 2 genres too.

We won’t know. We only know that, if 10 is a sufficient number of features, then those are the 10 features that suffice to describe the movies, given the data and the algorithm.

People probably want to identify them, but it is very challenging if not impossible.

We need to think a little bit outside the box. Useful features are not limited to genres; they can be anything. So, even if you have a team of movie experts who can review the movies for you and suggest a number of features, that might only be a good reference.

At the end of the day, you need your algorithm to perform well BASED ON some metrics. Therefore, you will want to search for a number of features that makes your algorithm score well on those metrics on data that was not seen during training.

In the 2nd video, we already know the feature values because it is a demonstration. The idea of collaborative filtering is that you don’t need to assign those values; you let the algorithm determine them for you. So, at first, you can initialize the feature values randomly (just as you would initialize a neural network’s weights), and then the algorithm will tune them to values that minimize the cost function.
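
To make the last two answers concrete, here is a minimal NumPy sketch. It is my own toy setup, not the course lab code: the function names `fit_cofi` and `rmse`, the synthetic ratings, the hyperparameters, and the train/held-out split are all assumptions for illustration. It starts X, W, and b from random values, lets plain gradient descent tune them on the rated entries only, and then compares several feature counts by their error on held-out ratings.

```python
import numpy as np

def fit_cofi(Y, R, num_features, lam=1.0, lr=0.005, iters=3000, seed=0):
    """Learn movie features X, user weights W and biases b by gradient descent.

    Y: (num_movies, num_users) ratings; R: same shape, 1 where a rating exists.
    """
    rng = np.random.default_rng(seed)
    num_movies, num_users = Y.shape
    # Nobody supplies X: start from small random values, like neural-net weights.
    X = rng.normal(scale=0.1, size=(num_movies, num_features))
    W = rng.normal(scale=0.1, size=(num_users, num_features))
    b = np.zeros((1, num_users))
    for _ in range(iters):
        err = (X @ W.T + b - Y) * R            # unrated entries contribute zero
        grad_X = err @ W + lam * X
        grad_W = err.T @ X + lam * W
        grad_b = err.sum(axis=0, keepdims=True)
        X -= lr * grad_X
        W -= lr * grad_W
        b -= lr * grad_b
    return X, W, b

def rmse(X, W, b, Y, mask):
    """Root-mean-square error on the entries where mask == 1."""
    err = (X @ W.T + b - Y)[mask == 1]
    return float(np.sqrt(np.mean(err ** 2)))

# Synthetic ratings (assumption): generated from 3 hidden features plus noise.
rng = np.random.default_rng(42)
num_movies, num_users, true_k = 40, 30, 3
Y = rng.normal(size=(num_movies, true_k)) @ rng.normal(size=(num_users, true_k)).T
Y += 0.1 * rng.normal(size=Y.shape)

rated = rng.random(Y.shape) < 0.6                 # about 60% of entries are rated
held_out = rated & (rng.random(Y.shape) < 0.2)    # hide some ratings from training
R_train = (rated & ~held_out).astype(float)
R_val = held_out.astype(float)

# Trial and error over the number of features, judged on held-out ratings.
val_errors = {}
for n in (1, 2, 3, 5, 10):
    X, W, b = fit_cofi(Y, R_train, num_features=n)
    val_errors[n] = rmse(X, W, b, Y, R_val)
    print(n, round(val_errors[n], 3))
```

Note that the learned columns of X come out unlabeled. Nothing in the procedure says that column 0 is “romance”; the columns are just whatever values reduce the cost.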

Cheers,
Raymond

Hi,
Thanks for your detailed answer.

This line really helped me understand better.
But how does one restrict the number of features? Is it by trial and error, deciding for ourselves based on the error?

Thank you,
Srikar Sana

Hello @Srikarsana,

Yes, it’s by trial and error. However, if you want a good starting number of features, and if you have finished the videos on Principal Component Analysis (PCA), you might carry out a PCA on your data and see how many components are sufficient to express most of the “variance” of your dataset. Note that the terms “components” and “variance” will be more familiar to you after you have finished all the videos and labs on PCA.

If PCA says 5 components are sufficient to express 90% of the variance, then 5 can be a good starting point for your number of features in the Collaborative Filtering (CF) algorithm. However, I want you to note a few points as well:

  • PCA is NOT a replacement for our CF algorithm. You will notice that PCA does not handle the question-mark (unrated) matrix entries, whereas the CF algorithm handles them with r(i, j).

  • PCA only serves as your starting guess for the number of features, so you will still have to explore other choices.

  • If you have a very, very large dataset, PCA might be too slow, or sometimes not feasible due to memory limits.
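
As a rough illustration of the PCA starting guess, here is a small NumPy sketch. The helper name `components_for_variance` and the synthetic matrix are assumptions of mine. It also assumes a fully filled matrix, because PCA cannot deal with unrated entries; with real ratings you would first have to fill those in somehow, which is itself a crude approximation.

```python
import numpy as np

def components_for_variance(data, target=0.90):
    """Return the smallest number of principal components whose cumulative
    explained variance reaches `target`, plus the cumulative ratios."""
    centered = data - data.mean(axis=0, keepdims=True)
    # Squared singular values of the centered data are proportional to the
    # variance captured along each principal component.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    explained = singular_values ** 2 / np.sum(singular_values ** 2)
    cumulative = np.cumsum(explained)
    k = int(np.searchsorted(cumulative, target) + 1)
    return min(k, len(cumulative)), cumulative

# Synthetic, fully filled "ratings" (assumption): 50 movies x 12 users,
# generated from 3 hidden features plus a little noise.
rng = np.random.default_rng(0)
ratings = rng.normal(size=(50, 3)) @ rng.normal(size=(3, 12))
ratings += 0.05 * rng.normal(size=ratings.shape)

k, cumulative = components_for_variance(ratings, target=0.90)
print(k, np.round(cumulative[:5], 3))
```

The returned k is only a first guess for the number of CF features; you would still try values around it and compare them on held-out ratings.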

Cheers,
Raymond