Also, what does it mean that in other machine learning settings the features are given, but here, if they are not given, the algorithm can learn them on its own through collaborative filtering? Technically, the rating numbers are also a set of features.
And the features we talk about in the lessons, denoted by \vec X_i, are what we used to call synthetic features in feature engineering.
That’s one way to look at it, but there are several viewpoints on this topic.
The way I look at it, we’re trying to predict the ratings for the movies, because the output ‘Y’ matrix contains the ratings for all the movies from each user.
So the ratings are the outputs, not the features.
We have no idea what the features are, so we have to learn them along with the weight values.
That makes sense. I was trying to map it to the traditional setting, where Y is always given; if some values of Y are missing and there are only a few such records, we can remove them with the dropna() function.
So now only some of the y \in Y are given, not all. Our goal is to learn features and find weights for each user that approximate the existing ratings, and we can then assume a user would give the predicted rating to a movie once they watch it.
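To make that concrete, here is a minimal sketch of the idea in NumPy (not the course's exact implementation): we jointly learn the movie features X and the per-user weights W by gradient descent, computing the squared error only on the entries of Y that are actually given (marked by the indicator matrix R), then use the learned X and W to predict the missing ratings. The tiny ratings matrix, the number of features k, and the hyperparameters are all made-up illustrative values.

```python
import numpy as np

# Hypothetical tiny example: 4 movies x 3 users, 0 = no rating given
Y = np.array([[5, 4, 0],
              [0, 3, 4],
              [1, 0, 5],
              [4, 4, 0]], dtype=float)
R = (Y > 0).astype(float)  # R[i, j] = 1 where user j rated movie i

n_movies, n_users = Y.shape
k = 2  # number of learned features per movie (chosen arbitrarily)
rng = np.random.default_rng(0)
X = rng.normal(scale=0.1, size=(n_movies, k))  # movie features (learned, not given)
W = rng.normal(scale=0.1, size=(n_users, k))   # per-user weights (learned)
b = np.zeros((1, n_users))                     # per-user bias (learned)

alpha, lam = 0.05, 0.01  # learning rate and regularization, illustrative values
for _ in range(5000):
    E = (X @ W.T + b - Y) * R      # error only on the known ratings
    X_grad = E @ W + lam * X       # gradient w.r.t. movie features
    W_grad = E.T @ X + lam * W     # gradient w.r.t. user weights
    b_grad = E.sum(axis=0, keepdims=True)
    X -= alpha * X_grad
    W -= alpha * W_grad
    b -= alpha * b_grad

# Predictions for every (movie, user) pair, including the unrated entries
pred = X @ W.T + b
```

After training, `pred` closely matches Y on the given entries, and the entries where R is 0 are the model's guesses for how those users would rate the unseen movies.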
Have I got it right @TMosh ?