I have a question; maybe I am being foolish. I cannot see why we say that:

For Collaborative Filtering:

X dot W + b is the rating

For Content-Based Algorithms

V of user dot V of item is the rating

It’s evident that throughout the course, Professor Ng has taught the concepts and trained us to apply and train the models. However, the original definitions of these models remain unclear to me.

The two formalisms are not very different; we can read them as:

CF: w of user dot x of movie + b of user is the rating

CB: v of user dot v of movie is the rating

The only apparent difference here is the inclusion of a bias term in CF.
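To make the two formulas concrete, here is a minimal sketch with made-up numbers. The function names `predict_cf` and `predict_cb` and all vector values are hypothetical, chosen only for illustration. Both predictions are a dot product; CF simply adds a per-user bias.

```python
def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def predict_cf(w_user, b_user, x_movie):
    """Collaborative filtering: w(user) . x(movie) + b(user)."""
    return dot(w_user, x_movie) + b_user

def predict_cb(v_user, v_movie):
    """Content-based: v(user) . v(movie), with no bias term."""
    return dot(v_user, v_movie)

# Hypothetical (already learned) parameters for one user and one movie
w_user, b_user = [1.0, 0.5], 2.0
x_movie = [2.0, 0.0]
print(predict_cf(w_user, b_user, x_movie))  # 1*2 + 0.5*0 + 2 = 4.0

# Hypothetical embeddings produced by the CB networks
v_user, v_movie = [0.5, 1.0], [2.0, 3.0]
print(predict_cb(v_user, v_movie))  # 0.5*2 + 1*3 = 4.0
```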

A less apparent difference is that

CF starts from NO given user or movie features, whereas

CB starts from given user and movie features.

What CB does is train a network that non-linearly converts those given features into the final v of user and v of movie, whereas CF trains the w (and b) of user and the x of movie directly. This also explains why it is natural for the CB algorithm to have no bias term.
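The CB side can be sketched as two small networks ("towers") that turn the given features into embeddings, followed by a dot product. This is a minimal sketch under assumptions: the two-layer ReLU tower, the feature values, and all weights are hypothetical, and in practice both towers would be trained jointly on the ratings rather than set by hand.

```python
def relu(z):
    """Element-wise ReLU, the non-linearity inside each tower."""
    return [max(0.0, v) for v in z]

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def tower(W1, W2, features):
    """Hypothetical two-layer network: features -> ReLU hidden layer -> embedding."""
    return matvec(W2, relu(matvec(W1, features)))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Given (not learned) features, e.g. user genre preferences and movie genres
user_features = [1.0, 0.0, 3.0]
movie_features = [0.0, 1.0, 1.0]

# Made-up weights standing in for trained ones
W1_user = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
W2_user = [[0.5, 0.0], [0.0, 1.0]]
W1_movie = [[0.0, 1.0, -2.0], [0.0, 0.0, 1.0]]
W2_movie = [[1.0, 0.5], [0.0, 2.0]]

v_user = tower(W1_user, W2_user, user_features)
v_movie = tower(W1_movie, W2_movie, movie_features)
print(v_user, v_movie, dot(v_user, v_movie))
```

Note that only the weights inside the towers are learned; the embeddings v_user and v_movie are outputs of the networks, whereas in CF the vectors w, x and the bias b are themselves the learned parameters.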

While the above is about the differences between them, their most critical common step is taking the dot product of the embeddings (of a user and a movie).

The dot product defines how users and movies are compared. We need a mathematical operation that maps “good matches” to ones and “bad matches” to zeros, and the dot product of two normalized embeddings gives us both that and a useful interpretation in terms of similarity. Under this interpretation, similar users and movies stay close together in the embedding space created by the neural network, while dissimilar ones stay far apart. It also suggests that if users or movies have highly varied characteristics, we will need a larger embedding space to accommodate them, because the algorithm needs more room to separate dissimilar ones.
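As a rough illustration of the similarity interpretation, the sketch below L2-normalizes two vectors before taking their dot product (i.e. cosine similarity). The example vectors are made up. A vector pointing in the same direction scores about 1, an orthogonal one about 0; an opposite direction would score about -1, so the range is [-1, 1] rather than strictly [0, 1].

```python
import math

def normalize(v):
    """Scale v to unit length (L2 norm)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def similarity(a, b):
    """Dot product of two L2-normalized vectors (cosine similarity), in [-1, 1]."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

user = [3.0, 4.0]
good_match = [6.0, 8.0]   # same direction as the user embedding
bad_match = [-4.0, 3.0]   # orthogonal to the user embedding
print(similarity(user, good_match))  # ~1.0
print(similarity(user, bad_match))   # ~0.0
```

Normalizing removes the vectors' lengths from the comparison, so only their directions in the embedding space decide the score.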

@rmwkwok Thank you for your response. Unfortunately, it does not address my question. The model definition is quite clear to me; perhaps my question is still unclear, so let me rephrase:

For Collaborative Filtering:

Why can X dot W + b be represented as the rating?

For Content-Based Algorithms

Why can V of user dot V of item be represented as the rating?

(1) why can’t they, or
(2) what instead can? or
(3) what criteria are needed? or
(4) in the house pricing problem in the previous MLS courses, why can w dot x + b represent house prices?

I need to know your perspective, not just your questions.

It would be very helpful if you can elaborate on any one of them, or more of them if possible.

Try to share your view. I know it’s your question,
but I would also like to know how you see things.

Of all the linear regression models you have built, why does any of them represent the labels it was trained to represent? Can you give an example and tell us what convinced you?

Hi @Chris.X, I am just starting with ML, so my answer is based on my own understanding.

I think the dot product represents the rating because we train the model to make it do so. We compute the loss function by comparing the dot product with the actual ratings, so minimizing the loss is just saying: make the dot product close to the actual rating. If we train the model to make the dot product close to the actual ratings of the training data, then for new data it should also approximate the rating.
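This idea can be checked with a small sketch. It assumes the simplified setting where the movie features x are fixed and only one user's w and b are learned (full CF would also update x with the same squared-error cost). The features, ratings, learning rate, and step count are all made up; the ratings were chosen to be exactly linear, so the fit is essentially perfect, which real ratings would not be.

```python
def predict(w, b, x):
    """Predicted rating: w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def train(X, y, lr=0.1, steps=3000):
    """Gradient descent on the cost (1/2m) * sum over movies of (w.x + b - y)^2."""
    m, n = len(X), len(X[0])
    w, b = [0.0] * n, 0.0
    for _ in range(steps):
        errors = [predict(w, b, x) - t for x, t in zip(X, y)]
        grad_w = [sum(e * x[j] for e, x in zip(errors, X)) / m for j in range(n)]
        grad_b = sum(errors) / m
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

# Hypothetical features of 4 movies (fixed here) and one user's actual ratings
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [2.5, 1.5, 3.5, 5.5]

w, b = train(X, y)
for x, t in zip(X, y):
    print(f"actual {t:.2f}  predicted {predict(w, b, x):.2f}")
print(f"unseen movie: predicted {predict(w, b, [1.0, 2.0]):.2f}")
```

Nothing in w . x + b is a rating by definition; it becomes one only because the cost penalizes every gap between it and the observed ratings, exactly as with w . x + b and house prices in linear regression.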