Double Question: L2-normalizing item and user vectors, and finding similar movies using ||v_k - v_j||^2

Two questions here.

  1. When trying to find similar movies, we are told to find the item vectors v_k that are closest in distance to the movie in question, v_j, i.e. those minimizing ||v_k - v_j||^2 (a sketch of that computation follows this list). Why do we not just do this with the item FEATURE vectors x_j and x_k instead? The feature vectors seem much more straightforward for comparing objective similarities between movies. Do v_j and v_k contain some information about how the movies are rated that is important for the discussion of similarity?

  2. By L2-normalizing the item and user vectors v_m and v_u, my understanding is that this ensures the dot product between the two has a maximum magnitude of 1… How then can this be used to predict movie ratings between 0 and 5?
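
To make question 1 concrete, here is a minimal sketch of the distance computation I have in mind, assuming NumPy and a hypothetical matrix V of learned item vectors (one row per movie; the shapes are made up for illustration):

```python
import numpy as np

# Hypothetical learned item vectors: one row v_k per movie (shapes are assumptions).
rng = np.random.default_rng(0)
V = rng.normal(size=(100, 32))   # 100 movies, 32-dimensional item vectors

def most_similar(V, j, top_n=5):
    """Indices of the top_n movies closest to movie j by squared L2 distance."""
    sq_dists = np.sum((V - V[j]) ** 2, axis=1)  # ||v_k - v_j||^2 for every k
    sq_dists[j] = np.inf                        # exclude the query movie itself
    return np.argsort(sq_dists)[:top_n]

print(most_similar(V, j=0))
```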


No, that is not what normalizing does.


What I meant by this was:

I understand that normalizing a vector makes its magnitude equal to 1. So, with v_m and v_u normalized, how can v_m dot v_u possibly have a magnitude greater than 1, which would be necessary for predictions between 0 and 5?
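
For example, here is a quick numerical check of the bound I am describing, with hypothetical vectors (just an illustration, not the course code):

```python
import numpy as np

rng = np.random.default_rng(0)
v_m = rng.normal(size=8)   # hypothetical item vector
v_u = rng.normal(size=8)   # hypothetical user vector

# L2-normalize: divide each vector by its own norm, so ||v_m|| = ||v_u|| = 1.
v_m = v_m / np.linalg.norm(v_m)
v_u = v_u / np.linalg.norm(v_u)

# By the Cauchy-Schwarz inequality, |v_m dot v_u| <= ||v_m|| * ||v_u|| = 1.
print(np.dot(v_m, v_u))              # always lands in [-1, 1]
assert abs(np.dot(v_m, v_u)) <= 1.0
```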


That is true if you’re talking about a vector of predictions, such as for multi-class classification.

If you’re talking about the input features, normalizing them only means that their ranges are adjusted so they are all roughly equivalent.

If the model has a linear output, its value can be any real number. This is because the output is the sum of the products of the weights and features, plus the bias value.
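
For example, a minimal sketch with hypothetical weights, features, and bias:

```python
import numpy as np

x = np.array([4.2, -1.0, 3.5])   # hypothetical input features
w = np.array([2.0, 0.5, -1.3])   # hypothetical learned weights
b = 1.7                          # hypothetical bias

# Linear output: sum of weight-feature products plus the bias; unbounded in general.
y = np.dot(w, x) + b
print(y)   # 5.05 here, but nothing constrains the result to [0, 5]
```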
