Two questions here.
When trying to find similar movies, we are told to find the item vectors v_k that are closest to the vector of the movie in question, v_j, i.e. those with the smallest ||v_k - v_j||^2. Why do we not just do this with the item FEATURE vectors x_j and x_k instead? It seems that the feature vectors are much more straightforward for comparing objective similarities between movies. Do v_j and v_k contain some information about how the movies are rated that is important for the discussion of similarity?
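For concreteness, the lookup I am describing is roughly the following. This is only a sketch in plain Python; the vectors and the names `item_vectors`, `squared_distance`, and `most_similar` are made up for illustration and are not taken from the material.

```python
# Hypothetical learned item vectors (made-up numbers, for illustration only).
item_vectors = {
    "movie_a": [0.9, 0.1, 0.3],
    "movie_b": [0.8, 0.2, 0.4],
    "movie_c": [-0.5, 0.7, 0.1],
}


def squared_distance(u, v):
    """Return ||u - v||^2 for two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def most_similar(target, vectors):
    """Return the key whose vector is closest to vectors[target], excluding target."""
    v_j = vectors[target]
    return min(
        (k for k in vectors if k != target),
        key=lambda k: squared_distance(vectors[k], v_j),
    )


closest = most_similar("movie_a", item_vectors)
```

The same function would work unchanged on the feature vectors x_j and x_k, which is why I am asking what is gained by using v_j and v_k.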
By L2-normalizing the item and user vectors v_m and v_i, my understanding is that the dot product between the two can have a magnitude of at most 1. How then can this be used to predict movie ratings between 0 and 5?
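To make the second question concrete, here is a small sketch of what I mean, in plain Python. The numbers and the helper names `l2_normalize` and `dot` are my own assumptions for illustration, not the actual model code.

```python
import math


def l2_normalize(v):
    """Scale v to unit Euclidean length (assumes v is not the zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(u, v):
    """Return the dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Made-up user and item vectors, normalized to unit length.
v_i = l2_normalize([3.0, 4.0])  # user vector
v_m = l2_normalize([4.0, 3.0])  # item vector

# For unit vectors the dot product is bounded in magnitude by 1 (Cauchy-Schwarz).
score = dot(v_i, v_m)
```

Whatever vectors I plug in, `score` stays between -1 and 1, so I do not see how it lines up with a 0 to 5 rating scale.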