The second worksheet says “Movies with multiple genre have a training vector per genre.” I see how that can work fine, but I wonder how that would compare with a single entry per movie with a “1” for each genre that applies.
Thanks. I understand the difference but not the practical trade-offs between them. I would have thought that multi-hot would be natural for movie genres. And it would avoid the awkwardness in later exercises with duplicate answers.
Excellent question, @toontalk! (And @7arunb, who happened to ask virtually the same question at almost the same time on another thread.)
My guess is that the reason for using a separate movie entry for each genre, when a movie has multiple genres, was to try to disentangle the genres and make a simpler, more intuitive association with the user parameters (which were defined to have one value per genre). Just a guess, and I’m not sure how much this would actually help with the user parameters.
In any case, the choice to have separate movie entries for each movie/genre pair seems to significantly complicate things when it comes to looking at predictions for a user’s movie ratings and recommendations for similar movies - which is the main point of the recommender.
For example, what you would typically want to predict is a user’s rating for a movie - not separate predictions for that movie as a Romance vs. that movie as a Drama, if the movie happens to be in both genres. Right now, print_pred_movies gets around this by just picking the highest rating across all the genres for that movie (highest = first found, since the list is sorted with the highest rating first).
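To make that workaround concrete, here is a minimal sketch of the "keep the first row found per movie" idea. It is not the actual print_pred_movies code: the row layout, the sample values, and the helper name `best_prediction_per_movie` are all made up for illustration.

```python
# Hypothetical predictions: one (movie_id, title, genre, predicted_rating) row
# per movie/genre pair, as in the per-genre encoding. All values are made up.
predictions = [
    (1, "Movie A", "Romance", 4.1),
    (1, "Movie A", "Drama", 4.4),
    (2, "Movie B", "Comedy", 3.9),
    (3, "Movie C", "Drama", 4.6),
    (3, "Movie C", "Thriller", 4.2),
]


def best_prediction_per_movie(rows):
    """Keep only the highest predicted rating for each movie_id."""
    # Sort with the highest predicted rating first, then keep the first
    # row seen for each movie, so "first found" is also "highest".
    ranked = sorted(rows, key=lambda r: r[3], reverse=True)
    seen = set()
    unique = []
    for row in ranked:
        movie_id = row[0]
        if movie_id not in seen:
            seen.add(movie_id)
            unique.append(row)
    return unique


top = best_prediction_per_movie(predictions)
for movie_id, title, genre, rating in top:
    print(f"{title}: {rating:.1f} (from the {genre} row)")
```

Note that the surviving row still carries a single genre label, which is part of the awkwardness: the prediction shown for a movie is really the prediction for one of its genre rows.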
Similarly, when looking for movies that are similar to each other, it seems like you would want to look at the movie as a whole, and not be comparing a particular genre of a movie with specific genres of other movies. For example, it seems more useful to know which movie is most similar to “Bridget Jones’s Diary”, rather than saying that “Super Troopers” is most similar to it if you look at the movie as a comedy, but “The Others” is most similar if you look at the movie as a drama.
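As a rough sketch of what "similarity for the movie as a whole" could look like, suppose each movie had exactly one feature vector and similarity was measured by squared Euclidean distance. Both of those are assumptions for illustration, and the titles, vector values, and function names below are hypothetical.

```python
# Hypothetical feature vectors, one per movie (values are made up).
movie_vectors = {
    "Movie A": [0.9, 0.1, 0.4],
    "Movie B": [0.8, 0.2, 0.5],
    "Movie C": [0.1, 0.9, 0.3],
}


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def most_similar(title, vectors):
    """Return the title of the other movie closest to `title`."""
    target = vectors[title]
    others = (t for t in vectors if t != title)
    return min(others, key=lambda t: squared_distance(target, vectors[t]))


print(most_similar("Movie A", movie_vectors))
```

With one vector per movie, each title gets a single nearest neighbour, instead of a different answer for each of its genre rows.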
Your suggestion to have a single entry per movie, with a 1 for each applicable genre, seems like it could be a great solution to this issue. Certainly worth exploring. I’ll hunt down some feedback to sanity-check my logic, but right now it seems to me that this is a significant point and we should look into updating the assignment to use your suggested approach.
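For anyone following along, here is a small sketch contrasting the two encodings. It assumes that each per-genre row in the current approach flags exactly one genre; the shortened genre list, the toy catalogue, and the function names are hypothetical and only cover the genre columns, not the other movie features.

```python
GENRES = ["Comedy", "Drama", "Romance"]  # hypothetical, shortened genre list

# Hypothetical catalogue: movie title -> genres it belongs to.
movies = {
    "Movie A": ["Drama", "Romance"],
    "Movie B": ["Comedy"],
}


def one_row_per_genre(catalogue):
    """Per-genre style: one one-hot vector for each movie/genre pair."""
    rows = []
    for title, genres in catalogue.items():
        for genre in genres:
            rows.append((title, [1 if genre == name else 0 for name in GENRES]))
    return rows


def multi_hot(catalogue):
    """Suggested style: one vector per movie, with a 1 for each genre it has."""
    return [
        (title, [1 if name in genres else 0 for name in GENRES])
        for title, genres in catalogue.items()
    ]


print(one_row_per_genre(movies))  # "Movie A" appears twice, once per genre
print(multi_hot(movies))          # "Movie A" appears once, with two 1s
```

With the multi-hot version there is one row per movie, so predictions and similarity lookups would no longer need a de-duplication step.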