I just finished the week on recommendation systems, and I cannot wrap my head around something.
Let’s say that we create a movie feature vector for content-based filtering.
The features for v_m will be: movie year, movie genre, avg. rating, etc.
By doing this, am I not missing out on the information I had with Collaborative filtering?
Two movies can have the same movie year, movie genre, avg. rating but they could be liked by different types of people.
In collaborative filtering, w_j incorporated this information on what type of people liked the movie.
Should we take w_j (from collaborative filtering) and add it to v_m (content-based), so we do not miss out on any information?
Or do you have any other insights on this?
Thank you in advance!
Luca
Hi Luca,
That’s an interesting idea. You can certainly incorporate any information you think is relevant to the movie into the content-based model. However, keep in mind that collaborative filtering can only produce that information once users have viewed or rated the movie, so for a new movie you won’t have it available for the content-based model.
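To make this concrete, here is a minimal sketch of appending a collaborative-filtering vector to the content features, with a fallback for movies that have no ratings yet. The function name `build_movie_vector`, the zero-fill convention, the extra `has_cf` flag, and all the numbers are my own assumptions for illustration, not something from the course.

```python
import numpy as np

def build_movie_vector(content_features, cf_vector, cf_dim):
    """Concatenate content features with a collaborative-filtering vector.

    cf_vector is None for a movie that has no ratings yet (assumed convention).
    In that case the CF slots are filled with zeros and a flag marks them as missing.
    """
    if cf_vector is None:
        cf_part = np.zeros(cf_dim)   # no interactions yet, nothing was learned
        has_cf = 0.0
    else:
        cf_part = np.asarray(cf_vector, dtype=float)
        has_cf = 1.0
    content_part = np.asarray(content_features, dtype=float)
    return np.concatenate([content_part, cf_part, [has_cf]])

# Illustrative values only: year, genre id, average rating
content = [1999.0, 1.0, 4.2]
w_j = [0.3, -1.1]  # vector learned by collaborative filtering for a rated movie

rated_movie = build_movie_vector(content, w_j, cf_dim=2)
new_movie = build_movie_vector(content, None, cf_dim=2)
```

The flag lets a downstream model tell "no collaborative information" apart from a learned vector that happens to be near zero.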
Cheers,
Raymond
Thank you, Raymond, for your reply!
Good point. For new movies, collaborative filtering will not be able to ‘learn’ features.
So if I incorporate collaborative filtering information into a content-based model, the model might perform worse on new movies or new users.
Very clear reply,
Luca
@lczanna Let me add one more point. The new movie / new user problem in collaborative filtering is called the “cold-start” problem. Content-based recommendation can fill the gap because it typically uses content information about the movie (such as genre) or about the user (such as a country inferred from the IP address), and that information does not require the user-movie interactions that collaborative filtering always requires.
Thanks Raymond, I understand.
One more question: for existing movies and existing users, is collaborative filtering likely to perform better than the content-based algorithm?
If yes, would it make sense to train both algorithms and then do the following?
- use collaborative filtering to predict the rating for existing movies and existing users
- use content-based filtering to predict the rating when either the movie or the user is new
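The two-step strategy above could be sketched as a small routing function. Everything here is hypothetical: `predict_rating`, the two predictor callables, and the sets of known IDs are stand-ins for two already-trained models, not a real API.

```python
def predict_rating(user_id, movie_id, known_users, known_movies,
                   cf_predict, content_predict):
    """Route to collaborative filtering only when both sides have interaction history."""
    if user_id in known_users and movie_id in known_movies:
        return cf_predict(user_id, movie_id)
    # Cold start: either the user or the movie is new, so fall back to content features.
    return content_predict(user_id, movie_id)

# Stand-in predictors returning fixed values, purely for illustration
known_users = {"u1", "u2"}
known_movies = {"m1"}

def cf_predict(user_id, movie_id):
    return 4.5

def content_predict(user_id, movie_id):
    return 3.0

existing_pair = predict_rating("u1", "m1", known_users, known_movies,
                               cf_predict, content_predict)
new_movie_pair = predict_rating("u1", "m_new", known_users, known_movies,
                                cf_predict, content_predict)
```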
Recommendation algorithms are an exciting topic!
Using a different strategy depending on the data available makes a lot of sense. Moreover, you may also consider using both for existing users, as you suggested in the first post of this thread. The idea could be: train a collaborative filtering model to get a user vector, concatenate it with the user’s content information to get a longer vector, feed that vector through some NN layers so that its shape matches the movie vector, and finally compute the dot product between the user and movie vectors and minimize the error between that dot product and the actual rating. Of course, this is just one possible way of merging the two pieces of knowledge.
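Here is a rough NumPy sketch of the forward pass for that idea, assuming a two-layer network on the user side and a squared-error loss against the observed rating. The layer sizes, the random weights, and names such as `user_tower` are made up for illustration. Actual training would also need gradients, which an autodiff framework would normally provide.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense_relu(x, W, b):
    """One fully connected layer followed by ReLU."""
    return np.maximum(0.0, x @ W + b)

def user_tower(cf_user_vec, user_content, params):
    """Concatenate the CF user vector with content features, then map to the movie vector size."""
    x = np.concatenate([cf_user_vec, user_content])
    h = dense_relu(x, params["W1"], params["b1"])
    return h @ params["W2"] + params["b2"]   # linear output layer

# Assumed dimensions, chosen arbitrarily
cf_dim, content_dim, hidden_dim, movie_dim = 4, 3, 8, 5
params = {
    "W1": rng.normal(scale=0.1, size=(cf_dim + content_dim, hidden_dim)),
    "b1": np.zeros(hidden_dim),
    "W2": rng.normal(scale=0.1, size=(hidden_dim, movie_dim)),
    "b2": np.zeros(movie_dim),
}

cf_user_vec = rng.normal(size=cf_dim)        # from a trained collaborative filtering model
user_content = rng.normal(size=content_dim)  # e.g. encoded country, age group
movie_vec = rng.normal(size=movie_dim)       # output of a movie-side model
actual_rating = 4.0

user_vec = user_tower(cf_user_vec, user_content, params)
predicted_rating = float(user_vec @ movie_vec)     # dot product as the prediction
loss = (predicted_rating - actual_rating) ** 2     # quantity that training would minimize
```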
Very insightful, thank you Raymond!