Recommender systems: mean normalization in collaborative filtering


From the lecture, as far as I understood, mean normalization in collaborative filtering (CF) is a way to make optimization faster and also, to some extent, to handle the cold-start problem for new users or new movies.

This essentially means that mean normalization fills in a new user’s predicted rating for every movie with that movie’s average rating rather than with zero.

We do this because we have no information about the item or the user other than the ratings, so the average serves as the best starting point for the predicted ratings and the resulting recommendations.
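To make the idea concrete, here is a minimal sketch in plain Python. The ratings matrix, the names `movie_means` and `normalize`, and the all-zero prediction for the new user are assumptions made up for illustration (the zero prediction mimics what regularization tends to produce for a user with no ratings); this is not the lecture's actual code.

```python
# Hypothetical ratings matrix: rows are movies, columns are users.
# None marks "not rated". The last user (column 3) is new and has rated nothing.
ratings = [
    [5.0, 4.0, 0.0, None],
    [None, 2.0, 1.0, None],
    [4.0, None, 5.0, None],
]


def movie_means(matrix):
    """Mean of the observed ratings in each row (0.0 if a row has none)."""
    means = []
    for row in matrix:
        observed = [r for r in row if r is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return means


def normalize(matrix, means):
    """Subtract each movie's mean from its observed ratings; keep None as is."""
    return [
        [None if r is None else r - mu for r in row]
        for row, mu in zip(matrix, means)
    ]


means = movie_means(ratings)
normalized = normalize(ratings, means)

# Assumption: the model is trained on `normalized`, and for a user with no
# ratings it predicts 0.0 for every movie in the normalized space.
new_user_normalized_prediction = [0.0] * len(ratings)

# Adding the mean back turns "all zeros" into "each movie's average rating".
new_user_prediction = [
    p + mu for p, mu in zip(new_user_normalized_prediction, means)
]
```

In this toy example the new user's predictions come out as the per-movie averages instead of zeros, which is the behaviour described above.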

Is my understanding correct? Kindly correct me if I am wrong.

I vote Yes!


I would like to add some more information:

There are two (well, three) main types of recommender systems:

  • Content based
  • Collaborative based
  • Hybrid (combination of the above)

Content based:
This one uses the features of the items (movies, products, songs, etc.) and creates a profile for each one. These profiles are vectorized. Once they are vectorized, we create a matrix that crosses every item with every other item and holds their pairwise similarities. The similarity is most commonly computed with cosine similarity.
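As a rough sketch of the item-to-item similarity matrix described above, in plain Python: the three items and their feature vectors are hypothetical, and `most_similar` is just an illustrative helper, not part of any library.

```python
import math

# Hypothetical item profiles, already vectorized (e.g. one slot per genre).
item_vectors = {
    "movie_a": [1.0, 1.0, 0.0],
    "movie_b": [1.0, 0.0, 0.0],
    "movie_c": [0.0, 0.0, 1.0],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Cross every item with every other item.
similarity = {
    a: {b: cosine_similarity(vec_a, vec_b) for b, vec_b in item_vectors.items()}
    for a, vec_a in item_vectors.items()
}


def most_similar(item):
    """Return the other item with the highest similarity to `item`."""
    candidates = {other: s for other, s in similarity[item].items() if other != item}
    return max(candidates, key=candidates.get)


recommendation = most_similar("movie_a")
```

Recommending then amounts to looking up the row of an item the user liked and picking the highest-scoring other items.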

Content based systems are a great solution for the cold-start problem. When you don’t have any information about your users, then you start with a content based recommender system.

For a content-based recommender system I highly recommend a “sentence transformer”. In some cases TF-IDF is very good as well. Another content-based option is a count vectorizer. You can also build a deep neural network.
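To show what the simplest of these options does, here is a hand-rolled bag-of-words count vectorization in plain Python. It is only a stand-in for the idea, not scikit-learn's `CountVectorizer`, and the item descriptions and function names are made up for the example.

```python
from collections import Counter

# Hypothetical item descriptions.
descriptions = {
    "movie_a": "space adventure with robots",
    "movie_b": "romantic comedy in paris",
    "movie_c": "robots in space",
}


def build_vocabulary(texts):
    """Sorted list of every distinct lowercase word across all texts."""
    return sorted({word for text in texts for word in text.lower().split()})


def count_vector(text, vocabulary):
    """How often each vocabulary word occurs in `text`."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


vocabulary = build_vocabulary(descriptions.values())
vectors = {
    name: count_vector(text, vocabulary) for name, text in descriptions.items()
}
```

Each item ends up as one vector of word counts, which can then be compared with cosine similarity. TF-IDF and sentence transformers produce vectors for the same purpose, just with better weighting or learned embeddings.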

Collaborative based:
In this type of recommender system, we now have some or a lot of knowledge about the users. Many collaborative recommender systems know, for instance, the history of each user’s selections or preferences: previous movies and their ratings, previous purchases and their frequencies (and maybe their ratings), etc. In more sophisticated cases, on top of having the users’ histories, we also have information about the users themselves, like location, gender, age, and many other features.

We now have a wealth of information about users, and here we proceed in roughly the same way as before: we create profiles and then we create a similarity matrix.
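Here is a small sketch of that idea in plain Python, where the item profiles come from user ratings instead of item features. The users, ratings, and the `predict` helper are hypothetical, and treating a missing rating as 0.0 is a crude simplification chosen only to keep the example short.

```python
import math

# Hypothetical rating histories: user -> {item: rating}.
ratings = {
    "ann": {"movie_a": 5.0, "movie_b": 4.0, "movie_c": 1.0},
    "bob": {"movie_a": 4.0, "movie_b": 5.0, "movie_c": 2.0},
    "cat": {"movie_a": 1.0, "movie_c": 5.0},
}

users = sorted(ratings)
items = sorted({item for user_ratings in ratings.values() for item in user_ratings})


def item_vector(item):
    """Profile of an item: one rating per user (assumption: missing counts as 0.0)."""
    return [ratings[user].get(item, 0.0) for user in users]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Item-to-item similarity matrix built from ratings.
similarity = {
    a: {b: cosine_similarity(item_vector(a), item_vector(b)) for b in items}
    for a in items
}


def predict(user, target):
    """Similarity-weighted average of the user's own ratings.

    Assumes `target` is an item the user has not rated yet.
    """
    rated = ratings[user]
    weighted = sum(similarity[target][item] * r for item, r in rated.items())
    total = sum(similarity[target][item] for item in rated)
    return weighted / total if total else None


predicted = predict("cat", "movie_b")
```

Because movie_b is rated much like movie_a by the other users, the prediction for "cat" leans toward the low rating she gave movie_a rather than the high one she gave movie_c.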

The one method I’ve used for a collaborative-based system is SLIM, which proved very good. Other methods include KNN baseline item-to-item and Co-Clustering, and of course you can always build a DNN.

Hybrid models:
These are basically a combination of the two above. I have not built one yet, so I cannot share much about it.

Thanks much for your inputs!