Movie Recommender system

  • I’m working on a movie recommender system project that uses a collaborative filtering algorithm, implemented in Python.

  • My dataset contains a few movies that haven’t been rated by any user.

  • Can you please tell me whether to include those movies in training, or whether I should drop them and not use them for training?

  • The dataset has 9742 movies in total, and fewer than 20 of them have no ratings from any user.

I would drop movies that don’t have any ratings in this scenario.


You can simply drop those columns.
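As a minimal sketch of that suggestion, assuming the ratings are held in a pandas user-item matrix with one column per movie and NaN for missing ratings (the toy data and the names `ratings` and `rated_only` are made up for illustration):

```python
import numpy as np
import pandas as pd

# Toy user-item matrix: rows are users, columns are movies, NaN means "not rated".
ratings = pd.DataFrame(
    {
        "movie_a": [5.0, np.nan, 3.0],
        "movie_b": [np.nan, np.nan, np.nan],  # no user has rated this movie
        "movie_c": [4.0, 2.0, np.nan],
    },
    index=["user_1", "user_2", "user_3"],
)

# Drop every column (movie) whose entries are all NaN.
rated_only = ratings.dropna(axis=1, how="all")
print(list(rated_only.columns))
```

If the movies are stored as rows instead, the same call with `axis=0` would drop them.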

Hello @mrajib.rm, welcome to the community!

You can search for the Gen AI hackathon by Google on the Hack2skill website.
You will find many project ideas to work on and learn from in real time.
For more information, please contact me: smitrpatel19@gnu.ac.in

In collaborative filtering algorithms, including movies that have not been rated by any user in the training data can be problematic. These unrated movies are often referred to as “cold start” items. Here are a few considerations to help you decide whether to include these movies in your training data or not:

  1. Sparsity: If the dataset is already sparse (i.e., there are a lot of missing ratings), including unrated movies can exacerbate the sparsity issue. This can make it difficult for the algorithm to accurately learn user preferences.
  2. Relevance: Consider whether these unrated movies are likely to be relevant to users in your dataset. If they are niche or obscure movies that are unlikely to be of interest to most users, including them may not add much value to the recommendations.
  3. Data Quality: If the dataset is small and the number of unrated movies is negligible (<20 in your case), it might be better to exclude them to avoid noise in the training data.
  4. Evaluation Metrics: Think about how you will evaluate the performance of your recommender system. If you include unrated movies in the training data, you may need to use evaluation metrics that account for the presence of unrated items, such as top-N recommendation metrics.
  5. Cold Start Problem: Including unrated movies in the training data does not address the cold start problem, which refers to the challenge of making recommendations for new or unrated items. Collaborative filtering algorithms typically struggle with cold start items, and including them in the training data may not necessarily improve the system’s performance on these items.

Based on the information provided, the number of unrated movies is small (<20 out of 9742), so including them is unlikely to improve the model’s performance and may introduce noise. Excluding them from the training data is therefore reasonable. However, it ultimately depends on the specific characteristics of your dataset and the goals of your recommender system project. You may want to experiment with both approaches and compare their performance using appropriate evaluation metrics.
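To try both approaches, it helps to first identify the unrated movies and see how much they change the sparsity of the user-item matrix. The sketch below assumes a MovieLens-style layout with a `movies` table and a long-format `ratings` table, where each (user, movie) pair appears at most once; the column names and toy values are assumptions, not taken from the original dataset.

```python
import pandas as pd

# Hypothetical toy tables in a MovieLens-like layout.
movies = pd.DataFrame(
    {"movieId": [1, 2, 3, 4], "title": ["A", "B", "C", "D"]}
)
ratings = pd.DataFrame(
    {
        "userId": [1, 2, 1, 3, 2],
        "movieId": [1, 1, 2, 4, 4],
        "rating": [4.0, 5.0, 3.0, 2.0, 4.5],
    }
)

# A movie is "unrated" if its id never appears in the ratings table.
has_rating = movies["movieId"].isin(ratings["movieId"])
unrated_movies = movies[~has_rating]
rated_movies = movies[has_rating]


def sparsity(n_users: int, n_movies: int, n_ratings: int) -> float:
    """Fraction of empty cells in the user-item matrix."""
    return 1.0 - n_ratings / (n_users * n_movies)


n_users = ratings["userId"].nunique()
sparsity_all = sparsity(n_users, len(movies), len(ratings))
sparsity_rated = sparsity(n_users, len(rated_movies), len(ratings))

print("unrated movie ids:", unrated_movies["movieId"].tolist())
print(f"sparsity with all movies:   {sparsity_all:.4f}")
print(f"sparsity with rated movies: {sparsity_rated:.4f}")
```

On a real dataset with thousands of movies and fewer than 20 unrated ones, the two sparsity values would be expected to differ only slightly.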


In my personal experience (I assume we are talking about MovieLens or something similar), another big challenge you’ll find is marshalling the compute resources to actually crunch the whole dataset at once using traditional methods. Techniques like SVD can make a big difference in this regard, since there is more than one way to implement collaborative filtering.
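One way to read the SVD suggestion is a truncated SVD on a sparse ratings matrix, which avoids materialising the dense user-item matrix. The sketch below uses `scipy.sparse.linalg.svds` on toy data; treating missing ratings as zeros and choosing `k = 2` are simplifying assumptions made for illustration, and this is not necessarily the exact method meant above.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

# Toy ratings in coordinate form: (user index, movie index, rating).
user_idx = np.array([0, 0, 1, 1, 2, 2, 3, 3])
movie_idx = np.array([0, 1, 0, 2, 1, 3, 2, 3])
values = np.array([5.0, 3.0, 4.0, 1.0, 2.0, 5.0, 1.0, 4.0])

# Sparse 4 users x 4 movies matrix; unrated cells are implicit zeros.
R = csr_matrix((values, (user_idx, movie_idx)), shape=(4, 4))

# Truncated SVD keeping k latent factors (k must be smaller than both dimensions).
k = 2
U, s, Vt = svds(R, k=k)

# Low-rank reconstruction; its entries can be used as predicted scores.
approx = U @ np.diag(s) @ Vt
print(approx.round(2))
```

Only the stored ratings occupy memory in `R`, which is what makes this practical on larger datasets.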

Another thing to consider, at least with regard to MovieLens 10M: try mapping out the number of reviews per user. I understand you are trying to reduce model size by removing ‘not often reviewed’ movies, but at the other end of the spectrum you will see a relatively small percentage of users (relative to the dataset) who have reviewed a ton of movies.
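A quick way to map out reviews per user, assuming a long-format `ratings` table with a `userId` column (the toy data below is made up for illustration):

```python
import pandas as pd

# Hypothetical long-format ratings: user 1 is a heavy reviewer.
ratings = pd.DataFrame(
    {
        "userId": [1, 1, 1, 1, 1, 1, 2, 2, 3, 4],
        "movieId": [10, 11, 12, 13, 14, 15, 10, 11, 12, 13],
        "rating": [4.0, 3.5, 5.0, 2.0, 4.5, 3.0, 4.0, 2.5, 5.0, 3.0],
    }
)

# Number of ratings per user, most active users first.
ratings_per_user = (
    ratings.groupby("userId").size().sort_values(ascending=False)
)

# Share of all ratings contributed by the single most active user.
top_user_share = ratings_per_user.iloc[0] / ratings_per_user.sum()

print(ratings_per_user)
print(ratings_per_user.describe())
print(f"share of ratings from the most active user: {top_user_share:.0%}")
```

Plotting `ratings_per_user` as a histogram would show whether a handful of users dominate the dataset.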

That raises another interesting question: should these users be considered ‘experts’, or in effect ‘outliers’? Presumably you are trying to provide recommendations for the ‘average’ user.

In any case, it is something to think about.
