How do we produce the user features in the first place?

The original data source, MovieLens Latest Datasets | GroupLens, provides movies.csv and ratings.csv. Together, movies.csv and ratings.csv can produce the item features (year, average rating, Action, …, Thriller) for each movie. However, how can we produce the values of user features like Action, Adventure, …, Thriller in content_user_train.csv in the first place? The original data source only has users’ ratings for the movies they’ve watched, and that does not tell us what features users should have. I’m confused.

Hello @Huaqing_Mao,

Welcome to our community.

The user features such as Action, Adventure, etc. have been derived from the ratings a particular user has given to the movies.

Every movie has genre(s) associated with it. Every user has rated a certain set of movies.

Based on these 2 bits of info, we can group the movies (rated by the user) by genre(s) and then find the average rating given by the user for each genre.

So, the user feature “Action” is the average rating given by the user to the movies in the “Action” genre.
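The grouping described above could be sketched in pandas roughly as follows. This is only an illustration, not the code that produced content_user_train.csv: the toy data is made up, the column names (`movieId`, `genres`, `userId`, `rating`) follow the MovieLens CSV layout with genres separated by `|`, and filling unrated genres with 0 is an assumption.

```python
import pandas as pd

# Toy stand-ins for movies.csv and ratings.csv (made-up values).
movies = pd.DataFrame({
    "movieId": [1, 2, 3],
    "genres": ["Action|Adventure", "Action|Thriller", "Thriller"],
})
ratings = pd.DataFrame({
    "userId": [10, 10, 10, 20],
    "movieId": [1, 2, 3, 3],
    "rating": [4.0, 2.0, 5.0, 3.0],
})

# Split the pipe-separated genre string so there is one row per (movie, genre).
movie_genres = movies.assign(genre=movies["genres"].str.split("|")).explode("genre")

# Attach genres to every rating, then average the ratings per user and genre.
rated = ratings.merge(movie_genres[["movieId", "genre"]], on="movieId")
user_features = (
    rated.groupby(["userId", "genre"])["rating"]
    .mean()
    .unstack(fill_value=0.0)  # assumption: a genre the user never rated becomes 0
)

print(user_features)
```

Here user 10 rated two Action movies (4.0 and 2.0), so that user's “Action” feature is 3.0, while user 20 never rated an Action movie and gets the fill value.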

I see. The movies watched and rated by the users help to define the features of the users. In general, the features of users can be anything like age, gender, or country, which have nothing to do with their movie watching or rating history. But in this assignment, the features are defined this way. Thanks for clarifying.

Yes, the original user features can be demographic in nature, totally unrelated to the movies. However, there is nothing stopping us from also creating derived features that are more relevant to the user’s preferences for movies. This could greatly improve our predictive capability.


I have one doubt about this. Why have we constructed multiple values for a user in content_user_train.csv when we could have used a single value for each user with the average rating per genre? Is it because it creates more training data and helps in training the model? If yes, can we do this with any dataset? Won’t it lead to overfitting?
