C3_W2_Assignment 2_Content based filtering

In this lab:

  1. The user data has been repeated. Why?
  2. There seems to be a mismatch between what’s written and what’s shown. Kindly help me understand.

See the image above for context please.

Hello @Debatreyo_Roy

  1. The user data is repeated once for each movie rated by that user. If you concatenate the user table and the movie table horizontally, each row of the combined table represents one user rating one movie. That combined table has simply been split into two, so a user who rated multiple movies appears on multiple rows of the user table.

  2. Another learner also noticed this difference, so I have asked the team to review the number. For now, we can rely on what the data tells us.
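
On point 1, here is a minimal sketch of why a user shows up more than once. The toy ratings, feature values, and the names `ratings`, `user_features`, and `movie_features` are all made up for illustration and are not taken from the lab; only the idea of row-aligned user and movie tables comes from the explanation above.

```python
# Hypothetical toy data: each tuple is (user_id, movie_id, rating).
ratings = [
    (1, 10, 4.0),
    (1, 11, 3.5),
    (2, 10, 5.0),
]

# Hypothetical per-user and per-movie feature vectors (values are invented).
user_features = {1: [3.9, 4.1], 2: [2.5, 4.8]}
movie_features = {10: [1, 0], 11: [0, 1]}

# Build row-aligned tables: row i of each table describes rating i.
user_train = [[uid] + user_features[uid] for uid, _, _ in ratings]
item_train = [[mid] + movie_features[mid] for _, mid, _ in ratings]
y_train = [rating for _, _, rating in ratings]

# User 1 rated two movies, so user 1's row appears twice in user_train.
for user_row, item_row, y in zip(user_train, item_train, y_train):
    print(user_row, item_row, y)
```

Reading the three lists side by side gives back the "bigger table": one line per (user, movie, rating) combination.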

Cheers,
Raymond

  1. I agree with @rmwkwok. It seems the ‘user_train’ data are repeated to prepare the cross in the network model.
  2. If you look in the CSV file ‘content_user_train.csv’, you can see an average rating of 3.95 for the Action genre. This file has the raw data.
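
As a quick illustration of point 2, a raw value such as 3.95 can be displayed as 4.0 once it is formatted to one decimal place. This is only a sketch: the `.1f` format specifier is an assumption for illustration, and the lab's actual format string may differ.

```python
raw_average = 3.95  # average Action rating reported from the CSV file

# Assumed one-decimal display format; not copied from the lab code.
displayed = f"{raw_average:.1f}"

print("raw value:      ", raw_average)
print("displayed value:", displayed)
```

The underlying data is unchanged; only the printed representation differs.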

The ‘pprint_train’ code uses a format that rounds the value to 4.0: