For instance, with a single one-hot encoded movie feature, 39 rows of training data are required to describe user 2's ratings of 16 movies, while they could have been described with only 16 rows without one-hot encoding (which, by the way, also feels more intuitive to me).
Can I get more clarity on why there are 39 duplicate entries for user 2 when they rated only 16 movies? I see that item_train and user_train both have 58187 rows, but I'm still not getting the connection between the two tables.
Thank you! So the rows line up implicitly (i.e. there is no foreign key like in SQL). Rows 0-38 of user_train are user 2's reviews of the movies in rows 0-38 of item_train, i.e. movies 6874, 8798, etc., with all the permutations of the genres.
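
In case it helps anyone else, here is a toy sketch of that row alignment. The arrays below are tiny made-up stand-ins, not the lab's data: the column layout, the genre columns, and the values are all assumptions for illustration. Only the names item_train and user_train and the movie ids 6874 and 8798 come from this thread.

```python
import numpy as np

# Toy stand-in for item_train (invented values).
# Assumed columns: [movie_id, genre_a, genre_b, genre_c], one genre per row,
# so a movie with several genres is repeated across several rows.
item_train = np.array([
    [6874, 1, 0, 0],
    [6874, 0, 1, 0],
    [8798, 1, 0, 0],
    [8798, 0, 1, 0],
    [8798, 0, 0, 1],
])

# Toy stand-in for user_train (invented values).
# Assumed columns: [user_id, placeholder_feature].
# Row i here describes the user who rated the movie in row i of item_train.
user_train = np.array([
    [2, 0],
    [2, 0],
    [2, 0],
    [2, 0],
    [2, 0],
])

# The only link between the two tables is the shared row index,
# so they must have the same number of rows.
assert item_train.shape[0] == user_train.shape[0]

# Select user 2's rows, then look up the matching rows in item_train.
mask = user_train[:, 0] == 2
rows_for_user = int(mask.sum())
movies_for_user = np.unique(item_train[mask, 0])

print("rows for user 2:", rows_for_user)
print("distinct movies for user 2:", len(movies_for_user))
```

In this toy case user 2 has more rows than distinct movies, which is the same effect as the 39 rows for 16 movies discussed above.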