【C3_W2_RecSysNN_Assignment】Why one-hot coding for movie genre?

Why one-hot coding for movie genre?

For instance, with a one-hot coded movie genre feature, 39 rows of training data are required to describe user 2's ratings of 16 movies, while they could have been described with only 16 rows without one-hot coding (which, by the way, also feels more intuitive to me).

There must be some benefit. What is it?

Hello @okleon ,
Welcome to the DeepLearning.AI community.

That’s a good question. You can follow this thread for relevant discussions on your question.

Hello @vignesh18

thank you for the info.
It looks like I need to improve my searching skills.

Can I get more clarity on why there are 39 duplicate entries for user 2 when rating 16 movies? I see item_train and user_train both have 58187 rows but still not getting the connection between the two tables.

It seems that each combination of user x (movie x movie genre) is ONE training sample of 58187 samples.

For instance: user U1 watched movies M1 and M2, and both movies are of genres G1 and G2, so there are 4 samples:
U1 M1 G1
U1 M1 G2
U1 M2 G1
U1 M2 G2

If you link the two tables together horizontally row by row, each row is different.
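
To make the row layout concrete, here is a minimal sketch in plain Python of the expansion described above. The data, the genre list `GENRES`, and the helpers `one_hot` and `build_rows` are all made up for illustration and are not taken from the assignment's files or code; it only assumes, as in the example, that each (user, movie, genre) combination becomes one row.

```python
# Hypothetical toy data following the U1/M1/G1 example above;
# none of these values come from the actual assignment files.
GENRES = ["G1", "G2", "G3"]
movie_genres = {"M1": ["G1", "G2"], "M2": ["G1", "G2"]}
ratings = [("U1", "M1", 4.0), ("U1", "M2", 3.5)]  # (user, movie, rating)


def one_hot(genre, genres=GENRES):
    """Return a 0/1 vector with a single 1 at the position of `genre`."""
    return [1 if g == genre else 0 for g in genres]


def build_rows(ratings, movie_genres):
    """Emit one row per (user, movie, genre) combination.

    The user table and the item table are built in lockstep, so row k of
    one table always belongs with row k of the other (no key column).
    """
    user_rows, item_rows, targets = [], [], []
    for user, movie, rating in ratings:
        for genre in movie_genres[movie]:
            user_rows.append([user])
            item_rows.append([movie] + one_hot(genre))
            targets.append(rating)  # the rating repeats for each genre row
    return user_rows, item_rows, targets


user_train, item_train, y_train = build_rows(ratings, movie_genres)

# Link the two tables horizontally, row by row.
for u_row, i_row, y in zip(user_train, item_train, y_train):
    print(u_row + i_row, y)
```

In this sketch, two rated movies with two genres each produce four rows in both tables, which is the same mechanism by which 16 rated movies can turn into 39 rows when the movies carry several genres.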


Thank you! So the rows line up (implicitly, i.e. no foreign key like in sql). Rows 0-38 are user 2’s reviews of the movies in rows 0-38, i.e. movie 6874, 8798 etc with all the permutations of the genres.