Content-Based Filtering lab question

7arunb · July 29, 2022, 7:08pm

Hi - I have a question about the implementation in the lab.

Why does each movie have multiple entries in the training set? I get that they are represented as one-hot vectors, but (using the lab’s example) any reason we cannot have movie ID 6874 have 1s for Action, Crime and Thriller in the same line vs. in separate lines?

Also, wouldn’t this decrease the effectiveness of the algorithm, given that we are losing the information that the same movie has multiple genres? What am I missing?

Thanks

Lukasz_S · July 29, 2022, 7:26pm

Hi @7arunb

maybe this will be helpful:

7arunb · July 30, 2022, 1:54am

Thanks @Lukasz_S

I have a similar question to the second comment on the Stack Exchange thread you linked to: if a movie has multiple genres, why not represent it as a single row [1,0,0,1] rather than two rows, [1,0,0,0] and [0,0,0,1]?
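
To make the two representations concrete, here is a minimal sketch in plain Python. The four-genre vocabulary and the helper names `one_hot_rows` and `multi_hot_row` are assumptions for illustration only, not the lab's actual code or data layout; the movie ID and its genres are the ones from the lab example mentioned above.

```python
# Hypothetical genre vocabulary (the lab's real genre list is longer).
GENRES = ["Action", "Comedy", "Crime", "Thriller"]


def one_hot_rows(movie_id, genres):
    """Return one (movie_id, vector) row per genre, each vector with a single 1."""
    rows = []
    for genre in genres:
        vec = [0] * len(GENRES)
        vec[GENRES.index(genre)] = 1
        rows.append((movie_id, vec))
    return rows


def multi_hot_row(movie_id, genres):
    """Return a single (movie_id, vector) row with a 1 for every genre."""
    vec = [1 if genre in genres else 0 for genre in GENRES]
    return (movie_id, vec)


movie_genres = ["Action", "Crime", "Thriller"]

separate = one_hot_rows(6874, movie_genres)   # three rows, one genre each
combined = multi_hot_row(6874, movie_genres)  # one row, three genres

# Summing the one-hot rows element-wise gives back the multi-hot vector.
summed = [sum(column) for column in zip(*(vec for _, vec in separate))]

for row in separate:
    print(row)
print(combined)
print(summed == combined[1])
```

In this sketch the separate rows still share the same movie ID, so the full genre set can be recovered by grouping on that ID. Whether a model trained row by row actually makes use of that link is the open question here.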
