Hello, could someone please clarify why redundant user data is used (i.e., the same user id with the same data repeated across duplicate rows) in the content-based filtering programming assignment?
Thanks
@Muhammad-Kalim-Ullah, I think this is just an implementation choice, made to simplify the assignment.
Basically, load_data()
treats item_train, user_train, and y_train as different portions of one big table where each row represents a user+movie+rating combo (e.g. the first row represents the first user, the first movie, and that user’s rating for that movie, etc).
This implementation makes it easy to match each target rating to the appropriate user+movie pair. For example, when splitting the training and test sets, as long as you apply the same split to item_train, user_train, and y_train, you are assured that each y value still goes with the corresponding user/item pair.
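To make the parallel-table idea concrete, here is a minimal sketch. The array names item_train, user_train, and y_train come from the assignment; everything else (the toy feature vectors, the ratings list, the 80/20 split) is made up for illustration and is not the assignment's actual code:

```python
import numpy as np

# Hypothetical per-user and per-movie feature vectors (invented for illustration)
user_features = {1: [4.0, 3.5], 2: [2.0, 5.0]}    # user id -> user feature vector
movie_features = {10: [1.0, 0.0], 20: [0.0, 1.0]}  # movie id -> movie feature vector

# Ratings as (user_id, movie_id, rating) combos -- one row per rating
ratings = [(1, 10, 4.5), (1, 20, 3.0), (2, 10, 2.5)]

# Build three parallel arrays: row i of each array describes the same rating.
# User 1's feature vector appears twice because user 1 rated two movies --
# this is exactly the "duplicate user rows" the question asks about.
user_train = np.array([user_features[u] for u, m, r in ratings])
item_train = np.array([movie_features[m] for u, m, r in ratings])
y_train = np.array([r for u, m, r in ratings])

# Splitting: apply the SAME permutation to all three arrays, so row i of
# the split user, item, and y arrays still refers to the same rating.
rng = np.random.default_rng(1)
perm = rng.permutation(len(y_train))
split = int(0.8 * len(y_train))
train_idx, test_idx = perm[:split], perm[split:]
user_tr, user_te = user_train[train_idx], user_train[test_idx]
item_tr, item_te = item_train[train_idx], item_train[test_idx]
y_tr, y_te = y_train[train_idx], y_train[test_idx]
```

The duplication is just denormalization: rather than looking a user's features up by id at training time, the feature vector is copied into every row where that user appears, so the three arrays can be indexed and split in lockstep.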
But the duplicate rows are not strictly required, are they? I think they are just there so that user_train and item_train have an equal number of rows, since completely duplicated rows make no sense except to keep the user and movie arrays aligned. Am I right?
Hello @Muhammad-Kalim-Ullah, please check out this post for how we come up with the apparently duplicated user rows.