What does the test data contain if the training data contain all the user ratings?

abhilash341 · January 25, 2025, 11:39pm

Based on the above image, I have the following questions:

If the training set includes all the ratings provided by users in the dataset, what does the test set consist of? Additionally, in this scenario, is the test set a subset of the training set, or is it distinct from it?
The image mentions that some ratings are repeated to increase the number of training examples for underrepresented genres. What does the repetition of ratings specifically refer to in this context? Are the user records duplicated, or are the movie records duplicated to address the issue of underrepresentation?

TMosh · January 26, 2025, 2:50am

The test set is a set of examples that you did not use during training. These are used to verify how well your system makes predictions on examples it has never seen before.
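The sketch below shows how a test set is held out. It assumes the ratings are stored as hypothetical `(user_id, movie_id, rating)` tuples. The rows are shuffled once and cut at an 80/20 boundary, so every rating lands in exactly one of the two sets. The test set is therefore disjoint from the training set, not a subset of it.

```python
import random

def split_ratings(ratings, test_fraction=0.2, seed=1):
    """Shuffle rating rows once and cut them into disjoint train/test lists."""
    rows = list(ratings)               # copy so the caller's list is untouched
    random.Random(seed).shuffle(rows)  # seeded RNG makes the split reproducible
    n_test = int(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]  # (train, test)

# Hypothetical toy data: (user_id, movie_id, rating)
ratings = [(u, m, (u + m) % 5 + 1) for u in range(4) for m in range(5)]
train, test = split_ratings(ratings)

print(len(train), len(test))        # 16 4
print(set(train).isdisjoint(test))  # True: no rating appears in both sets
```

The cut is made over rating rows rather than over users. This means a given user's ratings can end up partly in the training set and partly in the test set.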

Sometimes a data set is “augmented” to make it artificially larger without the cost of collecting more data. For image data, this often means resizing, rotating, or mirroring the images.
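Tabular data such as movie ratings has nothing to rotate or mirror. A common substitute is plain oversampling: rows that belong to rare classes are repeated until those classes are better represented.

The sketch below assumes hypothetical rating rows, each tagged with a single genre, and a hypothetical target count per genre. Each entire rating row (user and movie together) is repeated. This illustrates the idea only and is not necessarily how the lab builds its data set.

```python
import random
from collections import Counter

def oversample_by_genre(rows, target_per_genre, seed=0):
    """Repeat rows of under-represented genres until each genre has at least
    `target_per_genre` rows. Existing rows are reused unchanged; no new
    ratings are synthesized."""
    rng = random.Random(seed)
    by_genre = {}
    for row in rows:
        by_genre.setdefault(row["genre"], []).append(row)

    out = list(rows)
    for genre_rows in by_genre.values():
        shortfall = target_per_genre - len(genre_rows)
        if shortfall > 0:
            # rng.choices samples with replacement, i.e. repeats existing rows
            out.extend(rng.choices(genre_rows, k=shortfall))
    return out

# Hypothetical ratings: Documentary is under-represented.
rows = (
    [{"user": u, "movie": m, "genre": "Action", "rating": 4.0}
     for u in range(3) for m in range(4)]
    + [{"user": 0, "movie": 99, "genre": "Documentary", "rating": 5.0}]
)
balanced = oversample_by_genre(rows, target_per_genre=6)
print(Counter(r["genre"] for r in balanced))
# Counter({'Action': 12, 'Documentary': 6})
```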

abhilash341 · February 3, 2025, 12:46am

Thanks for your response. The lab notes mention that the training set includes all the ratings made by users in the dataset. However, my understanding is that a user’s ratings should be split between the training and test datasets. How is it that the training set contains all of a user’s ratings?
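Suppose the goal is to guarantee that every user contributes ratings to both sets, as this question suggests. In that case, the split can be made per user instead of over all rows.

The sketch below assumes hypothetical `(user_id, movie_id, rating)` tuples. It holds out one rating for each user who has at least two ratings.

```python
import random
from collections import defaultdict

def per_user_split(ratings, seed=1):
    """Hold out one rating per user; users with a single rating stay in train."""
    rng = random.Random(seed)
    by_user = defaultdict(list)
    for r in ratings:
        by_user[r[0]].append(r)

    train, test = [], []
    for user_ratings in by_user.values():
        if len(user_ratings) < 2:
            train.extend(user_ratings)  # too few ratings to hold any out
            continue
        held_out = rng.randrange(len(user_ratings))
        for i, r in enumerate(user_ratings):
            (test if i == held_out else train).append(r)
    return train, test

ratings = [(u, m, 3.0) for u in range(3) for m in range(4)] + [(3, 0, 5.0)]
train, test = per_user_split(ratings)
print(sorted(r[0] for r in test))  # [0, 1, 2]: user 3 has only one rating
```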

TMosh · February 3, 2025, 1:05am

I’ll review the assignment in more detail and report back later.

TMosh · March 28, 2025, 5:41am

Sorry for the delay in replying, I lost track of this thread.

I think the issue is that the title of the section “3.1. Training Data” is incorrect. It should be more like “3.1 Data Set”, because the text discusses how the data set is split into the training set and the test set.
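In a content-based setup like this one, each example is often stored across parallel arrays: for instance, user features, movie features, and the target rating. Whatever split is applied must therefore use the same row order for all of them. (scikit-learn's `train_test_split` accepts several arrays in one call and splits them consistently.)

The sketch below makes the shared permutation explicit. It assumes hypothetical plain-Python lists of equal length.

```python
import random

def aligned_split(arrays, train_fraction=0.8, seed=1):
    """Split several equal-length lists with ONE shared permutation so that
    row i of every list stays together in the same subset."""
    n = len(arrays[0])
    assert all(len(a) == n for a in arrays), "arrays must have equal length"
    order = list(range(n))
    random.Random(seed).shuffle(order)
    n_train = int(n * train_fraction)
    train_idx, test_idx = order[:n_train], order[n_train:]
    return ([[a[i] for i in train_idx] for a in arrays],
            [[a[i] for i in test_idx] for a in arrays])

# Hypothetical parallel arrays: row i describes one (user, movie, rating) example.
user_feats = [[i, i * 10] for i in range(10)]
item_feats = [[i, -i] for i in range(10)]
y = [float(i) for i in range(10)]

(u_tr, i_tr, y_tr), (u_te, i_te, y_te) = aligned_split([user_feats, item_feats, y])
print(len(y_tr), len(y_te))  # 8 2
# Alignment check: each training row still pairs matching user, item, and target.
print(all(u[0] == it[0] == t for u, it, t in zip(u_tr, i_tr, y_tr)))  # True
```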
