Standard scaling for user vector in Content Based Filtering

bhavanamalla · August 6, 2023, 10:00am

I have a doubt about user_train scaling values.

Let’s consider the first user in user_train in the original scale.

Looking at the original scale, we can see that the first few rows of user_train belong to user 2. Let's now consider only user 2.

All the rows belonging to user 2 are identical, because the user vector is the same for every movie that user rated. The same holds for every other user.

So, from my understanding, the rows of user_train of user 2 represent the reviews/ratings of user 2 for each movie the user 2 rated.

Let’s now consider the first few rows of item_train.

As I understand it, the matrix item_train contains the movies rated by user 2, where each row holds the one-hot encoding of a movie's genres (one row for each movie rated by user 2, with different genre encodings).

If my understanding is correct, here is my actual question.

If user_train contains identical rows, with the same feature values, for user 2 across all the movies,

then when we standard-scale the input features (which is done column-wise), the values in each column should still be the same for all of user 2's rows, right?

Because in standard scaling, one mean and one std are computed per column, and every value in that column has the same mean subtracted and is divided by the same std, right? So how come the scaled values are different?

Kindly correct me if I have misunderstood.
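The expectation above can be checked with a minimal sketch. It assumes a toy user_train array with made-up values (not the lab's data) and applies column-wise standard scaling by hand with NumPy rather than with the lab's own scaler:

```python
import numpy as np

# Hypothetical toy data, NOT the lab's data: columns are [user_id, some_user_feature].
# User 2 appears three times with an identical user vector; user 5 appears twice.
user_train = np.array([
    [2.0, 4.0],
    [2.0, 4.0],
    [2.0, 4.0],
    [5.0, 3.0],
    [5.0, 3.0],
])

# Column-wise standard scaling: z = (x - mean) / std, with one mean and std per column.
mean = user_train.mean(axis=0)
std = user_train.std(axis=0)
scaled = (user_train - mean) / std

# Rows that were identical before scaling are still identical afterwards.
rows_user_2 = scaled[:3]
identical_after_scaling = bool(np.all(rows_user_2 == rows_user_2[0]))
print(identical_after_scaling)
```

So scaling alone cannot make one user's rows differ from each other; any difference seen in the scaled table has to come from something else.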

rmwkwok · August 7, 2023, 2:19am

Hello @bhavanamalla,

Yes!

So you are comparing the table before scaling with the table after scaling. There are two things to note:

We have shuffled the dataset (when doing train/test splitting), so the table is no longer ordered by user id.
The user id is also scaled, and rounded off when displayed, so the two 1s in the first and the fourth rows there do not necessarily represent the same user.
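Both effects can be reproduced in a minimal sketch. The user ids, the seed, and the one-decimal rounding below are all made up for illustration and are not taken from the lab:

```python
import numpy as np

# Hypothetical user ids, one per rating row, ordered by user as in the unscaled table.
user_ids = np.array([2, 2, 2, 3, 3, 400, 400], dtype=float)

# Standard-scale the id column: z = (x - mean) / std.
mean, std = user_ids.mean(), user_ids.std()
scaled_ids = (user_ids - mean) / std

# Effect 1: shuffling (as done during a train/test split) reorders the rows,
# so the first rows of the scaled table need not belong to the first user.
rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
perm = rng.permutation(len(user_ids))
shuffled_original = user_ids[perm]
shuffled_scaled = scaled_ids[perm]

# Effect 2: rounding for display can make different users look the same.
# Users 2 and 3 have different scaled ids but the same value at one decimal.
displayed = np.round(scaled_ids, 1)
print(scaled_ids[0], scaled_ids[3])
print(displayed[0], displayed[3])

# Undoing the scaling recovers the real ids of the shuffled rows.
recovered = shuffled_scaled * std + mean
print(np.round(recovered))
```

Inverting the scaling, as in the last step, is one way to confirm which user a displayed row really belongs to.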

Cheers,
Raymond

bhavanamalla · August 7, 2023, 2:55pm

Totally missed that shuffling is performed when splitting the dataset. Thank you again!
