Few doubts in C3_W2_RecSysNN_Assignment

Naren_babu_R · April 15, 2023, 9:31am

Why do we normalize even after applying StandardScaler?
What's the difference between content-based filtering and Similar Items? Both give a recommended movie.
What is the item_vecs variable? How is it different from item_train?


Mujassim_Jamal · April 18, 2023, 4:39pm

We used StandardScaler to standardize the data columns so that each has a mean of 0 and a standard deviation of 1. Additionally, we used the L2 norm to normalize the rows (feature vectors) to unit norm. Each feature vector represents a user’s average rating per genre. For example, a user may have rated only action/thriller movies, which leads to sparse data (many zero entries for genres other than action/thriller) and to vectors on different scales. Applying the L2 norm rescales each feature vector to unit norm, so all vectors fall in a similar range. This puts them on a comparable footing and can improve the model’s performance and generalization ability.
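To make the column-versus-row distinction concrete, here is a minimal plain-Python sketch. It is not the assignment's code: the `ratings` matrix and both helper functions are made up for illustration, and the standardization uses the population standard deviation, which is an assumption about how the scaling is done.

```python
import math
from statistics import mean, pstdev

def standardize_columns(rows):
    """Scale each column to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    # Population standard deviation; fall back to 1.0 for constant columns.
    sigmas = [pstdev(c) or 1.0 for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, mus, sigmas)] for row in rows]

def l2_normalize_rows(rows):
    """Scale each row to unit Euclidean length."""
    out = []
    for row in rows:
        # Fall back to 1.0 so an all-zero row is left unchanged.
        norm = math.sqrt(sum(x * x for x in row)) or 1.0
        out.append([x / norm for x in row])
    return out

# Hypothetical per-genre average ratings (rows are users, columns are genres).
ratings = [
    [4.5, 0.0, 0.0],
    [3.0, 2.0, 4.0],
    [0.0, 5.0, 1.0],
]

scaled = standardize_columns(ratings)  # acts on columns (features)
unit = l2_normalize_rows(scaled)       # acts on rows (one vector per user)
```

The two steps act along different axes: the first makes features comparable with each other, while the second makes whole vectors comparable in length.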

Content-based filtering and Similar Items are both methods for making recommendations. Content-based filtering recommends items based on their features, such as genre or actors. Similar Items recommends items based on the behavior of users who have viewed the current item. For example, if a user is currently viewing a movie, Similar Items might recommend other movies frequently viewed by users who also viewed the current movie. Similar Items uses user behavior data to find patterns and similarities in how users interact with items, rather than focusing on the features of the items themselves.
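The co-viewing idea described in this reply can be sketched as follows. This only illustrates that description and is not taken from the assignment; the viewing histories, movie titles, and the `co_viewed` helper are all hypothetical.

```python
from collections import Counter

# Hypothetical viewing histories: user id -> set of movies watched.
histories = {
    "u1": {"Alien", "Heat", "Ronin"},
    "u2": {"Alien", "Heat"},
    "u3": {"Alien", "Up"},
    "u4": {"Up", "Coco"},
}

def co_viewed(current, histories, top_n=2):
    """Rank other movies by how many viewers of `current` also watched them."""
    counts = Counter()
    for watched in histories.values():
        if current in watched:
            # Count every other movie this viewer watched.
            counts.update(watched - {current})
    return counts.most_common(top_n)

top = co_viewed("Alien", histories, top_n=1)
```

Here "Heat" ranks first because two of the three "Alien" viewers also watched it. No item features such as genre are consulted, which is the contrast with content-based filtering drawn above.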

item_vecs and item_train both contain movie metadata. item_train is fed into the model during training, while item_vecs is used later, at prediction time, when the user vector is generated and replicated to match it so that a prediction can be made for each movie.
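As a rough sketch of that prediction-time step, the snippet below repeats one user vector once per row of an item matrix so the two can be paired row by row. The sizes, values, and the `replicate_user_vec` helper are hypothetical and much smaller than the real arrays; the model call itself is left out.

```python
# Hypothetical data: 4 candidate movies with 3 features each,
# and a single user described by 2 features.
item_vecs = [
    [0.1, 0.9, 0.0],
    [0.7, 0.2, 0.1],
    [0.0, 0.5, 0.5],
    [0.3, 0.3, 0.4],
]
user_vec = [4.0, 2.5]

def replicate_user_vec(user_vec, num_items):
    """Return one independent copy of the user's vector per candidate item."""
    return [list(user_vec) for _ in range(num_items)]

user_vecs = replicate_user_vec(user_vec, len(item_vecs))

# Each (user row, item row) pair would be one input to a trained model,
# giving one predicted rating per candidate movie.
pairs = list(zip(user_vecs, item_vecs))
```

Pairing the same user with every row is what lets a single batch prediction score all candidate movies for that user.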

I hope this resolves your queries.
Best,
Mujassim

bhavanamalla · August 6, 2023, 2:02pm

@Mujassim_Jamal - Can you please elaborate on this?

item_train is of shape (46549, 17), whereas item_vecs is of shape (1883, 17). Why do we need a separate item_vecs when item_train already contains the item metadata? I understand that item_train is used for training, but why is item_vecs needed at all? And why does it have only 1883 rows?

Mujassim_Jamal · August 27, 2023, 4:51am

Hi @bhavanamalla,

Sorry for my late reply. I was very busy due to my final year examinations.

item_vecs, which has 1883 rows in total, is taken from both the training and test sets. I believe this sample is used to predict how both new and existing users might rate each movie/item in item_vecs.

Furthermore, I am unsure about the reason for having precisely 1883 rows.
