A few doubts about C3_W2_RecSysNN_Assignment

  1. Why do we normalize even after applying StandardScaler?

  2. What's the difference between content-based filtering and Similar Items?
    Both give recommended movies.

  3. What is the item_vecs variable? How is it different from item_train?

Hi @Naren_babu_R,

We used StandardScaler to standardize the data columns so that each has a mean of 0 and a standard deviation of 1. In addition, we used the L2 norm to normalize the rows (feature vectors) so that each has unit norm. Each feature vector represents a user’s average rating per genre. For example, a user may have rated only action/thriller movies, which leads to sparse data (many zero entries for the other genres) or to vectors on very different scales. Applying the L2 norm rescales every feature vector to unit length, so all vectors fall in a similar range. This helps give the features comparable weight and tends to improve the model’s performance and generalization ability.
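To make the column-versus-row distinction concrete, here is a minimal sketch in plain NumPy. The toy matrix `X` and all variable names are made up for illustration and are not taken from the assignment; the column step mirrors what StandardScaler does by default.

```python
import numpy as np

# Hypothetical toy data: 3 users x 3 genres (average rating per genre).
X = np.array([
    [4.5, 0.0, 1.0],
    [0.0, 3.0, 2.0],
    [5.0, 1.0, 0.0],
])

# Column-wise standardization (the default StandardScaler behaviour):
# every column ends up with mean 0 and standard deviation 1.
col_mean = X.mean(axis=0)
col_std = X.std(axis=0)
X_std = (X - col_mean) / col_std

# Row-wise L2 normalization: every row vector ends up with length 1.
row_norm = np.linalg.norm(X_std, axis=1, keepdims=True)
X_unit = X_std / row_norm

print(X_std.mean(axis=0))              # approximately 0 per column
print(np.linalg.norm(X_unit, axis=1))  # approximately 1 per row
```

The two steps act on different axes, which is why one does not make the other redundant: standardizing puts the columns on a common scale, while the L2 step fixes the length of each row.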

Content-based filtering and Similar Items are both methods for making recommendations. Content-based filtering recommends items based on their features, such as genre or actors. Similar Items recommends items based on the behavior of users who have viewed the current item. For example, if a user is currently viewing a movie, Similar Items might recommend other movies frequently viewed by users who also viewed the current movie. Similar Items uses user behavior data to find patterns and similarities in how users interact with items, rather than focusing on the features of the items themselves.
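The contrast described above can be sketched with two tiny similarity matrices. Everything here is hypothetical (the features, the view log, and the helper `most_similar`); it only illustrates feature-based versus behaviour-based similarity and is not the assignment's code.

```python
import numpy as np

# Hypothetical item features: rows = movies, columns = (action, comedy, drama).
item_features = np.array([
    [1.0, 0.0, 0.0],  # movie 0
    [1.0, 0.0, 1.0],  # movie 1
    [0.0, 1.0, 0.0],  # movie 2
])

# Hypothetical view log: rows = users, columns = movies (1 = viewed).
views = np.array([
    [1, 0, 1],
    [1, 0, 1],
    [0, 1, 0],
])

def most_similar(sim, item):
    """Return the index of the item most similar to `item`, excluding itself."""
    s = sim.astype(float)
    s[item, item] = -np.inf
    return int(np.argmax(s[item]))

# Feature-based: cosine similarity between item feature vectors.
unit = item_features / np.linalg.norm(item_features, axis=1, keepdims=True)
feature_sim = unit @ unit.T

# Behaviour-based: how many users viewed both movies.
co_views = views.T @ views

by_features = most_similar(feature_sim, 0)
by_behaviour = most_similar(co_views, 0)
print(by_features, by_behaviour)
```

With this toy data the two approaches disagree for movie 0: the features point to movie 1 (shared action genre), while the view log points to movie 2 (same viewers).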

item_vecs and item_train both contain movie metadata. item_train is fed into the model during training, while item_vecs is used later, when making predictions: the user vector is replicated to match the rows of item_vecs so that the model can score every movie for that user.
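A rough sketch of that prediction step, assuming item_vecs holds one row per movie. The arrays are tiny made-up stand-ins, and `toy_predict` is a hypothetical placeholder for the trained model, not the assignment's network.

```python
import numpy as np

# Hypothetical stand-in for item_vecs: 4 movies x 2 features.
item_vecs = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
    [0.5, 0.0],
])

# One user's feature vector (1 x 2).
user_vec = np.array([[4.0, 1.5]])

# Replicate the user vector once per movie so the rows line up pairwise.
user_vecs = np.tile(user_vec, (len(item_vecs), 1))

def toy_predict(user_rows, item_rows):
    """Hypothetical scorer: row-wise dot product in place of a trained model."""
    return np.sum(user_rows * item_rows, axis=1)

scores = toy_predict(user_vecs, item_vecs)
ranking = np.argsort(-scores)  # movie indices, best score first
print(user_vecs.shape, ranking)
```

The replication exists only so that each (user, movie) pair becomes one input row; the training set, by contrast, already comes as such pairs, which would explain why it can have many more rows than there are movies.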

I hope this resolves your queries.

@Mujassim_Jamal - Can you please elaborate on this?

item_train has shape (46549, 17), whereas item_vecs has shape (1883, 17). What’s the need for a separate item_vecs when we already have item_train, which also contains item metadata? I understand that item_train is used for training, but why is item_vecs needed at all? And why does it have only 1883 rows?

Hi @bhavanamalla,

Sorry for my late reply. I was very busy due to my final year examinations.

The item_vecs array, which has 1883 rows in total, is drawn from both the training and test sets. I believe this set of vectors is used to predict how both new and existing users might rate each movie/item in item_vecs.

Furthermore, I am unsure about the reason for having precisely 1883 rows.