C3_W2 Content-based filtering assignment - Predictions for new user way off in exercise 5.1

trandromeda · January 27, 2024, 11:42pm

I have noticed that the model’s predictions for a new user are quite suspicious and off. Given these features for example:

new_user_id = 5001
new_rating_ave = 0.0
new_action = 0.0
new_adventure = 0.0
new_animation = 5.0
new_childrens = 5.0
new_comedy = 5.0
new_crime = 0.0
new_documentary = 0.0
new_drama = 0.0
new_fantasy = 0.0
new_horror = 0.0
new_mystery = 0.0
new_romance = 0.0
new_scifi = 0.0
new_thriller = 0.0
new_rating_count = 3

user_vec = np.array([[new_user_id, new_rating_count, new_rating_ave,
                      new_action, new_adventure, new_animation, new_childrens,
                      new_comedy, new_crime, new_documentary,
                      new_drama, new_fantasy, new_horror, new_mystery,
                      new_romance, new_scifi, new_thriller]])

Produces these predictions:

In fact, I’ve tried several combinations of ratings and the documentary films are always at the top of the list, the Fog of War being #1 no matter what I do.

Is this to be expected, or am I perhaps misunderstanding the premise of this exercise?

TMosh · January 28, 2024, 1:40am

The previous instance of this course (prior to DLAI) had a similar quirk. It almost always recommended that everyone watch a movie called “Santa with Muscles”.

The reason was that this movie had only one rating, but it was highly ranked for all categories. The recommender algorithm had no requirement about how many users had reviewed a movie before it would be recommended to others.
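One common way to guard against this kind of quirk is to require a minimum number of ratings before an item is eligible for recommendation. The following is a minimal sketch of that idea, not something taken from the assignment: the function name `top_k_with_min_ratings`, the threshold of 20, and the sample data are all hypothetical.

```python
import numpy as np

def top_k_with_min_ratings(predicted, rating_counts, min_ratings=20, k=3):
    """Return indices of the k highest predictions among items
    that have at least `min_ratings` ratings (hypothetical helper)."""
    predicted = np.asarray(predicted, dtype=float)
    rating_counts = np.asarray(rating_counts)
    # Keep only items that enough users have rated
    eligible = np.flatnonzero(rating_counts >= min_ratings)
    # Sort the eligible items by predicted rating, highest first
    order = eligible[np.argsort(-predicted[eligible], kind="stable")]
    return order[:k]

# Made-up data: item 0 has one very high rating and would otherwise rank first
predicted = [5.0, 4.2, 3.9, 4.5]
rating_counts = [1, 150, 80, 40]
print(top_k_with_min_ratings(predicted, rating_counts, min_ratings=20, k=2))
```

With the threshold in place, item 0 is excluded even though it has the highest predicted rating, so the list starts with items 3 and 1 instead.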

I have not investigated this specific issue you are discussing, but I am pretty comfortable with this being a quirk of the dataset.

I will look into this soon if I find some time.

trandromeda · January 28, 2024, 3:10am

Thank you, I appreciate the quick reply and clarification! That is an interesting conundrum. I’m curious to understand what’s going on, but at least I can rest easy now knowing it wasn’t just me.

trandromeda · January 29, 2024, 1:52am

Hi @TMosh, I think the problem I observed earlier was a red herring. I should’ve specified that when I ran the example above, it was in a notebook running locally on my machine (MacBook Air with M2 chip) using TensorFlow 2.15.0. I was also using tf.keras.optimizers.legacy.Adam as the optimizer, and the model’s evaluated loss was 0.1253.

I re-ran the examples in Coursera’s environment and the model’s loss was much lower, 0.0815, and the predictions were much more accurate as well. Those documentaries (e.g. Fog of War) were no longer showing up at the top of the rankings.

I don’t know exactly which differences in packages and hardware caused such different results in each environment, but it doesn’t seem to be due to the dataset after all (I couldn’t spot anything quirky about it after looking through the data).

TMosh · January 29, 2024, 5:38am

Keras makes lots of changes that are not backward-compatible. Sometimes they even re-scale the cost values using a different algorithm, or they change which options are used as the defaults.

It makes swapping versions extremely problematic.

That’s why the versions used by the notebooks are fixed to a specific set hosted by Coursera.
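When comparing a local run against the hosted one, it can help to record which versions are actually installed in each environment. Here is a small sketch that uses only the standard library; the helper name `environment_report` and the package list are assumptions, and the installed distribution name for TensorFlow may differ on some platforms.

```python
import platform
import sys
from importlib import metadata

def environment_report(packages=("tensorflow", "keras", "numpy")):
    """Collect the Python version, CPU architecture, and installed
    versions of the given packages (None if a package is not installed)."""
    report = {"python": sys.version.split()[0], "machine": platform.machine()}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed under this distribution name
            report[name] = None
    return report

for key, value in environment_report().items():
    print(f"{key}: {value}")
```

Running this in both environments and comparing the two outputs gives a starting point for spotting version mismatches. It does not by itself explain a difference in loss.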
