C3_W2 Content-based filtering assignment - Predictions for new user way off in exercise 5.1

I've noticed that the model's predictions for a new user look quite suspect. For example, given these features:

new_user_id = 5001
new_rating_ave = 0.0
new_action = 0.0
new_adventure = 0.0
new_animation = 5.0
new_childrens = 5.0
new_comedy = 5.0
new_crime = 0.0
new_documentary = 0.0
new_drama = 0.0
new_fantasy = 0.0
new_horror = 0.0
new_mystery = 0.0
new_romance = 0.0
new_scifi = 0.0
new_thriller = 0.0
new_rating_count = 3

import numpy as np

user_vec = np.array([[new_user_id, new_rating_count, new_rating_ave,
                      new_action, new_adventure, new_animation, new_childrens,
                      new_comedy, new_crime, new_documentary,
                      new_drama, new_fantasy, new_horror, new_mystery,
                      new_romance, new_scifi, new_thriller]])
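For context, the notebook then pairs this single user vector with every candidate movie vector before calling the model. A minimal NumPy sketch of that step (the names `num_items` and `user_vecs`, and the item count, are illustrative, not the assignment's exact API):

```python
import numpy as np

# The 17-element user vector from the post above: id, rating count,
# rating average, then 14 genre ratings.
user_vec = np.array([[5001, 3, 0.0,
                      0.0, 0.0, 5.0, 5.0, 5.0, 0.0, 0.0,
                      0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]])

# Tile one copy of the user vector per candidate movie so the user and
# item matrices line up row-for-row at prediction time.
num_items = 4  # illustrative; in the assignment this is the movie count
user_vecs = np.tile(user_vec, (num_items, 1))
print(user_vecs.shape)  # (4, 17)
```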

Produces these predictions:

In fact, I’ve tried several combinations of ratings, and documentary films are always at the top of the list, with The Fog of War at #1 no matter what I do.

Is this to be expected, or am I misunderstanding the premise of this exercise?

The previous instance of this course (prior to DLAI) had a similar quirk. It almost always recommended that everyone watch a movie called “Santa with Muscles”.

The reason was that this movie had only one rating, but that rating was high across all categories. The recommender algorithm imposed no minimum on how many users had to review a movie before it could be recommended to others.

I have not investigated this specific issue you are discussing, but I am pretty comfortable with this being a quirk of the dataset.

I will look into this soon if I find some time.

Thank you, I appreciate the quick reply and clarification! That is an interesting conundrum. I’m curious to understand what’s going on, but at least I can rest easy now knowing it wasn’t just me.

Hi @TMosh, I think the problem I observed earlier was a red herring - I should’ve specified that when I ran the example above, it was in a notebook running locally on my machine (MacBook Air with M2 chip) using TensorFlow 2.15.0. I was also using tf.keras.optimizers.legacy.Adam for the optimizer, and the model’s evaluated loss was 0.1253.

I re-ran the examples in Coursera’s environment and the model’s loss was much lower, 0.0815, and the predictions were much more accurate as well. Those documentaries (e.g. The Fog of War) no longer showed up at the top of the rankings.

I don’t know exactly which difference in packages or hardware caused such divergent results, but it doesn’t seem to be the dataset after all (I couldn’t spot anything quirky after looking through the data).

Keras makes lots of changes that are not backward-compatible. Sometimes they even re-scale the cost values using a different algorithm, or they change which options are used as the defaults.

It makes swapping versions extremely problematic.

That’s why the versions used by the notebooks are fixed to a specific set hosted by Coursera.
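For anyone running the notebook locally, pinning exact versions in a requirements file is the usual way to approximate the hosted environment. The versions below are illustrative only (the 2.15.0 figure comes from the local run mentioned above; Coursera’s pinned versions may differ):

```
tensorflow==2.15.0
numpy==1.26.4
```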