At @7:30 in this video, it talks about ranking movie similarities using the following formula:
|| V(k)_m - V(i)_m ||^2
where
V(k)_m - V(i)_m results in another vector, let's say V_d, whose L2 norm (length) represents how far apart the two vectors we started with are, both in terms of magnitude and angle.
|| V_d || I assume is just notation for taking the L2 norm of V_d, which in expanded form is sqrt(V_d1^2 + V_d2^2 + ... + V_dn^2).
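For concreteness, here is a minimal numpy sketch of that expanded form; the movie feature vectors are made up for illustration:

```python
import numpy as np

# Hypothetical feature vectors for two movies (values invented for illustration)
v_k = np.array([0.9, 0.1, 0.4])  # movie k
v_i = np.array([0.8, 0.2, 0.5])  # movie i

v_d = v_k - v_i                        # the difference vector V_d
norm_vd = np.sqrt(np.sum(v_d ** 2))    # expanded form: sqrt(V_d1^2 + ... + V_dn^2)
assert np.isclose(norm_vd, np.linalg.norm(v_d))  # matches the built-in L2 norm

squared_distance = np.sum(v_d ** 2)    # || V_d ||^2, the quantity the formula ranks by
print(squared_distance)
```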
Question:
Q1: Why do we square || V_d ||? Is that because sqrt can be both positive and negative, so we eliminate the negative option by squaring?
Normally you would use the L2 norm because it's more punitive, i.e., more sensitive, than the L1 norm: even a modest difference (>1) between the two vectors produces a large penalty when squared, while if they are close to each other, with differences in the range [0…1], the penalty is actually reduced by the power of 2!
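As a quick illustration (my own toy numbers, not from the course), squaring amplifies component differences above 1 and damps those below 1 relative to the L1 penalty:

```python
import numpy as np

# Compare the L1 penalty to the squared L2 penalty for small vs. large differences.
for diff in [np.array([0.1, 0.2]), np.array([2.0, 3.0])]:
    l1 = np.sum(np.abs(diff))      # L1 penalty: sum of absolute differences
    l2_sq = np.sum(diff ** 2)      # squared L2 penalty: sum of squared differences
    print(f"diff={diff}, L1={l1:.2f}, squared L2={l2_sq:.2f}")

# diff=[0.1 0.2]: L1=0.30, squared L2=0.05  -> small differences are damped
# diff=[2. 3.]:   L1=5.00, squared L2=13.00 -> large differences are amplified
```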
@alexshagiev I will defer to @gent.spah because I have not taken this exact class here. But as I recall, SVD will basically implode your sparse matrix… and I don't recall us having to deal with any norm there.
If I recall correctly, this exercise uses the norm squared in order to avoid computing the square root. It’s a notational trick that is intended to lead you to just computing the sum of the squares of the differences.
Since the square root is monotonically increasing, it will not change the relative comparison between different movies. So computing the square root would consume some calculation resources but add no value.
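A minimal numpy sketch (with made-up feature vectors) of that point: sorting by the squared distance gives exactly the same ranking as sorting by the true distance, so the sqrt can be skipped.

```python
import numpy as np

rng = np.random.default_rng(0)
query = rng.random(4)          # feature vector of the query movie (assumed data)
movies = rng.random((5, 4))    # feature vectors of 5 candidate movies (assumed data)

sq_dists = np.sum((movies - query) ** 2, axis=1)  # sum of squared differences, no sqrt
dists = np.sqrt(sq_dists)                         # true L2 distances

# Both orderings are identical, so the sqrt adds cost without changing the ranking.
assert np.array_equal(np.argsort(sq_dists), np.argsort(dists))
```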