Why is Cosine Similarity calculated V2.V1_T and not V1.V2_T in C3W3_Modified_Triplet_Loss?

When calculating cosine similarity between matrices V1 and V2, your implementation takes the V2.V1_T approach, while it might appear more natural to take the V1.V2_T approach. Since these two approaches yield transposed results, can you explain why V2.V1_T is the correct option to use?
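To make the question concrete, here is a minimal NumPy sketch (my own illustration, not from the assignment notebook) showing that the two orderings produce transposed score matrices:

```python
import numpy as np

# Two batches of 4 vectors of dimension 3, L2-normalized so that
# dot products are cosine similarities.
rng = np.random.default_rng(0)
v1 = rng.normal(size=(4, 3))
v2 = rng.normal(size=(4, 3))
v1 /= np.linalg.norm(v1, axis=1, keepdims=True)
v2 /= np.linalg.norm(v2, axis=1, keepdims=True)

scores_a = np.dot(v2, v1.T)  # scores_a[i, j] = cos(v2[i], v1[j])
scores_b = np.dot(v1, v2.T)  # scores_b[i, j] = cos(v1[i], v2[j])

# The two matrices are transposes of each other, not equal.
print(np.allclose(scores_a, scores_b.T))  # True
print(np.allclose(scores_a, scores_b))    # False in general
```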

Can you say why you feel either approach is more correct than the other?

Good question. V2 is being compared against V1, and hence the implementation. Is that right?

Mathematically, does it matter which is compared against which? The computation is a comparison; I don't think the order is significant.

Theoretically, can the model be trained with loss calculated with either approach? Yes.

For the purpose of scoring in the assignment, the loss calculations based on the similarity matrices from those two approaches are not the same. Swapping v1 and v2 in the notebook produces different results, as the sketch below illustrates. So why force the v2.v1_T approach? Are there any dependencies?
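Here is a minimal sketch of a row-wise modified triplet loss in the spirit of the assignment (the function name, the margin value, and the exact reductions are my assumptions, not the graded code). The diagonals of the two score matrices agree, but the row-wise mean/max reductions run over different entries after a transpose, so the losses generally differ:

```python
import numpy as np

def modified_triplet_loss(scores, margin=0.25):
    """Row-wise modified triplet loss: diagonal entries are the positives;
    each row's off-diagonal entries are that anchor's negatives."""
    batch = scores.shape[0]
    positive = np.diagonal(scores)
    off_diag = scores * (1.0 - np.eye(batch))
    mean_negative = off_diag.sum(axis=1) / (batch - 1)
    # Closest negative: largest off-diagonal entry in each row.
    closest_negative = np.where(np.eye(batch, dtype=bool), -np.inf, scores).max(axis=1)
    loss1 = np.maximum(0.0, margin - positive + mean_negative)
    loss2 = np.maximum(0.0, margin - positive + closest_negative)
    return np.mean(loss1 + loss2)

rng = np.random.default_rng(1)
v1 = rng.normal(size=(4, 3))
v2 = rng.normal(size=(4, 3))
v1 /= np.linalg.norm(v1, axis=1, keepdims=True)
v2 /= np.linalg.norm(v2, axis=1, keepdims=True)

# Same diagonal (positives), different rows (negatives): different losses.
print(modified_triplet_loss(v2 @ v1.T))
print(modified_triplet_loss(v1 @ v2.T))
```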

The grader only checks for one of the results. The selection was rather arbitrary by the course designers.

Thanks for the clarification.

hi @rocki

The reason v2 is mentioned before v1 is that the calculation is column-wise; this is explained in the assignment notebook. I am sharing a screenshot which illustrates it.

Also, for a better computational understanding, I am sharing a comment by mentor @arvyzukai, who explained the step-by-step calculation; it should help you understand it more thoroughly.

arvyzukai’s comment

Regards
DP

Hi @Deepti_Prasad,

It seems the only reason the implementation took the V_2.V_1^T approach was to compare a vector from V_2 against the vectors in V_1, which is also confirmed in the text: "For example, consider row 2 in the score matrix. This row has the cosine similarity between V_2[2] and all four vectors in V_1."
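A quick sketch checking that quoted sentence (my own illustration; the batch size of four matches the example in the text):

```python
import numpy as np

rng = np.random.default_rng(2)
v1 = rng.normal(size=(4, 3))
v2 = rng.normal(size=(4, 3))
v1 /= np.linalg.norm(v1, axis=1, keepdims=True)
v2 /= np.linalg.norm(v2, axis=1, keepdims=True)

scores = v2 @ v1.T

# Row 2 of the score matrix holds cos(V_2[2], V_1[j]) for every j,
# i.e. one vector from V_2 compared against all four vectors in V_1.
row2 = np.array([np.dot(v2[2], v1[j]) for j in range(4)])
print(np.allclose(scores[2], row2))  # True
```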

The referenced material does not address the question of why V_1.V_2^T could not be used instead.

I agree with @Tmosh that the choice seems to be an arbitrary design decision about what is compared against what.

hi @rocki

I didn't disagree with the other mentor; I was only providing information from the assignment about why v2.v1_T was used.

Regards
DP

Agree with you, @rocki, because the dot product is commutative for vectors. However, if V_1 and V_2 are batches of normalized vectors (matrices), say A = V_2V_1^T, then A^T = (V_2V_1^T)^T = (V_1^T)^T V_2^T = V_1V_2^T. In general A \neq A^T for an arbitrary square matrix A; equality holds only when A is symmetric. Therefore, V_2V_1^T = (V_1V_2^T)^T \neq V_1V_2^T in general. The choice of V_2V_1^T over V_1V_2^T may be for consistency of matrix notation and dimensionality matching between the ‘Two Vectors’ case and the ‘Two Batches of Vectors’ case.
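A quick numeric illustration of that "in general" (my own sketch, not from the notebook): the two products coincide exactly when the matrix happens to be symmetric, for example when V_1 = V_2:

```python
import numpy as np

rng = np.random.default_rng(3)
v1 = rng.normal(size=(4, 3))
v2 = rng.normal(size=(4, 3))
v1 /= np.linalg.norm(v1, axis=1, keepdims=True)
v2 /= np.linalg.norm(v2, axis=1, keepdims=True)

# Distinct batches: the product is generally not symmetric.
print(np.allclose(v2 @ v1.T, v1 @ v2.T))  # False

# Identical batches: A = V V^T is symmetric, so A = A^T.
print(np.allclose(v1 @ v1.T, (v1 @ v1.T).T))  # True
```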

Appreciate your inputs - @Deepti_Prasad and @SNaveenMathew