Why is Cosine Similarity calculated V2.V1_T and not V1.V2_T in C3W3_Modified_Triplet_Loss?

When calculating cosine similarity between matrices V1 and V2, your implementation takes the V2.V1_T approach, while it might appear more natural to take the V1.V2_T approach. Since these two approaches yield transposed results, can you explain why V2.V1_T is the correct option to use?
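To make the question concrete, here is a minimal NumPy sketch (my own illustration, not from the assignment notebook) showing that the two orderings produce transposed score matrices:

```python
import numpy as np

# Two batches of 4 vectors of dimension 3, L2-normalized so that
# dot products are cosine similarities.
rng = np.random.default_rng(0)
v1 = rng.normal(size=(4, 3))
v2 = rng.normal(size=(4, 3))
v1 /= np.linalg.norm(v1, axis=1, keepdims=True)
v2 /= np.linalg.norm(v2, axis=1, keepdims=True)

scores_a = np.dot(v2, v1.T)  # scores_a[i, j] = cos(v2[i], v1[j])
scores_b = np.dot(v1, v2.T)  # scores_b[i, j] = cos(v1[i], v2[j])

# The two matrices are transposes of each other, not equal.
print(np.allclose(scores_a, scores_b.T))  # True
print(np.allclose(scores_a, scores_b))    # False in general
```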

Can you say why you feel either approach is more correct than the other?

Good question. V2 is being compared against V1, and hence the implementation. Is that right?

Mathematically, does it matter which is compared against which? The computation is a comparison; I don't think the order is significant.

Theoretically, can the model be trained with loss calculated with either approach? Yes.

For the purpose of scoring in the assignment, the loss calculations based on the similarity matrices from those two approaches are not the same. Swapping v1 and v2 in the notebook produces different results, as the sketch below illustrates. So why force the v2.v1_T approach? Are there any dependencies?
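Here is a minimal sketch of a row-wise modified triplet loss in the spirit of the assignment (the function name, the margin value, and the exact reductions are my assumptions, not the graded code). The diagonals of the two score matrices agree, but the row-wise mean/max reductions run over different entries after a transpose, so the losses generally differ:

```python
import numpy as np

def modified_triplet_loss(scores, margin=0.25):
    """Row-wise modified triplet loss: diagonal entries are the positives;
    each row's off-diagonal entries are that anchor's negatives."""
    batch = scores.shape[0]
    positive = np.diagonal(scores)
    off_diag = scores * (1.0 - np.eye(batch))
    mean_negative = off_diag.sum(axis=1) / (batch - 1)
    # Closest negative: largest off-diagonal entry in each row.
    closest_negative = np.where(np.eye(batch, dtype=bool), -np.inf, scores).max(axis=1)
    loss1 = np.maximum(0.0, margin - positive + mean_negative)
    loss2 = np.maximum(0.0, margin - positive + closest_negative)
    return np.mean(loss1 + loss2)

rng = np.random.default_rng(1)
v1 = rng.normal(size=(4, 3))
v2 = rng.normal(size=(4, 3))
v1 /= np.linalg.norm(v1, axis=1, keepdims=True)
v2 /= np.linalg.norm(v2, axis=1, keepdims=True)

# Same diagonal (positives), different rows (negatives): different losses.
print(modified_triplet_loss(v2 @ v1.T))
print(modified_triplet_loss(v1 @ v2.T))
```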

The grader only checks for one of the results. The selection was rather arbitrary by the course designers.

Thanks for the clarification.

hi @rocki

The reason v2 is mentioned before v1 is that the calculation is column-wise; this is explained in the assignment notebook. I am sharing a screenshot which illustrates it.

Also, for a better computational understanding, I am sharing a comment by mentor @arvyzukai, who explained the step-by-step calculation; it should help you understand it more thoroughly.

arvyzukai’s comment

Regards
DP

Hi @Deepti_Prasad,

It seems the only reason the implementation took the V_2.V_1^T approach was to compare a vector from V_2 against the vectors in V_1, which is also confirmed in the text: "For example, consider row 2 in the score matrix. This row has the cosine similarity between V_2[2] and all four vectors in V_1."
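A quick sketch checking that quoted sentence (my own illustration; the batch size of four matches the example in the text):

```python
import numpy as np

rng = np.random.default_rng(2)
v1 = rng.normal(size=(4, 3))
v2 = rng.normal(size=(4, 3))
v1 /= np.linalg.norm(v1, axis=1, keepdims=True)
v2 /= np.linalg.norm(v2, axis=1, keepdims=True)

scores = v2 @ v1.T

# Row 2 of the score matrix holds cos(V_2[2], V_1[j]) for every j,
# i.e. one vector from V_2 compared against all four vectors in V_1.
row2 = np.array([np.dot(v2[2], v1[j]) for j in range(4)])
print(np.allclose(scores[2], row2))  # True
```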

The referenced material does not address the question of why V_1.V_2^T could not be used instead.

I agree with @Tmosh that the choice seems to be an arbitrary design decision about what is compared against what.

hi @rocki

I didn't disagree with the other mentor; I was only providing information from the assignment about why v2.v1_T was used.

Regards
DP

Agree with you, @rocki, because the dot product is commutative for vectors. However, if V_1 and V_2 are batches of normalized vectors (matrices), say A = V_2V_1^T, then A^T = (V_2V_1^T)^T = (V_1^T)^T V_2^T = V_1V_2^T. In general A \neq A^T for an arbitrary square matrix A; equality holds only when A is symmetric. Therefore, V_2V_1^T = (V_1V_2^T)^T \neq V_1V_2^T in general. The choice of V_2V_1^T over V_1V_2^T may be for consistency of matrix notation and dimensionality matching between the ‘Two Vectors’ case and the ‘Two Batches of Vectors’ case.
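A quick numeric illustration of that "in general" (my own sketch, not from the notebook): the two products coincide exactly when the matrix happens to be symmetric, for example when V_1 = V_2:

```python
import numpy as np

rng = np.random.default_rng(3)
v1 = rng.normal(size=(4, 3))
v2 = rng.normal(size=(4, 3))
v1 /= np.linalg.norm(v1, axis=1, keepdims=True)
v2 /= np.linalg.norm(v2, axis=1, keepdims=True)

# Distinct batches: the product is generally not symmetric.
print(np.allclose(v2 @ v1.T, v1 @ v2.T))  # False

# Identical batches: A = V V^T is symmetric, so A = A^T.
print(np.allclose(v1 @ v1.T, (v1 @ v1.T).T))  # True
```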

Appreciate your inputs - @Deepti_Prasad and @SNaveenMathew