Use columns of the similarity matrix in Triplet Loss?

misterjonathansands · November 27, 2022, 9:36pm

In the triplet loss formulation we only consider rows of the similarity matrix (to calculate the mean negative and the closest negative loss).
But columns hold the same kind of information, right? Why not use columns too when calculating these two losses?

arvyzukai · December 1, 2022, 7:26am

Hi @misterjonathansands

I’m not sure what you have in mind.

Are you suggesting to use them instead of rows? (Then why? What would be the advantage?)
Or are you suggesting to use the columns with the rows? (Then also how (sum?) and why?)

These are cosine similarity scores. It does not matter if you get them (v1, v2) or (v2, v1) (this would flip the columns vs. rows).

Or did I misunderstand what you are asking?

misterjonathansands · December 1, 2022, 10:21am

I didn’t realize the similarity matrix was symmetric. How exactly is it calculated from the 2 batches, and why is it symmetric? (I thought, if S denotes the similarity matrix, S[i, j] would be the similarity of row i of batch 1 with row j of batch 2, and S[j, i] would be the similarity of row j of batch 1 with row i of batch 2, which are only the same on the diagonal.)

arvyzukai · December 1, 2022, 1:24pm

Ok, to make things concrete… Let’s say the batch size is 256, like in the Assignment. The data_generator returns a tuple of numpy arrays. If the questions happen to have a max_len of 64, the output from data_generator is a tuple (b1, b2), both of the same shape:

np.array(b1).shape
(256, 64)

np.array(b2).shape
(256, 64)

The Siamese model receives this tuple of inputs (b1, b2) and does the following for each strand:

Embedding d_model=128 → (256, 64, 128) →
LSTM → (256, 64, 128) →
tl.Mean(axis=1) → (256, 128) →
tl.Fn(‘Normalize’… → (256, 128) →

So the output of the model is a tuple of (v1, v2):

v1.shape
(256, 128)

v2.T.shape
(128, 256)

When you take the dot product of v1 with v2.T, you get scores of shape (256, 256): the similarities between each of the 256 questions in b1 and each of the 256 questions in b2. As far as I understand, this is where your initial question about columns comes from?

The diagonal values of scores are the “positives” (similarities between duplicate questions, where i = j); every other value is a “negative” (similarity between the row question and another question, where i != j). When we calculate mean_negative, we average the “negative” similarities in each row, so the columns disappear but we get what we want: the mean similarity to every other question that is not the duplicate.
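
To make the shapes and terms above concrete, here is a minimal NumPy sketch. It is not the Assignment's code: random unit vectors stand in for the model outputs, a batch of 4 with d_model of 8 is assumed instead of 256 and 128, and the helper `l2_normalize` is hypothetical. The closest negative here is simply the largest off-diagonal value in each row, which may differ in detail from the Assignment's masking.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, d_model = 4, 8  # small stand-ins for 256 and 128 (assumed)

def l2_normalize(x):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Random stand-ins for the two model outputs, each (batch_size, d_model).
v1 = l2_normalize(rng.normal(size=(batch_size, d_model)))
v2 = l2_normalize(rng.normal(size=(batch_size, d_model)))

# scores[i, j] = cosine similarity of question i in b1 and question j in b2.
scores = v1 @ v2.T                      # (batch_size, batch_size)

# Positives sit on the diagonal (i == j); everything else is a negative.
positive = np.diag(scores)              # (batch_size,)
off_diag = ~np.eye(batch_size, dtype=bool)

# Row-wise statistics: the column axis is reduced away.
mean_negative = np.where(off_diag, scores, 0.0).sum(axis=1) / (batch_size - 1)
closest_negative = np.where(off_diag, scores, -np.inf).max(axis=1)
```

Both `mean_negative` and `closest_negative` have shape (batch_size,): one value per row question in b1.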

misterjonathansands · December 3, 2022, 12:30pm

Let me reformulate.
I understand how the output of the model is calculated.
Say we feed (b1, b2) to the model and (v1, v2) are the outputs.
Each row element in v is an encoding of a question at the same row of b.

To compute the similarity matrix, we take the dot product of v1 and v2.T and as you said, the shape is (batch_size, batch_size). I will denote S the resulting similarity matrix.

S[i, j] is the similarity of question i in b1 and question j in b2.
S[j, i] is the similarity of question j in b1 and question i in b2.

For any y, b1[y] and b2[y] are duplicate questions, and one can expect that, if the model learns correctly, v1[y] and v2[y] will be similar. But that doesn’t mean they are exactly the same.
So S[y1, y2] = dot(v1[y1], v2[y2]) should in the end be similar, but not necessarily equal, to S[y2, y1] = dot(v1[y2], v2[y1]), for any y1 != y2.
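
A quick check of this point, as a sketch with random unit vectors standing in for the model outputs (the sizes and the helper `l2_normalize` are assumptions for illustration): transposing S is the same as swapping the two batches, but S itself is generally not symmetric unless both strands produce the same vectors.

```python
import numpy as np

rng = np.random.default_rng(1)

def l2_normalize(x):
    """Scale each row to unit length."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Two independent sets of encodings, 3 questions with 5 features each.
v1 = l2_normalize(rng.normal(size=(3, 5)))
v2 = l2_normalize(rng.normal(size=(3, 5)))

S = v1 @ v2.T

# Swapping the batches yields the transpose of S.
swapped = v2 @ v1.T
is_transpose = np.allclose(S.T, swapped)

# S equals its own transpose only in special cases, not for these vectors.
is_symmetric = np.allclose(S, S.T)

# Special case: comparing a batch with itself does give a symmetric matrix.
self_sim = v1 @ v1.T
self_is_symmetric = np.allclose(self_sim, self_sim.T)
```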

So now back to the actual question. When we calculate the loss, we compare, for each row i, the positive element S[i, i] with the elements S[i, j], for every j != i. That is, we compare how the similarity computed for the duplicate questions b1[i] and b2[i] fares against the non-duplicates b1[i] and b2[j], for every j != i. (The reference question is b1[i], and we compare it with the non-duplicate questions of b2.)

What I was wondering is why we don’t take the columns into account too. That is, why don’t we also compare how the similarity computed for the duplicates b1[i] and b2[i] fares against the non-duplicates b1[j] and b2[i], for every j != i? (This time the reference question is b2[i], and we compare it with the non-duplicate questions of b1.)

The loss I had in mind would look something like this.

Cost1 = max(
-S[i,i] + (sum over j != i (S[i,j]) + sum over j != i (S[j,i])) / (2 * batch_size - 2) + margin,
0)

Cost2 = max(
( - 2 * S[i,i] + closest neg over j != i (S[i,j]) + closest neg over j != i (S[j,i]) ) / 2 + margin,
0)

Cost = Cost1 + Cost2
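
One possible reading of the costs above, written as a NumPy sketch rather than as the Assignment's TripletLossFn. The function name `symmetric_triplet_loss`, the default margin of 0.25, and averaging the per-question cost over the batch are all assumptions made for illustration; the inputs are expected to be row-normalized with a batch size of at least 2.

```python
import numpy as np

def symmetric_triplet_loss(v1, v2, margin=0.25):
    """Hypothetical loss using both rows and columns of S = v1 @ v2.T."""
    batch_size = v1.shape[0]
    S = v1 @ v2.T
    positive = np.diag(S)
    off_diag = ~np.eye(batch_size, dtype=bool)

    # Mean negative over row i and column i together (2 * batch_size - 2 terms).
    neg = np.where(off_diag, S, 0.0)
    mean_neg = (neg.sum(axis=1) + neg.sum(axis=0)) / (2 * batch_size - 2)

    # Closest (largest) negative along the row and along the column.
    masked = np.where(off_diag, S, -np.inf)
    closest_row = masked.max(axis=1)
    closest_col = masked.max(axis=0)

    cost1 = np.maximum(-positive + mean_neg + margin, 0.0)
    cost2 = np.maximum(
        (-2.0 * positive + closest_row + closest_col) / 2.0 + margin, 0.0
    )
    # Assumption: average the per-question cost over the batch.
    return np.mean(cost1 + cost2)

# Perfectly separated batch: positives are 1, all negatives are 0.
separated_loss = symmetric_triplet_loss(np.eye(4), np.eye(4))

# Degenerate batch: every question maps to the same vector, so every score is 1.
same = np.tile([1.0, 0.0], (4, 1))
collapsed_loss = symmetric_triplet_loss(same, same)
```

In the separated case both costs are clipped to zero; in the collapsed case each cost equals the margin, so the loss is twice the margin.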

arvyzukai · December 5, 2022, 7:24am

I understand your question better now. Essentially, you are asking why we only focus on b_1 instead of treating both batches equally.

This is a good question, and I can only offer my take on it. I think the course creators felt that the current TripletLossFn implementation is complicated enough, and that complicating it further would be hard on learners. (It is already one of the hardest exercises in the course.)

But you’re right to ask: since we have already done most of the work and have the similarity scores, why not also find the closest_negatives and mean_negatives for b2 (the questions on the right side, which have the same meaning but not exactly the same words) and then adjust (sum) the losses accordingly? That would have made more efficient use of the scores.

As an alternative (though still not as efficient as changing the loss calculation), we could have incorporated this behavior in data_generator by randomly switching q_1 with q_2, so that input1 would receive either q_1 or q_2 at random rather than always q_1, and the other question of the pair would go to input2.
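
A rough sketch of that swapping idea, assuming the generator has already collected two aligned lists of questions. The helper `swap_pairs_randomly` is hypothetical and is not part of the Assignment's data_generator.

```python
import random

def swap_pairs_randomly(q1_batch, q2_batch, rng=random):
    """Randomly decide, per pair, which question goes to which input.

    Each duplicate pair stays aligned at the same index; only its
    left/right order may change.
    """
    input1, input2 = [], []
    for q1, q2 in zip(q1_batch, q2_batch):
        if rng.random() < 0.5:   # swap roughly half of the pairs
            q1, q2 = q2, q1
        input1.append(q1)
        input2.append(q2)
    return input1, input2

# Usage with a seeded generator for reproducibility.
left, right = swap_pairs_randomly(
    ["a1", "b1", "c1"], ["a2", "b2", "c2"], rng=random.Random(0)
)
```

Over many batches, each question of a pair then serves as the row reference about half of the time.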
