In week 4, the similarity between vectors at the output of the Siamese network was defined as s(v_1, v_2) = cos(v_1, v_2). In addition, it was stated that the similarity with the positive example should be trained to be 1, and the similarity with the negative example should be trained to be -1.
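For concreteness, here is a minimal sketch of that similarity in plain NumPy (my own illustration, not the course's implementation):

```python
import numpy as np

def s(v1, v2):
    # Cosine similarity: the inner product of the L2-normalized vectors.
    return np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
```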
Given these definitions: if we take all possible triplets from the database, doesn't this converge all of the output vectors of this model onto the same axis? Meaning there would be only two possible vectors at the outputs of this model, and they would be the negative versions of each other.
This doesn't make sense, as it divides the world of possible sentences into two groups: all the sentences in the first group share one meaning, which is different from the meaning shared by all the sentences in the second group.
Something is not right here. How is this logical?
Thanks
Roee
Hi Roee,
I am not sure I fully understand your question, but maybe this clarifies:
The Siamese network is used to determine similarity and dissimilarity. In theory this is a single axis. So ideally, similar vectors should point in the exact same (positive) direction, while dissimilar vectors should point in the exact opposite (negative) direction along the same axis. It is not about determining meaning in the output; it's only about similarity and dissimilarity.
I'll try to explain my question with the following example:
Suppose we have 6 sentences s_1, s_2, s_3, s_4, s_5, s_6 such that s_1 and s_2 are similar, s_3 and s_4 are similar, and s_5 and s_6 are similar. Suppose the Siamese network component is f(x) such that f(s_i) = v_i.
If the model has converged to its optimal solution, then we have v_i \cdot v_j^T = 1 for every similar (consecutive) pair and v_i \cdot v_j^T = -1 for all the others. Because v_1 \cdot v_2^T = 1 and all vectors are normalized, it is safe to say that v_1 = v_2 (the equality case of the Cauchy-Schwarz inequality). By the same reasoning, v_i = -v_j for all non-consecutive vectors.
So in the optimal state we have v_1 = -v_3 and v_3 = -v_5, therefore v_1 = v_5. But v_1 and v_5 are not consecutive and therefore should be the negatives of each other, not equal. So this state of convergence is mathematically impossible, and I suspect it is unstable for such systems: the optimizer will keep flipping the vectors' values.
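A quick numeric check of this contradiction (a minimal NumPy sketch; the particular unit vector is arbitrary):

```python
import numpy as np

def cos_sim(a, b):
    # Cosine similarity of two vectors.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

v1 = np.array([0.6, 0.8])  # an arbitrary unit vector
v3 = -v1                   # forced by the demand cos(v1, v3) = -1
v5 = -v3                   # forced by the demand cos(v3, v5) = -1

print(cos_sim(v1, v3))  # -1.0, as demanded
print(cos_sim(v3, v5))  # -1.0, as demanded
print(cos_sim(v1, v5))  #  1.0, but the objective demands -1: contradiction
```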
Of course, this method eventually works, so I think something is helping it.
Following the content of the course, I think two things help in this case:
- The operations on the batch somewhat reduce this effect (having the contradicting constraints appear together in a batch may reduce the instability).
- The fact that we are not using this similarity measure as is, but applying a ReLU with an offset (margin) that is less than 1 to it, so we don't really demand convergence to the extreme points (see the loss sketch below).
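For reference, a minimal sketch of such a margin-based triplet loss on cosine similarities (the margin value of 0.25 is my assumption for illustration, not necessarily the course's exact setting):

```python
import numpy as np

def triplet_loss(v_a, v_p, v_n, margin=0.25):
    # Hinge-style triplet loss on cosine similarities, for L2-normalized
    # anchor (v_a), positive (v_p), and negative (v_n) vectors.
    s_pos = np.dot(v_a, v_p)  # cosine similarity; vectors already normalized
    s_neg = np.dot(v_a, v_n)
    # The loss is zero as soon as s_pos exceeds s_neg by `margin`,
    # so it never pushes the similarities all the way to +1 and -1.
    return max(0.0, margin - s_pos + s_neg)
```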
Hi again Roee,
Now I get your question.
It seems to me that if the network has been trained sufficiently, then v1 being found dissimilar to v3 and v3 dissimilar to v5 does not imply that the network will find v1 to be similar to v5. If v5 has some similarity to v1, the network should have been trained to distinguish between the two vectors.
You mention the operations on the batch and the ReLU function as contributing to this discriminatory functionality. The dimensionality of the model would seem to be another factor, with the final single dimension of similarity/dissimilarity being an artificial projection onto a single axis. So before projection onto this single axis, the full-dimensional representation of s1 is dissimilar to that of s3 as well as to that of s5, resulting in v1 being dissimilar to both v3 and v5, even if v3 is also dissimilar to v5 (due to the full-dimensional representation of s3 being dissimilar to that of s5).
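A concrete illustration of the dimensionality point (my own example, not from the course): even in just 2D, three unit vectors at 120° to one another are pairwise dissimilar with cosine similarity -0.5, so no pair has to reach the mutually impossible -1:

```python
import numpy as np

# Three unit vectors at 120 degrees to one another in 2D.
angles = np.deg2rad([0.0, 120.0, 240.0])
vecs = np.stack([np.cos(angles), np.sin(angles)], axis=1)

for i, j in [(0, 1), (1, 2), (0, 2)]:
    print(i, j, np.dot(vecs[i], vecs[j]))  # each pair: -0.5
```

With a margin-based loss, all three pairs can count as "dissimilar enough" simultaneously.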
Hi @reinoudbosch ,
Thank you for your answers!
I understand your intuition behind the dimensionality aspect, but this is not an issue of projection onto another axis. When the inner product of two normalized vectors equals 1 (i.e., the angle between them is zero), it means they are the same vector in all of their dimensions (not just in a projection).
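To spell that out (standard reasoning, not specific to the course): for unit vectors u and v,

$$u \cdot v = \|u\|\,\|v\|\cos\theta = \cos\theta, \qquad \text{so} \quad u \cdot v = 1 \iff \theta = 0 \iff u = v.$$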
In any case, I think I now understand the idea behind this similarity concept.