# C3W4 - Something sounds wrong with the similarity of vectors

In week 4 the similarity between vectors at the output of the Siamese network was defined as s(v_1, v_2) = cos(v_1, v_2). In addition, it was stated that the similarity with the positive example should be trained towards 1 and the similarity with the negative example towards -1.
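For reference, this is the similarity measure in question (a minimal NumPy sketch; the function name is mine):

```python
import numpy as np

def cos_sim(v1, v2):
    # Cosine similarity: dot product of the two L2-normalized vectors.
    return np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))

v = np.array([0.6, 0.8])
print(cos_sim(v, v))   # same direction  -> close to 1
print(cos_sim(v, -v))  # opposite direction -> close to -1
```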

Given these definitions: if we take all possible triplets from the database, doesn't this force all of the model's output vectors onto the same axis? That is, there would be only two possible vectors at the output of the model, and they would be negatives of each other.

This doesn’t make sense, as it divides the world of possible sentences into two groups: all the sentences in the first group share one meaning, which differs from the single meaning shared by all the sentences in the other group.

Something is not right here. How is this logical?

Thanks
Roee

Hi Roee,

I am not sure I fully understand your question, but maybe this clarifies:

The Siamese network is used to determine similarity and dissimilarity. In theory this is a singular axis. So ideally, similar vectors should point in the exact same (positive) direction while dissimilar vectors should point in the exact opposite (negative) direction of the same axis. It is not about determining meaning in the output, it’s only about similarity and dissimilarity.

I’ll try to explain my question with the following example:
suppose we have 6 sentences s_1, s_2, s_3, s_4, s_5, s_6 such that s_1 and s_2 are similar, s_3 and s_4 are similar, and s_5 and s_6 are similar. Suppose the Siamese network component is f(x) such that f(s_i) = v_i.

If the model has converged to its optimal solution, then v_i \cdot v_j^T = 1 for every similar (consecutive) pair of sentences and v_i \cdot v_j^T = -1 for all the other pairs. Because v_1 \cdot v_2^T = 1 and all vectors are normalized, it is safe to say that v_1 = v_2. By the same reasoning, v_i = -v_j for all non-consecutive vectors.
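The normalization step here rests on the identity ||v_1 - v_2||^2 = 2 - 2 cos(v_1, v_2) for unit vectors, so a cosine of exactly 1 forces v_1 = v_2 (and -1 forces v_1 = -v_2). A quick numeric check (variable names are mine):

```python
import numpy as np

# For unit vectors, ||v1 - v2||^2 = 2 - 2 * cos(v1, v2),
# so cos = 1 leaves zero distance between them.
rng = np.random.default_rng(0)
v1 = rng.normal(size=5)
v1 /= np.linalg.norm(v1)       # normalize to unit length
v2 = v1.copy()                 # cos(v1, v2) is exactly 1 here
gap = np.sum((v1 - v2) ** 2)   # equals 2 - 2 * 1 = 0
print(gap)
```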

So in the optimal state we have v_1 = -v_3 and v_3 = -v_5, therefore v_1 = v_5. But v_1 and v_5 are not consecutive, so they should be the negatives of each other, not equal. This state of convergence is therefore mathematically impossible - which I suspect makes such systems unstable, as the optimizer will keep switching the vectors' values.
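The contradiction can be seen concretely with two-dimensional unit vectors (a toy sketch, my own variable names):

```python
import numpy as np

# v1 = -v3 and v3 = -v5 force v1 = v5, so cos(v1, v5) = 1,
# contradicting the demand that non-consecutive pairs score -1.
v1 = np.array([1.0, 0.0])
v3 = -v1
v5 = -v3
print(np.dot(v1, v5))  # prints 1.0, not the required -1
```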

Of course, this method eventually works, so I think something is helping it.

Following the content of the course, what I think helps in this case are two things:

1. The operations on the batch reduce this effect somewhat (having the contradicting constraints together in the same batch may lessen the instability).
2. We are not using this similarity measure as is, but applying a ReLU with an offset (margin) smaller than 1 to it - so we don’t actually demand convergence to the extreme points.
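The second point can be sketched as a triplet-style loss on cosine similarities (a hedged sketch of the idea; the exact loss in the course may differ, and the function name and margin value are my assumptions):

```python
def triplet_loss(cos_pos, cos_neg, alpha=0.25):
    # Zero loss whenever the positive similarity beats the negative
    # one by at least the margin alpha, so there is no pressure to
    # push cos_pos all the way to 1 or cos_neg all the way to -1.
    return max(0.0, cos_neg - cos_pos + alpha)

print(triplet_loss(0.9, -0.2))  # margin already satisfied -> 0.0
print(triplet_loss(0.3, 0.2))   # margin not met -> positive loss
```

Because the ReLU clips the loss to zero once the margin is met, the contradictory "exactly +1 / exactly -1" targets never need to be reached.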

Hi again Roee,