LSTM Layer in Siamese Network

Could someone please explain the intuition behind why we add an LSTM layer in a Siamese network? In other words, on a high (intuitive) level, what is the added value of that layer compared to going directly from the embedding layer to the calculation of the similarity?

Thanks in advance.

Hi @Davit_Khachatryan

Please take a look at my response on what the embedding layer is doing (so I don't need to repeat it here).

On a high level, the LSTM "processes" the sequence (after the embedding layer has "processed" the individual words). As you will see later, there are more sophisticated ways to process a sequence ("Attention" layers and their variations), but the whole point of the LSTM (or any other RNN architecture) is to account for the order in which the words follow each other.
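To make that concrete, here is a minimal sketch (PyTorch, with vocabulary size, dimensions, and names being my own illustrative assumptions, not the course code) of one "branch" of a Siamese network: embedding, then LSTM, then pooling into a single vector that can be compared with cosine similarity. The same branch (same weights) is applied to both inputs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseBranch(nn.Module):
    def __init__(self, vocab_size=10_000, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

    def forward(self, token_ids):                 # (batch, seq_len)
        vectors = self.embedding(token_ids)       # (batch, seq_len, embed_dim)
        outputs, _ = self.lstm(vectors)           # (batch, seq_len, hidden_dim)
        pooled = outputs.mean(dim=1)              # (batch, hidden_dim)
        return F.normalize(pooled, dim=1)         # unit length, ready for cosine similarity

branch = SiameseBranch()
q1 = torch.randint(0, 10_000, (2, 8))   # two toy "questions", 8 tokens each
q2 = torch.randint(0, 10_000, (2, 8))
v1, v2 = branch(q1), branch(q2)          # same weights applied to both inputs
similarity = (v1 * v2).sum(dim=1)        # cosine similarity per pair
```

Without the LSTM line, the pooled vector would just be an average of context-free word vectors; with it, the vector also reflects how the words were arranged.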

If it were just the embedding layer, you would always get the same output for the same word ("Refrigerator" would always give you, e.g., [3.14, 2.5, -0.2, 0.1, 1.2]).
What the LSTM layer does is "look" at the words that came before (if it's unidirectional) and change its output accordingly (e.g., the output for "Refrigerator" later in the sequence would be the same if and only if all the words before it were the same). Most of the time the LSTM output when it encounters the word "Refrigerator" will be slightly different, because most of the time sentences do not match word for word.
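Here is a quick demo of exactly that point (again a sketch with a made-up token id for "Refrigerator"): the embedding output for the word is identical in both sentences, while the LSTM output at that word differs because the preceding words differ.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
embedding = nn.Embedding(100, 8)
lstm = nn.LSTM(8, 8, batch_first=True)

REFRIGERATOR = 42  # pretend token id for "Refrigerator"
sent_a = torch.tensor([[1, 2, 3, REFRIGERATOR]])   # "... Refrigerator"
sent_b = torch.tensor([[9, 8, 7, REFRIGERATOR]])   # different words before it

emb_a, emb_b = embedding(sent_a), embedding(sent_b)
print(torch.allclose(emb_a[0, -1], emb_b[0, -1]))  # True: same embedding row every time

out_a, _ = lstm(emb_a)
out_b, _ = lstm(emb_b)
print(torch.allclose(out_a[0, -1], out_b[0, -1]))  # False: the context changed the output
```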

Cheers