Why use a Siamese network vs. batch processing (with some order)?

Could you explain why we rely on a Siamese network rather than an ordered batch in a “simplex” network?

Since the two branches perform the same computations, we could apply the model once to the whole batch and compute the loss from the batch result (the question index already gives us the similarity matrix).

Handling the batches and indexes might be a little less straightforward, but the result would be the same.
I would also expect it to use slightly less memory, as only one circuit needs to be kept.
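
To make the idea concrete, here is a minimal sketch in plain Python. It is not the course's model: `encode`, `similarity`, the toy linear-plus-normalisation encoder, and the random data are all hypothetical stand-ins. It assumes both branches share the same weights and that the encoder treats each row independently (no batch-dependent layers such as batch normalisation). Under those assumptions, running the two branches separately and running one pass over the stacked batch should give the same similarity matrix.

```python
import math
import random


def encode(batch, weights):
    """Hypothetical shared encoder: linear map, then L2 normalisation, row by row."""
    encoded = []
    for x in batch:
        v = [sum(w * x_j for w, x_j in zip(row, x)) for row in weights]
        norm = math.sqrt(sum(c * c for c in v)) or 1.0  # guard against a zero vector
        encoded.append([c / norm for c in v])
    return encoded


def similarity(a, b):
    """Cosine similarity matrix between two lists of unit-length vectors."""
    return [[sum(p * q for p, q in zip(u, v)) for v in b] for u in a]


random.seed(0)
dim_in, dim_out, n = 4, 3, 5
weights = [[random.gauss(0, 1) for _ in range(dim_in)] for _ in range(dim_out)]
q1 = [[random.gauss(0, 1) for _ in range(dim_in)] for _ in range(n)]  # first question of each pair
q2 = [[random.gauss(0, 1) for _ in range(dim_in)] for _ in range(n)]  # second question of each pair

# "Siamese" view: two branches, same weights, one call per branch.
sim_siamese = similarity(encode(q1, weights), encode(q2, weights))

# "Ordered batch" view: stack both halves, run one pass, split by index.
stacked = encode(q1 + q2, weights)
sim_batched = similarity(stacked[:n], stacked[n:])

# Largest element-wise difference between the two similarity matrices.
max_diff = max(
    abs(a - b)
    for row_a, row_b in zip(sim_siamese, sim_batched)
    for a, b in zip(row_a, row_b)
)
print(max_diff)
```

In this sketch the only extra bookkeeping in the batched version is the index split (`stacked[:n]` and `stacked[n:]`), which is the "less straightforward" handling mentioned above.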

Hi Florent_Tho,

You can find your answer here (scroll to the bottom).