# Siamese network, sublayer output dimensions

What are the shapes of outputs of each of the layers inside the Siamese network, as I cannot find a simple way to check that in trax?

let’s say we input [2, 10, 20] for [pair of batches, batches, max_len], so what would each layer produce?

• Parallel would remove 1st dimension, so [10, 20]
• Embedding would add 128 in a new dimension, so [10, 20, 128]
• LSTM would remove the 2nd dimension, so [10, 128]
• Mean would reduce the 2nd dimension, so [10, 1]
• so finally, what vector can Normalize act on? Certainly not across the batch dimension.

What am I missing here, since the Normalize layer should ideally be receiving something like [10, x] where x > 1?

What you are missing is that LSTM does not “reduce the 2nd dimension”.

In your example LSTM would return [10, 20, 128]; then Mean would reduce the second dimension, giving [10, 128]; and finally Normalize would act on the last dimension and return [10, 128].
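A quick way to sanity-check this shape flow is with numpy arrays as stand-ins for the real layer outputs (this assumes Normalize is an L2-normalization over the last axis, which is what the Siamese assignment uses; the arrays here are random, not real activations):

```python
import numpy as np

rng = np.random.default_rng(0)
batch, max_len, d_model = 10, 20, 128

embedded = rng.normal(size=(batch, max_len, d_model))   # after Embedding: (10, 20, 128)
lstm_out = embedded                                     # LSTM keeps the sequence axis: (10, 20, 128)
mean_out = lstm_out.mean(axis=1)                        # Mean collapses the sequence axis: (10, 128)
# Assumed Normalize: L2-normalize each row over the feature (last) axis
norm_out = mean_out / np.linalg.norm(mean_out, axis=-1, keepdims=True)

print(mean_out.shape, norm_out.shape)   # (10, 128) (10, 128)
```

So Normalize does receive a [10, 128] tensor, not [10, 1].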

Also I’m not sure if you understand the inputs to the model, so just in case I will elaborate:

let’s say we input [2, 10, 20] for [pair of batches, batches, max_len]

In your case, [2, 10, 20] means 10 pairs of sentences, each padded/truncated to length 20.

For example, with a single pair (when comparing just a pair of sentences, not a batch of pairs, so the batch size is 1):
Input:
q1 - [‘When’, ‘will’, ‘I’, ‘see’, ‘you’, ‘?’]
q2 - [‘When’, ‘can’, ‘I’, ‘see’, ‘you’, ‘again’, ‘?’]

encoding:
Q1 - [585, 76, 4, 46, 53, 21]
Q2 - [585, 33, 4, 46, 53, 7280, 21]

data_generator - padding and batch dimension:
Q1 - array([[585, 76, 4, 46, 53, 21, 0, 0]])
Q2 - array([[ 585, 33, 4, 46, 53, 7280, 21, 0]])

in this case the input to the model would be a tuple:
(array([[585, 76, 4, 46, 53, 21, 0, 0]]), array([[ 585, 33, 4, 46, 53, 7280, 21, 0]]))
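The padding-and-batching step above can be sketched like this (a minimal stand-in for the data generator, just zero-padding both questions to the same length and adding the batch dimension):

```python
import numpy as np

# The two encoded questions from the example above
Q1 = [585, 76, 4, 46, 53, 21]
Q2 = [585, 33, 4, 46, 53, 7280, 21]

pad_to = 8  # both questions padded to the same length, as in the arrays above
pad = lambda q: q + [0] * (pad_to - len(q))

Q1_batch = np.array([pad(Q1)])      # adds the batch dimension -> shape (1, 8)
Q2_batch = np.array([pad(Q2)])      # shape (1, 8)

model_input = (Q1_batch, Q2_batch)  # the tuple the Siamese model receives
print(model_input)
```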

• Parallel would split the tuple

For each strand:

• Embedding - inputs.shape - (1, 8) → outputs.shape - (1, 8, 128)
• LSTM - inputs.shape - (1, 8, 128) → outputs.shape - (1, 8, 128)
• Mean - inputs.shape - (1, 8, 128) → outputs.shape - (1, 128)
• Normalize - inputs.shape - (1, 128) → outputs.shape - (1, 128)
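As a small numerical check on the last two rows of that list (again with a numpy stand-in, and again assuming Normalize is an L2-normalization over the last axis): Mean collapses the sequence axis and Normalize only rescales, so each output row ends up with unit length.

```python
import numpy as np

rng = np.random.default_rng(1)
lstm_out = rng.normal(size=(1, 8, 128))   # (batch, seq_len, d_model), as in the LSTM row above

mean_out = lstm_out.mean(axis=1)          # Mean: (1, 128)
norm_out = mean_out / np.linalg.norm(mean_out, axis=-1, keepdims=True)  # Normalize: (1, 128)

print(norm_out.shape)                          # (1, 128)
print(float(np.linalg.norm(norm_out[0])))      # 1.0 — each row is unit length
```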

Thanks.

Can you please confirm that we are using the LSTM in ‘return sequences’ mode, which is why the dimensions are [10, 20, 128]?
as a reference, this is what I mean: LSTM layer
Trax documentation is quite terse I’m afraid.

thanks for the elaboration!

My knowledge of Keras is modest and rusty, so I’m not sure, but I guess you are right. Anyway, don’t take my word for it.

In this case trax’s default behaviour is more like PyTorch’s (check the “Outputs” section of the PyTorch LSTM documentation):

What is kept after each step (word/token) is one output, h; the other hidden state, c (the long-term memory), is dropped at the end.
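A toy, single-example numpy sketch of this (a simplified gate/weight layout, not the real LSTMCell code): h is emitted for every token, while c only flows from one step to the next.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, b):
    # One LSTM step: gates computed from [x; h], c updated, new h emitted.
    z = np.concatenate([x, h]) @ W + b
    i, f, g, o = np.split(z, 4)                  # input, forget, candidate, output gates
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_new = sigmoid(o) * np.tanh(c_new)          # h is what the layer outputs per token
    return h_new, c_new                          # c only feeds the next step

d_in = d_h = 128
rng = np.random.default_rng(0)
W = rng.normal(size=(d_in + d_h, 4 * d_h)) * 0.01
b = np.zeros(4 * d_h)

h = c = np.zeros(d_h)
outputs = []
for x in rng.normal(size=(8, d_in)):             # a sequence of 8 tokens
    h, c = lstm_step(x, h, c, W, b)
    outputs.append(h)                            # keep h at every step

print(np.stack(outputs).shape)   # (8, 128): one h per token; the final c is dropped
```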

For me, it’s easier to understand from the trax code than from the documentation. If you follow the code carefully, you can see what the LSTM layer does:

```python
return cb.Serial(
    cb.Scan(LSTMCell(n_units=n_units), axis=1, mode=mode),
    cb.Select([0], n_in=2),  # Drop RNN state.
    name=f'LSTM_{n_units}', sublayers_to_print=[])
```

The Scan combinator applies the LSTMCell function step by step along the sequence axis, and `cb.Select([0], n_in=2)` keeps only the first of its two outputs.

That first output is `new_h` that the LSTMCell `forward` method produces.
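In other words (a toy numpy stand-in for Scan with a dummy cell, not the real trax implementation), the pattern is:

```python
import numpy as np

def scan(cell, xs, state):
    # Apply `cell` step by step along the time axis, collecting its first output.
    ys = []
    for x in xs:
        y, state = cell(x, state)
        ys.append(y)
    return np.stack(ys), state      # two outputs: per-step values and the final state

def toy_cell(x, state):
    new_h = np.tanh(x + state)      # dummy stand-in for LSTMCell's forward
    return new_h, new_h             # new_h is both the step output and the carried state

xs = np.ones((5, 3))                # 5 time steps, 3 features
ys, final_state = scan(toy_cell, xs, np.zeros(3))
# cb.Select([0], n_in=2) would keep `ys` and drop `final_state`
print(ys.shape)   # (5, 3)
```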

Also, you might want to check my attempt at explaining how LSTM matrix calculations are done here.