Siamese network, sublayer output dimensions

What are the shapes of the outputs of each layer inside the Siamese network? I cannot find a simple way to check that in trax.

Let’s say we input [2, 10, 20] for [pair of batches, batches, max_len]; what would each layer produce?

  • Parallel would remove the 1st dimension, so [10, 20]
  • Embedding would add a new dimension of size 128, so [10, 20, 128]
  • LSTM would remove the 2nd dimension, so [10, 128]
  • Mean would reduce the 2nd dimension, so [10, 1]
  • so finally, what vector is left for Normalize to act on? Surely it cannot act across the batch dimension.

What am I missing here? The Normalize layer should ideally be receiving something like [10, x] where x > 1.

Hi @Mohammad_Atif_Khan

What you are missing is that LSTM does not “remove the 2nd dimension”.

In your example, LSTM would return [10, 20, 128]; Mean would then reduce the second dimension, giving [10, 128]; and finally Normalize would act on the last dimension, returning [10, 128].
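If it helps, Normalize here is typically just L2 normalization over the last axis. That’s an assumption about your model (in the course it’s built with tl.Fn; it’s not a trax built-in), but a quick numpy sketch:

    import numpy as np

    # Assumed Normalize: L2-normalize each row (last axis); the shape is unchanged.
    def normalize(x):
        return x / np.sqrt(np.sum(x * x, axis=-1, keepdims=True))

    v = np.random.rand(10, 128)   # Mean's output in your example
    print(normalize(v).shape)     # (10, 128) -- each row now has unit norm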

Also, I’m not sure whether you understand the inputs to the model, so just in case I will elaborate:

let’s say we input [2, 10, 20] for [pair of batches, batches, max_len]

In your case, [2, 10, 20] is 10 pairs of sentences, each padded/truncated to length 20.

For example, with a single pair (comparing just one pair of sentences, not a batch of pairs, so the batch size is 1):
Input:
q1 - ['When', 'will', 'I', 'see', 'you', '?']
q2 - ['When', 'can', 'I', 'see', 'you', 'again', '?']

encoding:
Q1 - [585, 76, 4, 46, 53, 21]
Q2 - [585, 33, 4, 46, 53, 7280, 21]

data_generator - padding and batch dimension:
Q1 - array([[585, 76, 4, 46, 53, 21, 0, 0]])
Q2 - array([[ 585, 33, 4, 46, 53, 7280, 21, 0]])

in this case the input to the model would be a tuple:
(array([[585, 76, 4, 46, 53, 21, 0, 0]]), array([[ 585, 33, 4, 46, 53, 7280, 21, 0]]))
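If you want to reproduce that padding step, here is a sketch. I’m assuming the data generator pads with 0 up to the next power of two, which would be why both questions end up with length 8 rather than 7:

    import numpy as np

    Q1 = [585, 76, 4, 46, 53, 21]
    Q2 = [585, 33, 4, 46, 53, 7280, 21]

    # Assumed behaviour: pad to the next power of 2 >= the longer question (7 -> 8).
    max_len = 2 ** int(np.ceil(np.log2(max(len(Q1), len(Q2)))))
    Q1 = np.array([Q1 + [0] * (max_len - len(Q1))])
    Q2 = np.array([Q2 + [0] * (max_len - len(Q2))])
    print(Q1)  # [[585  76   4  46  53  21   0   0]]
    print(Q2)  # [[ 585   33    4   46   53 7280   21    0]]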

  • Parallel would split the tuple

For each strand:

  • Embedding - inputs.shape - (1, 8) → outputs.shape - (1, 8, 128)
  • LSTM - inputs.shape - (1, 8, 128) → outputs.shape - (1, 8, 128)
  • Mean - inputs.shape - (1, 8, 128) → outputs.shape - (1, 128)
  • Normalize - inputs.shape - (1, 128) → outputs.shape - (1, 128)
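And regarding “I cannot find a simple way to check that in trax”: you can verify these shapes yourself by initializing and running each sublayer of one strand by hand. A minimal sketch (vocab_size is a placeholder here; use whatever your vocabulary actually has):

    import numpy as np
    from trax import layers as tl
    from trax import shapes

    x = np.array([[585, 76, 4, 46, 53, 21, 0, 0]])   # one padded question, shape (1, 8)

    # Feed the output of each layer into the next and print the shapes.
    for layer in [tl.Embedding(vocab_size=41699, d_feature=128),
                  tl.LSTM(n_units=128),
                  tl.Mean(axis=1)]:
        layer.init(shapes.signature(x))   # initialize weights for this input signature
        x = layer(x)
        print(layer.name, x.shape)        # (1, 8, 128), (1, 8, 128), (1, 128)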

Thanks.

Can you please confirm that we are using the LSTM in “return sequences” mode, which is why the dimensions are [10, 20, 128]?
As a reference, this is what I mean: LSTM layer
Trax documentation is quite terse, I’m afraid.
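Concretely, in Keras I mean something like this (just a sketch to illustrate the question, not code from the assignment):

    import tensorflow as tf

    x = tf.random.normal((10, 20, 128))
    seq = tf.keras.layers.LSTM(128, return_sequences=True)(x)   # (10, 20, 128)
    last = tf.keras.layers.LSTM(128)(x)                         # (10, 128)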

Thanks for the elaboration!

Hi @Mohammad_Atif_Khan

My knowledge of Keras is modest and rusty, so I’m not sure, but I guess you are right. Anyway, don’t take my word for it 🙂

In this case trax’s default behaviour is more like PyTorch’s (check the “Outputs” section of the torch.nn.LSTM docs):

What is kept after each step (word/token) is one output, h; the other piece of the hidden state, c (the long-term memory), is dropped.
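For example, a quick sketch with torch.nn.LSTM:

    import torch
    import torch.nn as nn

    lstm = nn.LSTM(input_size=128, hidden_size=128, batch_first=True)
    x = torch.randn(1, 8, 128)            # (batch, seq_len, features)
    output, (h_n, c_n) = lstm(x)
    print(output.shape)   # torch.Size([1, 8, 128]) -- h at every step (what trax keeps)
    print(h_n.shape)      # torch.Size([1, 1, 128]) -- final h
    print(c_n.shape)      # torch.Size([1, 1, 128]) -- final c (dropped in trax's LSTM)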

For me, it’s easier to understand from the trax code than from the documentation. If you follow the code carefully, you can see what the LSTM layer does:

      return cb.Serial(
          cb.Scan(LSTMCell(n_units=n_units), axis=1, mode=mode),
          cb.Select([0], n_in=2),  # Drop RNN state.
          name=f'LSTM_{n_units}', sublayers_to_print=[])

The Scan combinator applies the LSTMCell progressively along axis 1 (the time axis), and cb.Select([0], n_in=2) then keeps only the first of its two outputs, dropping the RNN state.

That first output is the new_h that the LSTMCell forward method produces.
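In plain Python, the combination behaves roughly like this (a conceptual sketch, not the actual trax internals):

    def scan_like(cell, steps, init_state):
        # Apply `cell` to each time step along axis=1, threading the state through.
        state, outputs = init_state, []
        for x_t in steps:
            new_h, state = cell((x_t, state))   # LSTMCell returns (new_h, new_state)
            outputs.append(new_h)
        return outputs, state                   # Select([0], n_in=2) keeps `outputs`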

Also, you might want to check my attempt at explaining how the LSTM matrix calculations are done here.
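In short, the cell’s forward step looks roughly like this (standard LSTM equations in numpy; the gate ordering and state layout here are my assumptions, so double-check against the trax source):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_cell_forward(x, h, c, W, b):
        # W: (d_in + n_units, 4 * n_units), b: (4 * n_units,)
        z = np.concatenate([x, h], axis=-1) @ W + b
        i, j, f, o = np.split(z, 4, axis=-1)   # input, new input, forget, output gates
        new_c = sigmoid(f) * c + sigmoid(i) * np.tanh(j)
        new_h = sigmoid(o) * np.tanh(new_c)
        return new_h, (new_h, new_c)           # the first output is new_h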