Siamese network, sublayer output dimensions

What are the shapes of the outputs of each layer inside the Siamese network? I cannot find a simple way to check that in trax.

Let’s say we input [2, 10, 20] for [pair of batches, batches, max_len]; what would each layer produce?

  • Parallel would remove the 1st dimension, so [10, 20]
  • Embedding would add a new dimension of size 128, so [10, 20, 128]
  • LSTM would remove the 2nd dimension, so [10, 128]
  • Mean would reduce the 2nd dimension, so [10, 1]
  • so finally, what vector is left for Normalize to act on? Surely it cannot act across the batch dimension.

What am I missing here? The Normalize layer should ideally be receiving something like [10, x] where x > 1.

Hi @Mohammad_Atif_Khan

What you are missing is that LSTM does not “remove the 2nd dimension”.

In your example, LSTM would return [10, 20, 128]; Mean would then reduce the second dimension, giving [10, 128]; and finally Normalize would act on the last dimension, returning [10, 128].
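If it helps, Normalize here is typically just L2 normalization over the last axis. That’s an assumption about your model (in the course it’s built with tl.Fn; it’s not a trax built-in), but a quick numpy sketch:

    import numpy as np

    # Assumed Normalize: L2-normalize each row (last axis); the shape is unchanged.
    def normalize(x):
        return x / np.sqrt(np.sum(x * x, axis=-1, keepdims=True))

    v = np.random.rand(10, 128)   # Mean's output in your example
    print(normalize(v).shape)     # (10, 128) -- each row now has unit norm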

Also, I’m not sure whether you understand the inputs to the model, so just in case I will elaborate:

let’s say we input [2, 10, 20] for [pair of batches, batches, max_len]

In your case, [2, 10, 20] is 10 pairs of sentences, each padded/truncated to length 20.

For example, with a single pair (comparing just one pair of sentences, not a batch of pairs, so the batch size is 1):
Input:
q1 - ['When', 'will', 'I', 'see', 'you', '?']
q2 - ['When', 'can', 'I', 'see', 'you', 'again', '?']

encoding:
Q1 - [585, 76, 4, 46, 53, 21]
Q2 - [585, 33, 4, 46, 53, 7280, 21]

data_generator - padding and batch dimension:
Q1 - array([[585, 76, 4, 46, 53, 21, 0, 0]])
Q2 - array([[ 585, 33, 4, 46, 53, 7280, 21, 0]])

in this case the input to the model would be a tuple:
(array([[585, 76, 4, 46, 53, 21, 0, 0]]), array([[ 585, 33, 4, 46, 53, 7280, 21, 0]]))
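If you want to reproduce that padding step, here is a sketch. I’m assuming the data generator pads with 0 up to the next power of two, which would be why both questions end up with length 8 rather than 7:

    import numpy as np

    Q1 = [585, 76, 4, 46, 53, 21]
    Q2 = [585, 33, 4, 46, 53, 7280, 21]

    # Assumed behaviour: pad to the next power of 2 >= the longer question (7 -> 8).
    max_len = 2 ** int(np.ceil(np.log2(max(len(Q1), len(Q2)))))
    Q1 = np.array([Q1 + [0] * (max_len - len(Q1))])
    Q2 = np.array([Q2 + [0] * (max_len - len(Q2))])
    print(Q1)  # [[585  76   4  46  53  21   0   0]]
    print(Q2)  # [[ 585   33    4   46   53 7280   21    0]]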

  • Parallel would split the tuple

For each strand:

  • Embedding - inputs.shape - (1, 8) → outputs.shape - (1, 8, 128)
  • LSTM - inputs.shape - (1, 8, 128) → outputs.shape - (1, 8, 128)
  • Mean - inputs.shape - (1, 8, 128) → outputs.shape - (1, 128)
  • Normalize - inputs.shape - (1, 128) → outputs.shape - (1, 128)
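And regarding “I cannot find a simple way to check that in trax”: you can verify these shapes yourself by initializing and running each sublayer of one strand by hand. A minimal sketch (vocab_size is a placeholder here; use whatever your vocabulary actually has):

    import numpy as np
    from trax import layers as tl
    from trax import shapes

    x = np.array([[585, 76, 4, 46, 53, 21, 0, 0]])   # one padded question, shape (1, 8)

    # Feed the output of each layer into the next and print the shapes.
    for layer in [tl.Embedding(vocab_size=41699, d_feature=128),
                  tl.LSTM(n_units=128),
                  tl.Mean(axis=1)]:
        layer.init(shapes.signature(x))   # initialize weights for this input signature
        x = layer(x)
        print(layer.name, x.shape)        # (1, 8, 128), (1, 8, 128), (1, 128)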

Thanks.

Can you please confirm that we are using the LSTM in “return sequences” mode, which is why the dimensions are [10, 20, 128]?
As a reference, this is what I mean: LSTM layer
Trax documentation is quite terse, I’m afraid.
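Concretely, in Keras I mean something like this (just a sketch to illustrate the question, not code from the assignment):

    import tensorflow as tf

    x = tf.random.normal((10, 20, 128))
    seq = tf.keras.layers.LSTM(128, return_sequences=True)(x)   # (10, 20, 128)
    last = tf.keras.layers.LSTM(128)(x)                         # (10, 128)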

Thanks for the elaboration!

Hi @Mohammad_Atif_Khan

My knowledge of Keras is modest and rusty, so I’m not sure, but I guess you are right. Anyway, don’t take my word for it 🙂

In this case trax’s default behaviour is more like PyTorch’s (check the “Outputs” section of the torch.nn.LSTM docs):

What is kept after each step (word/token) is one output, h; the other piece of the hidden state, c (the long-term memory), is dropped.
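For example, a quick sketch with torch.nn.LSTM:

    import torch
    import torch.nn as nn

    lstm = nn.LSTM(input_size=128, hidden_size=128, batch_first=True)
    x = torch.randn(1, 8, 128)            # (batch, seq_len, features)
    output, (h_n, c_n) = lstm(x)
    print(output.shape)   # torch.Size([1, 8, 128]) -- h at every step (what trax keeps)
    print(h_n.shape)      # torch.Size([1, 1, 128]) -- final h
    print(c_n.shape)      # torch.Size([1, 1, 128]) -- final c (dropped in trax's LSTM)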

For me, it’s easier to understand from the trax code than from the documentation. If you follow the code carefully, you can see what the LSTM layer does:

      return cb.Serial(
          cb.Scan(LSTMCell(n_units=n_units), axis=1, mode=mode),
          cb.Select([0], n_in=2),  # Drop RNN state.
          name=f'LSTM_{n_units}', sublayers_to_print=[])

The Scan combinator applies the LSTMCell progressively along axis 1 (the time axis), and cb.Select([0], n_in=2) then keeps only the first of its two outputs, dropping the RNN state.

That first output is the new_h that the LSTMCell forward method produces.
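In plain Python, the combination behaves roughly like this (a conceptual sketch, not the actual trax internals):

    def scan_like(cell, steps, init_state):
        # Apply `cell` to each time step along axis=1, threading the state through.
        state, outputs = init_state, []
        for x_t in steps:
            new_h, state = cell((x_t, state))   # LSTMCell returns (new_h, new_state)
            outputs.append(new_h)
        return outputs, state                   # Select([0], n_in=2) keeps `outputs`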

Also, you might want to check my attempt at explaining how the LSTM matrix calculations are done here.
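In short, the cell’s forward step looks roughly like this (standard LSTM equations in numpy; the gate ordering and state layout here are my assumptions, so double-check against the trax source):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_cell_forward(x, h, c, W, b):
        # W: (d_in + n_units, 4 * n_units), b: (4 * n_units,)
        z = np.concatenate([x, h], axis=-1) @ W + b
        i, j, f, o = np.split(z, 4, axis=-1)   # input, new input, forget, output gates
        new_c = sigmoid(f) * c + sigmoid(i) * np.tanh(j)
        new_h = sigmoid(o) * np.tanh(new_c)
        return new_h, (new_h, new_c)           # the first output is new_h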