My knowledge of Keras is modest and rusty, so I'm not sure, but I think you are right. Anyway, don't take my word for it.
In this case, trax's default behaviour is more like PyTorch's (check the "Outputs" section):
What is kept after each step (word/token) is a single output, h; the other hidden state, c (the long-term memory), is dropped.
For me, it's easier to understand from the trax code than from the documentation. If you follow the code carefully, you can see what the LSTM layer does:
```python
return cb.Serial(
    cb.Scan(LSTMCell(n_units=n_units), axis=1, mode=mode),
    cb.Select([0], n_in=2),  # Drop RNN state.
    name=f'LSTM_{n_units}', sublayers_to_print=[])
```
`cb.Scan` applies the `LSTMCell` step by step along the sequence axis, and `cb.Select([0], n_in=2)` keeps only the first of its two outputs, dropping the state. That first output is the `new_h` that the `LSTMCell` `forward` method produces.
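To make the Scan/Select mechanics concrete, here is a minimal pure-Python/NumPy sketch of the same idea. The gate math inside `lstm_cell` is faked (it is not the trax implementation); the point is only the output structure: the cell returns `(new_h, state)`, the scan collects one `h` per token, and the final state is dropped, just as `cb.Select([0], n_in=2)` does.

```python
import numpy as np

def lstm_cell(x, state, W):
    """Toy stand-in for LSTMCell.forward: returns (new_h, new_state).

    The real trax cell computes four gates; here the arithmetic is
    simplified, purely to show the (output, state) structure.
    """
    h, c = state
    new_c = 0.5 * c + 0.5 * np.tanh(x @ W + h)
    new_h = np.tanh(new_c)
    return new_h, (new_h, new_c)

def scan_then_select(xs, init_state, W):
    """Mimic cb.Scan(LSTMCell(...)) followed by cb.Select([0], n_in=2)."""
    state = init_state
    hs = []
    for x in xs:                      # Scan: apply the cell step by step
        h, state = lstm_cell(x, state, W)
        hs.append(h)
    return np.stack(hs)               # Select([0]): keep the h's, drop state

n_units, seq_len = 4, 3
W = np.zeros((n_units, n_units))
xs = np.zeros((seq_len, n_units))
init = (np.zeros(n_units), np.zeros(n_units))
out = scan_then_select(xs, init, W)
print(out.shape)  # (3, 4): one h per token; the final (h, c) is gone
```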
Also, you might want to check my attempt at explaining how LSTM matrix calculations are done here.
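For reference, a single LSTM step with the standard gate equations can be sketched in NumPy like this. The variable names and the stacking order of the four gate weight matrices are my own choices for illustration, not taken from trax:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step: W stacks the weights for the input (i),
    forget (f), candidate (g) and output (o) gates."""
    n = h_prev.shape[0]
    z = np.concatenate([x, h_prev]) @ W + b   # one big matmul for all gates
    i = sigmoid(z[0*n:1*n])                   # input gate
    f = sigmoid(z[1*n:2*n])                   # forget gate
    g = np.tanh(z[2*n:3*n])                   # candidate cell update
    o = sigmoid(z[3*n:4*n])                   # output gate
    c_new = f * c_prev + i * g                # long-term memory (dropped by Select)
    h_new = o * np.tanh(c_new)                # the output that Scan keeps
    return h_new, c_new

n_in, n_units = 3, 4
rng = np.random.default_rng(0)
W = rng.normal(size=(n_in + n_units, 4 * n_units))
b = np.zeros(4 * n_units)
h, c = lstm_step(rng.normal(size=n_in),
                 np.zeros(n_units), np.zeros(n_units), W, b)
print(h.shape, c.shape)  # (4,) (4,)
```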