Hi, @Peixi_Zhu
In the assignment, after part "4 - Testing", you can print the model to see what it looks like:
> Accelerate_in2_out2[
>   Serial_in2_out2[
>     Select[0,1,0,1]_in2_out4
>     Parallel_in2_out2[
>       Serial[
>         Embedding_33300_1024
>         LSTM_1024
>         LSTM_1024
>       ]
>       Serial[
>         Serial[
>           ShiftRight(1)
>         ]
>         Embedding_33300_1024
>         LSTM_1024
>       ]
>     ]
>     PrepareAttentionInput_in3_out4
>     Serial_in4_out2[
>       Branch_in4_out3[
>         None
>         Serial_in4_out2[
>           _in4_out4
>           Serial_in4_out2[
>             Parallel_in3_out3[
>               Dense_1024
>               Dense_1024
>               Dense_1024
>             ]
>             PureAttention_in4_out2
>             Dense_1024
>           ]
>           _in2_out2
>         ]
>       ]
>       Add_in2
>     ]
>     Select[0,2]_in3_out2
>     LSTM_1024
>     LSTM_1024
>     Dense_33300
>     LogSoftmax
>   ]
> ]
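If you want to reproduce this printout yourself, something like the sketch below should work (it assumes `model` is the NMTAttn model you built earlier in the assignment notebook; Trax layers define a readable repr, so printing one shows the whole nested layer tree):

```python
# `model` is assumed to be the NMTAttn model built earlier in the notebook.
print(model)                  # prints the nested layer tree shown above

# The top-level object is also directly inspectable:
print(type(model))            # the wrapping layer class (Accelerate here)
print(len(model.sublayers))   # number of direct sublayers
```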
To get the actual weight matrices (and their shapes), you can play around with the model
variable, for example:
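Here is a rough sketch of how you might reach the encoder's embedding weights. The exact chain of `.sublayers[...]` indices is an assumption read off the printout above (top-level Serial → Parallel → encoder Serial → Embedding), so adjust it if your indexing differs:

```python
# Assumed path to the encoder's Embedding layer, based on the printed structure:
# model.sublayers[0]  -> inner Serial
#   .sublayers[1]     -> Parallel (encoder branch / decoder branch)
#   .sublayers[0]     -> encoder Serial
#   .sublayers[0]     -> Embedding_33300_1024
emb = model.sublayers[0].sublayers[1].sublayers[0].sublayers[0]
print(emb)                  # Embedding_33300_1024
print(emb.weights.shape)    # (33300, 1024)
```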
This is the embedding weight matrix of shape (33300, 1024), i.e. (vocab_size, dim_for_LSTM).
The Embedding layer uses vocab_size to initialize this weight matrix (each token is represented by a vector of 1024 numbers).
Other layers, like the LSTMs, have more complicated weight matrices (as you know, LSTMs perform more complicated calculations). For example:
`model.sublayers[0].sublayers[1].sublayers[0].sublayers[1].weights[1][0][0].shape`
would result in (2048, 4096), which means that one weight matrix holds parameters for more than one component: 2048 comes from concatenating the 1024-dimensional input with the 1024-dimensional hidden state, and 4096 is the four LSTM gates × 1024 units stacked together. You might find this post interesting (about embedding and LSTM calculations),
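If you would rather not guess index paths by hand, a small helper like this sketch can walk the whole layer tree and print every layer's weight shapes. It only relies on the public `name`, `sublayers` and `weights` attributes of Trax layers; treat it as a rough utility for exploring the model, not as part of the assignment:

```python
from jax import tree_util

def show_weight_shapes(layer, indent=0):
    """Recursively print each layer's name; for leaf layers, also print weight shapes."""
    pad = "  " * indent
    if layer.sublayers:                      # combinator (Serial/Parallel/Branch/...): recurse
        print(pad + layer.name)
        for sub in layer.sublayers:
            show_weight_shapes(sub, indent + 1)
    else:                                    # leaf layer: flatten its (possibly nested) weights
        shapes = [w.shape for w in tree_util.tree_leaves(layer.weights)
                  if hasattr(w, "shape")]
        print(pad + layer.name, shapes)

show_weight_shapes(model)
```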
Cheers