Matrix size for every step

Peixi_Zhu · July 28, 2023, 8:35am

Hi,

In the neural machine translation assignment, I want to know the matrix size after each layer. Is there a reference?

In particular, for the pre-attention decoder, what is the size after the embedding layer? Also, what does the embedding layer use the vocab_size parameter for?

arvyzukai · July 28, 2023, 12:19pm

Hi, @Peixi_Zhu

In the assignment, after part "4 - Testing", you can print the model to see what it looks like:

> Accelerate_in2_out2[
>   Serial_in2_out2[
>     Select[0,1,0,1]_in2_out4
>     Parallel_in2_out2[
>       Serial[
>         Embedding_33300_1024
>         LSTM_1024
>         LSTM_1024
>       ]
>       Serial[
>         Serial[
>           ShiftRight(1)
>         ]
>         Embedding_33300_1024
>         LSTM_1024
>       ]
>     ]
>     PrepareAttentionInput_in3_out4
>     Serial_in4_out2[
>       Branch_in4_out3[
>         None
>         Serial_in4_out2[
>           _in4_out4
>           Serial_in4_out2[
>             Parallel_in3_out3[
>               Dense_1024
>               Dense_1024
>               Dense_1024
>             ]
>             PureAttention_in4_out2
>             Dense_1024
>           ]
>           _in2_out2
>         ]
>       ]
>       Add_in2
>     ]
>     Select[0,2]_in3_out2
>     LSTM_1024
>     LSTM_1024
>     Dense_33300
>     LogSoftmax
>   ]
> ]

To get the actual weight matrices (and their shapes), you can play around with the model variable, like:

model.sublayers[0].sublayers[1].sublayers[0].sublayers[0].weights.shape

These are the embedding weights, of shape (33300, 1024), or (vocab_size, dim_for_LSTM).

The embedding layer uses vocab_size to initialize this weight matrix (each token is represented by a vector of 1024 numbers).

Other layers, like LSTM, have more complicated weight matrices (as you know, LSTMs involve more complicated calculations). For example:
model.sublayers[0].sublayers[1].sublayers[0].sublayers[1].weights[1][0][0].shape results in (2048, 4096), which means that one weight matrix holds the parameters for more than one component. You may find this post interesting (about embedding and LSTM calculations).
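One way to read the (2048, 4096) shape is sketched below. It assumes the LSTM cell concatenates the 1024-dimensional input with the 1024-dimensional hidden state and stores the four gate projections side by side in a single matrix. That layout is an assumption that happens to match the reported numbers, and `lstm_kernel_shape` is a hypothetical helper, not a Trax function.

```python
def lstm_kernel_shape(input_dim, n_units):
    """Shape of a fused LSTM kernel under the assumed layout.

    Rows: the input vector concatenated with the hidden state.
    Columns: four gate projections (input, forget, output, candidate),
    each n_units wide, stored next to each other.
    """
    rows = input_dim + n_units
    cols = 4 * n_units
    return (rows, cols)


# Embedding output (1024) feeding an LSTM with 1024 units.
kernel_shape = lstm_kernel_shape(input_dim=1024, n_units=1024)
print(kernel_shape)  # (2048, 4096)
```

So "more than one component" here would mean the four gates sharing one matrix, under that assumption.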

Cheers

Peixi_Zhu · July 29, 2023, 7:43am

Hi, @arvyzukai

Thanks for the response.

Please correct me if I am wrong. It seems to me that model.sublayers[0].sublayers[1].sublayers[0].sublayers[0].weights.shape gives me the size of the embedding layer weights. Say the input to the embedding layer has n tokens. Does the layer generate an output of shape [n, 1024]? Basically, from the [33300, 1024] matrix it extracts the rows corresponding to the input token indexes, correct?

In general, does TRAX tell you the input/output size of each layer?

Is the embedding layer a trainable layer?

arvyzukai · July 31, 2023, 6:40am

Hi @Peixi_Zhu

Yes, you understand that correctly. In addition, there is usually a batch_size in front.

In other words, if we have [n_sentences, n_tokens_padded] input (n_sentences here is equivalent to batch_size), then the output from the embedding layer is [n_sentences, n_tokens_padded, embedding_size] (for example, (32, 64, 1024)). A simple example.
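To make the shapes concrete, here is a small sketch in plain Python (no Trax) with toy sizes instead of 33300 and 1024. The names `weights`, `batch` and the `shape` helper are made up for illustration only.

```python
def shape(nested):
    """Return the dimensions of a regular nested list, e.g. (3, 5, 8)."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


vocab_size, embedding_size = 50, 8      # 33300 and 1024 in the assignment
n_sentences, n_tokens_padded = 3, 5     # batch_size and padded length

# Embedding weights: one row of embedding_size numbers per token id.
weights = [[0.01 * (row + col) for col in range(embedding_size)]
           for row in range(vocab_size)]

# A batch of token ids, shape (n_sentences, n_tokens_padded).
batch = [[(s * n_tokens_padded + t) % vocab_size
          for t in range(n_tokens_padded)]
         for s in range(n_sentences)]

# The lookup replaces every token id with its row of the weight matrix.
embedded = [[weights[token] for token in sentence] for sentence in batch]

print(shape(batch))     # (3, 5)
print(shape(embedded))  # (3, 5, 8)
```

The batch and token dimensions pass through unchanged. The lookup only appends the embedding dimension at the end.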

I’m not sure I understand. In general, you are the one who tells trax what size of each layer you want (and you are the one who has to make sure they are reasonable).

Yes, absolutely, the embedding layer is trainable. Under the hood it is very similar to a Dense (linear) layer: as you said in your first question, it takes a token index (for example 54) and returns the corresponding vector (for example, a row of 1024 numbers), and those numbers are updated according to the loss during training.
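The similarity to a Dense layer can be sketched as follows. Multiplying a one-hot vector by the weight matrix (what a bias-free linear layer would do) gives the same result as picking the row directly. This is plain Python with toy sizes, and `one_hot` and `vec_mat` are illustrative helpers, not Trax functions.

```python
def one_hot(index, size):
    """Vector of zeros with a single 1.0 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def vec_mat(vec, mat):
    """Multiply a row vector (length n) by an n x m matrix."""
    n_cols = len(mat[0])
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(n_cols)]


vocab_size, embedding_size = 5, 3
weights = [[float(10 * r + c) for c in range(embedding_size)]
           for r in range(vocab_size)]

token = 2
via_dense = vec_mat(one_hot(token, vocab_size), weights)  # linear-layer view
via_lookup = weights[token]                               # embedding view

print(via_dense)   # [20.0, 21.0, 22.0]
print(via_lookup)  # [20.0, 21.0, 22.0]
```

The lookup is just the cheaper way to compute the same thing, which is why the rows can be trained like any other weights.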

Cheers

Peixi_Zhu · August 1, 2023, 8:10pm

Thanks, @arvyzukai.

“I’m not sure I understand. In general, you are the one who tells trax what size of each layer you want (and you are the one who has to make sure they are reasonable).”

For this part, I am not asking about hyperparameters such as the number of neurons. I am asking whether, once those hyperparameters are fixed and the input data is given, there is a way to check the dimensions of the data (intermediate or output) as it passes through each layer. That would help me better understand the details of the model, e.g., the attention layer.
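A framework-independent sketch of that idea is to apply the layers one at a time and record the shape after each step. The code below is plain Python with toy layers and does not use any Trax API. `trace_shapes`, `embedding` and `mean_over_tokens` are hypothetical names for illustration.

```python
def shape(nested):
    """Return the dimensions of a regular nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


def trace_shapes(layers, x):
    """Apply (name, fn) pairs in order and record the shape after each one."""
    trace = [("input", shape(x))]
    for name, fn in layers:
        x = fn(x)
        trace.append((name, shape(x)))
    return trace


# Toy embedding table: vocab of 10 tokens, 4 numbers per token.
EMB = [[float(r + c) for c in range(4)] for r in range(10)]


def embedding(batch):
    """(batch, tokens) -> (batch, tokens, 4) by row lookup."""
    return [[EMB[t] for t in sentence] for sentence in batch]


def mean_over_tokens(batch):
    """(batch, tokens, 4) -> (batch, 4) by averaging over the token axis."""
    return [[sum(col) / len(sentence) for col in zip(*sentence)]
            for sentence in batch]


layers = [("embedding", embedding), ("mean_over_tokens", mean_over_tokens)]
trace = trace_shapes(layers, [[1, 2, 3], [4, 5, 6]])
for name, dims in trace:
    print(name, dims)
```

The same pattern should carry over to a real model if each sublayer can be called on its own, which is an assumption that would need checking for layers with multiple inputs such as the attention block.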
