Matrix size for every step

Hi, @Peixi_Zhu

In the assignment, after part "4 - Testing", you can print the model to see what it looks like.
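Assuming the trained model is stored in a variable named `model` (as in the rest of this post), a one-liner is enough, and its output is the structure quoted below:

```python
print(model)
```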

> Accelerate_in2_out2[
>   Serial_in2_out2[
>     Select[0,1,0,1]_in2_out4
>     Parallel_in2_out2[
>       Serial[
>         Embedding_33300_1024
>         LSTM_1024
>         LSTM_1024
>       ]
>       Serial[
>         Serial[
>           ShiftRight(1)
>         ]
>         Embedding_33300_1024
>         LSTM_1024
>       ]
>     ]
>     PrepareAttentionInput_in3_out4
>     Serial_in4_out2[
>       Branch_in4_out3[
>         None
>         Serial_in4_out2[
>           _in4_out4
>           Serial_in4_out2[
>             Parallel_in3_out3[
>               Dense_1024
>               Dense_1024
>               Dense_1024
>             ]
>             PureAttention_in4_out2
>             Dense_1024
>           ]
>           _in2_out2
>         ]
>       ]
>       Add_in2
>     ]
>     Select[0,2]_in3_out2
>     LSTM_1024
>     LSTM_1024
>     Dense_33300
>     LogSoftmax
>   ]
> ]

In order to get the actual weight matrices (and their shapes), you can play around with the `model` variable.
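For example, you can index into the sublayers to reach the Embedding layer and its weights. The exact index path below is an assumption read off the printed structure above (Accelerate -> Serial -> Parallel -> input-encoder Serial -> Embedding); adjust it if your model nests differently:

```python
# Path read off the printed structure above:
# Accelerate -> Serial -> Parallel -> input-encoder Serial -> Embedding
emb_layer = model.sublayers[0].sublayers[1].sublayers[0].sublayers[0]
print(emb_layer.weights.shape)  # (33300, 1024)
```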

These are the embedding weights, of shape (33300, 1024), i.e. (vocab_size, dim_for_LSTM).

The Embedding layer uses vocab_size to initialize this weight matrix (each token is mapped to a vector of 1024 numbers).
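You can also see this in isolation by building and initializing a standalone Embedding layer (a minimal sketch; the dummy input shape `(1, 64)`, a batch of 64 token ids, is arbitrary):

```python
import numpy as np
import trax.layers as tl
from trax.shapes import ShapeDtype

emb = tl.Embedding(vocab_size=33300, d_feature=1024)
# Weights are created lazily, so initialize with a dummy input signature.
emb.init(ShapeDtype((1, 64), dtype=np.int32))
print(emb.weights.shape)  # (33300, 1024)
```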

Other layers, like LSTM, have more complicated weight matrices (as you know, LSTMs involve more complicated calculations). For example, `model.sublayers[0].sublayers[1].sublayers[0].sublayers[1].weights[1][0][0].shape` results in (2048, 4096), which means that one weight matrix holds the parameters for more than one component: the 2048 rows cover the concatenated input and hidden state (1024 + 1024), and the 4096 columns stack the pre-activations of the four LSTM gates (4 × 1024). You may find this post interesting (about embedding and LSTM calculations).
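A quick sanity check of that shape, using only arithmetic (the gate count and the fused-matrix layout are the standard LSTM convention, not something specific to this assignment):

```python
d_feature = 1024  # size of each embedded token fed into the LSTM
n_units = 1024    # LSTM hidden-state size
n_gates = 4       # input, forget, cell-candidate, and output gates

# The cell concatenates [input, hidden] (1024 + 1024 = 2048 entries) and
# multiplies it by one fused matrix that produces all four gate
# pre-activations at once (4 * 1024 = 4096 outputs).
fused_shape = (d_feature + n_units, n_gates * n_units)
print(fused_shape)  # (2048, 4096)
```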

Cheers
