Hi, @Peixi_Zhu
In the assignment, after part "4 - Testing", you can print the model to see what it looks like:
> Accelerate_in2_out2[
>   Serial_in2_out2[
>     Select[0,1,0,1]_in2_out4
>     Parallel_in2_out2[
>       Serial[
>         Embedding_33300_1024
>         LSTM_1024
>         LSTM_1024
>       ]
>       Serial[
>         Serial[
>           ShiftRight(1)
>         ]
>         Embedding_33300_1024
>         LSTM_1024
>       ]
>     ]
>     PrepareAttentionInput_in3_out4
>     Serial_in4_out2[
>       Branch_in4_out3[
>         None
>         Serial_in4_out2[
>           _in4_out4
>           Serial_in4_out2[
>             Parallel_in3_out3[
>               Dense_1024
>               Dense_1024
>               Dense_1024
>             ]
>             PureAttention_in4_out2
>             Dense_1024
>           ]
>           _in2_out2
>         ]
>       ]
>       Add_in2
>     ]
>     Select[0,2]_in3_out2
>     LSTM_1024
>     LSTM_1024
>     Dense_33300
>     LogSoftmax
>   ]
> ]
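If you want to reproduce this printout yourself, something like the sketch below should work (it assumes `model` is the NMTAttn model you built earlier in the assignment notebook; Trax layers define a readable repr, so printing one shows the whole nested layer tree):

```python
# `model` is assumed to be the NMTAttn model built earlier in the notebook.
print(model)                  # prints the nested layer tree shown above

# The top-level object is also directly inspectable:
print(type(model))            # the wrapping layer class (Accelerate here)
print(len(model.sublayers))   # number of direct sublayers
```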
To get the actual weight matrices (and their shapes), you can play around with the model
variable, for example:
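Here is a rough sketch of how you might reach the encoder's embedding weights. The exact chain of `.sublayers[...]` indices is an assumption read off the printout above (top-level Serial → Parallel → encoder Serial → Embedding), so adjust it if your indexing differs:

```python
# Assumed path to the encoder's Embedding layer, based on the printed structure:
# model.sublayers[0]  -> inner Serial
#   .sublayers[1]     -> Parallel (encoder branch / decoder branch)
#   .sublayers[0]     -> encoder Serial
#   .sublayers[0]     -> Embedding_33300_1024
emb = model.sublayers[0].sublayers[1].sublayers[0].sublayers[0]
print(emb)                  # Embedding_33300_1024
print(emb.weights.shape)    # (33300, 1024)
```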
This is the embedding weight matrix of shape (33300, 1024), i.e. (vocab_size, dim_for_LSTM).
The Embedding layer uses vocab_size to initialize this weight matrix (each token is represented by a vector of 1024 numbers).
Other layers, like the LSTMs, have more complicated weight matrices (as you know, LSTMs perform more complicated calculations). For example:
`model.sublayers[0].sublayers[1].sublayers[0].sublayers[1].weights[1][0][0].shape`
would result in (2048, 4096), which means that one weight matrix holds parameters for more than one component: 2048 comes from concatenating the 1024-dimensional input with the 1024-dimensional hidden state, and 4096 is the four LSTM gates × 1024 units stacked together. You might find this post interesting (about embedding and LSTM calculations),
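If you would rather not guess index paths by hand, a small helper like this sketch can walk the whole layer tree and print every layer's weight shapes. It only relies on the public `name`, `sublayers` and `weights` attributes of Trax layers; treat it as a rough utility for exploring the model, not as part of the assignment:

```python
from jax import tree_util

def show_weight_shapes(layer, indent=0):
    """Recursively print each layer's name; for leaf layers, also print weight shapes."""
    pad = "  " * indent
    if layer.sublayers:                      # combinator (Serial/Parallel/Branch/...): recurse
        print(pad + layer.name)
        for sub in layer.sublayers:
            show_weight_shapes(sub, indent + 1)
    else:                                    # leaf layer: flatten its (possibly nested) weights
        shapes = [w.shape for w in tree_util.tree_leaves(layer.weights)
                  if hasattr(w, "shape")]
        print(pad + layer.name, shapes)

show_weight_shapes(model)
```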
Cheers