C5W2A2 How to calculate LSTM parameters?

In model = Emojify_V2(…)
model.summary()

We have two LSTM layers

Layer (type)     Output Shape       Param #
lstm_2 (LSTM)    (None, 10, 128)    91648
lstm_3 (LSTM)    (None, 128)        131584

I tried to calculate the trainable parameters for the LSTM layers, but I am not able to get the right numbers. I would appreciate it if someone can tell me:

  1. Which LSTM cell structure is used in this assignment? Is the following one being used?

  2. Can someone show me how to get 91648 and 131584?

Sure, in principle, but it’s not immediately clear why that is a good use of anyone’s time. But if you want to pursue that, I can see the following parameters in the above picture:

W_c and b_c
W_u and b_u
W_f and b_f
W_o and b_o

What are the sizes of those various weight and bias objects? Of course you need to know the size of the hidden states in order to compute those.

Also note that the list above does not cover everything: it omits the softmax layer that produces \hat{y}^{<t>}.

here is what I think

The sizes of W_c, W_u, W_f and W_o are all the same: (n_a, n_a + n_x).
The sizes of b_c, b_u, b_f and b_o are all the same: (n_a, 1).

So I have a formula: parameters = 4 * (n_a * (n_a + n_x) + n_a) + (n_y * n_a + n_y),
where the second term is for the softmax output, if we want to count it.
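As a sanity check, the formula above can be written as a small Python helper. The only assumptions here are the values fed in for `n_a` and `n_x`; the function itself is just the per-gate count (one weight matrix of shape (n_a, n_a + n_x) plus one bias of shape (n_a, 1), times four gates):

```python
def lstm_param_count(n_a, n_x):
    # Each of the four gates (forget, update, candidate, output) has
    # a weight matrix of shape (n_a, n_a + n_x) and a bias of shape (n_a, 1).
    per_gate = n_a * (n_a + n_x) + n_a
    return 4 * per_gate

# First LSTM layer: 50-dimensional GloVe embeddings as input
print(lstm_param_count(128, 50))   # → 91648

# Second LSTM layer: its input is the 128-dimensional output of the first
print(lstm_param_count(128, 128))  # → 131584
```

With the correct n_x values this reproduces both numbers from model.summary() exactly.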

We have 128 internal states, so n_a = 128.
For the embedding output fed to the LSTM, I assumed n_x = m * T_x, where T_x = 10, and I am not sure about m.

Looking at the printout for an embedding layer, I assumed 30 is n_x:
['Embedding', (None, 4, 2), 30]

For the first LSTM layer there is no softmax output, so:
parameters = 4 * (n_a * (n_a + n_x) + n_a) = 4 * (128 * (128 + 30) + 128) = 81408

Let me know where I went wrong. Thanks.

You’re on the right track, but the n_x values are incorrect. For LSTM1, the point is that the inputs are word embeddings and we are using a GloVe model with 50-dimensional embeddings. So the dimensions of the matrices are:

W_f is (128, 128 + 50)
b_f is (128, 1)

So per gate we have 128 * 178 + 128 = 128 * 179 = 22912,
and 4 * 22912 = 91648.

For LSTM2, the n_x value is 128, since it is the output of LSTM1.

W_f is (128, 128 + 128)
b_f is (128, 1)

So per gate we have 128 * 256 + 128 = 128 * 257 = 32896,
and 4 * 32896 = 131584.

The way they structured the model, the softmax shows up in the Dense layer that comes later.


Thank you for the detailed explanation. I got it.

Thanks for the detailed explanation.

Rishan
