Hidden states of LSTM cells

Barb · October 11, 2021, 12:20am

Hello everyone,

I’m currently doing the Jazz Generation assignment, and I don’t really understand what the hidden states of an LSTM cell represent.

I’m confused because we call n_a the number of hidden states and we call a the input of the softmax function. Is there a link between the two? And does n_a affect the dimension of a?

Best regards

edwardyu · October 11, 2021, 1:09am

Hi @Barb ,

I think the following description is clearer.

a: hidden state
n_a: dimensions of hidden state

So, if we have m examples, the shape of the hidden state a at one time step is (m, n_a).
To generate the categorical output, we transform the hidden state a into a probability distribution y^ through a dense layer followed by a softmax.
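To make the shapes concrete, here is a minimal NumPy sketch of that dense + softmax step. It is not the assignment's Keras code: the sizes (m = 4, n_a = 64, n_values = 90) and the random weights W and b are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumed): batch size, hidden size, number of classes.
m, n_a, n_values = 4, 64, 90

# Hidden state a for one time step: one row of n_a numbers per example.
a = rng.standard_normal((m, n_a))

# Hypothetical dense-layer parameters mapping n_a features to n_values scores.
W = rng.standard_normal((n_a, n_values)) * 0.01
b = np.zeros(n_values)

def softmax(z):
    # Subtract the row-wise max for numerical stability.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Dense layer followed by softmax: (m, n_a) -> (m, n_values).
y_hat = softmax(a @ W + b)

print(a.shape, y_hat.shape)
```

So n_a only sets the width of a; the width of y^ comes from the dense layer's weight matrix.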

Barb · October 11, 2021, 1:58am

Hi @edwardyu thanks for the answer.

I’ve been looking at other posts and saw this picture:

Input X is of size B x D where D is the size of one vector. In the Jazz assignment I guess it would be 90.

However, it seems that D is lost in h_t (I guess a_t using this course’s notation) as well as in c_t.

How are the outputs of the LSTM cell transformed so that they fit the expected dimensions of y^?

Is the dense layer you mentioned used to resize the activation and get the 90 dimensions back?

Links:

http://colah.github.io/posts/2015-08-Understanding-LSTMs/

edwardyu · October 11, 2021, 7:13am

Hi @Barb ,

Yes, you’re right: the dimension of the output y^ is set by the output dense layer, which is Step 2D in our exercise.

BTW, in the picture in your post, H = n_a.
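As a rough illustration of why D does not appear in h_t or c_t, here is a NumPy sketch of one step of a standard LSTM cell. The sizes (B = 4, D = 90, H = 64), the single stacked weight matrix W, and the gate ordering inside it are assumptions made for this sketch, not the layout Keras uses internally.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step. x_t: (B, D), h_prev and c_prev: (B, H)."""
    H = h_prev.shape[1]
    # The input and the previous hidden state are concatenated: (B, H + D).
    concat = np.concatenate([h_prev, x_t], axis=1)
    # W has shape (H + D, 4H), so D is "absorbed" by this matrix product.
    gates = concat @ W + b                      # (B, 4H)
    f = sigmoid(gates[:, 0:H])                  # forget gate
    i = sigmoid(gates[:, H:2 * H])              # input (update) gate
    o = sigmoid(gates[:, 2 * H:3 * H])          # output gate
    g = np.tanh(gates[:, 3 * H:4 * H])          # candidate cell state
    c_t = f * c_prev + i * g                    # (B, H)
    h_t = o * np.tanh(c_t)                      # (B, H)
    return h_t, c_t

rng = np.random.default_rng(0)
B, D, H = 4, 90, 64                             # assumed sizes
x_t = rng.standard_normal((B, D))
h_prev = np.zeros((B, H))
c_prev = np.zeros((B, H))
W = rng.standard_normal((H + D, 4 * H)) * 0.01  # hypothetical weights
b = np.zeros(4 * H)

h_t, c_t = lstm_step(x_t, h_prev, c_prev, W, b)
print(x_t.shape, h_t.shape, c_t.shape)
```

Every gate has H columns, so both h_t and c_t come out with H = n_a columns whatever D is.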

Barb · October 11, 2021, 7:24am

Hi @edwardyu,

Thanks for the confirmation, I’m glad I got it right because this was a bit confusing.

Do you know what would happen, in theory, if our densor didn’t use n_values but another number?

What would that mean in terms of our actual data?

edwardyu · October 11, 2021, 12:54pm

Hi @Barb ,

In practice (not only for RNNs but for all multi-class classification tasks), the number of units in the output dense layer equals the number of classes in the task. In our Jazz generation task, there are 90 unique types of “chord”, so the output dense layer has 90 units.
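A small NumPy sketch of what goes wrong otherwise, assuming one-hot labels over 90 classes and a hand-written cross-entropy (the function name, the label values, and the 50-unit case are made up for illustration):

```python
import numpy as np

def cross_entropy(y_true, y_prob):
    # Mean categorical cross-entropy over the batch.
    return float(-np.mean(np.sum(y_true * np.log(y_prob), axis=1)))

m, n_values = 4, 90
labels = np.array([3, 17, 42, 89])       # hypothetical class indices
y_true = np.eye(n_values)[labels]        # one-hot labels, shape (4, 90)

# Output layer with 90 units: predictions line up with the labels.
y_ok = np.full((m, n_values), 1.0 / n_values)
loss_ok = cross_entropy(y_true, y_ok)

# Output layer with 50 units: there is no probability for classes 50..89,
# and the shapes (4, 90) and (4, 50) cannot even be combined.
y_bad = np.full((m, 50), 1.0 / 50)
try:
    cross_entropy(y_true, y_bad)
    mismatch_error = False
except ValueError:
    mismatch_error = True

print(loss_ok, mismatch_error)
```

With fewer units, some classes could never be predicted. With more units, the extra outputs would correspond to classes that never occur in the data.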
