Hidden states of LSTM cells

Hello everyone,

I’m currently doing the Jazz Generation assignment, and I do not really understand what the hidden states of an LSTM cell represent.

I’ve been confused because we call n_a the number of hidden states, yet we call a the input of the softmax function. Is there a link between the two? And does n_a impact the dimension of a?

Best regards

Hi @Barb ,

I hope the following description makes it clearer.

a: hidden state
n_a: dimension of the hidden state

So, if we have m examples, the shape of the hidden state a is (m, n_a).
To generate the categorical output, we transform the hidden state a into a probability distribution y^ through a dense + softmax layer.
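
If it helps, here is a minimal Keras sketch of that idea (illustrative only, not the assignment’s graded code; n_a = 64 and n_values = 90 follow the assignment’s conventions, the rest of the names are mine):

```python
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model

n_a = 64       # dimension of the hidden state a
n_values = 90  # number of unique values (classes)

# LSTM(n_a) returns the last hidden state a with shape (m, n_a);
# the dense + softmax layer maps it to a probability distribution
# y^ with shape (m, n_values).
x = Input(shape=(None, n_values))
a = LSTM(n_a)(x)
y_hat = Dense(n_values, activation="softmax")(a)

model = Model(inputs=x, outputs=y_hat)
print(model.output_shape)  # (None, 90)
```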

Hi @edwardyu, thanks for the answer.

I’ve been looking at other posts and saw this picture:

[image: LSTM cell diagram showing input X (B × D), hidden state h_t, and cell state c_t]

The input X is of size B × D, where D is the size of one input vector. In the Jazz assignment, I guess D would be 90.

However, it seems that D is lost in h_t (a_t in this course’s notation, I guess), as well as in c_t.

What happens to the outputs of the LSTM cell so that they fit the expected dimensions of y^?

Is the dense layer you mentioned used to reshape the activation state to get the 90 dimensions back?

Links:

http://colah.github.io/posts/2015-08-Understanding-LSTMs/

Hi @Barb ,

Yes, you’re right: the dimension of the output y^ is set by the output dense layer, which is Step 2D in our exercise.


BTW, in your post, H = n_a.
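
To make the reshaping concrete, here is a hedged numpy sketch of what the output dense layer does (shapes only; the random weights are placeholders, not the assignment’s densor):

```python
import numpy as np

m, n_a, n_values = 32, 64, 90
a = np.random.randn(m, n_a)         # hidden state output of the LSTM cell
W = np.random.randn(n_a, n_values)  # dense layer weight matrix
b = np.zeros(n_values)              # dense layer bias

logits = a @ W + b                  # shape (m, n_values) == (32, 90)
y_hat = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # softmax
print(y_hat.shape)                  # (32, 90): the 90 dimensions are back
```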

Hi @edwardyu,

Thanks for the confirmation, I’m glad I got it right because this was a bit confusing.

Do you know what would happen, in theory, if our densor used a number other than n_values?

What would that represent in terms of our data?

Hi @Barb ,

In practice (not only for RNNs but for all multi-class classification tasks), the number of units in the output dense layer must match the number of classes in the task. In our Jazz generation task there are 90 unique “chord” values, so the output dense layer has 90 units.
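
Here is a tiny numpy illustration of why the unit count has to match (the numbers are hypothetical, not assignment code):

```python
import numpy as np

n_values = 90
y_true = np.eye(n_values)[5]        # one-hot label: "chord number 5", shape (90,)

# Correct: a 90-unit softmax produces a distribution over all 90 chords,
# so categorical cross-entropy is well defined.
y_hat = np.full(n_values, 1.0 / n_values)
print(-np.sum(y_true * np.log(y_hat)))  # == log(90)

# If the densor had, say, 50 units, y_hat would have shape (50,): the
# element-wise product with the (90,) label would raise a shape error,
# and 40 of the 90 chords could never receive any probability at all.
```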