I’m currently doing the Jazz Generation assignment, and I don’t really understand what the hidden states of an LSTM cell represent.
I’ve been confused because we named n_a the number of hidden states, and we named a the input of the softmax function. Is there a link between the two? And does n_a affect the dimension of a?
I guess the following description is clearer:

a: hidden state
n_a: dimension of the hidden state

So if we have m examples, the shape of the hidden state a is (m, n_a). To generate the categorical output, we transform the hidden state a into a probability distribution y^.
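To make the shapes concrete, here is a minimal numpy sketch of that last step. The sizes (m = 4, n_a = 64, n_values = 90) are illustrative, and the dense-plus-softmax is written out by hand rather than using the assignment’s Keras layers:

```python
import numpy as np

m, n_a, n_values = 4, 64, 90   # illustrative: batch size, hidden units, classes

# Hidden state at one time step: one n_a-dimensional vector per example.
a = np.random.randn(m, n_a)
print(a.shape)                  # (4, 64)

# A dense layer (weights W, bias b) followed by softmax maps the hidden
# state to a probability distribution over the n_values classes.
W = np.random.randn(n_a, n_values)
b = np.zeros(n_values)
logits = a @ W + b
logits -= logits.max(axis=1, keepdims=True)   # for numerical stability
y_hat = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
print(y_hat.shape)              # (4, 90)
print(y_hat.sum(axis=1))        # each row sums to 1
```

So n_a fixes the width of a, while the dense layer’s unit count fixes the width of y^.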
@edwardyu thanks for the answer.
I’ve been looking at other posts and saw this picture:
Input X is of size B x D, where D is the size of one vector. In the Jazz assignment I guess it would be D = n_values = 90.
However, it seems that D is lost in h_t (a_t in this course’s notation, I guess), as well as in c_t.
What happens to the outputs of the LSTM cell so that they fit the expected dimensions of y^?
Is the dense layer you mention used to redimension the activation state, to get the 90 dimensions back?
Answer by Ashutosh Choudhary, M.S. Natural Language Processing & Deep Learning, University of Massachusetts, Amherst (2018)
Yes, you’re right: the dimension of the output y^ is set by the output dense layer, which is Step 2D in our exercise.
BTW, in your post, H = n_a.
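To trace how D “disappears” inside the cell and then comes back, here is a hedged numpy sketch of the shape flow; the tanh projection below is only a stand-in for the full LSTM update, and the sizes are illustrative:

```python
import numpy as np

m, n_values, n_a = 4, 90, 64   # illustrative: batch, input/output size, hidden units

# One input vector per example at time t: D = n_values = 90.
x_t = np.random.randn(m, n_values)

# Inside the cell, x_t is projected to n_a dimensions, so D is no longer
# visible in a_t (or c_t). This tanh is a stand-in for the real LSTM update.
a_t = np.tanh(x_t @ np.random.randn(n_values, n_a))
print(a_t.shape)    # (4, 64) -- the 90 is gone

# The output dense layer ("densor", Step 2D) restores the 90 dimensions.
W_y = np.random.randn(n_a, n_values)
y_hat = a_t @ W_y
print(y_hat.shape)  # (4, 90)
```

The hidden state only ever has width n_a; it is the dense layer on top that maps it back to the 90-dimensional output space.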
Thanks for the confirmation, I’m glad I got it right because this was a bit confusing.
Do you know what would happen, in theory, if our densor used some number other than n_values?
What would that represent for our data?
In practice (not only for RNNs, but for all multi-class classification tasks), the number of units in the output dense layer is determined by the number of classes in the task. In our Jazz generation task there are 90 unique chord values, so the output dense layer has 90 units.
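As a sketch of why the unit count must match the number of classes: with any other number, y^ would be a distribution over the wrong set of classes and could not even be compared to the 90-way one-hot labels. The `densor` helper below is a hypothetical dense-plus-softmax, not the assignment’s actual layer:

```python
import numpy as np

m, n_a, n_classes = 4, 64, 90
a = np.random.randn(m, n_a)
labels = np.eye(n_classes)[np.random.randint(0, n_classes, size=m)]  # 90-way one-hot

def densor(a, units):
    """Hypothetical dense + softmax with `units` output neurons."""
    z = a @ np.random.randn(a.shape[1], units)
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

y_ok = densor(a, n_classes)            # (4, 90): lines up with the labels
print((labels * np.log(y_ok)).shape)   # cross-entropy terms match element-wise

y_bad = densor(a, 80)                  # (4, 80): wrong number of classes
try:
    labels * np.log(y_bad)             # cannot broadcast (4, 90) against (4, 80)
except ValueError as err:
    print("shape mismatch:", err)
```

So a different unit count wouldn’t mean anything for our data; it would simply break the correspondence between the network’s output and the 90 chord classes.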