I’m currently working on the Jazz Generation assignment, and I don’t really understand what the hidden states of an LSTM cell represent.
I’m confused because n_a is called the number of hidden states, while a is the name of the input to the softmax function. Is there a link between the two? And does n_a affect the dimension of a?
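Yes, they are linked: n_a is the number of units in the LSTM cell, i.e. the length of the hidden state vector a that the cell passes from one time step to the next. So, if we have m examples, the hidden state a has shape (m, n_a).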
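To see why n_a fixes the size of a, here is a minimal pure-Python sketch of one LSTM time step, assuming the batch-first (m, n_a) layout used above (the layout Keras uses for an LSTM’s hidden state). Every gate’s weight matrix maps the concatenation [a_prev, x] to n_a columns, so the cell state c and the hidden state a both come out with shape (m, n_a). The gate names, the helper functions, and the sizes n_x = 90 and n_a = 64 are illustrative assumptions, and the weights are random rather than trained.

```python
import math
import random

def matmul(A, B):
    """Multiply an (r x k) matrix by a (k x c) matrix, both stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def affine(X, W, b):
    """X @ W + b, with the bias vector b added to every row."""
    return [[v + b_j for v, b_j in zip(row, b)] for row in matmul(X, W)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_cell_step(x, a_prev, c_prev, p):
    """One LSTM time step in batch-first layout.

    x: (m, n_x), a_prev: (m, n_a), c_prev: (m, n_a).
    Every gate weight in p has shape (n_a + n_x, n_a), so each gate,
    the new cell state and the new hidden state all have n_a columns.
    """
    concat = [a_row + x_row for a_row, x_row in zip(a_prev, x)]  # (m, n_a + n_x)
    f = affine(concat, p["Wf"], p["bf"])  # forget gate (pre-activation)
    u = affine(concat, p["Wu"], p["bu"])  # update gate (pre-activation)
    o = affine(concat, p["Wo"], p["bo"])  # output gate (pre-activation)
    g = affine(concat, p["Wc"], p["bc"])  # candidate cell values (pre-activation)
    a_next, c_next = [], []
    for r in range(len(x)):
        c_row = [sigmoid(f[r][k]) * c_prev[r][k] + sigmoid(u[r][k]) * math.tanh(g[r][k])
                 for k in range(len(c_prev[r]))]
        a_row = [sigmoid(o[r][k]) * math.tanh(c_row[k]) for k in range(len(c_row))]
        c_next.append(c_row)
        a_next.append(a_row)
    return a_next, c_next

# Illustrative sizes: 90 possible input values (one-hot) and an assumed n_a = 64.
m, n_x, n_a = 3, 90, 64
rng = random.Random(0)
params = {}
for gate in ("f", "u", "o", "c"):
    params["W" + gate] = [[rng.uniform(-0.1, 0.1) for _ in range(n_a)] for _ in range(n_a + n_x)]
    params["b" + gate] = [0.0] * n_a

x = [[1.0 if j == r else 0.0 for j in range(n_x)] for r in range(m)]  # one one-hot input per example
a0 = [[0.0] * n_a for _ in range(m)]
c0 = [[0.0] * n_a for _ in range(m)]
a1, c1 = lstm_cell_step(x, a0, c0, params)
print(len(a1), len(a1[0]))  # 3 64
```

Changing n_a changes the width of every row of a1, while the number of rows is always m.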
To generate the categorical output, we transform the hidden state a into a probability distribution y^ with a Dense layer followed by a softmax.
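A minimal pure-Python sketch of that Dense + softmax step, assuming the batch-first layout: the Dense weights have shape (n_a, n_values), so they map a of shape (m, n_a) to logits of shape (m, n_values), and the softmax turns each row into probabilities that sum to 1. In Keras this is typically a Dense(n_values, activation='softmax') layer applied to a. Here n_a = 64 and the random weights are assumptions for illustration, and the function names are hypothetical.

```python
import math
import random

def dense(a, W, b):
    """Dense layer: (m, n_a) @ (n_a, n_values) + b -> (m, n_values) logits."""
    return [[sum(a_k * W[k][j] for k, a_k in enumerate(row)) + b[j] for j in range(len(b))]
            for row in a]

def softmax(z_row):
    """Turn one row of logits into a probability distribution."""
    shift = max(z_row)  # subtracting the max avoids overflow in exp
    exps = [math.exp(z - shift) for z in z_row]
    total = sum(exps)
    return [e / total for e in exps]

m, n_a, n_values = 2, 64, 90  # n_a = 64 is an assumed example size
rng = random.Random(1)
a = [[rng.uniform(-1, 1) for _ in range(n_a)] for _ in range(m)]             # hidden state, (m, n_a)
W = [[rng.uniform(-0.1, 0.1) for _ in range(n_values)] for _ in range(n_a)]  # weights, (n_a, n_values)
b = [0.0] * n_values

y_hat = [softmax(z) for z in dense(a, W, b)]  # (m, n_values)
print(len(y_hat), len(y_hat[0]))  # 2 90
print([round(sum(row), 6) for row in y_hat])  # [1.0, 1.0]
```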
In practice (not only for RNNs but for any multi-class classification task), the number of units in the output Dense layer equals the number of classes. In the Jazz generation task there are 90 unique values (“chords”), so the output Dense layer has 90 units.
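The output width is fixed by the number of classes, while n_a only changes the size of the Dense layer’s weight matrix. The small sketch below counts the layer’s parameters (n_a x 90 weights plus 90 biases, the standard fully connected count) for a few assumed values of n_a, and shows that each of the 90 output units indexes one class. The helper names and the toy probability vector are hypothetical.

```python
def dense_param_count(n_a, n_classes):
    """Parameters of a fully connected layer: n_a x n_classes weights plus n_classes biases."""
    return n_a * n_classes + n_classes

def argmax(probs):
    """Index of the most probable class."""
    return max(range(len(probs)), key=lambda i: probs[i])

def one_hot(index, n_classes):
    """Encode a class index as a one-hot vector of length n_classes."""
    return [1 if i == index else 0 for i in range(n_classes)]

n_classes = 90  # one output unit per unique value ("chord")

# n_a (assumed values here) changes the weight count, never the output width.
for n_a in (32, 64, 128):
    print(n_a, dense_param_count(n_a, n_classes))
# 32 2970
# 64 5850
# 128 11610

probs = [0.0] * n_classes
probs[17] = 1.0  # made-up distribution peaked on class 17, for illustration only
chosen = one_hot(argmax(probs), n_classes)
print(len(chosen), chosen.index(1))  # 90 17
```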