(Jazz Improvisation with LSTM) Why pass the LSTM hidden state to the dense layer to generate predictions?

Hi, in Week 3, Assignment 3, the code passes the LSTM hidden state as input to the dense layer to generate predictions. My question is: why does it use the LSTM hidden state instead of the LSTM cell state?

Based on the part of the assignment you are pointing to, the prediction comes from the output of the final dense layer, which uses a softmax activation:

> **Dense layer**
> Propagate the LSTM’s hidden state through a dense+softmax layer using `densor`
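
For concreteness, here is a minimal sketch of one timestep in Keras that matches that quoted instruction. The name `densor` comes from the assignment text above; the name `LSTM_cell` and the sizes `n_a` and `n_values` are assumptions for illustration:

```python
from tensorflow.keras.layers import Input, LSTM, Dense

n_a = 64       # assumed hidden/cell state size, for illustration
n_values = 90  # assumed vocabulary size (number of musical "values")

# Shared layers: the recurrent cell and the dense+softmax prediction layer.
LSTM_cell = LSTM(n_a, return_state=True)
densor = Dense(n_values, activation="softmax")

# One timestep of input: shape (batch, 1, n_values).
x = Input(shape=(1, n_values))
a0 = Input(shape=(n_a,))  # initial hidden state
c0 = Input(shape=(n_a,))  # initial cell state

# With return_state=True, Keras returns (output, hidden state a, cell state c).
# For a single timestep, the "output" and the hidden state a are the same tensor.
a, _, c = LSTM_cell(x, initial_state=[a0, c0])

# The prediction is computed from the hidden state a; the cell state c is only
# carried along to initialize the next timestep and never reaches the dense layer.
y_hat = densor(a)
```

So the dense layer never sees `c` directly; the cell state affects the prediction only through `a`, which is exactly the point made in the reply below.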

It’s an interesting question. Here are the diagrams and the explanations from the previous “Building Your RNN Step by Step” assignment:

[diagrams of the basic RNN cell and the LSTM cell from that assignment, not shown here]
My interpretation is that they took the base RNN architecture and added the cell “memory state” c^{<t>} and the various “gates” to make it easier for the network to learn more sophisticated behavior. You can see that the a^{<t>} values are derived from the c^{<t>} values via the gates, so both of them do influence the \hat{y}^{<t>} values; it’s just that the influence of the c^{<t>} values is more indirect and happens through the a^{<t>} values.
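
To make that dependence concrete, these are the relevant LSTM equations in the course’s notation (reconstructed from memory here, so treat them as a sketch of the dependency rather than a verbatim copy of the notebook):

$$
\begin{aligned}
\tilde{c}^{<t>} &= \tanh\left(W_c\,[a^{<t-1>}, x^{<t>}] + b_c\right)\\
c^{<t>} &= \Gamma_u \odot \tilde{c}^{<t>} + \Gamma_f \odot c^{<t-1>}\\
a^{<t>} &= \Gamma_o \odot \tanh\left(c^{<t>}\right)\\
\hat{y}^{<t>} &= \mathrm{softmax}\left(W_{ya}\,a^{<t>} + b_y\right)
\end{aligned}
$$

The last two lines show the chain: \hat{y}^{<t>} is computed only from a^{<t>}, but a^{<t>} is itself a gated \tanh of c^{<t>}, so the cell state still shapes every prediction, just indirectly.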
