(Jazz Improvisation with LSTM) Why pass the LSTM hidden state to the dense layer to generate predictions?

Hi, in Week 3, Assignment 3, the code passes the LSTM hidden state as input to the dense layer to generate predictions. My question is: why does it use the LSTM hidden state instead of the LSTM cell state?

Based on the part of the assignment you are pointing to, the prediction comes from the output of the final dense layer, which uses a softmax activation:

> **Dense layer**
> Propagate the LSTM’s hidden state through a dense+softmax layer using `densor`
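
For concreteness, here is a minimal sketch of one timestep in Keras that matches that quoted instruction. The name `densor` comes from the assignment text above; the name `LSTM_cell` and the sizes `n_a` and `n_values` are assumptions for illustration:

```python
from tensorflow.keras.layers import Input, LSTM, Dense

n_a = 64       # assumed hidden/cell state size, for illustration
n_values = 90  # assumed vocabulary size (number of musical "values")

# Shared layers: the recurrent cell and the dense+softmax prediction layer.
LSTM_cell = LSTM(n_a, return_state=True)
densor = Dense(n_values, activation="softmax")

# One timestep of input: shape (batch, 1, n_values).
x = Input(shape=(1, n_values))
a0 = Input(shape=(n_a,))  # initial hidden state
c0 = Input(shape=(n_a,))  # initial cell state

# With return_state=True, Keras returns (output, hidden state a, cell state c).
# For a single timestep, the "output" and the hidden state a are the same tensor.
a, _, c = LSTM_cell(x, initial_state=[a0, c0])

# The prediction is computed from the hidden state a; the cell state c is only
# carried along to initialize the next timestep and never reaches the dense layer.
y_hat = densor(a)
```

So the dense layer never sees `c` directly; the cell state affects the prediction only through `a`, which is exactly the point made in the reply below.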

It’s an interesting question. Here are the diagrams and the explanations from the previous “Building Your RNN Step by Step” assignment:

[diagrams of the basic RNN cell and the LSTM cell from that assignment, not shown here]
My interpretation is that they took the base RNN architecture and added the cell “memory state” c^{<t>} and the various “gates” to make it easier for the network to learn more sophisticated behavior. You can see that the a^{<t>} values are derived from the c^{<t>} values via the gates, so both of them do influence the \hat{y}^{<t>} values; it’s just that the influence of the c^{<t>} values is more indirect and happens through the a^{<t>} values.
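
To make that dependence concrete, these are the relevant LSTM equations in the course’s notation (reconstructed from memory here, so treat them as a sketch of the dependency rather than a verbatim copy of the notebook):

$$
\begin{aligned}
\tilde{c}^{<t>} &= \tanh\left(W_c\,[a^{<t-1>}, x^{<t>}] + b_c\right)\\
c^{<t>} &= \Gamma_u \odot \tilde{c}^{<t>} + \Gamma_f \odot c^{<t-1>}\\
a^{<t>} &= \Gamma_o \odot \tanh\left(c^{<t>}\right)\\
\hat{y}^{<t>} &= \mathrm{softmax}\left(W_{ya}\,a^{<t>} + b_y\right)
\end{aligned}
$$

The last two lines show the chain: \hat{y}^{<t>} is computed only from a^{<t>}, but a^{<t>} is itself a gated \tanh of c^{<t>}, so the cell state still shapes every prediction, just indirectly.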
