Hi,
I have successfully completed the first programming assignment from week 3, but I’m trying to fully understand the concepts before moving on.
The main thing I’m not sure I understood correctly is the parameters “return_sequences” and “return_state” of the LSTM layers.
My current understanding is:
- We set “return_sequences” to True in the pre-attention LSTM so that it outputs the hidden state a^&lt;t’&gt; for every timestep t’, which we need in order to calculate all the attention weights (the alphas)
- In the post-attention LSTM, we set “return_state” to True so that we can get the cell state back, and we leave “return_sequences” as False because each step only needs one output to calculate one y_pred at a time, instead of producing them all together (see the sketch below)
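To make this concrete, here is a minimal Keras sketch of how I currently picture the model fitting together; the layer sizes, variable names, and the attention helper are made up for illustration, not the assignment’s actual code:

```python
from tensorflow.keras.layers import (Bidirectional, Concatenate, Dense, Dot,
                                     Input, LSTM, RepeatVector, Softmax)
from tensorflow.keras.models import Model

# All of these sizes are made up for the sketch.
Tx, Ty = 30, 10           # input / output sequence lengths
n_a, n_s = 32, 64         # pre- and post-attention LSTM hidden sizes
vocab_size = 11           # output vocabulary size

# Pre-attention Bi-LSTM: return_sequences=True so we get a^<t'> for EVERY
# timestep t' -- the attention mechanism needs all of them for the alphas.
X = Input(shape=(Tx, vocab_size))
a = Bidirectional(LSTM(n_a, return_sequences=True))(X)   # (batch, Tx, 2*n_a)

# Shared layers for one attention step (names are just illustrative).
repeator = RepeatVector(Tx)
concatenator = Concatenate(axis=-1)
densor1 = Dense(10, activation="tanh")
densor2 = Dense(1, activation="relu")
activator = Softmax(axis=1)   # softmax over the Tx axis so the alphas sum to 1
dotor = Dot(axes=1)

def one_step_attention(a, s_prev):
    """Build one context vector from all hidden states a and the previous s."""
    s_prev = repeator(s_prev)              # (batch, Tx, n_s)
    concat = concatenator([a, s_prev])     # (batch, Tx, 2*n_a + n_s)
    e = densor1(concat)
    energies = densor2(e)                  # (batch, Tx, 1)
    alphas = activator(energies)           # attention weights over all t'
    context = dotor([alphas, a])           # (batch, 1, 2*n_a)
    return context

# Post-attention LSTM: return_state=True so each call hands back the hidden
# state s and cell state c; return_sequences stays False because each call
# consumes a single timestep (one context vector) and emits one output.
post_lstm = LSTM(n_s, return_state=True)
output_layer = Dense(vocab_size, activation="softmax")

s0 = Input(shape=(n_s,))
c0 = Input(shape=(n_s,))
s, c = s0, c0
outputs = []
for t in range(Ty):
    context = one_step_attention(a, s)
    # The returned s and c are threaded back in as the next initial_state.
    s, _, c = post_lstm(context, initial_state=[s, c])
    outputs.append(output_layer(s))        # one y_pred per loop iteration

model = Model(inputs=[X, s0, c0], outputs=outputs)
model.summary()
```

The loop calls the same post_lstm layer Ty times, so its weights are shared across all output timesteps.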
Is my understanding correct?
And why do we need the cell state if we never use it in the model?