I am having trouble understanding the LSTM layer's return_sequences argument. The following is from the official documentation:
return_sequences: Boolean. Whether to return the last output in the output sequence, or the full sequence. Default: False.
What is the output sequence? When should we use the value True and when should we use the value False?
Thank you for your attention.
The question is whether you return the hidden states from all the "timesteps" or only the hidden state from the last timestep of the sequence. For example, here we have 10 "timesteps", because each input sentence is limited to 10 words. We have two LSTM layers, and the first one feeds into the second one. The comments in the template code tell you whether you are supposed to return the full sequence or only the last hidden state.
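To make the difference concrete, here is a minimal sketch in plain Python. It is a toy scalar recurrent layer, not an LSTM, and the function name `toy_rnn` and the weights `w` and `u` are made up for illustration. It only shows what "all timesteps" versus "last timestep" means for the returned value.

```python
import math

def toy_rnn(inputs, return_sequences=False, w=0.5, u=0.8):
    """Toy scalar recurrent layer (NOT an LSTM): h_t = tanh(w * x_t + u * h_{t-1})."""
    h = 0.0
    states = []
    for x in inputs:              # one iteration per timestep
        h = math.tanh(w * x + u * h)
        states.append(h)
    # Full sequence: one state per timestep; otherwise only the final state.
    return states if return_sequences else states[-1]

sentence = [0.1 * t for t in range(10)]   # stand-in for a 10-word input
all_states = toy_rnn(sentence, return_sequences=True)
last_state = toy_rnn(sentence, return_sequences=False)

print(len(all_states))                # 10, one state per timestep
print(last_state == all_states[-1])   # True, False just keeps the final one
```

So the two settings run the same computation; they differ only in how much of it is handed to the next layer.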
The fundamental idea is that in this case we are trying to distill the answer down to a single value which is the selection of the appropriate emoji for the meaning of the sentence. So it makes sense that the later LSTM layer only outputs one state, but the input to that second LSTM layer benefits from being able to learn from all the transitions in the complete sequence of hidden states of the first LSTM layer.
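A rough Keras sketch of that two-layer arrangement is below. The sizes (10 timesteps, 50-dimensional word vectors, 128 units, 5 emoji classes) are assumptions for illustration and this is not the assignment's template code.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Assumed sizes: 10 timesteps, 50-dim word vectors, 5 output classes.
model = keras.Sequential([
    keras.Input(shape=(10, 50)),
    # First LSTM passes every timestep's hidden state to the next layer.
    layers.LSTM(128, return_sequences=True),
    # Second LSTM keeps only its last hidden state, one vector per sentence.
    layers.LSTM(128, return_sequences=False),
    layers.Dense(5, activation="softmax"),
])

batch = np.random.rand(4, 10, 50).astype("float32")  # 4 fake sentences
probs = model(batch)
print(probs.shape)  # (4, 5): one class distribution per sentence
```

If the first layer used return_sequences=False instead, the second LSTM would receive a 2D tensor and Keras would raise an error, because an LSTM layer expects 3D input.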
I would just add, since I found this change in language a bit confusing at first: by "hidden states", the True/False setting means "give me the activations from every timestep, or only the last one".
I mean yes, they are "hidden". I think we know that by now. But that is what you are actually getting.
*The LSTM that @paulinpaloalto is talking about might be more complicated. I don't know: will it also give you the intermediate cell states?
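Yes, "hidden states" is what we called them in C5 Week 1, and they are the same thing as the activations that each timestep outputs. One caveat for the LSTM: the cell state is a separate quantity from the hidden state. Setting return_sequences=True gives you the hidden state at every timestep, not the intermediate cell states. Keras exposes the cell state only through return_state=True, and only for the last timestep.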
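On the cell-state question, a small check with the Keras LSTM layer may help. This is a sketch with arbitrary sizes (batch of 2, 10 timesteps, 8 features, 4 units). With return_state=True the layer additionally returns the final hidden state and the final cell state.

```python
import numpy as np
from tensorflow.keras import layers

x = np.random.rand(2, 10, 8).astype("float32")  # (batch, timesteps, features)

lstm = layers.LSTM(4, return_sequences=True, return_state=True)
seq, h_last, c_last = lstm(x)

print(seq.shape)     # (2, 10, 4): hidden state at every timestep
print(h_last.shape)  # (2, 4): final hidden state
print(c_last.shape)  # (2, 4): final cell state only

# The last entry of the returned sequence is the final hidden state.
same = np.allclose(np.asarray(seq)[:, -1, :], np.asarray(h_last))
print(same)
```

The returned sequence contains hidden states only; the cell states from earlier timesteps are not part of the output.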
I had issues with my code as well due to this seemingly cryptic statement in the documentation. What I eventually understood, as you may have already realized, is that setting True returns the full sequence while setting False returns only the last output in the sequence. Returning the full sequence means returning the output at every timestep, so the result is 3D (batch, timesteps, units), while returning only the last output makes the result 2D (batch, units).
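A quick way to see the two shapes is to call the layer on random data. The sizes here (batch of 32, 10 timesteps, 50 features, 64 units) are arbitrary choices for the sketch.

```python
import numpy as np
from tensorflow.keras import layers

x = np.random.rand(32, 10, 50).astype("float32")  # (batch, timesteps, features)

full = layers.LSTM(64, return_sequences=True)(x)
last = layers.LSTM(64)(x)   # return_sequences defaults to False

print(full.shape)  # (32, 10, 64): 3D, one output per timestep
print(last.shape)  # (32, 64): 2D, only the last timestep's output
```

Printing the output shape like this is usually the fastest way to debug a shape mismatch between stacked recurrent layers.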