Could anyone please clarify why the 2nd LSTM layer has return_sequences=False?
So what output are we expecting from this layer, and where is it passed?
Also, the model diagram shows the output of this layer going outside the network, which adds to the confusion. Some clarity here would be appreciated.
I have the same question, so I am reopening this thread as it wasn’t really answered.
What is the purpose of the return_sequences parameter? This is not explained well in the documentation.
Why do we set it to True in the first LSTM layer, and False in the second?
Related to this question (and possibly the answer): in a Week 1 assignment, we used a loop to build a recurrent model. Here we don't, but I assume that Keras somehow does it automatically.
A little more detailed explanation, and an answer to the original post, would be appreciated.
I was able to figure out the answer myself.
Since you have the same query, let me clarify it for you.
The assignment is a classification problem, since we want to map each input sentence to an emoji.
Let's say we keep the model simple, with only 1 LSTM layer. When the output of the embedding layer is passed as input to the LSTM, a single output vector is expected, hence the argument return_sequences=False.
In this assignment, the model is a bit more complex, with an additional LSTM layer. An LSTM layer consumes one input per timestep of the input sentence, so it expects a sequence as input rather than a single vector. Hence the first layer sets return_sequences=True, and its full sequence of outputs is passed as input to the 2nd LSTM layer. In the 2nd layer we set return_sequences=False, because we want a single output, i.e., the output from the last timestep of that LSTM layer.
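Here is a minimal sketch of this two-layer setup (the layer sizes, sentence length, and class count below are illustrative assumptions, not necessarily the assignment's exact values):

```python
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense
from tensorflow.keras.models import Model

max_len = 10       # assumed max sentence length (timesteps)
vocab_size = 400   # assumed vocabulary size
emb_dim = 50       # assumed embedding dimension

inputs = Input(shape=(max_len,), dtype="int32")  # (batch, 10) word indices
x = Embedding(vocab_size, emb_dim)(inputs)       # (batch, 10, 50)
x = LSTM(128, return_sequences=True)(x)          # (batch, 10, 128): one vector per timestep
x = LSTM(128, return_sequences=False)(x)         # (batch, 128): last timestep only
outputs = Dense(5, activation="softmax")(x)      # (batch, 5): one emoji class per sentence
model = Model(inputs, outputs)
model.summary()  # shows the timestep dimension disappearing after the 2nd LSTM
```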
I hope this clarifies your doubt as well.
To my second question: how do activations get propagated across the LSTM cells in the sequence? (In a Week 1 assignment we had to do it programmatically, in a loop.)
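For context, this is roughly the kind of loop I mean: a simplified vanilla-RNN sketch, not the exact assignment code (the assignment used LSTM cells and batched inputs):

```python
import numpy as np

def rnn_forward(x, a0, Wax, Waa, ba):
    """x: (T_x, n_x) input sequence; a0: (n_a,) initial hidden state."""
    a = a0
    activations = []
    for t in range(x.shape[0]):                 # explicit loop over timesteps
        a = np.tanh(Wax @ x[t] + Waa @ a + ba)  # each step consumes the previous state
        activations.append(a)
    return np.stack(activations)                # (T_x, n_a): one activation per timestep
```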
I don’t think it has anything to do with the shape of the input. From the documentation:
“By default, the output of a RNN layer contains a single vector per sample… The shape of this output is (batch_size, units).”
“A RNN layer can also return the entire sequence of outputs for each sample (one vector per timestep per sample), if you set return_sequences=True. The shape of this output is (batch_size, timesteps, units).”
So the reason we set return_sequences=True is because we want to output a vector for each word (aka timestep) in each sentence sample, not a vector for each sentence sample by default.
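A quick shape check makes this concrete (the dimensions here are hypothetical, not the assignment's):

```python
import numpy as np
import tensorflow as tf

# (batch_size, timesteps, features) = (32, 10, 50)
x = np.random.rand(32, 10, 50).astype("float32")

seq_out = tf.keras.layers.LSTM(64, return_sequences=True)(x)
last_out = tf.keras.layers.LSTM(64, return_sequences=False)(x)

print(seq_out.shape)   # (32, 10, 64): one vector per timestep per sample
print(last_out.shape)  # (32, 64): one vector per sample (last timestep)
```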
And that raises the question: why do we need this form of output in the first LSTM but not the second? It would be great if anyone could share their thoughts on this.
[quote="realnoob, post:8, topic:73088"]
So the reason we set `return_sequences=True` is because we want to output a vector for each word (aka timestep) in each sentence sample, not a vector for each sentence sample by default.
[/quote]
Yes, you have rightly explained why we set return_sequences to True in the first place.