Why didn’t we add a mean layer after the embedding layer in the model in this week’s assignment?
The whole point of RNNs is to capture the information in the sequence - the information flow.
If we used a mean layer after the embedding, we would just average the “meanings” (embeddings) of the words, and no sequence information would survive - it would be the same thing as Bag of Words.
In other words, we don’t want the mean of the whole sequence, we want to process the whole sequence step by step.
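Here is a tiny sketch of what “step by step” buys us (not assignment code - the embeddings and weights below are made up): even a single toy recurrent update makes the final state depend on the order of the words, which is exactly what a plain mean can never do.

```python
import numpy as np

# Made-up 2-d "embeddings" for three words (purely illustrative).
emb = {
    "not":   np.array([1.0, 0.0]),
    "good":  np.array([0.0, 1.0]),
    "movie": np.array([0.5, 0.5]),
}

# A minimal recurrent update h_t = tanh(W_h h_{t-1} + W_x x_t) with fixed toy weights.
W_h = np.array([[0.8, -0.3], [0.2, 0.7]])
W_x = np.array([[0.5,  0.1], [-0.4, 0.9]])

def final_state(words):
    h = np.zeros(2)
    for w in words:                       # process the sequence step by step
        h = np.tanh(W_h @ h + W_x @ emb[w])
    return h

# Same words, different order -> different final states: the recurrence "sees" word order.
print(final_state(["movie", "not", "good"]))
print(final_state(["movie", "good", "not"]))
```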
But in the previous week’s assignment, C3_W1_Assignment, we did use a mean layer after the embedding layer. Aren’t we losing information in that case, since we are reducing our trainable parameters?
You are correct, @Ashish_Siwach. In the previous week the “Bag of Words” approach was used. The reason behind this, I think, was to introduce learners to the trax library with a simple model - just one linear layer - to predict the sentiment of the tweet.
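From memory, that model was roughly something like the sketch below (the layer names are trax’s, but the sizes here are just placeholders, not the assignment’s values):

```python
import trax.layers as tl

# Rough sketch of the "Bag of Words"-style sentiment classifier (placeholder sizes).
model = tl.Serial(
    tl.Embedding(vocab_size=10000, d_feature=256),  # token ids -> embedding vectors
    tl.Mean(axis=1),     # average over the sequence axis -> one vector per tweet
    tl.Dense(2),         # the single linear layer
    tl.LogSoftmax(),     # log-probabilities for negative / positive sentiment
)
```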
In other words, in the previous week the tweet was reduced to a single embedding vector, and as a result we lost some information: the tweets “true, the movie was not good” and “not true, the movie was good” contain exactly the same words, so the mean layer would reduce them to the same vector, and they would get the same sentiment prediction.
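To make that concrete, here is a small numpy sketch (with random made-up embeddings, not the assignment’s trained ones) showing that the two reordered tweets end up with exactly the same mean vector:

```python
import numpy as np

# Made-up 3-d embeddings, just for illustration.
rng = np.random.default_rng(0)
vocab = ["true", "not", "the", "movie", "was", "good"]
emb = {w: rng.normal(size=3) for w in vocab}

tweet_a = "true the movie was not good".split()
tweet_b = "not true the movie was good".split()

mean_a = np.mean([emb[w] for w in tweet_a], axis=0)
mean_b = np.mean([emb[w] for w in tweet_b], axis=0)

print(np.allclose(mean_a, mean_b))  # True: same words -> same mean, order is lost
```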
If the goal had been the performance of the model, the architecture would definitely have been more sophisticated.
Thanks @arvyzukai for the detailed explanations. I got good clarity on a few concepts I was confused about. Really appreciate the effort.