In Exercise 2, why do we need to specify the initial_state for the post-attention LSTM layer and include the initial hidden state and cell state as Input layers, but not for the pre-attention bidirectional LSTM layer? I think both are initialized with zeros, so there should be no need to specify them explicitly, right?
In attention-based architectures, the initial state of the post-attention LSTM is often specified explicitly so that decoding can start from a meaningful state. The pre-attention bidirectional LSTM usually starts from zero states, and its weights adapt during training; the post-attention layer, by contrast, can benefit from an initialization informed by the attention context. Setting its initial state from the context vectors produced by the attention mechanism lets the post-attention LSTM start decoding with relevant information, which helps the network focus on specific parts of the input sequence and may improve overall performance.
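A practical point that may also matter here: a recurrent layer called once on a whole sequence can start from zeros and carry its state internally, whereas a decoder that calls the cell one output step at a time has to be handed the previous state on every call (in Keras this is done through the `initial_state` call argument). The sketch below illustrates that difference with a toy scalar LSTM in plain Python. It is not the exercise's code; the weights and the function names `lstm_step`, `run_whole_sequence`, and `run_step_by_step` are made up for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, s, c):
    """One step of a toy scalar LSTM with fixed, made-up weights (assumption)."""
    f = sigmoid(0.5 * x + 0.4 * s)    # forget gate
    i = sigmoid(0.3 * x - 0.2 * s)    # input gate
    o = sigmoid(0.1 * x + 0.6 * s)    # output gate
    g = math.tanh(0.7 * x + 0.5 * s)  # candidate cell value
    c_new = f * c + i * g             # new cell state
    s_new = o * math.tanh(c_new)      # new hidden state
    return s_new, c_new


def run_whole_sequence(xs, s0=0.0, c0=0.0):
    """Analogue of one layer call on the full sequence:
    the state defaults to zeros and is carried internally."""
    s, c = s0, c0
    for x in xs:
        s, c = lstm_step(x, s, c)
    return s, c


def run_step_by_step(xs, s0, c0, carry_state=True):
    """Analogue of a decoder loop that calls the cell once per output step.
    With carry_state=False, every call starts again from zeros."""
    s, c = s0, c0
    for x in xs:
        if not carry_state:
            s, c = 0.0, 0.0  # no state handed in: the cell forgets the past
        s, c = lstm_step(x, s, c)
    return s, c


xs = [1.0, -0.5, 2.0]
whole = run_whole_sequence(xs)                          # zeros by default
carried = run_step_by_step(xs, 0.0, 0.0, carry_state=True)
reset = run_step_by_step(xs, 0.0, 0.0, carry_state=False)

print("whole sequence :", whole)
print("state carried  :", carried)
print("state reset    :", reset)
```

In this toy setup, carrying the state through the loop reproduces the whole-sequence result, while resetting it on every call does not. Starting both from zeros is still a valid choice; the explicit `s0` and `c0` only give the loop a first state to pass along.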
Thank you so much for your reply!
So setting s0 and c0 to zeros in the 32nd code cell is just for grading purposes or to simplify this exercise. Normally s0 and c0 would be more informed. Am I understanding this correctly?