When we implement the model, why do we only define s0 (the initial hidden state) and c0 (the initial cell state), each with shape (n_s,), for the decoder LSTM (the post-attention LSTM), but not for the pre-attention LSTM?
For the pre-attention LSTM, we never define an initial hidden state or initial cell state explicitly.
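For reference, here is a minimal Keras sketch of the pattern I'm asking about. The dimensions, layer names, and the simplified one-layer attention scorer are my own placeholders rather than the exact assignment code; the point is only that the post-attention LSTM is called with `initial_state=[s, c]` (seeded by `s0`, `c0`), while the pre-attention Bidirectional LSTM is called with no `initial_state` at all.

```python
from tensorflow.keras.layers import (Input, LSTM, Bidirectional, Dense, Softmax,
                                     Dot, Concatenate, RepeatVector)
from tensorflow.keras.models import Model

# Hypothetical dimensions, just for illustration
Tx, Ty = 30, 10            # input / output sequence lengths
n_a, n_s = 32, 64          # pre- / post-attention LSTM hidden sizes
in_vocab, out_vocab = 37, 11

X  = Input(shape=(Tx, in_vocab))
s0 = Input(shape=(n_s,), name="s0")   # explicit initial hidden state (post-attention only)
c0 = Input(shape=(n_s,), name="c0")   # explicit initial cell state (post-attention only)
s, c = s0, c0

# Pre-attention Bi-LSTM: no initial_state is passed, so Keras defaults
# both its hidden and cell states to zero tensors.
a = Bidirectional(LSTM(n_a, return_sequences=True))(X)

# Shared layers for a simplified attention step (structure is my own placeholder)
repeat      = RepeatVector(Tx)
concat      = Concatenate(axis=-1)
score_dense = Dense(1)                 # scores one energy value per encoder timestep
attn_softmax = Softmax(axis=1)         # normalize over the Tx timesteps
dot         = Dot(axes=1)
post_lstm   = LSTM(n_s, return_state=True)          # post-attention LSTM cell
out_dense   = Dense(out_vocab, activation="softmax")

outputs = []
for t in range(Ty):
    # Attention weights over the Tx encoder activations, conditioned on s
    s_rep   = repeat(s)
    e       = score_dense(concat([a, s_rep]))
    alphas  = attn_softmax(e)
    context = dot([alphas, a])          # shape (batch, 1, 2*n_a)

    # Post-attention LSTM: its state is threaded across decoding steps,
    # so we pass initial_state explicitly; s0/c0 seed the very first step.
    s, _, c = post_lstm(context, initial_state=[s, c])
    outputs.append(out_dense(s))

model = Model(inputs=[X, s0, c0], outputs=outputs)
model.summary()
```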