Week 3 Assignment 1 Neural_machine_translation_with_attention

When we implement the model, why do we only define s0 (the initial hidden state) and c0 (the initial cell state) for the decoder's post-attention LSTM, each with shape (n_s,), but not for the pre-attention LSTM?

For the pre-attention (bidirectional) LSTM, we don't define the initial hidden and cell states explicitly: it processes the entire input sequence in a single Keras call, and Keras initializes an LSTM's states to zeros by default. The post-attention LSTM, by contrast, is unrolled by hand — it is called one timestep at a time inside a for loop — so we must create s0 and c0 ourselves and feed the updated s and c back in as the initial state at every step.
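To make the difference concrete, here is a minimal NumPy sketch (not the assignment's Keras code; all sizes and the random "context vector" are made up for illustration) of why the manually unrolled post-attention LSTM needs explicit s0 and c0: the loop has to thread the states from one step to the next, so they must exist before the first iteration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, s_prev, c_prev, p):
    """One LSTM cell step (standard gate equations)."""
    concat = np.concatenate([s_prev, x])
    f = sigmoid(p["Wf"] @ concat + p["bf"])      # forget gate
    i = sigmoid(p["Wi"] @ concat + p["bi"])      # update gate
    c_tilde = np.tanh(p["Wc"] @ concat + p["bc"])
    c = f * c_prev + i * c_tilde                 # new cell state
    o = sigmoid(p["Wo"] @ concat + p["bo"])      # output gate
    s = o * np.tanh(c)                           # new hidden state
    return s, c

n_s, n_x, Ty = 4, 3, 5                           # toy sizes, chosen arbitrarily
rng = np.random.default_rng(0)
p = {k: rng.standard_normal((n_s, n_s + n_x)) for k in ("Wf", "Wi", "Wc", "Wo")}
p.update({k: np.zeros(n_s) for k in ("bf", "bi", "bc", "bo")})

# Post-attention LSTM: unrolled by hand, one call per output word, so the
# states must be created explicitly and passed through the loop.
s, c = np.zeros(n_s), np.zeros(n_s)              # s0, c0 with shape (n_s,)
for t in range(Ty):
    context = rng.standard_normal(n_x)           # stand-in for the attention context
    s, c = lstm_step(context, s, c, p)

print(s.shape, c.shape)                          # each remains (n_s,)
```

In Keras, calling an LSTM layer over a whole sequence (as the pre-attention Bi-LSTM does) hides exactly this zero initialization inside the layer, which is why no s0/c0 appears for it in the model code.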

Hopefully you were able to resolve your question.