Why use a Bi-directional LSTM in the encoder and not in the pre-attention decoder?

In the 3rd exercise of the C4W1 assignment, it states that we need to use a Bi-directional LSTM only in the encoder and not in the pre-attention decoder. I am failing to understand why we can't use a Bi-directional LSTM there as well.

In sequence-to-sequence models with attention, the choice to use a Bi-directional LSTM (BiLSTM) in the encoder but not in the decoder follows from the different roles these two components play: the encoder reads the input, while the decoder generates the output.

Encoder

  • Role: The encoder processes the entire input sequence and creates a context or representation that summarizes the input information.
  • Bi-directional LSTM: Using a BiLSTM in the encoder is beneficial because it allows the model to capture dependencies from both past and future contexts for each time step in the input sequence. This is particularly useful for understanding the entire input sequence before generating any output.
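
As an illustration only (not the assignment's code), here is a minimal Keras sketch with made-up vocabulary and dimension sizes. Wrapping the encoder LSTM in `Bidirectional` is what lets every input position see both its left and right context:

```python
import tensorflow as tf

VOCAB_SIZE, EMBED_DIM, UNITS = 10_000, 256, 128  # illustrative sizes

# Encoder: embed the source tokens and run a BiLSTM over them,
# so each position's representation sees both past and future context.
encoder_inputs = tf.keras.Input(shape=(None,), dtype="int32")
x = tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM)(encoder_inputs)
encoder_outputs = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(UNITS, return_sequences=True)
)(x)  # shape: (batch, src_len, 2 * UNITS)
```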

Decoder

  • Role: The decoder generates the output sequence one token at a time, conditioned on the tokens it has already produced.
  • Uni-directional LSTM: In the decoder, each output token depends on the previously generated tokens, so a unidirectional LSTM is used: it processes the sequence in a forward-only manner, which matches the causal nature of generation. At inference time the future output tokens simply do not exist yet, so a backward pass over them would have nothing to read; during training it would let the model peek at the tokens it is supposed to predict. A sketch contrasting this with the encoder follows below.
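
Continuing the sketch above (again, names and dimensions are illustrative, not the assignment's), the pre-attention decoder is a plain forward LSTM over the shifted target tokens, and its hidden states act as the attention queries against the bidirectional encoder outputs:

```python
# Pre-attention decoder: a forward-only LSTM over the shifted target tokens.
# Future target tokens are unavailable at generation time, so no backward pass.
decoder_inputs = tf.keras.Input(shape=(None,), dtype="int32")
y = tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM)(decoder_inputs)
decoder_queries = tf.keras.layers.LSTM(2 * UNITS, return_sequences=True)(y)

# Attention: decoder states query the bidirectional encoder outputs.
context = tf.keras.layers.Attention()([decoder_queries, encoder_outputs])

# Combine query and context to predict the next token at each step.
logits = tf.keras.layers.Dense(VOCAB_SIZE)(
    tf.keras.layers.Concatenate()([decoder_queries, context])
)
model = tf.keras.Model([encoder_inputs, decoder_inputs], logits)
```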