Why use a Bi-directional LSTM in the encoder and not in the pre-attention decoder?

In the 3rd exercise of the C4W1 assignment, it states that we need to use a Bi-directional LSTM only in the encoder and not in the pre-attention decoder. I am failing to understand why we can't use a Bi-directional LSTM there as well.

In sequence-to-sequence models with attention, the choice to use a Bi-directional LSTM (BiLSTM) in the encoder but not in the decoder follows from the different roles these two components play: the encoder reads the input, while the decoder generates the output.

Encoder

  • Role: The encoder processes the entire input sequence and creates a context or representation that summarizes the input information.
  • Bi-directional LSTM: Using a BiLSTM in the encoder is beneficial because it allows the model to capture dependencies from both past and future contexts for each time step in the input sequence. This is particularly useful for understanding the entire input sequence before generating any output.
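
As an illustration only (not the assignment's code), here is a minimal Keras sketch with made-up vocabulary and dimension sizes. Wrapping the encoder LSTM in `Bidirectional` is what lets every input position see both its left and right context:

```python
import tensorflow as tf

VOCAB_SIZE, EMBED_DIM, UNITS = 10_000, 256, 128  # illustrative sizes

# Encoder: embed the source tokens and run a BiLSTM over them,
# so each position's representation sees both past and future context.
encoder_inputs = tf.keras.Input(shape=(None,), dtype="int32")
x = tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM)(encoder_inputs)
encoder_outputs = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(UNITS, return_sequences=True)
)(x)  # shape: (batch, src_len, 2 * UNITS)
```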

Decoder

  • Role: The decoder generates the output sequence one token at a time, conditioned on the tokens it has already produced.
  • Uni-directional LSTM: In the decoder, each output token depends on the previously generated tokens, so a unidirectional LSTM is used: it processes the sequence in a forward-only manner, which matches the causal nature of generation. At inference time the future output tokens simply do not exist yet, so a backward pass over them would have nothing to read; during training it would let the model peek at the tokens it is supposed to predict. A sketch contrasting this with the encoder follows below.
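
Continuing the sketch above (again, names and dimensions are illustrative, not the assignment's), the pre-attention decoder is a plain forward LSTM over the shifted target tokens, and its hidden states act as the attention queries against the bidirectional encoder outputs:

```python
# Pre-attention decoder: a forward-only LSTM over the shifted target tokens.
# Future target tokens are unavailable at generation time, so no backward pass.
decoder_inputs = tf.keras.Input(shape=(None,), dtype="int32")
y = tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM)(decoder_inputs)
decoder_queries = tf.keras.layers.LSTM(2 * UNITS, return_sequences=True)(y)

# Attention: decoder states query the bidirectional encoder outputs.
context = tf.keras.layers.Attention()([decoder_queries, encoder_outputs])

# Combine query and context to predict the next token at each step.
logits = tf.keras.layers.Dense(VOCAB_SIZE)(
    tf.keras.layers.Concatenate()([decoder_queries, context])
)
model = tf.keras.Model([encoder_inputs, decoder_inputs], logits)
```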