Hello!
Just finished week 4 material.
I am confused by the use of a bidirectional LSTM in the model for text generation. When generating text, the model always looks forward, using the previous words (say, words 0, 1, …, t-1) to predict the next one (word t). The words after that (t+1, t+2, …) do not yet exist, so they cannot help with predicting word t. How, then, can a bidirectional model be of any use for making predictions?
Am I missing something here?
Let’s say you input 5 words to your text generation model. An LSTM layer looks at these words in the order 0 -> t-1 to generate the output at the next timestep, t. A bidirectional LSTM looks at the inputs from both directions, i.e. 0 -> t-1 and t-1 -> 0, to generate the output at the next timestep, t.
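To make that concrete, here is a minimal Keras sketch (with made-up sizes, not the course notebook's) comparing the two layer types. With default settings, the Bidirectional wrapper runs a second LSTM over the reversed input and concatenates the two outputs, which is why the output width doubles:

import tensorflow as tf

inputs = tf.random.normal((1, 5, 16))  # (batch, timesteps, features) -- made-up sizes

# Unidirectional LSTM: reads the 5 words left to right only.
uni = tf.keras.layers.LSTM(8)
print(uni(inputs).shape)  # (1, 8)

# Bidirectional LSTM: a forward LSTM reads 0 -> t-1, a backward LSTM reads
# t-1 -> 0, and their outputs are concatenated, doubling the output width.
bi = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(8))
print(bi(inputs).shape)   # (1, 16)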
Thank you, Balaji, for the quick response.
I understand that, but the whole point of using bidirectional cells is to allow context from what comes later in the sequence. When predicting word t, if you don’t have words t+1, t+2, …, I guess you just use vectors of 0s for them, which defeats the purpose of using a bidirectional layer.
Sure. Bidirectional LSTM would be far more effective than vanilla LSTM when it comes to predicting a word, say, in the middle of a sentence.
However, for end-of-sequence prediction (i.e. text generation), a vanilla LSTM is a good choice over a bidirectional LSTM, since it is faster to train and has fewer parameters than a bidirectional LSTM layer, while reaching the same level of accuracy, as you pointed out.
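For a rough sense of the size difference, here is a small sketch (hypothetical layer sizes) that just compares parameter counts; the Bidirectional wrapper holds a forward and a backward LSTM, so it ends up with roughly twice the weights:

import tensorflow as tf

x = tf.random.normal((1, 5, 16))  # made-up sizes: 5 timesteps, 16 features

uni = tf.keras.layers.LSTM(64)
bi = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64))
uni(x)  # call the layers once so their weights get built
bi(x)

print(uni.count_params())  # 20736
print(bi.count_params())   # 41472 -- roughly double, one LSTM per direction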
One detail. Bidirectional layers are meant to carry information over longer input sequences than their unidirectional variants. You can read about it here. It’s worth experimenting to see if training with a bidirectional layer is worth the extra parameters.
Yeah, that’s what I thought.
It would be different if you had more than one layer, e.g.:
tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(100, return_sequences=True)),
tf.keras.layers.LSTM(100)
Because then the later words in the input string would help work out the meaning of earlier words in the second LSTM. Then the second LSTM would pass this information forward to the last cell.
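For illustration, a stacked setup along those lines could look like the sketch below (vocab_size, embedding_dim and max_len are placeholders, not values from the lab):

import tensorflow as tf

# Placeholder sizes, not taken from the lab.
vocab_size, embedding_dim, max_len = 5000, 64, 20

model = tf.keras.Sequential([
    tf.keras.Input(shape=(max_len,)),
    tf.keras.layers.Embedding(vocab_size, embedding_dim),
    # Returns the whole sequence, so each position's output already mixes
    # left-to-right and right-to-left context.
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(100, return_sequences=True)),
    # Reads that enriched sequence forward and carries it to the last step.
    tf.keras.layers.LSTM(100),
    tf.keras.layers.Dense(vocab_size, activation='softmax'),
])
model.summary()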
I don’t really think C3_W4_Lab_1.ipynb should be using bidirectional with only one LSTM layer.
There’s one detail to note here.
Bidirectional layers lead to more stable gradient propagation across longer sentences since the sentence is processed from both directions.
So, there’s nothing wrong with using them in place of unidirectional LSTMs.
See this as well.
Sure. My point is that if you have one bidirectional LSTM and you are only predicting the last token, then only one cell of the backward LSTM gets used.
Please don’t mix up the unravelled (unrolled) view of an RNN with its actual representation.
The unrolled view is meant for understanding purposes; the same RNN cell and weights are used across all timesteps (see BPTT).
Since the same cell is reused at every timestep, it helps with better learning, especially across longer sequences.
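As a tiny illustration of that weight sharing (made-up sizes): the same LSTM weights are applied at every timestep, so the parameter count does not change with the sequence length.

import tensorflow as tf

layer = tf.keras.layers.LSTM(32)

short_seq = tf.random.normal((1, 5, 16))   # 5 timesteps, 16 features
long_seq = tf.random.normal((1, 500, 16))  # 500 timesteps, same features

layer(short_seq)
print(layer.count_params())  # 6272
layer(long_seq)              # the same cell/weights are reused at all 500 steps
print(layer.count_params())  # still 6272 -- no extra weights per timestep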