C5W3A1 Neural Machine Translation

wanglanbuaasae · January 11, 2024, 11:41pm

In Exercise 2, why do we need to specify initial_state for the post-attention LSTM layer and pass the initial hidden state and cell state in as Input layers, but not for the pre-attention bidirectional LSTM layer? I think both are initialized with zeros, so there should be no need to specify them explicitly, right?

Muhammad_John_Abbas · January 12, 2024, 2:32am

Hi @wanglanbuaasae
In architectures with attention mechanisms, specifying the initial state for the post-attention LSTM layer is often necessary for the model to generate meaningful outputs. The pre-attention bidirectional LSTM layers commonly start from zero states and adapt during training, whereas the post-attention layer can benefit from an informed initialization based on the attention context. When its initial state is set from the context vectors produced by the attention mechanism, the post-attention LSTM starts decoding with relevant information, which lets the network focus on specific parts of the input sequence and can improve overall performance.

Regards

wanglanbuaasae · January 12, 2024, 8:06am

Thank you so much for your reply!

So setting s0 and c0 to zeros in the 32nd code cell is just for grading purposes or to simplify the exercise, and normally s0 and c0 would be more informed. Am I understanding correctly?
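
For anyone reading along, here is a rough NumPy sketch of one mechanical difference between the two layers. It is not the assignment's code: `lstm_step`, `run_sequence`, the weights, and all the sizes are made up for illustration. It assumes the pre-attention LSTM sees the whole input sequence in a single call, so its zero initial state is created and carried internally, while the post-attention LSTM is called once per output timestep inside a Python loop, so its state has to be passed in and handed back explicitly on every call, even when the starting values s0 and c0 are just zeros.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, s, c, W, U, b):
    """One LSTM timestep (hypothetical helper): returns new hidden state s and cell state c."""
    n = s.shape[-1]
    z = x @ W + s @ U + b              # pre-activations for all four gates
    i = sigmoid(z[:, 0 * n:1 * n])     # input gate
    f = sigmoid(z[:, 1 * n:2 * n])     # forget gate
    g = np.tanh(z[:, 2 * n:3 * n])     # candidate cell values
    o = sigmoid(z[:, 3 * n:4 * n])     # output gate
    c_new = f * c + i * g
    s_new = o * np.tanh(c_new)
    return s_new, c_new

def run_sequence(xs, W, U, b, n):
    """Whole sequence in one call: state starts at zeros and is carried internally."""
    s = np.zeros((xs.shape[0], n))
    c = np.zeros((xs.shape[0], n))
    for t in range(xs.shape[1]):
        s, c = lstm_step(xs[:, t], s, c, W, U, b)
    return s, c

rng = np.random.default_rng(0)
m, T, d, n = 2, 5, 3, 4                # batch, timesteps, input size, hidden size (arbitrary)
W = rng.normal(size=(d, 4 * n))
U = rng.normal(size=(n, 4 * n))
b = np.zeros(4 * n)
xs = rng.normal(size=(m, T, d))

# (a) Single call over the full sequence: no initial state needs to be supplied.
s_full, c_full = run_sequence(xs, W, U, b, n)

# (b) One call per timestep: s0/c0 are explicit zeros and the state is threaded by hand.
s0 = np.zeros((m, n))
c0 = np.zeros((m, n))
s, c = s0, c0
for t in range(T):
    s, c = lstm_step(xs[:, t], s, c, W, U, b)

# (c) One call per timestep WITHOUT carrying state: the last step restarts from zeros.
s_reset, c_reset = lstm_step(xs[:, T - 1], s0, c0, W, U, b)

print("threaded state matches single call:", np.allclose(s, s_full))
print("restarting from zeros matches single call:", np.allclose(s_reset, s_full))
```

In this sketch (a) and (b) both start from zeros and end in the same state. The only difference is who carries the state between timesteps, which would explain why explicit state inputs are needed for a layer that is called step by step and not for one that consumes the full sequence at once.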
