-
For the pre-attention Bi-LSTM, don't we need to initialize the hidden state a<0> to zeros as well? Yet we only explicitly initialize the hidden state s<0> and cell state c<0> of the post-attention LSTM. Why is that?
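To make the question concrete, here is a rough sketch of the two patterns I mean (not the assignment's actual code; the layer sizes and the dummy "context" slice are placeholders, and the real model computes the context with an attention mechanism):

```python
from tensorflow.keras.layers import Input, LSTM, Bidirectional
from tensorflow.keras.models import Model

Tx, Ty = 30, 10    # input/output sequence lengths (placeholder values)
n_a, n_s = 32, 64  # pre-/post-attention hidden sizes (placeholder values)
vocab = 11

X = Input(shape=(Tx, vocab))
s0 = Input(shape=(n_s,))  # post-attention hidden state, supplied by the caller
c0 = Input(shape=(n_s,))  # post-attention cell state, supplied by the caller

# Pre-attention Bi-LSTM: consumes the whole sequence in a single call.
# No initial_state is passed, so Keras defaults a<0> to zeros internally.
a = Bidirectional(LSTM(n_a, return_sequences=True))(X)

# Post-attention LSTM: the same layer object is called once per output step,
# so its state must be threaded through the loop explicitly -- hence s0/c0.
post_lstm = LSTM(n_s, return_state=True)
s, c = s0, c0
outputs = []
for t in range(Ty):
    # placeholder for the attention-computed context vector, just to keep
    # this sketch runnable
    context = a[:, t:t + 1, :]
    s, _, c = post_lstm(context, initial_state=[s, c])
    outputs.append(s)

model = Model(inputs=[X, s0, c0], outputs=outputs)
```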
-
In Exercise 3, the text says "outputs[i][j] is the true label of the jth character in the ith training example." However, after the line `outputs = list(Yoh.swapaxes(0, 1))`, shouldn't it be the ith character in the jth training example, since the shape of outputs has become (10, 10000, 11)?
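A quick numpy check of what I mean (the shapes are the ones quoted above; the specific indices and label are made up for illustration):

```python
import numpy as np

m, Ty, vocab = 10000, 10, 11
Yoh = np.zeros((m, Ty, vocab))   # one-hot labels, shape (m, Ty, vocab)
Yoh[3, 7] = np.eye(vocab)[5]     # example 3, character 7 -> class 5

outputs = list(Yoh.swapaxes(0, 1))  # Ty arrays, each of shape (m, vocab)

# After the swap, the first index is the time step and the second the example:
assert outputs[7][3].argmax() == 5  # outputs[i][j] = char i of example j
```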
Thank you!