W3A1: several questions

  1. For the pre-attention Bi-LSTM step, don't we need to initialize the hidden state a<0> to zeros? But we do have to initialize the hidden state s<0> and cell state c<0> for the post-attention LSTM. Why is that?

  2. In Exercise 3, it says “outputs[i][j] is the true label of the jth character in the ith training example.” However, after the line outputs = list(Yoh.swapaxes(0,1)), shouldn't it be the ith character in the jth training example, since the shape of outputs becomes (10, 10000, 11)?

Thank you!

Hi Chixing_Wei,

a is defined in the following line:

a = Bidirectional(LSTM(units=n_a, return_sequences=True))(X)

As described here, if the LSTM layer is not given a list of initial state tensors, its first call creates zero-filled initial state tensors automatically.
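Here is a minimal sketch of the two cases, assuming the Keras API and the assignment's variable names (the dimension values below are placeholders for illustration):

from tensorflow.keras.layers import Input, LSTM, Bidirectional

Tx, n_a, n_s, human_vocab_size = 30, 32, 64, 37  # placeholder dimensions

X = Input(shape=(Tx, human_vocab_size))
s0 = Input(shape=(n_s,), name='s0')  # initial hidden state for the post-attention LSTM
c0 = Input(shape=(n_s,), name='c0')  # initial cell state for the post-attention LSTM

# Pre-attention Bi-LSTM: no initial_state argument, so Keras creates
# zero-filled initial states on the first call.
a = Bidirectional(LSTM(units=n_a, return_sequences=True))(X)

# Post-attention LSTM: return_state=True so s and c can be fed back in
# explicitly at each of the Ty decoding steps, starting from s0 and c0.
post_attention_LSTM_cell = LSTM(n_s, return_state=True)
s, c = s0, c0
# inside the loop over the Ty output steps (context comes from the attention block):
#     s, _, c = post_attention_LSTM_cell(context, initial_state=[s, c])

The difference comes down to how the two layers are called: the Bi-LSTM runs once over the whole input sequence, so its zero initial state can be created implicitly, while the post-attention LSTM is called once per output step inside a loop, so its states have to be threaded through explicitly, starting from s0 and c0.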

As to your second question,

outputs = list(Yoh.swapaxes(0,1))

turns Yoh, of shape (10000, 10, 11), into a list of length 10 (Ty) containing arrays of shape (10000, 11) (m, machine vocab size). So outputs[0] is the first array in the list, i.e., the labels of the first output character across all training examples; outputs[0][0] is the true label of the first character in the first training example, outputs[0][1] the true label of the first character in the second training example, and so on. You are right: after the swap, outputs[i][j] is the true label of the ith character in the jth training example, so the notebook's wording has the two indices transposed.
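You can verify the shapes with a quick numpy check, using m = 10000, Ty = 10, and 11 output classes as in the assignment:

import numpy as np

m, Ty, n_classes = 10000, 10, 11
Yoh = np.zeros((m, Ty, n_classes))   # one-hot labels, shape (m, Ty, 11)

outputs = list(Yoh.swapaxes(0, 1))   # list of Ty arrays, each of shape (m, 11)

print(len(outputs))         # 10 -> one entry per output character position
print(outputs[0].shape)     # (10000, 11) -> first character across all examples
print(outputs[0][1].shape)  # (11,) -> first character of the second example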