-
For the pre-attention Bi-LSTM, don't we need to initialize the hidden state a<0> to zeros as well? Yet we only explicitly initialize the hidden state s<0> and cell state c<0> of the post-attention LSTM. Why is that?
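To make the question concrete, here is a rough sketch of the two patterns I mean (not the assignment's actual code; the layer sizes and the dummy "context" slice are placeholders, and the real model computes the context with an attention mechanism):

```python
from tensorflow.keras.layers import Input, LSTM, Bidirectional
from tensorflow.keras.models import Model

Tx, Ty = 30, 10    # input/output sequence lengths (placeholder values)
n_a, n_s = 32, 64  # pre-/post-attention hidden sizes (placeholder values)
vocab = 11

X = Input(shape=(Tx, vocab))
s0 = Input(shape=(n_s,))  # post-attention hidden state, supplied by the caller
c0 = Input(shape=(n_s,))  # post-attention cell state, supplied by the caller

# Pre-attention Bi-LSTM: consumes the whole sequence in a single call.
# No initial_state is passed, so Keras defaults a<0> to zeros internally.
a = Bidirectional(LSTM(n_a, return_sequences=True))(X)

# Post-attention LSTM: the same layer object is called once per output step,
# so its state must be threaded through the loop explicitly -- hence s0/c0.
post_lstm = LSTM(n_s, return_state=True)
s, c = s0, c0
outputs = []
for t in range(Ty):
    # placeholder for the attention-computed context vector, just to keep
    # this sketch runnable
    context = a[:, t:t + 1, :]
    s, _, c = post_lstm(context, initial_state=[s, c])
    outputs.append(s)

model = Model(inputs=[X, s0, c0], outputs=outputs)
```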
-
In Exercise 3, the text says "outputs[i][j] is the true label of the jth character in the ith training example." However, after the line `outputs = list(Yoh.swapaxes(0, 1))`, shouldn't it be the ith character in the jth training example, since the shape of outputs has become (10, 10000, 11)?
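A quick numpy check of what I mean (the shapes are the ones quoted above; the specific indices and label are made up for illustration):

```python
import numpy as np

m, Ty, vocab = 10000, 10, 11
Yoh = np.zeros((m, Ty, vocab))   # one-hot labels, shape (m, Ty, vocab)
Yoh[3, 7] = np.eye(vocab)[5]     # example 3, character 7 -> class 5

outputs = list(Yoh.swapaxes(0, 1))  # Ty arrays, each of shape (m, vocab)

# After the swap, the first index is the time step and the second the example:
assert outputs[7][3].argmax() == 5  # outputs[i][j] = char i of example j
```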
Thank you!