W1A1: Hidden State in RNN

I got bitten by the (at least to me) unexpected hidden state in the W1A1 `rnn_forward` function. I expected the `a_prev` value at t=0 (or t=1 in the lecture) to be all zeros. I got it working, but I still don't understand why a non-zero initial state is needed, even after reading the explanation in the exercise text.

As I understand the history: RNNs started with zero initialization of the hidden state, and it worked reasonably well. But researchers found that zero initialization caused a large loss in the early timesteps of each sequence, which meant the "real" loss signal for model learning was hidden inside it. So some researchers moved to random initialization, and others made the initial state a trainable variable learned along with the other parameters.
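To make the three options concrete, here is a minimal NumPy sketch of an RNN forward pass where the initial hidden state `a0` can be supplied explicitly. The function name and shapes loosely follow the assignment's conventions, but the exact signature and the helper variables (`a0_zero`, `a0_noisy`) are my own illustration, not the assignment's code:

```python
import numpy as np

def rnn_forward(x, Wax, Waa, ba, a0=None):
    """Minimal RNN forward pass (hidden states only, no output layer).
    x has shape (n_x, m, T_x); a0 is the initial hidden state.
    If a0 is None, it defaults to zeros, as in the assignment."""
    n_a = Waa.shape[0]
    n_x, m, T_x = x.shape
    a_prev = np.zeros((n_a, m)) if a0 is None else a0
    a = np.zeros((n_a, m, T_x))
    for t in range(T_x):
        a_prev = np.tanh(Wax @ x[:, :, t] + Waa @ a_prev + ba)
        a[:, :, t] = a_prev
    return a

# Toy shapes and parameters (assumed values, for illustration only)
n_a, n_x, m, T_x = 4, 3, 2, 5
rng = np.random.default_rng(0)
Wax = 0.1 * rng.standard_normal((n_a, n_x))
Waa = 0.1 * rng.standard_normal((n_a, n_a))
ba = np.zeros((n_a, 1))
x = rng.standard_normal((n_x, m, T_x))

# The three initialization strategies from the article:
a0_zero = np.zeros((n_a, m))                     # "Zero": all-zeros start
a0_noisy = 0.1 * rng.standard_normal((n_a, m))   # "Noisy": random start
# "Variable": a0 would be a trainable parameter updated by backprop
# alongside Wax/Waa/ba; that needs a training loop, so it is only
# noted here, not implemented.
```

The point is just that `a0` is an input like any other: the assignment happens to pass zeros, but nothing in the recurrence itself requires that choice.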

Here is an article we can refer to: Non-Zero Initial States for Recurrent Neural Networks.

The chart above is from the linked article. "Zero" means zero initialization, "Variable" means a trainable initial state, and "Noisy" means random initialization.

Hope this helps.