Building_a_Recurrent_Neural_Network_Step_by_Step: what is the purpose of da0 before return gradients

Hey there,

It seems to me that `da0` (the one right before `return gradients`) doesn't serve any purpose, or does it have a special job? It shows up in both:

- `rnn_backward()`
- `lstm_backward()`
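
For context, here is a minimal sketch of where that line sits in `rnn_backward` (NumPy; the cell backward and cache layout here are my own simplified assumptions, not the notebook's exact code):

```python
import numpy as np

def rnn_cell_backward(da_next, cache):
    """Backward pass through one tanh RNN cell (simplified sketch)."""
    a_next, a_prev, xt, parameters = cache
    Wax, Waa = parameters["Wax"], parameters["Waa"]
    dtanh = (1 - a_next ** 2) * da_next        # derivative through tanh
    return {
        "dxt": Wax.T @ dtanh,                  # gradient w.r.t. the input x<t>
        "da_prev": Waa.T @ dtanh,              # gradient flowing back to a<t-1>
        "dWax": dtanh @ xt.T,
        "dWaa": dtanh @ a_prev.T,
        "dba": np.sum(dtanh, axis=1, keepdims=True),
    }

def rnn_backward(da, caches):
    """da: (n_a, m, T_x) upstream gradients on every hidden state a<t>."""
    caches_list, x = caches
    n_a, m, T_x = da.shape
    n_x = x.shape[0]
    dx = np.zeros((n_x, m, T_x))
    dWax, dWaa = np.zeros((n_a, n_x)), np.zeros((n_a, n_a))
    dba = np.zeros((n_a, 1))
    da_prevt = np.zeros((n_a, m))
    for t in reversed(range(T_x)):
        g = rnn_cell_backward(da[:, :, t] + da_prevt, caches_list[t])
        dx[:, :, t] = g["dxt"]
        dWax += g["dWax"]
        dWaa += g["dWaa"]
        dba += g["dba"]
        da_prevt = g["da_prev"]
    da0 = da_prevt   # <-- the line I mean: gradient w.r.t. the initial state a0
    return {"dx": dx, "da0": da0, "dWax": dWax, "dWaa": dWaa, "dba": dba}
```

As far as I can tell, `da0` is just the accumulated gradient of the loss with respect to the initial hidden state `a0`, and I don't see it being used anywhere after the return.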

Correct me if I am wrong :sweat:

Do you still have an open question here?