In Building_a_Recurrent_Neural_Network_Step_by_Step, the given equations compute the derivatives of many variables. But I don't understand why we need to compute "dxt". We don't need to change the value of xt, right?
I think it is given for completeness. For a fully connected layer, we usually compute dA0 = dX as well, even though we don't always need that information. It only becomes necessary when the input itself is produced by an earlier trainable layer (e.g. an embedding, or a lower layer in a stacked RNN): in that case dxt is exactly the gradient that the earlier layer needs to continue backpropagation.
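To make this concrete, here is a minimal NumPy sketch of one RNN-cell backward step in the assignment's notation (Wax, Waa, ba, a_prev, xt). The shapes and random values are illustrative assumptions, not the assignment's test case. Note that dxt is computed alongside the parameter gradients but isn't used to update anything in this cell; it would only be consumed by a layer below:

```python
import numpy as np

np.random.seed(0)
n_x, n_a, m = 3, 5, 10  # input size, hidden size, batch size (assumed)

# Forward pass of one RNN cell: a_next = tanh(Wax @ xt + Waa @ a_prev + ba)
xt = np.random.randn(n_x, m)
a_prev = np.random.randn(n_a, m)
Wax = np.random.randn(n_a, n_x)
Waa = np.random.randn(n_a, n_a)
ba = np.random.randn(n_a, 1)
a_next = np.tanh(Wax @ xt + Waa @ a_prev + ba)

# Backward pass, given da_next = dL/da_next from upstream
da_next = np.random.randn(n_a, m)
dtanh = (1 - a_next ** 2) * da_next      # gradient through tanh

dWax = dtanh @ xt.T                       # needed to update Wax
dWaa = dtanh @ a_prev.T                   # needed to update Waa
dba = dtanh.sum(axis=1, keepdims=True)    # needed to update ba
da_prev = Waa.T @ dtanh                   # propagates gradient back in time

# dxt updates nothing here; it is the gradient a lower layer
# (embedding, stacked RNN, etc.) would need to keep backpropagating.
dxt = Wax.T @ dtanh
print(dxt.shape)  # (3, 10), same shape as xt
```

So in a single-layer model dxt is indeed unused, and the notebook computes it mainly for completeness and for the general, stacked case.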