This is just my interpretation, which is probably worth exactly what you paid for it, but I’d say the point is not that the gradient needs to be “propagated” from t = 10 back to t = 2; it’s that gradients get generated by the errors at every time step, right? And then we apply them (as you say) to the one shared set of weights. Of course, as we discussed very recently on that other thread, the manner in which we actually apply the gradients is arguably a bit sloppy. But it seems to work. “Close enough for jazz”, apparently …
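Here’s a minimal sketch of what I mean, assuming PyTorch (the cell sizes, the readout layer, and the per-step targets are all made up for illustration): every timestep contributes a loss, and a single backward pass leaves the accumulated gradient from all of those steps sitting on the one shared weight matrix.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy setup: one shared RNN cell unrolled over T timesteps (hypothetical sizes).
rnn_cell = nn.RNNCell(input_size=3, hidden_size=5)
readout = nn.Linear(5, 1)
loss_fn = nn.MSELoss()

T = 10
xs = torch.randn(T, 1, 3)        # an input at every timestep
targets = torch.randn(T, 1, 1)   # a target (hence an error) at every timestep

h = torch.zeros(1, 5)
total_loss = 0.0
for t in range(T):
    h = rnn_cell(xs[t], h)                          # same cell, same weights, every step
    total_loss = total_loss + loss_fn(readout(h), targets[t])

total_loss.backward()

# The single shared weight matrix has gradient contributions from the
# errors at all T timesteps, not from one distant step "propagated back".
print(rnn_cell.weight_hh.grad.shape)   # torch.Size([5, 5])
```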
Coordinating state between two distant timesteps is exactly what the LSTM is specifically designed to facilitate. And of course the weights for the various LSTM “gates” are included in what we are updating.
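To make that last point concrete (again assuming PyTorch, and a made-up 3-in / 5-hidden layer): the gate weights aren’t separate from the trainable parameters, they *are* the trainable parameters, stacked four gates deep in each weight tensor, so any optimizer step over `lstm.parameters()` updates all of the gates at once.

```python
import torch.nn as nn

lstm = nn.LSTM(input_size=3, hidden_size=5)

# weight_ih_l0 / weight_hh_l0 stack the input, forget, cell, and output
# gate weights (4 * hidden_size rows each), so a single optimizer step
# over lstm.parameters() updates every gate's weights together.
for name, p in lstm.named_parameters():
    print(name, tuple(p.shape))
# weight_ih_l0 (20, 3)
# weight_hh_l0 (20, 5)
# bias_ih_l0 (20,)
# bias_hh_l0 (20,)
```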