Backpropagation of LSTM

Lawal_Samsudeen_O · September 15, 2022, 3:48am

When implementing the lstm_backward function, why do we use only dc_prev and not include the dc of that time step? I was thinking that lstm_backward should take (da, dc, dy, caches) as its arguments, where da is the gradient of the loss with respect to a, dc with respect to c, and dy with respect to y, and da, dc, and dy are all three-dimensional arrays. If these arguments were given, I would expect the argument passed to lstm_cell_backward to be dc[:, :, t] + dc_prev rather than just dc_prev. Can anyone please help clarify where I am wrong?

reinoudbosch · October 17, 2022, 10:03pm

Hi Lawal_Samsudeen_O,

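c_next at time step Tx is not relevant to calculating the gradient on its own: the calculation of the gradient starts from the final output, which does not depend on c_next at Tx directly but only through the hidden state a. For the same reason there is no separate upstream dc for each time step. The gradient reaching the cell state at step t is the part that comes through a at that step, which lstm_cell_backward derives from da, plus the dc_prev carried back from step t+1. At the last time step there is no later step, so the carried gradient starts at zero.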
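To make the bookkeeping concrete, here is a minimal sketch on a toy recurrence. It is not the assignment's code: the gates are fixed scalars (f, i, o are assumed constants rather than learned gates), and the names forward, backward, and numeric_grad are hypothetical. It only illustrates that when the loss depends on the hidden states alone, the cell-state gradient is fully accounted for by the da path plus the carried dc_prev, and that an extra dc[t] term would be needed only if the loss read the cell state directly.

```python
import numpy as np

def forward(x, f=0.9, i=0.5, o=0.7):
    """Toy cell-state recurrence with fixed scalar gates (not a full LSTM)."""
    c = np.zeros(len(x) + 1)            # c[0] is the initial cell state
    a = np.zeros(len(x))
    for t in range(len(x)):
        c[t + 1] = f * c[t] + i * x[t]  # cell state update
        a[t] = o * np.tanh(c[t + 1])    # hidden state read from the cell state
    return a, c

def backward(da, c, dc=None, f=0.9, i=0.5, o=0.7):
    """Backprop through time for the toy recurrence.

    da[t] is the upstream gradient on a[t]. dc[t], if given, is a
    hypothetical direct upstream gradient on the cell state at step t.
    """
    T = len(da)
    dx = np.zeros(T)
    dc_prev = 0.0                       # no later step feeds the last cell state
    for t in reversed(range(T)):
        # Gradient on c[t+1]: the path through a[t] plus what step t+1 carried back.
        dc_t = da[t] * o * (1.0 - np.tanh(c[t + 1]) ** 2) + dc_prev
        if dc is not None:
            dc_t += dc[t]               # only needed if the loss uses c directly
        dx[t] = dc_t * i
        dc_prev = dc_t * f              # carried back to step t-1
    return dx

def numeric_grad(loss, x, eps=1e-6):
    """Central finite-difference gradient of a scalar loss with respect to x."""
    g = np.zeros_like(x)
    for k in range(len(x)):
        step = np.zeros_like(x)
        step[k] = eps
        g[k] = (loss(x + step) - loss(x - step)) / (2 * eps)
    return g

x = np.array([0.3, -1.2, 0.8, 0.5])
a, c = forward(x)
# Loss = sum of hidden states, so da is all ones and no direct dc is passed.
dx = backward(np.ones_like(a), c)
print(np.allclose(dx, numeric_grad(lambda v: forward(v)[0].sum(), x)))
```

The finite-difference check compares the analytic gradient with a numerical one for a loss that depends only on a. Passing dc as well corresponds to a different loss, one that also depends on the cell states themselves, which is the situation the question describes.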
