When implementing the lstm_backward function, why do we use only dc_prev and not include the dc of that time step? I was thinking the lstm_backward function should take (da, dc, dy, caches) as arguments, where da is the gradient of the loss with respect to a, dc with respect to c, and dy with respect to y, and da, dc, and dy are all three-dimensional arrays. If these arguments were given, I would expect the argument passed to lstm_cell_backward to be dc[:, :, t] + dc_prev, rather than just dc_prev. Can anyone please help clarify where I am wrong?

Hi Lawal_Samsudeen_O,

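c_next at time step Tx is not relevant to calculating the gradient: backpropagation starts from the final output, and the loss does not depend on c_next at Tx directly, only through a. There is therefore no separate dc for each time step to add in. The only cell-state gradient is the one carried back from the following step (dc_prev), and it starts at zero at Tx.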
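
To make this concrete, here is a minimal sketch on a toy recurrence, not the assignment's LSTM cell. The names `forward`, `backward`, `dc_direct`, and the fixed scalar `f` are made up for illustration. The cell state follows c_t = f * c_{t-1} + x_t and the output is a_t = tanh(c_t). If the loss depends only on a, the cell-state gradient carried backward starts at zero and no per-step dc is needed. A term like dc[:, :, t] would only be added if the loss also depended on c directly.

```python
import numpy as np

def forward(x, f, c0):
    """Toy recurrence (an assumption, not the real LSTM): c_t = f*c_{t-1} + x_t, a_t = tanh(c_t)."""
    c_prev = c0
    cs, a = [], []
    for t in range(len(x)):
        c_t = f * c_prev + x[t]
        cs.append(c_t)
        a.append(np.tanh(c_t))
        c_prev = c_t
    return np.array(cs), np.array(a)

def backward(da, cs, f, dc_direct=None):
    """Return dL/dc0.

    da[t]        : dL/da_t coming from the loss
    dc_direct[t] : dL/dc_t coming *directly* from the loss, if the loss uses c at all
    """
    dc_next = 0.0  # nothing comes after the last time step, so start from zero
    for t in reversed(range(len(da))):
        # gradient reaching c_t: through a_t, plus whatever flowed back from step t+1
        dc_total = da[t] * (1.0 - np.tanh(cs[t]) ** 2) + dc_next
        if dc_direct is not None:
            dc_total += dc_direct[t]  # only needed when the loss depends on c directly
        dc_next = f * dc_total  # gradient with respect to c_{t-1}
    return dc_next

# Loss = sum of a_t: no direct dependence on c, so dc_direct is left out.
x = np.array([0.5, -0.3, 0.8, 0.1])
f, c0 = 0.9, 0.2
cs, a = forward(x, f, c0)
grad_c0 = backward(np.ones_like(a), cs, f)
```

In the first case (loss built from a only), `grad_c0` matches a finite-difference check without any per-step dc term, which is the situation in the assignment. Passing `dc_direct` corresponds to the dc[:, :, t] + dc_prev idea from the question, and is right only for a loss that reads c itself.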