LSTM backpropagation confusion

I am confused about W1A1 “Building your Recurrent Neural Network - Step by Step”, 3.2 - LSTM Backward Pass. In the two backward functions, lstm_cell_backward(da_next, dc_next, cache) and lstm_backward(da, caches), I did not see the gradients of Wy and by (i.e., dWy and dby), which I believe should also be updated. I checked other resources and did find backpropagation equations for dWy and dby (LSTM Back-Propagation Derivation | Kartik Shenoy | Medium). Did anybody else have this confusion? If I missed anything in the notebook, please let me know; I would really appreciate it.

They explain at the beginning of the backprop section that they aren’t covering the full path from the loss here. Here’s the relevant quote:

Note that this notebook does not implement the backward path from the Loss 'J' backwards to 'a'. This would have included the dense layer and softmax, which are a part of the forward path. This is assumed to be calculated elsewhere and the result passed to rnn_backward in 'da'. It is further assumed that loss has been adjusted for batch size (m) and division by the number of examples is not required here.

That applies to both the RNN and LSTM sections.
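
In other words, dWy and dby belong to the dense + softmax output layer, which the notebook assumes is handled elsewhere; only the resulting da is passed into lstm_backward. As a rough illustration (not the notebook’s actual code), here is a minimal NumPy sketch of what that “elsewhere” step could look like, assuming an output ŷ⟨t⟩ = softmax(Wy a⟨t⟩ + by) with cross-entropy loss. The function name output_layer_backward and the shape conventions (n_a, m, T_x) are my own assumptions, not part of the assignment:

```python
import numpy as np

def softmax(z):
    # Column-wise softmax over the class dimension (axis 0)
    e = np.exp(z - np.max(z, axis=0, keepdims=True))
    return e / np.sum(e, axis=0, keepdims=True)

def output_layer_backward(a, y, Wy, by):
    """
    Hypothetical sketch of the step the notebook assumes is done elsewhere:
    backprop from the loss J through the dense + softmax output layer.

    a  -- hidden states from the LSTM forward pass, shape (n_a, m, T_x)
    y  -- one-hot true labels, shape (n_y, m, T_x)
    Wy -- dense-layer weights, shape (n_y, n_a)
    by -- dense-layer bias, shape (n_y, 1)

    Returns dWy, dby, and da (the array passed into lstm_backward).
    """
    n_a, m, T_x = a.shape
    dWy = np.zeros_like(Wy)
    dby = np.zeros_like(by)
    da = np.zeros_like(a)

    for t in range(T_x):
        y_hat = softmax(Wy @ a[:, :, t] + by)   # predictions at step t
        dz = y_hat - y[:, :, t]                 # softmax + cross-entropy gradient
        dWy += dz @ a[:, :, t].T                # accumulate over time steps
        dby += np.sum(dz, axis=1, keepdims=True)
        da[:, :, t] = Wy.T @ dz                 # gradient flowing back into the LSTM

    return dWy, dby, da
```

With the output layer handled like this, da is exactly what lstm_backward expects in its da argument, and dWy/dby would be applied in a separate parameter-update step outside the LSTM backward functions. (Consistent with the quoted note, this sketch does not divide by the number of examples.)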