C5W1A1 LSTM gates gradients WRONG?

Dennis_Greenberg · October 28, 2022, 7:54pm

Hello, I ran into an issue while deriving the LSTM gradients.
Is the formula given in the assignment wrong?
Any input would be appreciated.

sonnh1902 · October 30, 2022, 9:32am

Could you point out the part you found suspicious, please?

Dennis_Greenberg · November 3, 2022, 10:27am

Yes. I tried to derive the LSTM gradients (even though I don’t have much experience with matrix calculus) and noticed that the formula for dgammau given in section “3.2 - LSTM Backward Pass”, “gates gradients” subsection,
actually comes out as the following sum of derivative chains:
(dJ/da)(da/dc)(dc/dGAMMAu)(dGAMMAu/dgammau) + (dJ/dc)(dc/dGAMMAu)(dGAMMAu/dgammau)
which equals 2 * dJ/dgammau,
because, according to the chain rule,
dJ/dgammau = (dJ/da)(da/dc)(dc/dGAMMAu)(dGAMMAu/dgammau) = (dJ/dc)(dc/dGAMMAu)(dGAMMAu/dgammau).
That is, if I’m not mistaken, the given formula is two times the actual derivative. I suppose that would still work, but I don’t understand why it’s done that way.
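
One way to test whether the formula is off by a factor of two is a finite-difference check. The sketch below is not from the assignment. It assumes the standard LSTM cell equations (c_next = GAMMAf * c_prev + GAMMAu * c_tilde, a_next = GAMMAo * tanh(c_next)), treats da_next and dc_next as two separate upstream gradients, and holds the other gates fixed. All names (z_u for the update gate pre-activation, surrogate_loss, and so on) are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(z_u, gamma_f, gamma_o, c_tilde, c_prev):
    # Assumed standard LSTM cell equations; only the update gate depends on z_u here.
    gamma_u = sigmoid(z_u)
    c_next = gamma_f * c_prev + gamma_u * c_tilde
    a_next = gamma_o * np.tanh(c_next)
    return a_next, c_next, gamma_u

def surrogate_loss(z_u, cell, da_next, dc_next):
    # A loss that is linear in a_next and c_next, so its gradient w.r.t. a_next
    # is da_next and its direct gradient w.r.t. c_next is dc_next.
    a_next, c_next, _ = forward(z_u, *cell)
    return np.sum(da_next * a_next) + np.sum(dc_next * c_next)

def analytic_grad(z_u, cell, da_next, dc_next):
    gamma_f, gamma_o, c_tilde, c_prev = cell
    _, c_next, gamma_u = forward(z_u, *cell)
    # Path through a_next: (dJ/da)(da/dc)(dc/dGAMMAu)
    via_a = da_next * gamma_o * (1.0 - np.tanh(c_next) ** 2) * c_tilde
    # Direct path through c_next: (dJ/dc)(dc/dGAMMAu)
    via_c = dc_next * c_tilde
    # Both paths share the sigmoid derivative dGAMMAu/dgammau.
    return (via_a + via_c) * gamma_u * (1.0 - gamma_u), via_a, via_c

def numeric_grad(z_u, cell, da_next, dc_next, eps=1e-6):
    # Central finite differences, one component of z_u at a time.
    grad = np.zeros_like(z_u)
    for i in range(z_u.size):
        step = np.zeros_like(z_u)
        step[i] = eps
        plus = surrogate_loss(z_u + step, cell, da_next, dc_next)
        minus = surrogate_loss(z_u - step, cell, da_next, dc_next)
        grad[i] = (plus - minus) / (2.0 * eps)
    return grad

rng = np.random.default_rng(0)
n = 4  # arbitrary hidden size for the check
z_u = rng.normal(size=n)
cell = (
    rng.uniform(size=n),          # gamma_f, forget gate value in (0, 1)
    rng.uniform(size=n),          # gamma_o, output gate value in (0, 1)
    np.tanh(rng.normal(size=n)),  # c_tilde, candidate value in (-1, 1)
    rng.normal(size=n),           # c_prev
)
da_next = rng.normal(size=n)
dc_next = rng.normal(size=n)

analytic, via_a, via_c = analytic_grad(z_u, cell, da_next, dc_next)
numeric = numeric_grad(z_u, cell, da_next, dc_next)

print("max |analytic - numeric|:", np.max(np.abs(analytic - numeric)))
print("path terms equal:", np.allclose(via_a, via_c))
```

If the summed formula matches the finite-difference gradient while the two path terms differ, that would suggest they are separate contributions (one through a, one reaching later time steps directly through c) rather than the same derivative counted twice.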

Dennis_Greenberg · November 3, 2022, 10:31am

Here’s a more detailed description of the thing I found suspicious
