Forget gate vs. Update gate in LSTM

Hi,
Could you please elaborate on the differences between the two?
The formulas for these two are quite similar in structure; they just use different parameters (W, b).
The “forget” gate is responsible for deciding which parts of the previous state to pass on to the next step, and
the “update” gate also seems to decide which parts to pass on.
I don’t understand why we need two gates that do the same thing.
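
If I’m reading the notebook correctly, the two formulas are (in the course’s notation, where $a^{\langle t-1 \rangle}$ is the previous hidden state and $x^{\langle t \rangle}$ is the current input):

$$\Gamma_f^{\langle t \rangle} = \sigma\!\left(W_f\left[a^{\langle t-1 \rangle}, x^{\langle t \rangle}\right] + b_f\right)$$
$$\Gamma_u^{\langle t \rangle} = \sigma\!\left(W_u\left[a^{\langle t-1 \rangle}, x^{\langle t \rangle}\right] + b_u\right)$$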

You might want to consider watching the LSTM lectures again: Prof Ng covered all of this in some detail, and the instructions in the notebook also explain it. The two gates have their own weights because they do different things:

The purpose of the “forget” gate is to detect when some previously saved state (written by an earlier “update” gate) is no longer relevant. The example given in lecture is a sentence whose subject changes from singular to plural: the saved fact that the subject was singular needs to be forgotten.

The purpose of the “update” gate is to figure out which new information at the current time step is relevant and should be saved, because it may be needed later (and will eventually be discarded by a later “forget” gate).
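
To make that concrete, here is a minimal NumPy sketch of one LSTM cell-state update (the function and variable names are mine, not from the notebook). Even though the two gate formulas have the same shape, the gates are applied to different signals: the forget gate scales the old cell state, while the update gate scales the newly computed candidate values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_state_update(a_prev, x_t, c_prev, Wf, bf, Wu, bu, Wc, bc):
    """One LSTM cell-state update (a minimal sketch, not the notebook's code).

    a_prev: previous hidden state, shape (n_a, 1)
    x_t:    current input,         shape (n_x, 1)
    c_prev: previous cell state,   shape (n_a, 1)
    """
    concat = np.concatenate([a_prev, x_t], axis=0)  # stack [a^<t-1>, x^<t>]

    gamma_f = sigmoid(Wf @ concat + bf)  # forget gate: same formula shape...
    gamma_u = sigmoid(Wu @ concat + bu)  # update gate: ...but separate weights
    c_tilde = np.tanh(Wc @ concat + bc)  # candidate values for the new state

    # The gates act on DIFFERENT signals:
    #   gamma_f decides how much of the OLD state c_prev to keep,
    #   gamma_u decides how much of the NEW candidate c_tilde to write.
    c_t = gamma_f * c_prev + gamma_u * c_tilde
    return c_t

# Tiny usage example with random weights: n_a = 4 hidden units, n_x = 3 inputs.
rng = np.random.default_rng(0)
n_a, n_x = 4, 3
Wf, Wu, Wc = (rng.standard_normal((n_a, n_a + n_x)) for _ in range(3))
bf, bu, bc = (np.zeros((n_a, 1)) for _ in range(3))
c_t = lstm_cell_state_update(rng.standard_normal((n_a, 1)),
                             rng.standard_normal((n_x, 1)),
                             np.zeros((n_a, 1)), Wf, bf, Wu, bu, Wc, bc)
print(c_t.shape)  # (4, 1)
```

Because `c_prev` and `c_tilde` carry different information, the two gates need independent weights so each can learn its own criterion for what to keep and what to write.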

Thank you for the clarification.