I am confused about the gates in the GRU model. If \Gamma_u = 0, then c<t> = c<t-1>, and since a<t> = c<t>, we have a<t> = a<t-1>. This seems strange to me.
How are we capturing long-term dependencies when the net effect is simply that the value of a<t> doesn't change over time?
You're right that if \Gamma_u is always 0, then things are not very interesting. But the point is that we are training the network, and both the \Gamma_u and \Gamma_r values are controlled by learned parameters (weight and bias values), as given in the formulas Prof Ng shows in the lectures. If training produces something that uninteresting, then something is wrong with our approach: either our training data is not expressive enough, or we've picked the wrong architecture for the GRU network (bad hyperparameter choices), or both.
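To make that concrete, here's a minimal NumPy sketch of one GRU step using the formulas from the lectures (the parameter names Wu, bu, Wr, br, Wc, bc are just my own placeholders, not course-provided code). The key point is that \Gamma_u is a sigmoid of learned weights applied to [c<t-1>, x<t>], so it is computed element-wise and can change at every time step: some units can keep \Gamma_u near 0 for many steps (preserving long-term information) while other units open up toward 1 when the current input says it's time to update.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(c_prev, x_t, Wu, bu, Wr, br, Wc, bc):
    """One full-GRU step following the lecture formulas.

    Shapes (illustrative): c_prev is (n_c,), x_t is (n_x,),
    each W is (n_c, n_c + n_x), each b is (n_c,).
    """
    concat = np.concatenate([c_prev, x_t])

    gamma_u = sigmoid(Wu @ concat + bu)   # update gate, values in (0, 1)
    gamma_r = sigmoid(Wr @ concat + br)   # relevance gate

    # Candidate memory cell, using the relevance-gated previous state
    c_tilde = np.tanh(Wc @ np.concatenate([gamma_r * c_prev, x_t]) + bc)

    # Element-wise blend: units where gamma_u is near 0 keep their old value
    # (long-term memory); units where gamma_u is near 1 adopt the candidate.
    c_t = gamma_u * c_tilde + (1.0 - gamma_u) * c_prev
    a_t = c_t                             # in the GRU, a<t> = c<t>
    return a_t, c_t
```

So \Gamma_u = 0 everywhere, forever, is just one degenerate end of the spectrum. What training actually learns is how much each unit should remember versus overwrite at each time step, and that per-unit, per-step control is exactly what lets the network carry information across long ranges without the state being trivially constant.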