I can understand the structure of the LSTM and its goal, but not the GRU's, since in the LSTM the ‘a’ cells and ‘c’ cells perform different functions in the network.
But in the GRU, the ‘a’ cell and the ‘c’ cell are combined into a single cell.
For example:
Suppose the current input is x<10> and the output is y<10>. If c<10> still contains the information of c<0>, then the output y<10> depends only on c<0> and x<10>, and the information from x<1> to x<9> is all lost.
I’m wondering whether this will result in poor fitting ability, because the information from x<1> to x<9> cannot propagate to the later time steps.
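To make the scenario concrete, here is the GRU memory update as I understand it from the lecture (using the course's update gate Γ_u and candidate c̃<t>):

$$c^{\langle t \rangle} = \Gamma_u * \tilde{c}^{\langle t \rangle} + (1 - \Gamma_u) * c^{\langle t-1 \rangle}$$

So if Γ_u stays close to 0 for t = 1, ..., 10, then c<10> ≈ c<0> and none of x<1> through x<9> ever enters the memory, which is exactly the case I’m worried about.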
I think the LSTM can overcome this problem because the ‘c’ cells cache long-term information and the ‘a’ cells cache short-term information, but the GRU’s output has to choose between using long-term and short-term information.
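Here is a minimal numpy sketch of how I picture one step of each cell (this is my own paraphrase of the lecture equations, not course code, and the parameter names like Wr, Wu are just placeholders):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(c_prev, x, p):
    # One GRU step: a single vector c<t> is both the memory and the output a<t>.
    concat = np.concatenate([c_prev, x])
    gamma_r = sigmoid(p["Wr"] @ concat + p["br"])      # relevance gate
    gamma_u = sigmoid(p["Wu"] @ concat + p["bu"])      # update gate
    c_tilde = np.tanh(p["Wc"] @ np.concatenate([gamma_r * c_prev, x]) + p["bc"])
    c = gamma_u * c_tilde + (1.0 - gamma_u) * c_prev   # old memory is kept if gamma_u ~ 0
    return c, c                                        # a<t> and c<t> are the same vector

def lstm_step(a_prev, c_prev, x, p):
    # One LSTM step: c<t> carries long-term memory, a<t> is a separately gated output.
    concat = np.concatenate([a_prev, x])
    gamma_f = sigmoid(p["Wf"] @ concat + p["bf"])      # forget gate
    gamma_u = sigmoid(p["Wu"] @ concat + p["bu"])      # update gate
    gamma_o = sigmoid(p["Wo"] @ concat + p["bo"])      # output gate
    c_tilde = np.tanh(p["Wc"] @ concat + p["bc"])
    c = gamma_f * c_prev + gamma_u * c_tilde           # long-term memory
    a = gamma_o * np.tanh(c)                           # short-term / output state
    return a, c                                        # two separate vectors

# Tiny demo with random parameters (hidden size 4, input size 3).
n, m = 4, 3
rng = np.random.default_rng(0)
p = {k: rng.normal(size=(n, n + m)) * 0.1 for k in ["Wr", "Wu", "Wc", "Wf", "Wo"]}
p.update({k: np.zeros(n) for k in ["br", "bu", "bc", "bf", "bo"]})
c, _ = gru_step(np.zeros(n), rng.normal(size=m), p)
a, c2 = lstm_step(np.zeros(n), np.zeros(n), rng.normal(size=m), p)
```

In the GRU, the same vector that is kept frozen when Γ_u ≈ 0 is also the only state the output can see, whereas the LSTM keeps c<t> and a<t> as two separate vectors, which is how I read the claim that ‘c’ caches long-term and ‘a’ caches short-term information.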
Besides, I’m also wondering how the LSTM and GRU overcome the vanishing gradient problem. Professor Ng only said these two structures help long-term information propagate to later time steps, but why does that solve the vanishing gradient problem?
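To show what I mean by "vanishing", here is a toy numpy experiment (my own example, not something from the course): in a plain RNN, the gradient of the hidden state with respect to the initial state shrinks geometrically with the number of time steps.

```python
import numpy as np

np.random.seed(0)
n, T = 20, 50                        # hidden size, number of time steps
W = np.random.randn(n, n) * 0.1      # small recurrent weights -> gradients shrink

# Toy vanilla RNN: a<t> = tanh(W @ a<t-1>), so
# d a<T> / d a<0> is the product over t of diag(1 - a<t>**2) @ W
a = np.random.randn(n)
grad = np.eye(n)
for t in range(1, T + 1):
    a = np.tanh(W @ a)
    grad = np.diag(1.0 - a**2) @ W @ grad
    if t % 10 == 0:
        print(f"t={t:2d}  ||d a<t> / d a<0>|| = {np.linalg.norm(grad):.2e}")
```

My question is why replacing this recurrence with the gated c<t> update stops the gradient from shrinking like this.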