I am a bit confused about how the GRU works. Take the example from the lecture, "The cat, which already ate …, was full." Since "cat" and "was" are closely related, my gate value will be 1 at "cat" and "was" and 0 for all the words in between. Does that mean my c value stays the same for the words in between? And if the c value is the same, then my a value for all the words between "cat" and "was" would also be the same, which clearly isn't how it works. What is the problem with my understanding?
It’s not that all of c^{<t>} stays the same: remember that the state and gate values are vectors with many components ("bits"). Prof Ng uses the example of a 100 x 1 vector, but that size is a hyperparameter. So different bits learn to track different things that have happened, are happening, or need to happen in the future. We don’t actually know or specify what those functions are or which bits will learn them, but conceptually a bit can encode a state like "we have seen the subject of the sentence and it was plural". Training and backprop figure out what works based on your training data set. The reason GRU and LSTM are more powerful than the "plain vanilla" RNN is that the gates give a more explicit mechanism for creating complex state that spans the entire length of the input. That makes it easier for training to learn the patterns needed for language, music, or whatever the particular application is.
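To make the per-bit gating concrete: the key is that the update c^{<t>} = Γ_u * c̃^{<t>} + (1 - Γ_u) * c^{<t-1>} is applied element-wise, so some components of the memory cell get overwritten while others are carried forward unchanged. Here is a minimal numpy sketch of one step of the simplified GRU from the lecture (the tiny state size and random weights are just for illustration, not trained values):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_c, n_x = 4, 3                    # tiny sizes for readability; the lecture uses 100 for n_c

Wu = rng.standard_normal((n_c, n_c + n_x))   # update-gate weights
Wc = rng.standard_normal((n_c, n_c + n_x))   # candidate-memory weights
bu = np.zeros(n_c)
bc = np.zeros(n_c)

c_prev = rng.standard_normal(n_c)  # c^{<t-1>}, the memory cell from the previous step
x_t = rng.standard_normal(n_x)     # x^{<t>}, the current word's embedding

concat = np.concatenate([c_prev, x_t])
gamma_u = sigmoid(Wu @ concat + bu)   # Gamma_u: one gate value PER component, each in (0, 1)
c_tilde = np.tanh(Wc @ concat + bc)   # candidate replacement memory

# Element-wise blend: components where gamma_u is near 0 keep their old value
# (e.g. the bit remembering "subject was singular"), while components where
# gamma_u is near 1 are overwritten with the candidate.
c_t = gamma_u * c_tilde + (1.0 - gamma_u) * c_prev

print("gamma_u:", np.round(gamma_u, 2))
print("c_prev :", np.round(c_prev, 2))
print("c_t    :", np.round(c_t, 2))
```

If you print these out, you will see that each component of c^{<t>} moves by a different amount, so even while the "subject number" bit holds steady across "which already ate …", the other bits keep changing with each new word, and therefore a^{<t>} changes too.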