True/False: In order to simplify the GRU without vanishing gradient problems, even when training on very long sequences, you should always remove Γu, i.e., set Γu = 0.
Is this ever actually done in practice? It seems a bit extreme, as it would force the dependence to be only on the first element and the current element of the sequence at each timestep. What if the key information to keep was the second word? We would lose this by hard-forcing Γu = 0 from the beginning.
Another comment: in the lectures, the softmax to get ŷ is applied after the update gate, which, if Γu = 0 were forced, would mean that every output at every timestep would only ever depend on the first input, which really does seem odd. Should the softmax used to obtain the output at each timestep be applied to c̃ instead?
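To make my confusion concrete, this is the simplified GRU as I understood it from the lecture, written out in the course notation. It is my own sketch, and the output line with W_y and b_y is my assumption about how ŷ is computed, not something taken from the quiz:

```latex
\begin{align}
\tilde{c}^{\langle t \rangle} &= \tanh\big(W_c\,[c^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_c\big) \\
\Gamma_u &= \sigma\big(W_u\,[c^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_u\big) \\
c^{\langle t \rangle} &= \Gamma_u \odot \tilde{c}^{\langle t \rangle} + (1 - \Gamma_u) \odot c^{\langle t-1 \rangle} \\
\hat{y}^{\langle t \rangle} &= \mathrm{softmax}\big(W_y\, c^{\langle t \rangle} + b_y\big)
\end{align}
% Hard-forcing Gamma_u = 0 at every step collapses the update to
% c^{<t>} = c^{<t-1>} = ... = c^{<0>}, so no input ever enters the memory
% cell and every output depends only on how the cell was initialized.
```

If that reading is right, it is what makes forcing Γu = 0 from the start seem so odd to me.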
In the lectures, the “simplified” version is shown to help you understand the concept, and I believe it is mentioned that it is not really used in practice and that there are many variations of the GRU. Towards the end, Andrew talks about the one that has gamma_r in it, the version that has become widely used after a lot of research (and then again, Andrew also mentions that you can come up with one of your own).
With that being said, the quiz question is about how to simplify the GRU, which, for the version Andrew has shown, is done by setting gamma_u = 0.
Your critique here is right: is this used in practice? Maybe not. But the quiz question is not about whether it can be used or not; it is about how to make the GRU simpler.
Hope I have answered your query.
Best,
Mubsi
P.S. Since you gave away the answer to the question in your post, I have removed it.
The correct answer I get is removing gamma_r (setting gamma_r = 1). I am very confused by this question; are you able to help explain why the other answers are not correct? Thanks.
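For reference, this is the full GRU with the relevance gate gamma_r as I remember it from the lecture, again written out in the course notation as my own sketch rather than the quiz's wording:

```latex
\begin{align}
\Gamma_r &= \sigma\big(W_r\,[c^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_r\big) \\
\tilde{c}^{\langle t \rangle} &= \tanh\big(W_c\,[\Gamma_r \odot c^{\langle t-1 \rangle},\; x^{\langle t \rangle}] + b_c\big) \\
\Gamma_u &= \sigma\big(W_u\,[c^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_u\big) \\
c^{\langle t \rangle} &= \Gamma_u \odot \tilde{c}^{\langle t \rangle} + (1 - \Gamma_u) \odot c^{\langle t-1 \rangle}
\end{align}
% With Gamma_r fixed to 1, the relevance gate becomes a no-op and the candidate
% reduces to the simplified form from the lecture, while the Gamma_u update
% path (and its role in carrying information over long sequences) is untouched.
```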