Week 1 quiz question 8 alice/betty

abelian_group_chen · June 12, 2022, 2:15am

I got the version of the question with Alice and Betty proposing simplified versions of the GRU, but the answers are pretty confusing to me. Why are the choices all talking about the other person’s modification? For example, for the two choices that pick Alice’s model (setting gamma_u = 0), the second half of the choice then talks about the consequences of gamma_r being either 0 or 1.

Elemento · June 12, 2022, 6:15am

Hey @abelian_group_chen,
The options aren’t referring to the other person’s choices. I guess this is an understandable confusion, so let’s clear it up right away. Let’s consider the first option:

Alice’s model (removing \Gamma_u), because if \Gamma_r \approx 0 for a timestep, the gradient can propagate back through that timestep without much decay.

As per the question, \Gamma_u = 0 is fixed, so you can’t change it in any manner. Hence, this option asks what happens when the other variable, i.e., \Gamma_r, takes on different values, for instance 0 in this particular option.

So, every option has one of the two variables fixed as defined in the question, and tests your understanding of what happens when the other variable varies. Will the pair of values satisfy the conditions mentioned in the question or not? I hope this helps.

Regards,
Elemento

abelian_group_chen · June 12, 2022, 1:37pm

Thanks Elemento, I am still a bit confused about the choices. We know that gamma_u needs to be zero so that c_t and c_(t-1) are highly correlated, but two choices offer this in slightly different ways. Choosing Alice’s model will have gamma_u = 0 for sure, but Betty’s model might also benefit when gamma_u is approximately 0. I chose the first option (Alice’s model) and was marked wrong.

Elemento · June 12, 2022, 4:42pm

Hey @abelian_group_chen,
Please check your DM.

Regards,
Elemento

iamcalledayush · July 12, 2023, 12:40pm

Hi @Elemento, I have a doubt. If we set gamma_u = 0, then it doesn’t matter what gamma_r is, because gamma_u is always 0, which implies that c_t always equals c_(t-1). Hence, Alice’s model (with gamma_u = 0, where it doesn’t matter whether gamma_r is 0 or 1) should be the answer, in my view. Since gamma_u = 0 means c_tilde_t has no effect, gamma_r has no effect either. Let me know where I’m going wrong.
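The algebra in this post can be checked numerically. Below is a minimal sketch, assuming the standard GRU update (candidate from tanh, gates from sigmoid, c_t = gamma_u * c_tilde_t + (1 - gamma_u) * c_(t-1)). The function name `gru_step`, the sizes, and the random weights are hypothetical and chosen only for illustration; nothing here comes from the quiz itself.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(c_prev, x, params, gamma_u=None, gamma_r=None):
    """One GRU step. Passing gamma_u or gamma_r pins that gate to a constant
    (hypothetical helper, not taken from the course materials)."""
    W_c, W_u, W_r, b_c, b_u, b_r = params
    concat = np.concatenate([c_prev, x])
    if gamma_u is None:
        gamma_u = sigmoid(W_u @ concat + b_u)   # learned update gate
    if gamma_r is None:
        gamma_r = sigmoid(W_r @ concat + b_r)   # learned relevance gate
    # Candidate value: gamma_r scales how much of c_prev feeds into it
    c_tilde = np.tanh(W_c @ np.concatenate([gamma_r * c_prev, x]) + b_c)
    # Memory update: gamma_u mixes the candidate with the previous value
    return gamma_u * c_tilde + (1.0 - gamma_u) * c_prev

# Arbitrary sizes and random weights, assumed for the demonstration only
rng = np.random.default_rng(0)
n_c, n_x = 4, 3
params = (
    rng.normal(size=(n_c, n_c + n_x)),  # W_c
    rng.normal(size=(n_c, n_c + n_x)),  # W_u
    rng.normal(size=(n_c, n_c + n_x)),  # W_r
    np.zeros(n_c), np.zeros(n_c), np.zeros(n_c),  # b_c, b_u, b_r
)
c_prev = rng.normal(size=n_c)
x = rng.normal(size=n_x)

# gamma_u pinned to 0, with gamma_r set to 0 and then to 1
out_r0 = gru_step(c_prev, x, params, gamma_u=0.0, gamma_r=0.0)
out_r1 = gru_step(c_prev, x, params, gamma_u=0.0, gamma_r=1.0)
print(np.allclose(out_r0, c_prev), np.allclose(out_r1, c_prev))
```

Both checks come out true: with gamma_u pinned to 0, the step returns c_(t-1) whatever gamma_r is. Note that this also means the input x never enters the memory, which may be worth keeping in mind when weighing the two models.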

Elemento · July 12, 2023, 1:23pm

Hey @iamcalledayush,
Welcome, and we are glad that you could become a part of our community.

Please check your DM.

Cheers,
Elemento

mahtanir · February 21, 2024, 6:54am

Could someone explain this to me as well.

Isn’t the whole point of setting gamma_u = 0 to allow the gradient to backpropagate more easily? Isn’t the (1 - gamma_u) * c^{t-1} term essentially the part that prevents vanishing gradients?
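For reference, the memory-cell update being discussed can be written out as follows. This is a sketch assuming the standard GRU formulation; the elementwise-product symbol and the bracketed time superscripts are notation chosen here, not taken from the quiz.

```latex
c^{\langle t \rangle} = \Gamma_u \odot \tilde{c}^{\langle t \rangle}
                      + (1 - \Gamma_u) \odot c^{\langle t-1 \rangle}
\qquad\Longrightarrow\qquad
c^{\langle t \rangle} \approx c^{\langle t-1 \rangle}
\quad \text{when } \Gamma_u \approx 0
```

When the update gate is close to 0 at a time step, the memory value is carried across that step almost unchanged, so the direct path from c^{t-1} to c^{t} passes the gradient through with little decay.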

TMosh · February 23, 2024, 12:11am

That is one benefit, but the main purpose is to allow the gate to fire at some later time step in the sequence, to help give context to different parts of the input.

That’s what Andrew means in the “GRU (simplified)” lecture at 11:40 when he talks about learning the dependencies.

bwegge · June 30, 2024, 5:55pm

I have come to the same conclusion, and of course it just got marked as “incorrect” :-(.
Is there any explanation of what the question really asks for and why this is wrong? After all, the question only asks for when it will “work without vanishing gradient problems”.
