Hi, I believe all options in Question 8 are wrong. The indices of the second Gammas in the first two options in the photo should be u. Similarly, those in options 3 and 4 should be r.
That’s what I’m thinking as well. Though maybe we’re missing something?
Reading through all these again, I think the condition in the second part of the answers “because if […]” is not supposed to be implied by the first part of the answer (Alice’s model or Betty’s model). Rather, you have to read it as an additional case distinction.
E.g., for the last answer, you should read it as: “If you use Betty’s model, AND G_u happens to also be close to 1, then the gradient can propagate back through that timestep without much decay.” The question then becomes whether “the gradient can propagate back through that timestep without much decay” actually follows from those two conditions.
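If it helps, here is a rough numeric sketch of how one could check an option read that way. It assumes the usual GRU memory update c_t = G_u * c_tilde + (1 - G_u) * c_prev, which isn't written out in this thread, and it only follows the direct path from c_prev to c_t (the path through c_tilde is ignored). The function names are made up for illustration.

```python
# Assumed scalar GRU memory update (not stated in the thread):
#   c_t = G_u * c_tilde + (1 - G_u) * c_prev
# Only the direct path c_prev -> c_t is tracked here.


def direct_path_factor(gamma_u: float) -> float:
    """Derivative of c_t w.r.t. c_prev along the direct path, i.e. 1 - G_u."""
    return 1.0 - gamma_u


def surviving_gradient(gamma_u: float, steps: int) -> float:
    """Fraction of gradient left after `steps` timesteps with a constant gate value."""
    return direct_path_factor(gamma_u) ** steps


# Compare the two "because if ..." conditions on G_u over a long sequence.
for gamma_u in (0.01, 0.99):
    left = surviving_gradient(gamma_u, 100)
    print(f"G_u = {gamma_u}: gradient factor after 100 steps = {left:.3e}")
```

Plugging in the gate value from each option and seeing whether the factor stays near 1 or collapses is one way to test whether the conclusion follows from the conditions, at least under the assumed update rule.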
Does that make sense?