Sequence Models - Quiz - W1

Hi,

This question refers to Quiz Question 8 in Week 1, which proposes simplifying the GRU by removing either the update gate (Γᵤ) or the reset gate (Γᵣ) and asks which model is less likely to suffer from vanishing gradients on long sequences.

From the GRU update equation

c⟨t⟩ = Γᵤ * c̃⟨t⟩ + (1 − Γᵤ) * c⟨t−1⟩

it seems that long-term gradient flow is controlled by the update gate Γᵤ.

When Γᵤ ≈ 0, we have

c⟨t⟩ ≈ c⟨t−1⟩

so the gradient can propagate backward through that timestep with little decay. This suggests that Sarah’s model (removing Γᵤ) is more robust to vanishing gradients.
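
To make the "little decay" argument concrete, here is a minimal sketch in plain Python. It rests on a simplifying assumption that is not part of the quiz: the gates and the candidate c̃⟨t⟩ are treated as constants, so only the direct carry path is counted and ∂c⟨t⟩/∂c⟨t−1⟩ is taken to be 1 − Γᵤ. The function name `carry_gradient` and the gate values are hypothetical, chosen only for illustration.

```python
def carry_gradient(update_gates):
    """Direct-path factor d c<T> / d c<0> for a scalar memory cell.

    Assumption: gates and candidate values are treated as constants,
    so each timestep contributes a factor of (1 - gamma_u).
    """
    grad = 1.0
    for gamma_u in update_gates:
        grad *= (1.0 - gamma_u)
    return grad


T = 100  # number of timesteps (illustrative)

# Update gate close to 0: the memory is mostly carried over each step.
near_zero = carry_gradient([0.01] * T)

# Update gate at 0.5: half of the old memory is replaced each step.
half_open = carry_gradient([0.5] * T)

print(f"gamma_u = 0.01 over {T} steps: {near_zero:.3f}")
print(f"gamma_u = 0.50 over {T} steps: {half_open:.3e}")
```

Under this assumption the factor stays around 0.37 after 100 steps when Γᵤ = 0.01, but collapses to below 1e-29 when Γᵤ = 0.5, which is the sense in which Γᵤ ≈ 0 lets the gradient pass through a timestep nearly unchanged.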

However, among the answer choices, the option stating “Sarah’s model because Γᵣ ≈ 0” is incorrect, and the option that explicitly mentions Γᵤ ≈ 0 preserving gradient flow does not appear.

Could you please clarify whether:

  • the intended correct reasoning is indeed based on Γᵤ ≈ 0, and

  • the mismatch is due to wording in the answer choices?

Thanks!

Hello @btabari,

I agree with your reasoning about which gate to drop, and I also find the wording of the options confusing. I am going to file a ticket for the course team to review this question.

Cheers,
Raymond
