{*Moderator’s Edit: Quiz Solution Removed*}

Can you kindly elaborate on this explanation?

Hey @vaibhavoutat,

Apologies for the delayed response. The answer lies mainly in two of the equations for the GRU network, as follows:

\tilde{c}^{<t>} = \tanh\left(W_c[\Gamma_r * c^{<t-1>}, x^{<t>}] + b_c\right) \\
c^{<t>} = \Gamma_u * \tilde{c}^{<t>} + (1 - \Gamma_u) * c^{<t-1>}

Now, in order to mitigate the vanishing gradient problem for long sequences, we need to make sure that the gradients can back-propagate from c^{<t>} to c^{<t-1>} with as few bottlenecks as possible. Note that the question doesn’t ask us about \Gamma_u, but about \Gamma_r; hence, in equation 2, the only term we should concern ourselves with is \tilde{c}^{<t>}, since \Gamma_r appears in none of the other terms.

Now, we want \tilde{c}^{<t>} to depend on c^{<t-1>} as much as possible, and to do this, we can simply set \Gamma_r = 1, so that the reset gate passes c^{<t-1>} through unattenuated; hence the explanation. Let us know if this resolves your doubt.
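If it helps, here is a minimal NumPy sketch of the candidate-memory equation (equation 1) above. The function name `gru_candidate`, the dimensions, and the random weights are made up purely for illustration; the point is just that with \Gamma_r = 0 the candidate \tilde{c}^{<t>} ignores c^{<t-1>} entirely, while with \Gamma_r = 1 it depends on it fully:

```python
import numpy as np

def gru_candidate(c_prev, x, W_c, b_c, gamma_r):
    """Compute tilde{c}^{<t>} = tanh(W_c [gamma_r * c^{<t-1>}, x^{<t>}] + b_c)."""
    concat = np.concatenate([gamma_r * c_prev, x])  # stack gated memory and input
    return np.tanh(W_c @ concat + b_c)

rng = np.random.default_rng(0)
n_c, n_x = 4, 3                                   # toy sizes, chosen arbitrarily
W_c = rng.normal(size=(n_c, n_c + n_x))
b_c = np.zeros(n_c)
c_prev = rng.normal(size=n_c)
x = rng.normal(size=n_x)

# Gamma_r = 0: the candidate is computed from x alone, c^{<t-1>} is blocked
ct_closed = gru_candidate(c_prev, x, W_c, b_c, gamma_r=0.0)
# Gamma_r = 1: the candidate sees c^{<t-1>} in full
ct_open = gru_candidate(c_prev, x, W_c, b_c, gamma_r=1.0)
```

With \Gamma_r = 1, the Jacobian of \tilde{c}^{<t>} with respect to c^{<t-1>} is not zeroed out, so gradients can flow back through the memory cell across time steps.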

**P.S. - Posting solutions publicly is strictly against the community guidelines, so I will be removing the image.**

Cheers,

Elemento