{Moderator’s Edit: Quiz Solution Removed}
Can you kindly elaborate on this explanation?
Hey @vaibhavoutat,
Apologies for the delayed response. The answer to this lies mainly in 2 equations for the GRU network, written here in the course's notation:

\tilde{c}^{<t>} = \tanh(W_c [\Gamma_r * c^{<t-1>}, x^{<t>}] + b_c) ... (1)

c^{<t>} = \Gamma_u * \tilde{c}^{<t>} + (1 - \Gamma_u) * c^{<t-1>} ... (2)

Now, in order to mitigate the vanishing gradient problem for long sequences, we need to make sure that the gradients can back-propagate from c^{<t>} to c^{<t-1>} with as few bottlenecks as possible. Note that the question doesn't ask us about \Gamma_u but about \Gamma_r; hence, in equation 2, the only term we should concern ourselves with is \tilde{c}^{<t>}, since the remaining terms do not involve \Gamma_r at all.

Now, we want to make \tilde{c}^{<t>} depend on c^{<t-1>} as directly as possible, and from equation 1 we can do this by simply setting \Gamma_r = 1, so that the candidate \tilde{c}^{<t>} is computed from the full previous state; hence, the explanation. Let us know if this resolves your doubt.
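To make the gate's role concrete, here is a minimal NumPy sketch of a single GRU step following the two equations above. This is my own illustrative helper, not the course's assignment code; the function name, argument order, and the `force_gamma_r` switch are assumptions I made for the demo.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(c_prev, x, Wr, br, Wu, bu, Wc, bc, force_gamma_r=None):
    """One GRU step (illustrative sketch, not the assignment's code).

    c_prev: previous memory state c^<t-1>, x: current input x^<t>.
    Pass force_gamma_r=1.0 to pin the relevance gate open, as in the
    explanation above.
    """
    concat = np.concatenate([c_prev, x])
    # Relevance gate Gamma_r (or a pinned constant for the demo)
    if force_gamma_r is None:
        gamma_r = sigmoid(Wr @ concat + br)
    else:
        gamma_r = np.full_like(c_prev, force_gamma_r)
    # Eq. 1: candidate memory, with c^<t-1> gated by Gamma_r
    c_tilde = np.tanh(Wc @ np.concatenate([gamma_r * c_prev, x]) + bc)
    # Update gate Gamma_u, then Eq. 2: blend candidate with previous state
    gamma_u = sigmoid(Wu @ concat + bu)
    return gamma_u * c_tilde + (1.0 - gamma_u) * c_prev
```

With `force_gamma_r=1.0`, the candidate \tilde{c}^{<t>} sees the full previous state c^{<t-1>}, so the path from c^{<t-1>} to c^{<t>} is not attenuated by the relevance gate.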
P.S. - Posting solutions publicly is strictly against the community guidelines, so I will be removing the image.
Cheers,
Elemento