C5_W1’s Quiz: Sarah and Ashley

Hi, I really need help with this question about GRUs.
The question is
“Which model is more likely to work without vanishing gradients? Sarah’s model sets gamma_u = 0; Ashley’s model sets gamma_r = 1.”

I understand that if gamma_u = 0, then Ct = Ct-1, right? I take this to be what is meant by being highly dependent on the previous memory cell. But how does this help the gradient back-propagate?

If someone could clarify this for me, it would be appreciated.


Yes, the question asks which model works without vanishing gradient problems, and the options talk about the gradient back-propagating without much decay.

The answer comes from matching what the question says about Sarah’s and Ashley’s models against the options. Two of the options can be eliminated right away once you read the question carefully.

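Then, as you understand, c^t should depend on c^t-1, so we want gamma_u to be 0: the sequence then remembers the previous memory cell and does not update it. Gamma_u basically tells the sequence when to update. When c^t is simply a copy of c^t-1, the gradient passes back through that time step almost unchanged, which is why it does not vanish.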
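To make that concrete, here is a minimal sketch. It assumes the scalar GRU update from the lectures, c^t = gamma_u * c~^t + (1 - gamma_u) * c^t-1, and it only looks at the direct path from c^t back to c^t-1, ignoring the indirect paths through the gates and the candidate. Along that path each time step multiplies the gradient by (1 - gamma_u). The helper name `direct_path_gradient` is my own, not something from the course code.

```python
def direct_path_gradient(gamma_u_values):
    """Product of (1 - gamma_u) over all time steps.

    This is d c^T / d c^0 along the direct memory-cell path only
    (an assumption of this sketch; the gate and candidate paths are ignored).
    """
    grad = 1.0
    for gamma_u in gamma_u_values:
        grad *= (1.0 - gamma_u)
    return grad


steps = 100
# Update gate closed at every step: the gradient is carried back unchanged.
print(direct_path_gradient([0.0] * steps))
# Update gate half open at every step: the gradient shrinks geometrically.
print(direct_path_gradient([0.5] * steps))
```

With the gate at 0 the factor is exactly 1 at every step, while with the gate at 0.5 the product over 100 steps is vanishingly small.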

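Gamma_r, on the other hand, is the relevance gate: it controls how much of the previous memory cell goes into computing the candidate value. In the question gamma_r is 1, so the candidate is computed from the full previous memory cell, and gamma_u still decides whether the memory cell gets updated to that candidate. With this understanding, you can work out which is the correct option.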
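If it helps to see both gates side by side, below is a rough scalar sketch of one GRU step following the lecture equations. The function `gru_step`, the weight names and the weight values are all made up for illustration, and the optional arguments simply force a gate to a fixed value, as the two models in the question do.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gru_step(c_prev, x, w, gamma_u=None, gamma_r=None):
    """One scalar GRU step; pass gamma_u or gamma_r to pin that gate."""
    if gamma_r is None:
        gamma_r = sigmoid(w["rc"] * c_prev + w["rx"] * x + w["rb"])
    if gamma_u is None:
        gamma_u = sigmoid(w["uc"] * c_prev + w["ux"] * x + w["ub"])
    # Candidate value: the relevance gate scales how much of c_prev is used.
    c_tilde = math.tanh(w["cc"] * gamma_r * c_prev + w["cx"] * x + w["cb"])
    # The update gate blends the candidate with the previous memory cell.
    return gamma_u * c_tilde + (1.0 - gamma_u) * c_prev


# Hypothetical weights, chosen only so the example runs.
w = {"rc": 0.5, "rx": -0.3, "rb": 0.0,
     "uc": 0.8, "ux": 0.4, "ub": -1.0,
     "cc": 1.2, "cx": 0.7, "cb": 0.1}

c_prev = 0.3
# gamma_u pinned to 0: the memory cell is carried over unchanged.
print(gru_step(c_prev, x=1.0, w=w, gamma_u=0.0))
# gamma_r pinned to 1: gamma_u is still computed, so the cell can update or hold.
print(gru_step(c_prev, x=1.0, w=w, gamma_r=1.0))
```

The point of the sketch is that pinning gamma_r only changes how the candidate is formed, while gamma_u is the gate that decides whether the previous memory cell is kept.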

I am also sharing a link to a post that explains the same thing. Read all the comments on that post, as the learner there was also confused.

