Yes, at last you got it, more than distortion.
The main difference between gamma u and gamma r is their significance in the activation of the sequence at each timestep.
Gamma u is the update gate: it tells the model when to update the memory. If you choose 0, you are telling the model not to update, so the memory (and the gradient) propagates through that timestep without much decay. Gamma r is the relevance gate: choosing 1 always (based on the sentence in question) means the previous sequence memory is always treated as fully relevant when the current value is used to compute the update. That is why Betty's model is not accepted, because the model wants to work without vanishing gradient problems.
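To make the roles of the two gates concrete, here is a minimal sketch of a single GRU step in NumPy. It assumes the usual textbook formulation (candidate memory from tanh, sigmoid gates); the names `gru_step`, `make_params`, `Wu`, `Wr`, `Wc` and the option to force a gate to a fixed value are only illustrative assumptions, not anything from the course code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def make_params(n_c, n_x, seed=0):
    # Hypothetical random parameters, only for illustration.
    rng = np.random.default_rng(seed)
    shape = (n_c, n_c + n_x)
    return {
        "Wu": rng.normal(size=shape), "bu": np.zeros(n_c),
        "Wr": rng.normal(size=shape), "br": np.zeros(n_c),
        "Wc": rng.normal(size=shape), "bc": np.zeros(n_c),
    }

def gru_step(c_prev, x, params, gamma_u=None, gamma_r=None):
    """One GRU step. Pass gamma_u or gamma_r to force a gate to a fixed value."""
    concat = np.concatenate([c_prev, x])
    if gamma_u is None:
        # Update gate: how much of the memory gets overwritten.
        gamma_u = sigmoid(params["Wu"] @ concat + params["bu"])
    if gamma_r is None:
        # Relevance gate: how much of c_prev feeds the candidate.
        gamma_r = sigmoid(params["Wr"] @ concat + params["br"])
    gated = np.concatenate([gamma_r * c_prev, x])
    c_tilde = np.tanh(params["Wc"] @ gated + params["bc"])
    # gamma_u = 0 keeps c_prev unchanged; gamma_u = 1 replaces it with c_tilde.
    return gamma_u * c_tilde + (1.0 - gamma_u) * c_prev

params = make_params(n_c=4, n_x=3)
c0 = np.array([0.5, -0.2, 0.1, 0.9])
x = np.ones(3)

# With gamma_u forced to 0, the memory is carried over unchanged for 50 steps.
c = c0
for _ in range(50):
    c = gru_step(c, x, params, gamma_u=0.0)
print(np.allclose(c, c0))
```

With gamma u at 0 the last line of `gru_step` reduces to `c = c_prev`, which is the "propagating without much decay" case. Forcing `gamma_r=1.0` only changes how the candidate `c_tilde` is computed; gamma u still decides whether that candidate is written into the memory.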
Regards
DP