Can someone explain this part of the GRU in simpler terms?
If the value of gamma is so small, how does it help overcome the vanishing gradient problem?
And how does c^ decide which subject to pick, for example "the cat", so that it can correct the later part of the sentence?
It might help to have a bit more context for your question here. This is obviously DLS C5 W1. Are you asking about something that Prof Ng says in the lectures? If so, please give us the name of the lecture and the time offset.