Understanding GRU

Hi everyone,

I need a bit of help understanding the GRU. For most timesteps c_t = c_{t-1}, and at every timestep a_t is set equal to c_t. Does that mean the activation never really changes across timesteps, except at the few where the update gate value is 1?
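For reference, these are the simplified GRU equations I'm referring to (written in the lecture's notation, as far as I recall):

$$
\begin{aligned}
\tilde{c}^{\langle t \rangle} &= \tanh\big(W_c\,[c^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_c\big) \\
\Gamma_u &= \sigma\big(W_u\,[c^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_u\big) \\
c^{\langle t \rangle} &= \Gamma_u * \tilde{c}^{\langle t \rangle} + (1 - \Gamma_u) * c^{\langle t-1 \rangle} \\
a^{\langle t \rangle} &= c^{\langle t \rangle}
\end{aligned}
$$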

Hi @piyush23,

Have you found the answer yet? Let us know if we can help.

Hi Kic,

I haven't found the answer yet. My query is this: the fundamental idea of an RNN is that we pass the previous activation to the next timestep so the network can establish relationships across timesteps. But here, until the update gate actually updates c, the activation won't change. Aren't we losing the very basis on which RNNs establish relationships across timesteps?
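To put the worry in concrete terms: if the update gate were exactly zero for a whole run of timesteps, say

$$
\Gamma_u = 0 \;\text{ for } t = 1, \dots, k \quad\Longrightarrow\quad c^{\langle k \rangle} = c^{\langle k-1 \rangle} = \dots = c^{\langle 0 \rangle}, \qquad a^{\langle k \rangle} = a^{\langle 0 \rangle},
$$

then the same activation would be passed forward at every one of those timesteps.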

Hi @piyush23,

One of the problems with a plain RNN is the vanishing gradient. The GRU is a simple yet effective alternative that addresses the vanishing gradient issue. As you might recall, some information is important for capturing the relationship between the words of a sentence, but some is not. So why keep information that is not relevant?
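To make that concrete, here is a minimal NumPy sketch of a single simplified GRU step (no relevance gate; the variable names and dimensions are just for illustration, not from the assignment):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step_simplified(c_prev, x_t, Wc, bc, Wu, bu):
    """One simplified GRU step, in the c / a notation used above,
    where the activation a_t is just the memory cell c_t."""
    concat = np.concatenate([c_prev, x_t])               # [c_{t-1}, x_t]
    c_tilde = np.tanh(Wc @ concat + bc)                  # candidate new memory
    gamma_u = sigmoid(Wu @ concat + bu)                  # update gate, values in (0, 1)
    c_t = gamma_u * c_tilde + (1.0 - gamma_u) * c_prev   # element-wise blend of new and old
    a_t = c_t                                            # activation passed to the next timestep
    return a_t, c_t, gamma_u

# Tiny usage example with random weights (sizes are made up for illustration).
n_c, n_x = 4, 3
rng = np.random.default_rng(0)
Wc, bc = rng.normal(size=(n_c, n_c + n_x)), np.zeros(n_c)
Wu, bu = rng.normal(size=(n_c, n_c + n_x)), np.zeros(n_c)
a_t, c_t, gate = gru_step_simplified(np.zeros(n_c), rng.normal(size=n_x), Wc, bc, Wu, bu)
print("update gate:", np.round(gate, 2))   # rarely exactly 0 or 1: a soft interpolation
```

Because the gate comes out of a sigmoid, each unit of c_t is a weighted blend of the old memory and the new candidate rather than an all-or-nothing switch. When the gate is close to 0, the cell carries c_{t-1} forward almost unchanged, which is what preserves relevant information over long ranges and keeps the gradient from vanishing; when it is close to 1, the cell overwrites that unit with new information.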

You may find this article helpful.