Sequence Models Week 1 Quiz

Hello,

The gated recurrent unit (GRU) is a modification to the RNN hidden layer that makes it much better at capturing long-range connections and helps a lot with the vanishing gradient problem.

The c^t equation in the image applies the activation function to the parameter W_a times the activations from the previous time step and the current input, plus the bias.

The GRU unit has a new variable called c, which stands for memory cell. What the memory cell does is provide a bit of memory.

So at time t, the memory cell will have some value c^t. The GRU unit will actually output an activation value a^t that is equal to c^t.

So the equations shown in the image govern the computation of the GRU unit.
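Since the image is not reproduced here, this is a sketch of the simplified GRU equations as they are usually written. It assumes the image shows the simplified form without the relevance gate Gamma_r, and the parameter names W_c, b_c, W_u and b_u are my assumption, so they may differ from the image:

```latex
\begin{aligned}
\tilde{c}^{\langle t \rangle} &= \tanh\left(W_c\,[c^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_c\right) \\
\Gamma_u &= \sigma\left(W_u\,[c^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_u\right) \\
c^{\langle t \rangle} &= \Gamma_u * \tilde{c}^{\langle t \rangle} + (1 - \Gamma_u) * c^{\langle t-1 \rangle} \\
a^{\langle t \rangle} &= c^{\langle t \rangle}
\end{aligned}
```

Here the third line is the update that the gate controls: Gamma_u blends the new candidate value with the old memory value element-wise.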

Gamma_u here acts as a gate for this memory cell. Think of this gate value as being always 0 or 1.

Although in practice, you compute it with a sigmoid function. Remember that the value of the sigmoid function is always between 0 and 1. For most of the possible range of the input, the sigmoid function is either very, very close to 0 or very, very close to 1.

For intuition, think of Gamma_u as being either 0 or 1 most of the time.
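To see why treating the gate as 0 or 1 is a reasonable intuition, here is a small sketch that evaluates the sigmoid at a few inputs. The input values are arbitrary and picked only for illustration:

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid: squashes any real input into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Arbitrary illustrative pre-activation values for the gate.
inputs = np.array([-10.0, -5.0, 0.0, 5.0, 10.0])
gate_values = sigmoid(inputs)

for z, g in zip(inputs, gate_values):
    print(f"z = {z:6.1f}  ->  sigmoid(z) = {g:.5f}")
```

Away from zero the output saturates quickly, which is why the gate behaves almost like a switch.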

The job of the gate, that is Gamma_u, is to decide when to update this value.

So basically Gamma_u decides when to update the memory cell, based on whether it is 0 or 1. When Gamma_u is equal to 1, the memory cell is overwritten with the new candidate value. When Gamma_u is equal to 0, it tells the memory cell not to update and to keep its previous value.

If Gamma_u is equal to zero, it is just setting c^t equal to the old value c^(t-1), even as you scan through the sequence.
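The effect of the gate on the memory cell can be sketched in a few lines. This assumes the usual GRU update c^t = Gamma_u * candidate + (1 - Gamma_u) * c^(t-1). The function name and the numbers are hypothetical, just for illustration:

```python
import numpy as np

def gru_memory_update(c_prev, c_candidate, gamma_u):
    """Element-wise blend of the old memory and the new candidate value."""
    return gamma_u * c_candidate + (1.0 - gamma_u) * c_prev

# Hypothetical values for the previous memory and the candidate.
c_prev = np.array([0.7, -0.2])
c_candidate = np.array([-0.9, 0.4])

# Gate closed: the memory cell keeps its old value.
kept = gru_memory_update(c_prev, c_candidate, gamma_u=0.0)

# Gate open: the memory cell is overwritten with the candidate.
overwritten = gru_memory_update(c_prev, c_candidate, gamma_u=1.0)

print("gamma_u = 0 ->", kept)
print("gamma_u = 1 ->", overwritten)
```

With the gate at 0, the old value is carried forward unchanged, which is the "remember" behaviour described above.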

So in this question, Alice's proposal involves Gamma_u being equal to 0 for a timestep. In that case the gradient can back-propagate through that timestep without much decay, because the memory cell has been told not to update and simply carries its previous value forward (c^t = c^(t-1)).

Whereas Betty proposes to keep Gamma_u equal to 1. Then the memory cell c^t is overwritten at every time step, so the gradient has to pass through the activation function and the weights at each timestep, and it can decay just like in a plain RNN.
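To make the gradient argument concrete, here is a rough scalar sketch, not a real GRU. It assumes a single memory value, no input x, a gate held at a fixed constant instead of being computed by a sigmoid, and hypothetical values for w, b and c0. It multiplies the per-step derivative d c^t / d c^(t-1) over many timesteps:

```python
import numpy as np

def gradient_through_time(gamma_u, steps=50, w=0.5, b=0.0, c0=0.3):
    """Return d c^T / d c^0 for a scalar toy update with a constant gate.

    Toy update: c_t = gamma_u * tanh(w * c_prev + b) + (1 - gamma_u) * c_prev
    """
    c = c0
    grad = 1.0
    for _ in range(steps):
        candidate = np.tanh(w * c + b)
        # Chain rule for one step: candidate path plus the carry-over path.
        local = gamma_u * w * (1.0 - candidate ** 2) + (1.0 - gamma_u)
        grad *= local
        c = gamma_u * candidate + (1.0 - gamma_u) * c
    return grad

grad_gate_near_zero = gradient_through_time(gamma_u=0.01)
grad_gate_one = gradient_through_time(gamma_u=1.0)

print("gamma_u = 0.01:", grad_gate_near_zero)
print("gamma_u = 1.00:", grad_gate_one)
```

In this toy setting the gradient stays at a usable size when the gate is near 0 and shrinks towards zero when the gate is always 1, since every step then multiplies it by a factor of at most w.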

So based on this, which answer would you choose for this question?