Hi,
I think there is a typo in the description of the Forget gate.
- If a single value in \mathbf{\Gamma}_f^{\langle t \rangle} is 0 or close to 0, then the product is close to 0.
- This keeps the information stored in the corresponding unit in \mathbf{c}^{\langle t-1 \rangle} from being ~~remembered~~ forgotten for the next time step.
- Similarly, if one value is close to 1, the product is close to the original value in the previous cell state.
- The LSTM will keep the information from the corresponding unit of \mathbf{c}^{\langle t-1 \rangle}, to be used in the next time step.
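The bullets above can be illustrated with a minimal sketch in plain Python. Lists stand in for the vectors, and the gate and cell values are made up for illustration. The forget gate multiplies the previous cell state element by element, so a gate value near 0 nearly wipes out that unit, and a value near 1 passes it through almost unchanged.

```python
def forget_gate_product(gamma_f, c_prev):
    """Element-wise product Gamma_f * c_prev (the forget-gate term of the cell update)."""
    return [g * c for g, c in zip(gamma_f, c_prev)]


# Hypothetical previous cell state with three units.
c_prev = [2.0, -1.5, 0.8]
# Hypothetical gate values: close to 0 for unit 0, close to 1 for unit 1, exactly 1 for unit 2.
gamma_f = [0.01, 0.99, 1.0]

kept = forget_gate_product(gamma_f, c_prev)
print(kept)  # unit 0 shrinks toward 0 (forgotten), units 1 and 2 stay close to c_prev (remembered)
```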
So, I think that instead of “remembered” it should say “forgotten” (or a similar word that emphasizes that the information shouldn’t be remembered).
Am I right?
Henrikh
But \Gamma_f is the “forget” gate, right? If the value is close to 0, then it encourages the state to be forgotten.
But if a unit in \mathbf{\Gamma}_f^{\langle t \rangle} is close to 0, then the corresponding unit in the cell state \mathbf{c}^{\langle t-1 \rangle} will be multiplied by that value and will make a smaller contribution to the cell state \mathbf{c}^{\langle t \rangle}:
\mathbf{c}^{\langle t\rangle}=\Gamma_{f}^{\langle t\rangle} * \mathbf{c}^{\langle t-1\rangle}+\Gamma_{i}^{\langle t\rangle} * \tilde{\mathbf{c}}^{\langle t\rangle}
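A minimal sketch of this cell-state update in plain Python, assuming the gates come out of a sigmoid as usual. The pre-activation values are hypothetical, and the update gate \Gamma_i is set to 0 here only to isolate the effect of the forget gate.

```python
import math


def sigmoid(z):
    """Logistic function; gate values always lie strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def cell_update(gamma_f, c_prev, gamma_i, c_tilde):
    """c<t> = Gamma_f * c<t-1> + Gamma_i * c~<t>, with all products element-wise."""
    return [f * c + i * ct for f, c, i, ct in zip(gamma_f, c_prev, gamma_i, c_tilde)]


# Hypothetical pre-activations: a large negative value drives the sigmoid toward 0,
# a large positive value drives it toward 1.
gamma_f = [sigmoid(-10.0), sigmoid(10.0)]  # about 0.0000454 and 0.9999546
gamma_i = [0.0, 0.0]  # ignore the candidate so only the forget-gate term remains
c_prev = [3.0, 3.0]
c_tilde = [0.5, 0.5]

c_t = cell_update(gamma_f, c_prev, gamma_i, c_tilde)
print(c_t)  # first unit is almost forgotten, second is almost fully remembered
```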
So, the corresponding unit in the cell state \mathbf{c}^{\langle t-1 \rangle} should be forgotten. This is why I don’t understand the use of the word “remembered” in the description.
Or maybe I am completely lost.
Henrikh
Please read the relevant sentence again. Being multiplied by a number close to zero prevents the state from being remembered, which is equivalent to saying that it “makes it more likely to be forgotten”. Maybe I could go out on a limb here and conjecture that English is not your native language …
Less likely to be remembered is the same as being more likely to be forgotten, right?
Yes indeed. You are right. I missed the preposition “from”. Now everything is clear to me.
Thanks