Hi,

I think there is a typo in the description of the Forget gate.

- If a single value in \mathbf{\Gamma}_f^{\langle t \rangle} is 0 or close to 0, then the product is close to 0.
- This keeps the information stored in the corresponding unit in \mathbf{c}^{\langle t-1 \rangle} from being
~~remembered~~ **forgotten** for the next time step.

- Similarly, if one value is close to 1, the product is close to the original value in the previous cell state.
- The LSTM will keep the information from the corresponding unit of \mathbf{c}^{\langle t-1 \rangle}, to be used in the next time step.

So, I think “remembered” should be replaced with **forgotten** (or a similar word that emphasizes that the information shouldn’t be remembered).

Am I right?

Henrikh

But \Gamma_f is the “forget” gate, right? If the value is close to 0, then it encourages the state to be forgotten.

But if a **unit** in \mathbf{\Gamma}_f^{\langle t \rangle} is close to 0, then the **corresponding unit** in the cell state \mathbf{c}^{\langle t-1 \rangle} will be multiplied by that **unit** and will make a smaller contribution to the cell state \mathbf{c}^{\langle t \rangle}:

\mathbf{c}^{\langle t\rangle}=\Gamma_{f}^{\langle t\rangle} * \mathbf{c}^{\langle t-1\rangle}+\Gamma_{i}^{\langle t\rangle} * \tilde{\mathbf{c}}^{\langle t\rangle}

So, the **corresponding unit** in the cell state \mathbf{c}^{\langle t-1 \rangle} should be forgotten. This is why I don’t understand the use of the word **remembered** in the description.
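
To make the element-wise effect concrete, here is a minimal NumPy sketch of the cell-state update quoted above. The function name `next_cell_state` and all numeric values are assumptions chosen for illustration; they do not come from the course material. The update gate is set to zero so that only the forget gate's effect is visible.

```python
import numpy as np


def next_cell_state(gamma_f, c_prev, gamma_i, c_tilde):
    """Element-wise update: c<t> = gamma_f * c<t-1> + gamma_i * c_tilde<t>."""
    return gamma_f * c_prev + gamma_i * c_tilde


# Hypothetical values, picked only to show the three cases.
c_prev = np.array([2.0, -1.5, 0.8])   # previous cell state c<t-1>
gamma_f = np.array([0.0, 1.0, 0.5])   # forget gate: 0 = drop, 1 = keep, 0.5 = halve
gamma_i = np.zeros(3)                 # update gate zeroed to isolate the forget gate
c_tilde = np.array([0.3, 0.3, 0.3])   # candidate values (no effect here, since gamma_i is 0)

c_next = next_cell_state(gamma_f, c_prev, gamma_i, c_tilde)
print(c_next)
```

With these numbers, the first unit of the previous cell state is wiped out (gate value 0), the second is carried over unchanged (gate value 1), and the third is halved (gate value 0.5).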

Or maybe I am completely lost.

Henrikh

Please read the relevant sentence again. Being multiplied by a number close to zero prevents the state *from being remembered*, which is equivalent to saying “*makes it more likely to be forgotten*”. Maybe I could go out on a limb here and conjecture that English is not your native language …

*Less likely* to be remembered is the same as being *more likely* to be forgotten, right?

Yes indeed. You are right. I missed the preposition “from”. Now everything is clear to me.

Thanks