Typo in C5W1A1: remembered -> forgotten


I think there is a typo in the description of the Forget gate.

  • If a single value in \mathbf{\Gamma}_f^{\langle t \rangle} is 0 or close to 0, then the product is close to 0.
    • This keeps the information stored in the corresponding unit in \mathbf{c}^{\langle t-1 \rangle} from being remembered for the next time step.
  • Similarly, if one value is close to 1, the product is close to the original value in the previous cell state.
    • The LSTM will keep the information from the corresponding unit of \mathbf{c}^{\langle t-1 \rangle}, to be used in the next time step.

So, I think that instead of “remembered” it should say “forgotten” (or a similar word which emphasizes that the information shouldn’t be remembered).
Am I right?


But \Gamma_f is the “forget” gate, right? If the value is close to 0, then it encourages the state to be forgotten.

But if a unit in \mathbf{\Gamma}_f^{\langle t \rangle} is close to 0, then the corresponding unit in the cell state \mathbf{c}^{\langle t-1 \rangle} will be multiplied by that unit and will make a smaller contribution to the cell state \mathbf{c}^{\langle t \rangle}:

\mathbf{c}^{\langle t\rangle}=\Gamma_{f}^{\langle t\rangle} * \mathbf{c}^{\langle t-1\rangle}+\Gamma_{i}^{\langle t\rangle} * \tilde{\mathbf{c}}^{\langle t\rangle}

So, the corresponding unit in the cell state \mathbf{c}^{\langle t-1 \rangle} should be forgotten. This is why I don’t understand the use of the word “remembered” in the description.
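
To make the effect of the forget gate concrete, here is a small numerical sketch of the update equation above. It assumes NumPy, and all values and variable names (`c_prev`, `gamma_f`, `gamma_i`, `c_tilde`) are made up for illustration; they are not taken from the assignment. The update gate is set to 0 so that only the forget-gate term contributes.

```python
import numpy as np

# Hypothetical values, chosen only to illustrate the element-wise product.
c_prev = np.array([2.0, -3.0, 0.5])    # previous cell state c^{<t-1>}
gamma_f = np.array([0.01, 0.99, 0.5])  # forget gate: near 0 forgets, near 1 keeps
gamma_i = np.zeros(3)                  # update gate set to 0 to isolate the forget gate
c_tilde = np.ones(3)                   # candidate value (has no effect while gamma_i is 0)

# Element-wise update: c^{<t>} = Gamma_f * c^{<t-1>} + Gamma_i * c_tilde
c_next = gamma_f * c_prev + gamma_i * c_tilde
print(c_next)
```

The first unit, whose gate value is close to 0, is almost erased, while the second unit, whose gate value is close to 1, is carried over nearly unchanged.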

Or maybe I am completely lost :slight_smile:


Please read the relevant sentence again. Being multiplied by a number close to zero prevents the state from being remembered, which is equivalent to saying it “makes it more likely to be forgotten”. Maybe I could go out on a limb here and conjecture that English is not your native language …

Less likely to be remembered is the same as being more likely to be forgotten, right?

Yes indeed. You are right. I missed the preposition “from”. Now everything is clear to me.