The literature says (copied from C5W1A1):
The “forget gate” is a tensor containing values between 0 and 1.
- If a unit in the forget gate has a value close to 0, the LSTM will forget the stored state in the corresponding unit of the previous cell state.
- If a unit in the forget gate has a value close to 1, the LSTM will mostly remember the corresponding value in the stored state.
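To make the quoted behavior concrete, here is a minimal numpy sketch of how the forget gate acts on the previous cell state (the variable names and values are my own illustration, not the assignment's code):

```python
import numpy as np

# In a real LSTM, f_t = sigmoid(Wf @ [a_prev, x_t] + bf), so every entry
# lies in (0, 1). Here f_t is hand-picked to show the extremes.
c_prev = np.array([2.0, -3.0, 0.5])   # previous cell state
f_t    = np.array([0.0,  1.0, 0.5])   # forget-gate activations

# Elementwise product: the gate scales each stored value independently.
print(f_t * c_prev)  # [ 0.   -3.    0.25]
# f_t close to 0 wipes the stored value ("forget");
# f_t close to 1 passes it through unchanged ("remember").
```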
Intuitively, the name seems off to me: the gate's values express how much to *remember*, not how much to forget. Shouldn't it be called the "remember gate" instead?
Do we follow this convention purely for legacy reasons, or am I missing something?