Significance of sigmoid in an update gate of LSTM cell

gokturk.gezer · July 3, 2023, 9:12pm

Thanks!

I also found this was asked here previously: LSTM architecture - #2 by anon57530071

It seems the combination of tanh and sigmoid is questioned by others, although from a different angle. I was curious why we need sigmoid, whereas the others question why tanh is needed. Unfortunately, I’m still confused after reading the explanations in those threads.

To clarify where I stand, I understand the fundamental function of the sigmoid gate and realize that both activations have different weights. I’m also aware that the forget gate has a similar computation, so my question stands for both.

My trouble stems from the fact that multiplying a tanh output by a sigmoid output doesn’t change the output range of the initial tanh function: the product still lies in (-1, 1). So why couldn’t a tanh function alone learn to output the same values? I understand that would fundamentally change the LSTM, which is why I’m interested in the mathematical significance of the sigmoid rather than its designed purpose.
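To make the range observation concrete, here is a quick sketch of what I mean. The pre-activation values are random numbers I made up, not outputs of a trained LSTM, and `gated_candidate` is just a name I chose for the sigmoid-times-tanh term:

```python
import math
import random


def sigmoid(x):
    """Logistic function, output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_candidate(z_gate, z_cand):
    """Update-gate term: sigmoid(z_gate) * tanh(z_cand).

    z_gate and z_cand stand in for the two pre-activations, which in an
    LSTM come from different weight matrices (hypothetical values here).
    """
    return sigmoid(z_gate) * math.tanh(z_cand)


random.seed(0)
# Made-up pre-activations, sampled independently for gate and candidate.
samples = [(random.uniform(-10, 10), random.uniform(-10, 10))
           for _ in range(10_000)]

products = [gated_candidate(zg, zc) for zg, zc in samples]
plain = [math.tanh(zc) for _, zc in samples]

# Both stay inside (-1, 1); the gate only shrinks the tanh value.
print("gated range:", min(products), max(products))
print("tanh range: ", min(plain), max(plain))
```

In this sketch the gated value never leaves (-1, 1) and never exceeds the plain tanh value in magnitude, which is exactly why I’m wondering what the sigmoid adds mathematically.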

Would it take many more iterations to train a single tanh to learn to output the same state?
Would a single tanh, without being multiplied by a sigmoid output, not exhibit the abstract generalization power of LSTMs and end up just overfitting the training set?
