I have questions about this slide
Honestly, I don’t understand how it works. So, we have a one-hot input X. As I remember, we need tanh and sigmoid to classify elements into 2 groups. So one of these classifications influences the next classification. Can you explain what happens here?
In binary classification problems, we use sigmoid as the output activation, not tanh. But both of these are perfectly valid activation functions for the internal layers of a network. Of course an RNN is a little different than a multi-layer FC net or CNN. With an RNN, there is just one “cell” and it gets used repeatedly at each “timestep”, and it has two outputs: the new “hidden state” a^{<t>} and the actual output of that timestep, which is \hat{y}^{<t>}. The other thing about RNNs is that they come in lots of types, and it depends on what the output is at each timestep. In the example Prof Ng is showing here, it must be a “yes/no” answer of some sort, but it’s also very common for it to be a softmax output (e.g. in a translation problem). Notice that tanh is used as the activation on the “hidden state”, so those values can be both positive and negative.
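Just to make the two outputs concrete, here is a rough numpy sketch of a single RNN cell forward step. The weight names (Wax, Waa, Wya) and the shapes are my own illustration and may not match the exact notation in the course notebooks:

```python
import numpy as np

def rnn_cell_forward(x_t, a_prev, Wax, Waa, Wya, ba, by):
    """One RNN timestep with two outputs: the new hidden state and y_hat.
    Illustrative shapes: x_t (n_x, m), a_prev (n_a, m),
    Wax (n_a, n_x), Waa (n_a, n_a), Wya (n_y, n_a)."""
    # New hidden state: tanh, so the values can be both positive and negative
    a_t = np.tanh(Waa @ a_prev + Wax @ x_t + ba)
    # Per-timestep output: sigmoid here for a "yes/no" answer,
    # but this could just as well be a softmax (e.g. in a translation model)
    y_hat_t = 1.0 / (1.0 + np.exp(-(Wya @ a_t + by)))
    return a_t, y_hat_t

# Tiny usage example with made-up dimensions
n_x, n_a, n_y, m = 5, 4, 1, 1            # input size, hidden units, outputs, batch
rng = np.random.default_rng(0)
x_t = np.zeros((n_x, m)); x_t[2, 0] = 1  # a one-hot input x^{<t>}
a_prev = np.zeros((n_a, m))              # a^{<0>} initialized to zeros
Wax = rng.standard_normal((n_a, n_x))
Waa = rng.standard_normal((n_a, n_a))
Wya = rng.standard_normal((n_y, n_a))
ba, by = np.zeros((n_a, 1)), np.zeros((n_y, 1))
a_t, y_hat_t = rnn_cell_forward(x_t, a_prev, Wax, Waa, Wya, ba, by)
```

The same cell (same weights) would then be called again with a_t as the new a_prev and the next x^{<t+1>} as the input, which is what "one cell used repeatedly at each timestep" means.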
Also note that activation functions are always applied “elementwise”, so it really depends on what the \hat{y} values represent at each timestep. You don’t give a reference to which lecture the slide is from. Knowing that might shed a bit more light here. But maybe the best next step is to rewind and watch the lecture again with what I said above in mind. I’ll bet it will make more sense the second time through now that you have a bit more context.