Is Tanh better than sigmoid?

In week 3 I got this note in a quiz:

No. As seen in lecture, the output of tanh is between -1 and 1, so it centers the data, which makes learning simpler for the next layer.

Can you please explain how an output range of -1 to 1 (tanh) could be better than 0 to 1 (sigmoid) for hidden units in later computations? Or could you give me a link to the lecture where this was discussed?

We use normalization techniques, such as min-max scaling, to bring the data into a similar range, for example -1 to 1, which helps speed up learning. The tanh activation naturally produces outputs between -1 and 1, so its activations are roughly zero-centered; this acts as a form of normalization and makes learning easier for the next layer.

Please refer to this video where this topic is discussed.
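As a quick illustration of the centering point above, here is a minimal NumPy sketch (my own example, not code from the course) comparing the mean of sigmoid and tanh activations on the same zero-mean inputs: tanh activations stay roughly zero-centered, while sigmoid activations sit around 0.5.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
z = rng.standard_normal(10_000)   # pre-activations, roughly zero mean

a_sig = sigmoid(z)                # values in (0, 1)
a_tanh = np.tanh(z)               # values in (-1, 1)

print("mean of sigmoid activations:", a_sig.mean())   # close to 0.5, not centered
print("mean of tanh activations:   ", a_tanh.mean())  # close to 0.0, centered
```

Zero-centered activations mean the inputs fed to the next layer are not all positive, which tends to help gradient descent converge faster, much like normalizing the input features does.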


In a related question, sometimes people ask if we could even use tanh as the activation at the output layer in a binary classification and then use >= 0 as “Yes” and < 0 as “No”. But the question then is what you would use as a loss function, since cross entropy loss is specifically designed to work with sigmoid. Here’s a thread which discusses that in more detail and also ends up showing that there is a very close relationship mathematically between sigmoid and tanh.
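For reference, the close relationship mentioned there is the identity tanh(z) = 2·sigmoid(2z) − 1, i.e. tanh is just a scaled and shifted sigmoid. A quick numerical check (my own sketch, not taken from the linked thread):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-5, 5, 101)
# tanh is a scaled and shifted sigmoid: tanh(z) = 2*sigmoid(2z) - 1
print(np.allclose(np.tanh(z), 2 * sigmoid(2 * z) - 1))   # True
```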

  • If you are doing classification, use sigmoid, because the output range matches the “true” and “false” classes.
  • If you want a real output value, use tanh, because its range includes positive and negative values.
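To make the two bullets concrete, here is a small sketch (an assumed NumPy example, not course code) showing a sigmoid output thresholded at 0.5 for a binary label, and a tanh output used as a signed real value:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.3, 0.4, 1.7])        # example output-layer pre-activations

# Classification: sigmoid gives an estimate of P(y = 1); threshold at 0.5
p = sigmoid(z)
labels = (p >= 0.5).astype(int)
print(p.round(2), labels)             # approx [0.09 0.6 0.85] -> [0 1 1]

# Real-valued output: tanh gives a value in (-1, 1); the sign is meaningful
print(np.tanh(z).round(2))            # approx [-0.98 0.38 0.94]
```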

Thanks for the video. Indeed, Ng said that tanh can be better, but it is still unclear to me why. I understand that it is more universal, and that is fine. But better? I also did not understand the point about later computations. I have passed course 2 and still cannot see how the calculations become easier with tanh. Is it connected with normalization, as I understand it?

Hello Mihail,

Super Mentors Paul and Tom have already replied to your questions. You can also go through this thread and this one for a deeper understanding of how sigmoid and tanh are used in different cases.