Is Tanh better than sigmoid?

A related question that sometimes comes up is whether we could use tanh as the activation at the output layer in a binary classification and then treat an output >= 0 as “Yes” and < 0 as “No”. But the question then is what you would use as a loss function: binary cross entropy loss expects a prediction in the range (0, 1) that can be read as a probability, which is what sigmoid gives you, whereas tanh outputs values in (-1, 1). Here’s a thread which discusses that in more detail and also ends up showing that there is a very close relationship mathematically between sigmoid and tanh.
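To make that relationship concrete, here is a minimal sketch in plain Python (not taken from the linked thread) that numerically checks the identity tanh(z) = 2 * sigmoid(2z) - 1. The helper names `sigmoid`, `tanh_as_probability` and `binary_cross_entropy` are just illustrative choices for this example. It suggests that if you rescale a tanh output from (-1, 1) into (0, 1), you end up with a sigmoid of a scaled input, at which point cross entropy can be applied as usual.

```python
import math

def sigmoid(z):
    # Standard logistic function, output in (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def tanh_as_probability(z):
    # Hypothetical helper: rescale tanh from (-1, 1) to (0, 1)
    return (math.tanh(z) + 1.0) / 2.0

def binary_cross_entropy(y, p):
    # Loss for a single example; y is 0 or 1, p must lie strictly in (0, 1)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

for z in (-2.0, -0.5, 0.0, 0.5, 2.0):
    # Identity: tanh(z) = 2 * sigmoid(2z) - 1
    assert math.isclose(math.tanh(z), 2.0 * sigmoid(2.0 * z) - 1.0, abs_tol=1e-12)
    # Equivalent form: the rescaled tanh is sigmoid applied to 2z
    assert math.isclose(tanh_as_probability(z), sigmoid(2.0 * z), abs_tol=1e-12)
    p = tanh_as_probability(z)
    print(z, math.tanh(z), p, binary_cross_entropy(1, p))
```

Note that thresholding tanh at 0 gives the same “Yes”/“No” decision as thresholding the rescaled value at 0.5, so the two output activations differ only by a shift and scale of the output and a factor of 2 on the input.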
