Why do we not use the hyperbolic tangent (tanh) function?

Also note that tanh and sigmoid are very closely related mathematically. The primary reason to choose one over the other is what you need the range of the function to be: for the output of a binary classifier, you need (0,1), but for a hidden layer in a network, you may find the range (-1,1) gives better convergence. Or not. There is no “one size fits all” solution for hidden layer activations.
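
To make the "closely related" point concrete: tanh is a sigmoid that has been rescaled and shifted, tanh(x) = 2·sigmoid(2x) − 1, which is why its range is (-1,1) rather than (0,1). Here is a minimal sketch in plain Python to check that numerically. The helper names `sigmoid` and `tanh_via_sigmoid` are just made up for this illustration; they are not from any particular framework.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid; output lies in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh_via_sigmoid(x: float) -> float:
    """tanh expressed as a rescaled, shifted sigmoid: 2*sigmoid(2x) - 1."""
    return 2.0 * sigmoid(2.0 * x) - 1.0


for x in (-3.0, -0.5, 0.0, 0.5, 3.0):
    # The last two columns should agree up to floating-point rounding.
    print(
        f"x={x:+.1f}  sigmoid={sigmoid(x):.6f}  "
        f"tanh={math.tanh(x):+.6f}  via_sigmoid={tanh_via_sigmoid(x):+.6f}"
    )
```

So switching between the two in a hidden layer mostly changes the output range and scaling (tanh is centered on zero, sigmoid on 0.5), not the basic shape of the curve.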
