Using tanh vs. sigmoid for output layer

Yes, I guess you could think about doing it that way, but that’s not the only thing you have to deal with, right? How do you define your loss function in that case? With the sigmoid outputs looking like probabilities, “cross entropy” is the natural loss function.
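To make that concrete, here’s a minimal sketch (plain Python, function names are my own) of why sigmoid pairs naturally with cross entropy: the output lands in (0, 1), so it can be read as P(y = 1), and the loss is just the negative log-likelihood of the true label.

```python
import math

def sigmoid(z):
    # Squashes any real z into (0, 1), so it can be read as a probability
    return 1.0 / (1.0 + math.exp(-z))

def binary_cross_entropy(y, y_hat):
    # y is the true label (0 or 1); y_hat is the sigmoid output in (0, 1)
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

z = 2.0                 # raw output (logit) of the network
y_hat = sigmoid(z)      # interpreted as P(y = 1)
loss = binary_cross_entropy(1, y_hat)
```

A tanh output lives in (-1, 1), so it can’t be plugged into this loss directly without being rescaled first.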

But another way to ask the question is: why do you think your method would be better? Also note that it turns out that tanh and sigmoid are very closely related.
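For the “closely related” point, the identity is tanh(z) = 2·sigmoid(2z) − 1, i.e. tanh is just a scaled and shifted sigmoid. A quick numerical check:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# tanh(z) = 2 * sigmoid(2z) - 1 at a few sample points
for z in [-3.0, -0.5, 0.0, 0.5, 3.0]:
    assert abs(math.tanh(z) - (2.0 * sigmoid(2.0 * z) - 1.0)) < 1e-12
```

So switching the output activation from sigmoid to tanh is equivalent to rescaling the sigmoid’s output from (0, 1) to (-1, 1), which is why it doesn’t buy you anything by itself.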