Hi,
I have a (not so) basic question: why do we need activation functions?
And in a multilayer NN, how do we choose a “good” activation function for each layer?
In a video Andrew said:
sigmoid only for the output layer of a binary classification
tanh is superior if the data are “normalized” (I don’t remember the exact word)
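The word was probably “centered”: unlike sigmoid, tanh produces outputs centered around zero. Here is a minimal NumPy sketch (my own illustration, not from the course) contrasting the two on pre-activations drawn around zero:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Pre-activations roughly centered at zero, as you'd expect with normalized inputs
z = np.random.randn(10_000)

# sigmoid squashes to (0, 1), so its outputs average around 0.5;
# tanh squashes to (-1, 1), so its outputs stay centered near 0,
# which tends to make learning easier for the next layer
print("sigmoid mean:", sigmoid(z).mean())  # ~0.5
print("tanh mean:   ", np.tanh(z).mean())  # ~0.0
```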
Rashmi has covered your question, but it might also be worth having a look at this thread, which talks a bit more about how to approach choosing the activation functions for the hidden layers.
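On the “why do we need activation functions” part, a short NumPy sketch (illustrative, with made-up weights) makes the standard argument concrete: without a nonlinearity between layers, a stack of linear layers collapses into a single linear layer, so the extra depth adds no expressive power:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two "layers" with no activation function in between
W1, b1 = rng.standard_normal((4, 3)), rng.standard_normal(4)
W2, b2 = rng.standard_normal((2, 4)), rng.standard_normal(2)

x = rng.standard_normal(3)

# Forward pass through both linear layers
out_two_layers = W2 @ (W1 @ x + b1) + b2

# The same map collapses to a single linear layer W x + b
W = W2 @ W1
b = W2 @ b1 + b2
out_one_layer = W @ x + b

print(np.allclose(out_two_layers, out_one_layer))  # True: depth bought nothing

# Inserting any nonlinearity (e.g. tanh) between the layers breaks this collapse
out_nonlinear = W2 @ np.tanh(W1 @ x + b1) + b2
```

That collapse is exactly what a nonlinear activation prevents: with tanh, ReLU, or sigmoid between the layers, the network can represent non-linear functions that no single linear layer could.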