About activation functions

I have a (not so) basic question: why do we need activation functions?
And in a multilayer NN, how do you choose a "good" activation function for each layer?
In a video Andrew said:

  • sigmoid: only for the output layer of binary classification
  • tanh: superior if the data are "normalized" (I don't remember the exact word)
  • ReLU: the most used
  • Leaky ReLU: sometimes
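For reference, here is a minimal NumPy sketch of the four activations from that list, with comments summarizing the usual reasons behind each choice (the comments are general rules of thumb, not exact quotes from the lecture):

```python
import numpy as np

def sigmoid(z):
    # Squashes to (0, 1); typically used only in the output layer
    # of a binary classifier, where the output is a probability.
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Zero-centered output in (-1, 1); usually works better than
    # sigmoid in hidden layers because the mean of the activations
    # stays closer to zero.
    return np.tanh(z)

def relu(z):
    # The default choice for hidden layers: cheap to compute and
    # its gradient does not saturate for z > 0.
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):
    # Like ReLU, but a small slope alpha for z < 0 keeps the
    # gradient from being exactly zero ("dying ReLU").
    return np.where(z > 0, z, alpha * z)

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z))     # values in (0, 1)
print(relu(z))        # negative inputs clipped to 0
print(leaky_relu(z))  # negative inputs scaled by alpha instead
```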

Pierre (from France)

Hi, Pierre.

There's a helpful thread related to your question where the same topic has been discussed.

You can also check this as well.

If you still have any doubts, you're welcome to ask!

Rashmi has covered your question, but it might also be worth looking at this thread, which goes into a bit more detail on how to choose the activation functions for the hidden layers.