About activation functions

I have a (not so) basic question: why do we need activation functions?
And in a multilayer NN, how do I choose a “good” activation function for each layer?
In a video Andrew said:

  • sigmoid only for the output layer of a binary classification
  • tanh is superior if the data are “normalized” (I don’t remember the exact word)
  • ReLU is the most used
  • Leaky ReLU sometimes
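
To make sure I have the definitions right, here is a minimal pure-Python sketch of the four functions listed above, as I understand them. The function names and the 0.01 slope for Leaky ReLU are my own assumptions (a commonly seen default), not something taken from the video.

```python
import math


def sigmoid(z: float) -> float:
    """Squash z into (0, 1); branch on the sign of z to avoid overflow in exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def tanh(z: float) -> float:
    """Squash z into (-1, 1); the output is centered on zero."""
    return math.tanh(z)


def relu(z: float) -> float:
    """Pass positive values through unchanged, clamp negatives to zero."""
    return max(0.0, z)


def leaky_relu(z: float, alpha: float = 0.01) -> float:
    """Like ReLU, but negatives keep a small slope alpha (assumed value)."""
    return z if z > 0 else alpha * z


if __name__ == "__main__":
    # Print each activation at a few sample inputs to compare their shapes.
    for z in (-2.0, 0.0, 2.0):
        print(z, sigmoid(z), tanh(z), relu(z), leaky_relu(z))
```

All four are nonlinear, and they differ mainly in their output range and in how they treat negative inputs.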

