Activation functions

Hi there,

The main benefits of ReLU are:

  • There is a reduced risk of vanishing gradients, since the gradient in the positive section of the ReLU function is constant (equal to 1). In contrast to sigmoid or tanh, ReLU does not saturate for large positive inputs.
  • You can still model non-linearity well, as discussed in this thread: "Isn't Relu just a lineer regression function for z>=0" (#6 by Christian_Simonis). With a purely linear activation function this would not be possible.
  • ReLU is cheap to compute, since it is just y = max(0, x), and therefore it is often faster than the alternatives, which need an exponential.
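
To make the first two points a bit more concrete, here is a minimal sketch in plain Python. The function names (relu, relu_grad, sigmoid, sigmoid_grad) are my own choices for illustration and do not come from any particular library. It compares the gradients of ReLU and sigmoid for growing inputs and checks that ReLU is not additive, which a purely linear function would have to be.

```python
import math

def relu(x):
    # ReLU: y = max(0, x)
    return max(0.0, x)

def relu_grad(x):
    # Gradient is 1 for x > 0 and 0 for x < 0.
    # At x == 0 the derivative is undefined; 0 is used here by convention.
    return 1.0 if x > 0 else 0.0

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    # Derivative of sigmoid: s(x) * (1 - s(x)); it shrinks toward 0 as |x| grows.
    s = sigmoid(x)
    return s * (1.0 - s)

# Gradient comparison: ReLU stays at 1, sigmoid saturates.
for x in (0.5, 5.0, 20.0):
    print(f"x={x:5.1f}  relu'={relu_grad(x):.1f}  sigmoid'={sigmoid_grad(x):.2e}")

# Non-linearity check: a linear function f satisfies f(a + b) == f(a) + f(b).
a, b = 1.0, -2.0
print(relu(a + b), relu(a) + relu(b))
```

The last line prints two different values, so ReLU is only piecewise linear and not linear overall. That kink at zero is what lets stacked layers represent non-linear functions.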

Please let me know if this answers your question!

Best regards and happy new year!

