Activation functions

Why do we use the ReLU activation function in the hidden layers compared to other activation functions like sigmoid, linear activation, tanh, etc.?

Hi there,

The main benefits of ReLU are:

  • There is a reduced risk of vanishing gradients, since the gradient of ReLU is constant (equal to 1) in the positive region; it does not saturate, in contrast to sigmoid or tanh (see the short sketch after this list).
  • You can model non-linearity well, as stated in this thread: Isn't Relu just a lineer regression function for z>=0 - #6 by Christian_Simonis. With a purely linear activation function this would not be possible.
  • ReLU is cheap to compute, y = max(0, x), and is therefore often faster than the alternatives.
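To make the saturation point concrete, here is a minimal NumPy sketch (illustrative only, not from the course material) that compares the gradients of ReLU, sigmoid, and tanh at a few pre-activation values:

```python
import numpy as np

def relu_grad(z):
    # Gradient of max(0, z): 1 for z > 0, 0 otherwise
    return (z > 0).astype(float)

def sigmoid_grad(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)          # shrinks toward 0 for large |z|

def tanh_grad(z):
    return 1.0 - np.tanh(z) ** 2  # also shrinks toward 0 for large |z|

z = np.array([-10.0, -1.0, 0.5, 1.0, 10.0])
print("ReLU    grad:", relu_grad(z))     # stays 1 for all positive z
print("Sigmoid grad:", sigmoid_grad(z))  # nearly 0 at |z| = 10
print("Tanh    grad:", tanh_grad(z))     # nearly 0 at |z| = 10
```

For large |z| the sigmoid and tanh gradients collapse toward zero, while the ReLU gradient stays at 1 for any positive z, which is exactly the non-saturation behavior described above.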

Please let me know if this answers your question!

Best regards and happy new year!

Christian


In case you are interested in more information, also with respect to the evaluation of different activation functions, feel free to take a look at this paper: https://arxiv.org/pdf/2109.14545.pdf

Best regards
Christian

These threads may be interesting, too:

Best regards
Christian