Why do we use the ReLU activation function in the hidden layers compared to other activation functions like sigmoid, linear activation, tanh, etc.?
Hi there,
The main benefits are:
- there is a reduced risk of vanishing gradients, since the gradient in the positive section of the ReLU function is constant. It does not saturate, in contrast to sigmoid or tanh
- it lets the network model non-linearity well, as stated in this thread: Isn't Relu just a lineer regression function for z>=0 - #6 by Christian_Simonis. With a pure linear activation function this would not be possible.
- ReLU is easy to compute with y = max(0, x) and is therefore often faster than the alternatives (see the small sketch after this list)
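If it helps, here is a minimal NumPy sketch (the function names are my own, just for illustration) comparing the gradients of ReLU and sigmoid. It shows the saturation effect mentioned above: the sigmoid gradient shrinks toward zero for large |z|, while ReLU's gradient stays at 1 in the positive region.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)          # y = max(0, x)

def relu_grad(x):
    return (x > 0).astype(float)       # constant 1 for x > 0, 0 otherwise

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)               # tends to 0 for large |x| (saturation)

z = np.array([-5.0, -1.0, 0.5, 5.0, 50.0])
print("relu grad:   ", relu_grad(z))     # stays at 1 in the positive region
print("sigmoid grad:", sigmoid_grad(z))  # shrinks toward 0 as |z| grows
```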
Please let me know if this answers your question!
Best regards and happy new year!
Christian
In case you are interested in more information, also with respect to the evaluation of different activation functions, feel free to take a look at this paper: https://arxiv.org/pdf/2109.14545.pdf
Best regards
Christian
These threads may be interesting, too:
Best regards
Christian