ReLU function as activation function

Why do we use ReLU as an activation function instead of sigmoid? In Course 1 we used sigmoid for classification problems instead of linear functions, because a linear function does not fit binary outputs well.

A few reasons I think:

  • ReLU’s formula is simpler, so it is easier to train.
  • Experiments show that ReLU outperforms sigmoid.
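To make the "simpler formula" point concrete, here is a minimal NumPy sketch of the two activations (the function names are my own, not from the course code):

```python
import numpy as np

def relu(z):
    # ReLU: max(0, z) -- just a comparison, very cheap to compute
    return np.maximum(0.0, z)

def sigmoid(z):
    # Sigmoid: 1 / (1 + e^{-z}) -- needs an exponential
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.0, 0.0, 3.0])
print(relu(z))     # [0. 0. 3.]
print(sigmoid(z))  # values squashed into (0, 1)
```

ReLU is a single elementwise `max`, while sigmoid requires evaluating an exponential for every unit, which is part of why ReLU networks train faster.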

ReLU is only used in the hidden layers. You still need sigmoid at the output layer to get the classes.
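That split can be sketched as a tiny forward pass, with ReLU in the hidden layer and sigmoid at the output (a hypothetical 2-layer binary classifier; all shapes and weights here are made up for illustration):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))    # 4 samples, 3 features
W1 = rng.normal(size=(3, 5))   # hidden layer: 5 units
b1 = np.zeros(5)
W2 = rng.normal(size=(5, 1))   # output layer: 1 unit
b2 = np.zeros(1)

hidden = relu(X @ W1 + b1)          # ReLU in the hidden layer
probs = sigmoid(hidden @ W2 + b2)   # sigmoid squashes the output to (0, 1)
print(probs.ravel())                # each value is usable as a class probability
```

Without the sigmoid at the end, the raw output could be any real number, so you could not read it as a probability of the positive class.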

There are tradeoffs:
It’s computationally easy to compute the gradients for ReLU. But since you get no gradient for negative z values, you may need more ReLU units than you would with sigmoid.
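The gradient tradeoff can be seen directly by evaluating both derivatives (a minimal sketch; the helper names are my own):

```python
import numpy as np

def relu_grad(z):
    # dReLU/dz is 1 for z > 0 and exactly 0 for z <= 0 (the "dead" region)
    return (z > 0).astype(float)

def sigmoid_grad(z):
    s = 1.0 / (1.0 + np.exp(-z))
    # dSigmoid/dz = s * (1 - s): nonzero everywhere, but it saturates
    return s * (1.0 - s)

z = np.array([-3.0, -0.5, 0.5, 3.0])
print(relu_grad(z))     # [0. 0. 1. 1.] -- no learning signal for negative z
print(sigmoid_grad(z))  # small but nonzero, even at the extremes
```

A ReLU unit whose input stays negative contributes nothing to the gradient, which is why you often compensate with more units; sigmoid always passes some gradient, but it shrinks toward zero for large |z|.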

Hello @Mohamed_Hussien1,

I just hope to make sure that, after the mentors’ replies, you have got the idea that we don’t always judge an activation function by how it is used in the output layer. In contrast, ReLU is more commonly used in the hidden layers, and we like its characteristics: it is non-linear and simple.