I’m still in week 1 of the Deep Learning Specialization’s first course, and I have a short question about why we now prefer ReLU activation functions over sigmoid (I don’t know yet whether this gets answered later in the course). From the lecture, I understood that sigmoid leads to vanishing gradients (very small gradients), which makes learning slower and more costly, unlike ReLU. My question is: how?

A second question: aren’t these two functions semantically different? Sigmoid is usually used to predict probabilities (as in logistic regression), unlike ReLU. Why compare these two at all?
Advantage of ReLU:
ReLU is extremely cheap to compute: if the input is positive, the output is the input and the slope is 1; if it’s negative, the output and slope are both 0. So ReLU units are quite efficient. Sigmoid() requires computing an exponential, so it takes more math cycles.
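To make that concrete, here’s a minimal NumPy sketch (just an illustration, not code from the course) comparing the two activations and their derivatives. It also hints at the vanishing-gradient question above: the sigmoid derivative σ(z)(1 − σ(z)) is at most 0.25, so multiplying such factors backward through many sigmoid layers shrinks the gradient, whereas ReLU passes a gradient of exactly 1 on any positive input.

```python
import numpy as np

def sigmoid(z):
    # sigmoid needs an exponential: 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    # derivative sigma(z) * (1 - sigma(z)); its maximum is 0.25 at z = 0
    s = sigmoid(z)
    return s * (1.0 - s)

def relu(z):
    # ReLU is just a comparison: max(0, z)
    return np.maximum(0.0, z)

def relu_grad(z):
    # slope is 1 for positive inputs, 0 for negative inputs
    return (z > 0).astype(float)

z = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print("sigmoid grad:", sigmoid_grad(z))   # never larger than 0.25
print("relu grad:   ", relu_grad(z))      # exactly 1 wherever z > 0
```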
Disadvantage of ReLU:
Unfortunately, ReLU outputs zero for every negative input (and its gradient is zero there too). So you typically need more ReLU units to do the job a single sigmoid() unit can do, because some of the units have to learn negative weights so that, between them, they still produce an output for negative inputs.
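A toy example of that last point (again, just a sketch, not course code): one ReLU unit is silent on the whole negative half of its input, but two ReLU units with opposite-sign weights can cover both sides between them, which is roughly what a single sigmoid() unit manages on its own.

```python
import numpy as np

z = np.linspace(-2, 2, 5)   # [-2, -1, 0, 1, 2]

# one ReLU unit with weight +1: silent for all negative inputs
single = np.maximum(0.0, z)

# two ReLU units with weights +1 and -1, summed by the next layer:
# together they respond on both sides of zero
pair = np.maximum(0.0, z) + np.maximum(0.0, -z)   # equals |z|

print(single)  # [0. 0. 0. 1. 2.]
print(pair)    # [2. 1. 0. 1. 2.]
```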
In addition to Tom’s points, note that ReLU and sigmoid are not the only choices for activation functions in the hidden layers of a neural network. In the assignment in C1 W3, you’ll also see tanh used. At the output layer, your choices are constrained by what the output of the network means: if you are implementing a classifier, then the output layer will use either sigmoid (for binary classification) or softmax (for multi-class classification).
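As a small sketch of the output-layer point (made-up layer outputs, not the assignment’s code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # subtract the max for numerical stability before exponentiating
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

# binary classifier: one output unit, sigmoid gives P(y = 1)
z_binary = 0.8
print(sigmoid(z_binary))      # a single probability in (0, 1)

# 4-class classifier: one unit per class, softmax gives a distribution
z_multi = np.array([1.2, -0.3, 0.5, 2.0])
print(softmax(z_multi))       # four probabilities that sum to 1
```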
Please “hold that thought” and Prof Ng will discuss these issues more as you complete DLS C1 and in the later DLS courses (C2, C4 and C5).