Alternatives to sigmoid function

Indeed it is a great question!

Since I believe you agree that the main issue here is the output range (rather than any particular name we give it), I think we can rephrase the question as: why ReLU over sigmoid (or the other way around) for hidden layers?
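To make the output-range point concrete, here is a quick NumPy sketch (my own illustration, not from the course) of the two activations side by side: sigmoid squashes everything into (0, 1), while ReLU passes positive values through unchanged.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(sigmoid(z))  # squashed into (0, 1): roughly [0.00005, 0.27, 0.5, 0.73, 0.99995]
print(relu(z))     # unbounded above:      [0., 0., 0., 1., 10.]
```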

This topic isn’t discussed in depth within the scope of the currently released Courses 1 & 2. I am not sure about Course 3 because it has not been released yet, but I did a quick scan of the Deep Learning Specialization (DLS) and found that its Course 1 Week 3 video “Activation Function” compares sigmoid and ReLU.

I highly recommend you watch the video yourself, but in short, compared to ReLU: (1) sigmoid is computationally slower because it involves evaluating the exponential function, and (2) sigmoid gives a very small gradient (close to 0) in the far positive and far negative regions where it saturates, so some parameters receive only tiny update steps, which slows down training.
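Here is a small sketch (again my own, not from the video) of point (2): the sigmoid gradient collapses towards 0 as the input grows, while ReLU’s gradient stays at 1 for any positive input.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)              # at most 0.25, reached at z = 0

def relu_grad(z):
    return np.where(z > 0, 1.0, 0.0)  # exactly 1 for any positive input

z = np.array([0.0, 2.0, 5.0, 10.0])
print(sigmoid_grad(z))  # roughly [0.25, 0.105, 0.0066, 0.000045] -- shrinks fast
print(relu_grad(z))     # [0., 1., 1., 1.]
```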

A topic relevant to the second point above is the “vanishing gradient problem”, which is introduced in the DLS Course 2 Week 1 video “Vanishing / Exploding gradient”; again, please watch it if you want to know more. Both the small gradient values and the fact that sigmoid squashes a wide range of inputs into the narrow output range (0, 1) contribute to the problem, and it becomes more significant as you make your neural network deeper.
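As a rough back-of-the-envelope illustration (ignoring the weight matrices, which also enter the chain-rule product), here is how the per-layer activation-derivative factors compound with depth:

```python
# Backprop multiplies one local activation derivative per layer, so with
# sigmoid each factor is at most 0.25 and the product shrinks exponentially
# with depth, while ReLU contributes a factor of 1 in its active region.
sigmoid_factor = 0.25   # best case for sigmoid'
relu_factor = 1.0       # relu' for positive inputs

for depth in [5, 10, 20, 50]:
    print(f"depth={depth:3d}  "
          f"sigmoid chain <= {sigmoid_factor ** depth:.2e}  "
          f"relu chain = {relu_factor ** depth:.1f}")
# sigmoid chain is <= 9.77e-04 at depth 5 and ~7.89e-31 at depth 50,
# while the relu chain stays at 1.0
```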

If a DLS mentor happens to see this post, I hope they will share more insights or past discussions with us.
