How can we use ReLU to approximate sigmoid?

MonsterCookieJar · January 15, 2022, 12:51am

I remember that the reason we use a sigmoid function is that the output y_hat is a probability between 0 and 1. If we use ReLU to approximate the sigmoid function, and ReLU is defined as y_hat = max(0, z), then z is not capped at 1. What happens when z > 1 and y_hat > 1?

sjfischer · January 15, 2022, 1:17am

Hi @MonsterCookieJar, thanks for your question and welcome to the community. I hope the course is interesting and useful for you.

With respect to your question: ReLU is not approximating sigmoid. Sigmoid indeed has a value between 0 and 1, which is useful for binary classification. Tanh is another activation function that is similar to sigmoid, except that it runs from -1 to 1. The problem with both sigmoid and tanh is that the slope (gradient) of the curve goes rapidly to 0 for large positive and large negative values of z (the sigmoid curve flattens out toward 1 or 0), which in practice slows down learning, because the weight updates depend on that gradient. With ReLU the slope is fixed at 1 for every positive z and does not shrink toward 0 as z grows (for negative z the slope is 0). In practice this allows the model to learn much faster, which is one reason why ReLU is used most often in the hidden layers of a neural network.
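
To make the point about flattening gradients concrete, here is a minimal NumPy sketch that evaluates the slope of each activation at a few values of z. The function names (`sigmoid_grad`, `tanh_grad`, `relu_grad`) are illustrative and not taken from the course materials, and the ReLU slope at exactly z = 0 is set to 0 by convention.

```python
import numpy as np

def sigmoid(z):
    # Squashes any real z into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    # d/dz sigmoid(z) = sigmoid(z) * (1 - sigmoid(z)); peaks at 0.25 when z = 0
    s = sigmoid(z)
    return s * (1.0 - s)

def tanh_grad(z):
    # d/dz tanh(z) = 1 - tanh(z)^2; peaks at 1 when z = 0
    return 1.0 - np.tanh(z) ** 2

def relu_grad(z):
    # d/dz max(0, z): 1 for z > 0, 0 otherwise (0 at z == 0 by convention here)
    return (z > 0).astype(float)

z = np.array([-10.0, -2.0, 0.5, 2.0, 10.0])
print("z           :", z)
print("sigmoid grad:", sigmoid_grad(z))
print("tanh grad   :", tanh_grad(z))
print("relu grad   :", relu_grad(z))
```

At z = 10 the sigmoid and tanh slopes are already very close to 0, while the ReLU slope is still exactly 1.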

paulinpaloalto · January 15, 2022, 3:17am

Exactly. Please note that we do not use ReLU to approximate sigmoid. They are completely different. We only use ReLU as the activation function in the “hidden” layers of a network, where we do not require the behavior of sigmoid that we use in the output layer: that the result of sigmoid looks like the probability that the prediction is “yes”. That is what makes sigmoid the activation function of choice for the output layer anytime the network’s goal is a binary (“yes/no”) classification.
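
As a rough illustration of that split, the sketch below runs one forward pass through a tiny network with a ReLU hidden layer and a sigmoid output unit. The weights, the input, and the helper name `forward` are made up for this example and are not from the course. It shows that a hidden activation can be larger than 1 without any problem, because the sigmoid on the output layer still maps the final value into (0, 1).

```python
import numpy as np

def relu(z):
    # Hidden-layer activation: passes positive values through, zeroes out the rest
    return np.maximum(0.0, z)

def sigmoid(z):
    # Output-layer activation: maps any real z into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    # Hypothetical 2-layer network: ReLU hidden layer, sigmoid output unit
    a1 = relu(W1 @ x + b1)
    y_hat = sigmoid(W2 @ a1 + b2)
    return a1, y_hat

# Made-up input and parameters (2 features, 3 hidden units, 1 output)
x = np.array([[2.0], [3.0]])
W1 = np.array([[1.0, 1.0], [-1.0, 0.5], [0.5, -1.0]])
b1 = np.zeros((3, 1))
W2 = np.array([[0.5, -1.0, 2.0]])
b2 = np.array([[-1.0]])

a1, y_hat = forward(x, W1, b1, W2, b2)
print("hidden activations:", a1.ravel())  # the first one is 5.0, well above 1
print("y_hat:", y_hat.item())             # still strictly between 0 and 1
```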

MonsterCookieJar · January 16, 2022, 5:31am

Ah! Very helpful. Thank you so much.
