TL;DR: should we avoid sigmoid (and tanh) with ResNets, since only ReLU satisfies g(a[l]) = a[l]?
Hello!
In the lecture “Why ResNets Work?” (Course 4, Week 2), Professor Ng explains that g(a[l]) = a[l]. This makes sense, because a[l] is itself the output of a ReLU, so it contains only non-negative values, and “double applying” ReLU leaves those values unchanged. But does this mean we cannot use any other activation function with ResNets (that is, any activation g() that doesn't satisfy g(a[l]) = a[l])?
P.S. This is putting aside the fact that sigmoid is usually not advised in hidden layers anyway.
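For reference, writing out the calculation from the lecture: a[l+2] = g(z[l+2] + a[l]) = g(W[l+2] a[l+1] + b[l+2] + a[l]). If W[l+2] = 0 and b[l+2] = 0 (e.g. pushed there by weight decay), this reduces to a[l+2] = g(a[l]), and because a[l] is a ReLU output (so a[l] >= 0) and ReLU(x) = x for x >= 0, we get a[l+2] = a[l].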
Hi @Galib_Alili ,
The most commonly used activation functions are:
Sigmoid - output range (0, 1)
Tanh - output range (-1, 1)
ReLU - output range [0, +∞), i.e. max(0, x)
As you can see, if we are only interested in the positive values of the input, then ReLU fits the bill.
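As a quick numerical sketch (using NumPy, with arbitrary sample inputs), you can verify those output ranges directly:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)

# Arbitrary sample inputs spanning a wide range
x = np.linspace(-10, 10, 1001)

for name, g in [("sigmoid", sigmoid), ("tanh", np.tanh), ("ReLU", relu)]:
    y = g(x)
    print(f"{name:7s} output range: [{y.min():.4f}, {y.max():.4f}]")
```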
Hi @Kic !
Thanks for the reply!
What I meant is that for ResNets to work, we need an activation function that returns the same value when applied to its own output, so that even if we apply it twice, g(g(x)), we get back the same result as g(x).
Tanh: tanh(x) function Calculator - High accuracy calculation
Sigmoid: Sigmoid function Calculator - High accuracy calculation
ReLU: ReLU Calculator - High accuracy calculation
I attach the links above for simplicity, so you can check what happens when you “double apply” tanh and sigmoid (and ReLU). The output only stays the same with ReLU, not with sigmoid or tanh; there is also a quick check in code below.
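Here is the same check as a minimal NumPy sketch (the input values are arbitrary):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)

# Arbitrary test values, including negatives
x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])

for name, g in [("sigmoid", sigmoid), ("tanh", np.tanh), ("ReLU", relu)]:
    once, twice = g(x), g(g(x))
    # For the skip connection to act as an identity, we need g(g(x)) == g(x)
    print(f"{name:7s} g(g(x)) == g(x): {np.allclose(once, twice)}")
```

This prints True only for ReLU; sigmoid and tanh both change their own outputs when applied a second time.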
Hi @Galib_Alili ,
As you can see from the output ranges of the different activation functions listed in my first reply, if you “double apply” an activation function to the data (whether that data is the raw input or the output of another activation), the result will still be limited to the range of values produced by that activation function.
With sigmoid, no matter what values are given, it will produce values between 0 and 1, nothing else.
With tanh, no matter what values are given, it will produce values between -1 and 1, nothing else.
Whilst with ReLU, a negative input produces 0, and a positive input is passed through unchanged.
So you can view an activation function as a filter, keeping the values that are useful to your model.
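To make that concrete, here is a small sketch (NumPy, with arbitrary inputs) showing how double-applying the squashing functions compresses the range further, while ReLU leaves its own outputs unchanged:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)

# Arbitrary wide-ranging inputs
x = np.linspace(-10, 10, 1001)

for name, g in [("sigmoid", sigmoid), ("tanh", np.tanh), ("ReLU", relu)]:
    y1, y2 = g(x), g(g(x))
    print(f"{name:7s} once: [{y1.min():.3f}, {y1.max():.3f}]  "
          f"twice: [{y2.min():.3f}, {y2.max():.3f}]")

# Applying sigmoid twice collapses everything into roughly (0.5, 0.731),
# and tanh twice into about (-0.762, 0.762), whereas ReLU's second pass
# returns the first pass unchanged.
```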
Below are the different types of activation functions: