TL;DR: Should we avoid *sigmoid* (or tanh) in ResNets, since only ReLU satisfies g(a[l]) = a[l]?

Hello!

In the lecture “Why ResNets Work?” (Course 4, Week 2), Professor Ng explains that g(a[l]) = a[l]. That makes sense, because a[l] is itself the output of a ReLU and is therefore non-negative, so applying ReLU to it a second time changes nothing. But does this mean we cannot use any other activation function with ResNets (that is, any activation function g() that doesn't satisfy g(a[l]) = a[l])?

P.S. Let's put aside the fact that sigmoid is usually not recommended anyway.
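
For reference, the lecture step behind the question can be written out as below. This is only a sketch: the symbols z[l+2], W[l+2] and b[l+2] follow the course notation and are not spelled out in the post above, and the "weights shrink to zero" case is the illustrative assumption used in the lecture, not a general property.

```latex
% Residual block with a skip connection from layer l to layer l+2
a^{[l+2]} = g\left(z^{[l+2]} + a^{[l]}\right)
          = g\left(W^{[l+2]} a^{[l+1]} + b^{[l+2]} + a^{[l]}\right)

% Assumption: W^{[l+2]} \approx 0 and b^{[l+2]} \approx 0 (e.g. under weight decay)
a^{[l+2]} \approx g\left(a^{[l]}\right)

% With g = ReLU and a^{[l]} \ge 0 (it is itself a ReLU output):
g\left(a^{[l]}\right) = \max\left(0, a^{[l]}\right) = a^{[l]}
```

The last line is the only place where the choice of g matters: it relies on g leaving its own outputs unchanged.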

Hi @Galib_Alili ,

The most commonly used activation functions are:

Sigmoid - output range (0 to 1)

Tanh - output range (-1 to 1)

ReLU - output range [0, ∞)

As you can see, if we are only interested in the positive values of the input, then ReLU fits the bill.

Hi @Kic !

Thanks for the reply!

What I meant is that, for the ResNet argument to work, we need a function that returns the same value when applied again: even if we apply it twice (g(g(x))), we get the same output as g(x).

Tanh: tanh(x) function Calculator - High accuracy calculation

Sigmoid: Sigmoid function Calculator - High accuracy calculation

ReLU: ReLU Calculator - High accuracy calculation

I attach the links for simplicity, so you can check what happens when you “double apply” tanh and sigmoid (and ReLU). The output stays the same only with ReLU, not with sigmoid or tanh.
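
The same check can be done without the online calculators. Below is a minimal sketch in plain Python; the helper name `is_idempotent`, the sample points and the tolerance are illustrative choices, not something from the lecture.

```python
import math


def relu(x):
    # ReLU: zero for negative inputs, identity for non-negative inputs
    return max(0.0, x)


def sigmoid(x):
    # Logistic sigmoid, output strictly between 0 and 1
    return 1.0 / (1.0 + math.exp(-x))


def is_idempotent(g, xs, tol=1e-12):
    # True if applying g twice gives the same value as applying it once,
    # for every sample point (hypothetical helper for this illustration)
    return all(abs(g(g(x)) - g(x)) <= tol for x in xs)


samples = [-3.0, -0.5, 0.0, 0.5, 3.0]
activations = {"relu": relu, "sigmoid": sigmoid, "tanh": math.tanh}

results = {name: is_idempotent(g, samples) for name, g in activations.items()}
print(results)
```

On these sample points only ReLU passes the check, which matches what the calculators show.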

Hi @Galib_Alili ,

As you can see from the output ranges of the different activation functions in my first reply, if you “double apply” an activation function to the data (whether that data is a raw input or the output of an activation), the result is still limited to the range of values produced by that activation function.

With sigmoid, no matter what values are given, it will produce values between 0 and 1, and nothing else.

For Tanh, no matter what values are given, it will produce values between -1 and 1, and nothing else.

Whilst for ReLU, a negative input produces an output of 0, and a positive input is returned unchanged.

So you can view an activation function as a filter, filtering values that are useful to your model.
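
To separate the two points in this thread (the output range versus whether the values themselves change), here is a rough sketch in plain Python. The grid of inputs from -10 to 10 and the names `output_range`, `ranges_twice` and `max_change` are assumptions made for this illustration.

```python
import math


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def output_range(g, xs):
    # Smallest and largest value g produces on the sample grid
    ys = [g(x) for x in xs]
    return min(ys), max(ys)


xs = [i / 10 for i in range(-100, 101)]  # inputs from -10.0 to 10.0
activations = {"relu": relu, "sigmoid": sigmoid, "tanh": math.tanh}

ranges_twice = {}  # range of g(g(x)) on the grid
max_change = {}    # largest |g(g(x)) - g(x)| on the grid

for name, g in activations.items():
    ranges_twice[name] = output_range(lambda x: g(g(x)), xs)
    max_change[name] = max(abs(g(g(x)) - g(x)) for x in xs)
    print(name, ranges_twice[name], max_change[name])
```

On this grid, the doubly applied sigmoid and tanh stay inside (0, 1) and (-1, 1) respectively, but their values do change, while ReLU leaves its own outputs untouched. Staying within the range is therefore not the same as satisfying g(a[l]) = a[l].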

Below are the different types of activation function: