It occurred to me recently that, compared to sigmoid and tanh, ReLU consists of two linear parts (x > 0 and x < 0).
Can I accept the intuition that ReLU is “more linear” than sigmoid and tanh?
If so, here is my concern: I understand that non-linear activations are needed to learn subtle, non-linear decision boundaries. ReLU, though, is exactly linear whenever x stays below 0 or above 0. Wouldn’t that mean some loss (a complete loss, in some cases?) of the non-linear component of the model?
Hi @1492r ,
I would say yes: the ReLU function is a piecewise linear activation function, while the sigmoid and tanh functions are smooth and non-linear everywhere.
However, we need to be careful with the “linearity” of ReLU, which has some downsides as well. For instance, as you mentioned, ReLU can “kill” or deactivate a neuron by setting its output to zero whenever its input is negative. This causes sparsity in the network, and if too many neurons die it can lead to underfitting.
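To make the “killing” behaviour concrete, here is a minimal NumPy sketch (illustrative only, not from any particular framework). It shows that ReLU zeros out negative inputs, and that the gradient is also zero there, which is what makes a permanently-negative neuron stop learning:

```python
import numpy as np

def relu(x):
    """ReLU: passes positive values through, zeros out negatives."""
    return np.maximum(0.0, x)

x = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])
print(relu(x))  # negative inputs are "killed" to 0

# The derivative of ReLU is 0 for negative inputs, so a neuron whose
# pre-activation stays negative receives no gradient ("dying ReLU").
grad = (x > 0).astype(float)
print(grad)
```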
So, the real answer, IMHO, is that it depends on the specific details of your problem.
Let me offer a different perspective on ReLU. To begin with, I seldom think about “the degree of linearity” of ReLU. Instead, if someone asked me how good ReLU is as an activation function, a picture like the one below would come to mind.
The upper red line is the real data that I want to model, whereas the bottom green line is the output of a layer of 9 nodes with ReLU activation.
ReLU gives us a “curve” (a piecewise linear function) made of 9 line segments. It is piecewise linear because it inherits that property from ReLU itself, which is also piecewise linear. So, to me, ReLU is cool because we can approximate any curve in a piecewise linear manner, and once I see that, I stop worrying about how linear ReLU might look.
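The idea above can be sketched in code. This is an illustrative construction, not the actual model behind the figure: the target curve (`sin`) and the knot positions are my assumptions. With the weights set by hand, 9 ReLU units reproduce the piecewise-linear interpolant of the curve, i.e. a “curve” with 9 line segments:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

# Target curve to approximate (an assumption; the figure's real data is unknown).
g = lambda x: np.sin(2 * np.pi * x)

# 9 hidden ReLU units -> a piecewise-linear function with up to 9 kinks.
knots = np.linspace(0.0, 1.0, 10)[:-1]             # 9 knot positions on [0, 1)
h = knots[1] - knots[0]
vals = g(np.append(knots, 1.0))                    # g at the 10 grid points
slopes = np.diff(vals) / h                          # slope on each segment
coeffs = np.diff(np.concatenate([[0.0], slopes]))   # slope *change* at each knot

def net(x):
    """One hidden layer of 9 ReLU units, weights set by hand (no training)."""
    return g(0.0) + sum(a * relu(x - k) for a, k in zip(coeffs, knots))

xs = np.linspace(0.0, 1.0, 101)
err = np.max(np.abs(net(xs) - g(xs)))
print(f"max error of the 9-segment piecewise-linear fit: {err:.3f}")
```

In a real network, gradient descent would find weights playing the same role; here they are computed directly so the piecewise-linear structure is easy to see.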
You can think of the ReLU function as a kind of “filter” which passes positive numbers through but clamps everything else to zero. The neural net’s ability to describe and learn non-linear characteristics and cause–effect relationships comes from the combination of many neurons, where the non-linearity emerges from the transition into the negative part (the kink at zero) of each ReLU. During training, the “best” parameters (weights) are learned to minimize a cost function; see also this thread:
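A tiny sketch of that emergence (my own illustration): a single ReLU unit is piecewise linear, but combining just two already yields a genuinely non-linear function, since |x| = relu(x) + relu(-x):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

# Two ReLU "filters" pointing in opposite directions combine into |x|,
# a non-linear function neither unit can represent on its own.
def abs_net(x):
    return relu(x) + relu(-x)

x = np.array([-3.0, -1.0, 0.0, 2.0])
print(abs_net(x))  # [3. 1. 0. 2.]
```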
I see. Thank you guys for the explanation!