Welcome to the community, and thanks for your first post!
You can think of the ReLU as a kind of "filter" which passes positive numbers through unchanged and sets everything else to zero. The ability of the neural net to describe and learn non-linear relationships and cause-effect patterns comes from combining many such neurons, where the non-linearity emerges from the clipped (zeroed-out) negative part of the ReLU function. During training, the "best" parameters (or weights) can then be learned to minimize a cost function, see also:
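To make the "filter" picture concrete, here is a minimal NumPy sketch of ReLU (the input values are just made-up examples):

```python
import numpy as np

def relu(x):
    # passes positive values through unchanged, maps everything else to zero
    return np.maximum(0.0, x)

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(x))  # -> [0., 0., 0., 1.5, 3.]
```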
You are right that other non-linear activation functions, such as the sigmoid you mentioned, can be used as well, see also this thread.
When it comes to polynomial activation functions, I guess one drawback is that their gradients vary strongly with the input, which causes the issues described in this thread: a cubic function, for example, would tend towards exploding gradients if the activation input gets too large.
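As a small illustration of why the gradient magnitude matters, compare the gradient of ReLU (at most 1) with the gradient of x**3, which grows quadratically with the input (a rough sketch with arbitrary example values):

```python
import numpy as np

x = np.array([0.5, 2.0, 10.0, 50.0])

relu_grad = (x > 0).astype(float)  # gradient of ReLU is either 0 or 1
cubic_grad = 3.0 * x**2            # gradient of x**3 grows quadratically with |x|

print(relu_grad)   # stays bounded: [1. 1. 1. 1.]
print(cubic_grad)  # blows up quickly: roughly [0.75, 12, 300, 7500]
```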
Regarding your question on 2nd-degree functions: besides being poorly suited with respect to exploding gradients (again, if the activation input is too large), a function like x^2 is also non-monotonic over the real numbers. A good activation function, however, should be monotonic to support effective training with back-prop and a consistent "pull" towards the actual optimum (i.e. if the input increases, the output should increase as well), see also this source.
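And a quick check of the non-monotonicity of x^2 (arbitrary example inputs): two different inputs can map to the same output, and the output first decreases and then increases as the input grows, so there is no consistent "pull" direction:

```python
xs = [-3.0, -1.0, 0.0, 1.0, 3.0]
ys = [x**2 for x in xs]
print(ys)  # [9.0, 1.0, 0.0, 1.0, 9.0] -> not monotonic over the reals
```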
Best regards
Christian