Isn't ReLU just a linear regression function for z >= 0?

hi everybody,

if all the Z values in the neural network are bigger than 0, then g(z) = 0 for ReLU. Am I right?
If so, then ReLU is nothing but linear regression. So why does Professor Andrew still advise us to use ReLU as the activation function for hidden layers?

Hey @mehmet_baki_deniz,
I assume you meant to say smaller than 0, instead of bigger than 0?


Thank you for your interest. Yes, I mistyped it. I meant that if the z values are bigger than 0, then g(z) = z.
So for z >= 0, g(z) is a linear regression function. Am I right?

Hello @mehmet_baki_deniz

There are 2 ways that we can model any function:
#1. Using a single mathematical equation
#2. Using multiple piece-wise linear equations

#1 can be achieved by using a Linear Regression equation in the simplest case, and by using n-degree polynomials in the regression equation for more complicated cases.

#2 can be achieved by using a Neural Network with hidden layers and using ReLU as the activation. The ReLU is non-linear over its entire range (-\infty, +\infty). We use it in its switch-off mode in the range (-\infty, 0) and in its linear mode in the range (0, +\infty). Thus, by controlling the switch-off region and the linear region of each of the neurons in the hidden layers, the neural network is able to stitch together a piece-wise linear approximation of the target function.
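To make the stitching concrete, here is a minimal NumPy sketch (my own illustration, not course code) in which four ReLU units with hand-picked knots and weights are summed into a piece-wise linear approximation of the non-linear curve x^2; a trained network would find such weights itself:

```python
import numpy as np

def relu(z):
    # g(z) = 0 for z < 0, g(z) = z for z >= 0
    return np.maximum(0.0, z)

# Target to approximate on [-2, 2]: the non-linear curve x^2.
# Each hidden unit switches on at one knot; the output weights are the
# first segment's slope followed by the slope *changes* at each knot.
knots       = np.array([-2.0, -1.0, 0.0, 1.0])
out_weights = np.array([-3.0,  2.0, 2.0, 2.0])
out_bias    = 4.0                               # value of x^2 at the left edge x = -2

x      = np.linspace(-2, 2, 9)
hidden = relu(x[:, None] - knots[None, :])      # one column per ReLU unit
approx = hidden @ out_weights + out_bias        # stitched piece-wise linear output

print(np.round(approx, 2))                      # matches x^2 exactly at the knots
print(np.round(x**2, 2))
```

Each additional hidden unit adds one more kink, so with enough units the stitched line can bend arbitrarily close to the target curve.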

And the answer to your specific question is: the ReLU is not linear. We use both the linear and the non-linear behaviour of the ReLU to serve our purpose.


Hey @mehmet_baki_deniz,
I would like to add a little something to @shanup’s explanation. First of all, z >= 0 is not something that always happens, or that you can account for; for some neurons z < 0 and for others z >= 0, and hence, as per ReLU’s definition, ReLU acts as a non-linear activation, as Shanup pointed out.

Other than that, even if we consider the hypothetical scenario in which z >= 0 for all neurons: since we have a single feature, the linear regression formulation we would use is w^T z + b, and even in this case ReLU can only be equivalent to linear regression if w = 1 and b = 0.
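As a quick numeric sketch of both points (my own illustration, not part of the course):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

# A linear function g must satisfy g(a + b) == g(a) + g(b) for all inputs.
a, b = -1.0, 2.0
print(relu(a + b))            # 1.0
print(relu(a) + relu(b))      # 2.0 -> additivity fails, so ReLU is not linear

# Even restricted to z >= 0, ReLU is the fixed identity g(z) = z, i.e. the
# special case w = 1, b = 0 of a regression line w*z + b with trainable w, b.
z = np.array([0.0, 1.0, 2.0])
w, b0 = 0.5, 3.0              # some other fitted line
print(w * z + b0)             # [3.  3.5 4. ]
print(relu(z))                # [0. 1. 2.]
```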

It’s true that a straight line in 2D can be represented by a linear regression model; but that does not make a straight line in 2D equivalent to a linear regression model, which has trainable parameters. I hope this helps.

Cheers,
Elemento


Hi there,

in addition to the answers:
Whereas in a linear regression model the parameters are fitted (so, for example, the gradient is in general not equal to 1, and as @Elemento pointed out this also applies in an n-dimensional space), ReLU has a fixed definition as a function of one input: it passes positive numbers through unchanged (gradient = 1, since y = 1x) but clamps everything else to zero. The ability of the neural net to describe and learn non-linear characteristics and cause-effect relationships comes from the combination of many neurons, with the non-linearity emerging from the zeroed-out negative part of the ReLU function. During training, the "best" parameters (or weights) are learned to minimize a cost function.
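For illustration, here is a minimal NumPy training sketch (my own toy setup, not the course's lab code): a single hidden ReLU layer, fitted by gradient descent on a mean-squared-error cost, learns a piece-wise linear approximation of the non-linear target x^2.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: a single feature x and a non-linear target y = x^2.
X = np.linspace(-2, 2, 64).reshape(-1, 1)
Y = X**2

H = 8                                            # hidden ReLU units
W1 = rng.normal(0, 1, (1, H)); b1 = np.zeros(H)
W2 = rng.normal(0, 1, (H, 1)); b2 = np.zeros(1)
lr = 0.01

for step in range(5000):
    Z1 = X @ W1 + b1                             # pre-activations
    A1 = np.maximum(0.0, Z1)                     # ReLU
    pred = A1 @ W2 + b2                          # linear output layer
    err = pred - Y
    cost = np.mean(err**2)                       # mean-squared-error cost

    # Gradients of the cost (plain backpropagation).
    dpred = 2 * err / len(X)
    dW2 = A1.T @ dpred;  db2 = dpred.sum(0)
    dA1 = dpred @ W2.T
    dZ1 = dA1 * (Z1 > 0)                         # ReLU gradient: 1 where z > 0, else 0
    dW1 = X.T @ dZ1;     db1 = dZ1.sum(0)

    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(f"final cost: {cost:.4f}")                 # drops as the kinks are placed to fit the curve
```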

Since many neurons, each with its own weights and bias and linked to an activation function, are combined (as the neural net in total does), the network can learn highly non-linear behaviour, even though the activation function of a single neuron is only piece-wise linear in the case of ReLU; see also:

Best regards
Christian


I thank all the contributors to this thread. Your explanations cleared up my misconception about the algorithm. And as a side note, this is such a great community of people helping each other. I hope someday in the future I will pay my share by helping other strangers learn machine learning.
