If all the z values in the neural network are bigger than 0, then g(z)=0 for ReLU. Am I right?
If so, then ReLU is nothing but linear regression. So why does Professor Andrew still advise us to use ReLU as the activation function for hidden layers?

Thank you for your interest. Yes, I mistyped it. I meant: if the z values are bigger than 0, then g(z)=z.
So for z >= 0, g(z) is a linear regression function. Am I right?

There are 2 ways that we can model any function: #1. Using a single mathematical equation #2. Using multiple piece-wise linear equations

#1 can be achieved by using a linear regression equation in the simplest case, or by including n-degree polynomial terms in the regression equation for more complicated cases.

#2 can be achieved by using a Neural Network with hidden layers and ReLU as the activation. The ReLU is non-linear over its entire range (-\infty, +\infty): it is switched off over (-\infty, 0) and linear over (0, +\infty). Thus, by controlling the switch-off region and the linear region of each of the neurons in the hidden layers, the Neural Network is able to stitch together a piece-wise linear approximation of the target function.
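To make the "stitching" idea concrete, here is a minimal sketch in plain Python (the function names and hand-picked weights are my own, not from the course). Three ReLU units, each switching on at a different point and combined linearly by an output layer, produce a triangular "hat" function out of straight-line pieces:

```python
def relu(z):
    """ReLU activation: 0 for z < 0, z for z >= 0."""
    return max(0.0, z)

def hat(x):
    """A tiny hand-wired 'network': three hidden ReLU units combined
    linearly. Each unit activates at a different input value, so each
    one contributes a new kink (a new linear segment) to the output."""
    return relu(x) - 2.0 * relu(x - 1.0) + relu(x - 2.0)

# Rises linearly from 0 to 1 on [0, 1], falls back to 0 on [1, 2],
# then stays flat: a piece-wise linear shape no single line can match.
for x in [0.0, 0.5, 1.0, 1.5, 2.0, 3.0]:
    print(x, hat(x))
```

In a real network the weights and switch-on points are learned by gradient descent rather than hand-picked, but the mechanism is the same.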

And, the answer to your specific question is: The ReLU is not linear. We use both the linear and non-linear behaviour of the ReLU to serve our specific purpose.

Hey @mehmet_baki_deniz,
I would like to add a little something to @shanup’s explanation. First of all, z >= 0 is not something that always happens, or that you can guarantee, so for some neurons z < 0 and for others z >= 0; hence, as per ReLU’s definition, ReLU acts as a non-linear activation, as Shanup pointed out.
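A quick numeric check (a sketch of my own, not from the course) makes the non-linearity explicit: any linear function g must satisfy g(a + b) == g(a) + g(b), and ReLU fails this as soon as positive and negative inputs mix:

```python
def relu(z):
    # ReLU: 0 for z < 0, z for z >= 0
    return max(0.0, z)

a, b = 1.0, -1.0
print(relu(a + b))        # relu(0) = 0.0
print(relu(a) + relu(b))  # 1.0 + 0.0 = 1.0, so additivity fails
```

Since the two results differ, ReLU cannot be a linear function over its whole domain.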

Other than that, even if we consider the hypothetical scenario in which z >= 0 for all neurons, and since we have a single feature, the linear regression formulation that would be used is w^T z + b; even in this case, ReLU can only be equivalent to linear regression if w = 1 and b = 0.

A straight line in 2D can be represented by a linear regression model, that’s true; but saying that a straight line in 2D is equivalent to a linear regression model, that’s not true. I hope this helps.

In addition to the answers:
Whereas in a linear regression model the parameters are fitted (so, for example, the gradient is generally not equal to 1, and, as @Elemento pointed out, this can be the case in an n-dimensional space), ReLU has a fixed definition as a function of one argument: it passes positive numbers through unchanged (gradient = 1, since y = 1·x) but blocks everything else to zero. The ability of the neural net to describe and learn non-linear characteristics and cause-effect relationships comes from the combination of many neurons, where the non-linearity emerges from the negative part of the ReLU function. During training, the "best" parameters (or weights) are learned to minimize a cost function.

Since many neurons, each with its own bias and weights and linked to an activation function, are combined (as the neural net as a whole does), the network can learn highly non-linear behaviour, even though the activation function of a single neuron is only piece-wise linear in the case of ReLU; see also:

I thank all the contributors to this thread. Your explanations cleared up my misconception about the algorithm. And as a side note, this is such a great community of people helping each other. I hope that someday in the future I will pay my share by helping other strangers learn machine learning.