Activation function in NN

Our loss function for a neural network works when the output is between 0 and 1, but if we use ReLU functions we could get outputs like 4.5, 6, etc., which would put a negative value inside the log in the loss and cost function. That is clearly wrong, so how are we able to use ReLU functions in a NN? Also, according to the ReLU function taught in the lecture, g(z) = max(0, z), it is basically the straight line y = x whenever z > 0 (g(z) = z). So it would just output whatever we feed into it and nothing would change in the neural network. How is this useful?
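
Here is a quick NumPy sketch of what I mean (made-up values, just to show that ReLU outputs can go above 1 and then push a negative number into the log):

```python
import numpy as np

def relu(z):
    # ReLU as taught in the lecture: g(z) = max(0, z)
    return np.maximum(0, z)

z = np.array([-2.0, 0.5, 4.5, 6.0])  # example pre-activations (made up)
a = relu(z)
print(a)      # [0.  0.5 4.5 6. ] -- values above 1 are possible
print(1 - a)  # [ 1.   0.5 -3.5 -5. ] -- negative values would end up inside log(1 - a)
```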

Hi @Chinmay_Sahoo,

Here are a couple of articles on why ReLU is preferred over sigmoid.

Give them a read when you have time.

Cheers,
Mubsi

Even after reading these articles, I’m still not clear. In week 3’s video “Why do you need Non-Linear Activation Functions?”, Andrew said that if we use linear activation functions it is pointless to have multiple layers in our neural network, because irrespective of the number of layers the output layer ends up computing a linear function (WX + b). And if all our Z are strictly positive, the ReLU function is a linear function, just like y = x or g(z) = z. So why do we still use the ReLU function?
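
For example, here is a small NumPy sketch (made-up shapes) of the point from the video: stacking purely linear layers collapses into a single linear function:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 1))        # input, shape (3, 1) -- made-up sizes

# Two "linear activation" layers
W1, b1 = rng.standard_normal((4, 3)), rng.standard_normal((4, 1))
W2, b2 = rng.standard_normal((2, 4)), rng.standard_normal((2, 1))

two_layers = W2 @ (W1 @ x + b1) + b2

# ...are the same as one linear layer with W = W2 W1 and b = W2 b1 + b2
one_layer = (W2 @ W1) @ x + (W2 @ b1 + b2)

print(np.allclose(two_layers, one_layer))  # True
```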

Hi @Chinmay_Sahoo.

And if all our Z are strictly positive, the ReLU function is a linear function, just like y = x or g(z) = z. So why do we still use the ReLU function?

Because not all our Z are strictly positive. If they were, then yes, ReLU would make no difference and you would be left with just a stack of simple linear transformations (no “deep” part). But they simply aren’t! You can imagine roughly half of your weights being randomly initialized to negative values, and “training” picks up from there (with ReLU setting roughly half of your node values to zero).
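
Here is a small NumPy sketch (made-up sizes, plain random initialization) showing that roughly half of the pre-activations Z come out negative, so ReLU really does zero out a large fraction of the node values:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 1000))       # 1000 example inputs with 5 features (made up)
W = rng.standard_normal((10, 5)) * 0.01  # small random initialization, as in the course
b = np.zeros((10, 1))

Z = W @ X + b
A = np.maximum(0, Z)                     # ReLU

print((Z < 0).mean())   # roughly 0.5 -- about half of the pre-activations are negative
print((A == 0).mean())  # the same fraction of activations gets zeroed by ReLU
```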

Actually, weight initialization is a very important part of NNs (each activation function benefits from a different initialization technique; for ReLU, Kaiming (He) initialization is used, but you don’t need to know that as a beginner because it matters most for very deep neural networks and most DL libraries take care of it). As an excellent gentle introduction (with an interactive part), please check out weights initialization
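
And just so you have seen it once, here is a minimal sketch of He (Kaiming) initialization in plain NumPy (example layer sizes are made up; in practice your DL library does the equivalent for you):

```python
import numpy as np

def he_init(n_out, n_in, rng=np.random.default_rng()):
    # He/Kaiming initialization for ReLU layers:
    # Gaussian weights scaled by sqrt(2 / n_in) to keep activation variance stable
    return rng.standard_normal((n_out, n_in)) * np.sqrt(2.0 / n_in)

W1 = he_init(64, 128)  # example layer sizes (made up)
print(W1.std())        # close to sqrt(2 / 128) ≈ 0.125
```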