Our loss function for a neural network only works when the output is in the range 0 to 1, but if we use ReLU activations we could get outputs like 4.5, 6, etc., which would put a negative value inside the log in the loss and cost functions. This seems clearly wrong, so how are we able to use ReLU functions in a NN? Also, according to the ReLU function taught in the lecture, g(z) = max(0, z), which for z > 0 is basically the straight line y = z. So it would just output whatever we put in, nothing would change in the neural network, so how is this useful?
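To make the concern concrete, here is a minimal numpy sketch (my own toy illustration, not something from the course):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)          # ReLU: max(0, z)

def binary_cross_entropy(y, y_hat):
    # Only valid when 0 < y_hat < 1
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

y_hat = relu(4.5)                      # ReLU passes 4.5 through unchanged
loss = binary_cross_entropy(0, y_hat)  # log(1 - 4.5) = log(-3.5) is undefined
print(loss)                            # nan (plus a RuntimeWarning)
```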
Hi @Chinmay_Sahoo,
Here are a couple of articles on why ReLU is preferred over sigmoid.
Give them a read when you have time.
Cheers,
Mubsi
Even after reading these articles, I'm still not clear. In week 3's video "Why do you need Non-Linear Activation Functions?", Sir Andrew said that if we use linear activation functions, then it is useless to have layers in our neural network: irrespective of the number of layers, the output layer is still computing a linear function (WX + b). And if all our Z are strictly positive, ReLU is just a linear function, y = x or g(z) = z. So why do we still use ReLU?
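In code, the collapsing-layers argument from the lecture looks something like this (a minimal numpy sketch of my own, with toy shapes and random weights):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 1))        # a single 3-feature input
W1, b1 = rng.standard_normal((4, 3)), rng.standard_normal((4, 1))
W2, b2 = rng.standard_normal((1, 4)), rng.standard_normal((1, 1))

# Two "layers" with a linear (identity) activation...
a2 = W2 @ (W1 @ x + b1) + b2

# ...are exactly one linear layer with W = W2 @ W1 and b = W2 @ b1 + b2
a2_collapsed = (W2 @ W1) @ x + (W2 @ b1 + b2)

print(np.allclose(a2, a2_collapsed))   # True: the hidden layer added nothing
```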
Hi @Chinmay_Sahoo.
And if all our Z are strictly positive, ReLU is just a linear function, y = x or g(z) = z. So why do we still use ReLU?
Because not all our Z are strictly positive. If they were, then yes, ReLU would make no difference and you would be left with just a stack of linear transformations (no "deep" part). But they simply aren't! Roughly half of your randomly initialized weights will be negative, and "training" picks up from there (with ReLU setting roughly half of your node values to zero).
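Here is a quick sketch of what happens once some of the Z values are negative (my own toy numbers, not from the course; hand-picked so one pre-activation comes out negative):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

x = np.array([[1.0], [-2.0], [0.5]])
W1 = np.array([[ 1.0, 0.5, 1.0],    # toy weights; some are negative,
               [-0.5, 1.0, 2.0]])   # so one entry of Z1 will be negative
W2 = np.array([[ 1.0, -1.0]])

z1 = W1 @ x            # [[ 0.5], [-1.5]]  -> a mix of signs
a1 = relu(z1)          # [[ 0.5], [ 0.0]]  -> the negative unit is switched off
out_relu = W2 @ a1     # [[ 0.5]]

out_linear = (W2 @ W1) @ x   # [[ 2.0]]  -> what a purely linear stack would give
print(out_relu, out_linear)  # different: zeroing negative Z makes the net non-linear
```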
Actually, weight initialization is a very important part of NNs (each activation function benefits from a different initialization technique; for ReLU, Kaiming initialization is used, but you don't need to know that as a beginner, because it matters mostly for very deep neural networks and most DL libraries take care of it for you). As an excellent gentle introduction (with an interactive part), please check out weights initialization
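For reference, the weight-drawing step of the Kaiming (He) scheme mentioned above looks roughly like this (a minimal sketch; in practice your framework does this for you):

```python
import numpy as np

def he_init(n_in, n_out, rng=np.random.default_rng()):
    # He/Kaiming initialization for ReLU layers:
    # weights drawn from N(0, 2 / n_in) so activation variance is roughly preserved
    return rng.standard_normal((n_out, n_in)) * np.sqrt(2.0 / n_in)

W1 = he_init(n_in=784, n_out=128)
print(W1.std())   # roughly sqrt(2 / 784) ~= 0.05
```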