Can I know when one would prefer the LeakyReLU activation function over the ReLU activation function?
How do the two differ from each other?
Although I understand why ReLU is the most common choice of activation function over the sigmoid activation function, I want to know, through an example or any model/algorithm, when LeakyReLU is preferred over ReLU.
Leaky ReLU is good for avoiding diminishing weights and gradients in the negative domain; depending on the application, the weights might oscillate into the negative region.
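To make the gradient point concrete, here is a minimal NumPy sketch (my own illustration, not from the discussion above; alpha = 0.01 is just the common default): for negative inputs, ReLU's gradient is exactly zero, while Leaky ReLU still passes a small gradient of alpha back.

```python
# Minimal sketch of ReLU vs. Leaky ReLU and their gradients.
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    # Zero gradient for every negative input: those units pass nothing back.
    return (x > 0).astype(float)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

def leaky_relu_grad(x, alpha=0.01):
    # Gradient is alpha (not zero) for negative inputs, so some signal still flows.
    return np.where(x > 0, 1.0, alpha)

x = np.array([-3.0, -0.5, 0.5, 2.0])
print(relu(x), relu_grad(x))              # negatives give 0 output and 0 gradient
print(leaky_relu(x), leaky_relu_grad(x))  # negatives give small outputs and gradient alpha
```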
And I quote from Google:
“This type of activation function is popular in tasks where we may suffer from sparse gradients, for example training generative adversarial networks”
Thank you both, I now understand how it differs from ReLU.
Can anyone give me an example of how to determine whether my model/algorithm would need the LeakyReLU activation function, based on the dataset or model architecture?
I’ve never used Leaky ReLU, so I don’t have any info beyond what I find online and from previous discussions.
“Require” is too strong a word. There may be benefits; it depends on the data set (whether it has lots of features with negative values) and the complexity of the model (Leaky ReLU minimizes vanishing gradients, so training may be more efficient if you have a deep network).
The negative-region slope is another parameter you can tune (0.01 is just a common value; more precisely, that slope is the “alpha”, which you can adjust). So this gives you more work to do: finding the best alpha value (see the sketch at the end of this reply).
Since Leaky ReLU units never become dead (as they still provide some useful output for negative values), you may need fewer units than if you are using ReLU.
But since you have another multiplication to implement, training will be more costly with Leaky ReLU.
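As a sketch of the alpha point above, here is how the slope could be set and searched over in practice. I’m using PyTorch purely as an example (the thread doesn’t specify a framework), where the parameter is called `negative_slope`; the layer sizes and alpha values are arbitrary choices for illustration.

```python
# Hypothetical sketch: a tiny network where the Leaky ReLU slope (alpha) is a
# hyperparameter you could search over, e.g. 0.01, 0.1, 0.3.
import torch
import torch.nn as nn

def build_model(alpha=0.01):
    return nn.Sequential(
        nn.Linear(20, 64),                   # 20 input features, chosen arbitrarily
        nn.LeakyReLU(negative_slope=alpha),  # alpha: slope applied when x < 0
        nn.Linear(64, 1),
    )

x = torch.randn(8, 20)                       # dummy batch; some features are negative
for alpha in (0.01, 0.1, 0.3):               # the extra tuning work mentioned above
    model = build_model(alpha)
    print(alpha, model(x).shape)             # torch.Size([8, 1]) for each alpha
```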
So basically you are saying that if the dataset contains negative values that could be important in training the model, or any other such features, we could go with LeakyReLU?
And you are stating the cost part because of the wide range of probability, right? And that would require more training or more model architectures to experiment with?
For the second, I don’t think probability is involved. It’s just that Leaky ReLU requires different multipliers for positive and negative values, and that’s going to cost more CPU time to compute. So training will consume more computer resources.
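If it helps, here is a rough way to see (and time) that extra work yourself. This is only a NumPy-level analogy for the per-element cost, not how a deep learning framework actually implements its kernels: per element, ReLU needs only a threshold, while Leaky ReLU also multiplies the negative entries by alpha.

```python
# Time a thresholding-only ReLU against a Leaky ReLU that does an extra multiply.
import timeit
import numpy as np

x = np.random.randn(1_000_000)

relu_time  = timeit.timeit(lambda: np.maximum(0.0, x), number=200)
leaky_time = timeit.timeit(lambda: np.where(x > 0, x, 0.01 * x), number=200)

print(f"ReLU:       {relu_time:.3f} s")
print(f"Leaky ReLU: {leaky_time:.3f} s")
```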
Now I have one more doubt: as you and Gent mentioned, we tend to use LeakyReLU because of the negative values and the negative-region slope. Why not go for a linear activation function, which would cover a wide range of the domain with variability?
Is it because LeakyReLU can capture the non-linear relation between the parameters while still allowing negative values that the leaky version is used?