How does regularization work on a layer with ReLU activation in a neural network?

From my understanding, the ReLU function is the same as a linear function except that y = 0 for x <= 0. So it would seem to lead to high bias rather than high variance. Why do we have to set a lambda for it?

These dense layers have neuron units with weights and biases, i.e. (wx + b), and then an activation is applied. The regularization penalty is computed on the weights, before the activation; the value (wx + b) is then passed through the ReLU.
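To make that concrete, here is a minimal NumPy sketch of one dense layer. The shapes, the value of `lam` (the regularization strength lambda), and the variable names are illustrative assumptions, not anything specific to a particular framework:

```python
import numpy as np

def relu(z):
    # ReLU: zero for negative inputs, identity otherwise
    return np.maximum(0.0, z)

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))      # weights of a 4-unit layer with 3 inputs
b = np.zeros(4)                  # biases (typically not regularized)
x = np.array([1.0, -2.0, 0.5])   # one input example

# Forward pass: the activation is applied AFTER wx + b
a = relu(W @ x + b)

# L2 regularization penalizes the weights themselves; it does not
# depend on the activation output at all
lam = 0.01
l2_penalty = lam * np.sum(W ** 2)

# total loss = data_loss + l2_penalty  (added to whatever loss you use)
```

So lambda still matters with ReLU: the penalty shrinks the weights `W` regardless of which activation follows them.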


Each ReLU unit generates a little line segment, and the network output is a piecewise-linear combination of them. If you have too many ReLU units, you can get overfitting. Regularization is one way to fix that.


Hi @Tram_Nguyen

This thread touches on ReLU vs. linear regression, too. I believe it could be interesting for you and should clear up potential doubts.

Have a good one!

Best regards