From my understanding, the ReLU function is the same as a linear function except that y = 0 when x <= 0. So it would more easily lead to high bias rather than high variance. Why do we have to set a lambda (regularization parameter) for it?
These dense layers have neuron units with weights and biases, i.e. (wx + b), and then an activation is applied. The regularization acts on the weights before the activation is applied; that value is then passed through the ReLU.
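If it helps to see where the penalty attaches, here is a minimal Keras sketch (the layer sizes, input shape, and `lambda_` value are just placeholders I picked for illustration): the L2 term is applied to the kernel W of Wx + b, not to the ReLU output.

```python
import tensorflow as tf

lambda_ = 0.01  # hypothetical regularization strength, tune on validation data

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),  # hypothetical input size
    tf.keras.layers.Dense(
        units=16,
        activation="relu",
        # L2 penalty lambda_ * sum(W**2) is added to the loss for this layer's weights W
        kernel_regularizer=tf.keras.regularizers.l2(lambda_),
    ),
    tf.keras.layers.Dense(units=1),
])

model.compile(optimizer="adam", loss="mse")
```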
The ReLU units each generate a little line segment. If you have too many ReLU units, you can get overfitting. Regularization is one way to fix that.
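As a quick illustration (the weights below are made up), a single hidden layer of ReLU units computes a sum of "hinge" functions, so the output is piecewise linear; each extra unit adds another potential kink, and with many units the model can bend itself around noise in the training data.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

# f(x) = sum_j v_j * relu(w_j * x + b_j)
# Each unit switches on at x = -b_j / w_j and contributes one line segment,
# so the overall output is piecewise linear. More units = more segments = more
# flexibility, which is where overfitting (and the need for regularization) comes in.
w = np.array([1.0, -1.0, 2.0])   # made-up hidden weights
b = np.array([0.0, 1.0, -2.0])   # made-up hidden biases
v = np.array([0.5, 1.0, -0.7])   # made-up output weights

x = np.linspace(-3, 3, 7)
f = relu(np.outer(x, w) + b) @ v
print(np.column_stack([x, f]))
```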
Hi @Tram_Nguyen
this thread also touches on ReLU vs. linear regression. I believe it could be interesting for you and should clear up any remaining doubts.
Have a good one!
Best regards
Christian