From my understanding, the ReLU function is the same as a linear function except that y = 0 for x <= 0. So it would more easily lead to high bias than high variance. Why do we have to set lambda (the regularization parameter) for it?
These dense layers have neuron units with weights and biases, i.e. (wx + b), and then an activation is applied. The regularization penalty acts on the weights themselves, not on the activation output: the layer still computes wx + b and passes that value through the ReLU, while the penalty on the weights is added to the loss.
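A minimal NumPy sketch of that idea (names like `dense_relu` and the lambda value 0.01 are illustrative, not from any specific framework): the forward pass is just wx + b followed by ReLU, and the L2 penalty is computed from the weight matrix alone.

```python
import numpy as np

def dense_relu(x, W, b):
    # Linear part wx + b, then the ReLU activation.
    z = x @ W + b
    return np.maximum(z, 0.0)

def l2_penalty(W, lam):
    # The regularization term depends only on the weights,
    # not on the activations; it is added to the training loss.
    return lam * np.sum(W ** 2)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))   # 4 samples, 3 features
W = rng.normal(size=(3, 5))   # a dense layer with 5 units
b = np.zeros(5)

a = dense_relu(x, W, b)           # activations, shape (4, 5)
penalty = l2_penalty(W, lam=0.01) # scalar added to the loss
```

In a framework like Keras this corresponds to passing a `kernel_regularizer` to a `Dense` layer: the regularizer touches the kernel (weights), and the activation is applied afterwards exactly as before.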
Each ReLU unit contributes a little line segment, so the network's output is piecewise linear. With too many ReLU units the model has enough "bends" to overfit the training data, and regularization is one way to fix that.
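To see the piecewise-linear picture concretely, here is a small sketch (the particular weights and biases are made up for illustration): each ReLU term is zero on one side of its "hinge" and linear on the other, and summing a few of them gives a curve that bends at each hinge point.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

# Two ReLU units with hand-picked (illustrative) weights and biases.
# Each one is flat on one side of its hinge and linear on the other;
# their sum is a piecewise-linear function with a bend at each hinge.
x = np.linspace(-2.0, 2.0, 9)
y = relu(1.0 * x - 0.5) + relu(-2.0 * x + 1.0)
```

With many units, the network can place many such bends, which is exactly the flexibility that lets it overfit; penalizing the weights keeps the segments from tilting sharply to chase noise.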