I’m revising the week 1 lecture “Why Regularization Reduces Overfitting?”, where Andrew explains the intuition for the tanh activation function. He says:
if the regularization becomes very large, the parameters W become very small, so Z will be relatively small,
so the activation function will be relatively linear,
and so your whole neural network will be computing something not too far from a big linear function, which is therefore a pretty simple function rather than a very complex, highly non-linear function
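To see that intuition numerically, here is a minimal sketch (assuming NumPy; the layer sizes and weight scales are made up for illustration) showing that as the weights shrink, the pre-activations z shrink and tanh(z) gets very close to the purely linear output z:

```python
# Minimal sketch: smaller weights -> smaller pre-activations -> tanh behaves
# almost like the identity, i.e. the layer is close to a linear map.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 1))              # one 4-dimensional input (illustrative)

for scale in (1.0, 0.1, 0.01):           # "stronger regularization" -> smaller W
    W = scale * rng.normal(size=(3, 4))  # weights of a single hidden layer
    z = W @ x                            # pre-activation
    a = np.tanh(z)                       # tanh activation
    # How far is tanh(z) from the purely linear output z?
    print(f"scale={scale:5.2f}  max|z|={np.max(np.abs(z)):.4f}  "
          f"max|tanh(z) - z|={np.max(np.abs(a - z)):.6f}")
```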
I wonder why we don’t get the same effect for the ReLU activation function. It’s linear for all z > 0, not only for small values of z.
ReLU is almost linear, but still non-linear enough. I think Section 3.1 (Rectifier Neurons) of “Deep Sparse Rectifier Neural Networks” (Glorot et al., 2011) gives a really good explanation of why it works:
… the only non-linearity in the network comes from the path selection associated with individual neurons being active or not. For a given input only a subset of neurons are active. Computation is linear on this subset: once this subset of neurons is selected, the output is a linear function of the input (although a large enough change can trigger a discrete change of the active set of neurons). The function computed by each neuron or by the network output in terms of the network input is thus linear by parts. We can see the model as an exponential number of linear models that share parameters (Nair and Hinton, 2010).
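Here is a minimal sketch of that “exponentially many linear models” view (assuming NumPy; the network sizes and random weights are made up for illustration). For a given input we record which hidden units are active, and check that the network output is exactly the affine function of the input obtained by keeping only those units:

```python
# Minimal sketch: a ReLU network is linear on the region where the set of
# active units stays fixed; different active sets give different linear pieces.
import numpy as np

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(5, 3)), rng.normal(size=(5, 1))
W2, b2 = rng.normal(size=(1, 5)), rng.normal(size=(1, 1))

def relu_net(x):
    h = np.maximum(0.0, W1 @ x + b1)     # ReLU hidden layer
    return W2 @ h + b2

x = rng.normal(size=(3, 1))
mask = (W1 @ x + b1 > 0).astype(float)   # which hidden units are active for this x

# On this active set the network is exactly an affine function of the input:
W_eff = W2 @ (mask * W1)                 # effective weights for this linear piece
b_eff = W2 @ (mask * b1) + b2            # effective bias for this linear piece

print(np.allclose(relu_net(x), W_eff @ x + b_eff))                   # True
# A tiny perturbation usually keeps the same active set, so the same linear piece...
print(np.allclose(relu_net(x + 1e-6), W_eff @ (x + 1e-6) + b_eff))   # typically True
# ...while a large enough change flips some units on/off and selects a different
# linear piece, which is where the network's non-linearity comes from.
```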