Question on how regularization helps by making networks closer to linear

In Week 2, Andrew mentioned that one way Frobenius-norm regularization helps is that it makes the weights w small, which keeps z close to zero. This makes the layer closer to linear when the activation function is tanh.
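For reference, a minimal sketch of the penalty being discussed: the cost gets an extra term proportional to the sum of squared entries of every weight matrix, and the corresponding gradient term shrinks the weights on each update. The shapes, lambda, and m below are made-up illustration values, not anything from the course assignments.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weight matrices for a small 2-layer network (shapes are assumptions)
W = [rng.standard_normal((4, 3)), rng.standard_normal((1, 4))]

lambd = 0.7   # regularization strength lambda (value is an assumption)
m = 100       # number of training examples (assumption)

# Frobenius-norm penalty added to the cost:
#   (lambda / (2m)) * sum over layers l of the squared entries of W[l]
penalty = (lambd / (2 * m)) * sum(np.sum(Wl ** 2) for Wl in W)

# Its gradient contributes (lambda/m) * W[l] to dW[l], so every gradient
# step also shrinks the weights multiplicatively ("weight decay"):
learning_rate = 0.1
W_decayed = [Wl * (1 - learning_rate * lambd / m) for Wl in W]
```

Smaller weights mean smaller z = Wa + b, which is what pushes tanh units toward their near-linear region.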

My question is that we don’t apply Frobenius regularization only to layers with tanh activation functions. When we use sigmoid, wouldn’t small values result in a non-linear layer? Wouldn’t that have the opposite effect of what we want?

The shapes of tanh and sigmoid are pretty similar. In fact, you can show that tanh is just a scaled and shifted sigmoid: tanh(z) = 2 * sigmoid(2z) - 1. They both have quasi-linear regions in the center of their graphs (for input values near 0), so pushing z toward zero does not make a sigmoid layer any less linear than a tanh layer.

But I don’t think the point of regularization is necessarily to get us into the relatively linear regions of the various activation functions. The point of suppressing the values of the weights is that it prevents any one input (at any given layer) from having an outsized influence on the results. It “regularizes” things by “evening out” the influences of the various neurons. I believe that this is the intuition for why regularization prevents overfitting: it keeps any one input at a given layer from dominating the results through a particularly large coefficient (its corresponding weight).
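A quick numerical check of the two claims above: the scaled-and-shifted identity holds exactly, and near z = 0 both activations are approximately linear (tanh(z) is roughly z, and sigmoid(z) is roughly 0.5 + z/4). This is just an illustrative sketch using NumPy.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-3.0, 3.0, 201)

# tanh is a scaled and shifted sigmoid: tanh(z) = 2*sigmoid(2z) - 1
assert np.allclose(np.tanh(z), 2 * sigmoid(2 * z) - 1)

# Near z = 0 both have quasi-linear regions (first-order Taylor expansions):
small = np.linspace(-0.1, 0.1, 21)
tanh_err = np.max(np.abs(np.tanh(small) - small))            # tanh(z) ~ z
sigm_err = np.max(np.abs(sigmoid(small) - (0.5 + small / 4)))  # sigmoid(z) ~ 0.5 + z/4
```

Both error terms come out tiny on this interval, which is the sense in which small z keeps either activation in a nearly linear regime.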
