Questions about regularization

Keeping the absolute values of the weights at all layers “suppressed” is a good thing: even if we use ReLU in the hidden layers, the output layer is still sigmoid (or softmax in the multiclass case), which means we still have to worry about the “flat tails” of that function. When the absolute values of the Z values at the output layer get too large, the activation saturates and the gradients approach zero, so learning stalls. And since the output-layer Z is computed from the activations of all the earlier layers, the weight values at every layer contribute to that.
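
To make the “flat tails” point concrete, here is a minimal sketch in plain Python (not tied to anything in the course code) that evaluates the sigmoid and its derivative at increasingly large z values:

```python
import math

def sigmoid(z):
    """Standard logistic sigmoid: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

# The derivative of sigmoid is sigmoid(z) * (1 - sigmoid(z)).
# It peaks at 0.25 when z = 0 and collapses toward zero on the tails.
for z in [0.0, 2.0, 5.0, 10.0, 20.0]:
    s = sigmoid(z)
    grad = s * (1.0 - s)
    print(f"z = {z:5.1f}   sigmoid(z) = {s:.9f}   sigmoid'(z) = {grad:.2e}")
```

By z = 10 the derivative is already down around 4.5e-05, and by z = 20 it is about 2e-09, so any gradient flowing back through that output neuron is essentially multiplied by zero. That is why regularization, by keeping the absolute values of the weights (and therefore of Z) smaller, also helps keep the gradients alive.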

In terms of going deeper into this mathematically, I have not personally tried to do that, so I don’t have any direct references I can give. Here’s a general bibliography thread about ML/DL textbooks. I’ve heard that the Goodfellow, Bengio, and Courville book (Deep Learning) is more mathematical. I just checked the ToC, and it definitely has a chapter on Regularization.