NN Regularization for all W parameters

While calculating the cost function, we want to minimize the error between the output of the output layer and the target variable.
Then why, during regularization, do we regularize all the W in the neural network? Regularizing only the W in the output layer should work just fine, since those are the weights associated with the output.

Hi @Thala,

The way regularization helps avoid overfitting is that the regularization term pushes the values of the weights toward zero, which effectively “disables” them. Usually we have an overfitting problem because our NN is too large (too many layers, or too many nodes per layer). So if we keep adding layers but only ever regularize the output layer, the overfitting problem is only going to get worse. To counter the added layers, we need regularization in them too.

On the other hand, if you work out the maths on a very simple 2-layer NN and regularize only the output layer, you will find that, because the output layer is regularized, its weights are suppressed, but this causes the weights in the first hidden layer to increase. (For example, in a simplified linear 2-layer network where the prediction is w2 * (w1 * x), halving w2 and doubling w1 leaves the prediction unchanged while shrinking an output-layer-only penalty.) So we are just “shifting” weight from the output layer to the first hidden layer. To counter this “shift”, we need regularization in the hidden layer as well.

The above points are why we want to regularize all layers. As for the best strategy, I really cannot comment: if I have 20 layers, which ones should I regularize and which should I skip? I cannot give you a general answer to that, but you have full regularization power when you regularize all of them.
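
For illustration, here is a rough Keras sketch (not the lab's code; the layer sizes and the L2 strength of 0.01 are placeholder values) of regularizing every layer rather than only the output layer:

```python
import tensorflow as tf

# A minimal sketch: L2 regularization applied to every Dense layer,
# not only the output layer. Sizes and lambda = 0.01 are placeholders.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(25, activation='relu',
                          kernel_regularizer=tf.keras.regularizers.L2(0.01)),
    tf.keras.layers.Dense(15, activation='relu',
                          kernel_regularizer=tf.keras.regularizers.L2(0.01)),
    tf.keras.layers.Dense(1, activation='sigmoid',
                          kernel_regularizer=tf.keras.regularizers.L2(0.01)),
])
model.compile(loss=tf.keras.losses.BinaryCrossentropy(),
              optimizer=tf.keras.optimizers.Adam(learning_rate=0.01))
```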

At the end of the day, it's the cross-validation (cv) dataset evaluation result that we rely on to choose the best model (including where to regularize).

Cheers,
Raymond

Thanks, I understand now.

Just out of curiosity,
Can we obtain the W values for each neuron in a particular layer?
If so, then it should also be possible to get the activation vector that each layer outputs (which is the next layer's input). Am I right?

Yes! @Thala, for example, in C2 W1 Lab: Coffee Roasting in Tensorflow, you can get the weights like this:

[Screenshot of the lab code that retrieves the layer weights]
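
Not the lab's exact code, but with a Keras model it looks roughly like this (the toy model below is just for illustration):

```python
import tensorflow as tf

# A toy model just to show the API (not the lab's exact model).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),                     # 2 input features
    tf.keras.layers.Dense(3, activation='sigmoid', name='layer1'),
    tf.keras.layers.Dense(1, activation='sigmoid', name='layer2'),
])

# get_weights() on a Dense layer returns [W, b] as NumPy arrays.
W1, b1 = model.get_layer('layer1').get_weights()
print("W1 shape:", W1.shape, " b1 shape:", b1.shape)   # (2, 3) and (3,)
```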

You can also create a “sub-model” by using the model's input as the sub-model's input, and one of the model's hidden layers' output as the sub-model's output. That sub-model will then produce the activation vector output by that hidden layer.

Example here.
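
As a rough sketch of the same idea (assuming TensorFlow 2.x Keras; the toy model is the same illustrative one as above, not the lab's exact code):

```python
import tensorflow as tf

# Same toy model as above, just for illustration.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(3, activation='sigmoid', name='layer1'),
    tf.keras.layers.Dense(1, activation='sigmoid', name='layer2'),
])

# A "sub-model" that shares the original model's input but stops at the
# first hidden layer, so calling it returns that layer's activations.
sub_model = tf.keras.Model(inputs=model.input,
                           outputs=model.get_layer('layer1').output)

x = tf.constant([[200.0, 13.9]])   # one example with 2 features
a1 = sub_model(x)                  # shape (1, 3): layer1's activation vector
print(a1.numpy())
```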

Raymond
