Hello @jiikoo,
If I were you, I would start by asking a question: “What if I regularize the output layer?”
I know, I know: the assignment doesn’t allow us to do it because doing so would fail a test, but if we forget about the test, we actually can, can’t we? So I ran some very quick experiments (and so can you!), and my results are below:
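If you want to run the same kind of experiment yourself, here is a minimal sketch of what “L2 on every layer, output included” could look like in Keras. The layer sizes, the from_logits setup, and the `build_model` helper are my illustrative assumptions, not the assignment’s exact code:

```python
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.regularizers import L2

def build_model(lambdas):
    """A small classifier; `lambdas` holds one L2 lambda per Dense layer.

    Layer sizes are illustrative, not the assignment's exact ones.
    """
    lam1, lam2, lam3 = lambdas
    model = Sequential([
        Dense(25, activation="relu", kernel_regularizer=L2(lam1)),
        Dense(15, activation="relu", kernel_regularizer=L2(lam2)),
        # the "forbidden" part: an L2 penalty on the output layer too
        Dense(10, activation="linear", kernel_regularizer=L2(lam3)),
    ])
    model.compile(
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        optimizer=tf.keras.optimizers.Adam(0.001),
    )
    return model

model = build_model((0.01, 0.01, 0.01))  # illustrative lambdas; tune them!
```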
So, by comparing the 1st & the 2nd results, we know that regularizing the output layer didn’t do any good, but if we tune the lambda value (which we always have to), as in the 3rd result, it wasn’t so bad compared with the 1st!
Therefore, the messages I take from these results are: (1) we can do it; we only need to step outside the box (the assignment)! (2) we always need to tune lambda for the best result.
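Point (2) is easy to automate. Reusing the `build_model` sketch above (the candidate lambda values and the synthetic stand-in data are placeholders; swap in your real training set):

```python
import numpy as np

# synthetic stand-in data so the sketch runs; use your real X_train / y_train
X_train = np.random.randn(500, 64).astype("float32")
y_train = np.random.randint(0, 10, size=500)

for lam in [0.0, 0.001, 0.01, 0.1]:
    model = build_model((lam, lam, lam))  # build_model from the sketch above
    hist = model.fit(X_train, y_train, validation_split=0.2,
                     epochs=30, verbose=0)
    print(f"lambda={lam}: val loss = {hist.history['val_loss'][-1]:.4f}")
```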
Further on (1): if you remember when we first learned about regularization in a previous week, we applied it to output-layer-only logistic regression, right? In other words, we did apply it to the output layer (logistic regression has only an output layer), and that didn’t cause any problem, right? So, we can tell ourselves that it is not absolutely wrong to regularize an output layer.
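To make the analogy concrete, regularized logistic regression rewritten in the same Keras style as above would be just this (a sketch only; the course’s own implementation differs):

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.regularizers import L2

logreg = Sequential([
    # logistic regression *is* just an output layer - and we regularized it
    Dense(1, activation="sigmoid", kernel_regularizer=L2(0.01)),
])
logreg.compile(loss="binary_crossentropy", optimizer="adam")
```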
However, though my 3rd result wasn’t bad, it wasn’t better (than the 1st) either! Of course, I only spent about 2 minutes on these experiments, and I wouldn’t be surprised if you could find a better lambda value.
Still, would there be any argument for me to skip regularizing the output layer? Yes! There are two reasons:
- Previous layers are already properly and sufficiently (with the right lambda) regularized to counter the overfitting effect. (Note: in output-layer-only logistic regression, there is no previous layer, so the output layer is our only choice for regularization.)
- While L2-regularization tends to make the weights in the output layer smaller, our friend softmax may try to make some weights larger! You see - L2-regularization always penalizes (large) weight values, but softmax has to make sure the outputs of the output layer add up to one! In other words, softmax won’t allow regularization to push the weights too small, or even just small enough. In short, softmax can fight L2-regularization, as the numeric sketch after this list shows!
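Here is a tiny numeric illustration of that tension (my own sketch in plain NumPy; the logits and the “true label” are made up):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # numerically stable softmax
    return e / e.sum()

logits = np.array([3.0, 1.0, 0.5])   # pretend these are the output layer's z values
for scale in [1.0, 0.5, 0.1]:        # L2 shrinking the weights shrinks the logits
    p = softmax(scale * logits)
    # cross-entropy loss, assuming class 0 is the true label
    print(f"scale={scale}: probs={np.round(p, 3)}, loss={-np.log(p[0]):.3f}")
```

As the weights (and hence the logits) shrink, softmax drifts toward a uniform distribution and the cross-entropy loss climbs (from about 0.20 up to about 0.95 here), so the data term of the loss pushes the output weights back up against the L2 term.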
How can I avoid them fighting each other? I relax things by not using L2-regularization on one of the layers. That does not have to be the output layer; in fact, when I chose to regularize only the first and the output layers, the result, again, wasn’t so bad (compare the 2nd & the 4th results, and the 1st & the 4th):
So, in effect, I let L2-regularization do its best on two of the layers, while softmax, even though its pressure reaches all layers (through back-propagation), can act more on the unregularized one to fulfill its ultimate goal of “adding up to 1” and give the other two a chance. Again, I only spent about 20 seconds on my 4th result above, so it may still be suboptimal, and it wouldn’t be surprising to find a better lambda value for regularizing only Layer 1 and Layer 3.
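In terms of my earlier `build_model` sketch, that 4th configuration is simply this (lambda values illustrative):

```python
# L2 on Layer 1 and the output layer (Layer 3); the middle layer is relaxed
model = build_model((0.01, 0.0, 0.01))
```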
Having said that relaxing one layer would be my strategy to avoid the fight, I would still choose to relax the output layer rather than the middle layer. My reason relates to back-propagation, a critical concept in deep learning that is unfortunately not yet fully covered in MLS (it is covered in DLS, the Deep Learning Specialization), so I will skip that reason in this discussion.
I have gone through a lot in this discussion, so it is natural if some readers want to read it multiple times, or even run some experiments like mine, to better grasp the ideas. Here are my key take-aways:
- There is nothing absolutely wrong about regularizing an output layer.
- Regularizing or not, tuning for the best lambda value is standard practice. We don’t say regularizing the output layer is bad; we say regularizing it without being able to find a set of good lambda values is bad.
Cheers!
Raymond