Week 1: Back propagation with L2 regularization

aman_kumar · June 22, 2021, 9:14pm

Hi,
L2 regularization changes the cost function by adding a scaled squared Frobenius norm of the weights. Why, then, does only the formula for dW change during back propagation? Since every derivative depends on the definition of the cost function, I would expect dA and dZ to be affected as well.
On the other hand, their derivatives shouldn’t be affected: the newly added term (the Frobenius norm of the weights) is a function of the weights only, so its partial derivative w.r.t. A or Z is 0.
But if the derivative of that term w.r.t. A or Z is 0, then when back-propagating and applying the chain rule to compute dW, the term (lambda/m)*W[l] shouldn’t appear in the equation for dW either.
I’m just really confused as to how back-propagation equations have been derived with L2 regularization. Kindly explain or share resources from where I can understand it. Any help would be really appreciated!

Thanking you in anticipation.

With best regards,
Aman

nramon · June 24, 2021, 1:27pm

Hi, @aman_kumar.

Now you have two terms in your cost function, J = J_b + J_r, where J_b is the original cost function and J_r is the L2 regularization term.

Then you calculate \frac{\partial{J}}{\partial{W}} = \frac{\partial{J_b}}{\partial{W}} + \frac{\partial{J_r}}{\partial{W}} = \frac{\partial{J_b}}{\partial{a}} \frac{\partial{a}}{\partial{z}} \frac{\partial{z}}{\partial{W}} + \frac{\partial{J_r}}{\partial{W}}.

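That’s where the extra term for dW comes from. J_r depends on W directly, not through a or z, so its derivative is added next to the chain-rule term instead of being passed through it. With J_r = \frac{\lambda}{2m} \sum_l \|W^{[l]}\|_F^2, that derivative is \frac{\partial{J_r}}{\partial{W^{[l]}}} = \frac{\lambda}{m} W^{[l]}. As you said, the partial derivative of J_r with respect to a, z, or b is zero, so dA, dZ, and db get no extra term.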
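If it helps to see this numerically, here is a minimal NumPy sketch that compares the back-propagation formulas against finite-difference gradients of the full cost J = J_b + J_r. Everything in it is an assumption made for illustration, not the course's assignment code: a two-layer network with tanh and sigmoid activations, a cross-entropy J_b, the regularizer J_r = (lambda / 2m) * (sum of squared weights), and the helper names `forward`, `cost`, `backward`, and `numerical_grad`.

```python
import numpy as np

def forward(params, X):
    W1, b1, W2, b2 = params
    Z1 = W1 @ X + b1
    A1 = np.tanh(Z1)
    Z2 = W2 @ A1 + b2
    A2 = 1.0 / (1.0 + np.exp(-Z2))  # sigmoid output
    return Z1, A1, Z2, A2

def cost(params, X, Y, lambd):
    W1, b1, W2, b2 = params
    m = X.shape[1]
    A2 = forward(params, X)[3]
    J_b = -np.mean(Y * np.log(A2) + (1 - Y) * np.log(1 - A2))    # original cost
    J_r = lambd / (2 * m) * (np.sum(W1 ** 2) + np.sum(W2 ** 2))  # L2 term
    return J_b + J_r

def backward(params, X, Y, lambd):
    W1, b1, W2, b2 = params
    m = X.shape[1]
    Z1, A1, Z2, A2 = forward(params, X)
    dZ2 = A2 - Y                           # no lambda here
    dW2 = dZ2 @ A1.T / m + lambd / m * W2  # chain-rule term + dJ_r/dW2
    db2 = dZ2.sum(axis=1, keepdims=True) / m
    dA1 = W2.T @ dZ2                       # no lambda here
    dZ1 = dA1 * (1 - A1 ** 2)              # no lambda here
    dW1 = dZ1 @ X.T / m + lambd / m * W1   # chain-rule term + dJ_r/dW1
    db1 = dZ1.sum(axis=1, keepdims=True) / m
    return dW1, db1, dW2, db2

def numerical_grad(params, idx, X, Y, lambd, eps=1e-6):
    # Central finite differences of the full cost w.r.t. params[idx].
    grad = np.zeros_like(params[idx])
    for i in np.ndindex(params[idx].shape):
        plus = [p.copy() for p in params]
        minus = [p.copy() for p in params]
        plus[idx][i] += eps
        minus[idx][i] -= eps
        grad[i] = (cost(plus, X, Y, lambd) - cost(minus, X, Y, lambd)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 5))                        # 3 features, m = 5 examples
Y = (rng.random(size=(1, 5)) > 0.5).astype(float)  # random binary labels
params = [rng.normal(size=(4, 3)), np.zeros((4, 1)),
          rng.normal(size=(1, 4)), np.zeros((1, 1))]
lambd = 0.7

grads = backward(params, X, Y, lambd)
max_errors = {}
for idx, name in enumerate(["W1", "b1", "W2", "b2"]):
    diff = grads[idx] - numerical_grad(params, idx, X, Y, lambd)
    max_errors[name] = float(np.max(np.abs(diff)))
    print(name, "max abs difference:", max_errors[name])
```

The differences should all be tiny, which suggests the analytic formulas match the regularized cost even though lambda only shows up in the dW lines. Calling `backward` again with `lambd=0.0` changes dW1 and dW2 by exactly (lambda/m)*W and leaves db1 and db2 unchanged.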

I hope that answers your question.
