Week 1: Back propagation with L2 regularization

L2 regularization changes the cost function by adding a scaled sum of the squared Frobenius norms of the weight matrices. Why, then, does only the formula for dW change during backpropagation? Since the derivative of every parameter depends on the definition of the cost function, dA and dZ should also be affected.
At the same time, their derivatives shouldn't be affected, because when differentiating the cost function ( J ), the newly added term (the Frobenius norm of the weights) is a function of the weights only, so its partial derivative w.r.t. A or Z is 0.
Further, if the derivative of that term w.r.t. A or Z is 0, then when applying the chain rule during backpropagation to compute dW, the term (lambda/m)*W[l] shouldn't appear in the equation for dW either.
I'm just really confused about how the backpropagation equations are derived with L2 regularization. Kindly explain or share resources where I can learn about this. Any help would be really appreciated!

Thank you in advance.

With best regards,

Hi, @aman_kumar.

Now you have two terms in your cost function, J = J_b + J_r, where J_b is the original cost function and J_r is the L2 regularization term.

Then you calculate \frac{\partial{J}}{\partial{W}} = \frac{\partial{J_b}}{\partial{W}} + \frac{\partial{J_r}}{\partial{W}} = \frac{\partial{J_b}}{\partial{a}} \frac{\partial{a}}{\partial{z}} \frac{\partial{z}}{\partial{W}} + \frac{\partial{J_r}}{\partial{W}}.
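To make the extra term concrete (assuming the course's convention J_r = \frac{\lambda}{2m} \sum_l \|W^{[l]}\|_F^2), differentiating J_r w.r.t. a weight matrix gives \frac{\partial{J_r}}{\partial{W^{[l]}}} = \frac{\lambda}{2m} \cdot 2 W^{[l]} = \frac{\lambda}{m} W^{[l]}, which is exactly the (lambda/m)*W[l] term you mentioned.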

That's where the extra term in dW comes from. As you noted, for the other quantities (dA, dZ, db) the partial derivative of J_r is zero, so there is no extra term.
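Here is a small NumPy sketch that checks this numerically for a single layer. The shapes, variable names, and the value of lambd are illustrative assumptions, not taken from the course code:

```python
import numpy as np

# Sketch, assuming the convention J_r = (lambda / (2*m)) * ||W||_F^2
# for one layer. Shapes and names below are hypothetical.
np.random.seed(0)
m, n_prev, n = 5, 3, 2               # batch size, input units, output units
A_prev = np.random.randn(n_prev, m)  # activations from the previous layer
W = np.random.randn(n, n_prev)
dZ = np.random.randn(n, m)           # upstream gradient dJ_b/dZ
lambd = 0.7

# Standard (unregularized) backprop gradients
dW_base = (1.0 / m) * dZ @ A_prev.T
db = (1.0 / m) * np.sum(dZ, axis=1, keepdims=True)
dA_prev = W.T @ dZ

# L2 regularization adds (lambda/m) * W to dW only;
# db and dA_prev are unchanged because J_r does not depend on A or Z.
dW_reg = dW_base + (lambd / m) * W

# Numerically confirm dJ_r/dW = (lambda/m) * W via central differences
def J_r(Wm):
    return (lambd / (2 * m)) * np.sum(Wm ** 2)

eps = 1e-6
num_grad = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        Wp, Wn = W.copy(), W.copy()
        Wp[i, j] += eps
        Wn[i, j] -= eps
        num_grad[i, j] = (J_r(Wp) - J_r(Wn)) / (2 * eps)

print(np.allclose(num_grad, (lambd / m) * W))  # True
```

The finite-difference check shows the penalty's gradient really is (lambda/m)*W, while dA_prev and db are computed exactly as in the unregularized case.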

I hope that answers your question :slight_smile:
