L2 regularization changes the cost function by adding a scaled Frobenius-norm term over the weights. Why, then, during back-propagation, is only the formula for dW changed? Since the derivative of every parameter depends on the definition of the cost function, dA and dZ should be affected as well.
At the same time, it also makes sense that dA and dZ are unaffected: the newly added term (the Frobenius norm of the weights) is a function of the weights only, so its partial derivative with respect to A or Z is 0.
But if the derivative of that term with respect to A or Z is 0, then when back-propagating and applying the chain rule to compute dW, the term (lambda/m)*W[l] shouldn't appear in the equation for dW either.
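For concreteness, the formulas I'm referring to (written out in the course's notation, as best I understand it) are:

```latex
% Regularized cost: the original cross-entropy cost J plus the scaled
% Frobenius-norm term (lambda = regularization hyperparameter,
% m = number of training examples, L = number of layers)
J_{reg} = J + \frac{\lambda}{2m} \sum_{l=1}^{L} \lVert W^{[l]} \rVert_F^2

% Weight gradient for layer l: the usual backprop term plus the
% extra (lambda/m) W^{[l]} term that my question is about
dW^{[l]} = \frac{1}{m}\, dZ^{[l]} \, A^{[l-1]\,T} + \frac{\lambda}{m} W^{[l]}
```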
I'm really confused about how the back-propagation equations are derived with L2 regularization. Could you kindly explain, or share resources where I can learn this? Any help would be greatly appreciated!
Thank you in advance.
With best regards,