Hi, @aman_kumar.

Now you have two terms in your cost function, J = J_b + J_r, where J_b is the original cost function and J_r is the L2 regularization term.

Then you calculate \frac{\partial{J}}{\partial{W}} = \frac{\partial{J_b}}{\partial{W}} + \frac{\partial{J_r}}{\partial{W}} = \frac{\partial{J_b}}{\partial{a}} \frac{\partial{a}}{\partial{z}} \frac{\partial{z}}{\partial{W}} + \frac{\partial{J_r}}{\partial{W}}.

That’s where the extra term for `dW` comes from. As you said, for the other parameters the partial derivative of J_r is zero, so there is no extra term.
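If it helps to see this numerically, here is a minimal sketch in plain Python. It assumes a single sigmoid unit with cross-entropy loss as J_b and J_r = \frac{\lambda}{2m} \sum_j w_j^2, which gives \frac{\partial{J_r}}{\partial{W}} = \frac{\lambda}{m} W. The names `cost`, `gradients` and `lambd` are just for illustration and are not taken from the assignment code.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w, b, x_i):
    # a = sigmoid(z), with z = w . x + b
    return sigmoid(sum(w_j * x_j for w_j, x_j in zip(w, x_i)) + b)

def cost(w, b, X, y, lambd):
    """J = J_b + J_r for one sigmoid unit (assumed setup, see above)."""
    m = len(X)
    j_b = 0.0
    for x_i, y_i in zip(X, y):
        a = forward(w, b, x_i)
        j_b += -(y_i * math.log(a) + (1 - y_i) * math.log(1 - a))
    j_b /= m
    # L2 term: depends on w only, not on b
    j_r = lambd / (2 * m) * sum(w_j ** 2 for w_j in w)
    return j_b + j_r

def gradients(w, b, X, y, lambd):
    """Return (dw, db). Only dw receives the extra (lambd / m) * w term."""
    m = len(X)
    dw = [0.0] * len(w)
    db = 0.0
    for x_i, y_i in zip(X, y):
        dz = forward(w, b, x_i) - y_i      # dJ_b/dz for sigmoid + cross-entropy
        for j, x_j in enumerate(x_i):
            dw[j] += dz * x_j / m          # dJ_b/dw_j
        db += dz / m                       # dJ_b/db
    # dJ_r/dw = (lambd / m) * w, and dJ_r/db = 0, so db is left unchanged
    dw = [dw_j + lambd / m * w_j for dw_j, w_j in zip(dw, w)]
    return dw, db
```

Calling `gradients` once with `lambd = 0` and once with a positive `lambd` should give the same `db`, while each `dw[j]` differs by exactly `lambd / m * w[j]`.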

I hope that answers your question.